CN116496365A

CN116496365A - Acidic surface-assisted dissolution short peptide tag for improving recombinant protein expression efficiency

Info

Publication number: CN116496365A
Application number: CN202211575052.2A
Authority: CN
Inventors: 赵晟; 李鹏; 孙秀莲; 仲明; 王志民
Original assignee: Jinan Yiming Medical Technology Co ltd
Current assignee: Jinan Yiming Medical Technology Co ltd
Priority date: 2022-12-08
Filing date: 2022-12-08
Publication date: 2023-07-28

Abstract

The invention relates to an acidic surface-assisted dissolution short peptide tag for improving the expression efficiency of recombinant proteins, and relates to an isolated peptide comprising the amino acid sequence VDNKFNKEQQX₁AFYEILHLPNLNEEQRNAFIQX₂LKDDPX₃X₄SX₅X₆X₇LX₈EAX₉X₁₀LNDAQPK (SEQ ID NO: 9), wherein X₁-X₁₀ are negatively charged amino acids, preferably E or D. The invention also relates to fusion proteins comprising said peptide, polynucleotides encoding said peptide or fusion protein, constructs and host cells comprising said polynucleotides, and methods for producing said fusion proteins or polypeptides of interest.

Description

Acidic surface-assisted dissolution short peptide tag for improving recombinant protein expression efficiency

Technical Field

The invention belongs to the field of protein purification, and particularly relates to an acidic-surface solubilizing short peptide tag designed by computer-aided calculation.

Background

With the continuous development of biotechnology in recent years, recombinant protein expression and purification technology has received increasing attention and wide application. Among expression systems, prokaryotic expression has the lowest cost and the highest efficiency of operation and purification. Escherichia coli (E. coli) is one of the most commonly used host organisms for recombinant protein production.

Many recombinant proteins rich in hydrophobic amino acid residues, especially those from non-prokaryotes, are poorly soluble and accumulate in large amounts in inclusion bodies because the prokaryotic cell lacks the corresponding protein folding and protection mechanisms. Recombinant proteins that precipitate during expression are largely inactive even after they are redissolved.

To increase the expression level, improve the proportion of correctly folded product and its solubility, and simplify downstream purification, fusion expression with solubility-enhancing proteins such as green fluorescent protein (GFP), maltose-binding protein (MBP), and glutathione S-transferase (GST) has been adopted. However, because of their large molecular weight, these solubility-enhancing proteins still suffer from drawbacks such as long folding times, a limited solubilizing effect, and a tendency to interfere with the activity of the recombinant protein. Thus, there is a need in the art to design and develop new solubilizing peptides for the production and purification of recombinant proteins.

Disclosure of Invention

Because most proteins have a slightly acidic or neutral isoelectric point [Link A J et al., Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12, Electrophoresis, 1997, 18(8): 1259-1313], the invention uses computer-aided design to engineer the charge distribution of a short, stable protein domain so that its surface carries multiple negative charges. This increases the polarity of the domain so that it dissolves more readily in water, and causes it to repel proteins with a slightly acidic isoelectric point, which reduces aggregation during protein purification. The result is not only superior solubilization but also less interference with the folding of the recombinant protein itself. Solvent-exposed surfaces of some bacterial receptor proteins, particularly those containing helical bundle structures, have proven to be very stable [Alexander P et al., Thermodynamic analysis of the folding of the streptococcal protein G IgG-binding domains B1 and B2: why small proteins tend to have high denaturation temperatures, Biochemistry, 1992, 31(14): 3597-3603]. These stable solvent-exposed surfaces can be used to design stable solubilizing peptides.

The acidic surface-solubilizing short peptide tag (Sacid) of the invention is engineered from the B domain of staphylococcal protein A [Nilsson B et al., A synthetic IgG-binding domain based on staphylococcal protein A, Protein Eng, 1987, 1(2): 107-113]. The domain is short and stable, contains only 57 amino acid residues, and has a molecular weight of only 8 kD, which makes it suitable for modifying the charge distribution and for attachment to a recombinant protein as a fusion tag. It can efficiently solubilize the recombinant protein, stabilize its conformation during purification, improve its expression efficiency, and maintain its biological activity. The invention grafts a short polypeptide with a surface rich in negative charges onto the target recombinant protein. Compared with the positively charged solubilizing tags published earlier by other researchers, it is more advantageous in improving the expression efficiency, solubility, and biological activity of the recombinant protein.

In a first aspect, the invention provides an isolated peptide comprising the amino acid sequence VDNKFNKEQQX₁AFYEILHLPNLNEEQRNAFIQX₂LKDDPX₃X₄SX₅X₆X₇LX₈EAX₉X₁₀LNDAQPK (SEQ ID NO: 9), wherein X₁-X₁₀ are negatively charged amino acids, preferably glutamic acid (E) or aspartic acid (D).

In a second aspect, the invention provides an isolated polynucleotide encoding a peptide of the first aspect.

In a third aspect, the invention provides an isolated fusion protein comprising a first peptide and a second peptide, wherein the first peptide is a peptide of the first aspect and the second peptide is a polypeptide of interest.

In a fourth aspect, the invention provides an isolated polynucleotide encoding the fusion protein of the third aspect.

In a fifth aspect, the invention provides a construct comprising a polynucleotide of the fourth aspect.

In a sixth aspect, the invention provides a host cell comprising the polynucleotide of the fourth aspect or the construct of the fifth aspect, wherein the host cell is capable of expressing the fusion protein.

In a seventh aspect, the present invention provides a method of producing a fusion protein comprising: (a) Culturing the host cell of the sixth aspect under conditions suitable for expression of the fusion protein; and (b) recovering the fusion protein, optionally, (c) cleaving the fusion protein to release the polypeptide of interest and (d) recovering the polypeptide of interest.

In an eighth aspect, the present invention provides the use of a peptide of the first aspect or a polynucleotide of the second aspect for the production of a protein of interest.

In a ninth aspect, the present invention provides a method for producing a protein of interest, comprising: (a) Expressing in a host cell a fusion protein formed by fusion of a protein of interest with a peptide of the first aspect; (b) cleaving the fusion protein to release the protein of interest; and (c) optionally, isolating and/or purifying the protein of interest.

Drawings

Fig. 1: Sequence alignment of the amino acid sequences of different short peptide tags. Wt corresponds to SEQ ID NO: 4; Zbasic corresponds to SEQ ID NO: 7; Sacid corresponds to SEQ ID NO: 1; Sacid1 corresponds to SEQ ID NO: 2; and Sacid2 corresponds to SEQ ID NO: 3.

Fig. 2: Electrostatic potential models of different short peptide tags at physiological pH. Panel A shows the electrostatic potential models of the N-terminal (left) and C-terminal (right) domains of staphylococcal protein A at physiological pH; panel B shows the N-terminal (left) and C-terminal (right) electrostatic potential models of Zbasic at physiological pH; panel C shows the N-terminal (left) and C-terminal (right) electrostatic potential models of Sacid at physiological pH; panel D shows the N-terminal (left) and C-terminal (right) electrostatic potential models of Sacid1 at physiological pH; panel E shows the N-terminal (left) and C-terminal (right) electrostatic potential models of Sacid2 at physiological pH.

Fig. 3: Solubility and isoelectric point predicted by computer-aided calculation (website https://protein-sol.manchester.ac.uk/). The tool predicts protein solubility from sequence. It is based on the observed bimodal distribution of the solubility of cell-free expressed E. coli proteins [Niwa T et al., Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins, Proceedings of the National Academy of Sciences of the United States of America, 2009, 106(11): 4201-4206]. These measurements report the ratio of the soluble fraction of a protein (in the supernatant after centrifugation) to the total amount of the protein, rather than a thermodynamic property. The tool computes 35 features: the 20 amino acid compositions; 7 amino acid combinations (K-R, D-E, K+R, D+E, K+R-D-E, K+R+D+E, F+W+Y); and 8 predicted features (length, isoelectric point, hydrophobicity, net charge of the protein at pH = 7.0, folding propensity, disordered region, sequence entropy, and β-strand propensity). A linear model combining the 35 features gives a preliminary fit to the solubility data. The returned value lies between 0 and 1; a value higher than 0.45 (the average of the E. coli expressed protein dataset of Niwa et al. (2009)) means that the solubility of the protein may be higher than the average solubility of E. coli expressed proteins.

Panels A and B show the solubilities of the different solubilizing tags. As the figure shows, the intrinsic solubility of the acidic short peptide solubilizing tag (Sacid) differs little from that of the unmodified wild-type (WT) tag and is even slightly lower than that of the SUMO tag (panel A). However, because its charged groups are concentrated on the surface, after fusion with the recombinant protein the tag can interact with positively charged protons in water more effectively than other tags to form a hydration film, thereby helping the recombinant protein dissolve. The solubility of the target protein is therefore markedly improved after fusion with the Sacid tag (panel B). Sacid also has a higher intrinsic solubility than the previously published positively charged tag (Zbasic) (panel A), and its charges are distributed on the surface. Because the negative charges on the acidic surface repel the negative charges on the surface of recombinant proteins, most of which are slightly acidic, aggregation is less likely and the solubilizing effect is better than that of the basic solubilizing peptide tag. Accordingly, after fusion with the protein of interest, the solubilizing effect is significantly better than that of the positively charged tag (Zbasic) (panel B). Likewise, the solubilizing effect of the Sacid tag of the invention is significantly better than that of other commonly used solubilizing tags.

Panels C and D show the isoelectric points of the different solubilizing tags. As the figure shows, the acidic short peptide solubilizing tag of the invention is more strongly acidic. After fusion with the target protein, the pI of the fusion protein remains markedly acidic.

Panel E shows the predicted characteristics of the acidic short peptide solubilizing tag (Sacid) of the present invention (net charge of protein at pH=7.0, folding tendency, etc.).

Fig. 4: Schematic of the protein expression and purification procedure used in the invention.

Fig. 5: Electrophoretic identification of target proteins purified with different purification tags.

Panel A shows the expression of the NsiI protein alone. M: protein molecular weight standard; F: Strep-tag II purification flow-through; W1, W2: Strep-tag II purification wash fractions; E: Strep-tag II purification eluate. The arrow points to the target protein. When NsiI is expressed alone, the protein tends to aggregate and precipitate before it folds correctly, so the expression level is low, and the target protein is markedly reduced or even completely lost after purification.

Panel B shows the soluble expression of the NeonGreen-NsiI fusion protein, using NeonGreen as the solubilizing protein. M: protein molecular weight standard; F: Strep-tag II purification flow-through; W1, W2: Strep-tag II purification wash fractions; E: Strep-tag II purification eluate. The arrow points to the fusion protein. Although fusion expression with NeonGreen as the solubilizing protein allows the fusion protein to be purified and enriched, its large molecular weight still brings drawbacks such as a long folding time, a limited solubilizing effect, and a tendency to interfere with the activity of the recombinant protein. The figure shows that most of the protein is present in insoluble inclusion bodies and that the fusion protein is easily degraded.

Panel C shows the electrophoretic identification of the fusion protein purified by Strep-tag II affinity column chromatography, using the positively charged solubilizing tag (Zbasic) published by other researchers as the purification tag. M: protein molecular weight standard; F: Strep-tag II purification flow-through; W1, W2: Strep-tag II purification wash fractions; E: Strep-tag II purification eluate. The arrow points to the fusion protein. Compared with Fig. 5D, the yield obtained with the published positively charged solubilizing tag (Zbasic) is lower, which indirectly shows that the Sacid tag can efficiently solubilize the recombinant protein, stabilize its conformation during purification, and improve its expression efficiency.

Panel D shows the electrophoretic identification of the fusion protein purified by Strep-tag II affinity column chromatography, using Sacid as the purification tag. M: protein molecular weight standard; F: Strep-tag II purification flow-through; W1: Strep-tag II purification wash fraction; E: Strep-tag II purification eluate. The arrow points to the fusion protein. Compared with Fig. 5C, the Sacid tag markedly improves the solubility of the fusion protein, and the target protein is obtained after Strep-tag II purification.

Fig. 6: Comparison of Sacid and NeonGreen as tags for purifying the protein of interest.

Panel A is an SDS-PAGE of TEVP protein expressed using Sacid and NeonGreen as solubilizing peptides. M: protein molecular weight standard; ST: Sacid-TEVP fusion protein; NT: NeonGreen-TEVP fusion protein. The arrow points to the fusion protein. A thick band at 39 kD is observed only for Sacid-TEVP; with NeonGreen as the solubilizing fusion partner, the target band cannot be clearly seen.

Panel B is a Western blot of TEVP expressed using Sacid and NeonGreen as solubilizing peptides. M: protein molecular weight standard; ST: Sacid-TEVP fusion protein; NT: NeonGreen-TEVP fusion protein. The arrow points to the fusion protein. The Western blot shows a distinct band at 39 kD with Sacid as the solubilizing peptide, but no target band with NeonGreen as the solubilizing peptide, which is substantially consistent with the SDS-PAGE result in panel A. The expression level of the target protein is greatly improved with the fusion expression of the invention; compared with other tags, the Sacid tag provided by the invention markedly improves the solubility of the fusion protein.

Fig. 7: Activity test of the purified proteins. A plasmid containing a single NsiI site and a single HindIII site was used as the substrate; double digestion yields target bands of 3392 bp and 1987 bp. The purified NsiI enzyme was serially diluted (10¹ to 10⁸-fold) and then used in the double-digestion reaction (37 °C, 1 hour). Panel A shows the double-digestion identification of the NsiI protein expressed alone and of the NeonGreen-NsiI protein; panel B shows the double-digestion identification of the NsiI protein expressed with the positively charged solubilizing tag (Zbasic) and of the Sacid-NsiI protein of the invention. As the figure shows, the NsiI protein expressed alone was almost inactive; the NeonGreen-NsiI fusion protein and the NsiI protein expressed with Zbasic had essentially the same activity; and the Sacid-NsiI protein, purified using Sacid as the purification tag according to the invention, had greatly improved activity, higher than that of both the NeonGreen-NsiI fusion protein and the NsiI protein expressed with Zbasic.

Fig. 8: Specific activities of the enzymes purified in different ways (one unit is defined as the amount of enzyme required to digest 1 µg of plasmid DNA containing a single NsiI cleavage site in 50 µl of reaction buffer at 37 °C for 1 hour).
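The unit definition above can be turned into a specific activity by simple arithmetic. The following is a minimal sketch, assuming each reaction contains exactly 1 µg of substrate and that the end point is the highest dilution still giving complete digestion; the function names and the example numbers are hypothetical and are not taken from the patent.

```python
def units_per_microliter(highest_full_digest_dilution: float,
                         enzyme_volume_ul: float) -> float:
    """Activity of the undiluted enzyme stock in units per µl.

    Assumption: if enzyme_volume_ul of a D-fold dilution is just enough to
    digest 1 µg of substrate in 1 hour, that volume contains 1 unit, so the
    undiluted stock contains D / enzyme_volume_ul units per µl.
    """
    return highest_full_digest_dilution / enzyme_volume_ul


def specific_activity_units_per_mg(units_per_ul: float,
                                   protein_mg_per_ml: float) -> float:
    """Specific activity in units per mg of protein."""
    protein_mg_per_ul = protein_mg_per_ml / 1000.0  # 1 ml = 1000 µl
    return units_per_ul / protein_mg_per_ul


# Hypothetical numbers: 1 µl of a 10^4-fold dilution fully digests 1 µg,
# and the stock protein concentration is 0.5 mg/ml.
stock_activity = units_per_microliter(1e4, 1.0)
example_specific_activity = specific_activity_units_per_mg(stock_activity, 0.5)
print(stock_activity, example_specific_activity)
```

Dividing by the protein concentration is what makes preparations of different purity comparable.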

Detailed Description

Unless otherwise defined or clear from context, abbreviations used herein have their conventional meaning in the chemical and biological arts, and all technical and scientific terms used in the present invention have the same meaning as commonly understood by one of ordinary skill in the art. Experimental methods not specifically described in the invention are carried out according to the methods in J. Sambrook's Molecular Cloning: A Laboratory Manual (fourth edition) or according to the instructions of the relevant products. The biological reagents used in the present invention are commercially available unless otherwise specified. Numerous variations, changes, and substitutions will occur to those skilled in the art without departing from the spirit of the invention.

Unless otherwise indicated, nucleic acid sequences herein are written from left to right in the 5' to 3' direction; amino acid sequences are written from left (upstream) to right (downstream) in the amino (N) to carboxyl (C) direction.

The invention starts from the amino acid sequence of the B domain of staphylococcal protein A and uses computer-aided calculation to modify the C-terminal region of the native B domain, introducing negatively charged amino acids and concentrating them on one surface to design an acidic surface-solubilizing short peptide tag (Sacid).

According to the invention, the surface charge distribution of a short, stable protein domain is designed by computer-aided calculation so that its surface carries multiple negative charges. Several acidic short peptide tags were designed in this way, namely Sacid (SEQ ID NO: 1), Sacid1 (SEQ ID NO: 2) and Sacid2 (SEQ ID NO: 3). The charge of the Sacid tag is more concentrated and the tag is more acidic than the other designed tags, so it can repel proteins with low isoelectric points, which reduces aggregation during protein purification and gives a superior solubilizing effect. When fused to the target protein, the tag was found to solubilize the recombinant protein efficiently, stabilize its conformation during purification, improve its expression efficiency, and maintain its biological activity. In the method, the acidic surface-solubilizing short peptide tag under study is expressed as a fusion with the target protein, and the target protein is rapidly purified by affinity chromatography.

In a first aspect, the present invention provides an isolated peptide comprising the amino acid sequence VDNKFNKEQQX₁AFYEILHLPNLNEEQRNAFIQX₂LKDDPX₃X₄SX₅X₆X₇LX₈EAX₉X₁₀LNDAQPK (SEQ ID NO: 9), wherein X₁-X₁₀ are negatively charged amino acids, preferably glutamic acid (E) or aspartic acid (D).

In one embodiment, X₁ is D. In one embodiment, X₂ is E. In one embodiment, X₃ is E. In one embodiment, X₄ is E. In one embodiment, X₅ is D. In one embodiment, X₆ is E. In one embodiment, X₇ is E. In one embodiment, X₈ is E. In one embodiment, X₉ is D. In one embodiment, X₁₀ is D.
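The template of SEQ ID NO: 9 and the residue choices listed in the embodiments above can be combined mechanically. The sketch below is illustrative only: the `{X1}` placeholder syntax and the function name are this example's own, the check is limited to the preferred residues D and E, and whether this particular combination equals SEQ ID NO: 1 is not stated in this section.

```python
# Template for SEQ ID NO: 9 as written in the text; X1..X10 are the variable
# acidic positions. The {X1}-style placeholders are this sketch's own notation.
TEMPLATE = (
    "VDNKFNKEQQ{X1}AFYEILHLPNLNEEQRNAFIQ{X2}LKDDP"
    "{X3}{X4}S{X5}{X6}{X7}L{X8}EA{X9}{X10}LNDAQPK"
)

# Residues named in the individual embodiments above.
EMBODIMENT = {
    "X1": "D", "X2": "E", "X3": "E", "X4": "E", "X5": "D",
    "X6": "E", "X7": "E", "X8": "E", "X9": "D", "X10": "D",
}


def fill_template(choices: dict) -> str:
    """Return the peptide for one choice of X1..X10 (preferred case: D or E)."""
    for name, residue in choices.items():
        if residue not in ("D", "E"):
            raise ValueError(f"{name} must be D or E, got {residue!r}")
    return TEMPLATE.format(**choices)


peptide = fill_template(EMBODIMENT)
print(peptide, len(peptide))
```

The filled sequence has 57 residues, which agrees with the domain length given earlier in the description.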

The isolated peptide provided by the invention is an acidic surface-solubilizing short peptide capable of improving the expression efficiency of recombinant proteins. It is based on the amino acid sequence of the B domain of staphylococcal protein A (SEQ ID NO: 4), into which negatively charged amino acids are introduced and concentrated on the surface. Protein solubility was predicted from sequence by computer-aided prediction (see https://protein-sol.manchester.ac.uk/). The tool computes 35 features: the 20 amino acid compositions; 7 amino acid combinations (K-R, D-E, K+R, D+E, K+R-D-E, K+R+D+E, F+W+Y); and 8 predicted features (length, isoelectric point, hydrophobicity, net charge of the protein at pH = 7.0, folding propensity, disordered region, sequence entropy, and β-strand propensity). A linear model combining the 35 features gives a preliminary fit to the solubility data. The predicted value lies between 0 and 1; a value higher than 0.45 (the average of the E. coli expressed protein dataset of Niwa et al. (2009)) means that the solubility of the protein may be higher than the average solubility of E. coli expressed proteins.
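The 27 composition-type features named above (20 amino acids plus the 7 combinations) are easy to compute directly from a sequence. The sketch below is an illustration, not the protein-sol implementation: expressing each feature as a fraction of sequence length is an assumption, the 8 predicted features and the linear model are omitted, and the example sequence is the SEQ ID NO: 9 template filled with the D/E residues listed in the embodiments.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# SEQ ID NO: 9 template filled with the residues listed in the embodiments.
SEQ = "VDNKFNKEQQDAFYEILHLPNLNEEQRNAFIQELKDDPEESDEELEEADDLNDAQPK"


def composition_features(seq: str) -> dict:
    """Return the 20 single-residue fractions plus the 7 named combinations."""
    n = len(seq)
    counts = Counter(seq)
    frac = {aa: counts.get(aa, 0) / n for aa in AMINO_ACIDS}
    feats = dict(frac)
    feats["K-R"] = frac["K"] - frac["R"]
    feats["D-E"] = frac["D"] - frac["E"]
    feats["K+R"] = frac["K"] + frac["R"]
    feats["D+E"] = frac["D"] + frac["E"]
    feats["K+R-D-E"] = feats["K+R"] - feats["D+E"]
    feats["K+R+D+E"] = feats["K+R"] + feats["D+E"]
    feats["F+W+Y"] = frac["F"] + frac["W"] + frac["Y"]
    return feats


features = composition_features(SEQ)
for name in ("K+R", "D+E", "K+R-D-E", "F+W+Y"):
    print(name, round(features[name], 3))
```

For this sequence the K+R-D-E feature is negative, reflecting the excess of acidic over basic residues that the design aims for.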

Thus, in a preferred embodiment, the predicted solubility of SEQ ID NO: 9 is greater than 0.45, as predicted based on https:// protein-sol. In one embodiment, the predicted solubility of SEQ ID NO: 9 is equal to or greater than 0.70, as predicted based on https:// protein-sol.

In one embodiment, the isoelectric point (pI) of SEQ ID NO: 9 is equal to or less than 5.0.

In one embodiment, the peptide comprises or consists of the amino acid sequence shown in SEQ ID NO: 1.
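A rough check of the charge properties of such a peptide can be made with the Henderson-Hasselbalch equation. The sketch below is not the protein-sol calculation: the pKa values are an assumed, illustrative set (published sets differ), and the sequence is the SEQ ID NO: 9 template filled with the D/E residues listed in the embodiments above.

```python
# Assumed, illustrative pKa values; other published sets give slightly
# different results, and the protein-sol tool may use different ones.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6

# SEQ ID NO: 9 template filled with the residues listed in the embodiments.
SEQ = "VDNKFNKEQQDAFYEILHLPNLNEEQRNAFIQELKDDPEESDEELEEADDLNDAQPK"


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge of a linear peptide at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free C-terminus
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(net_charge(SEQ, 7.0), 2), round(isoelectric_point(SEQ), 2))
```

Under these assumed pKa values the peptide carries a strongly negative net charge at pH 7.0 and its estimated pI lies below 5.0.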

As used herein, the terms "peptide," "polypeptide," and "protein" are used interchangeably and are defined as a biological molecule consisting of amino acid residues joined by peptide bonds.

In a second aspect, the invention relates to a polynucleotide encoding the peptide of the first aspect.

As used herein, the terms "nucleotide sequence," "polynucleotide," "nucleic acid," and "nucleic acid sequence" are used interchangeably to refer to a macromolecule in which multiple nucleotides are linked by 3',5'-phosphodiester bonds, wherein the nucleotides include ribonucleotides and deoxyribonucleotides. The sequences of the polynucleotides of the invention may be codon optimized for different host cells (e.g., E. coli) to improve expression of the fusion protein. Methods for performing codon optimization are known in the art.
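As a simple illustration of what encoding a peptide for a given host involves, the sketch below back-translates an amino acid sequence using one fixed codon per residue. The codon table is a hypothetical single-codon choice made for this example, not a table from the patent; real codon optimization draws on host codon-usage statistics and additional constraints.

```python
# Hypothetical choice of one codon per amino acid, for illustration only.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence (5' to 3') for the given peptide."""
    try:
        return "".join(CODON_FOR[aa] for aa in protein)
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]!r}") from None


# First ten residues of the SEQ ID NO: 9 template.
dna = back_translate("VDNKFNKEQQ")
print(dna)
```

A start codon, a stop codon, and any cloning sites would still have to be added around such a coding sequence.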

In a third aspect, the invention relates to an isolated fusion protein comprising a first peptide and a second peptide, wherein the first peptide is a peptide of the first aspect of the invention and the second peptide is a polypeptide of interest.

As used herein, "polypeptide of interest" refers to any polypeptide or protein that can be produced and purified by the methods of the invention, non-limiting examples of which include enzymes, hormones, immunoglobulin chains, therapeutic polypeptides such as anti-cancer polypeptides, diagnostic polypeptides, or polypeptides or biologically active fragments thereof that can be used for immunization purposes, and the like. The polypeptide of interest may be from any source, including polypeptides of microbial origin, polypeptides of mammalian origin, and artificial proteins (e.g., fusion proteins or mutated proteins), and the like.

The polypeptide of interest may be any length of polypeptide or protein. In one embodiment, the polypeptide of interest that can be produced and purified by the methods of the invention can be 20-500 amino acid residues in length, e.g., about 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500 amino acid residues.

In one embodiment, the polypeptide of interest described herein is one having an acidic or neutral or weakly basic isoelectric point of less than, equal to, or slightly greater than 7.0, e.g., equal to or less than 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5, or 4.0.

In some embodiments, the first peptide may be located upstream (N-terminal) or downstream (C-terminal), preferably upstream (N-terminal), of the second peptide. In some embodiments, the first peptide is located upstream of the second peptide. In some embodiments, the first peptide is downstream of the second peptide.

As used herein, a first peptide being "upstream" of a second peptide means that the C-terminal residue of the first peptide is located before the N-terminal residue of the second peptide; the first peptide being "downstream" of the second peptide means that the N-terminal residue of the first peptide is located after the C-terminal residue of the second peptide.

The production and purification of the polypeptide of interest can be carried out under relatively mild pH conditions (for example, pH 7-11). The kind and nature of the polypeptide of interest are not particularly limited, so the invention can be used for the expression and purification of many different polypeptides, and the final yield of the polypeptide of interest is relatively high.

Examples of "polypeptides of interest" that can be produced and purified by the methods of the invention include, but are not limited to, Bacillus subtilis lipase A (LipA), green fluorescent protein (GFP), Aspergillus fumigatus type II ketoamine oxidase (AMA), glucagon-like peptide (GLP-1), stromal cell derived factor (SDF-1α), sermorelin, pleurocidin-like cationic antimicrobial peptide NRC-03 (PNRC03), and hinneavin II-Melanocyte (HM), or biologically active fragments thereof, and the like.

In one embodiment, the polypeptide of interest is an NsiI protein, e.g., comprising the amino acid sequence of SEQ ID NO: 5.

In one embodiment, the polypeptide of interest is a TEVP protein, e.g., comprising the amino acid sequence shown in SEQ ID NO: 8.

In some embodiments, the first peptide and the second peptide in the fusion protein of the invention are linked by a spacer.

As used herein, "spacer" or "spacer sequence" refers to a peptide that consists of amino acids of low hydrophobicity and low charge effect and whose length, when used in a fusion protein, allows the linked moieties to extend sufficiently and fold into their respective native conformations without interfering with each other. Spacers commonly used in the art include, for example, flexible GS-type linkers rich in glycine (G) and serine (S) and rigid PT-type linkers rich in proline (P) and threonine (T). Because GS-type linkers have a suitable amino acid length, hydrophobicity, and flexibility, and can give the functional protein better stability and bioactivity, GS-type linkers are preferably used in the invention.

In certain applications, such as in the production of polypeptides, it is desirable that the recombinantly produced polypeptide has a sequence identical to the polypeptide of interest, i.e., no additional amino acid residues at both ends. To this end, in some embodiments, the spacer in the fusion proteins of the invention further comprises a cleavage site. The polypeptide of interest can be released from the fusion protein by cleavage at the cleavage site.

Suitable cleavage sites include cleavage sites that can be chemically cleaved, enzymatically cleaved, or self-cleaving, or any other cleavage site known to those of skill in the art. Preferred cleavage sites in the present invention may be self-cleaving, e.g., they comprise the amino acid sequence of a self-cleavable intein. This is because intein-based cleavage methods do not require the addition of enzymes or the use of hazardous substances such as hydrogen bromide as used in chemical methods, but simply induce cleavage by changing the buffer environment in which the aggregates are located. Various self-cleaving inteins are known in the art, such as a series of inteins from NEB corporation with different self-cleaving properties.

In some embodiments, the spacer sequence may contain a specific sequence that can be specifically cleaved chemically or enzymatically. For example, Met (methionine) residues may be chemically cleaved by CNBr; Lys (lysine) or Arg (arginine) residues may be cleaved by trypsin; LysArg or ArgArg may be cleaved by a dibasic protease such as Kex2; GluAsnLeuTyrPheGln may be recognized and cleaved by tobacco etch virus protease (TEV protease, TEVP); and IleGluGlyArg may be recognized and cleaved by factor Xa (Xa protease). Other suitable protease cleavage sites or inteins may also be used.
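Locating such cleavage motifs in a candidate spacer or fusion sequence is a plain pattern search. The following minimal sketch covers only three of the motifs named above, written in one-letter code; it assumes cleavage occurs immediately after each motif, and the dictionary, function name, and test sequence are hypothetical.

```python
import re

# One-letter forms of motifs named in the text:
# GluAsnLeuTyrPheGln -> ENLYFQ (TEV protease), IleGluGlyArg -> IEGR (factor Xa),
# Met -> M (CNBr). Cleavage is taken to occur directly after the motif.
CLEAVAGE_MOTIFS = {
    "TEV protease": "ENLYFQ",
    "Factor Xa": "IEGR",
    "CNBr": "M",
}


def find_cleavage_sites(seq: str) -> dict:
    """Map each reagent to the positions after which it would cut.

    A position p means the cut falls between residue p and residue p + 1
    (1-based), i.e. p residues lie on the N-terminal side of the cut.
    """
    return {
        name: [m.end() for m in re.finditer(motif, seq)]
        for name, motif in CLEAVAGE_MOTIFS.items()
    }


# Hypothetical test sequence containing one TEV site and one Met.
sites = find_cleavage_sites("AAENLYFQGMK")
print(sites)
```

Running the same search over the polypeptide of interest shows whether a chosen reagent would also cut inside the product, which matters when selecting the cleavage site for the spacer.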

In further embodiments, the fusion protein may further comprise a moiety for isolation and purification of the second peptide (polypeptide of interest) linked to the second peptide, e.g., a 6 histidine tag, a GST tag, a Strep tag, a Twin-Strep tag, an MBP tag, etc. Thus, after removal of the first peptide, the second peptide can be isolated and purified by this fraction.

In one embodiment, the fusion protein comprises the peptide of the first aspect of the invention, a spacer, a polypeptide of interest, and a moiety for isolating and purifying a second peptide.
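The arrangement of parts described in this embodiment can be written down as a small data structure. The sketch below is only an illustration: the class name, the spacer, and the target sequence are hypothetical placeholders, the tag is the SEQ ID NO: 9 template filled with the residues listed in the embodiments, and WSHPQFEK is the Strep-tag II sequence. The actual construct layout of the patent (Fig. 4) is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionDesign:
    """Parts of a fusion protein, each as a one-letter amino acid string."""
    tag: str                  # first peptide (solubilizing tag)
    spacer: str               # linker, optionally containing a cleavage site
    target: str               # second peptide (polypeptide of interest)
    purification: str = ""    # moiety that stays attached to the target
    tag_upstream: bool = True

    def sequence(self) -> str:
        """Assemble the parts from N-terminus to C-terminus."""
        if self.tag_upstream:
            return self.tag + self.spacer + self.target + self.purification
        return self.purification + self.target + self.spacer + self.tag


design = FusionDesign(
    tag="VDNKFNKEQQDAFYEILHLPNLNEEQRNAFIQELKDDPEESDEELEEADDLNDAQPK",
    spacer="GGGGSENLYFQ",     # hypothetical GS linker followed by a TEV site
    target="GMKTAYIAKQR",     # hypothetical placeholder, not a real target
    purification="WSHPQFEK",  # Strep-tag II
)
fusion = design.sequence()
print(fusion)
```

Keeping the purification moiety on the far side of the target means it remains attached to the polypeptide of interest after the tag is cleaved off at the spacer, as described above.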

In a fourth aspect, the invention relates to an isolated polynucleotide comprising a nucleotide sequence encoding said fusion protein, and in a fifth aspect, the invention relates to a construct, in particular an expression construct, comprising the polynucleotide of the fourth aspect.

In the expression constructs of the invention, the sequence of the polynucleotide encoding the fusion protein is operably linked to expression control sequences to effect the desired transcription and ultimately the production of the fusion protein in a host cell. Suitable expression control sequences include, but are not limited to, promoters, enhancers, ribosome action sites such as ribosome binding sites, polyadenylation sites, transcriptional splice sequences, transcriptional termination sequences, and mRNA stabilizing sequences, and the like.

Vectors for constructing the expression constructs of the invention include those that autonomously replicate in the host cell, such as plasmid vectors; also included are vectors that are capable of integrating into and replicating with host cell DNA. Many vectors suitable for the present invention are commercially available. In one embodiment, the plasmid is a plasmid suitable for use in a prokaryotic or eukaryotic expression system. In a specific embodiment, the plasmid is or is derived from pET28a.

In a sixth aspect, the invention relates to a host cell capable of expressing the fusion protein of the third aspect of the invention. The host cell comprises a polynucleotide of the fourth aspect of the invention or a construct, e.g. an expression construct, of the fifth aspect of the invention, wherein the host cell is capable of expressing the fusion protein.

Host cells useful for expressing the fusion proteins of the invention include prokaryotes, yeast, and higher eukaryotic cells. Exemplary prokaryotic hosts include bacteria of the genera Escherichia, Bacillus, Salmonella, Pseudomonas and Streptomyces. In a preferred embodiment, the host cell is an Escherichia cell, a mammalian cell, or an insect cell. In a more preferred embodiment, the host cell is Escherichia coli (E. coli), Bacillus subtilis or Bacillus megaterium. In a specific embodiment of the invention, the host cell used is a cell of the E. coli Rosetta (DE3) strain.

The polynucleotides or constructs of the invention, e.g., expression constructs, may be introduced into a host cell to express the encoded amino acid sequence by one of many well-known techniques, including, but not limited to: heat shock transformation, electroporation, DEAE-dextran transfection, microinjection, liposome-mediated transfection, calcium phosphate precipitation, protoplast fusion, microprojectile bombardment, viral transformation and the like.

In one embodiment, the polynucleotide encoding the fusion protein may be integrated into the genome of the host cell, which may express the encoded fusion protein under appropriate conditions or constitutively express the encoded fusion protein.

In one embodiment, the polynucleotide encoding the fusion protein is present in the host cell in extrachromosomal form (e.g., a plasmid or construct such as an expression vector).

In a seventh aspect, the invention also relates to a method for preparing a fusion protein according to the third aspect of the invention, comprising: culturing the host cell of the sixth aspect of the invention under conditions suitable for expression of the fusion protein; and recovering the fusion protein and, optionally, cleaving the fusion protein to release the polypeptide of interest and recovering the polypeptide of interest.

In one embodiment, the method comprises: transforming a host cell with a construct, in particular an expression construct, of the fifth aspect of the invention, culturing the transformed host cell under conditions suitable for expression of the fusion protein and recovering the fusion protein, optionally by cleavage of the fusion protein to release the polypeptide of interest and recovering the polypeptide of interest.

Conditions for expression of fusion proteins, such as temperature, pH and medium, are known in the art. Suitable methods for recovering the fusion protein are also known in the art, including but not limited to chromatography, centrifugation, dialysis, and the like.

Recovery of the fusion protein may be performed using any suitable method known in the art, as described herein. For example, after expression of the fusion protein, the host cells are harvested, the cells lysed, the supernatant is harvested (e.g., by centrifugation to remove cell debris), and the fusion protein is optionally isolated (e.g., by a specific tag or specific binding molecule such as an antibody).

In one embodiment, the method comprises: transforming a host cell with a construct, in particular an expression construct, according to the fifth aspect of the invention, culturing the transformed host cell under conditions suitable for expression of the fusion protein, harvesting and lysing the cells, harvesting the supernatant, optionally isolating the fusion protein.

In a preferred embodiment, a spacer sequence is included between the short peptide tag of the fusion protein and the protein of interest, said spacer sequence comprising a cleavage site (as described herein) that can be cleaved, whereby by cleavage of the cleavage site the short peptide tag can be removed, thereby releasing the protein of interest.

In an eighth aspect, the present invention provides the use of a peptide of the first aspect or a polynucleotide of the second aspect for preparing a protein of interest in a host cell.

The peptide of the first aspect of the present invention can form a fusion protein with a target protein, so that after expression in a host cell, the solubility of the fusion protein can be increased, the expression efficiency of the fusion protein can be improved, and the biological activity of the target protein can be maintained.

For example, the polynucleotide of the second aspect may be linked in-frame with a polynucleotide encoding a protein of interest and placed under the control of a suitable expression regulatory element (e.g., a promoter) to express the desired fusion protein in a suitable expression system.
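The in-frame linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fuse_in_frame` and both toy coding sequences are hypothetical and do not come from the application; the sketch simply checks that each coding sequence is a whole number of codons and that no stop codon in the tag would terminate translation before the protein of interest.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(tag_cds: str, target_cds: str) -> str:
    """Join a tag CDS and a target CDS so that the target stays in the tag's reading frame."""
    tag_cds, target_cds = tag_cds.upper(), target_cds.upper()
    for name, cds in (("tag", tag_cds), ("target", target_cds)):
        if len(cds) % 3 != 0:
            raise ValueError(f"{name} CDS length {len(cds)} is not a multiple of 3")
    # A terminal stop codon on the tag would end translation before the target, so drop it.
    if tag_cds[-3:] in STOP_CODONS:
        tag_cds = tag_cds[:-3]
    # Any remaining in-frame stop codon inside the tag would truncate the fusion protein.
    codons = [tag_cds[i:i + 3] for i in range(0, len(tag_cds), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("tag CDS contains an internal in-frame stop codon")
    return tag_cds + target_cds


# Toy sequences, hypothetical and for illustration only (not the Sacid or NsiI genes).
toy_tag = "ATGGTTGATAACTAA"     # Met-Val-Asp-Asn-stop
toy_target = "ATGATTAACCATTAA"  # Met-Ile-Asn-His-stop
fusion = fuse_in_frame(toy_tag, toy_target)
print(fusion, len(fusion) % 3 == 0)
```

A spacer sequence carrying a cleavage site could be inserted between the two parts in the same way, provided its length is also a multiple of three.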

In a ninth aspect, the present invention provides a method of producing a protein of interest, comprising expressing in a host cell a fusion protein of a peptide according to the first aspect of the invention with a protein of interest, for example a fusion protein according to the third aspect of the invention, recovering the fusion protein, cleaving the fusion protein to release the protein of interest, and optionally isolating and purifying the released protein of interest.

In one embodiment, the method comprises: constructing an expression construct (e.g. a plasmid or viral vector) comprising a nucleotide sequence encoding a fusion protein formed by fusion of a peptide according to the first aspect of the invention with a protein of interest, transforming a host cell with said expression construct, culturing the transformed host cell under conditions suitable for expression of the fusion protein, recovering the fusion protein, cleaving said fusion protein to release the protein of interest, and optionally isolating and purifying the released protein of interest.

In one embodiment, the method comprises: transforming a host cell with a construct, in particular an expression construct, according to the fifth aspect of the invention, culturing the transformed host cell under conditions suitable for expression of the fusion protein, collecting and lysing the cells, collecting the supernatant, cleaving the fusion protein to release the protein of interest, optionally isolating and purifying the released protein of interest.

Cleavage of the fusion protein may be performed using any suitable means known in the art, such as including a spacer sequence between the short peptide tag and the protein of interest, the spacer sequence comprising a cleavage site that can be cleaved (e.g., a cleavage site as described herein), whereby the short peptide tag can be removed by cleavage of the fusion protein, thereby releasing the protein of interest.

In further embodiments, the fusion protein may further comprise a moiety for isolation and purification purposes, such as a 6×His tag, a GST tag, a Strep tag, a Twin-Strep tag, an MBP tag, etc., linked to the protein of interest. Thus, after removal of the first peptide, the protein of interest can be isolated and purified by means of this moiety.

In one embodiment, the method comprises constructing an expression construct comprising a nucleotide sequence encoding the peptide of the first aspect of the invention, a nucleotide sequence encoding a spacer sequence, a nucleotide sequence encoding a protein of interest, and a nucleotide sequence encoding a portion for isolating and purifying a second peptide.

In one embodiment, the host cell is selected from the group consisting of prokaryotes, yeasts and higher eukaryotic cells such as mammalian cells or insect cells, more preferably from the genera Escherichia, Bacillus, Salmonella, Pseudomonas and Streptomyces, more preferably from the genus Escherichia, more preferably Escherichia coli, Bacillus subtilis or Bacillus megaterium, more preferably Escherichia coli Rosetta (DE3).

In one embodiment, the protein of interest has an acidic, neutral or weakly basic isoelectric point, i.e. less than, equal to or slightly greater than 7.0, e.g., equal to or less than 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5 or 4.0; more preferably, the protein of interest is an NsiI protein comprising the amino acid sequence shown in SEQ ID NO: 5 or a TEVP protein comprising the amino acid sequence shown in SEQ ID NO: 8.

Compared with the prior art, the invention has the following beneficial effects:

1) Compared with the prior art, the method can overcome the defect that the target protein is easy to form inclusion bodies in a prokaryotic expression system, can efficiently assist in dissolving the recombinant protein, stabilize the conformation of the recombinant protein in the purification process, improve the expression efficiency of the recombinant protein, maintain the biological functional activity of the recombinant protein, and can be widely applied to the field of fermentation production of insoluble proteins;

2) The acidic short peptide solubilizing tag can replace solubilizing tags such as GST, MBP and NeonGreen. It is shorter and smaller than the commonly used solubilizing tags, has a stable structure, contains only 57 amino acid residues and has a molecular weight of only 8 kD, so it has less potential influence on the recombinant protein in fusion expression;

3) Compared with the previously known positive charge solubilizing tag (Zbasic), the acidic solubilizing tag (Sacid) and the slightly acidic proteins that are widespread in organisms repel each other, so aggregation is less likely to occur, which favors solubilization and folding of the recombinant protein; the tag therefore has advantages in improving the solubility, expression efficiency and biological activity of the recombinant protein;

4) The tag of the invention also yields high-purity target protein when used in affinity purification, which indicates that the modification has no influence on affinity purification.

As used herein, "optional" or "optionally" means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not. For example, an optionally included step refers to the presence or absence of that step.

As used herein, the term "about" refers to a range of values that includes the specified value, which one of ordinary skill in the art would reasonably consider to be similar to the specified value. In some embodiments, the term "about" means within standard error of measurement using measurements commonly accepted in the art. In some embodiments, the term "about" means +/-10% of the specified value.

While various embodiments and aspects of the present invention are shown and described herein, it will be obvious to those skilled in the art that such embodiments and aspects are provided by way of example only. Numerous variations, changes, and substitutions will occur to those skilled in the art without departing from the spirit of the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.

The invention will be further illustrated by the following non-limiting examples. It is well known to those skilled in the art that many modifications can be made to the invention without departing from its spirit, and such modifications also fall within the scope of the invention.

The following experimental methods are conventional methods unless otherwise specified, and the experimental materials used are readily available from commercial companies unless otherwise specified.

Experiments such as PCR, restriction digestion and ligation, as well as transformation and bacterial culture, that are involved in conventional plasmid construction are familiar to researchers in the field, so the relevant experimental details are not described at length; reference can be made to the conventional experimental conditions described in Molecular Cloning: A Laboratory Manual [J. Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd edition), Science Press, 2002].

In addition to the specific methods, devices, materials used in the embodiments, any methods, devices, and materials of the prior art similar or equivalent to those described in the embodiments of the present invention may be used to practice the present invention according to the knowledge of one skilled in the art and the description of the present invention.

Examples

Example 1: acidic tag design

Starting from the amino acid sequence of the B domain of staphylococcal protein A (SEQ ID NO: 4), an acidic surface solubilizing short peptide tag (Sacid, SEQ ID NO: 1) was designed by computer-aided calculation (website: https://www.expasy.org/resources/swiss-model). Negatively charged amino acids were introduced so that the negative charge is concentrated on the surface of the peptide. This tag is only one embodiment of the predicted acidic tags.

Swiss-Model is a tool for predicting protein structure models, and uses a homology modeling method to realize the prediction of the tertiary structure of an unknown sequence.
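The effect of the design on the primary sequence can be checked with a short script. The sketch below is illustrative only: the one-letter strings are transcribed from the three-letter listings of SEQ ID NO: 4 and SEQ ID NO: 1 in this document, the helper names are hypothetical, and the charge count is a deliberately crude approximation (Lys/Arg counted as +1, Asp/Glu as -1; histidine, the termini and pKa effects are ignored). It does not reproduce the structure-based calculation used for the design.

```python
# One-letter transcriptions of the three-letter listings in this document.
WT_B_DOMAIN = "VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQPK"  # SEQ ID NO: 4
SACID = "VDNKFNKEQQDAFYEILHLPNLNEEQRNAFIQELKDDPEESDEELEEADDLNDAQPK"        # SEQ ID NO: 1


def substitutions(reference: str, variant: str):
    """Return (position, reference residue, variant residue) for every differing position (1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(reference, variant), start=1) if a != b]


def simple_net_charge(seq: str) -> int:
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu; ignores His, termini and pKa."""
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative


subs = substitutions(WT_B_DOMAIN, SACID)
for pos, ref_res, var_res in subs:
    print(f"{ref_res}{pos}{var_res}")
print("wild type:", simple_net_charge(WT_B_DOMAIN), "Sacid:", simple_net_charge(SACID))
```

Under this simple count the ten substituted positions are all occupied by Asp or Glu in Sacid, and the net charge shifts from -2 for the wild-type B domain to -14 for Sacid.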

Example 2: construction of a protein expression vector of interest

The target protein expression vector pET-28a/Sacid-NsiI was obtained by fusing the coding sequence of the target protein with the coding sequence of the acidic surface solubilizing short peptide tag (Sacid) to give a target protein expression unit (in which the Sacid amino acid sequence is located upstream of the target protein amino acid sequence) and inserting this expression unit into the expression plasmid pET-28a (Novagen, USA); the construct was produced by artificial synthesis.

Example 3: expression of NsiI protein alone

The NsiI nucleotide sequence (encoding the amino acid sequence of SEQ ID NO: 5) was inserted into the prokaryotic expression vector pET28a to obtain the expression vector pET28a/NsiI. After correct construction of the vector was verified by sequencing, the plasmid was extracted by the alkaline lysis method and transformed into the expression host Rosetta (DE3) (Tiangen Biochemical Technology (Beijing) Co., Ltd.). A single colony of the expression host containing the recombinant plasmid pET28a/NsiI was inoculated into LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL) and cultured at 37 °C and 220 rpm for 16 h to give the seed culture.

The seed culture was inoculated at a volume ratio of 1:50 into LB medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL) and cultured at 37 °C with shaking at 200 rpm to a suitable logarithmic growth phase (OD600 = 0.6-0.8). IPTG was added to 0.5 mM and expression was induced at 27 °C for 4-16 h. The culture was collected and centrifuged, the cell pellet was resuspended at a concentration of 10%-20% in a buffer of 20 mM Tris-HCl and 300 mM NaCl, pH 8.0, and the cells were disrupted by sonication. The lysate was centrifuged at 8000 rpm for 30 min, the supernatant and pellet were separated, and both were analyzed by SDS-PAGE to identify expression of the target protein.

The results are shown in panel A of FIG. 5. When the NsiI target protein was expressed alone (molecular weight about 50 kDa), no distinct target band was visible on SDS-PAGE at the position indicated by the arrow: the expression level was low, the protein was markedly reduced or even completely lost, and the purified sample contained a large amount of contaminating proteins.

Example 4: fusion expression of NeonGreen as a solubilizing peptide with NsiI

(1) Construction of NeonGreen and NsiI fusion expression vector and host bacterium

The amino acid sequence of NeonGreen is shown in SEQ ID NO: 6.

The genes of NeonGreen and NsiI (SEQ ID NO: 5) and the prokaryotic expression vector pET28a are connected together through artificial synthesis (wherein the NeonGreen amino acid sequence is positioned upstream of the NsiI protein amino acid sequence) to obtain the target protein expression vector pET-28a/NeonGreen-NsiI.

(2) Expression and purification of recombinant NeonGreen-NsiI protein

The recombinant plasmid pET-28a/NeonGreen-NsiI was transformed into the expression host Rosetta (DE3); a single colony of the expression host was picked, inoculated into LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL), and cultured at 37 °C and 220 rpm for 12 h to give the seed culture.

The seed culture was inoculated at a volume ratio of 1:50 into LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL), cultured at 37 °C with shaking at 200 rpm to a suitable logarithmic growth phase (OD600 = 0.6-0.8), and then cultured at 27 °C for 4-16 h after addition of the inducer IPTG. 200 mL of the induced culture was collected by centrifugation, and the cells were resuspended in 30 mL of 20 mM Tris-HCl, 300 mM NaCl (pH 8.0) and disrupted by sonication. The lysate was centrifuged at 8000 rpm for 30 min and the supernatant and pellet were separated; the supernatant was passed through a 0.45 μm filter to remove insoluble bacterial proteins, the pellet was resuspended in deionized water or PBS, and the expression products were analyzed by SDS-PAGE.

(3) Analysis of results

As shown in panel B of FIG. 5, with NeonGreen as the solubilizing peptide, the NeonGreen-NsiI fusion protein has a molecular weight of about 70 kDa; the arrow indicates the fusion protein. Fusion expression with NeonGreen as the solubilizing protein allows the fusion protein to be purified and enriched. However, because of the large molecular weight of NeonGreen itself, drawbacks remain, such as a long folding time, a limited solubilizing effect and a tendency to interfere with the activity of the recombinant protein. The figure also shows that most of the protein is present in insoluble inclusion bodies and that the fusion protein is easily degraded.

Example 5: fusion expression of the published positive charge solubilizing tag (Zbasic) with NsiI

(1) Construction of the fusion expression vector of the published positive charge solubilizing tag (Zbasic) and NsiI, and of the host bacterium

The published positive charge solubilizing tag Zbasic [Hedhammar M, Hober S. Z(basic) - a novel purification tag for efficient protein recovery. Journal of Chromatography A, 2007, 1161(1-2): 22-28] has the amino acid sequence shown in SEQ ID NO: 7. The Zbasic coding sequence was ligated to the NsiI coding sequence as described in Example 4 to obtain the fusion protein expression vector pET-28a/Zbasic-NsiI, in which the Zbasic amino acid sequence is located upstream of the NsiI amino acid sequence.

(2) Expression and purification of the recombinant protein of the published positive charge solubilizing tag (Zbasic) and NsiI

The plasmid was transformed into the expression host Rosetta (DE3) to obtain the target engineered strain. In this specific example, expression was carried out in laboratory small-scale shake flasks. The strain was activated with shaking in LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL) at 37 °C for 16 h; the overnight culture was then transferred at a ratio of 1:50 into fresh LB medium (about 200 mL) containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL), cultured with shaking at 37 °C to a suitable logarithmic growth phase (OD600 = 0.6-0.8), and induced with 0.5 mM IPTG at 27 °C for 4-16 h.
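The volumes implied by the 1:50 transfer and the 0.5 mM IPTG induction follow from simple dilution arithmetic. The sketch below is illustrative: it reads 1:50 as one volume of overnight culture per 50 volumes of fresh medium, and the 1 M IPTG stock concentration is an assumption that is not stated in this example; the function names are hypothetical.

```python
def inoculum_volume_ml(culture_ml: float, ratio: float = 50.0) -> float:
    """Volume of overnight culture for a 1:ratio (v/v) transfer into culture_ml of fresh medium."""
    return culture_ml / ratio


def stock_volume_ul(final_mm: float, culture_ml: float, stock_mm: float) -> float:
    """Stock volume from C1*V1 = C2*V2, neglecting the small volume added by the stock itself."""
    return final_mm * culture_ml / stock_mm * 1000.0  # mL -> uL


CULTURE_ML = 200.0      # "about 200 mL" in this example
IPTG_FINAL_MM = 0.5     # from this example
IPTG_STOCK_MM = 1000.0  # assumed 1 M stock; not specified in the source

print("overnight culture (mL):", inoculum_volume_ml(CULTURE_ML))
print("IPTG stock (uL):", stock_volume_ul(IPTG_FINAL_MM, CULTURE_ML, IPTG_STOCK_MM))
```

With these values the transfer corresponds to 4 mL of overnight culture and, for the assumed 1 M stock, 100 μL of IPTG.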

1 mL of the induced culture was centrifuged at 8000 rpm and 4 °C for 2 min and the cell pellet was collected. After resuspension in 200 μL of PBS, 50 μL of 5× SDS loading buffer (300 mM Tris-HCl (pH 6.8), 20% β-mercaptoethanol, 20% SDS, 25% glycerol, 0.05% bromophenol blue) was added directly, and the sample was heated in a boiling water bath for 10 min and then stored at -20 °C.

200 mL of the induced culture was centrifuged at 8000 rpm and 4 °C for 20 min, the supernatant was discarded, and the cells were collected and resuspended in 30 mL of Tris-HCl buffer. 1% Triton X-100, 1× protease inhibitor cocktail and 1 mM PMSF were added, and the suspension was mixed well and sonicated for 30 min. The whole-cell lysate was centrifuged at 12000 rpm and 4 °C for 20 min to separate the supernatant from the pellet. 40 μL of the supernatant was mixed with 10 μL of 5× SDS loading buffer, heated in a boiling water bath for 10 min and stored at -20 °C. The pellet was resuspended in 1 mL of PBS, and 40 μL of the suspension was mixed with 10 μL of 5× SDS loading buffer, heated in a boiling water bath for 10 min and stored at -20 °C.

The lysate supernatant was purified on an affinity chromatography column to obtain the purified fusion protein; 40 μL of the eluate was mixed with 10 μL of 5× SDS loading buffer, heated in a boiling water bath for 10 min and stored at -20 °C. The target protein was purified on a Strep-Tag II affinity chromatography column; 40 μL of the eluate was mixed with 10 μL of 5× SDS loading buffer, heated in a boiling water bath for 10 min and stored at -20 °C to give the target protein sample. All of the above samples were diluted by the same factor for the electrophoretic test, so their protein contents are directly comparable.
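Both mixing ratios used for the SDS-PAGE samples (50 μL of 5× buffer with 200 μL of suspension, and 10 μL with 40 μL of sample) bring the loading buffer to 1×. A minimal sketch of that dilution arithmetic is given below; the function name is hypothetical, and the component list is taken from the 5× buffer composition stated in this example.

```python
def final_concentration(stock_conc: float, stock_ul: float, sample_ul: float) -> float:
    """Concentration of a stock component after mixing stock_ul of stock with sample_ul of sample."""
    return stock_conc * stock_ul / (stock_ul + sample_ul)


# 5x SDS loading buffer as stated in this example.
LOADING_BUFFER_5X = {
    "Tris-HCl pH 6.8 (mM)": 300.0,
    "beta-mercaptoethanol (%)": 20.0,
    "SDS (%)": 20.0,
    "glycerol (%)": 25.0,
    "bromophenol blue (%)": 0.05,
}

# 10 uL of 5x buffer added to 40 uL of sample.
for component, conc in LOADING_BUFFER_5X.items():
    print(component, final_concentration(conc, stock_ul=10.0, sample_ul=40.0))

# Buffer strength (in "x") for the two mixing ratios used in this example.
print(final_concentration(5.0, 10.0, 40.0), final_concentration(5.0, 50.0, 200.0))
```

Each component ends up at one fifth of its stock value, for example 60 mM Tris-HCl and 5% glycerol in the loaded sample.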

(3) Protein electrophoresis (SDS-PAGE)

SDS-PAGE gels were prepared according to conventional methods. The prepared protein samples were loaded in equal volumes and analyzed by 10% SDS-PAGE. Electrophoresis was performed at a constant voltage of 80 V for 30 min, followed by a constant voltage of 125 V. After electrophoresis, the gel was stained with Coomassie Brilliant Blue R-250 and destained to obtain the electrophoresis result.

(4) Analysis of results

Panel C of FIG. 5 shows NsiI protein expressed with the published positive charge solubilizing tag (Zbasic), with a molecular weight of about 45 kDa; the arrow indicates the position of the band of the protein of interest. Comparison with panel D of FIG. 5 shows that the published positive charge solubilizing tag (Zbasic) gave relatively poor protein expression compared with Sacid.

Example 6: fusion expression of Sacid and target protein NsiI

(1) Construction of Sacid and NsiI fusion expression vector and host bacterium

Genes of Sacid (SEQ ID NO: 1) and NsiI (SEQ ID NO: 5) and a prokaryotic expression vector pET28a are connected together through artificial synthesis to obtain a recombinant expression plasmid pET-28a/Sacid-NsiI for expressing fusion protein, wherein the amino acid sequence of Sacid is positioned at the upstream of the amino acid sequence of NsiI protein.

(2) Expression and purification of recombinant Sacid-NsiI

The plasmid was transformed into the expression host Rosetta (DE3) (purchased from Tiangen Biochemical Technology (Beijing) Co., Ltd.) to obtain the target engineered strain. In this specific example, expression was carried out in laboratory small-scale shake flasks. The strain was activated with shaking in LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL) at 37 °C for 16 h; the overnight culture was then transferred at a ratio of 1:50 into fresh LB medium (about 200 mL) containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL), cultured with shaking at 37 °C to a suitable logarithmic growth phase (OD600 = 0.6-0.8), and induced with 0.5 mM IPTG at 27 °C for 4-16 h.

200 mL of the induced culture was centrifuged at 8000 rpm and 4 °C for 20 min, the supernatant was discarded, and the cells were collected and resuspended in 30 mL of Tris-HCl buffer. 1% Triton X-100, 1× protease inhibitor cocktail and 1 mM PMSF were added, and the suspension was mixed well and sonicated for 30 min. The whole-cell lysate was centrifuged at 12000 rpm and 4 °C for 20 min to separate the supernatant from the pellet. 40 μL of the supernatant was mixed with 10 μL of 5× SDS loading buffer, heated in a boiling water bath for 10 min and stored at -20 °C. The pellet was resuspended in 1 mL of PBS, and 40 μL of the suspension was mixed with 10 μL of 5× SDS loading buffer, heated in a boiling water bath for 10 min and stored at -20 °C.

(3) Protein electrophoresis (SDS-PAGE)

(4) Analysis of results

Panel D of FIG. 5 shows the Sacid-NsiI fusion protein expressed by the method of the invention, with a molecular weight of about 45 kDa. A distinct target protein band is visible in the purified sample at the position indicated by the arrow, showing that the expression level of the target protein is greatly increased by fusion expression according to the invention; compared with the other tags, the Sacid tag of the invention markedly improves the solubility of the fusion protein.

Example 7: fusion expression of NeonGreen as a solubilizing peptide with TEVP

(1) Construction of NeonGreen and TEVP fusion expression vector and host bacterium

The amino acid sequence of NeonGreen is shown in SEQ ID NO: 6.

The genes of NeonGreen and TEVP (SEQ ID NO: 8) and the prokaryotic expression vector pET28a were joined together by artificial synthesis (wherein the NeonGreen amino acid sequence is located upstream of the TEVP protein amino acid sequence) to obtain the target protein expression vector pET-28a/NeonGreen-TEVP.

(2) Expression and purification of recombinant NeonGreen-TEVP protein

The recombinant plasmid pET-28a/NeonGreen-TEVP was transformed into the expression host Rosetta (DE3); a single colony of the expression host was picked, inoculated into LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL), and cultured at 37 °C and 220 rpm for 12 h to give the seed culture.

The seed culture was inoculated at a volume ratio of 1:50 into LB liquid medium containing kanamycin (50 μg/mL) and chloramphenicol (34 μg/mL), cultured at 37 °C with shaking at 200 rpm to a suitable logarithmic growth phase (OD600 = 0.6-0.8), and then cultured at 27 °C for 4-16 h after addition of the inducer IPTG. 200 mL of the induced culture was collected by centrifugation, and the cells were resuspended in 30 mL of 20 mM Tris-HCl, 300 mM NaCl (pH 8.0) and disrupted by sonication. The lysate was centrifuged at 8000 rpm for 30 min and the supernatant and pellet were separated; the supernatant was passed through a 0.45 μm filter to remove insoluble bacterial proteins, the pellet was resuspended in deionized water or PBS, and the expression products were analyzed by SDS-PAGE and Western blot.

(3) Analysis of results

As shown in panel A of FIG. 6, with NeonGreen as the solubilizing peptide, the lysate supernatant of the NeonGreen-TEVP fusion protein was analyzed (lane NT; molecular weight about 70 kDa; the arrow indicates the fusion protein). The target band could not be clearly distinguished on the SDS-PAGE gel of panel A, so the proteins were subjected to Western blotting for better observation of the bands. As shown in lane NT of panel B of FIG. 6, the Western blot result indicates that the NeonGreen-TEVP fusion protein was not expressed.

Example 8: fusion expression of Sacid and target protein TEVP

(1) Construction of Sacid and TEVP fusion expression vector and host bacterium

The genes of Sacid (SEQ ID NO: 1) and TEVP (SEQ ID NO: 8) and prokaryotic expression vector pET28a are artificially connected together (wherein the Sacid amino acid sequence is positioned upstream of the TEVP protein amino acid sequence) to obtain recombinant expression plasmid pET-28a/Sacid-TEVP.

(2) Expression and purification of recombinant Sacid-TEVP

(3) Protein electrophoresis (SDS-PAGE) and Western blotting

The proteins on the gel were wet-transferred to a PVDF membrane, and the membrane was blocked with 5% nonfat milk powder for 2 h, incubated with the primary antibody overnight at 4 °C and then with HRP-labeled secondary antibody for 1 h at room temperature, and exposed using a chemiluminescence kit and GelView 6000Plus equipment. The target band is shown in lane ST of panel B of FIG. 6.

(4) Analysis of results

Lane ST in panels A and B of FIG. 6 shows the Sacid-TEVP fusion protein expressed by the method of the invention, with a molecular weight of about 39 kDa. A distinct target protein band is present in the lysate supernatant sample at the position indicated by the arrow, and the Western blot result is consistent with this.

From FIG. 6, it can be seen that the expression level of the target protein is greatly improved after fusion expression of the invention; compared with other tags, the Sacid tag provided by the invention obviously improves the solubility of the fusion protein.

Example 9: purified protein Activity assay

The protein eluate obtained was buffer-exchanged by ultrafiltration, the protein was stored in Tris-HCl buffer (20 mM Tris-HCl, 20 mM NaCl, 50% glycerol), and protein activity was verified by plasmid digestion.

A plasmid containing a single NsiI cleavage site and a single HindIII cleavage site was used as the substrate for a double digestion reaction. The purified NsiI enzyme was serially diluted (10¹ to 10⁸-fold), each dilution was used as one sample in the double digestion reaction at 37 °C for 1 h, and the products were analyzed by agarose gel electrophoresis. The amounts of enzyme and plasmid were the same for all preparations, so the results are comparable.
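The dilution gradient and its read-out can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the example read-out dictionary is invented purely to show how an endpoint would be picked; it does not reproduce any result of this application.

```python
def tenfold_series(first_exp: int = 1, last_exp: int = 8):
    """Dilution factors 10**first_exp ... 10**last_exp, one per sample."""
    return [10 ** k for k in range(first_exp, last_exp + 1)]


def endpoint_dilution(complete_digestion: dict):
    """Largest dilution factor at which digestion was still scored complete, or None if none was."""
    passed = [factor for factor, ok in complete_digestion.items() if ok]
    return max(passed) if passed else None


factors = tenfold_series()
print(factors)

# Hypothetical gel read-out (True = substrate fully cut); values are invented for illustration.
example_readout = {f: f <= 10 ** 3 for f in factors}
print(endpoint_dilution(example_readout))
```

Because every preparation is diluted over the same series, the highest dilution that still cuts the substrate gives a relative measure of activity between preparations.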

Table 1 double cleavage reaction System

The results are shown in FIG. 7. The double digestion reaction with a plasmid containing a single NsiI cleavage site and a single HindIII cleavage site as the substrate gave target bands of 3392 bp and 1987 bp. Panel A shows the double digestion identification of the NsiI protein expressed alone and of the NeonGreen-NsiI protein; panel B shows the double digestion identification of the NsiI protein expressed with the published positive charge solubilizing tag (Zbasic) and of the Sacid-NsiI protein of the invention. As can be seen from the figure, the NsiI protein expressed alone was almost inactive; the NeonGreen-NsiI fusion protein and the NsiI protein expressed with the published positive charge solubilizing tag (Zbasic) had essentially the same activity; and the Sacid-NsiI protein purified with Sacid as the purification tag according to the invention had greatly improved activity, higher than that of both the NeonGreen-NsiI fusion protein and the NsiI protein expressed with the published positive charge solubilizing tag (Zbasic).

The specific activity of the enzymes was measured, and the results are shown in FIG. 8. With Sacid as the purification tag according to the invention, the enzyme activity of the purified Sacid-NsiI protein was 496 KU/mg; the enzyme activity of the NsiI protein expressed with the known positive charge solubilizing tag (Zbasic) was 5.5 KU/mg; the enzyme activity of the NeonGreen-NsiI fusion protein was 4.9 KU/mg; and the NsiI protein expressed alone was almost inactive and was contaminated with impurity proteins. As can be seen from the figure, the enzyme activity of Sacid-NsiI purified with Sacid as the purification tag is about 100 times that of the enzyme purified with the other two known solubilizing tags.
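The two band sizes are what one expects when a circular plasmid is cut at two single sites: the fragments add up to the plasmid length, here 3392 + 1987 = 5379 bp. The sketch below illustrates this; the function name is hypothetical and the site coordinates are placeholders chosen only so that the spacing reproduces the reported bands, since the actual map positions are not given in this document.

```python
def circular_digest_fragments(plasmid_bp: int, cut_positions):
    """Fragment lengths (largest first) after cutting a circular plasmid at the given positions."""
    cuts = sorted(p % plasmid_bp for p in cut_positions)
    # Distances between consecutive cuts, plus the fragment that wraps around the origin.
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    fragments.append(plasmid_bp - cuts[-1] + cuts[0])
    return sorted(fragments, reverse=True)


PLASMID_BP = 3392 + 1987  # total length implied by the two reported bands
NSII_SITE = 0             # placeholder coordinate (assumption)
HINDIII_SITE = 1987       # placeholder coordinate (assumption)

print(circular_digest_fragments(PLASMID_BP, [NSII_SITE, HINDIII_SITE]))
print(circular_digest_fragments(PLASMID_BP, [HINDIII_SITE]))  # HindIII only: one linear band
```

If NsiI is inactive, only HindIII cuts and a single full-length linear band is expected, which is how the gel distinguishes active from inactive preparations.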
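The fold difference can be recomputed directly from the three specific activities reported above. The snippet is a plain arithmetic check; the dictionary keys are labels chosen here for readability.

```python
SPECIFIC_ACTIVITY_KU_PER_MG = {
    "Sacid-NsiI": 496.0,
    "Zbasic-NsiI": 5.5,
    "NeonGreen-NsiI": 4.9,
}


def fold_over(reference: float, other: float) -> float:
    """How many times larger the reference activity is than the other activity."""
    return reference / other


sacid = SPECIFIC_ACTIVITY_KU_PER_MG["Sacid-NsiI"]
for name in ("Zbasic-NsiI", "NeonGreen-NsiI"):
    ratio = fold_over(sacid, SPECIFIC_ACTIVITY_KU_PER_MG[name])
    print(f"Sacid-NsiI vs {name}: {ratio:.1f}-fold")
```

The ratios come out at roughly 90-fold relative to Zbasic-NsiI and roughly 101-fold relative to NeonGreen-NsiI, consistent with the stated figure of about 100 times.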

SEQ ID NO: 1 (Sacid)

Val-Asp-Asn-Lys-Phe-Asn-Lys-Glu-Gln-Gln-Asp-Ala-Phe-Tyr-Glu-Ile-Leu-His-Leu-Pro-Asn-Leu-Asn-Glu-Glu-Gln-Arg-Asn-Ala-Phe-Ile-Gln-Glu-Leu-Lys-Asp-Asp-Pro-Glu-Glu-Ser-Asp-Glu-Glu-Leu-Glu-Glu-Ala-Asp-Asp-Leu-Asn-Asp-Ala-Gln-Pro-Lys

SEQ ID NO: 2 (Sacid 1)

Val-Asp-Asn-Lys-Phe-Asn-Lys-Glu-Gln-Gln-Asp-Ala-Phe-Tyr-Glu-Ile-Leu-His-Leu-Pro-Asn-Leu-Asn-Glu-Glu-Gln-Arg-Asn-Ala-Phe-Ile-Arg-Ser-Leu-Arg-Asp-Asp-Pro-Ser-Gln-Ser-Ala-Asn-Leu-Leu-Ala-Glu-Ala-Asp-Asp-Leu-Asn-Asp-Ala-Gln-Pro-Lys

SEQ ID NO: 3 (Sacid 2)

Val-Asp-Asn-Lys-Phe-Asn-Lys-Glu-Arg-Arg-Arg-Ala-Phe-Tyr-Glu-Ile-Leu-His-Leu-Pro-Asn-Leu-Asn-Glu-Glu-Gln-Arg-Asn-Ala-Phe-Ile-Arg-Ser-Leu-Arg-Asp-Asp-Pro-Ser-Gln-Ser-Ala-Asn-Leu-Leu-Ala-Glu-Ala-Asp-Asp-Leu-Asn-Asp-Ala-Gln-Pro-Lys

SEQ ID NO: 4 (B domain of staphylococcal protein A, wt)

Val-Asp-Asn-Lys-Phe-Asn-Lys-Glu-Gln-Gln-Asn-Ala-Phe-Tyr-Glu-Ile-Leu-His-Leu-Pro-Asn-Leu-Asn-Glu-Glu-Gln-Arg-Asn-Ala-Phe-Ile-Gln-Ser-Leu-Lys-Asp-Asp-Pro-Ser-Gln-Ser-Ala-Asn-Leu-Leu-Ala-Glu-Ala-Lys-Lys-Leu-Asn-Asp-Ala-Gln-Pro-Lys
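Comparing the wild-type B domain (SEQ ID NO: 4) with Sacid (SEQ ID NO: 1) position by position shows where the acidic residues were introduced. The sketch below uses one-letter transcriptions of the two listed sequences; the helper name `substitutions` is introduced here for illustration.

```python
SACID = "VDNKFNKEQQDAFYEILHLPNLNEEQRNAFIQELKDDPEESDEELEEADDLNDAQPK"  # SEQ ID NO: 1
WT_B = "VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQPK"   # SEQ ID NO: 4

def substitutions(reference, variant):
    """List (position, reference_residue, variant_residue), 1-based,
    for every position where two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]

subs = substitutions(WT_B, SACID)
for pos, wt, new in subs:
    print(f"{wt}{pos}{new}")
```

The ten differing positions coincide with the ten X positions of SEQ ID NO: 9, and every replacement residue is D or E.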

SEQ ID NO: 5 (NsiI)

Met-Ile-Asn-His-Ser-Ile-Leu-Lys-His-His-Ser-Phe-Thr-Gly-Lys-Ile-Ile-Ser-Ile-Leu-Lys-Asp-Glu-Phe-Gly-Asp-Asp-Ala-Ile-Tyr-Ile-Phe-Glu-Asn-Ser-Pro-Ile-Leu-Gly-Tyr-Leu-Asn-Ile-Lys-Thr-Lys-Ser-Ala-Glu-Arg-Gly-Ser-Lys-Ser-Arg-Gly-Ser-Phe-Ala-Asn-His-Tyr-Ala-Leu-Tyr-Val-Ile-Ile-Glu-Asp-Tyr-Ile-Asn-Lys-Gly-Tyr-Leu-Gly-Asp-Asp-Leu-Asp-Tyr-Ser-Lys-Tyr-Asp-Gly-Ala-Lys-Phe-Thr-Asp-Leu-Phe-Arg-Arg-Gln-Arg-Glu-Leu-Pro-Phe-Gly-Ser-Lys-Leu-Gln-Asn-His-Ala-Leu-Asn-Ser-Arg-Leu-Asn-Asp-Glu-Phe-Lys-Lys-Phe-Phe-Pro-Thr-Leu-Gly-Ile-Val-Pro-Ile-Ile-Arg-Asp-Val-Arg-Thr-Ser-Arg-Tyr-Trp-Ile-Gln-Glu-Asp-Leu-Ile-Lys-Val-Ser-Val-Arg-Asn-Lys-Asn-Gly-Ile-Glu-Arg-Arg-Glu-Asn-Leu-Ala-Pro-Ser-Ile-Ile-Arg-Ile-Ile-Asp-Glu-Tyr-Ile-Ala-Thr-Lys-Lys-Glu-Ser-Phe-Glu-Leu-Phe-Leu-Lys-Thr-Cys-Gln-Glu-Ile-Ala-Asn-Leu-Ser-Ser-Ser-Asp-Pro-His-Ser-Val-Val-Lys-Phe-Ile-Gln-Glu-Gln-Leu-His-Pro-Ser-Ser-Asp-Ala-Arg-Val-Phe-Glu-Ile-Val-Ser-Tyr-Ala-Val-Leu-Lys-Glu-Arg-Tyr-Ser-Asn-Gln-Thr-Ile-Trp-Ile-Gly-Asp-Ser-Arg-Asp-Asp-Val-Ala-Glu-Glu-Ser-Leu-Val-Leu-Tyr-Lys-Thr-Gly-Arg-Thr-Asn-Ala-Asn-Asp-Gly-Gly-Ile-Asp-Phe-Val-Met-Lys-Pro-Leu-Gly-Arg-Phe-Phe-Gln-Val-Thr-Glu-Thr-Ile-Asp-Ala-Asn-Lys-Tyr-Phe-Leu-Asp-Ile-Asp-Lys-Val-Gln-Arg-Phe-Pro-Ile-Thr-Phe-Val-Val-Lys-Thr-Asn-Ser-Ser-Tyr-Glu-Glu-Ile-Glu-Lys-Ile-Ile-Lys-Glu-Gln-Ala-Lys-Ala-Lys-Tyr-Asn-Ile-Glu-Ala-Ile-Val-Asn-Ser-Tyr-Met-Asp-Ser-Ile-Glu-Glu-Ile-Ile-Asn-Val-Pro-Asp-Leu-Met-Lys-Tyr-Phe-Glu-Glu-Met-Ile-Tyr-Ser-Asp-Ser-Leu-Lys-Arg-Ile-Met-Asp-Glu-Ile-Ile-Val-Gln-Ser-Lys-Val-Glu-Phe-Asn-Tyr-Glu-Glu-Asp-Val-Ser

SEQ ID NO: 6 (NeonGreen)

Met-Leu-Ser-Lys-Gly-Glu-Glu-Asp-Asn-Met-Ala-Ser-Leu-Pro-Ala-Thr-His-Glu-Leu-His-Ile-Phe-Gly-Ser-Ile-Asn-Gly-Val-Asp-Phe-Asp-Met-Val-Gly-Gln-Gly-Thr-Gly-Asn-Pro-Asn-Asp-Gly-Tyr-Glu-Glu-Leu-Asn-Leu-Lys-Ser-Thr-Lys-Gly-Asp-Leu-Gln-Phe-Ser-Pro-Trp-Ile-Leu-Val-Pro-His-Ile-Gly-Tyr-Gly-Phe-His-Gln-Tyr-Leu-Pro-Tyr-Pro-Asp-Gly-Met-Ser-Pro-Phe-Gln-Ala-Ala-Met-Val-Asp-Gly-Ser-Gly-Tyr-Gln-Val-His-Arg-Thr-Met-Gln-Phe-Glu-Asp-Gly-Ala-Ser-Leu-Thr-Val-Asn-Tyr-Arg-Tyr-Thr-Tyr-Glu-Gly-Ser-His-Ile-Lys-Gly-Glu-Ala-Gln-Val-Lys-Gly-Thr-Gly-Phe-Pro-Ala-Asp-Gly-Pro-Val-Met-Thr-Asn-Ser-Leu-Thr-Ala-Ala-Asp-Trp-Cys-Arg-Ser-Lys-Lys-Thr-Tyr-Pro-Asn-Asp-Lys-Thr-Ile-Ile-Ser-Thr-Phe-Lys-Trp-Ser-Tyr-Thr-Thr-Gly-Asn-Gly-Lys-Arg-Tyr-Arg-Ser-Thr-Ala-Arg-Thr-Thr-Tyr-Thr-Phe-Ala-Lys-Pro-Met-Ala-Ala-Asn-Tyr-Leu-Lys-Asn-Gln-Pro-Met-Tyr-Val-Phe-Arg-Lys-Thr-Glu-Leu-Lys-His-Ser-Lys-Thr-Glu-Leu-Asn-Phe-Lys-Glu-Trp-Gln-Lys-Ala-Phe-Thr

SEQ ID NO: 7 (Zbasic)

Val-Asp-Asn-Lys-Phe-Asn-Lys-Glu-Arg-Arg-Arg-Ala-Arg-Arg-Glu-Ile-Arg-His-Leu-Pro-Asn-Leu-Asn-Arg-Glu-Gln-Arg-Arg-Ala-Phe-Ile-Arg-Ser-Leu-Arg-Asp-Asp-Pro-Ser-Gln-Ser-Ala-Asn-Leu-Leu-Ala-Glu-Ala-Lys-Lys-Leu-Asn-Asp-Ala-Gln-Pro-Lys
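The contrast between the positively charged Zbasic tag and the acidic Sacid tag can be made concrete by counting charged side chains in the listed sequences. This is a rough sketch under a simplifying assumption: the "formal charge" below just counts K and R against D and E and ignores histidine, the termini and any pKa effects, so it is not a pI calculation.

```python
# One-letter transcriptions of the listed sequences.
SEQUENCES = {
    "Sacid (SEQ ID NO: 1)": "VDNKFNKEQQDAFYEILHLPNLNEEQRNAFIQELKDDPEESDEELEEADDLNDAQPK",
    "B domain wt (SEQ ID NO: 4)": "VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQPK",
    "Zbasic (SEQ ID NO: 7)": "VDNKFNKERRRARREIRHLPNLNREQRRAFIRSLRDDPSQSANLLAEAKKLNDAQPK",
}

def formal_charge(seq):
    """Number of basic side chains (K, R) minus acidic side chains (D, E).
    Crude approximation: ignores His, the termini and pKa shifts."""
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return basic - acidic

charges = {name: formal_charge(seq) for name, seq in SEQUENCES.items()}
for name, charge in charges.items():
    print(f"{name}: {charge:+d}")
```

On this crude count the wild-type domain is slightly negative, Zbasic is strongly positive, and Sacid is strongly negative, which matches the way the text describes the two engineered tags.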

SEQ ID NO: 8 (TEVP)

Lys-Gly-Pro-Arg-Asp-Tyr-Asn-Pro-Ile-Ser-Ser-Ser-Ile-Cys-His-Leu-Thr-Asn-Glu-Ser-Asp-Gly-His-Thr-Thr-Ser-Leu-Tyr-Gly-Ile-Gly-Phe-Gly-Pro-Phe-Ile-Ile-Thr-Asn-Lys-His-Leu-Phe-Arg-Arg-Asn-Asn-Gly-Thr-Leu-Val-Val-Gln-Ser-Leu-His-Gly-Val-Phe-Lys-Val-Lys-Asp-Thr-Thr-Thr-Leu-Gln-Gln-His-Leu-Val-Asp-Gly-Arg-Asp-Met-Ile-Ile-Ile-Arg-Met-Pro-Lys-Asp-Phe-Pro-Pro-Phe-Pro-Gln-Lys-Leu-Lys-Phe-Arg-Glu-Pro-Gln-Arg-Glu-Glu-Arg-Ile-Cys-Leu-Val-Thr-Thr-Asn-Phe-Gln-Thr-Lys-Ser-Met-Ser-Ser-Met-Val-Ser-Asp-Thr-Ser-Cys-Thr-Phe-Pro-Ser-Gly-Asp-Gly-Ile-Phe-Trp-Lys-His-Trp-Ile-Gln-Thr-Lys-Asp-Gly-Gln-Cys-Gly-Ser-Pro-Leu-Val-Ser-Thr-Arg-Asp-Gly-Phe-Ile-Val-Gly-Ile-His-Ser-Ala-Ser-Asn-Phe-Thr-Asn-Thr-Asn-Asn-Tyr-Phe-Thr-Ser-Val-Pro-Lys-Asn-Phe-Met-Glu-Leu-Leu-Thr-Asn-Gln-Glu-Ala-Gln-Gln-Trp-Val-Ser-Gly-Trp-Arg-Leu-Asn-Ala-Asp-Ser-Val-Leu-Trp-Gly-Gly-His-Lys-Val-Phe-Met-Val-Lys-Pro-Glu-Glu-Pro-Phe-Gln-Pro-Val-Lys-Glu-Ala-Thr-Gln-Leu-Met-Asn-Glu-Gly

Claims

1. An isolated peptide comprising the amino acid sequence VDNKFNKEQQX₁AFYEILHLPNLNEEQRNAFIQX₂LKDDPX₃X₄SX₅X₆X₇LX₈EAX₉X₁₀LNDAQPK (SEQ ID NO: 9), wherein X₁-X₁₀ are all negatively charged amino acids, preferably E or D,

preferably, the isoelectric point pI of the peptide is equal to or less than 5.0.

2. The peptide of claim 1, comprising or consisting of the amino acid sequence shown in SEQ ID NO: 1.

3. An isolated polynucleotide encoding the peptide of claim 1 or 2.

4. An isolated fusion protein comprising a first peptide and a second peptide, wherein the first peptide is a peptide of claim 1 or 2 and the second peptide is a polypeptide of interest, optionally the second peptide is linked to the first peptide by a spacer.

5. The fusion protein of claim 4, wherein:

the spacer comprises a cleavage site, preferably selected from the group consisting of a chemical cleavage site, an enzymatic cleavage site and a self-cleavage site, and/or,

the fusion protein further comprises a moiety for isolation and purification, such as a 6×histidine tag, GST tag, Strep tag, Twin-Strep tag or MBP tag, linked to the second peptide, and/or,

the second peptide has an acidic, neutral or weakly basic isoelectric point, i.e. less than, equal to or slightly greater than 7.0, e.g. equal to or less than 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5 or 4.0, more preferably the second peptide is 20-500 amino acid residues in length, e.g. an NsiI protein comprising the amino acid sequence shown in SEQ ID NO: 5 or a TEVP protein comprising the amino acid sequence shown in SEQ ID NO: 8.

6. An isolated polynucleotide comprising a nucleotide sequence encoding the fusion protein of claim 4 or 5.

7. A construct, preferably an expression construct, comprising the polynucleotide of claim 6.

8. A host cell comprising the polynucleotide of claim 6 or the construct of claim 7, preferably an expression construct, wherein the host cell is capable of expressing the fusion protein, preferably the host cell is selected from the group consisting of prokaryotes, yeasts and eukaryotic cells such as mammalian cells or insect cells, more preferably from the group consisting of Escherichia, Bacillus, Salmonella, Pseudomonas and Streptomyces, more preferably from Escherichia, more preferably Escherichia coli, Bacillus subtilis or Bacillus megaterium, more preferably Escherichia coli Rosetta (DE3).

9. A method of producing the fusion protein of claim 4 or 5, comprising:

(a) Culturing the host cell of claim 8 under conditions suitable for expression of the fusion protein; and

(b) Recovering the fusion protein,

optionally, the method further comprises:

(c) Cleaving the fusion protein to release the polypeptide of interest; and

(d) Recovering the polypeptide of interest from the sample,

preferably, the polypeptide of interest has an acidic, neutral or weakly basic isoelectric point, i.e. less than, equal to or slightly greater than 7.0, e.g. equal to or less than 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5 or 4.0, more preferably the polypeptide of interest is an NsiI protein comprising the amino acid sequence shown in SEQ ID NO: 5 or a TEVP protein comprising the amino acid sequence shown in SEQ ID NO: 8.

10. A method of producing a polypeptide of interest, comprising:

(a) Expressing in a host cell a fusion protein formed by fusion of a polypeptide of interest with a peptide according to claim 1 or 2, preferably the peptide according to claim 1 or 2 is located upstream of the polypeptide of interest;

(b) Recovering and cleaving the fusion protein to release the polypeptide of interest; and

(c) Optionally, isolating and purifying the polypeptide of interest,

preferably, the polypeptide of interest has an acidic, neutral or weakly basic isoelectric point, i.e. less than, equal to or slightly greater than 7.0, e.g. equal to or less than 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5 or 4.0, more preferably the polypeptide of interest is 20-500 amino acid residues in length, more preferably the polypeptide of interest is an NsiI protein comprising the amino acid sequence shown in SEQ ID NO: 5 or a TEVP protein comprising the amino acid sequence shown in SEQ ID NO: 8.

11. The method of claim 10, wherein:

the polypeptide of interest is linked to the peptide of claim 1 or 2 by a spacer, preferably the spacer comprises a cleavage site, preferably selected from the group consisting of a chemical cleavage site, an enzymatic cleavage site and a self-cleavage site, and/or

the fusion protein further comprises a moiety for isolation and purification, such as a 6×histidine tag, GST tag, Strep tag, Twin-Strep tag or MBP tag, linked to the polypeptide of interest.

12. The method of any one of claims 9-11, wherein the host cell is selected from the group consisting of prokaryotes, yeasts and eukaryotic cells such as mammalian cells or insect cells, more preferably from the group consisting of Escherichia, Bacillus, Salmonella, Pseudomonas and Streptomyces, more preferably from Escherichia, more preferably Escherichia coli, Bacillus subtilis or Bacillus megaterium, more preferably Escherichia coli Rosetta (DE3).

13. Use of a peptide according to claim 1 or 2 or a polynucleotide according to claim 3 for the preparation of a protein of interest in a host cell, wherein the host cell is preferably selected from the group consisting of prokaryotes, yeasts and eukaryotic cells such as mammalian cells or insect cells, more preferably from the group consisting of Escherichia, Bacillus, Salmonella, Pseudomonas and Streptomyces, more preferably from Escherichia, more preferably Escherichia coli, Bacillus subtilis or Bacillus megaterium, more preferably Escherichia coli Rosetta (DE3),

preferably, the protein of interest has an acidic, neutral or weakly basic isoelectric point, i.e. less than, equal to or slightly greater than 7.0, e.g. equal to or less than 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5 or 4.0, more preferably the protein of interest is 20-500 amino acid residues in length, more preferably the protein of interest is an NsiI protein comprising the amino acid sequence shown in SEQ ID NO: 5 or a TEVP protein comprising the amino acid sequence shown in SEQ ID NO: 8.