CN101583370A - Proteinaceous pharmaceuticals and uses thereof - Google Patents


Info

Publication number
CN101583370A
Authority
CN
China
Prior art keywords
protein
cysteine
natural
disulfide bond
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006800340492A
Other languages
Chinese (zh)
Inventor
W·P·C·斯泰默
V·舍伦贝格尔
M·巴德尔
M·肖勒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amunix Inc
Original Assignee
Amunix Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Amunix Inc
Publication of CN101583370A

Landscapes

  • Peptides Or Proteins (AREA)

Abstract

The present invention provides cysteine-containing scaffolds and/or proteins, as well as expression vectors, host cells, and display systems harboring and/or expressing such cysteine-containing products. The present invention also provides methods of designing libraries of such products and methods of screening such libraries to yield entities exhibiting binding specificity towards a target molecule. Further provided by the invention are pharmaceutical compositions comprising the cysteine-containing products of the present invention.

Description

Proteinaceous pharmaceuticals and uses thereof
Cross-reference
This application claims priority to U.S. Provisional Application Nos. 60/721,270 and 60/721,188, filed September 27, 2005, and U.S. Provisional Application No. 60/743,622, filed March 21, 2006, which provisional applications are incorporated herein by reference.
Background of invention
A basic concept of molecular biology is that every native protein adopts a single "native" structure or fold. Any fold other than the native fold is considered "misfolded". There are few, if any, examples of native proteins that adopt multiple native, functional folds. Misfolding is a serious problem, as illustrated by the infectivity of prions, whose "wrong" fold catalytically causes other prion proteins to misfold, leading to encephalopathy and certain death. However, almost any protein appears to misfold into fibrous polymers when denatured, and such polymers are associated with many degenerative diseases. One example is the beta-amyloid fibrils associated with Alzheimer's disease. Protein misfolding usually leads to the irreversible formation of insoluble aggregates, but denatured proteins can also exist in a molten globule form. It is generally believed that proteins follow a funnel-shaped pathway, starting from a molten globule state that adopts a diversity of unstable structures and gradually reducing the diversity of folding intermediates until a single, stable, folded native structure is reached. Native proteins can change structurally through allosteric regulation, domain movement, lid/flap-type movements relative to other domains, induced fit upon ligand binding, or crystallization forces, but these changes typically involve the movement of hinge-like structures rather than fundamental changes in the basic fold. All available examples support the view that native proteins have evolved to adopt a single stable fold in order to perform their biological function, and that departing from this native structure is deleterious.
There are a few examples in which the same protein sequence exists natively in more than one form (not including variants produced by alternative splicing, glycosylation, or proteolytic processing), but the second form is usually just an inactive by-product that has lost a disulfide bond (Schulz et al., 2005; Petersen et al., 2003; Lauber et al., 2003). In the microprotein family, which comprises small proteins (mainly toxins and receptor domains) with high disulfide bond density, some examples have been found of closely related sequences that adopt different structures owing to disulfide bonding patterns that are fully formed (not simply defective) but alternative. Examples include somatomedin (Kamikubo et al., 2004) and maurotoxin (Fajloun et al., 2000).
Protein display libraries have traditionally used a single fixed protein fold, such as immunoglobulin domains, interferon, protein A, ankyrin, A-domains, T-cell receptors, fibronectin III, gamma-crystallin, ubiquitin, and many other proteins, as reviewed in Binz, A. et al. (2005) Nature Biotechnology 23:1257. In some cases, for example immunoglobulin libraries derived from the human immune repertoire, a single library uses many different V-region sequences as scaffolds, but they all share the same basic immunoglobulin fold. A different type of library is the random peptide or cyclic peptide library, but these are not considered proteins because they have no defined fold and do not adopt a single stable structure.
There remains a great need for new protein structures suitable for rational design and selection by (for example) directed evolution, in order to obtain therapeutics exhibiting one or more desired properties. These desired properties include, but are not limited to, reduced immunogenicity, improved stability or extended half-life, multispecificity, multivalency, and high target binding affinity.
Summary of the invention
One aspect of the present invention is the design of novel protein structures exhibiting high disulfide bond density. Such protein structures are particularly suited to rational design and selection by (for example) directed evolution, in order to obtain therapeutics exhibiting one or more desired properties. These desired properties include, but are not limited to, high target binding affinity and/or avidity, lower molecular weight, improved tissue penetration, improved thermal stability and protease stability, extended shelf life, improved hydrophilicity, enhanced formulation (particularly at high concentration), and reduced immunogenicity.
In one embodiment, the invention provides a variety of protein structures, for example in the form of scaffolds, and libraries of these protein structures. In one aspect, the scaffolds exhibit diverse folds or other non-primary structures. In another aspect, the scaffolds have a defined topology for effecting a biological function. In another embodiment, the invention provides methods of constructing libraries of these protein structures, methods of displaying these libraries on a genetic vehicle or package (for example, viral packages such as phage, and non-viral packages such as yeast display, E. coli surface display, ribosome display, or CIS (DNA-linked) display), and methods of screening these libraries to obtain therapeutics or candidate therapeutics. The invention further provides vectors, host cells, and other in vitro systems that express or utilize the subject protein structures.
In another embodiment, the invention provides a non-naturally occurring cysteine (C)-containing scaffold that exhibits binding specificity for a target molecule, wherein the non-naturally occurring cysteine (C)-containing scaffold comprises intra-scaffold cysteines in an arrangement selected from the permutations represented by the formula
$\prod_{i=1}^{n}(2i-1)$,
wherein n equals the predicted number of disulfide bonds formed by the cysteine residues, and wherein $\prod$ represents the product of (2i-1), where i is a positive integer from 1 to n.
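As an illustration of the product of (2i-1) for i from 1 to n described above, the following minimal Python sketch counts the possible disulfide pairing arrangements for n bonds and, for small n, lists them. The function names `count_pairings` and `enumerate_pairings` are hypothetical and are not taken from the application.

```python
from math import prod


def count_pairings(n_bonds: int) -> int:
    """Number of ways to pair 2n cysteines into n disulfide bonds.

    This is the product of (2i - 1) for i = 1..n.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    return prod(2 * i - 1 for i in range(1, n_bonds + 1))


def enumerate_pairings(positions):
    """Yield every complete pairing of an even-length list of cysteine positions."""
    if not positions:
        yield []
        return
    first, rest = positions[0], positions[1:]
    # Pair the first cysteine with each remaining one, then recurse on the rest.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "disulfide bonds:", count_pairings(n), "pairings")
    # Six cysteines (three bonds), labeled 1..6 in sequence order
    for pairing in enumerate_pairings([1, 2, 3, 4, 5, 6]):
        print(pairing)
```

The count grows quickly: 3 bonds allow 15 pairings and 4 bonds allow 105, so exhaustive listing is only practical for small n.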
In another embodiment, the invention provides a non-naturally occurring cysteine (C)-containing protein comprising a polypeptide of no more than 35 amino acids, wherein at least 10% of the amino acids in the polypeptide are cysteines, at least two disulfide bonds are formed by pairing of intra-scaffold cysteines, and said pairing yields a complexity index greater than 3.
In one aspect, the non-naturally occurring cysteine (C)-containing protein can comprise a polypeptide of no more than about 60 amino acids, wherein at least 10% of the amino acids in the polypeptide are cysteines, the cysteines contained in the polypeptide form at least four disulfide bonds by pairing, and said pairing yields a complexity index greater than 4, 6, or 10.
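The length and cysteine-content limits stated above can be checked mechanically for a candidate sequence. The sketch below is an assumption-laden illustration: the helper names and the example sequence are hypothetical, the disulfide count is only an upper bound (number of cysteines divided by two), and the complexity index is not computed because its definition is not given in this passage.

```python
def cysteine_stats(seq: str) -> dict:
    """Basic composition figures for a one-letter amino acid sequence."""
    seq = seq.upper()
    n_cys = seq.count("C")
    return {
        "length": len(seq),
        "cysteines": n_cys,
        "cys_fraction": n_cys / len(seq) if seq else 0.0,
        # Upper bound only: actual bonds depend on folding, not composition.
        "max_disulfides": n_cys // 2,
    }


def meets_composition(seq: str, max_len: int = 35,
                      min_cys_fraction: float = 0.10,
                      min_disulfides: int = 2) -> bool:
    """True if seq satisfies the length and cysteine-content limits."""
    s = cysteine_stats(seq)
    return (0 < s["length"] <= max_len
            and s["cys_fraction"] >= min_cys_fraction
            and s["max_disulfides"] >= min_disulfides)


if __name__ == "__main__":
    example = "ACCPGSRYCCWG"  # hypothetical sequence, for illustration only
    print(cysteine_stats(example))
    print(meets_composition(example))                       # 35-residue limits
    print(meets_composition(example, max_len=60, min_disulfides=4))
```

The defaults mirror the 35-amino-acid embodiment; passing `max_len=60` and `min_disulfides=4` mirrors the 60-amino-acid aspect.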
In another aspect, the non-naturally occurring cysteine (C)-containing protein of the invention exhibits target binding capability after being heated to a temperature above about 50°C, preferably above about 80°C, or even above 100°C, for a specified time ranging from 0.001 seconds to 10 minutes.
In certain aspects, the non-naturally occurring cysteine (C)-containing proteins described herein are conjugated to a moiety selected from: labels (e.g., GFP, HA-tag, Flag, Cy3, Cy5, FITC); effectors (e.g., enzymes, cytotoxic drugs, chelates); antibodies (e.g., whole antibodies, Fc regions, dAbs, scFvs, diabodies); targeting modules that concentrate the molecule in a desired tissue or compartment such as a tumor (peptides or domains, such as the heparin-binding exons of VEGF); barrier-transport conjugates that enhance transport across tissue barriers (transdermal, oral, enteral, buccal, vaginal, rectal, nasal, pulmonary, blood-brain barrier, transscleral), such as arginine-rich peptides, alkyl saccharides, and (ionic or nonionic) amphipathic or amphoteric peptides that mimic detergents and form micelles containing or displaying the protein; and half-life extension moieties, including small molecules (for example, small molecules that bind albumin or insert into cell membranes), chemical polymers such as polyethylene glycol (PEG), various peptide and protein sequences (including hydrophobic peptides that insert into membranes or bind nonspecifically), (human) serum albumin, transferrin, and glycine-rich polymeric sequences such as poly(GGGS) linkers. The bonds forming these conjugates can be formed by genetic or chemical means. The cysteine-containing proteins can also be homo- or hetero-multimerized to form 2-mers, 3-mers, 4-mers, 5-mers, 6-mers, 7-mers, 8-mers, 9-mers, 10-mers, 11-mers, 12-mers, 14-mers, 16-mers, 18-mers, 20-mers, or higher-order multimers, which extend the half-life of the protein, increase the concentration of binding sites and thereby improve binding constants, and, depending on the target, may also increase binding avidity. Higher multimers can be produced by fusion into a single large gene, by adding genetically encoded peptide-binding peptides ("binding peptides") to separately expressed proteins so that the proteins bind to each other via the binding peptides at the N- and/or C-terminus to form protein multimers, or by a variety of chemical linkages. Suitable half-life extension moieties include, but are not limited to, moieties that bind serum albumin, IgG, erythrocytes, and serum-accessible proteins. Each target and each therapeutic use favors a different combination of the above components.
The present invention also provides a non-natural protein comprising a single domain of 20-60 amino acids, which has 3 or more disulfide bonds, binds to a human serum-exposed protein, and contains less than 5% aliphatic amino acids.
The present invention further provides a non-naturally occurring protein comprising a single domain of 20-60 amino acids, which has 3 or more disulfide bonds, binds to a human serum-exposed protein, and has a score in the T-Epitope program that is lower than 90% of the average for proteins in the database, preferably lower than 99% of the average for proteins in the database, and more preferably lower than 99% of the average for human proteins in the database. The present invention also includes libraries of the subject non-naturally occurring proteins, expression vectors and genetic packages encoding these proteins, and host cells that express or display these proteins.
The present invention further includes methods of making the cysteine-containing microproteins disclosed herein.
The present invention also includes a method of detecting the presence of a specific interaction between a target and an exogenous polypeptide displayed on a genetic package. The method comprises the steps of: (a) providing a genetic package display of the invention; (b) contacting the genetic package with the target under conditions suitable for producing a stable polypeptide-target complex; and (c) detecting the formation of the stable polypeptide-target complex on the genetic package, thereby detecting the presence of the specific interaction. The method may further comprise isolating the genetic package that displays a polypeptide having the desired property, or sequencing the portion of the genetic package that encodes the desired polypeptide. Exemplary genetic packages include, but are not limited to, viruses (e.g., phage), cells, and spores.
Brief description of the drawings
Figures 1-12, 14-16, 20-35, 37-73, 75-83, 85-93, 95-97, 99, 101-102, 104-107, 111, 113-115, and 123 show various scaffolds and the motifs they contain.
The motif of Fig. 1:
1)CxPhxxxCxxxxdCCxxxCxrrGxxxxxrC
2)CxPxxxxCxxxxxCCxxxCxxxxGxxxxxC
3)CxxxxxxCxxxxxCCxxxCxxxxxxxxxxC
CDP:C6C5C0C3C10C
The motif of Fig. 2:
1)fCCPxxryCCw
2)CCPxxxxCCW
3)CCxxxxxCC
CDP:C0C5C0C
The motif of Fig. 3:
1)CxxxfWxCxxxxxCCgWxxCxxgxC
2)CxxxxWxCxxxxxCCxWxxCxxxxC
3)CxxxxxxCxxxxxCCxxxxCxxxxC
CDP:C6C5C0C4C4C
The motif of Fig. 4:
1)CxgydxxCxxxxpCCxxxxxxxCxxxxgyWWyxxxyC
2)CxxxxxxCxxxxxCCxxxxxxxCxxxxxxWWxxxxxC
3)CxxxxxxCxxxxxCCxxxxxxxCxxxxxxxxxxxxxC
CDP:C6C5C0C7C13C
The motif of Fig. 5:
1)CxfxCxxxxxgxxpCxxxxxxxxxxxxxxxxxCxggWxCxxxxC
2)CxxxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxWxCxxxxC
3)CxxxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxC
CDP:C3C9C17C5C4C
The motif of Fig. 6:
1)CxxxxxxCxxHxxCCxxxCxxgxCxxxxxwxxxgC
2)CxxxxxxCxxHxxCCxxxCxxxxCxxxxxxxxxxC
3)CxxxxxxCxxxxxCCxxxCxxxxCxxxxxxxxxxC
CDP:C6C5C0C3C4C10C
The motif of Fig. 7:
1)CxxxgxxCxxdgxCCxgxCxxxfxgxxC
2)CxxxxxxCxxxxxCCxxxCxxxxxxxxC
CDP:C6C5C0C3C8C
The motif of Fig. 8:
1)CxdxxCxxyCxgxxyxxgxCdgpxxCxC
2)CxxxxCxxxCxxxxxxxxxCxxxxxCxC
CDP:C4C3C9C5C1C
The motif of Fig. 9:
1)ChfxxCxxdCrrxxPGxyGxCxxxxxGxxCxC
2)CxxxxCxxxCxxxxPGxxGxCxxxxxGxxCxC
3)CxxxxCxxxCxxxxxxxxxxCxxxxxxxxCxC
CDP:C4C3C10C8C1C
The motif of Figure 10:
1)CixxgxxCxG(xx)xxxxCxCCxxxxyCxCxxx(xxx)FG(x)xxxxCxC(x)xxxxxCxxxxxx(x)xxxxxC
2)CxxxxxxCxG(xx)xxxxCxCCxxxxxCxCxxx(xxx)FG(x)xxxxCxC(x)xxxxxCxxxxxx(x)xxxxxC
3)CxxxxxxCxx(xx)xxxxCxCCxxxxxCxCxxx(xxx)xx(x)xxxxCxC(x)xxxxxCxxxxxx(x)xxxxxC
The motif of Figure 11:
1)CxPCfttxxxxxxxCxxCCxxx(x)xgxCxxxqCxC
2)CxPCxxxxxxxxxxCxxCCxxx(x)xxxCxxxxCxC
3)CxxCxxxxxxxxxxCxxCCxxx(x)xxxCxxxxCxC
CDP:C2C10C2C0C6(7)C4C1C
The motif of Figure 12:
CxxxxxxCxxxxxxCCxxxCxxxxC
CDP:C6C6C0C3C4C
The motif of Figure 14:
1)Cxx(x)xCxxxxxxxxxxCxCxxxCxxxxxCCxxxxxxC
2)Cxx(x)RCxExxxxxxxxCxCxxxCxxxxxCCxD[yf]xxxC
CDP:C3-4C10C1C3C5C0C6C
The motif of Figure 15:
1)Cxxxxx(x)x(x)xxxxxCpxgxxxC[yf]xkxxxx(xx)CxxrxxxxxrGCxxtCPxxxx(x)xxxxxCCxtdxCN
2)Cxxxxx(x)x(x)xxxxxCxxxxxxCxxxxxxx(xx)CxxxxxxxxxGCxxxCPxxxx(x)xxxxxCCxxxxCN
3)Cxxxxx(x)x(x)xxxxxCxxxxxxCxxxxxxx(xx)CxxxxxxxxxxCxxxCxxxxx(x)xxxxxCCxxxxC
CDP:C6-8C6C7-9C10C3C10-11C0C4C
The motif of Figure 16:
1)CxxCxxxxxxxxC(xxx)xxxxxxCxxxxxxCxxxxxxxxxxxxxxxxxxxxCxxx(xx)xC(p)xx(x)xxxxxxxxxx(x)xxxxxCCxxxxC
The motif of Figure 20:
1)CgxqxxxxxCxxxxCCsxxGxCGxxxxyCxx(x)xCx(x)xxC
2)CxxxxxxxxCxxxxCCxxxxxCxxxxxxCxx(x)xCx(x)xxC
CDP:C8C4C0C5C6C3-4C3-4C
The motif of Figure 21:
1)Cxxx(x)xxxxxxx(xx)xxxC(x)xxxxxCxxxxxx(x)xxxCxxxxxxxxxxxxCxxxxx(xx)xxC
2)Cxxx(x)xxxxxxx(xx)xxxC(x)xx[yf]xxCxxxxxx(x)xxxCxxxxx[yf]xxxxxxCxxxxx(xx)xxC
CDP:C13-16C5-6C9-10C12C7-9C
The motif of Figure 22:
1)C(xx)xY(gg)xxxxxxCxxxCxx(x)xxxCxxxCxx(x)xgaxxgxCxxxx(x)xxxxxC[wylf]C
2)C(xx)xx(xx)xxxxxxCxxxCxx(x)xxxCxxxCxx(x)xxxxxxxCxxxx(x)xxxxxCxC
CDP:C8-12C3C5-6C3C9-10C9-10C1C
The motif of Figure 23:
1)CxxxxxxxxCxxxCxxxCxxxxx(xxxx)xxxCxxxx(xxxx)xxCxxxxCxCxxxxxxxxxx(x)xCxxxxxC
2)CpxxxxxxxCxxxCxxxCxxxxx(xxxx)xxxCxxxx(xxxx)xxCxxxxCxCxxgxxxxxxx(x)xCvxxxxC
CDP:C8C3C3C8-12C6-10C4C1C
The motif of Figure 24:
1)CxxxCxxxxxxxxCPxxxxx(x)xxxxxCxxCCxxxxxCxxxxxxxxxxC
2)CtxxCdxxxxxxxCPxxxxx(xx)xxxxxCxxCCxxgxGCx[yfl][yfl]xxxxGxx[ivl]C
CDP:C3C8C11-12C2C0C5C10C
The motif of Figure 25:
1)CxxxxSxx[Fwy]xGxCxxxxxCxxxCxxexxx(xx)xGxCxx(xx)xxr[rk]CxCxxxC
2)CxxxxSxxFxGxCxxxxxCxxxCxxxxxx(xx)xGxCxx(xx)xxxxCxCxxxC
3)CxxxxxxxxxxxCxxxxxCxxxCxxxxxx(xx)xxxCxx(xx)xxxxCxCxxxC
CDP:C11C5C3C9-11C6-8C1C3C
The motif of Figure 26:
C(xxx)xxxxxxCCxxx(x)xCxx(xx)xxxC
CDP:C6-9C0C4-5C5-7C
The motif of Figure 27:
1)CxxxCxshxxCxxxCxCxxxx[xc]x[xc]
The motif of Figure 28:
1)CxgrxxrCppxCCxgxxCxrgxxxxC
2)CxxxxxxCxxxCCxxxxCxxxxxxxC
CDP:C6C3C0C4C7C
The motif of Figure 29:
1)CCxxpxxCxxrxCxpxxCC
2)CCxxxxxCxxxxCxxxxCC
CDP:C0C5C4C4C0C
The motif of Figure 30:
1)CCgxypxxxChpCxCxxxrpxyC
2)CCxxxxxxxCxxCxCxxxxxxxC
CDP:C0C7C2C1C7C
The motif of Figure 31:
1)CxxtGxxCxxxxx[cx]Csx(x)Ga[cx]sxxFxxC
2)CxxxxxxCxxxxx[cx]Cxx(x)xx[cx]xxxxxxC
The motif of Figure 32:
1)CxxxxC(x)xxxCxxGxxxDxxgCxx(xx)xCxC
2)CxxxxC(x)xxxCxxxxxxxxxxCxx(xx)xCxC
CDP:C4C3-4C10C2-4C1C
The motif of Figure 33:
1)CxxxxxxCCDPCaxCxCRFFxxxCxCR
2)CxxxxxxCCxxCxxCxCxxxxxxCxC
CDP:C6C0C2C2C1C6C1C
The motif of Figure 34:
1)CxpgxxxkxxCNxCxCxxxx(x)xxxTxxxC
2)CxxxxxxxxxCNxCxCxxxx(x)xxxTxxxC
3)CxxxxxxxxxCxxCxCxxxx(x)xxxxxxxC
CDP:C9C2C1C11-12C
The motif of Figure 35:
1)Cxx(xx)xxxxxCxxxxxxx(x)CxxxxxxxxxxxxCxxxCxxC
2)Cxx(xx)DxxxxCxxxxxxx(x)CxxxxxxxxxxxxCxxxCxxC
3)Cxx(xx)DxxxxCxx[wylfim]xxxx(x)CxxxxxxxxxxxxCxxtCxxC
CDP:C7-9C7-8C12C3C2C
The motif of Figure 37:
1)C(xxxx)CxxxxxCxxx(xxxxxxx)xxxCxCxxxx(xx)xxxxxC
2)C(xxxx)CxxxGxCxxx(xxxxxxx)xxxCxCxxxx(xx)xxGxxC
3)C(xxxx)CxxxGxCxxx(xxxxxxx)xxxCxCxxxx(xx)[ywflh]xGxxC
CDP:C0-4C5C6-13C1C9-11C
The motif of Figure 38:
1)Cxxxx(x)xCxxxxxCxxxxx(xx)xxxCxCxxx(xxx)xxxxxxC
2)Cxxxx(x)xCxxxgxCxxxxx(xx)xxxCxCxxg(xxx)xxxgxxC
CDP:C5-6C5C8-10C1C9-12C
The motif of Figure 39:
1)CxCxxxxxxx(xx)xxCxxx(xxxxxxxx)xxxxxxCxCxxxxxxxxCxxCxxxxxxxxx(xx)xxxxxC
2)CxCxxxxxxx(xx)xxCxxx(xxxxxxxx)xxxxGxCxCxxxxxGxxCxxCxxxxxxxxx(xx)xxxxxC
CDP:C1C9-11C9-17C1C8C2C14-16C
The motif of Figure 40:
1)DxdECxxxxxxCx(xx)xxxxxCxNxxGx[fy]xCx(xxx)xCxxg[yf]x(xxxx)xxxxxxxC
2)DxxECxxxxxxCx(xx)xxxxxCxNxxGxxxCx(xxx)xCxxxxx(xxxx)xxxxxxxC
3)CxxxxxxCx(xx)xxxxxCxxxxxxxxCx(xxx)xCxxxxx(xxxx)xxxxxxxC
CDP:C6C6-8C8C2-5C12-16C
The motif of Figure 41:
1)CsxHGxxxxDGxx(x)xxGxxPxCeCxxCyxGxxCsxxxxxC
2)CxxHGxxxxDGxx(x)xxGxxPxCxCxxCxxGxxCxxxxxxC
3)Cxxxxxxxxxxxx(x)xxxxxxxCxCxxCxxxxxCxxxxxxC
CDP:C19-20C1C2C5C6C
The motif of Figure 42:
1)CxxxxGxCRxkxxxnCxxxxxxxCxnxxqkCC
2)CxxxxGxCRxxxxxxCxxxxxxxCxxxxxxCC
3)CxxxxxxCxxxxxxxCxxxxxxxCxxxxxxCC
CDP:C6C7C7C6C0C
The motif of Figure 43:
1)CxxxxxxCxxxxCxxxxxxxxxCxxxxxxCC
2)CxxxxgxCxxxxCxxxxxxxgxCxxxxxxCC
CDP:C6C4C9C6C0C
The motif of Figure 44:
1)CxxHCxxxgxxggxCxx(xxx)xxxCxC
2)CxxHCxxxxxxxxxCxx(xxx)xxxCxC
3)CxxxCxxxxxxxxCxx(xxx)xxxCxC
CDP:C3C8C5-8C1C
The motif of Figure 45:
1)CxCRxxxCxxxExxxGxCxxxxxx[yfh]x[yfl]CC
2)CxCRxxxCxxxExxxGxCxxxxxxxxxCC
3)CxCxxxCxxxxxxxxxCxxxxxxxxxCC
CDP:C1C3C9C9C0C
The motif of Figure 46:
1)CCxxxxxRxx[yf]nxCrxxGxxxxxCaxxxxCxiisgxxC
2)CCxxxxxRxxxxxCxxxGxxxxxCxxxxxCxxxxxxxC
3)CCxxxxxxxxxxxCxxxxxxxxxCxxxxxCxxxxxxxC
CDP:C0C11C9C5C7C
The motif of Figure 47:
1)CxxaxxxCxxxxCxxxCxx(x)xxxxxCxxx[vi]xx(x)xxC
2)CxxxxxxCxxxxCxxxCxx(x)xxxxxCxxxxxxx(x)xxC
The motif of Figure 48:
1)Cxxxxxxx(x)xxxxxCCCxxxx(x)xxxxxxCxxC
2)Cxxxxxxx(x)xxkxxCCCxxxx(x)xx[wfiv]gxxCexC
CDP:C12-13C0C0C10-11C2C
The motif of Figure 49:
1)Cxxxxxx[yfh]xxxxxWxxxx(xxxx)xxxCx(x)xCxCxx(xxxxxxxx)xxxxCxxxxCxx(xxxxx)xxCxxx(xxx)xxxxxxxgeCCx(xx)xC
2)CxxxxxxxxxxxxWxxxx(xxxx)xxxCx(x)xCxCxx(xxxxxxxx)xxxxCxxxxCxx(xxxxx)xxCxxx(xxx)xxxxxxxxCCx(xx)xC
3)Cxxxxxxxxxxxxxxxxx(xxxx)xxxCx(x)xCxCxx(xxxxxxxx)xxxxCxxxxCxx(xxxxx)xxCxxx(xxx)xxxxxxxxCCx(xx)xC
The motif of Figure 50:
1)CxxxxxxCxxxxxCCxxxxCxxx(xxx)x(xx)x[wylfi]C
2)CxxxxxxCxxxxxCCxxxxCxxx(xxx)x(xx)xxC
CDP:C6C5C0C4C6-11C
The motif of Figure 51:
1)CxexCvxxxCxxxxxxGCxCxxxvC
2)CxxxCxxxxCxxxxxxxCxCxxxxC
CDP:C3C4C7C1C4C
The motif of Figure 52:
1)CxfCCxCCxxxxCgxCC
2)CxxCCxCCxxxxCxxCC
CDP:C2C0C1C4C2C0C
The motif of Figure 53:
1)CxxxxxWCgxxedCCCpmxCxxxWyxqxgxCqxxxxxxxxkxxC
2)CxxxxxWCxxxxxCCCxxxCxxxWxxxxxxCxxxxxxxxxxxxC
3)CxxxxxxCxxxxxCCCxxxCxxxxxxxxxxCxxxxxxxxxxxxC
CDP:C6C5C0C0C3C10C12C
The motif of Figure 54:
1)CxxCxxxCxxxxxxxxCxxx(xx)xCxC
The motif of Figure 55:
1)CxxxxxCxxxCxxxxx(x)xxxxxCxxxxCxC
2)CxxxxxCxxxCxxxxx(x)xxxgkCxxxkCxC
CDP:C5C3C10-11C4C1C
The motif of Figure 56:
1)CPxxxxxCxxdxdCxxxCxCxxxx(x)xC
2)CPxxxxxCxxxxxCxxxCxCxxxx(x)xC
3)CxxxxxxCxxxxxCxxxCxCxxxx(x)xC
CDP:C6C5C3C1C5-6C
The motif of Figure 57:
1)CCxdgxxxxx(x)xxxxCxxrxxxxxxxxxCxxxfxxCC
2)CCxxxxxxxx(x)xxxxCxxxxxxxxxxxxCxxxxxxCC
CDP:C0C12-13C12C6C0C
The motif of Figure 58:
1)CxsxxxPCxnxxxCCxgxCxxxxWxCxxxxxxCskxC
2)CxxxxxPCxxxxxCCxxxCxxxxWxCxxxxxxCxxxC
3)CxxxxxxCxxxxxCCxxxCxxxxxxCxxxxxxCxxxC
CDP:C6C5C0C3C6C6C3C
The motif of Figure 59:
1)CxxWx[wylf]xxCxxxxxdCgxgxrexx(xx)CxxxxxxxxCxxPC
2)CxxWxxxxCxxxxxxCxxxxxxxx(xx)CxxxxxxxxCxxPC
3)CxxxxxxxCxxxxxxCxxxxxxxx(xx)CxxxxxxxxCxxxC
CDP:C7C6C8-10C8C3C
The motif of Figure 60:
1)CxdxxxCxxygxyxxCxxCCxxxgxxxgxCxxxxCxC
2)CxxxxxCxxxxxxxxCxxCCxxxxxxxxxCxxxxCxC
CDP:C5C8C2C0C9C4C1C
The motif of Figure 61:
1)Cxxxxx(x)x(x)xxxxxCpxgxxxC[yf]xkxxxx(xx)CxxxxxxxxxGCxxtCPxxxx(x)xxxxxCCxxdxC
2)Cxxxxx(x)x(x)xxxxxCxxxxxxCxxxxxxxx(xx)CxxxxxxxxxGCxxxCPxxxx(x)xxxxxCCxxxxC
3)Cxxxxx(x)x(x)xxxxxCxxxxxxCxxxxxxx(xx)CxxxxxxxxxxCxxxCxxxxx(x)xxxxxCCxxxxC
CDP:C11-13C6C7-9C10C3C10-11C0C4C
The motif of Figure 62:
1)CPxxx(xx)xxxxxCxxx(xxx)CxxDxxCxxxxkCCxxxCxxxC
2)CPxxx(xx)xxxxxCxxx(xxx)CxxDxxCxxxxCCxxxCxxxC
3)Cxxxx(xx)xxxxxCxxx(xxx)CxxxxxCxxxxxCCxxxCxxxC
CDP:C9-11C3-6C5C5C0C3C3C
The motif of Figure 63:
1)Cxx(x)xyxxCxxgxxxCCxxr(x)xCxCxxxxxNCxC
2)Cxx(x)xxxxCxxxxxxCCxxx(x)xCxCxxxxxNCxC
3)Cxx(x)xxxxCxxxxxxCCxxx(x)xCxCxxxxxxCxC
CDP:C6-7C6C0C4-5C1C6C1C
The motif of Figure 64:
1)CxxxxxxCxdWxxxxCCxgxyCxCxxxpxCxC
2)CxxxxxxCxxWxxxxCCxxxxCxCxxxxxCxC
3)CxxxxxxCxxxxxxxCCxxxxCxCxxxxxCxC
CDP:C6C7C0C4C1C5C1C
The motif of Figure 65:
1)CxxxCrxxydxCxxCxgxWxgxxgxCxxhCxxxxxxCxxxC
2)CxxxCxxxxxxCxxCxxxWxxxxxxCxxxCxxxxxxCxxxC
3)CxxxCxxxxxxCxxCxxxxxxxxxxCxxxCxxxxxxCxxxC
CDP:C3C6C2C10C3C6C3C
The motif of Figure 66:
1)CxPxGxPCPyxxxCCxxxCxxxxxxxgxxxxrC
2)CxxxxxxCxxxxxCCxxxCxxxxxxxxxxxxxC
3)CxPxGxPCPxxxxCCxxxCxxxxxxxxxxxxxC
CDP:C6C5C0C3C13C
The motif of Figure 67:
1)CxxxxxxxxxxxCPxgxxxxxCxCgxxCgsWxxxxxxxCxCxCxxxdWxxxrCC
2)CxxxxxxxxxxxCPxxxxxxxCxCxxxCxxWxxxxxxxCxCxCxxxxWxxxxCC
3)CxxxxxxxxxxxCxxxxxxxxCxCxxxCxxxxxxxxxxCxCxCxxxxxxxxxCC
CDP:C11C8C1C3C10C1C1C9C0C
The motif of Figure 68:
1)Cx(xx)xxxCxxxxx[nd]gxCx[wylf]DGxDC
2)Cx(xx)xxxCxxxxxxxxCxxDGxDC
3)Cx(xx)xxxCxxxxxxxxCxxxxxxC
CDP:C4-6C8C6C
The motif of Figure 69:
1)Cxxxx[yf]xx(xx)xxx(x)xxCxxCxxCxx(xx)gxxxxxxCxxxxxtxC
2)Cxxxxxxx(xx)xxx(x)xxCxxCxxCxx(xx)xxxxxxxCxxxxxxxC
The motif of Figure 70:
1)CxfPFx[yf]xxxxxxxCtxxgxxxxxxWCxttxxxdxDxxxx[fy]C
2)CxxPFxxxxxxxxxCxxxxxxxxxxWCxxxxxxxxDxxxxxC
3)CxxxxxxxxxxxxxCxxxxxxxxxxxCxxxxxxxxxxxxxxC
CDP:C13C11C14C
The motif of Figure 71:
1)Cxx(xx)xxxxyxCCxxx(xx)xxxxxxdxxxxWgxxnxxwC
2)Cxx(xx)xxxxxxCCxxx(xx)xxxxxxxxxxxWxxxxxxxC
3)Cxx(xx)xxxxxxCCxxx(xx)xxxxxxxxxxxxxxxxxxxC
CDP:C8-10C0C22-24C
The motif of Figure 72:
1)CCxxxx(x)CxxxxpxxxCG
2)CCxxxx(x)CxxxxxxxxC
CDP:C0C4-5C8C
The motif of Figure 73:
1)CGGxxxxGxxxCxxgxxC
2)CGGxxxxGxxxCxxxxxC
CDP:C10C5C
The motif of Figure 75:
1)Cx(xxc)xxxCxxxxxxxCxpxx(xxxx)xxxx(c)xxxxxxxGCgCCxxCxxxxgxxCxxxxxx(dx)xxglxCxxg(xx)xxxxxlxC
2)Cx(xxc)xxxCxxxxxxxCxxxx(xxxx)xxxx(c)xxxxxxxGCxCCxxCxxxxxxxCxxxxxx(xx)xxxxxCxxx(xx)xxxxxxxC
3)Cx(xxc)xxxCxxxxxxxCxxxx(xxxx)xxxx(c)xxxxxxxxCxCCxxCxxxxxxxCxxxxxx(xx)xxxxxCxxx(xx)xxxxxxxC
The motif of Figure 76:
1)CxCxxxxdkcCx[yfli]xChxd[ivl][ivl]W
2)CxCxxxxdkeCx[yfli]xC
3)CxCxxxxxxxCxxxC
CDP:C1C7C3C
The motif of Figure 77:
1)CExCxxxxaCtGC
2)CExCxxxxxCxGC
3)CxxCxxxxxCxxC
CDP:C2C5C2C
The motif of Figure 78:
1)CyrxCWregxdeetCkerC
2)CxxxCWxxxxxxxxCxxxC
CDP:C3C9C3C
The motif of Figure 79:
1)DCxxxGxxCxGxxkxCCxpxxxCxxYanxC
2)CxxxGxxCxGxxxxCCxxxxxCxxYxxxC
3)CxxxxxxCxxxxxCCxxxxxCxxxxxxC
CDP:C6C5C0C5C6C
The motif of Figure 80:
1)CPx[ivlf]xxxCxxdxdCxxxCxCxxxxxxCg
2)CPxxxxxCxxxxxCxxxCxCxxxxxxC
3)CxxxxxxCxxxxxCxxxCxCxxxxxxC
CDP:C6C5C3C1C6C
The motif of Figure 81:
1)CdxgeqCaxrkgxrxgkxCdCPrgxxCnxfllkC
2)CxxxxxCxxxxxxxxxxxCxCxxxxxCxxxxxxC
CDP:C5C11C1C5C6C
The motif of Figure 82:
1)CvkkdelCxpyyxdCCxpxxCxxxxWWdhkC
2)CxxxxxxCxxxxxxCCxxxxCxxxxWWxxxC
3)CxxxxxxCxxxxxxCCxxxxCxxxxxxxxxC
CDP:C6C6C0C4C9C
The motif of Figure 83:
1)CxGxCsPFExPPCxssxCrCxPxxlxxGxcxxPxxxxxxxkxxxxHxnlCxsxxxCxkkxsGcFCxxYPNxxixxGWC
2)CxGxCxPFExPPCxxxxCxCxPxxxxxGxcxxPxxxxxxxxxxxxHxxxCxxxxxCxxxxxGxFCxxYPNxxxxxGWC
3)CxxxCxxxxxxxCxxxxCxCxxxxxxxxxcxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxCxxxxxxxxxxGxC
The motif of Figure 85:
1)CCPCxxCxYxxGCPWGqxxxxxgC
2)CCPCxxCxYxxGCPWGxxxxxxxC
3)CCxCxxCxxxxxCxxxxxxxxxxC
CDP:C0C1C2C5C10C
The motif of Figure 86:
1)CxgxxgxRxxxxxxxxxCxDCxNxxRxxxxxxxCrxxCxxxxxFxxC
2)CxxxxxxRxxxxxxxxxCxDCxNxxRxxxxxxxCxxxCxxxxxFxxC
3)CxxxxxxxxxxxxxxxxCxxCxxxxxxxxxxxxCxxxCxxxxxxxxC
CDP:C16C2C12C3C8C
The motif of Figure 87:
1)CxCxxxxPxxrxxxxxGxx(x)xxxxxC(x)xxxxxWxxCxxxxxxxxxCC
2)CxCxxxxPxxxxxxxxGxx(x)xxxxxC(x)xxxxxWxxCxxxxxxxxxCC
3)CxCxxxxxxxxxxxxxxxx(x)xxxxxC(x)xxxxxxxxCxxxxxxxxxCC
CDP:C1C21-22C8-9C9C0C
The motif of Figure 88:
1)CxxnCxqCkxmxgxxfxgxxCaxsCxkxxGkxxPxC
2)CxxxCxxCxxxxxxxxxxxxCxxxCxxxxGxxxPxC
3)CxxxCxxCxxxxxxxxxxxxCxxxCxxxxxxxxxxC
CDP:C3C2C12C3C10C
The motif of Figure 89:
1)CxxxCxxCxxxxxxxxxxxnxxxCxleCxxxxxxxxxWxxC
2)CxxxCxxCxxxxxxxxxxxxxxxCxxxCxxxxxxxxxWxxC
3)CxxxCxxCxxxxxxxxxxxxxxxCxxxCxxxxxxxxxxxxC
CDP:C3C2C15C3C12C
The motif of Figure 90:
1)CdxxxxxsxCqmxxxxCxxaxxCxxxieeCktsxxexC
2)CxxxxxxxxCxxxxxxCxxxxxCxxxxxxCxxxxxxxC
CDP:C8C6C5C6C7C
The motif of Figure 91:
1)CxGxdrPCxxCCPCCPGxxCxxxexxgxxyC
2)CxGxxxPCxxCCPCCPGxxCxxxxxxxxxxC
3)CxxxxxxCxxCCxCCxxxxCxxxxxxxxxxC
CDP:C6C2C0C1C4C10C
The motif of Figure 92:
1)CxxxxxxCCxxxxxxCxxxxxCxxxxxxCxxxC
2)CgxxxxyCCsxxgxyCxwxxvCyxsxxxCxkxC
3)CxxxxxxCCxxxxxxCxxxxxCxxxxxxCxxxC
CDP:C6C0C6C5C6C3C
The motif of Figure 93:
1)CxxxxxCxxCxxxxxx(x)xCxWCxx(x)xxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxx)xxxxxxC
2)CxxxxxCxxCxxxxxx(x)xCxxCxx(x)xxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxx)xxxxxxC
CDP:C5C2C7-8C2C5-6C5-11C10-19C
The motif of Figure 95:
1)CxxxxxxxRxxCgxxxitxxxCxxxgCCfdxxxxxxxwC
2)CxxxxxxxRxxCxxxxxxxxxCxxxxCCxxxxxxxxxxC
3)CxxxxxxxxxxCxxxxxxxxxCxxxxCCxxxxxxxxxxC
CDP:C10C9C4C0C10C
The motif of Figure 96:
1)CsvtCgxGxxxRxrxCxxxx(pxx)xxxxxCxxxxxx(xxx)xxxC(x)xxxxC
2)CxxxCxxGxxxRxxxCxxxx(xxx)xxxxxCxxxxxx(xxx)xxxC(x)xxxxC
3)CxxxCxxxxxxxxxxCxxxx(xxx)xxxxxCxxxxxx(xxx)xxxC(x)xxxxC
CDP:C3C10C9-12C9-12C4-5C
The motif of Figure 97:
1)CxxCxCxx(x)sxppxCxCxDxxxx(x)C
2)CxxCxCxx(x)xxxxxCxCxDxxxx(x)C
3)CxxCxCxx(x)xxxxxCxCxxxxxx(x)C
CDP:C2C1C7-8C1C6-7C
The motif of Figure 99:
1)CxxCGPxxxGxCxGPxiCCGxxxGCxxGxxxxxxCxxexxxxxPCxxxxxxCxxxxGxCxxxGxCCxxxxCxxdxxC
2)CxxCGPxxxGxCxGPxxCCGxxxGCxxGxxxxxxCxxxxxxxxPCxxxxxxCxxxxGxCxxxGxCCxxxxCxxxxxC
3)CxxCxxxxxxxCxxxxxCCxxxxxCxxxxxxxxxCxxxxxxxxxCxxxxxxCxxxxxxCxxxxxCCxxxxCxxxxxC
CDP:C2C7C5C0C5C9C9C6C6C5C0C4C5C
The motif of Figure 101:
1)CDCGxxxxC(xx)xxxCC(x)xxxxCxlxxxxxCx(xx)xgxCCx(x)xCxxxxxxxxCrxxxx(x)xCxxxxxCxGxxxxC
2)CDCGxxxxC(xx)xxxCC(x)xxxxCxxxxxxxCx(xx)xxxCCx(x)xCxxxxxxxxCxxxxx(x)xCxxxxxCxGxxxxC
3)CxCxxxxxC(xx)xxxCC(x)xxxxCxxxxxxxCx(xx)xxxCCx(x)xCxxxxxxxxCxxxxx(x)xCxxxxxCxxxxxxC
CDP:C1C5C3-5C0C4-5C7C4-6C0C1-3C8C6-7C5C6C
The motif of Figure 102:
1)CCxxxxgxxxCCPxxxxxCCxDxxHCCPxgxxCxxxxxxC
2)CCxxxxxxxxCCPxxxxxCCxDxxHCCPxxxxCxxxxxxC
3)CCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxC
CDP:C0C8C0C6C0C5C0C5C6C
The motif of Figure 104:
1)Cap-(tCtxxxxCxxax)n
2)Cap-(xCxxxxxCxxxx)n
The motif of Figure 105:
1)Cxx(x)Cxx(xxxx)xxxxCxxxx(xxxx)xxxRCWxxxxxxCQxxxxxxCxxxCxx(x)xxCxxxxxxxCChxxCxggCx(xx)xPxx(x)xxCxaCxxfxxxgxCxxxCP
2)Cxx(x)Cxx(xxxx)xxxxCxxxx(xxxx)xxxRCWxxxxxxCQxxxxxxCxxxCxx(x)xxCxxxxxxxCCxxxCxgxCx(xx)xPxx(x)xxCxxCxxxxxxxxCxxxCP
3)Cxx(x)Cxx(xxxx)xxxxCxxxx(xxxx)xxxxCxxxxxxxCxxxxxxxCxxxCxx(x)xxCxxxxxxxCCxxxCxxxCx(xx)xxxx(x)xxCxxCxxxxxxxxCxxxC
The motif of Figure 106:
1)xxx[wyfl]xxxxCxCxCx
2)xxxxxxxxCxCxCx
The motif of Figure 110:
1)CxsxxxxxCxxxxxxx(xx)xxxxxCxx(x)xxxxCxxxxxx(x)xxxxrGCxxxxxxxxxxxCx(x)xxxxCxxCxxx(x)xCNxxxxxpxxxxxCxqCxgxxxxx[cx]xxxxxxlxxxxCxxxx(x)xxxxCyxxxxx(xxx)xxxxRGCxxxxxxxxx[cx]xdxxCxxC
2)CxxxxxxxCxxxxxxx(xx)xxxxxCxx(x)xxxxCxxxxxx(x)xxxxxGCxxxxxxxxxxxCx(x)xxxxCxxCxxx(x)xCNxxxxxxxxxxxCxxCxxxxxxx[cx]xxxxxxxxxxxCxxxx(x)xxxxCxxxxxx(xxx)xxxxRGCxxxxxxxxx[cx]xxxxCxxC
3)CxxxxxxxCxxxxxxx(xx)xxxxxCxx(x)xxxxCxxxxxx(x)xxxxxxCxxxxxxxxxxxCx(x)xxxxCxxCxxx(x)xCxxxxxxxxxxxxCxxCxxxxxxx[cx]xxxxxxxxxxxCxxxx(x)xxxxCxxxxxx(xxx)xxxxxxCxxxxxxxxx[cx]xxxxCxxC
The motif of Figure 111:
xxxxxxCxxxxxx(x)Ctxxx(xx)xg(x)xxCxxxxxxCxxyxxxxxCxxxx(xx)xxxxxCxWxxxx(x)xxCxxxx(xxxx)Cx
xxxxxxCxxxxxx(x)Cxxxx(xx)xx(x)xxCxxxxxxCxxxxxxxxCxxxx(xx)xxxxxCxWxxxx(x)xxCxxxx(xxxx)Cx
xxxxxxCxxxxxx(x)Cxxxx(xx)xx(x)xxCxxxxxxCxxxxxxxxCxxxx(xx)xxxxxCxxxxxx(x)xxCxxxx(xxxx)Cx
The motif of Figure 113:
1)nxCtxdxCxxxxgCxxxxxxCxxx
2)CxxxxCxxxxxCxxxxxxCxxx
CDP:C4C5C6C3
The motif of Figure 114:
xxxx[cx]xxCxxx[Cx]xxCxxxCxxxx
The motif of Figure 115:
xxCxxxCxxxCxx(x)xCxx
CDP:2C3C3C3-4C2
The motif of Figure 123:
1)CtxxGxxxC(vilm)CxGxxxCGxGxxCxxxxxGxxnxC
2)CxxxGxxxCxCxGxxxCGxGxxCxxxxxGxxxxC
3)CxxxxxxxCxCxxxxxCxxxxxCxxxxxxxxxxC
CDP:C7C1C5C5C10C
The motif of Figure 162:
1)CxxxxCxxxxxCxxx(x)xxxxxxCx(x)CxxxCxxxxxx(x)xxxCxxdxxtyxxxCxxxxaxCxxxxxxxxxxxgxC
2)CxxxxCxxxxxCxxx(x)xxxxxxCx(x)CxxxCxxxxxx(x)xxxCxxxxxxxxxxCxxxxxxCxxxxxxxxxxxxxC
CDP:C4C5C9-10C1-2C3C9-10C10C6C13C
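The CDP lines in the listings above appear to record the number of residues between consecutive cysteines in each motif, with a range where a motif contains optional residues in parentheses. Under that reading, which is inferred from the listings rather than defined in this passage, a CDP string can be derived from a motif as in the following Python sketch. The function name `cysteine_distance_pattern` is hypothetical. The sketch treats only uppercase `C` as a cysteine, counts a bracketed class such as `[yf]` as one residue, and does not handle optional cysteines inside parentheses (as in the Figure 75 motifs).

```python
import re

# One token per residue position: optional group "(...)", class "[...]", or a letter.
TOKEN = re.compile(r"\([^)]*\)|\[[^\]]*\]|[A-Za-z]")
RESIDUE = re.compile(r"\[[^\]]*\]|[A-Za-z]")


def cysteine_distance_pattern(motif: str) -> str:
    """Return the spacing between consecutive cysteines, e.g. 'C6C5C0C3C10C'."""
    lo = hi = 0          # min/max residues seen since the last cysteine
    seen_cys = False
    gaps = []
    for tok in TOKEN.findall(motif):
        if tok == "C":
            if seen_cys:
                gaps.append(str(lo) if lo == hi else f"{lo}-{hi}")
            seen_cys = True
            lo = hi = 0
        elif tok.startswith("("):
            # Optional residues widen only the upper bound of the gap.
            hi += len(RESIDUE.findall(tok[1:-1]))
        else:
            lo += 1
            hi += 1
    return "C" + "C".join(gaps) + "C" if gaps else ""


if __name__ == "__main__":
    print(cysteine_distance_pattern("CxxxxxxCxxxxxCCxxxCxxxxxxxxxxC"))    # Figure 1, motif 3
    print(cysteine_distance_pattern("C(xxx)xxxxxxCCxxx(x)xCxx(xx)xxxC"))  # Figure 26
```

Residues before the first cysteine and after the last one are ignored, so listings that record leading or trailing spacers would need a small extension.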
Figure 13 shows a plot of amino acid abundance in proteins.
Figures 17-18, 74, 84, 94, 98, and 100 show the primary and secondary structures of exemplary sequences.
Figures 19 and 36 show sequence alignments between various invertebrate and plant proteins.
Figure 103 shows the sequence and tertiary structure of granulin.
Figure 107 shows CXC motif repeats.
Figure 108 shows the sequences of the VEGF C-terminal domain and Balbiani ring secretory proteins.
Figure 109 shows the deduced structure of the cysteine-containing repeats.
Figures 112 and 116 show exemplary sequences of cysteine-containing repeat proteins.
Figure 117 shows the structure of an exemplary antifreeze protein.
Figure 118 shows the structure of erabutoxin.
Figure 119 shows the structure of plexin.
Figure 120 shows the sequence of plexin.
Figure 121 shows the structure of somatomedin.
Figure 122 shows an SDS-PAGE gel separating expressed microproteins by molecular weight.
Figure 124 shows an affinity maturation scheme for cysteine-rich repeat proteins.
Figure 125 shows the structure of granulin repeat proteins.
Figure 126 shows a randomization scheme.
Figure 127 shows the structure and sequence of antifreeze protein-derived repeat proteins.
Figure 128 shows the design of a helical repeat protein scaffold.
Figure 129 shows an affinity maturation scheme for repeat proteins.
Figures 130-132 show the nomenclature of cysteine-containing repeat proteins.
Figure 133 shows A-domain-derived repeat proteins.
Figure 134 shows a multi-trefoil scaffold.
Figure 135 shows a multi-plexin scaffold.
Figure 136 shows a mini-collagen scaffold.
Figure 137-142,160 shows various affinity maturation schemes.
Figure 143 shows plasmid cyclisation and big primer (megaprimers).
Figure 144 is a hydrophobicity profile.
Figure 145 shows that expansion contains the whole bag of tricks in cysteine minor structure territory.
Figure 146-147 shows that various use antifreeze proteins connect the method for different structure.
Figure 148 shows the strategy that designs the library.
Figure 149 shows A-domain structure.
Figure 150 is the folding sketch map of the inductive microprotein of target.
Figure 151 shows the structure organization and the sequence of folliculus chalone (follistatin) domain.
Figure 152-153 shows the structure diversity that contains cysteine protein matter.
Figure 154-155 shows by disulfide bond the shuttle back and forth structure evolution and the natural evolution that contains cysteine protein matter of reorganization (shuffling).
Figure 156 shows 508 kinds of proteinic families that contain disulfide bond.
Figure 157 shows the sequence relation between the different integrins.
Figure 158 shows the comparison of various product forms.
Figure 159 shows various microprotein product forms.
Figure 161 shows the immunogenic mechanism of reduction.
Figure 162 shows the gel of demonstration by the various supports of escherichia coli expression.
Figure 163 shows that the bonded combination of HLA-reduces.
Figure 164 shows the sequence and the structure of various TNFR family microprotein
Figure 165 shows that 2-3-4 assembles (build-up) method.
Figure 166 shows the predicted MHCII binding affinity of human proteins and microproteins. The figure shows the distribution of scores calculated for each protein against five major HLA alleles. Red curve: 26,000 full-length human proteins with a median length of 372 amino acids. Blue curve: 10,525 microproteins of 25-90 amino acids (median 38 amino acids) that contain at least 10% cysteine and an even number of cysteines, taken from the disulfide pattern database (22). Green curve: 26,000 human protein fragments matched to the size distribution of the microprotein database. For each human protein sequence, we generated a random fragment whose length matched that of a protein selected at random from our microprotein database. MHCII binding was analyzed for five HLA alleles that occur at high frequency in the Caucasian population (HLA*101, HLA*301, HLA*401, HLA*701, HLA*1501). TEPITOPE-based MHCII binding matrices were used; the matrices were downloaded from the program ProPred. The TEPITOPE matrices contain no scores for cysteine residues, so the alanine scores were used instead. For each protein and each HLA allele, we identified the highest TEPITOPE score. The data for each allele were normalized by subtracting the mean of the top scores of all human proteins.
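The selection criteria stated for the blue curve (25-90 amino acids, at least 10% cysteine, an even number of cysteines) can be expressed as a simple sequence filter. The following Python sketch is illustrative only: the function names and the example sequences are hypothetical and are not taken from the disulfide pattern database or from the analysis described above.

```python
def cysteine_fraction(seq: str) -> float:
    """Return the fraction of residues in a one-letter sequence that are cysteine."""
    seq = seq.upper()
    return seq.count("C") / len(seq) if seq else 0.0


def passes_microprotein_filter(seq: str,
                               min_len: int = 25,
                               max_len: int = 90,
                               min_cys_fraction: float = 0.10) -> bool:
    """Apply the three criteria quoted for the microprotein set:
    length within [min_len, max_len], cysteine content of at least
    min_cys_fraction, and an even number of cysteines."""
    seq = seq.upper()
    n_cys = seq.count("C")
    return (min_len <= len(seq) <= max_len
            and cysteine_fraction(seq) >= min_cys_fraction
            and n_cys % 2 == 0)


# Hypothetical test sequences (not real proteins)
even_cys = "CAAAAA" * 6   # 36 residues, 6 cysteines
odd_cys = "CAAAAA" * 5    # 30 residues, 5 cysteines
print(passes_microprotein_filter(even_cys))
print(passes_microprotein_filter(odd_cys))
```

The thresholds are exposed as parameters so that the same check can be reused with other cut-offs, such as the 5-50% cysteine range given for HDD proteins.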
The upper panel of Figure 167 shows the contribution of each amino acid to MHCII binding affinity. The P1 scores of all non-hydrophobic residues in the TEPITOPE matrices were changed from -999 to -2 to keep the P1 scores from dominating the averages. Amino acids are ranked according to their average score for each epitope. The figure shows the average rank across the five most common HLA alleles (*101, *301, *401, *701, *1501). The lower panel shows the relative abundance of amino acids in microproteins versus human proteins. The amino acid abundances of human proteins and microproteins were calculated from the sequences given for Figure 166. The data show that the aliphatic hydrophobic residues I, V, M and L make the strongest contribution to immunogenicity and are underrepresented in microproteins compared with average human proteins. The immunogenicity of a protein can therefore be reduced by lowering the content of the high-scoring amino acids, which rank from highest to lowest as follows: IVMLFYSNRAHQTGWKPED.
Figure 168 shows ELISA results for VEGF microproteins expressed by phage clones, as a demonstration of the 2-3-4 build-up method.
Figure 169 shows an SDS-PAGE gel of microproteins under reducing conditions. Lane 1: somatomedin; lane 2: plexin; lane 3: toxin B; lane 4: potato protease inhibitor; lane 5: spider venom; lane 6: alkaline phosphatase control; lane 9: molecular weight markers.
Figure 170 shows a comparison of a redox-treated library with an untreated library.
1. Incorporation by reference
All publications and patent applications mentioned in this specification are incorporated herein by reference for all purposes, to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
2. Detailed description of the invention
While preferred embodiments of the present invention have been shown and described herein, it will be apparent to those skilled in the art that these embodiments are provided by way of example only. Numerous variations, changes and substitutions will occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.
General techniques
Unless otherwise indicated, the practice of the present invention employs conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA that are well known in the art. See Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel et al., eds. (1987)); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R. I. Freshney, ed. (1987)).
Definitions
The term "protein" refers to a polymer of amino acids of any length. The polymer may be linear or branched, may comprise modified amino acids, and may be interrupted by non-amino acids. The term also encompasses an amino acid polymer that has been modified, for example by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation or any other manipulation, such as conjugation with a labeling component. As used herein, the term "amino acid" refers to natural and/or unnatural or synthetic amino acids, including glycine and both the D and L optical isomers, as well as amino acid analogs and peptidomimetics. A protein may comprise one or more domains.
The term "domain" refers to a single stable three-dimensional structure, regardless of size. The tertiary structure of a typical domain is stable in solution, whether the domain is isolated or remains covalently fused to other domains. A domain as defined herein has a specific tertiary structure formed by the spatial relationships of secondary structure elements such as beta-sheets, alpha-helices and unstructured loops. In domains of the microprotein family, disulfide bonds are generally the primary elements that determine tertiary structure. In some cases a domain is a module that confers a specific functional activity, such as avidity (multiple binding sites for the same target), multispecificity (binding sites for different targets), or half-life (using a domain, cyclic peptide or linear peptide) through binding to a serum protein such as human serum albumin (HSA) or IgG (hIgG1, 2, 3 or 4), or to red blood cells.
A "loop" is the sequence between cysteines that contributes to the affinity and specificity of the interaction with the target. The amino acid composition of the loops also affects the solubility of the protein, which is important for high-concentration formulations, such as those used for oral, enteral, transdermal, intranasal, pulmonary, blood-brain barrier, home-injection and other routes and formats of administration.
The term "microprotein" refers to a classification in the SCOP database. Microproteins are usually the smallest proteins with a fixed structure and typically, but not necessarily, have as few as 15 amino acids with two disulfide bonds or as many as 200 amino acids with more than 10 disulfide bonds. A microprotein may contain one or more microprotein domains. Some microprotein domains or domain families can have multiple more-or-less stable and more-or-less similar structures produced by different disulfide bonding patterns, so the term "stable" is used in a relative sense to distinguish microproteins from peptides and from non-microprotein domains. Most microprotein toxins consist of a single domain, but cell-surface receptor microproteins often have multiple domains. Microproteins can be so small because their folds are stabilized by disulfide bonds and/or by ions such as calcium, magnesium, manganese, copper, zinc, iron or a variety of other multivalent ions, rather than by the typical hydrophobic core.
The term "scaffold" refers to the minimal polypeptide "framework" or "motif" that is used as the conserved, common sequence in the construction of protein libraries. Between the fixed or conserved residues/positions of the scaffold lie variable and hypervariable positions. In the variable regions between the fixed scaffold residues, a large diversity of amino acids is provided to afford specific binding to a target molecule. A scaffold is typically defined by the conserved residues observed in an alignment of a family of sequence-related proteins. Fixed residues may be required for folding or structure, especially when the functions of the aligned proteins differ. A full description of a microprotein scaffold may include the number, position or spacing and bonding pattern of the cysteines, as well as the position and identity of any fixed residues in the loops, including binding sites for ions such as calcium.
The "fold" of a microprotein is largely defined by the linkage pattern of its disulfide bonds (e.g., 1-4, 2-6, 3-5). This pattern is a topological constant and generally cannot be converted into another pattern without breaking and re-forming disulfide bonds, for example by reduction and oxidation (redox agents). In general, natural proteins with related sequences adopt the same disulfide bonding pattern. The major determinants are the cysteine distance pattern (CDP), certain fixed non-cysteine residues, and a metal-binding site, if present. In a few cases the folding of the protein is also influenced by surrounding sequences (e.g., pro-peptides) and, in some cases, by chemical derivatization (e.g., gamma-carboxylation) of residues that allows the protein to bind divalent metal ions (e.g., Ca++), which assist its folding. For the vast majority of microproteins no such folding assistance is required.
However, proteins with the same bonding pattern may still comprise multiple folds, based on differences in the length and composition of the loops that are large enough to give the proteins very different structures. Examples include the conotoxin, cyclotoxin and anato domain families, which have the same DBP but very different CDPs and are therefore considered to be different folds. The determinants of a protein fold are any attributes that greatly alter the structure relative to a different fold, such as the number and bonding pattern of the cysteines, the spacing of the cysteines, differences in the sequence motifs of the inter-cysteine loops (especially fixed loop residues that may be needed for folding), or the location or composition of a calcium (or other metal or cofactor) binding site.
The term "disulfide bonding pattern" or "DBP" refers to the linkage pattern of the cysteines, which are numbered 1-n from the N-terminus to the C-terminus of the protein. Disulfide bonding patterns are topologically constant, meaning that they can only be changed by unlinking one or more disulfide bonds, for example using redox conditions. The possible 2-, 3- and 4-disulfide bonding patterns are listed below in paragraphs 0048-0075.
The term "cysteine distance pattern" or "CDP" refers to the number of non-cysteine amino acids that separate the cysteines on the linear protein chain. Several notations are used: C5C0C3C equals C5CC3C equals CxxxxxCCxxxC.
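As an illustration of the equivalence of these notations, the following Python sketch expands a numeric CDP into the corresponding x-motif. The function name `expand_cdp` is an assumption made for this example, and the sketch handles only fixed loop lengths, not ranges such as C9-10C.

```python
import re


def expand_cdp(cdp: str, filler: str = "x") -> str:
    """Expand a CDP such as 'C5C0C3C' into a motif such as 'CxxxxxCCxxxC'.

    Each 'C' is kept as a cysteine and each integer is replaced by that
    many filler characters. Loop-length ranges (e.g. '3-4') are rejected.
    """
    if not re.fullmatch(r"(?:C|\d+)+", cdp):
        raise ValueError(f"unsupported CDP notation: {cdp!r}")
    parts = []
    for token in re.findall(r"C|\d+", cdp):
        parts.append("C" if token == "C" else filler * int(token))
    return "".join(parts)


print(expand_cdp("C5C0C3C"))
print(expand_cdp("C5CC3C"))
```

Both calls print the same motif, because an explicit loop length of 0 and two adjacent cysteines describe the same spacing.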
The terms "position n6" and "n7=4" refer to the loops between cysteines: "n6" is defined as the loop between C6 and C7, and "n7=4" means that the loop between C7 and C8 is 4 amino acids long, not counting the cysteines.
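The loop numbering can be illustrated with a short Python sketch that reads the loop lengths n1, n2, ... from a sequence. The function name `loop_lengths` and the example sequences are hypothetical; the sketch assumes cysteines are written as uppercase "C".

```python
def loop_lengths(seq: str) -> dict[str, int]:
    """Map loop names n1, n2, ... to the number of residues between
    consecutive cysteines, where nk is the loop between Ck and C(k+1)."""
    positions = [i for i, aa in enumerate(seq) if aa == "C"]
    return {
        f"n{k}": positions[k] - positions[k - 1] - 1
        for k in range(1, len(positions))
    }


print(loop_lengths("CxxxxxCCxxxC"))

# Hypothetical 8-cysteine motif in which the loop between C7 and C8 has 4 residues
motif = "CxxC" * 3 + "C" + "xxxx" + "C"
print(loop_lengths(motif)["n7"])
```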
The term "reductive unfolding" refers to the unfolding of a folded protein in the presence of a reducing agent (e.g., dithiothreitol). "Oxidative refolding" refers to the pathway by which a fully reduced and unfolded protein refolds in the presence of an oxidizing agent.
The term "complex" refers to a cysteine bonding pattern in which the disulfide bonds link cysteines that are, on average, separated by many positions along the linear alpha-chain backbone. "Complexity" is quantified as the total (cumulative) linear backbone distance spanned by the disulfide bonds. For example, for 3-disulfide topologies the maximum is 9 (1-4 2-5 3-6 = 3+3+3) and the minimum is 3 (i.e., 1-2 3-4 5-6). Complex patterns may have more distinct folds because of their length diversity, but they occur less frequently than patterns of lower complexity. For example, the largest number of natural sequence families and the most rigid structures are observed for the patterns 1-4 2-5 3-6, 1-6 2-4 3-5, 1-5 2-4 3-6 and 1-4 2-6 3-5. All of these are maximally complex patterns (on the 3-9 scale for 3SS proteins, each has a complexity score of 9), suggesting that more complex topologies can yield more different cysteine spacings, i.e., more folds. Eliminating or reducing the frequency of simple disulfide bonding patterns (such as 1-2 3-4 5-6) is therefore expected to increase the average number of folds formed per disulfide bonding pattern (i.e., very different cysteine spacings, as in conotoxins versus cyclotides versus anato domains). A simple way to eliminate most simple bonding patterns is to use loop lengths of less than about 9 amino acids, because in natural proteins the minimum distance (called the "span") between disulfide-linked cysteine residues is typically about 9 amino acids. The complexity range is 2-4 for 2SS proteins, 4-16 for 4SS proteins and 5-25 for 5SS proteins.
The "span" of a disulfide bond refers to the amino acid distance between the linked cysteines, not including the cysteines themselves. The average span is 10-14 amino acids, preferably about 12, as shown in Table 1 below. Structural diversity can be promoted by using cysteine spacings that maximize the diversity of 11-14 amino acid spans, which involves eliminating proximal disulfide bonds (formed between adjacent cysteines) and providing a large number of combinations of cysteine residues with spans of about 12 amino acids (as well as 18, 24, etc.). Examples include CX6CX6CX6CX6CX6C ('3X6'), CX6CX6CX6CX6CX6CX6CX6C ('4X6'), CX5CX5CX5CX5CX5C ('3X5'), CX5CX5CX5CX5CX5CX5CX5C ('4X5'), or similar motifs with combinations of loops of 5-6, 4-7 or 3-8 amino acids. CX6C and CX5C are generally too short to allow the two adjacent cysteines to bond (the minimum span is typically about 9 amino acids), which prevents the formation of cyclic peptide structures; these are sometimes called "subdomains" or "microdomains" and are generally not considered complete domains. Some exemplary disulfide bond spans are shown in the table below.
Table 1. Disulfide bond spans
Figure A20068003404900361
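The complexity score and the span defined above can be computed directly from a DBP and a list of loop lengths. The following Python sketch is a minimal illustration; the function names are hypothetical. Because the definition does not state whether cysteines lying between the two linked cysteines count toward the span, the sketch counts only loop residues by default (which gives spans of 12, 18, ... for the 3X6 motif) and exposes the alternative as an option.

```python
def parse_dbp(text):
    """Parse a DBP such as '1-4 2-5 3-6' into [(1, 4), (2, 5), (3, 6)]."""
    pairs = []
    for item in text.split():
        a, b = (int(p) for p in item.split("-"))
        pairs.append((min(a, b), max(a, b)))
    return pairs


def complexity(dbp):
    """Sum, over all disulfide bonds, of the distance in cysteine positions."""
    return sum(j - i for i, j in dbp)


def bond_spans(loops, dbp, count_intervening_cys=False):
    """Return the span of each bond in residues.

    loops[k - 1] is the length of the loop between cysteine k and k + 1.
    By default only loop residues are counted (an assumption); set
    count_intervening_cys=True to also count cysteines between the pair.
    """
    spans = {}
    for i, j in dbp:
        span = sum(loops[i - 1:j - 1])
        if count_intervening_cys:
            span += j - i - 1
        spans[(i, j)] = span
    return spans


print(complexity(parse_dbp("1-4 2-5 3-6")))   # most complex 3SS pattern
print(complexity(parse_dbp("1-2 3-4 5-6")))   # simplest 3SS pattern

# 3X6 motif: six cysteines separated by five loops of 6 residues
print(bond_spans([6] * 5, parse_dbp("1-3 2-5 4-6")))
```

In the 3X6 example any bond between adjacent cysteines would have a span of 6, below the approximate minimum of 9 noted above, which is why such motifs disfavor the simplest bonding patterns.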
The term "cysteine-rich repeat protein (CRRP)" refers to a protein that typically, but not necessarily, has a single polypeptide chain, comprises "repeat units" (also called "modules", "repeats" or "building blocks") of a specific conserved amino acid sequence (the "repeat pattern" or "repeat motif"), and has a cysteine content above about 1%, preferably above about 5%, or even 10%. This family is unrelated in sequence to the leucine-rich repeat proteins, which include the ankyrin family. The CRRP units interact with one another to produce a single large domain that folds independently of other domains. The size of a CRRP can be adjusted by adding or deleting repeat units. Preferred repeat proteins include, but are not limited to, head-to-tail tandem repeats of the same motif, which differ from duplicated sequences that are generally separated by unrelated sequences.
As used herein, the term "pharmaceutically acceptable carrier" encompasses any of the standard pharmaceutical carriers, such as phosphate-buffered saline, water, and emulsions such as oil/water or water/oil emulsions, as well as various types of wetting agents. The compositions may also include stabilizers and preservatives. For examples of carriers, stabilizers and adjuvants, see Martin, REMINGTON'S PHARM. SCI., 15th Ed. (Mack Publ. Co., Easton (1975)).
A "pharmaceutical composition" comprises the combination of an active agent with an inert or active carrier, making the composition suitable for diagnostic or therapeutic use in vitro, in vivo or ex vivo.
The term "non-naturally occurring", as applied to a nucleic acid or a protein, refers to a nucleic acid or protein that is not found in nature. Examples of non-naturally occurring nucleic acids and proteins include, but are not limited to, those that have been recombinantly modified.
Design of cysteine-containing proteins and protein libraries
As described in detail below, one aspect of the present invention is the generation of protein libraries with a large amount of structural diversity, from which one can select and evolve binding proteins with desired properties for a variety of uses, including but not limited to therapeutic, prophylactic, veterinary, diagnostic, reagent or material applications.
In one embodiment, the present invention provides a library of cysteine-containing proteins that has at least 2, 3, 4, 5, 10, 30, 100, 300, 1000, 3000, 10000 or more topologically distinct structures. In certain embodiments, the library of cysteine-containing proteins comprises high disulfide density (HDD) proteins. Proteins of the HDD family typically have 5-50% (5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45 or 50%) cysteine residues, and each domain typically contains at least two disulfide bonds and, optionally, a cofactor such as calcium or another ion.
The presence of the HDD scaffold allows these proteins to be small while still adopting a relatively rigid structure. Rigidity is important for obtaining high binding affinity and for resistance to proteases (including the proteases involved in antigen processing; for the classes of proteases, see below) and to heat, and thus contributes to the low immunogenicity or non-immunogenicity of these proteins. The disulfide framework folds the protein without requiring the large number of hydrophobic side-chain interactions found in the interior of most proteins, known as the hydrophobic core. All non-HDD scaffolds have a hydrophobic core, which is a common source of specificity or folding problems. HDD proteins tend to be more hydrophilic than non-HDD proteins, which improves binding specificity. Small size is also advantageous for rapid tissue penetration and for alternative delivery routes such as oral, nasal, enteral, pulmonary and blood-brain barrier delivery. In addition, small size helps reduce immunogenicity. A higher disulfide density can be obtained either by increasing the number of disulfide bonds or by using domains with the same number of disulfide bonds but fewer amino acids. It is also desirable to reduce the number of fixed non-cysteine residues so that a higher percentage of amino acids is available for target binding.
The disulfide framework permits extreme sequence diversity in the inter-cysteine loops within each family. Between families there are large differences in loop length and cysteine spacing. Because of the combinatorial nature of disulfide bond formation, the disulfide framework allows a large number of different bonding patterns and different structures to form, and because folding can be heterogeneous, there is a gradual evolutionary path for optimizing structure and sequence by directed evolution. HDD proteins in particular are predicted to have the unique ability to let a single sequence adopt multiple different stable folds.
To generate a variety of disulfide bonding patterns, the library can be subjected to a variety of different conditions that favor different isomers with different disulfide bonding patterns (DBPs). For example, the redox potential of the solvent, which depends on the relative concentrations and strengths of the reducing and oxidizing agents, can be used to effect the formation of different DBPs. To create a reducing solvent, a variety of reducing agents can be employed, including but not limited to 2-mercaptoethanol (beta-mercaptoethanol, BME), 2-mercaptoethylamine-HCl, TCEP (tris(2-carboxyethyl)phosphine), sodium borohydride, dithiothreitol (DTT, reduced form), the reduced form of glutathione (GSH) and the reduced form of cysteine. To create an oxidizing solvent, a variety of oxidizing agents can be used, including but not limited to dithiothreitol (DTT, oxidized form), hydrogen peroxide, glutathione (oxidized form, GSSG), copper phenanthroline, oxygen (air), trace metals and the oxidized form of cysteine (cystine).
Particularly useful are mixtures and gradients of redox agents that cause the proteins to form and break disulfide bonds repeatedly, at a rate fast enough to allow multiple disulfide bonding patterns to be sampled and to allow the stable forms to accumulate over time. If maximum DBP diversity rather than stability is desired, the mixture can be prevented from reaching equilibrium: conditions that favor high structural diversity (fully reducing, high temperature) are changed abruptly to highly oxidizing, low-temperature conditions, so that structures form without sufficient time to find the most stable DBP. An alternative way to generate structural diversity is to form the disulfide bonds slowly under diversified conditions, such as different chemicals (e.g., volume excluders such as polyethylene glycol, which accelerate the slow or difficult formation of disulfide bonds between distant cysteines), different solvents (polar, non-polar, alcohols), different metal ions (Ca, Zn, Cu, Fe, Mg, etc.) or different pH values (pH 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). Such diversified conditions, alone or in any combination, can be used to make the same protein sequence adopt multiple alternative folds.
The formation of disulfide bonds and/or the presence of cofactors can be conveniently controlled by supplying reducing or oxidizing agents or by adding the cofactor.
The ability of a protein to fold into multiple alternative stable structures generally depends on the number and strength of the bonding interactions within the protein and on the available folding pathways. In the absence of disulfide bonds, a large number of weak side-chain contacts (salt bridges, van der Waals contacts, hydrophobic interactions, etc.) is generally needed to obtain a stably folded protein. Consequently, many residues must be modified in order to direct the formation of a different alternative stable fold or to bind a target. In contrast, a small number of disulfide bonds (e.g., two or three) is sufficient to give a protein a stable structure, leaving all other amino acid positions (typically about 65-80%, and more than 80% in the case of the conotoxins, the most extreme example) available for creating a binding surface for the desired target. Disulfide bonds are therefore a low-information-content way of building structure (i.e., one that occurs at high frequency in random sequences), leaving the highest proportion of amino acids available for binding and various other functions.
The folding pathway and stability of larger proteins that lack disulfide bonds require interactions among a large number of amino acid side chains, so that a high proportion of residues is more or less fixed and the ability of the protein to accommodate sequence changes is greatly reduced. This is typically the case for larger scaffold proteins such as immunoglobulins, fibronectin and lipocalins, in which usually only the small CDR-like loops can be randomized without causing misfolding, which for proteins containing a hydrophobic core usually means irreversible protein aggregation. A single disulfide bond introduced by a pair of mutations can take over the structural function of a large number of amino acid residues, freeing their sequences to evolve for a different purpose, such as binding to a desired protein target. Even in non-HDD proteins, the gradual addition of disulfide bonds may have played a critical role in the continued evolution of proteins toward increasing complexity. Cysteine (C) appears to have been added late to the biological repertoire of 20 amino acids, and the frequency of cysteine shows a gradual increase during protein evolution.
In addition, disulfide-mediated folding makes proteins more hydrophilic (since the disulfide bonds replace the hydrophobic core), and misfolding of such proteins generally does not lead to irreversible aggregation; instead, the proteins remain soluble and eventually renature.
A unique feature of disulfide bonds is that the same set of cysteines can in principle be linked in multiple alternative disulfide bonding patterns, because disulfide bond formation is combinatorial. For example, a protein with two disulfide bonds can have three different disulfide bonding patterns (DBPs), a protein with three disulfide bonds can have 15 different DBPs, and a protein with four disulfide bonds can have up to 105 different DBPs. Natural examples exist for all 2SS DBPs, most 3SS DBPs and fewer than half of the 4SS DBPs. In one aspect, the total number of disulfide bonding patterns can be calculated according to the following formula:
$\prod_{i=1}^{n} (2i-1)$,
where n is the predicted number of disulfide bonds formed by the cysteine residues, and $\prod$ denotes the product of (2i-1), where i is a positive integer from 1 to n.
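The product can be understood as follows: the first cysteine can pair with any of the 2n-1 other cysteines, after which n-1 disulfide bonds remain to be formed among the remaining cysteines. The following Python sketch evaluates the product; the function name `count_dbps` is chosen for this illustration only.

```python
def count_dbps(n_disulfides: int) -> int:
    """Number of distinct disulfide bonding patterns for n disulfide bonds,
    computed as the product of (2i - 1) for i = 1..n."""
    count = 1
    for i in range(1, n_disulfides + 1):
        count *= 2 * i - 1
    return count


for n in range(2, 8):
    print(n, count_dbps(n))
```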
Accordingly, in one embodiment, the present invention provides a non-naturally occurring cysteine (C)-containing scaffold that exhibits binding specificity toward a target molecule, wherein the cysteines in the scaffold are paired according to a pattern selected from the permutations represented by the general formula $\prod_{i=1}^{n} (2i-1)$, where n equals the predicted number of disulfide bonds formed by the cysteine residues and $\prod$ denotes the product of (2i-1), where i is a positive integer from 1 to n. In one aspect, the non-naturally occurring cysteine (C)-containing protein comprises a polypeptide having two disulfide bonds formed by pairing cysteines contained in the polypeptide according to a pattern selected from C_{1-2,3-4}, C_{1-3,2-4} and C_{1-4,2-3}, wherein the two numerals connected by a hyphen indicate which two cysteines, counted from the N-terminus of the polypeptide, pair to form a disulfide bond. In another aspect, the non-naturally occurring cysteine (C)-containing scaffold comprises a polypeptide having three disulfide bonds formed by pairing cysteines in the scaffold according to a pattern selected from C_{1-2,3-4,5-6}, C_{1-2,3-5,4-6}, C_{1-2,3-6,4-5}, C_{1-3,2-4,5-6}, C_{1-3,2-5,4-6}, C_{1-3,2-6,4-5}, C_{1-4,2-3,5-6}, C_{1-4,2-6,3-5}, C_{1-5,2-3,4-6}, C_{1-5,2-4,3-6}, C_{1-5,2-6,3-4}, C_{1-6,2-3,4-5} and C_{1-6,2-5,3-4}, wherein the two numerals connected by a hyphen indicate which two cysteines, counted from the N-terminus of the polypeptide, pair to form a disulfide bond. In another aspect, the non-naturally occurring cysteine (C)-containing protein exhibits binding specificity toward a target molecule and comprises a polypeptide having at least four disulfide bonds formed by pairing cysteines contained in the polypeptide according to a pattern selected from the permutations represented by the formula $\prod_{i=1}^{n} (2i-1)$. In another aspect, the non-naturally occurring cysteine (C)-containing protein comprises a polypeptide having at least five disulfide bonds formed by pairing cysteines in the protein according to a pattern selected from C_{1-9}, C_{1-10}, C_{2-9}, C_{2-10}, C_{3-9}, C_{3-10}, C_{4-9}, C_{4-10}, C_{5-9}, C_{5-10}, C_{6-9}, C_{6-10}, C_{7-9}, C_{7-10}, C_{8-9}, C_{8-10} and C_{9-10}, wherein the two numerals connected by a hyphen indicate which two cysteines, counted from the N-terminus of the polypeptide, pair to form a disulfide bond. In another aspect, the non-naturally occurring cysteine (C)-containing protein exhibits binding specificity toward a target molecule and comprises a polypeptide having at least six disulfide bonds formed by pairing cysteines in the protein according to a pattern selected from C_{1-11}, C_{1-12}, C_{2-11}, C_{2-12}, C_{3-11}, C_{3-12}, C_{4-11}, C_{4-12}, C_{5-11}, C_{5-12}, C_{6-11}, C_{6-12}, C_{7-11}, C_{7-12}, C_{8-11}, C_{8-12}, C_{9-11}, C_{9-12}, C_{10-11}, C_{10-12} and C_{11-12}, wherein the two numerals connected by a hyphen indicate which two cysteines, counted from the N-terminus of the polypeptide, pair to form a disulfide bond.
In general, all cysteines participate in disulfide bonds with other cysteines in the same domain. A microprotein with two disulfide bonds (2SS) can adopt three topologically different (i.e., not interconvertible by simple rotation) disulfide bonding patterns: 1-2 3-4, 1-3 2-4 or 1-4 2-3, each of which has a different alpha-chain backbone structure.
Similarly, a microprotein with three disulfide bonds has up to 15 different disulfide bonding patterns, a microprotein with 4 disulfide bonds up to 105, a microprotein with 5 disulfide bonds up to 945, a microprotein with 6 disulfide bonds up to 10,395, and a protein with 7 disulfide bonds up to 135,135 different bonding patterns, and so on for higher numbers of disulfide bonds (the successive multipliers are 3, 5, 7, 9, 11, 13). The disulfide bonding patterns (DBPs) of proteins with two, three or four disulfide bonds are listed below.
The 3 DBPs of 2SS proteins are:
1-2 3-4, 1-3 2-4, 1-4 2-3.
The 15 DBPs of 3SS proteins are:
1-6 2-5 3-4, 1-4 2-5 3-6, 1-6 2-4 3-5, 1-5 2-6 3-4, 1-5 2-4 3-6, 1-4 2-6 3-5,
1-2 3-4 5-6, 1-2 3-5 4-6, 1-2 3-6 4-5, 1-6 2-3 4-5, 1-4 2-3 5-6, 1-5 2-3 4-6,
1-3 2-6 4-5, 1-3 2-4 5-6, 1-3 2-5 4-6.
The 105 DBPs of 4SS proteins are:
1-2 3-4 5-6 7-8, 1-2 3-4 5-7 6-8, 1-2 3-4 5-8 6-7, 1-2 3-5 4-6 7-8, 1-2 3-5 4-7 6-8, 1-2 3-5 4-8 6-7,
1-2 3-6 4-5 7-8, 1-2 3-6 4-7 5-8, 1-2 3-6 4-8 5-7, 1-2 3-7 4-5 6-8, 1-2 3-7 4-6 5-8, 1-2 3-7 4-8 5-6,
1-2 3-8 4-5 6-7, 1-2 3-8 4-6 5-7, 1-2 3-8 4-7 5-6, 1-3 2-4 5-6 7-8, 1-3 2-4 5-7 6-8, 1-3 2-4 5-8 6-7,
1-3 2-5 4-6 7-8, 1-3 2-5 4-7 6-8, 1-3 2-5 4-8 6-7, 1-3 2-6 4-5 7-8, 1-3 2-6 4-7 5-8, 1-3 2-6 4-8 5-7,
1-3 2-7 4-5 6-8, 1-3 2-7 4-6 5-8, 1-3 2-7 4-8 5-6, 1-3 2-8 4-5 6-7, 1-3 2-8 4-6 5-7, 1-3 2-8 4-7 5-6,
1-4 2-3 5-6 7-8, 1-4 2-3 5-7 6-8, 1-4 2-3 5-8 6-7, 1-4 2-5 3-6 7-8, 1-4 2-5 3-7 6-8, 1-4 2-5 3-8 6-7,
1-4 2-6 3-5 7-8, 1-4 2-6 3-7 5-8, 1-4 2-6 3-8 5-7, 1-4 2-7 3-5 6-8, 1-4 2-7 3-6 5-8, 1-4 2-7 3-8 5-6,
1-4 2-8 3-5 6-7, 1-4 2-8 3-6 5-7, 1-4 2-8 3-7 5-6, 1-5 2-3 4-6 7-8, 1-5 2-3 4-7 6-8, 1-5 2-3 4-8 6-7,
1-5 2-4 3-6 7-8, 1-5 2-4 3-7 6-8, 1-5 2-4 3-8 6-7, 1-5 2-6 3-4 7-8, 1-5 2-6 3-7 4-8, 1-5 2-6 3-8 4-7,
1-5 2-7 3-4 6-8, 1-5 2-7 3-6 4-8, 1-5 2-7 3-8 4-6, 1-5 2-8 3-4 6-7, 1-5 2-8 3-6 4-7, 1-5 2-8 3-7 4-6,
1-6 2-3 4-5 7-8, 1-6 2-3 4-7 5-8, 1-6 2-3 4-8 5-7, 1-6 2-4 3-5 7-8, 1-6 2-4 3-7 5-8, 1-6 2-4 3-8 5-7,
1-6 2-5 3-4 7-8, 1-6 2-5 3-7 4-8, 1-6 2-5 3-8 4-7, 1-6 2-7 3-4 5-8, 1-6 2-7 3-5 4-8, 1-6 2-7 3-8 4-5,
1-6 2-8 3-4 5-7, 1-6 2-8 3-5 4-7, 1-6 2-8 3-7 4-5, 1-7 2-3 4-5 6-8, 1-7 2-3 4-6 5-8, 1-7 2-3 4-8 5-6,
1-7 2-4 3-5 6-8, 1-7 2-4 3-6 5-8, 1-7 2-4 3-8 5-6, 1-7 2-5 3-4 6-8, 1-7 2-5 3-6 4-8, 1-7 2-5 3-8 4-6,
1-7 2-6 3-4 5-8, 1-7 2-6 3-5 4-8, 1-7 2-6 3-8 4-5, 1-7 2-8 3-4 5-6, 1-7 2-8 3-5 4-6, 1-7 2-8 3-6 4-5,
1-8 2-3 4-5 6-7, 1-8 2-3 4-6 5-7, 1-8 2-3 4-7 5-6, 1-8 2-4 3-5 6-7, 1-8 2-4 3-6 5-7, 1-8 2-4 3-7 5-6,
1-8 2-5 3-4 6-7, 1-8 2-5 3-6 4-7, 1-8 2-5 3-7 4-6, 1-8 2-6 3-4 5-7, 1-8 2-6 3-5 4-7, 1-8 2-6 3-7 4-5,
1-8 2-7 3-4 5-6, 1-8 2-7 3-5 4-6, 1-8 2-7 3-6 4-5.
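The counts and lists above can be reproduced mechanically. The following is a minimal sketch, not part of the original disclosure: it assumes a DBP is simply a perfect pairing of the cysteines numbered from the N-terminus, and the function names are illustrative only.

```python
from math import prod


def count_patterns(n_bonds: int) -> int:
    """Number of ways to pair 2n cysteines: 1 * 3 * 5 * ... * (2n - 1)."""
    return prod(2 * i - 1 for i in range(1, n_bonds + 1))


def enumerate_patterns(cysteines):
    """Yield every pairing of the given cysteine indices as a list of (i, j) bonds."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first free cysteine with each remaining one, then recurse.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in enumerate_patterns(remaining):
            yield [(first, partner)] + tail


def format_pattern(pattern) -> str:
    """Render a pattern in the '1-2 3-4' notation used in the lists above."""
    return " ".join(f"{a}-{b}" for a, b in pattern)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, "disulfide bonds:", count_patterns(n), "patterns")
    for pattern in enumerate_patterns([1, 2, 3, 4]):
        print(format_pattern(pattern))
```

Enumerating rather than typing the patterns by hand avoids transcription slips such as a cysteine index appearing in two bonds of the same pattern.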
Large, low-cysteine proteins require extensive secondary, tertiary and even quaternary structure to prevent the formation of alternative folds mediated by alternative disulfide bonding patterns. Microproteins have little or no secondary or tertiary structure other than that induced by the disulfide bonds, and the amino acid composition of their inter-cysteine loop sequences (primary structure) is especially variable. Microproteins are therefore more likely than other proteins to have enough sequence flexibility to adopt multiple different bonding patterns.
A small number of cysteines can provide a wide variety of distinct topologies, meaning structures that cannot interconvert without breaking disulfide bonds. Obtaining these structures generally places no, or only minimal, sequence requirements on the loops, so the loop sequences are available for creating binding specificity and affinity for a particular target. A particular protein sequence may show a strong preference for some folds over others, and may be unable to adopt some folds at all. Judging from the motifs of natural microprotein families, the spacing of the cysteines can contribute to the DBP, whereas the non-cysteine loop residues contribute less. The average length of the inter-cysteine loops in high disulfide density proteins ranges from about 0 to about 10 amino acids for the most preferred scaffolds and from about 3 to about 15 amino acids for most scaffolds. This gives a high cysteine density, from about 50% for some scaffolds to 25%-20% (most preferred), 15%-10% (less preferred) and even 5%, all of which are far above the cysteine density of the average protein, which is only 0.8%. If desired, cysteines can be engineered close to one another so that disulfide bonds form efficiently and correctly. Efficient bond formation allows multiple cycles in which weak bonds break and new bonds form, leading to gradual accumulation of the most stably bonded protein. In larger proteins the low density of cysteines appears to lead to inefficient, and therefore possibly incorrect, disulfide bond formation.
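The relation between loop length and cysteine density quoted above can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: the density is taken as the fraction of residues that are cysteine, the loop-length approximation ignores the N- and C-terminal tails, and the example sequence is hypothetical rather than taken from the disclosure.

```python
def cysteine_density(sequence: str) -> float:
    """Fraction of residues in a one-letter amino acid sequence that are cysteine."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq.count("C") / len(seq)


def density_from_mean_loop(mean_loop_length: float) -> float:
    """Approximate density for cysteines separated by loops of the given mean length.

    Each cysteine is followed on average by `mean_loop_length` non-cysteine
    residues, so roughly one residue in (mean_loop_length + 1) is a cysteine.
    """
    return 1.0 / (mean_loop_length + 1.0)


if __name__ == "__main__":
    # Hypothetical 10-residue sequence with 4 cysteines.
    print(cysteine_density("CAGCKKCTSC"))
    for loop in (1, 3, 4, 9):
        print(loop, density_from_mean_loop(loop))
```

Under this approximation a mean loop length of 1 corresponds to about 50% cysteine, 3 to 25%, 4 to 20% and 9 to 10%, which is in line with the ranges given in the text.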
Different disulfide bonding patterns are expected to differ in their stability to temperature and proteases. Accordingly, the invention provides a non-naturally occurring cysteine (C)-containing scaffold that (a) is capable of binding to a target molecule, (b) has at least two disulfide bonds formed by pairing of cysteines in the scaffold, and (c) exhibits target-binding capability after being heated to a temperature above about 50 °C, preferably above about 80 °C or even above about 100 °C, for a specified time ranging from 0.01 second to 10 seconds. If desired, the non-naturally occurring cysteine (C)-containing scaffold can be designed to contain at least three, four, five, six, seven, eight, nine, ten, eleven, twelve or more disulfide bonds formed by pairing of cysteines in the scaffold.
More highly cross-linked proteins (e.g., those with a high complexity number) are expected to be more stable than proteins that form "subdomains", which contain one or two disulfide bonds but can rotate freely relative to the rest of the protein. With reference to the linear peptide, higher stability correlates with the (cumulative) length of the disulfide bonds (called the "complexity" of the fold) and with the number of times the disulfide bonds cross one another in a DBP diagram drawn on the linear peptide sequence. However, different disulfide bonding patterns are expected to form in different yields, with the most cross-linked forms being the least represented. Because cysteine proximity promotes disulfide bond formation, disulfide bonds between adjacent cysteines are the most likely to occur and are the least promising for stability, because they form small domains or subdomains.
Accordingly, in some embodiments, the invention provides a protein library of non-naturally occurring cysteine (C)-containing proteins, each protein containing no more than 35 amino acids, wherein at least 10% of the amino acids in the polypeptide are cysteines, at least two disulfide bonds are formed by pairing of cysteines in the scaffold, and the pairing yields a complexity index greater than 3. In some other embodiments, the invention provides a protein library of non-naturally occurring cysteine (C)-containing proteins, each protein containing no more than about 60 amino acids, wherein at least 10% of the amino acids in the polypeptide are cysteines, at least four disulfide bonds are formed by pairing of cysteines contained in the polypeptide, and the pairing yields a complexity index greater than 4, 6 or 10.
In some aspects, the microproteins can exhibit picomolar activity against a particular target and a high degree of resistance to heating (even boiling) and to proteases. In other aspects, the microproteins tend to be highly hydrophobic and tend to have two different binding faces per domain (two-faced).
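The two stability correlates described above, cumulative disulfide length and the number of bond crossings, are easy to compute for any DBP. The sketch below is illustrative only. It assumes that the cumulative length is the sum of the index distances between paired cysteines, which is one reading of the text; the exact formula for the complexity index is not given in this passage, so the value returned by `span_sum` should not be taken as the patent's definition.

```python
from itertools import combinations


def parse_pattern(text: str):
    """Parse a DBP such as '1-4 2-5 3-6' into sorted (i, j) pairs."""
    pairs = []
    for token in text.split():
        a, b = token.split("-")
        pairs.append(tuple(sorted((int(a), int(b)))))
    return pairs


def span_sum(pairs) -> int:
    """Cumulative disulfide length in cysteine-index units (assumed complexity measure)."""
    return sum(b - a for a, b in pairs)


def crossings(pairs) -> int:
    """Number of bond pairs that cross when drawn as arcs over the linear sequence."""
    count = 0
    for (a, b), (c, d) in combinations(pairs, 2):
        # Two arcs cross when exactly one end of one lies inside the other.
        if a < c < b < d or c < a < d < b:
            count += 1
    return count


if __name__ == "__main__":
    for dbp in ("1-2 3-4", "1-3 2-4", "1-4 2-3", "1-4 2-5 3-6"):
        pairs = parse_pattern(dbp)
        print(dbp, "span sum:", span_sum(pairs), "crossings:", crossings(pairs))
```

Under this assumed measure, the pattern that bonds only adjacent cysteines (1-2 3-4) scores lowest, which matches the observation that such patterns form small subdomains.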
Although each disulfide bonding pattern is in theory compatible with a wide range of cysteine spacings, some spacing patterns are more compatible with a particular bonding pattern than others. In native sequences, several major cysteine spacing patterns are associated with each disulfide bonding pattern. For example, the conotoxin, cyclotide and anato families (considered to be different folds) have very different cysteine spacings but the same disulfide bonding pattern. The frequency distribution of the cysteine spacings therefore largely determines the disulfide bonding pattern, and design of the CDP is a practical way to control and evolve DBP and structure. The spacing of the cysteines determines the length of the loops between them and, to a large extent, the "fold" of the protein. Proteins belonging to the same sequence family share the same scaffold sequence or scaffold motif, which comprises all highly conserved amino acid positions and their main spacing distances, and they are generally considered to have the same "fold".
The microproteins of the invention can be monomers, dimers, trimers or higher multimers. Multi-domain microproteins can be homomultimers or heteromultimers, in which the domains differ in number of disulfide bonds, disulfide bonding pattern, structure, fold, sequence or scaffold. The microproteins of the invention can be fused to a variety of different structures, including peptides (linear or cyclic) of various lengths, amino acid compositions and functions. Each domain can have one or more binding surfaces for different targets (e.g., two-faced), similar to or different from many natural toxins.
The invention also provides non-naturally occurring microproteins having a single protein chain that comprises one or more domains and, optionally, one or more (linear or cyclic) peptides. Each domain generally folds and functions separately. A microprotein domain has a high disulfide density "scaffold", which largely determines the size of the domain, its stability to temperature and proteases, and its expression in Escherichia coli (and therefore the cost of production). The scaffold is expected to play an important role in determining the immunogenicity of the protein. A scaffold comprises 4, 6, 8, 10, 12, 14, 16, 18 or more cysteines, which form 2, 3, 4, 5, 6, 7, 8 or more disulfide bonds within the same domain.
Some specific 3-disulfide scaffolds that are preferred for improvement of various properties are: the conotoxin scaffold (29 amino acids in total, 7 fixed amino acids, no calcium-binding site, a rigid structure owing to the 1-4 2-5 3-6 disulfide bonding pattern); the cyclotide scaffold (24 amino acids in total, 10 fixed amino acids, no calcium-binding site, rigid 1-4 2-5 3-6 structure); the Anato scaffold (37 amino acids in total, 10 fixed amino acids, no calcium-binding site, rigid 1-4 2-5 3-6 disulfide bonding pattern); the defensin 1 scaffold (29 amino acids in total, 10 fixed amino acids, no calcium-binding site, rigid 1-6 2-4 3-5 bonding pattern); and the toxin 2 scaffold (29 amino acids in total, 10 fixed amino acids, no calcium-binding site, rigid 1-4 2-6 3-5 disulfide-bonded scaffold). Many other existing and novel scaffolds also have particular advantages.
Other preferred scaffolds are:
cellulose binding domain (CB, CEB), Pfam family PF00734, 173 members, 26 amino acids long (from the first to the last Cys), 4 cysteines linked 1-3 2-4, CDP C10C5C9C;
alpha-conotoxin (AC), family PF07365, 25 members, 15 amino acids long, 4 cysteines linked 1-3 2-4, CDP C0C4C8C;
omega-toxin-like (OT), family PF00451, 68 members, 28 amino acids long, 6 cysteines linked 1-4 2-5 3-6, CDP C5C3C10C4C1C;
pacifastin (PC), family PF05375, 39 members, 29 amino acids long, 6 cysteines linked 1-4 2-6 3-5, CDP C9C2C1C8C4C;
serine protease inhibitor (SP), family PF00299, 35 members, 26 amino acids long, 6 cysteines linked 1-4 2-5 3-6, CDP C6C5C3C1C6C;
Notch (NO), family PF00066, 175 members, 33 amino acids long, 6 cysteines linked 1-5 2-4 3-6, CDP C7C8C3C4C6C;
Trefoil (TR), family PF00088, 126 members, 39 amino acids long, 6 cysteines linked 1-5 2-4 3-6, CDP C10C10C4C0C10C;
TNF-receptor-like (TN), family PF01821, 123 members, 42 amino acids long, 6 cysteines linked 1-2 3-5 4-6, CDP C14C2C2C11C7C;
anaphylatoxin-like (AT), family PF01821, 123 members, 37 amino acids long, 6 cysteines linked 1-4 2-5 3-6, CDP [...]5C2C8C2C5C1C;
plexin (PL), family PF01437, 410 members, 61 amino acids long, 8 cysteines linked 1-4 2-8 3-6 4-7, CDP C5C2C8C2C5C12C19C;
three-finger toxin (TF), approximately 58 amino acids long (from the first to the last Cys), 8 cysteines linked 1-3 2-4 5-6 7-8, CDP C13C6C16C1C10C0C4C;
somatomedin, 35 amino acids long, 8 cysteines linked 1-2 3-4 5-6 7-8 (note that alternative DBPs are known), CDP C3C9C1C3C5C0C6C;
potato protease inhibitor (PI), 47 amino acids long, 8 cysteines, CDP C3C8C11C2C0C5C10C;
chitin binding domain (CHB), 37 amino acids long, 8 cysteines linked 1-4 2-5 3-6 7-8, CDP C5C2C8C2C5C12C19C;
spider toxin (ST), 34 amino acids long, 6 cysteines, CDP C6C6C0C4C6C;
toxin B (TB), 34 amino acids long, 6 cysteines, CDP C6C5C0C3C8C;
cellulose binding domain (CEB), 26 amino acids long, 4 cysteines linked 1-3 2-4, CDP C10C5C9C;
alpha-conotoxin (AC), 15 amino acids long, 4 cysteines linked 1-3 2-4, CDP C0C4C8C.
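The CDP strings used for the scaffolds above can be derived from a sequence automatically. The sketch below assumes that each number in a CDP such as C10C5C9C is the count of non-cysteine residues between two consecutive cysteines; this reading is inferred from the notation rather than stated in the passage, and the example sequence is hypothetical.

```python
def cysteine_distance_pattern(sequence: str) -> str:
    """Return a CDP-style string: 'C', loop length, 'C', loop length, ..., 'C'.

    Residues before the first and after the last cysteine are ignored.
    """
    seq = sequence.upper()
    positions = [i for i, residue in enumerate(seq) if residue == "C"]
    if len(positions) < 2:
        raise ValueError("at least two cysteines are needed to define a pattern")
    parts = ["C"]
    for left, right in zip(positions, positions[1:]):
        parts.append(str(right - left - 1))  # residues strictly between the two cysteines
        parts.append("C")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical sequence, not one of the scaffolds listed above.
    print(cysteine_distance_pattern("GCAAAACGGCKC"))
```

Comparing such strings across the members of a family gives the frequency distribution of cysteine spacings discussed earlier.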
The non-naturally occurring microproteins of the invention can be designed on the basis of native protein sequences. For example, a large number of native proteins, or domains contained in them, have attractive features as scaffold proteins. Non-limiting examples are listed in Table 2.
Table 2
Protein family | Other exemplary members of the family
Insulin-like |
Toxic hairpin | Heat-stable enterotoxin, neurotoxin B-IV
Knottins | Plant lectins/antimicrobial peptides (hevein-like agglutinin (lectin) domain, antimicrobial peptide 2, AC-AMP2)
Plant proteinase and amylase inhibitors | Trypsin inhibitor, carboxypeptidase A inhibitor, alpha-amylase inhibitor
Cyclotides | Kalata B1, cycloviolacin O1, circulin A, palicourein
Gurmarin-like |
Agouti-related protein |
ω-toxin-like | Conotoxin, spider toxin, insect toxin, albumin 1
Scorpion toxin-like | Long-chain scorpion toxins (scorpion toxin, alpha toxin, Tx10 α-like toxin, LQH III α-like toxin), short-chain scorpion toxins, defensin MGD-1, insect defensins, plant defensins
Cellulose binding domain | Cellobiohydrolase I
Growth factor receptor domain | Insulin-like growth factor-binding protein 5 (IGFBP-5), type 1 IGF-1 Cys-rich domain, receptor protein-tyrosine kinase ErbB-3 Cys-rich domain, EGF receptor Cys-rich domain, protooncoprotein Her2 extracellular domain
Colipase-like | (Pro)colipase, intestinal toxin 1
EGF/laminin | EGF-type module (factor IX, coagulation factor VIIa, E-selectin, factor X N-terminal module, activated protein C (autoprothrombin IIa), prostaglandin H2 synthase-1 EGF-like module, P-selectin, epidermal growth factor (EGF), transforming growth factor, epiregulin EGF-domain, betacellulin-2, heparin-binding epidermal growth factor (HBEGF), plasminogen activator (urokinase-type), heregulin alpha EGF domain, thrombomodulin, fibrillin-1, mannose-binding protein-associated serine protease 2, complement C1S, complement protease C1R, plasminogen activator (tissue-type) (tPA), low-density lipoprotein (LDL) receptor), integrin β EGF-like domains, EGF-like domain of nidogen-1, laminin-type module (laminin γ1 chain), follistatin module N-terminal domain FS-N (domain of BM-40/SPARC/osteonectin, domain of follistatin), merozoite surface protein 1 (MSP-1)
Bromelain inhibitor VI (cysteine protease inhibitor) |
Bowman-Birk inhibitor |
Elafin-like | Elafin, elastase-specific inhibitor, nawaprin
Leech antihemostatic proteins | Huristasin-like, hirudin-like
Granulin repeat | N-terminal domain of granulin-1, oryzain β chain
Satiety factor CART (cocaine and amphetamine regulated transcript) |
DPY module | Dumpy
Bubble protein |
PMP inhibitors |
TSP-1 type 1 repeat | Thrombospondin-1
AmbV |
Snake toxin-like | Snake toxins (erabutoxin B, γ-cardiotoxin, fasciculin, muscarinic toxin, erabutoxin A, neurotoxin I, cardiotoxin V4II (toxin III), cardiotoxin V, α-cobratoxin, long neurotoxin 1, FS2 toxin, bungarotoxin, bucandin, cardiotoxin CTXI, cardiotoxin CTX IIB, cardiotoxin II, cardiotoxin III, cardiotoxin IV, cobratoxin 2, alpha-toxin, neurotoxin II (cobratoxin B), toxin B (long neurotoxin), candotoxin, bucain), dendroaspin
BPTI-like |
Extracellular domain of (human) cell surface receptors | CD59, type II activin receptor, BMP receptor Ia ectodomain, TGF-β type II receptor extracellular domain
Defensin-like | Defensin, defensin 2, myotoxin
Hairpin loop containing domain-like | APPLE domain
Neurotoxin III (ATX III) |
LDL receptor-like module |
Crambin-like |
Kringle-like | Kringle modules, fibronectin type II
Kazal-type serine protease inhibitors |
Plant protease inhibitors |
Trefoil/plexin domain-like | Trefoil, plexin
Necrosis-inducing protein 1 (NIP1) |
Cystine-knot cytokines | PDGF-like, TGF-β-like, noggin, neurotrophin, gonadotropin/follitropin, interleukin-17F, coagulogen
Complement control module/SCR domain | CD46, β2-glycoprotein, complement receptor 1, 2 (CR1, CR2), complement C1R and C1S protease domains, MASP-2
Sea anemone toxin K |
Blood coagulation inhibitor (disintegrin) | Echistatin, flavoridin, kistrin, obtustatin, salmosin, schistatin
Methylamine dehydrogenase, L chain |
Serine protease inhibitor | ATI-like, BSTI-like
TB module/8-cys domain | Fibrillin, TGFβ-binding protein-1
TNF receptor-like | TGF-R, NGF-R, BAFF receptor
Heparin-binding domain from VEGF |
Antifungal protein (AGAFP) |
Fibronectin type I module | Fibronectin, tissue plasminogen activator (t-PA)
Thyroglobulin type I domain |
Type X cellulose binding domain (CBDX) |
Cellulose docking domain, dockerin |
Carboxypeptidase inhibitor |
Invertebrate chitin-binding proteins |
Pheromone ER-23 |
Mollusk pheromone |
Hormone receptor antigen |
SM-B domain |
Notch domain |
Mini-collagen I, C-terminal domain |
Hormone receptor domain (HRM) |
Resistin |
YAP1 redox domain |
GLA domain |
Cholecystokinin A receptor, N-domain |
HIV-1 VPU cytoplasmic domain |
HIPIP (high-potential iron protein) |
Ferredoxin thioredoxin reductase (FTR), catalytic β chain |
C2H2 and C2HC zinc fingers |
Zn2/Cys6 DNA-binding domain |
Glucocorticoid receptor-like |
SBT domain |
Retrovirus zinc finger-like domains |
Rubredoxin-like |
Ribosomal protein L36 |
Zinc-binding domain of translation initiation factor 2β |
B-box zinc-binding domain |
RING/U-box |
Pyk2-associated protein β ARF-GAP domain |
Metallothionein |
Zinc domain conserved in yeast copper-regulated transcription factors |
Ada DNA repair domain |
Cysteine-rich domain |
FYVE/PHD zinc finger |
Zn-binding domain of ADDBP |
Inhibitor of apoptosis (IAP) repeat |
CCCH zinc finger |
Zinc finger domain of DNA polymerase α |
TAZ domain |
Cysteine-rich DNA-binding domain (DM) |
DnaJ/Hsp40 cysteine-rich domain |
CCHHC domain |
SecC motif |
TSP type 3 repeat |
The design of protease-resistant microproteins is important for minimizing immunogenicity. Many natural microproteins are protease inhibitors. See Rao, M.B., et al. (1998) Molecular and Biotechnological Aspects of Microbial Proteases. Microbiol Mol Biol Rev. 62(3): 597-635. According to the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology, proteases are classified in subgroup 4 of group 3 (hydrolases). However, proteases do not fit easily into the general system of enzyme nomenclature because of the enormous diversity of their action and structure. At present, proteases are classified on the basis of three main criteria: (i) the type of reaction catalyzed, (ii) the chemical nature of the catalytic site, and (iii) the evolutionary relationship with reference to structure.
Proteases are further divided into two main groups according to their site of action, namely exopeptidases and endopeptidases. Exopeptidases cleave peptide bonds near the amino or carboxyl terminus of the substrate, whereas endopeptidases cleave peptide bonds distant from the termini of the substrate. Based on the functional group present at the active site, proteases are further classified into four main groups, namely serine proteases, aspartic proteases, cysteine proteases and metalloproteases. A few miscellaneous proteases do not fit precisely into the standard classification, for example the ATP-dependent proteases, which require ATP for activity. Based on their amino acid sequences, proteases are classified into different families and further subdivided into "clans" to accommodate sets of peptidases that have diverged from a common ancestor. Each peptidase family is assigned a code letter denoting the catalytic type, i.e., S, C, A, M or U for serine, cysteine, aspartic, metallo or unknown type, respectively.
Exopeptidases: exopeptidases act only near the ends of polypeptide chains. Depending on whether their site of action is at the N-terminus or the C-terminus, they are classified as aminopeptidases or carboxypeptidases, respectively.
Aminopeptidases: aminopeptidases act at the free N-terminus of the polypeptide chain and release a single amino acid residue, a dipeptide or a tripeptide.
Carboxypeptidases: carboxypeptidases act at the free C-terminus of the polypeptide chain and release a single amino acid residue or a dipeptide. Based on the nature of the amino acid residue at the active site of the enzyme, carboxypeptidases can be divided into three main classes: serine carboxypeptidases, metallocarboxypeptidases and cysteine carboxypeptidases.
Endopeptidases: endopeptidases are characterized by their preferential action at peptide bonds in the inner regions of the polypeptide chain, away from the N- and C-termini. The presence of a free amino or carboxyl group has an adverse effect on enzyme activity. Endopeptidases are divided into four subgroups on the basis of their catalytic mechanism: (i) serine proteases, (ii) aspartic proteases, (iii) cysteine proteases and (iv) metalloproteases.
Human proteases: cathepsins B, C, H, L, S, V, X/Z/P and 1 are cysteine proteases of the papain family. Cathepsin L and cathepsin S are known to participate in antigen processing in antigen-presenting cells. Cathepsin C is also known as DPPI (dipeptidyl-peptidase I). Cathepsin A is a serine carboxypeptidase, and cathepsins D and E are aspartic proteases. As lysosomal proteases, cathepsins play an important role in protein degradation. Because of their redistribution or elevated levels in human and animal tumors, cathepsins may play a role in invasion and metastasis. Cathepsins are synthesized as inactive proenzymes and processed into mature, active enzymes. Endogenous protein inhibitors, such as cystatins and some serpins, inhibit the active enzymes. Other cathepsins include cathepsins G, D and E.
Other human proteases against which protein pharmaceuticals can be engineered to be resistant include plasmin, chymotrypsin, trypsin, carboxypeptidase A, carboxypeptidase B, adipsin/factor D, kallikrein, human proteinase 3 (Sigma) and thrombin.
In addition, naturally occurring HDD proteins can be used to design the microproteins of the invention. Natural HDD proteins include multiple families of animal cell surface receptor proteins, as well as defensive (i.e., ingested) and offensive (injected) animal toxins, such as the venoms of snakes, spiders, scorpions, snails and sea anemones. What these classes of proteins have in common is that they are located at the host-environment/pathogen interface. These and any other native proteins described herein serve as exemplary scaffolds suitable for producing the non-naturally occurring cysteine scaffolds of the invention.
Of particular interest are proteins at this interface (in both host and pathogen), which tend to have specialized molecular support systems that allow them to adapt their sequences rapidly. Examples are the pilins in Neisseria and other bacteria, the antibody-forming system in vertebrates, the trypanosome variant surface glycoproteins, the Plasmodium surface proteins (which are in fact microproteins) and many others. Rapid adaptation of the amino acid sequences of microproteins is clearly observed: their sequence similarity tends to be lower than would be predicted from genome sequence similarity. The ability to adapt sequence rapidly while retaining a rigid structure that resists protease attack (though not necessarily the same structure) may be the reason why this class of proteins has been recruited independently multiple times (7 times) during animal evolution as a source of toxins. The repeated recruitment indicates that this class of proteins has features that are especially useful for building toxins. Other shared features are small size (they are the smallest folded proteins) and their extreme stability to proteases and temperature.
Receptor proteins and toxins show rapid rates of sequence change, so that closely related snail toxins appear completely unrelated. Rapid evolution is considered an essential feature of toxins, because venoms need to keep up with changes in multiple receptor proteins in multiple prey species (which show increased rates of evolution toward toxin resistance). One very useful feature of this class is the low immunogenicity that results from the protease stability of high disulfide density scaffolds, as described in many publications. This may be important for avoiding the development of resistance to the toxin in prey that is bitten but escapes. Because both receptors and toxins need to adapt their sequences rapidly, it is not surprising that in some cases both contain HDD microprotein domains. For example, the structure-based snake toxin-like protein class (as defined by the Structural Classification of Proteins (SCOP) database) contains snake toxins and the extracellular domains of human cell surface receptors, some of which interact with ligands of the same structure (such as TGF-β and the TGF-β receptor). Exemplary proteins include snake toxin-like proteins, for example snake toxins and the extracellular domains of human cell surface receptors. Non-limiting examples of snake toxins include erabutoxin B, γ-cardiotoxin, fasciculin, muscarinic toxin, erabutoxin A, neurotoxin I, cardiotoxin V4II (toxin III), cardiotoxin V, α-cobratoxin, long neurotoxin 1, FS2 toxin, bungarotoxin, bucandin, cardiotoxin CTXI, cardiotoxin CTX IIB, cardiotoxin II, cardiotoxin III, cardiotoxin IV, cobratoxin 2, alpha-toxin, neurotoxin II (cobratoxin B), toxin B (long neurotoxin), candotoxin and bucain. Non-limiting examples of extracellular domains of (human) cell surface receptors include CD59, type II activin receptor, BMP receptor Ia ectodomain and TGF-β type II receptor extracellular domain.
In most natural HDD protein families, the disulfide scaffold alone provides a high level of rigidity, which favors high affinity by avoiding induced fit and the associated entropic penalty. In many microprotein families, the presence of exactly 4, 6, 8 or 10 cysteine residues appears to be sufficient to fully determine the main properties of the protein, such as its structure, thermostability and protease resistance, while leaving all (e.g., in conotoxins) or nearly all other residues in the loops free to adopt any sequence required for binding specificity. The cysteines provide the critical function with minimal sequence definition ("low information content"), which statistically favors the independent recruitment of such scaffolds over alternative scaffolds with more fixed amino acids and a higher information content. For example, 2 additional fixed amino acids increase the information content and reduce the frequency with which a set of random sequences is predicted to be recruited or to occur by 20 x 20 = 400-fold. A similar level of protein stability based on non-cysteine amino acids would require many more residues, resulting in a protein that is larger and/or less adaptable in evolution.
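The 400-fold figure follows from simple counting. As a minimal sketch, assuming that all 20 amino acids are equally likely at each position (a simplification not stated in the text), each additional fixed position reduces the fraction of random sequences that match the motif by a factor of 20 and adds log2(20) bits of information. The function names are illustrative.

```python
from math import log2


def fold_reduction(extra_fixed_positions: int, alphabet_size: int = 20) -> int:
    """Factor by which matching random sequences become rarer per set of fixed positions."""
    return alphabet_size ** extra_fixed_positions


def added_information_bits(extra_fixed_positions: int, alphabet_size: int = 20) -> float:
    """Information added by fixing positions, in bits, under a uniform residue model."""
    return extra_fixed_positions * log2(alphabet_size)


if __name__ == "__main__":
    print(fold_reduction(2))
    print(round(added_information_bits(2), 2))
```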
One source of the structural diversity of natural toxins is the difference in length that HDD (high disulfide density) proteins show as they evolve. This has been described in detail for the snake disintegrins (Calvete, J.J., Moreno-Murciano, M.P., Theakston, R.D.G., Kisiel, D.G. and Marcinkiewicz, C. (2003) Snake venom disintegrins: Novel dimeric disintegrins and structural diversification by disulphide bond engineering. Biochem J. 372:725-734; Calvete, J.J., Marcinkiewicz, C., Monleon, D., Esteve, V., Celda, B., Juarez, P. and Sanz, L. (2005) Snake venom disintegrins: Evolution of structure and function. Toxicon 45:1063-1074).
Deletion (or insertion/addition) of parts of the gene encoding a large HDD protein can produce a large number of smaller (or larger) variants which, although homologous in sequence to the original, would be considered different structures. In the published examples, most of the disulfide bonds are conserved, but a few cysteines form new bonding patterns. The natural mechanism may involve modification at the DNA level, alternative mRNA splicing, alternative translation, protein (trans-)splicing, or degradation or other forms of truncation or addition at either end. Regardless of the natural mechanism, the principle can be implemented using molecular biology and (phage) display libraries to develop proteins with optimal potency and stability and minimal size.
New and modified scaffolds can also be generated from natural protein sequences, including the following preferred families: A-domain, EGF, Ca-EGF, TNF-R, Notch, DSL, Trefoil, PD, TSP1, TSP2, TSP3, Anato, integrin β, thyroglobulin, defensin 1 and the others disclosed herein. Existing protein domain families that have 2 or more disulfide bonds and function as animal toxins include the following preferred families: toxin 1, 2, 3, 4, 5, 6, 7, 9, 11, 12, defensin 1, defensin 2, cyclotide, SHKT, disintegrin, myotoxin, γ-thionins, conotoxin, mu-conotoxin, ω-atracotoxins, δ-atracotoxins and the other families listed herein. A modified scaffold may differ from the natural scaffold in one or more of the following: number of cysteines; disulfide bonding pattern; spacing; size/length from the first cysteine to the last cysteine; loop structure (different fixed residues or sizes); ion binding sites (different position, amino acid composition and ion specificity); and performance-related features (including safety, non-immunogenicity, greater or lesser similarity to human sequences, temperature stability, protease stability, hydrophobicity index, percentage of hydrophobic amino acids, formulation properties such as eutectic point and high concentration, absence of specific residues, rigidity, disulfide density, percentage of library residues, complexity of the disulfide bonding pattern, and so on).
In some cases it is useful to reflect the subfamilies present in the natural diversity. This can be done by including, in the same scaffold library, multiple length variants of a particular loop design (generally made with separate oligonucleotides), each intended for a different subfamily and reflecting the differences in length and sequence between the subfamilies.
In some applications it may be useful to create improved variants of existing scaffolds. For example, new variants of the LDL receptor class A domain ("A-domain") or of the EGF domain can be created by a variety of relatively conservative approaches and may yield scaffolds that improve on the original. There are several ways to modify a variant, including inverting the motif of cysteines alone (including their spacing) or the motif of conserved A-domain residues (including non-cysteine residues) by switching the N-terminus to the C-terminus. Such inversion has been shown to be feasible for some small peptides, in which case only a few amino acids were inverted. Other modifications include changing the length of the protein (shorter or longer) so that it falls outside the length range of the protein domains in the published libraries or native sequences, moving the calcium binding site to a different set of loops, and changing one or more of the fixed non-cysteine residues in the loops. If the fixed residue is D, the aim is to obtain a non-D residue at that position. A good way to achieve this, and to test a large number of specific amino acid positions for new compositions, is to use a codon that provides a mixture of amino acids that is the opposite (i.e., the complement) of the mixture found in naturally occurring sequences or used in the published libraries. If a position in the published library contains I, L and V, a new motif can be obtained by providing all 20 amino acids except I, L and V at that position. Each position differs in the amino acids it requires for structure, and the same is true for function.
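The complementary residue set described above is a simple set difference. The sketch below is illustrative only: it lists the residues that remain once the residues already used at a position are excluded, and the function name is hypothetical. Choosing a degenerate codon or oligonucleotide mix that encodes, or approximates, this set is a separate step that is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def complementary_residues(observed: str) -> str:
    """Return the standard amino acids that are NOT in the observed set."""
    excluded = set(observed.upper())
    unknown = excluded - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"not standard amino acids: {sorted(unknown)}")
    return "".join(aa for aa in AMINO_ACIDS if aa not in excluded)


if __name__ == "__main__":
    # Position that carries I, L or V in the published library.
    print(complementary_residues("ILV"))
    # Position that carries a fixed D.
    print(complementary_residues("D"))
```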
Scaffold libraries can also be used to find better variants of existing scaffold sequence motifs. One can look for scaffolds that are better than a known scaffold in one or more of the following respects: a different disulfide bonding pattern, and/or different disulfide spacing, and/or different loop sequence motifs, and/or differences in the fixed loop residues, and/or the position, absence, amino acid composition or ion specificity of the calcium binding site.
Those skilled in the art will know how to apply these principles to scaffolds other than the A-domain, including the domain families EGF, Ca-EGF, TNF-R, Kunitz, Notch/LNR/DSL, Trefoil/PD/P-type, TSP1, TSP2, TSP3, Anato, integrin β, thyroglobulin, toxin 1, 2, 3, 4, 5, 6, 7, 9, 11, 12, defensin 1, defensin 2, cyclotide, SHKT, disintegrin, myotoxin, γ-thionins, conotoxin, mu-conotoxin, ω-atracotoxins, δ-atracotoxins and the other families listed in the tables.
Examples of new modified scaffolds derived from the A-domain include a protein domain having a non-natural sequence (and fewer than 0 amino acids) that contains the sequence C1(xx)xxEDsxDxC2DxxGDC3xWxx[ps]xC4(xx)xxxC5xFxxx(xx)C6 plus one additional disulfide bond. There are a large number of 4-disulfide domains that resemble (for example) the 3-disulfide A-domain but are more rigid, because they have an extra cysteine at a position that stabilizes the relatively flexible A-domain structure. One example is the 1-8 2-4 3-6 5-7 bonding pattern, which contains the 3SS fold of the A-domain (1-3 2-5 4-6) but stabilizes it with a disulfide bond flanking the A-domain sequence on either side, thereby repairing a key structural weakness. Other high-quality 4-disulfide forms of the A-domain include 1-5 2-4 3-7 6-8, 1-3 2-6 4-8 5-7, 1-4 2-7 3-6 5-8 and so on. The size should be similar to that of the A-domain, only a few amino acids longer (2-12, preferably fewer than 8 amino acids). The same analysis and approach can be applied to all other 3-disulfide families, and also to 2- and 4-disulfide families having the following general structures:
Protein domain (having a non-natural sequence and fewer than 50 amino acids) containing the sequence C1x(xxx)xFxC2xxx(xxx)C3xx(xx)xxxC4DGxxDC5xDxSDE(xxxx)xC6, with more than 36 amino acids between C1 and C6.
Protein domain (having a non-natural sequence and fewer than 50 amino acids) containing the sequence C1x(xxx)xFxC2xxx(xxx)C3xx(xx)xxxC4DGxxDC5xDxSDE(xxxx)xC6, with fewer than 32 amino acids between C1 and C6.
Protein domain having a non-natural sequence and fewer than 50 amino acids, with three disulfide bonds linked 1-3 2-5 4-6 and more than 36 amino acids between C1 and C6.
Protein domain (having a non-natural sequence and fewer than 50 amino acids) containing the sequence C1x(xxx)xFxC2xxx(xxx)C3xx(xx)xxxC4DGxxDC5xDxSDE(xxxx)xC6, with fewer than 32 amino acids between C1 and C6.
Protein domain having a non-natural sequence (and fewer than 50 amino acids) containing the sequence C1(xx)xxxxxxxxC2xxxxxC3xxxxxxC4(xx)xxxC5xxxxx(xx)C6 (the reversed A-domain).
Protein domain (having a non-natural sequence and fewer than 50 amino acids) in which one of the underlined amino acids is absent:
C1x[aps](x)[ekq]FxC2xxxx(x)C3[ilv][ps]xx[lw][lrv]C4DG[dev][pnd]DC5xD[dgns]SDE(aps)(lps)xxC6.
A different representation of the same approach (three different motif levels are shown; the required changes are underlined):
C1x(xx)xxx[non-F]xC2xxxx(xx)C3xxxxxxC4xxxx[non-D]C5x(x)xxx[non-D][non-E](x)xxxC6, or
C1x(xx)xxx[non-F]xC2xxxx(xx)C3[non-ILV][non-PS]xxxxC4[non-D][non-G]xx[non-D]C5x(x)[non-D]x[non-S][non-D][non-E](x)xxxC6
Protein domain (having a non-natural sequence) with the fold of Huwentoxin II, a spider venom toxin that has the same bonding pattern as the A-domain fold but very different cysteine spacing and an entirely unrelated protein sequence.
Domain families that do not contain repeats. This class consists mainly of animal toxin scaffolds and scaffolds derived from cell-surface receptors. The toxin proteins in the venoms of snakes, spiders, scorpions, snails and sea anemones can be regarded as naturally occurring injectable biopharmaceuticals. These venoms generally contain more than 100 different toxins, related and unrelated, with a variety of receptor and species specificities. Most toxins are small proteins with a high disulfide density. Typical sizes are 15-25 amino acids with 2 disulfides, 25-45 amino acids with 3 disulfides and 35-50 amino acids with 4 disulfides, and there are many examples with 5, 6, 7, 8 or more disulfides. Examples are δ-atracotoxin (1-4 2-6 3-7 5-8), charybdotoxin (1-8 2-5 3-6 4-7), ω-agatoxin (1-4 2-5 3-4 7-8), maurotoxin (1-5 2-6 3-4 7-8) and J-atracotoxin (1-4 2-7 3-4 5-8).
Phylogenetic analysis shows that these proteins are examples of convergent evolution, in which unrelated animal groups independently arrived at similar toxin structures from unrelated starting points. Given that the same design principle succeeded in at least seven independent cases (each in an unrelated taxonomic group), this design (i.e. the microprotein toxin) is expected to have important advantages over other scaffolds that could be used to build other types of toxin.
The one feature these proteins share is a high disulfide density. Their amino acid sequences (other than Cys) are highly variable (see the conotoxin alignment) and give rise to a wide range of different structures (protein folds).
The most desirable property of these proteins is their small size: microproteins are the smallest rigid proteins, which is what rapid tissue penetration requires. A second common property is their rigidity, which is higher than that of other proteins of similar size and allows these proteins to avoid induced fit upon binding to the target, permitting higher binding affinity. A third property is their exceptional stability, both thermal stability (most microproteins can be boiled without denaturing) and resistance to a variety of proteases. Many of the native proteins function as protease inhibitors. Stability is important for biopharmaceuticals injected intravenously (IV) or subcutaneously (SC), and even more important for transdermal, nasal, oral, intestinal or blood-brain-barrier delivery. Stability is also important for long shelf life and for ease of transport and storage. Another notable property is the non-immunogenicity of these proteins, which is reported to be mediated by their resistance to proteolysis in antigen-presenting cells (APCs), a resistance that publications attribute to the high-disulfide-density structure. Other factors that keep immunogenicity low are the small size of the proteins and their hydrophobicity.
Domain families that contain repeats can also be used to produce the microproteins of the invention and libraries thereof. Numerous examples are described in the Examples below.
Domain families that contain repeats: cysteine-rich repeat proteins (CRRPs). The high cysteine content of cysteine-rich repeat proteins allows multiple disulfide bonds to form within a repeat unit and/or between two repeat units. This produces a repeating pattern of disulfide bonds. The pattern provides a fixed topology, although in rare cases the same sequence may adopt (or may evolve to adopt) an alternative disulfide bonding pattern. The disulfide bonds in a repeat protein are characterized by the CRRP motif (X_A1, X_A2)/(X_B1, X_B2)/(X_C), where X_A is the cysteine distance between the linked cysteines, that is, the number of cysteines from the first cysteine to the second cysteine of the same disulfide bond. This cysteine distance can be 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10. Two (or more) numbers in the CRRP motif denote two (or more) different types of bond; X_A1 denotes the first such bond and X_A2 the second disulfide bond. For example, CxCxCxCxCxCxCxC with the topology 1-4 2-3 has a cysteine distance of +3 for the first disulfide type and +1 for the second disulfide type ('3,1').
X_B denotes the cysteine distance (number of cysteines) from the first cysteine of one disulfide bond to the first cysteine of the next disulfide bond (for example, for CxCxCxCxCxC with the topology 1-4 2-3, X_B is +1). Where there are two different types of disulfide bond, X_B1 denotes the cysteine distance from the first cysteine of a disulfide of the first type to the first cysteine of the adjacent disulfide, and X_B2 denotes the cysteine distance from the first cysteine of a disulfide of the second type to the first cysteine of the next disulfide, which in this case lies in the next repeat. In this example X_B2 is +3 (from C2 to C5), but it can be 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10. X_C denotes the number of disulfide bonds per helical turn in a helical repeat protein; it can be a fraction of 1 or an integer such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10.
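The cysteine distances defined above can be read directly off a bonding pattern. The following is a minimal Python sketch, not part of the disclosure: the function names are hypothetical, and it only handles bonds listed within a single pattern string (the distance to a bond in the next repeat, such as the X_B2 value discussed above, would additionally require the number of cysteines per repeat).

```python
def parse_topology(topology):
    """Parse a bonding pattern such as "1-4 2-3" into sorted (first, second) cysteine pairs."""
    bonds = []
    for token in topology.split():
        first, second = (int(part) for part in token.split("-"))
        bonds.append((min(first, second), max(first, second)))
    return sorted(bonds)


def cysteine_distances(topology):
    """Return (X_A values, X_B values) for the bonds listed in the pattern.

    X_A: cysteine distance between the two cysteines of each bond.
    X_B: cysteine distance from the first cysteine of one bond to the
         first cysteine of the next bond in the list.
    """
    bonds = parse_topology(topology)
    x_a = [second - first for first, second in bonds]
    x_b = [nxt[0] - cur[0] for cur, nxt in zip(bonds, bonds[1:])]
    return x_a, x_b


print(cysteine_distances("1-4 2-3"))  # ([3, 1], [1])
```

For the 1-4 2-3 topology this reproduces the '3,1' cysteine distances and the X_B of +1 given in the text.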
Each domain typically (but not necessarily) contains an end cap at the N- and/or C-terminus. These end caps typically have one or two fewer cysteines than a regular repeat, because they are connected to only one repeat rather than to two.
A more detailed description of a repeat protein includes the "span" of each disulfide type in the protein (the number of non-Cys amino acids between two consecutive cysteines). Another way of describing a repeat protein is to give the sequence of the repeat unit, for example (CxxxCxCxxxxCxxCCxx)n. Ca and Cb can be used to indicate which cysteines are connected, for example (CaxxxCaxCbxxxxCcxxCbCcxx)n.
A key feature of cysteine-rich repeat proteins is that they can be extended at either end, i.e. at the N-terminus or the C-terminus. Two approaches to library design are 1) randomization of naturally occurring repeat proteins, and 2) synthetic repeats, which are typically obtained by abstraction from natural repeat proteins and may have spacing slightly different from the natural repeats (idealized). Naturally occurring CRRPs include the Drosophila domain of unknown function (PF05444), granulin (PF00396), insect antifreeze protein (PF02420), the furin-like domain (PF00757), the CxCxCx repeat (PF03128) and Paramecium surface antigen (PF01508).
Where desired, the cysteine-containing proteins and/or scaffolds of the invention can be fused to a biological response modifier. Examples of response modifiers include, but are not limited to, fluorescent proteins such as green fluorescent protein (GFP), and cytokines or lymphokines such as interleukin-2 (IL-2), interleukin-4 (IL-4), GM-CSF and γ-interferon. Another useful fusion sequence is one that facilitates purification. Examples of such sequences are known in the art and include sequences encoding epitopes such as Myc, HA (derived from influenza virus hemagglutinin), His-6 or FLAG. Other fusion sequences that facilitate purification are derived from proteins such as glutathione S-transferase (GST), maltose-binding protein (MBP) or the Fc portion of an immunoglobulin.
Library construction: the invention provides libraries of the cysteine-containing scaffolds of the invention. Proteins that have undergone natural selection need to fold homogeneously, whereas proteins with new, unevolved sequences can in principle fold into multiple stable structures, or can at least be induced to do so by changing the conditions. When different copies of the same protein sequence fold into different stable structures, the structural diversity of the library expands beyond the number of independent clones in the library. The number of independent clones, which generally equals the number of different sequences in the library, is called the "library size" and is about 10^10 for phage display libraries. However, the actual number of phage particles used when panning a phage library is generally 10- to 10,000-fold greater than the library size. This excess is called the number of "library equivalents", and there are several ways to exploit it to obtain greater library performance. If each of the 10-10,000 copies of a clone (all having the same amino acid sequence) adopts a different, stable DBP and structure, the structural diversity may greatly exceed the sequence diversity (10^11-10^14). Using temporarily unstable structures that adopt different conformations can increase structural diversity further. If, moreover, each phage particle displays an unstable protein that adopts multiple structures, much like a random peptide and with similar advantages and disadvantages, diversity increases further still. Proteins that can adopt a large number of unstable structures can expand diversity beyond the number of phage particles (10^12-10^15). Recovery of low-affinity clones may require a large number of library equivalents (for example, about 100 library equivalents to recover a clone when the recovery efficiency is 1%), whereas recovery of high-affinity clones tends to be 100% efficient (as confirmed by affinity chromatography), so increasing structural diversity is expected to greatly increase the proportion of high-affinity clones. Using unstable structures to increase structural diversity involves a trade-off, because the need to induce structure in the displayed protein upon target binding (induced fit of the binding protein, though possibly not of the target) is expected to reduce the binding affinity of these clones.
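The arithmetic behind the library-size figures above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the idealized case in which every copy of a clone adopts a distinct structure and each copy is recovered independently.

```python
def structural_diversity_bound(library_size, library_equivalents):
    """Upper bound on distinct structures if every copy of every clone folds differently."""
    return library_size * library_equivalents


def expected_recovered_copies(library_equivalents, recovery_efficiency):
    """Expected number of copies of one clone recovered in a panning round."""
    return library_equivalents * recovery_efficiency


library_size = 10**10  # typical phage display library size quoted in the text
for equivalents in (10, 10_000):
    bound = structural_diversity_bound(library_size, equivalents)
    print(f"{equivalents} library equivalents -> up to {bound:.0e} structures")

# About 100 library equivalents at 1% recovery yield roughly one recovered copy.
print(expected_recovered_copies(100, 0.01))
```

With 10 and 10,000 library equivalents the bound spans 10^11 to 10^14, matching the range stated in the text.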
One approach is to build libraries with 4 cysteines (up to 2 disulfide bonds and up to 3 bonding patterns), 6 cysteines (up to 3 disulfide bonds and up to 15 disulfide bonding patterns), 8 cysteines (up to 4 disulfide bonds and up to 105 bonding patterns), 10 cysteines (up to 5 disulfide bonds and up to 945 bonding patterns), or 12, 14, 16, 18, 20 or even more cysteines.
In one aspect, the total number of disulfide bonding patterns can be summarized by the following formula: $\prod_{i=1}^{n}(2i-1)$, where n = the predicted number of disulfide bonds formed by the cysteine residues, and where $\prod_{i=1}^{n}(2i-1)$ represents the product of (2i-1), with i a positive integer from 1 to n.
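The product of (2i-1) for i from 1 to n is easy to evaluate. The short Python sketch below (the function name is a hypothetical choice, not from the disclosure) reproduces the counts of 3, 15, 105 and 945 bonding patterns quoted for 4, 6, 8 and 10 cysteines.

```python
import math


def bonding_patterns(n_disulfides):
    """Number of ways to pair 2n cysteines into n disulfide bonds: product of (2i - 1), i = 1..n."""
    return math.prod(2 * i - 1 for i in range(1, n_disulfides + 1))


for n in range(2, 6):
    print(f"{2 * n} cysteines, {n} disulfides: {bonding_patterns(n)} patterns")
```

The count grows quickly: 6 disulfides (12 cysteines) already give 10,395 possible fully bonded patterns.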
If desired, larger constructs can be made that encode a large but variable number of cysteines (i.e. 10-30). The resulting cysteine-containing products can fold in a great variety of ways, yielding different combinations of structural elements, each containing 2, 3, 4 or 5 disulfide bonds, with potential cross-links between them. During directed evolution of these large constructs, one can break previously selected constructs into smaller pieces, for example by mechanical shearing, by PCR (e.g. with random primers) or by restriction digestion (e.g. with a 4-bp cutter). Once the library diversity of the long proteins has been reduced, one can raise diversity again by generating multiple fragments from each large construct, followed by shuffling or other directed evolution methods.
One possible concern with such libraries of HDD proteins is the presence of unpaired cysteines after most of the disulfide bonds have formed. Free thiols can interact with one another, producing aggregates that, because of their multivalent binding to the target, tend to score well in binding assays. However, these free thiols can be blocked, for example with iodoacetamide or other well-known sulfhydryl blocking agents, to prevent them from forming aggregates or attacking correctly formed disulfide bonds.
An alignment of the consensus sequences of multiple microprotein families having the same number of disulfide bonds (e.g. three disulfide bonds, which give 15 possible connectivity patterns) shows that the spacing between cysteines is distributed roughly evenly over the range 0 to 12. For simplicity, and to keep the average loop length small, we prefer families with 0-10 amino acids in the loop between each pair of cysteines.
Using synthetic oligonucleotides, one can build a library in which the DNA encodes 6 cysteines with 0-10 NNK (or similar ambiguous codon) residues in each loop between cysteines. NNK codons encode all 20 amino acids but include only one stop codon (three-fold fewer than NNN codons), which reduces the fraction of proteins that contain a premature stop codon. Assuming 5 loops between cysteines, these proteins will contain on average 25 NNK codons (assuming 0-10 amino acids per loop; average 5), giving a low fraction of clones with a premature stop codon. Using fewer than 10 residues per loop, or ambiguous codons (mixed base compositions) that exclude stop codons, increases the fraction of full-length proteins. As shown in the figures, each oligonucleotide begins and ends with a cysteine codon (sense at one end, antisense at the other), with 0-10 NNK codons (or their antisense counterparts) between the cysteine codons. In this way of making synthetic libraries, every loop sequence can be used at any loop position, so all cysteines are generally encoded by the same codon. All oligos are mixed and assembled into genes by overlap PCR as described previously (Stemmer et al. 1995. Gene).
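The effect of the codon scheme on premature stop codons can be checked by enumerating the codons of a degenerate scheme against the standard genetic code. The Python sketch below is illustrative only: the function names are hypothetical, only the N and K ambiguity symbols are handled, and it assumes every codon of a scheme is equally likely and that 25 randomized codons are used, as in the averaged case described above.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Ambiguity symbols used here: N = any base, K = G or T.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(scheme):
    """List every concrete codon encoded by a degenerate scheme such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in scheme))]


def summarize(scheme, n_codons):
    """Return (codon count, stop codons, amino acids encoded, P(no stop in n_codons))."""
    codons = expand(scheme)
    stops = sum(CODON_TABLE[c] == "*" for c in codons)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    p_full_length = (1 - stops / len(codons)) ** n_codons
    return len(codons), stops, len(amino_acids), p_full_length


for scheme in ("NNK", "NNN"):
    print(scheme, summarize(scheme, 25))
```

Under these assumptions NNK gives a noticeably higher fraction of stop-free clones than NNN while still encoding all 20 amino acids.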
A different and efficient way to produce phage libraries is the Scholle variation of Kunkel mutagenesis (Scholle, M. et al. (2005) Comb. Chem. & HTP Screening 8:545-551), in which the library-encoding oligonucleotide converts a stop codon in the plasmid into a non-stop codon. A new format of this method involves cycling repeatedly between any two stop codons (generally the amber codon and the ochre codon). This allows the Scholle method to be applied cyclically to an evolving pool of clones without having to reinsert a stop codon after each mutagenesis cycle.
Scaffold libraries that mix 3SS (3-disulfide; 15 possible structures) and 4SS (105 possible structures) scaffolds are particularly useful. Our main control over the disulfide bonding pattern is the spacing of the cysteines. The structure (disulfide bonding pattern, "DBP") that a protein adopts can also be controlled to some extent by providing particular refolding conditions. The DBP can be analyzed by trypsin digestion and/or MS/MS.
The question of structural diversity is similar for multi-scaffold and single-scaffold libraries, and the difference in degree can be adjusted continuously. In fact, library designs based on cysteine spacing form a continuum: they can vary more or less (an average of 0-15 amino acids per loop) and can resemble existing natural families more or less closely. Single-scaffold libraries generally also contain considerable length variation (mimicking natural variation). Note that families are defined by sequence similarity and that generally only a few members have experimentally determined structures (bonding patterns), so a significant number of native sequences may have structures different from those inferred from their sequences. Naturally highly evolved, highly fine-tuned (i.e. high-information-content) sequences are generally expected to fold reliably into a single fold, whereas proteins with low information content and little fine-tuning (such as those in a naive phage display library and/or those derived from a structurally diverse library after one panning cycle and before directed evolution) generally show several different folds.
Libraries based on the conserved scaffold of a specific natural protein family, such as the Ig domain or fibronectin III, typically contain about 5-10% of clones with various problems (heterogeneous folding, no folding, aggregation or poor expression). Increasing length diversity, or allowing higher sequence and structural diversity, can produce clones that perform worse. Unwanted monomers are usually screened out before further mutagenesis cycles are carried out, including the construction of dimers and higher-order multimers. However, directed evolution tends to be very effective at improving the performance of sub-optimal clones, and by eliminating clones and/or by sequence changes and/or by structural changes one can gradually improve the average quality of a clone population through directed evolution. Directed evolution screening for improved activity, and thereby for improved folding, may be an easy way to improve activity; directed evolution for activity is a proven and effective way to obtain increased protein folding efficiency (Leong, S.R., et al. (2003) Proc. Natl. Acad. Sci. USA 100:1163-1168; Crameri, A., et al. (1996) Nature Biotechnology 14:315-319) and increased temperature stability (many published examples). The reason is that clones that adopt the active structure more efficiently show higher activity and are therefore favored by the screen. The approach we are aiming for is one in which the first few rounds of panning yield many clones that have multiple folds and may at the same time have high levels of various problems (incomplete folding, heterogeneous folding, low expression, aggregation, etc.), and in which the combined use of directed evolution (in its many possible formats, including error-prone PCR, homologous recombination, cassette-based shuffling or simply multiple rounds of screening) and strong selection by (phage) panning is then expected to strongly favor clones that fold homogeneously. It is also possible to repeatedly reduce, refold and pan the same library (with or without phage amplification) to increase the frequency of homogeneously folding clones. A free-thiol affinity column can be used in each cycle to remove incompletely folded proteins, or the free thiols can be reacted with various capping agents (FITC-maleimide, iodoacetamide, iodoacetic acid, DTNB, etc.). It is also possible to refold the whole library, or to partially reduce and reoxidize it, to lower the frequency of free thiols. Phage display and soluble protein binding assays generally favor multivalent solutions. Proteins with interprotein disulfide bonds are a common source of multivalency and need to be removed because they cannot be manufactured. Multiple cycles of phage display (without intervening soluble protein assays) tend to evolve solutions that work only on phage. It is therefore desirable to screen soluble proteins, which generally prevents such clones from taking over. Diversity of protein structure is useful early on, but it is desirable to gradually remove clones that form interprotein disulfide bonds. Structural diversity is correlated with non-deterministic folding and with the presence of interprotein disulfide bonds; structural evolution cannot readily be separated from heterogeneous folding, so methods need to be developed that tolerate a certain degree of heterogeneity.
To evaluate different library designs for the desired balance between structural diversity and folding homogeneity, one can make small libraries and screen a limited number of clones (30-1000) in order to assess the diversity of a library design quickly.
Different disulfide bonds in the same protein may react differently, which allows some control. One way to remove clones with interprotein disulfide bonds from a phage library is to expose the library to a low level of reducing agent that reduces only the weakest disulfide bonds, such as interprotein disulfide bonds and weak intraprotein disulfide bonds. Because such disulfide bonds are too weak, we prefer to eliminate these clones, and the partially reduced library is then passed over a free-thiol column to remove them.
Structural evolution of HDD proteins
As described above, HDD proteins lend themselves to the evolution of protein structure at every level, including the primary (sequence), secondary (α-helix, β-sheet, etc.), tertiary (fold, disulfide bonding pattern) and quaternary (association with other proteins) levels. The ability to change tertiary structure completely makes HDD proteins well suited to the rational design of therapeutic agents or pharmaceutical compositions. Although limited evolution of secondary structure (α-helix, β-sheet) may be achievable with existing directed evolution methods, producing high-quality changes in tertiary structure by directed evolution and rational design is difficult in practice.
Evolution from 2SS to 3SS and to 4SS by addition of disulfide bonds, and the reverse by deletion, appears to occur frequently and has also been demonstrated for snake disintegrins (Calvete, J.J. et al. (2003) Biochem. J. 372:725-734). Relationships among the DBPs of natural families suggest that restructuring of DBPs may also occur in nature, which is supported by publications on specific families such as growth factors.
The 15 different 3SS structures, 105 4SS structures or 945 5SS structures are topologically distinct, meaning that they cannot interconvert without breaking and re-forming disulfide bonds. Each 3SS protein has 6 (fully) disulfide-bonded isomers that are "nearest neighbor" variants (2 disulfides with a changed bonding pattern, 1 disulfide with a retained bonding pattern), and each 4SS protein has 12 nearest-neighbor isomers, each with 2 retained and 2 changed disulfides, thus creating stepwise paths for structural evolution.
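The nearest-neighbor counts above can be verified by brute-force enumeration. The Python sketch below is an illustration under a stated assumption: a "nearest neighbor" is taken to be a fully bonded pattern that retains all but two of the reference disulfides. The function names are hypothetical.

```python
def pairings(cysteines):
    """Yield every way to pair up the given cysteines, each as a frozenset of bonds."""
    if not cysteines:
        yield frozenset()
        return
    first, rest = cysteines[0], cysteines[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for sub in pairings(remaining):
            yield sub | {(first, partner)}


def nearest_neighbors(n_disulfides):
    """Return (total bonding patterns, patterns differing from a reference in exactly 2 bonds)."""
    patterns = list(pairings(tuple(range(1, 2 * n_disulfides + 1))))
    reference = patterns[0]
    neighbors = [p for p in patterns if len(p & reference) == n_disulfides - 2]
    return len(patterns), len(neighbors)


print(nearest_neighbors(3))  # (15, 6)
print(nearest_neighbors(4))  # (105, 12)
```

By symmetry the neighbor count does not depend on which pattern is chosen as the reference, so the first enumerated pattern is used here.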
The process of structural directed evolution involves starting with a diversity of structures (not all possible structures, and at differing frequencies), then gradually narrowing down the structures while gradually and partially modifying them (i.e. through DBP changes), selecting better and better binders along the way. The reason for starting with diverse structures is to expand the effective library size beyond the number of different amino acid sequences. However, the more diverse the structures, the more heterogeneous their folding, so these proteins generally need significant evolution toward homogeneous folding before they become useful. Structures with optimized loop lengths will fold better, be more protease resistant and be less immunogenic. Except at occasional specific positions, the loop sequences do not affect tertiary structure, and the loops tend not to contain secondary structure.
A preferred way to optimize loop length is to start with relatively long loops (e.g. 6, 7 or 8 amino acids) and then gradually reduce their length, replacing each loop with a range of other loops of different sizes (having a lower average size). This approach resembles the tightening of a knot. The position of a loop is generally kept constant (e.g. C2-C3), but loop positions may be changed, particularly when multiple small binding sites in the protein are a useful solution.
A preferred method is to replace loops in a set of selected clones (i.e. loops C1-C2, C2-C3, C3-C4, C4-C5, C5-C6, C6-C7 or C7-C8, C8-C9, C9-C10) with a new set of previously unselected loops of mainly random sequence. By using different codons for the different cysteines, together with a few fixed bases flanking the cysteines where necessary, PCR sites can be created so that loops can be exchanged in a PCR overlap reaction (preferred); alternatively, a restriction site approach can be used.
Different clones in a set selected for binding to a protein target may bind to different sites on that protein. Even if they bind to the same site using similar sequences, the clones may differ in register; for example, some clones have the biologically active sequence in loop 1 and others have it in loop 5. Having more fixed amino acids is likely to result in more clones sharing the same register, which is advantageous for directed evolution by homologous recombination.
There are several ways to recombine a set of selected clones. In most formats the loops are kept intact and are shuffled relative to each other, but there are also formats that use homology between loops to drive homologous recombination. Usually each loop is kept at the same position (e.g. C4-C5), but even this may be varied. In some formats all loops in the set of selected clones are unlinked and then relinked, but a more conservative approach is to unlink only one specific loop (e.g. C4-C5) while keeping the other loops linked, producing a library of clones with only 1-2 crossover points rather than many. The aim is to generate many different stepwise paths, which requires many permutations of conservative changes.
Rather than making libraries with many folds or with only one fold, we prepare libraries with limited variability in spacing, designed to allow selection of a small number of structures (i.e. with a lower limit of 2, 5, 10, 30, 100 or 300 and an upper limit of 10, 30, 100, 300, 1000 or 3000), chosen because their bonding patterns give rigid structures or because they occur in natural families, which provide detailed information about optimal cysteine spacing. An example is cxxx(x)cxxcxxxx(xx)cxxxcxxx(x)xxcxxxx(x)cxxxc.
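A spacing motif of this kind can be turned into a pattern for checking candidate sequences and for counting how many distinct cysteine spacings it allows. The Python sketch below is a hedged illustration, not part of the disclosure: it assumes that "c" is a fixed cysteine, that "x" is any non-cysteine residue, and that a parenthesized group such as "(xx)" stands for zero up to that many optional residues. The function names are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Translate a spacing motif such as 'cxxx(x)cxxc' into an anchored regular expression."""
    parts = []
    for token in re.findall(r"c|x+|\(x+\)", motif.replace(" ", "")):
        if token == "c":
            parts.append("C")  # fixed cysteine
        elif token.startswith("("):
            parts.append("[^C]{0,%d}" % (len(token) - 2))  # optional residues
        else:
            parts.append("[^C]{%d}" % len(token))  # required non-Cys residues
    return "^" + "".join(parts) + "$"


def spacing_variants(motif):
    """Count the distinct length combinations allowed by the optional groups."""
    count = 1
    for group in re.findall(r"\((x+)\)", motif.replace(" ", "")):
        count *= len(group) + 1
    return count


EXAMPLE = "cxxx(x)cxxcxxxx(xx)cxxxcxxx(x)xxcxxxx(x)cxxxc"
pattern = re.compile(motif_to_regex(EXAMPLE))

# Shortest allowed variant: loops of 3, 2, 4, 3, 5, 4 and 3 residues (alanine as filler).
shortest = "C" + "C".join("A" * n for n in (3, 2, 4, 3, 5, 4, 3)) + "C"
print(spacing_variants(EXAMPLE))       # 24
print(bool(pattern.match(shortest)))   # True
```

Under these assumptions the example motif admits 24 spacing variants, which falls inside the range of structure numbers the text gives for such limited-variability libraries.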
Both the effective diversity and the quality of a library are important, but they tend to impose opposing design requirements. Quality depends mainly on the fraction of correctly folded clones. Opening up the theoretical diversity of a library (more randomized amino acid positions) tends to increase the fraction of non-folding clones. Steps that improve folding include using the natural amino acids at each position and retaining naturally conserved residues. This is easy to achieve for single-scaffold libraries but not for multi-scaffold libraries, which must therefore contain a higher fraction of non-folding clones. Randomizing just 2 amino acids that need to be fixed for folding reduces the fraction of folded clones 400-fold (20 x 20), reducing the effective library size.
It is useful to make a variety of libraries and to determine the fraction of folded clones by measuring the proportion of residual free thiols with FITC-maleimide (react, wash, measure bound FITC). It may also be useful to use a solid support with free thiols to remove unfolded clones and/or to refold the whole library or the unfolded clones. One approach is to expose the library to a level of reducing agent that is expected to reduce partially or poorly folded proteins but not stably folded proteins.
A poor library design, however, will have a rather low level of folded clones. One approach is to build multiple single-scaffold libraries separately and to mix them before panning. This yields a high-quality, diverse library.
Heterogeneous folding should be useful if handled appropriately. Because a conventional library contains 10^8-10^9 sequences and one can produce about 10^13 phage particles, each sequence is represented by 10^4-10^5 particles. If each round of panning is 100% efficient (i.e., captures clones of 1 nM or better), then having each sequence present as 10^3 different structures greatly benefits effective diversity, hit rate, and quality. Efficient panning requires a high concentration of target, a high concentration of phage, elevated temperature (faster equilibration), volume-excluding agents such as 10-15% polyethylene glycol (PEG), soluble target rather than immobilized target, and so on.
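The particle counts quoted above can be checked with a short calculation. This is only a sketch of the arithmetic; the function name is hypothetical.

```python
def copies_per_sequence(total_particles: int, distinct_sequences: int) -> int:
    """Average number of phage particles that carry each distinct sequence."""
    return total_particles // distinct_sequences


particles = 10**13
for library_size in (10**8, 10**9):
    copies = copies_per_sequence(particles, library_size)
    print(f"library of {library_size:.0e} sequences: {copies:.0e} particles per sequence")
```

With 10^4-10^5 particles per sequence there are enough copies for each sequence to be presented in on the order of 10^3 different folded structures.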
To help proteins fold correctly, one approach is to fold (at least initially) in the presence of a volume-excluding agent such as PEG, which also markedly increases oligonucleotide hybridization rates and the efficiency of shuffling reactions (overlap PCR of complex fragments). PEG simply raises the effective concentration of thiols, leading to more intrachain and interchain disulfide bonds.
In general, unfolded clones are undesirable, whereas heterogeneous folding is desirable. Unfolding and heterogeneous folding are obviously closely related. Target-induced folding of unfolded clones is particularly useful but may occur only rarely. Because the effective library size of combined-scaffold libraries is expected to be reduced, an efficient mutagenesis strategy is generally preferred. One can choose recombination (shuffling), or length variation and point mutation. Recombination of sequences derived from random libraries may be difficult. For short genes, error-prone PCR has a rather low error rate (0.7%) and requires recloning. Resynthesis requires sequencing of the selected clones, resynthesis of the library, and recloning. In addition, to favor correctly folded clones, one can carry out multiple cycles of panning and amplification in E. coli mutator strains. One can also use the method of Evogenix.
The appeal of the 2-3-4 method is that it adds random sequence at each step by PCR and requires no other form of mutagenesis. Microproteins can be built from new or existing peptide ligands or protein fragments. The method uses short amino acid sequences with or without pre-existing binding properties. The binding amino acid sequence can be flanked at one or both ends by random or fixed amino acid sequences that encode a single cysteine. Oligonucleotides are designed to encode the binding sequence and the flanking cysteine-encoding DNA. The newly introduced cysteines can optionally be flanked by random or non-random sequences. All variants containing the cysteine flanking sequences are mixed, assembled, and converted into double-stranded DNA. The assembled sequences can optionally be flanked by DNA that encodes restriction enzyme recognition sites or that anneals to pre-existing DNA sequences. This method can generate new or existing cysteine distance patterns.
Cysteine-rich repeat proteins (CRRP)
It has been demonstrated that the cysteine-rich repeat antifreeze protein from the beetle Tenebrio molitor can be extended at its C-terminus (C.B. Marshall et al. (2004) Biochemistry, 43:11637-46). This extension contains the CRRP motif 1/2/1. The extreme regularity of this beta-helical antifreeze protein was used to test systematically the relationship between antifreeze activity and the area of the ice-binding site. Each 12-amino-acid, disulfide-bonded central coil of the beta-helix contains a Thr-Xaa-Thr ice-binding motif. By adding coils to, or deleting coils from, the seven-coil parent antifreeze protein, a series of constructs with 6-11 coils was prepared. Misfolded forms of these antifreeze proteins were removed by ice-affinity purification so that the specific activity of each construct could be compared accurately. Antifreeze activity increased 10- to 100-fold on going from 6 to 9 coils, depending on the concentration at which the comparison was made.
Our interest is in preparing proteins derived from the antifreeze protein that have multiple repeats, randomizing them at the least conserved amino acid positions, and using them to select binders (agonists or antagonists) against selected human therapeutic targets.
Granulins (Figures 102 and 103) are naturally occurring CRRPs (helical; see Figures 130-132) with a CRRP motif of 3/2/2. Evidence has been presented that each repeat unit is highly modular and can therefore be used to extend the core unit by adding multiple repeats to the C-terminus (D. Tolkatchev et al. (2000) Biochemistry, 39:2878-86; W.F. Vranken et al. (1999) J Pept Res, 53:590-7). After air oxidation, a peptide corresponding to the 30-residue N-terminal subdomain of carp (Cyprinus carpio) granulin-1 spontaneously forms the disulfide pairing observed in the native protein. Structural characterization by NMR showed that defined secondary structure is present in this peptide. Structure calculations showed that the peptide fragment adopts the same conformation as it does in the native protein. The 30-residue N-terminal peptide of carp granulin-1 is the first example of an independently folding stack of two beta-hairpins reinforced by disulfide bonds between the hairpins.
Our interest is in preparing proteins derived from granulin that have multiple repeats, randomizing them at the least conserved amino acid positions, and using them to select binders (agonists or antagonists) against human therapeutic targets (Figure 102).
Advantages of the repeat protein structure and affinity maturation: An advantage of CRRPs is that, unlike most other domains, they can be made to whatever length a particular application requires. They can therefore provide 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more binding sites for the same or different targets.
An advantage of CRRPs over leucine-rich and other non-cysteine-containing repeat proteins is that more amino acids can be randomized in a library, because the folding of CRRPs depends on the presence of disulfide bonds rather than on the presence of a hydrophobic core, which requires more fixed residues. CRRP libraries therefore contain clones with a larger proportion of variable positions (>50, 60, 70 or 80%), which increases the potential surface contact area and the capacity for high-affinity binding to a target. Leucine-rich repeat proteins, such as [Figure A20068003404900701] proteins, typically vary 6 amino acids per 33-amino-acid repeat, or 24 amino acids per 6-repeat domain, because the end caps are not randomized.
Various affinity maturation methods are shown in Figures 140, 14, 142 and 160. The principles of affinity maturation are best explained using repeat proteins but apply equally to all the other scaffolds described in this application.
Affinity maturation of CRRPs can be achieved by two different strategies: module addition and module replacement.
The "module addition" approach starts with a relatively small number of repeat units (for example, 1-3) and adds randomized repeat units at each step of affinity maturation, followed by selection of binders. In each cycle of evolution, one or a few new, randomized modules are added and the most active clones are then selected. This approach increases the size of the protein in each cycle, with selection for the desired binding activity after each round of extension. It converts randomized sequence into selected sequence.
The "module replacement" approach starts with a relatively large number of repeats (for example, 4-10; the "final number"). In each round of library construction a new set of repeats (usually 1-3) is randomized, followed by selection for target binding. In this approach the size of the protein remains constant. Unselected sequence (generally fixed) is gradually converted into randomized sequence, which in turn is converted into selected sequence.
Both approaches yield repeat proteins with a single large binding site or with multiple separate binding sites, selected for improved binding affinity to 1, 2, 3, 4, 5, 6 or more targets. Adding repeats allows the binding site to be extended, which improves binding affinity compared with a domain that binds its target at a single site. Repeat protein domains can be connected to other repeat protein domains by short linker sequences that do not contain repeat sequences. This organization resembles that found in natural repeat proteins, which usually occur in tandem, are connected by short amino acid sequences, and are interspersed with non-repeat proteins (H.K. Binz et al. (2005) Nature Biotechnology).
However, repeat proteins can also be used to form a rigid connection between two binding sites so that both sites can bind the target simultaneously. In contrast to the flexible peptide linkers that usually occur between separate domains, rigid linkers based on repeat proteins are expected to yield higher binding affinities. Another way to produce a rigid linker between binding sites is to use self-coiling proline-rich sequences or collagen-like sequences.
Affinity maturation is carried out by (partial) randomization at the DNA level, of either a single contiguous sequence or multiple discontinuous sequences. Successive randomization steps can likewise be discontinuous or continuous (i.e., sequential) at the DNA level. At the protein level, mutagenesis can also be discontinuous or continuous, depending on the application. For helical repeat proteins, for example, discontinuous maturation at the level of the DNA and the protein chain is generally used in order to obtain a continuous binding surface on one side of the protein. This is termed discontinuous because the randomized amino acids are discontinuous along the alpha-chain backbone and at the DNA level, even though the randomized region is continuous on the protein surface. Continuous maturation, on the other hand, involves randomizing a contiguous set of amino acids at the level of the DNA and the protein backbone, so that all sides of the helix are randomized and can become binding sites for the target, allowing more complex three-dimensional interactions between the repeat protein and the target protein. In discontinuous (DNA-level) affinity maturation, the fixed sequences shared between the randomized sequences can be used to recombine clones within a library or between libraries, by restriction enzymes or overlap PCR, providing an additional step that increases the number of clones that can be screened for improved binding affinity.
A preferred affinity maturation method is sequential randomization: (partially) randomize a first region of the scaffold protein and select a set of best clones; then randomize a second region in this selected set of clones and select a (second) set of best clones; then randomize a third region in the clones of the second set and select a (third) set of improved clones. This is shown, for example, in Figure 136. In a preferred approach the three mutagenesis regions (n-terminal, middle and c-terminal) do not overlap. Any order of mutagenesis can be used, but n-terminal/middle/c-terminal and n-terminal/c-terminal/middle are preferred. It is useful to leave 15-20 bp of scaffold sequence unmutated between the mutagenesis regions to serve as annealing sites for the oligonucleotides used in Kunkel-type mutagenesis. This approach avoids synthetic re-mutagenesis of previously mutagenized sequences, a time-consuming process that usually requires sequencing of clones, sequence alignment, deduction of family motifs, resynthesis of oligonucleotides encoding those motifs, and generation of a new synthetic library. A preferred form uses codon choices such that randomization mainly yields the amino acids that occur naturally at each position.
Synthetic CRRP
Synthetic CRRPs consist of the motif Ca x(0-n) Cb x(0-n) Cc x(0-n) Cd x(0-n) Ce x(0-n) Cf x(0-n) Cg x(0-n) Ch x(0-n) Ci x(0-n) Cj x(0-n), where each C is a cysteine residue at a defined position and x can be any number of amino acids between 0 and 12 between each pair of cysteines. These designs are defined by CRRP motifs, for example [Figure A20068003404900721]: the cysteine distance within individual disulfide bonds, and the cysteine distance between the first cysteine of one disulfide bond and the first cysteine of the next disulfide bond. The following motifs can be used for library design: 3/4/1, Ca x(0-n) Cb x(0-n) Cc x(0-n) Cd x(0-n) Ce x(0-n) Cf x(0-n) Cg x(0-n), wherein Ca forms a disulfide bond with Cd; (3,4)/(1,4)/2, Ca x(0-n) Cb x(0-n) Cc x(0-n) Cd x(0-n) Ce x(0-n) Cf x(0-n) Cg x(0-n), wherein Ca forms a disulfide bond with Cd and Cc forms a disulfide bond with Cg; (4/2), (3/1), Ca x(0-n) Cb x(0-n) Cc x(0-n) Cd x(0-n) Ce x(0-n) Cf x(0-n) Cg x(0-n), wherein Ca forms a disulfide bond with Ce; (3,5)/(1,2)/2, Ca x(0-n) Cb x(0-n) Cc x(0-n) Cd x(0-n) Ce x(0-n) Cf x(0-n) Cg x(0-n), wherein Ca forms a disulfide bond with Cf, Cb forms a disulfide bond with Ce, and Cd forms a disulfide bond with Ci; (3,5,7)/(1,2,3)/3, wherein Ca forms a disulfide bond with Cf, Cb forms a disulfide bond with Ce, and Cc forms a disulfide bond with Cj; (4,5)/(1,4)/2, wherein Cd forms a disulfide bond with Ci and Cf forms a disulfide bond with Cj (see Figures 125-133).
New CRRPs can be designed as follows: start with a single-domain family containing disulfide bonds of unknown topology, and extend the motif at the N- or C-terminus. To achieve a disulfide link between two repeat units, it may be necessary to introduce two additional cysteine residues by site-directed mutagenesis. The topology 1-4 2-5 3-6 is the most common disulfide topology among small cysteine-rich microproteins. A domain with this topology can be extended by adding repeats of related topology. Cysteine residues are introduced at a position between cysteine 1 and cysteine 2 and at a position after cysteine 6. Even with the two additional cysteines present, the 1-4 2-5 3-6 topology tends strongly to form, because the structural scaffold permits only this topology.
Connecting different structures: see Figures 146, 147 and 148. Microprotein modules can be connected in many different ways. For example, a C5C5C5C5C5C module with 1-4 2-5 3-6 topology can be connected to another such module without a linker, yielding a C5C5C5C5C5CC5C5C5C5C5C module. Modules can be connected with a structured PPPP linker. Alternatively, cysteine-rich repeat modules can be used to connect two modules. Granulin-like repeat units serve as linkers with the common repeat motif (CC5)n. Fusion can also be achieved through linkers containing two disulfide bonds with 1-3 2-4 topology and the motif (Cx(0-n)Cx(0-n)Cx(0-n)C)n, where x is any number of amino acids from 0 to n=12. Antifreeze protein repeats (2CA5CB3)n, in which a disulfide bond forms between CA and CB, serve as linkers between different modules or connect microproteins to other proteins.
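The module-joining schemes described above can be written out explicitly. The sketch below is illustrative only: it assumes the compact notation used in the text, in which a digit is the number of non-cysteine residues (written x) between two cysteines, and the helper names `expand` and `join_modules` are hypothetical.

```python
import re


def expand(compact: str) -> str:
    """Expand compact spacing notation, e.g. 'C5C' -> 'CxxxxxC'.

    Each run of digits is replaced by that many 'x' (non-cysteine) residues;
    letters such as C or P are kept unchanged.
    """
    return re.sub(r"\d+", lambda m: "x" * int(m.group()), compact)


def join_modules(modules, linker: str = "") -> str:
    """Concatenate modules in compact notation, optionally with a linker."""
    return linker.join(modules)


module = "C5C5C5C5C5C"                              # 1-4 2-5 3-6 module from the text
direct = join_modules([module, module])             # no linker
proline = join_modules([module, module], "PPPP")    # structured PPPP linker
granulin_like = "CC5" * 2                           # (CC5)n linker with n = 2

print(direct)             # C5C5C5C5C5CC5C5C5C5C5C
print(expand(direct))
print(expand(proline))
print(expand(granulin_like))
```

Joining two six-cysteine modules directly gives a twelve-cysteine pattern in which the last cysteine of the first module is adjacent to the first cysteine of the second.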
Design of a typical synthetic repeat protein: The natural design of repeat proteins allows individual building blocks to be added repeatedly to a core motif. This process can be mimicked by evolution in vitro. The antifreeze protein contains a typical 3-disulfide microprotein as its N-terminal cap (CaxxxxxCbxxCcxxxCdxxCexxCfxxxx). Molecular biology can be used to add part of this structure to the C-terminus of this sequence. There are two possibilities for the choice of repeat unit: xCbxxCcxxxCdxxCex or xxCbxxCcxxxCdxxCexxCfx can be added successively to the C-terminus to design new repeat proteins. See Figure 104.
Design of synthetic scaffolds based on the CXCXCCXCXC motif: Several microprotein families contain a motif consisting of the following pattern: Cxxxxxx(xxxxxxx)Cxxxxxx(xxxxxxx)CCxxxxxx(xxxxxxx)Cxxxxxx(xxxxxxx)C, with the disulfide topology 1-4 2-5 3-6. This common consensus sequence is used for library design. Additional cysteines and disulfide spacings can be included. The spacing between the two cysteines of each disulfide bond averages 13-15. Extra cysteine pairs beyond the basic motif are shown in blue or cyan italics; linked cysteines have the same color.
Figure A20068003404900732
(conotoxin) CxxxxxxCxxxxxxxxCCxxxxxCxxxxxxxC
(toxin 30) CxxxxxxCxxxxxxCCxxxxxCxxxxxxCxxx
(GURMARIN)CxxxxxxCxxxxxxCCxxxxCxxxxxxxxxCxx
Figure A20068003404900733
                  1-4   2-5   3-6   Other SS
Toxin 12           13    12    17
Conotoxin          15    15    14
Toxin 30           14    13    13
Gurmarin           14    12    15
Toxin 7            15    13    15   6-7
Chitin BDG         14    11    13   7-8
Agouti protein     14    13    16   5-10, 7-8
Toxin 9            15    15    15
Average            14    13    15
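The averages in the last row of the table can be reproduced from the individual rows. The following sketch assumes a particular reading of the flattened table, namely that the first data row is "Toxin 12" with spans 13, 12 and 17; the variable names are hypothetical.

```python
# Spans of the three disulfide bonds, one value per family, in table order.
# Assumption: the first data row is read as "Toxin 12" with spans 13, 12, 17.
span_1_4 = [13, 15, 14, 14, 15, 14, 14, 15]
span_2_5 = [12, 15, 13, 12, 13, 11, 13, 15]
span_3_6 = [17, 14, 13, 15, 15, 13, 16, 15]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


averages = {
    "1-4": mean(span_1_4),
    "2-5": mean(span_2_5),
    "3-6": mean(span_3_6),
}
for bond, value in averages.items():
    print(f"{bond}: mean {value:.2f}, rounded {round(value)}")
```

Under this reading the means are 14.25, 13.00 and 14.75, which round to the 14, 13 and 15 given in the table.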
The Swissprot database contains 44 members with spacing 6,5,0,3; 57 members with spacing 6,5,0,4; 34 members with spacing 6,6,0,3; and 27 members with spacing 6,6,0,4. The last spacing (between Cys5 and Cys6) can range from 4 to 6 amino acids.
Cysteine distance patterns (CDPs): The most common way of classifying natural proteins into families is based on protein sequence homology. These algorithms aim to group proteins according to the relatedness of their sequences, which in most cases reflects evolutionary distance. They align sequences so as to maximize the number of identical or chemically related amino acids matched at each position, and gaps are usually introduced to improve the alignment. Such homology-based sequence families are commonly used to identify proteins that tolerate substantial variation and can therefore serve as scaffolds for new binding proteins. Homology-based families are of limited use for designing microprotein-based libraries, however, because the degree of sequence conservation among related microproteins is low. The sequences of closely related microproteins often share little sequence homology apart from their conserved cysteine residues. By introducing gaps, homology-based search algorithms complicate the alignment of microprotein sequences, and this alignment is critical for identifying residues that can be mutated and residues that are vital for protein structure and/or stability. Microproteins differ from most other proteins in their high density of cysteine residues, and this group calls for an alignment method in which cysteine spacing is the key parameter, allowing microproteins to be grouped into clusters that share the same cysteine distance pattern (CDP). A cysteine distance cluster is thus a set of protein sequences whose cysteine residues are separated by the same numbers of amino acids. The sequences of all members of a cysteine distance cluster are readily aligned because all cluster members have the same total length. In addition, one can easily calculate the average amino acid composition at each position of the sequence. This greatly simplifies identification of the residues that may be varied, and of the degree of variation, when constructing a microprotein library. Large clusters of microproteins with the same CDP are particularly useful for designing microprotein libraries because they provide detailed information about the natural variability at each position.
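The clustering described above can be sketched in a few lines. This is an illustrative sketch, not the procedure used to build the tables in this document; the function names are hypothetical and the three example sequences are invented for the demonstration.

```python
from collections import Counter, defaultdict


def cysteine_distance_pattern(seq: str) -> tuple:
    """Number of non-cysteine residues between each pair of adjacent cysteines."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "C"]
    return tuple(b - a - 1 for a, b in zip(positions, positions[1:]))


def cluster_by_cdp(sequences):
    """Group sequences that share the same cysteine distance pattern."""
    clusters = defaultdict(list)
    for seq in sequences:
        clusters[cysteine_distance_pattern(seq)].append(seq)
    return dict(clusters)


def column_composition(members):
    """Amino acid counts at each position, from first to last cysteine.

    Members of one CDP cluster have equal first-to-last-cysteine length,
    so they can be stacked column by column without gaps.
    """
    domains = [s[s.index("C"): s.rindex("C") + 1] for s in (m.upper() for m in members)]
    return [Counter(column) for column in zip(*domains)]


# Invented toy sequences (not natural proteins).
toy = ["GCAKTCPQSDDCLC", "CRNSCEWHGVCYCK", "CAACGGGC"]
clusters = cluster_by_cdp(toy)
for cdp, members in clusters.items():
    print(cdp, members)

composition = column_composition(clusters[(3, 5, 1)])
print(composition[1])  # residues seen at the first loop position
```

Because no gaps are needed within a cluster, the per-position counts directly indicate which positions vary among the members and which are conserved.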
A CDP cluster is typically a subset of related microprotein sequences. In many cases all members of a CDP cluster come from the same family of homologous proteins. However, there are CDP clusters that contain members from multiple protein families. One example is CDP cluster 3_5_4_1_8 (sometimes written C3C5C4C1C8 or CxxxCxxxxxCxxxxCxCxxxxxxxxC), which contains 51 members, some from family PF00008 and others from family PF07974. A sequence with this CDP can (in principle) adopt either of two structures. CDPs of this kind, which are compatible with different structures, are preferred for achieving structural evolution.
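The three notations given above for the same CDP can be converted into one another mechanically. The following is a minimal sketch; the function names are hypothetical, and the compact form follows the text in leaving the closing cysteine implicit.

```python
def parse_cdp(cdp: str) -> list:
    """'3_5_4_1_8' -> [3, 5, 4, 1, 8] (loop lengths between adjacent cysteines)."""
    return [int(n) for n in cdp.split("_")]


def compact_form(loops) -> str:
    """Compact notation as written in the text; the final cysteine is implicit."""
    return "".join(f"C{n}" for n in loops)


def expanded_form(loops) -> str:
    """Explicit pattern with one 'x' per non-cysteine residue."""
    return "C" + "".join("x" * n + "C" for n in loops)


loops = parse_cdp("3_5_4_1_8")
print(compact_form(loops))   # C3C5C4C1C8
print(expanded_form(loops))  # CxxxCxxxxxCxxxxCxCxxxxxxxxC
```

Five loop lengths imply six cysteines, which is why the expanded form ends in a cysteine that the compact form does not write out.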
Because the disulfide bonding pattern (DBP) is difficult to control directly, whereas the CDP is easily controlled by gene synthesis, the CDP is the most preferred means of controlling DBP and structure.
Identification of useful CDPs: Useful CDPs can be found by analyzing protein databases such as Swiss-Prot or Translated EMBL (TrEMBL). A database that combines information from Swiss-Prot and Pfam and annotates cysteine bonding patterns has been described by Gupta (Gupta, A. et al. (2004) Protein Sci, 13:2045-58). These databases can be searched for protein sequences containing a high percentage of cysteine residues, which is typical of microproteins. One can calculate the distances between conserved or adjacent cysteine residues to obtain the CDP, and then search for CDPs that occur repeatedly. A CDP is particularly meaningful if many natural sequences share it, because this suggests that the CDP tolerates extensive sequence diversity. Useful CDPs avoid long distances between adjacent cysteine residues ("long loops"), because long loops are more likely to be attacked by proteases and more likely to yield peptides long enough to bind in the cleft of MHC molecules. Of particular interest are CDPs in which no distance exceeds 15, 14, 13, 12 or 11 amino acids. More preferred are CDPs in which no distance between adjacent cysteine residues exceeds 10, 9 or 8 residues. Of particular interest are CDPs from families with a low abundance of hydrophobic amino acids such as tryptophan, phenylalanine, tyrosine, leucine, valine, methionine and isoleucine. These hydrophobic residues occur in typical proteins at a frequency of about 34% and are associated with non-specific, hydrophobic binding. CDPs of particular interest contain multiple members with fewer than 30, 28, 26, 24 or 22% hydrophobic residues. Also of particular interest are CDPs whose members show high sequence diversity. Table 2 gives examples of CDPs that can serve as very useful microprotein library scaffolds. Table 3 gives the most preferred CDPs.
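The selection criteria listed above (no long loops, low hydrophobic content in multiple members) can be expressed as a simple filter. This is a sketch under stated assumptions: the thresholds are two of the values named in the text, `min_members` is a hypothetical parameter standing in for "multiple members", and the example sequences are invented.

```python
# Hydrophobic residues named in the text: Trp, Phe, Tyr, Leu, Val, Met, Ile.
HYDROPHOBIC = set("WFYLVMI")


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues in `seq` that belong to the hydrophobic set."""
    seq = seq.upper()
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def is_useful_cdp(loops, member_sequences, max_loop=10,
                  max_hydrophobic=0.30, min_members=2) -> bool:
    """Apply the loop-length and hydrophobicity criteria to one CDP cluster.

    `min_members` is an assumption made for this sketch; the text only asks
    for "multiple" members below the hydrophobicity threshold.
    """
    if max(loops) > max_loop:
        return False
    low = [s for s in member_sequences if hydrophobic_fraction(s) < max_hydrophobic]
    return len(low) >= min_members


# Invented toy cluster with CDP (3, 5, 1).
members = ["GCAKTCPQSDDCLC", "CRNSCEWHGVCYCK"]
print([round(hydrophobic_fraction(s), 3) for s in members])
print(is_useful_cdp((3, 5, 1), members))    # short loops, low hydrophobicity
print(is_useful_cdp((3, 16, 1), members))   # rejected: one loop is too long
```

Tightening `max_loop` or `max_hydrophobic` corresponds to moving along the series of progressively stricter limits given in the text.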
Table 2. List of exemplary CDPs.
Figure A20068003404900771
Figure A20068003404900781
The column labeled "members" shows the number of natural sequences with the particular CDP that were identified in the database described by Gupta (Gupta, A. et al. (2004) Protein Sci, 13:2045-58). "n" is the number of disulfide bonds in the cluster. "Domain length" is the number of amino acid residues in the CDP (first cys to last cys). Columns n1 to n7 list the numbers of non-cysteine residues that separate the cysteine residues in the cluster. For example, n2=6 means that the loop between C2 and C3 is 6 amino acids long, not counting the cysteines.
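The relationship between the table columns defined above can be made explicit with a short sketch. The function name is hypothetical, the loop lengths are invented so that n2 = 6 as in the example in the text, and the value of "n" assumes that every cysteine takes part in a disulfide bond.

```python
def cdp_table_row(loops) -> dict:
    """Derive the table columns from the loop lengths n1, n2, ... of one CDP.

    Assumption: all cysteines are paired, so n = (number of cysteines) // 2.
    Domain length runs from the first to the last cysteine, inclusive.
    """
    n_cys = len(loops) + 1
    row = {"n": n_cys // 2, "domain_length": sum(loops) + n_cys}
    row.update({f"n{i}": length for i, length in enumerate(loops, start=1)})
    return row


# Hypothetical loop lengths, chosen so that the loop between C2 and C3 is 6 long.
row = cdp_table_row([3, 6, 4, 1, 8])
print(row)
```

Here six cysteines and 22 loop residues give a domain length of 28 and three disulfide bonds.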
Table 3. List of exemplary CDPs.
Figure A20068003404900791
Figure A20068003404900801
"Members" gives the number of natural sequences with the particular CDP that were identified in the database described by Gupta (Gupta, A. et al. (2004) Protein Sci, 13:2045-58). "n" gives the number of disulfide bonds in the cluster. "Domain length" gives the number of amino acid residues in the CDP (first cys to last cys). Columns n1 to n7 list the numbers of non-cysteine residues that separate the cysteine residues in the cluster ("loop lengths").
Some loops between cysteines need to be fixed in size, while other loops may accommodate some length diversity. The length diversity present in a family of natural sequences is one way of estimating what length variation a particular loop can accept. The permitted length variation ranges from -10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acids to +1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids.
Directed evolution of DBPs and protein folding of clone populations: The large number of disulfide bonding patterns (DBPs) is an additional degree of freedom for optimizing HDD ('high disulfide density') proteins that is not available for non-HDD proteins, even those with many disulfide bonds. One factor is that in larger proteins, cysteines that are far apart cannot react unless other fixed sequences fold the protein so as to bring the cysteines together at high local concentration and in the correct orientation. Cysteines therefore play a relatively minor role in the folding of larger proteins. Larger proteins with a hydrophobic core tend to have many side-chain contacts that participate in forming the 3D structure. In this so-called high information content regime, as defined by Hubert Yockey (1974), the DBP is statistically locked in place, and evolutionary change in the DBP is very unlikely. Structural evolution may apply only to proteins with low information content, which have fewer residues required for structure and function. The information content of a protein, defined as its sensitivity to random mutagenesis, does not simply increase over time as a function of the evolutionary age of the protein. For example, upon gene duplication one of the two copies is free to evolve and effectively has a very low information content, even though its information content would be high if only one gene copy existed. When information content is low, large numbers of amino acid mutations and major structural changes can occur that would be lethal if they took place in a single-copy gene. The information content of a protein also depends on the particular functional aspect considered; some functions (i.e., catalysis) have a higher information content than others (i.e., a vaccine based on a nine-amino-acid T cell epitope). Redundancy is common in venomous animals, each species of which typically has more than 100 different toxins in its venom, derived from the same or different genes. Redundancy favors the rapid evolution of HDD proteins, whether as multiple copies of the same gene encoding a widely diverse set of toxins and/or as single copies of different genes.
A set of clones selected for target binding may supply only part of a domain (a subdomain or microdomain, or one or more loops) with binding function. The best clone in a typical 10^10 library has only about 7 fully optimized amino acids. This is because the maximum (average) information content that can be added in one panning cycle is the size of the library (i.e., 10^10). Multiple cycles of library generation and screening are usually needed to accumulate information content. Three cycles of 10^10 could in theory yield an information content of up to 10^30, but because of practical limits on additivity the actual figure is generally far lower. In general, most of the amino acids in a domain do not contact the target directly and can be replaced by several, if not all, other amino acids. One aim of structural evolution is to evolve the DBP of the non-binding part, yielding a modified structure that binds the target with higher affinity without any change in the amino acid sequence of the target-binding part.
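The figure of about 7 optimized amino acids is consistent with asking how many positions can be fully specified over a 20-letter alphabet by a library of 10^10 clones, since 20^7 is about 1.3 x 10^9 and 20^8 is about 2.6 x 10^10. The sketch below shows this calculation; it is one interpretation of the statement above, and the function name is hypothetical.

```python
import math


def optimizable_positions(library_size: float, alphabet_size: int = 20) -> float:
    """Number of positions k such that alphabet_size**k equals library_size."""
    return math.log(library_size) / math.log(alphabet_size)


one_cycle = optimizable_positions(1e10)
three_cycles = optimizable_positions(1e30)   # theoretical upper bound from the text
print(f"one cycle of 10^10: {one_cycle:.1f} positions")
print(f"three cycles (10^30): {three_cycles:.1f} positions")
```

The three-cycle value is a theoretical ceiling; as the text notes, practical limits on additivity keep the achievable figure well below it.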
A preferred approach is to obtain information on many structures from each single sequence, so that once diversity has been reduced in the first cycle, or by one or more panning cycles, and a large number (>10^4) of copies of each phage clone are present, each copy can adopt a different DBP and structure. One way of increasing structural diversity in the library before panning is to heat the library for 10-30 seconds, to remove any partially folded structures that may have formed, and then suddenly to add a high concentration of oxidant to the library. Sudden formation of disulfide bonds, before the protein has been able to anneal and make use of its folding pathway, should increase diversity, although this approach may reduce the average quality of the folds obtained. The opposite approach is used to obtain homogeneous folding and generally involves gradual removal of the reducing agent by dialysis, resulting in gradual folding and gradual sulfhydryl oxidation. This approach may also involve a stepwise decrease in temperature, similar to the annealing of oligonucleotides. If DBP diversification is applied to the library in the first round of panning, it is important to produce a large excess of library, for example 10^5-fold more particles than the number of different clones (typically 10^9-10^10), so that a large number of different structures can be generated from each sequence.
Diversification of DBPs: The spectrum and distribution of DBPs can be varied by subjecting aliquots of the same library to a variety of different conditions. These conditions include a range of pH, temperature, oxidants, reducing agents such as DTT (dithiothreitol), BME (beta-mercaptoethanol) and glutathione, polyethylene glycol (molecular crowding, so that rare DBPs may become more frequent), and so on.
Multi-scaffold libraries: To identify microprotein domains that bind a target with high affinity, multi-scaffold libraries can be used according to the following three steps:
1. Construct sub-libraries based on many scaffolds or cysteine distance patterns (CDPs) and randomization schemes.
2. Identify initial hits by panning a large number of sub-libraries against the target. This can be done by panning each library separately or by panning a mixture of sub-libraries.
3. Optimize the initial hits by affinity maturation, a cyclic process comprising mutagenesis and selection or screening.
The use of multi-scaffold libraries differs markedly from the traditional approach, which concentrates on a single scaffold. In a single-scaffold library, most library members have a similar overall structure or fold and differ mainly in their amino acid side chains. Examples of single-scaffold libraries are those based on fibronectin (Koide, A. et al. (1998) J Mol Biol, 284:1141-51), lipocalin (Beste, G. et al. (1999) Proc Natl Acad Sci U S A, 96:1898-903) or the protein A domain (Nord, K. et al. (1997) Nat Biotechnol, 15:772-). Many other scaffolds are described in Binz, H.K. et al. (2005) Nat Biotechnol, 23:1257-68. In some cases single-scaffold libraries contain members that show small differences in the length of individual loops, for example the CDRs of antibody libraries. Single-scaffold libraries tend to cover a limited amount of shape space. As a result, low-affinity binders are usually obtained. These molecules do not match the shape of their target well; instead, the amino acids that form the contact area have been optimized to compensate in part for the lack of shape complementarity. Many publications describe attempts to increase library size (i.e., ribosome display, combinatorial phage libraries) in order to enhance amino acid diversity at the contact surface between scaffold and target. The initial hits produced by a single-scaffold library can be optimized further by affinity maturation. However, this process generally concentrates on small changes in the outer, CDR-like loops of the binding protein and does not affect the overall structure of the domain. There are no examples in which affinity maturation of a fixed scaffold has led to major changes in the overall fold and structure of the binding protein; in the rare cases where larger changes occur, such clones are usually eliminated because their immunogenicity and manufacturing properties are considered unpredictable.
Multi-scaffold libraries contain clones with a diversity of (usually unrelated) scaffolds that differ greatly in overall structure. Typically, each CDP represents a different shape, and each sub-library contains a set of mutants that sparsely sample the sequence space around a particular CDP. Testing molecules of many different shapes (from multiple sub-libraries, each with a different CDP) increases the chance of identifying a binding protein whose structure is closely complementary to the target surface. Because each sub-library represents a relatively small sample of the sequence space around its CDP, the optimal binding sequence is unlikely to be obtained by this approach. The initial hits from a multi-scaffold library match the shape of their target, but the fine structure of the contact surface between hit and target may be suboptimal. As a result, binding affinity may be improved further in a subsequent affinity maturation process that concentrates on optimizing the particular protein sequence without substantially changing its structure. In brief, the aim is to find the best structure to fit the target and then to find the best sequence for that structure, providing the best complementarity with the target.
Experimental approach to finding new scaffolds: Another approach to library design is to let a diversity of designs compete and to count which proteins provide the best solution. Proteins that are fully folded and well expressed are selected and sequenced. Designs that yield the highest fraction of folded proteins (corrected for input numbers) are preferred. Several different methods can be used to find preferred CDPs and sequence motifs.
Method 1: Random CDP, random sequence
The random spacing and sequence approach is not based on spacings or sequences that exist in natural diversity, and can therefore discover new cysteine-containing spacing patterns in proportion to their ability to accept random sequences.
This method involves preparing a broad, open library, such as a 10^10 display library with the design CX(0-8)CX(0-8)CX(0-8)CX(0-8)CX(0-8)C, then using an agarose gel to select for a total length of 25-35 amino acids, expressing in E. coli, then (optionally) using a free-thiol column to remove all unfolded proteins from the display library (or screening individual clones for expression), and sequencing 200-1000 clones that encode well-expressed, fully folded proteins.
All spacing patterns occur at similar frequency in the library. We expect to find a strong bias toward the spacing patterns that occur in natural proteins, but many of the spacing patterns will be new. For example, if spacing pattern A allows only 0.01% folded protein while pattern B yields 10% folded protein, clones with pattern B should occur 1000 times more frequently than clones with pattern A. Sequencing 1000 clones should be sufficient to identify 10-30 spacings that can fold regardless of loop sequence. Many of the spacing patterns found by this method are likely to be new, and are then used to prepare separate libraries based on these spacings.
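The enrichment argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `relative_frequency` is hypothetical, and it assumes that both spacing patterns start at equal frequency in the library and that selection retains clones strictly in proportion to their folded fraction.

```python
from fractions import Fraction

def relative_frequency(folded_fraction_b, folded_fraction_a):
    """Expected ratio of pattern-B to pattern-A clones after selection for folding.

    Assumes (illustratively) equal starting frequencies for both patterns.
    """
    return Fraction(folded_fraction_b) / Fraction(folded_fraction_a)

# Values from the text: pattern A gives 0.01% folded protein, pattern B gives 10%.
ratio = relative_frequency(Fraction(10, 100), Fraction(1, 10000))
print(ratio)  # 1000
```

Exact fractions are used instead of floats so that the ratio comes out as exactly 1000.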
In a further aspect, the new spacings found by this method are generally combined with spacings based on natural families.
Method 2: Natural CDP, random sequence
The CDPs of 10-100 specific natural families are synthesized with a random amino acid composition (i.e. NNN, NNK, NNS or similar codons), converted into a library as a single pool, folded and expressed, selected or screened as described above, and the best folding and expressing clones are then sequenced. This method ranks the scaffolds of natural families according to their ability to accept random sequences. It tends to yield a higher average quality, because the fraction of folded clones is higher than with the random-CDP method, but it cannot evaluate as many scaffolds.
After the preferred spacing patterns have been selected, we determine which non-cys residues are required to improve folding within a specific spacing pattern.
Method 3: Natural CDP, natural amino acid sequence mixture
The spacing patterns of 10-100 specific natural families are synthesized using the natural mixture of amino acids present at each position (determined from alignments), converted into a library as a single pool, folded and expressed, selected or screened as described above, and the best folding and expressing clones are then sequenced. This method tends to yield a high average quality, and the fraction of folded clones will be much higher than with the previous methods, but it is limited to a more or less high-density search of sequence space that evolution has already explored.
The highest-quality libraries (i.e. those that can be used immediately for commercial purposes) are produced by synthesizing natural families (natural CDPs) with all non-cys residues fixed, but with some variation at each position. Sequence analysis of well-folded clones tells us which fixed residues are really required and which residue variations are allowed.
Structure evolution: The folding of disulfide-containing proteins into a well-defined 3-D structure depends mainly on the nature of the reducing environment in vivo and in vitro. For example, reduction of the disulfide bonds can cause complete loss of protein structure, which underlines the importance of disulfide bonds for maintaining structure. On the other hand, during the folding of a fully reduced and unfolded protein, many theoretical disulfide isomers can arise through oxidation of cysteines that come into close contact during folding. A protein containing four cysteines has three theoretical disulfide isomers, one with 6 cysteines has 15 isomers, one with 8 cysteines has 105 isomers, and so on. These diverse and usually unproductive isomers are also observed during protein folding, but the native conformation usually shows only one combination of cysteine pairings. This is why most researchers regard disulfide isomers as a major problem in in vitro folding studies. However, disulfide isomerization can be exploited for the evolution of structural diversity in disulfide-rich microproteins. Because of their small size and high disulfide content, these proteins usually rely solely on the covalent bonds between cysteines to maintain the folded conformation. Many microproteins completely lack a hydrophobic core, which is considered the usual driving force for folding of larger proteins. For the microprotein family members SM-B and conotoxin, different disulfide isomers have been observed experimentally (Y. Kamikubo et al. (2004) Biochemistry, 43:6519-34; J.L. Dutton et al. (2002) J Biol Chem, 277:48849-57). However, these documents describe the existence of multiple isomers as a problem to be solved, rather than as an opportunity for protein design. Generally applicable concepts and experimental procedures can therefore be developed that use disulfide isomerization as the driving force of microprotein structure evolution.
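The isomer counts quoted above (3, 15 and 105 for 4, 6 and 8 cysteines) are the number of ways to pair up all cysteines. The following minimal sketch enumerates those pairings directly; the function names are illustrative and not taken from the source, and every pairing is treated as theoretically possible regardless of geometry.

```python
def disulfide_patterns(cysteines):
    """Yield every way to pair up an even-length list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Bond the first cysteine to each possible partner, then pair up the remainder.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in disulfide_patterns(remaining):
            yield [(first, partner)] + pattern

def count_isomers(n_cysteines):
    """Number of theoretical disulfide isomers for a fully oxidized protein."""
    return sum(1 for _ in disulfide_patterns(list(range(1, n_cysteines + 1))))

for n in (4, 6, 8):
    print(n, "cysteines:", count_isomers(n), "isomers")
```

The count follows the double factorial (2n-1)!! for n disulfide bonds, which grows quickly with each added cysteine pair.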
Structure evolution by disulfide shuffling: see Figures 152, 153 and 154. The following section provides specific experimental approaches that use disulfide isomers for structure evolution. After secretion of phage particles fused to a specific microprotein, the particles are subjected to strongly reducing conditions by incubation with a millimolar concentration of reduced glutathione, a redox-active thiol-containing tripeptide. The phage particles are then purified from the reducing agent in a buffer containing a millimolar concentration of EDTA, to prevent air oxidation of the free thiols. This library will contain a large number of reduced and structurally diverse polypeptide chains. After the target has been contacted with this mixture of reduced isomers, the library is subjected to oxidizing conditions during target binding, for example a millimolar concentration of oxidized glutathione, to lock in favorable microprotein conformations by oxidation of the thiols. The method thus selects microprotein binders that initially interact with their target in the reduced state and are then locked in the binding conformation by rapid oxidation. The selected pool of microproteins is complementary in shape to the target protein; this process is called disulfide-dependent target-induced folding. The best binders are selected and subjected to further cycles of directed evolution (mutagenesis and panning) until the active, fully oxidized conformation is obtained in a target-independent manner, so that the target is no longer needed to induce the desired conformation, yielding compounds that are easier to manufacture.
In addition, phage libraries can be subjected to buffers of intermediate redox potential that permit disulfide shuffling. This is easily achieved by choosing buffer compositions with different ratios of oxidized to reduced glutathione. This allows partial oxidation of a subset of the cysteine residues and subsequent disulfide shuffling, e.g. the breaking and re-forming of existing bonds, which favors the accumulation of most disulfide bonds. A set of many different structural combinations therefore exists under these conditions (depending on the number of cysteine residues in the given microprotein). The strongest clones are then selected and subjected to another round of disulfide shuffling (with or without amino acid sequence optimization).
Covalent target binding via disulfide bonds: Contrary to a long-held view, recent work shows that specific reduction of disulfide bonds can occur in the extracellular environment (P.J. Hogg (2003) Trends Biochem Sci, 28:210-4). Endothelial cells were shown to secrete a reducing activity into the supernatant, which was identified as thrombospondin-1, a glycoprotein with a redox-active thiol in its calcium-binding domain (J.E. Pimanda et al. (2002) Blood, 100:2832-8). Notably, the free thiol of thrombospondin-1 controls the length of the adhesion protein von Willebrand factor by reducing intermolecular disulfide bonds. These findings can be used to link new microproteins covalently to disulfide-containing target proteins. The approach selects for partial reduction upon binding and for microprotein redox activity close to a disulfide bond of the target protein. For example, after binding to the target protein, a phage display library of microprotein variants is selected to resist washing under oxidizing conditions but to elute specifically when washed under reducing conditions. Thus, during protein evolution, additional redox-active free thiols are selected for together with the disulfide bonds that form certain stabilizing microprotein structures.
Evolution of structural diversity refers to changes in the structure that a particular clone undergoes. Structural changes generally depend on sequence changes, but even two identical sequences may adopt different structures. The structural differences can be at the level of the disulfide bonding pattern or of the fold, usually because of structurally significant changes in loop length. Structure evolution differs from structural diversity (as used, for example, in multi-scaffold libraries), in which many scaffold structures are used but each clone always adopts the structure of its parent sequence. In structure evolution, each clone may have a structure different from that of its parent sequence.
Figure 155 shows the single dominant 3SS bonding pattern (18 different natural families) and the disulfide variants that can be generated from it in one step. Most naturally occurring families are within one step of the dominant pattern (14 25 36). Figure 155 also shows the 4SS variants that can be generated by adding a disulfide bond to the dominant 3SS pattern (14 25 36) without changing any existing disulfide bond. 11 of the 15 naturally occurring 4SS bonding patterns can be obtained by adding one disulfide bond to the dominant 3SS pattern, without breaking any of the 3SS disulfide bonds. Since there are 105 possible 4SS patterns in total, the data suggest a strong tendency toward adding disulfide bonds to existing 3SS proteins. This analysis should arguably also address whether the preferred path runs in reverse, i.e. generating 3SS proteins by deleting a disulfide bond from 4SS proteins. Unless incompleteness of the database has affected these results (which is possible), 14 25 36 and the 4SS derivatives obtained from it by adding one disulfide bond are the preferred starting points.
Microprotein assembly method: The goal of the assembly method is stepwise affinity maturation of the binding protein against the target. In each cycle, a library is created by adding a pair of cysteines plus randomized sequence (generally a new loop) to the product of the previous selection cycle, and the library is then panned to select clones with high affinity or activity for the target. The starting point can be a unique sequence or a pool of sequences, and the sequence of the randomized regions of the starting point may be known or unknown.
Creating a 1-disulfide ("1SS") starting point: The assembly method can be used to create new microproteins with 2 or more disulfide bonds from simple disulfide-containing proteins. One assembly method starts with a protein containing two fixed cysteine residues (a 1-disulfide or "1SS" protein). Optionally, this protein can have the same cysteine spacing, or loop length (called the "span", not counting the cysteines), as is found in a preferred (generally natural) disulfide bonding pattern. This similarity makes it easy to graft the 1SS protein into existing 2SS, 3SS, 4SS or higher-order scaffolds. The span of a 1SS library is generally 0-20 amino acids long, preferably 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15, more preferably 7, 8, 9, 10, 11 or 12, and ideally 9, 10 or 11 amino acids long. Additional residues outside the cysteine pair (i.e. outside the loop or "span") can also be randomized. The initial 1SS protein is generally fully or partially randomized between the cysteines, but it sometimes contains fixed amino acids (other than the cysteines) that provide folding or affinity for the target molecule.
Assembly from 1SS to 2SS or higher scaffolds: One way to mature a previously selected 1SS protein is to provide two new cys residues as a library, either at fixed positions or at multiple preferred positions. Generally, the residues flanking these two new cysteines and the new loop will be randomized.
Proteins with an odd number of cysteines tend to be toxic and/or poorly expressed, and are efficiently removed by the expression host. Therefore, even if a random number of cysteines is encoded, only DNA sequences encoding an even number of cysteines are expressed as functional phage particles. One method of expanding a previously selected (pool of) 1SS peptide(s) into a (pool of) 2SS peptide(s) is therefore to create a library with a third fixed cysteine and a large (and variable) number of randomized residues, some of which are statistically expected to encode Cys residues. A known fraction of these randomized positions encodes cysteine residues, and after sequences with an odd number of cysteines have been removed by phage growth, 2SS proteins with a second pair of cysteines will make up >50%, preferably >60-80%, or sometimes even >90-95% of the phage library. The new cysteines and/or the new randomized region can either or both be located on the N-terminal side of the starting protein, or either or both on its C-terminal side, or, more unusually, inside the starting protein sequence. The disulfide bonding pattern may change during assembly: the previous disulfide bonds can be replaced by disulfide bonds that link different cysteines (a new DBP).
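The parity argument above can be illustrated with a simple binomial model. This is a sketch under stated assumptions that are not in the source: the randomized positions use NNK codons (32 codons, of which only TGT encodes cysteine), three cysteines are fixed (the original pair plus one new one), stop codons are ignored, and every clone with an odd cysteine total is lost during phage growth. The function names are hypothetical.

```python
from math import comb

# Assumption: NNK randomization, where 1 of the 32 codons (TGT) encodes cysteine.
P_CYS_NNK = 1 / 32

def prob_new_cys(n_random, k, p=P_CYS_NNK):
    """Binomial probability that exactly k of n_random positions encode Cys."""
    return comb(n_random, k) * p**k * (1 - p) ** (n_random - k)

def fraction_2ss_among_even(n_random, fixed_cys=3):
    """Share of surviving (even-cysteine) clones that have exactly 4 cysteines."""
    survivors = {}
    for k in range(n_random + 1):
        total = fixed_cys + k
        if total % 2 == 0:  # only even totals are assumed to survive
            survivors[total] = prob_new_cys(n_random, k)
    return survivors[4] / sum(survivors.values())

for n in (10, 20, 30):
    print(n, "randomized positions:", round(fraction_2ss_among_even(n), 3))
```

Under these assumptions most surviving clones carry exactly one extra cysteine, and the 2SS share falls as the number of randomized positions grows, because clones with three extra cysteines become more common.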
Extension method: A target-binding protein (of any length or disulfide number) can be fused to, and extended with, a randomized library sequence, which generally comprises one pair (or several pairs) of cysteines separated by a number of random positions, optionally with variable spacing. This protein library is selected for enhanced binding affinity to the target molecule. This approach may result in a second binding site of different sequence that folds separately from the first binding site.
Dimerization method: Particularly for targets that are homomultimers or are located on a cell surface, it is attractive to duplicate a previously selected binding site, creating a dimer, trimer, tetramer, pentamer or hexamer of identical disulfide-containing sequences, each of which can bind the same site on the target. If multiple sites on the target can be bound simultaneously, binding affinity increases. Optimal affinity generally requires optimizing the spacing between the binding sites by testing spacers of different lengths and, optionally, different compositions. An example of a homodimeric microprotein that binds human VEGF is described herein. A spacer composed of Gly-Ser is used between the binding sites, and its length can be adjusted to provide optimal affinity for the dimeric VEGF target.
Series of existing CDPs: Disulfide bonds may be added in such a way that the spacing of each 1SS, 2SS or 3SS construct (the "cysteine distance pattern", CDP) is identical to the CDP of an existing protein family, so that, for example, each stage of the assembly uses a natural CDP. A selected 1SS or 2SS protein may also be grafted into a position with a similar loop length in an existing 3SS, 4SS or 5SS scaffold. Disulfide bonds can be added with the aim of changing the existing disulfide bonding pattern, creating a library of structural variants or DBP variants, or of keeping the existing bonding pattern. Control over the DBP depends mainly on whether the new cysteine pair, with its new randomized sequence, is added to only one end of the starting protein (which tends to preserve the existing DBP) or to both sides of the existing protein (i.e. one cysteine on each side), which tends to change the DBP. If the existing disulfide bonds are to be kept, it helps to place some extra spacer residues between the old cysteine pairs and the newly added cysteines. This spacer may have any sequence, but is preferably glycine-rich (i.e. a polymer of GGS or GGGGS). If the target molecule is dimeric (soluble) or cell-bound, a spacer long enough to let two microprotein motifs bind their targets allows both sites to be bound simultaneously, improving affinity or apparent affinity.
Assembly by the megaprimer method: The megaprimer method allows new libraries to be created from old libraries while avoiding the complexity caused by the existing sequence pools. A PCR fragment containing the pool of previously selected 1SS proteins is generated and overlapped with a new DNA fragment (an oligo or PCR product) encoding a new library with one or two new Cys residues. A run-off ssDNA PCR product (the "megaprimer") is generated from this overlap fragment. It contains ends homologous to the vector, is annealed to the vector, and is used to drive a Kunkel-like polymerase extension on a template that contains stop codons in the region to be replaced by the megaprimer. Alternatively, a pair of unique restriction sites can be used to create the new library inside the previously selected vector pool. Fusion to the gene for phage protein pIII or pVIII allows display of the protein on the phage coat. Proteins with an even number of cysteines can be selected by: i) phage growth, ii) affinity selection, iii) free-thiol purification, and/or iv) DNA sequence screening. One or more cycles of this process can be used to build up the disulfide content from 1SS to 2SS, 3SS, 4SS, 5SS, 6SS or a higher number of disulfide bonds. Any disulfide number can be used as the starting point.
A number of exemplary assembly methods are described below.
234 design approach: see Figure 138. A preferred method is called "234" because it involves first creating and panning a 2-disulfide library containing a mixture of all three bonding patterns, then selecting a pool of the best clones and using it to create a new library with additional (partially) randomized amino acid positions and an additional pair of cysteines. This forms a 3-disulfide library that can adopt up to 15 different structures, in some of which the original four cysteines form a different bonding pattern, thereby allowing structure evolution of the previous 2SS sequences. Each "library extension fragment" generally encodes several codons that specify amino acid mixtures (i.e. encoded by NNK, NNS or similar mixed codons) plus one or more cysteines (located on the outside), and can be added to the 5' or N-terminal end of the previously selected sequence pool, to its 3' or C-terminal end, or to both ends. To avoid free thiols, it is desirable to add an even number of cysteines (2, 4, 6) to each clone. This can be achieved by adding library extension fragments to both ends (1 cysteine and 4-5 randomized codons at each end), or by adding a fragment encoding two (or 4 or 6) cysteines and 6-8 ambiguous codons (encoding the desired amino acid mixture) to the C-terminus only or to the N-terminus only. This process can be repeated multiple times.
The 234 directed evolution method therefore comprises the following steps: initial library construction (2SS); target panning; (optional: screening of individual clones and pooling of the best clones); extension library construction (3SS); target panning; (optional: screening of individual clones and pooling of the best clones); extension library construction (4SS); target panning; and final screening of individual clones to identify the best 4SS binders.
Many variations of this method can be designed. One may go to 4, 5, 6, 7 or more disulfide bonds, make jumps of two disulfide bonds instead of one, or pan one library against one target and the next library against a second target, where the targets may be related or unrelated.
A preferred approach is to prepare a 2SS library whose CDP is also found (preferably identically) in natural 3SS proteins, and a 3SS library whose CDP is also found in natural 4SS proteins. In this way it is reasonably certain that the 2SS proteins can be matured into 3SS proteins, and the 3SS proteins into 4SS proteins.
3x0-8 and 4x0-8 design approach: see Figure 139. The preferred "3x0-8" and "4x0-8" design approaches aim to create all 15 3-disulfide structures or all 105 4-disulfide structures, so as to present maximal structural diversity and sequence diversity to the panning target. The same approach can be extended to 5-, 6- or 7-disulfide microproteins (5x0-8, 6x0-8, 7x0-8).
Analysis of the loop lengths of all natural 3-disulfide microproteins shows that the loops tend to range in size from 0 to 10 amino acids. The averages of the 5 loops (C1-C2, C2-C3, C3-C4, C4-C5 and C5-C6) are very similar (after removal of some of the longest loops, the ranges run from 0-8 to 3-12), although loop sizes differ markedly between scaffold families. For example, loop C1-C2 is 6 amino acids long in conotoxins but 0 amino acids long in anato domains, even though both have the same disulfide bonding pattern.
The motif C1-(x 0-8)-C2-(x 3-10)-C3-(x 0-10)-C4-(x 0-8)-C5-(x 0-9)-C6 is predicted to cover more than 90% of natural 3SS protein sequences and most unknown 3SS microproteins with useful properties. Using loops with the same length range, such as 0-8, makes library construction easier and yields the library motif C1-(x 0-8)-C2-(x 0-8)-C3-(x 0-8)-C4-(x 0-8)-C5-(x 0-8)-C6, or the 4SS form of this design, i.e. C1-(x 0-8)-C2-(x 0-8)-C3-(x 0-8)-C4-(x 0-8)-C5-(x 0-8)-C6-(x 0-8)-C7-(x 0-8)-C8. Other loop length ranges that can be used are 0-10, 0-9, 0-8, 0-7, 0-6, 0-5, 0-4, 1-5, 1-6, 1-7, 1-8, 1-9 or 1-10, although most loop lengths are expected to work.
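A library motif of this kind can be written as a regular expression to check whether a given sequence fits the design. The sketch below is illustrative only: the helper name `motif_regex` and the test sequence are hypothetical, and it assumes that loop positions never contain cysteine.

```python
import re

def motif_regex(n_cys, min_loop=0, max_loop=8):
    """Build a regex for C1-(x min-max)-C2-...-Cn with cysteine-free loops."""
    loop = "[^C]{%d,%d}" % (min_loop, max_loop)
    return re.compile("C" + (loop + "C") * (n_cys - 1))

three_x_0_8 = motif_regex(6)   # 3SS design: 6 cysteines, 5 loops of 0-8 residues
four_x_0_8 = motif_regex(8)    # 4SS design: 8 cysteines, 7 loops of 0-8 residues

# Hypothetical sequence with loop lengths 3, 5, 4, 3 and 2.
candidate = "CAGSCPEKTGCLSNWCRDYCGGC"
print(bool(three_x_0_8.fullmatch(candidate)))

# With five independent loops of 0-8 residues there are 9**5 possible spacings.
print(9 ** 5)
```

Changing `min_loop` and `max_loop` gives the other loop length ranges listed in the text.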
Libraries of this type are expected to contain a large number of heterogeneously folding sequences, meaning that they can adopt multiple structures that do not interconvert and cannot easily be produced in homogeneous form. This heterogeneity is a disadvantage for protein production, but the increased diversity is an advantage for panning and early-stage discovery.
In traditional display libraries of synthetic protein diversity, all clones share the same fixed protein scaffold. Although enormous sequence diversity is created, all clones have the same structure and there is no significant structural diversity. In contrast, the 3x0-8 and 4x0-8 libraries contain roughly equal mixtures of 15 or even 105 very different structures.
A typical phage display library contains 10^9 to 10^10 different clones, each of which generally has a different sequence. However, what is panned is a pool of about 10^13 phage particles, containing on average about 1000-10,000 copies of each sequence or clone. This copy number is called the "library equivalents". Because of the folding heterogeneity caused by disulfide bond formation, each of the 1000-10,000 copies of the same sequence can adopt a different structure. The effective library size of a 3x0-8, 4x0-8 or 5x0-8 library is therefore 10, 100 or 1000 times larger than that of a single-scaffold library. A library of this design is thus expected to contain all or most of the theoretically possible structures, disulfide bonding patterns and folds.
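The copy-number figure follows directly from the particle and clone counts given above. A minimal arithmetic sketch (the function name is illustrative):

```python
def library_equivalents(particles_panned, distinct_clones):
    """Average number of copies of each clone in the panned phage pool."""
    return particles_panned // distinct_clones

print(library_equivalents(10**13, 10**10))  # 1000
print(library_equivalents(10**13, 10**9))   # 10000
```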
To keep the average protein small, to prevent the formation of undesired structures and to increase the frequency of desired structures, the range of loop lengths may be narrowed. Suitable intermediate loop length ranges include 2-6, 2-7, 2-8, 2-9 or 2-10 amino acids, or 3-4, 3-5, 3-6, 3-7, 3-8, 3-9 or 3-10 amino acids, or 4-5, 4-6, 4-7, 4-8, 4-9 or 4-10 amino acids, or 5-6, 5-7, 5-8, 5-9 or 5-10 amino acids.
A single fixed loop length can also be chosen for the library, generally 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acids long.
A complementary way to keep the average protein size small is to use gel size selection to obtain DNA fragments that encode an upper limit of 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 50, 55 or 60 amino acids and a lower limit of 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 or 35 amino acids.
4x6 design approach: see Figure 140. A preferred method is the "3x6" or "4x6" approach, which starts from a library with 3 or 4 disulfide bonds and a fixed loop size of 6 amino acids that may have variable sequence. The protein motif of the 4x6 library is C1-(x 6)-C2-(x 6)-C3-(x 6)-C4-(x 6)-C5-(x 6)-C6-(x 6)-C7-(x 6)-C8 (the number after x is the number of amino acid positions that may contain a mixture of amino acids, usually encoded by NNK, NNS or similar ambiguous codons; the number after C is the order of the cysteines in the protein from the N- to the C-terminus). In natural microprotein families, cysteines that are bonded to each other are separated by on average 10-14 amino acids (average 12) along the protein backbone; we call this distance the "disulfide span". This span is rarely less than about 8-9 amino acids. When adjacent cysteines form a disulfide bond, they form a subdomain, which is undesirable for most applications because of its thermal and protease instability profile. These undesirable subdomains can be eliminated by choosing loop lengths that are too short for adjacent cysteines to bond, i.e. less than 9 amino acids. A fixed spacing of 6 amino acids appears particularly advantageous, because it prevents substructures and creates multiple positions in which (non-adjacent) cysteines are spaced 12 amino acids apart, which is ideal because this is the average for natural proteins. Eliminating subdomains removes the 69 worst 4SS disulfide bonding patterns and may leave only the 36 best 4SS disulfide bonding patterns. Fixed spacings of 4, 5, 7 or 8 amino acids, or combinations thereof, are also feasible.
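The split of the 105 theoretical 4SS bonding patterns into 69 excluded and 36 remaining patterns can be reproduced by enumerating all pairings of eight cysteines and discarding every pattern in which two sequence-adjacent cysteines (Cn and Cn+1) are bonded. This is a minimal sketch with illustrative function names; it models the loop-length constraint simply as "no bond between neighboring cysteines".

```python
def pairings(cysteines):
    """Yield every way to pair up an even-length list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        for pattern in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + pattern

def has_adjacent_bond(pattern):
    """True if any disulfide links cysteines that are neighbors in sequence."""
    return any(abs(a - b) == 1 for a, b in pattern)

all_4ss = list(pairings(list(range(1, 9))))
allowed_4ss = [p for p in all_4ss if not has_adjacent_bond(p)]
print(len(all_4ss), len(all_4ss) - len(allowed_4ss), len(allowed_4ss))  # 105 69 36
```

Applied to six cysteines, the same filter keeps 5 of the 15 3SS patterns, including the pattern 14 25 36.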
Most known 3SS toxins are contained in a single "all-scaffold" library with the following composition:
C1-(x 0-10)-C2-(x 2-12)-C3-(x 0-10)-C4-(x 0-10)-C5-(x 0-12)-C6
This library additionally contains most of the unknown natural toxins and an even larger number of non-naturally occurring toxins. The average length of the proteins encoded by this library is: 1+5+1+7+1+5+1+5+1+5+1 = 33 amino acids.
To produce shorter proteins, one can use oligos encoding short sequences at a higher molar ratio than oligos encoding long sequences, or limit the maximum loop length to only 8 amino acids rather than 10-12 amino acids.
Similarly, most 4-disulfide HDD toxins will be contained in an all-scaffold library with the following composition, which has 105 different disulfide bonding patterns and more than 1,000 protein folds:
C1-(x 0-10)-C2-(x 0-10)-C3-(x 0-10)-C4-(x 0-10)-C5-(x 0-10)-C6-(x 0-10)-C7-(x 0-10)-C8
A 5-disulfide "all-scaffold" library would be represented as:
C1-(x 0-10)-C2-(x 0-10)-C3-(x 0-10)-C4-(x 0-10)-C5-(x 0-10)-C6-(x 0-10)-C7-(x 0-10)-C8-(x 0-10)-C9-(x 0-10)-C10.
x generally represents the desired amino acid mixture. Although NNN codons can be used to encode the amino acid mixture, other codons have advantages. Each codon provides a different amino acid mixture.
For example, NNK reduces the number of stop codons three-fold relative to NNN (one instead of three). Different codons can be used for different applications. A mixture that favors hydrophilic amino acids is desirable, and it is also desirable to avoid stop codons, tryptophan, other hydrophobic amino acids and cysteine in the loops. Molecular biologists know how to choose codons that produce the desired mixture. The codons to choose from generally contain A, C, G, T or one of the mixed-base letters N, M, K, S, W, Y, R, B, D, V or H at each of the first, second and third base positions of the codon, giving a large number of possible codons that each encode a different amino acid mixture.
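The amino acid mixture encoded by any such degenerate codon can be worked out by expanding the mixed-base letters and translating each resulting codon. The sketch below uses the standard genetic code and the standard IUPAC nucleotide ambiguity codes; the function name `amino_acid_mixture` is illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC mixed-base letters and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def amino_acid_mixture(degenerate_codon):
    """Count how many concrete codons encode each amino acid ('*' = stop)."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon)))
    return Counter(CODON_TABLE[c] for c in codons)

for name in ("NNN", "NNK", "NNS"):
    mix = amino_acid_mixture(name)
    print(name, "codons:", sum(mix.values()),
          "stop:", mix["*"], "Cys:", mix["C"], "Trp:", mix["W"])
```

The same function can be used to compare candidate codons for a loop position, for example to find mixtures that contain no stop codon or no cysteine.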
The loop sequences of natural HDD proteins contain a small number of fixed residues that may play a role in protein folding. The earlier approach simply uses random codons and so provides diversity at these residues even if they are in fact important for folding. Compared with a library that uses the natural amino acid composition at each position, this random-codon approach results in lower library quality, but may be the best way to explore the potential for new folds.
However, if, for example, W is required for folding or function but an NNK codon is used at that position, only 1 in 32 clones of the library (1 in 64 with an NNN codon) satisfies this requirement, so the effective size of the library is reduced 32-fold (64-fold with NNN), which may be enough to prevent useful binders from being obtained. It may therefore be important that any residue that appears fixed in the native sequences is also fixed in the library.
An alternative to using random codons (NNK or one of the many other codons described above) is to synthesize oligonucleotides that encode the consensus sequences of the loops of specific protein families. This approach requires that a loop 2 library design is introduced only at the loop 2 position, and that loop 3 sequences are introduced only at the loop 3 position. This can be achieved if the cysteines at which the overlap reactions occur are each encoded by a different one of the cysteine codons. To make the overlap PCR reaction more efficient, 1 to 3 bases before or after the cys codon can also be fixed. The efficiency of the overlap reaction can limit the diversity of the library, so this is an important risk that cannot easily be detected or controlled. In general, adding a few bases is an effective way to reduce the serious risk of low library diversity.
After all loop sequences of the different families have been mixed and introduced by overlap PCR, every synthesized loop sequence should occur only at its natural position. This library approach causes loops from different families to be shuffled relative to each other.
Increasing library diversity: The power of natural and directed evolution is related to the diversity that is subjected to selection pressure. Selection from a larger number of more diverse clones generally yields better results. Organisms use multiple mechanisms to raise the diversity of protein structures beyond the number of genes. This expanded natural diversity gives selection more options to act on and increases the power of natural evolution.
There are multiple different ways to increase the structural diversity that can be obtained from the same number of clones or sequences, with the aim of increasing the power of directed evolution.
This principle can be applied to the optimization of single genes, multigene pathways, whole genomes (prokaryotes, archaea, eukaryotes) and even whole biomes (i.e. microbial communities).
In general, expression of a single gene yields multiple different mRNA sequences. This may be due to multiple promoters, alternative splicing, trans-splicing or degradation. Each mRNA sequence can fold differently, adopting a variety of structures, and the outcome can also be modulated by the presence of other RNAs (micro-RNAs, tRNAs or mRNAs) and of RNA-interacting proteins. Each of these mRNA structures can be translated slightly differently, through the presence of multiple translation initiation and termination signals, through differences arising at the ribosome, or through a low level of variable amino acid misincorporation, including "non-natural" amino acids. In addition, each protein translation product can fold differently: some aggregate, some misfold, some are degraded by the proteasome, and some fold into multiple stable structures. An important and practical diversification mechanism is derivatization of the protein: chemical modification of the amino acid chain, and chemical attachment to the protein chain of small molecules such as sugars and of polymers such as PEG. These chemistries can be applied to whole libraries (in most cases) or to single purified proteins.
When applied to libraries, they can increase diversity significantly, especially when applied sparingly, so that a heterogeneous population is obtained. For example, non-exhaustive coupling of PEG or carbohydrate molecules to the lysine residues of a protein library containing 5 lysines produces 5-factorial + 1 types of molecule (122 variants). The best variants are selected by panning, and variants of the labeling recipe are then applied to library equivalents, pools of clones or single clones to find which recipe gives the best result. In addition, the protein sequence is evolved and selected to maintain and improve the desired activity. The best mutant will, for example, have lost the four lysines that do not contribute to activity and retained the one lysine whose derivatization raises the activity level. All reagents available for protein derivatization (e.g., the Pierce Chemical online catalog) can in principle be used in this approach. There is a delicate balance between the unique, stable structures needed for cellular function and the diversity and partial instability that can accelerate cellular evolution.
Each of these mechanisms is a potential point of experimental intervention: each is controlled and set at its current level of variation by natural evolution, but its diversity can be raised or lowered according to the goals of directed evolution.
A field of particular commercial importance is the directed evolution of binding proteins using display libraries (phage, yeast, bacterial surface, polysome, ribosome, profusion or gene-fusion libraries). The frequency and quality of the best clones selected are directly related to library size. The larger the library, the higher the number of binders and the better the best clone. For this reason, a number of methods have been developed to produce ever larger libraries, for example by using combinatorial methods to combine two immunoglobulin libraries of 10^6 clones each into a single library of 10^12 clones. In this example, however, all library proteins have the same immunoglobulin fold, and the diversity is concentrated on a single structure, which is useful for some applications, such as whole-antibody products, but is not suited to generating diversity of different structures. Effective library size can also be increased by increasing the number of structures that can be generated from each unique sequence, rather than by increasing the number of clones in the library.
An alternative approach to increasing library diversity is to increase the diversity of structures adopted by each clone, rather than to increase library diversity by increasing the number of clones. This can be achieved with destabilized proteins, which more closely resemble molten globules in that they exist as a large diversity of structures, each occupied for a fraction of the time. This approach allows a much larger space to be searched, including novel scaffold structures that are inaccessible in libraries of highly structured proteins. This global search allows identification of folds that are closer to the global optimum, and further directed evolution can then be used to generate stably folded, homogeneous variants of the new fold.
The target is generally a protein, but it can also be a nucleic acid (DNA, RNA, PNA), carbohydrate, lipid, metabolite or any biological or non-biological material. Because the library proteins are (partly) unstructured, each adopts many different structures, each for a small fraction of the time. This increases the molecular diversity of the library and favors the use of a large number of library equivalents. For panning a standard phage library, 100 library equivalents are typically used, that is, 10^12 phage if the library diversity is 10^10. Experimentally, it has been found that a 100-fold excess is needed to reliably recover a specific (structured) clone from the library. For high-affinity clones a lower excess can be used; for low-affinity clones a higher excess can be used.
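The relationship between library diversity, library equivalents and phage input can be written out as a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the Poisson sampling model is an assumption added here, not something stated in the text. Under that assumption, sampling alone would rarely miss a clone at much lower excess, so the 100-fold figure should be read as the experimental observation the text reports rather than as a sampling requirement.

```python
import math


def phage_input(library_diversity, library_equivalents=100):
    """Number of phage particles to pan so that each distinct clone is
    present, on average, `library_equivalents` times."""
    return library_diversity * library_equivalents


def probability_clone_absent(mean_copies):
    """Probability that a given clone is missing from the panned sample,
    assuming (hypothetically) Poisson-distributed copy numbers."""
    return math.exp(-mean_copies)


if __name__ == "__main__":
    print(phage_input(10**10))            # phage needed for a 10^10 library
    print(probability_clone_absent(100))  # chance a clone is absent at 100x
```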
In contrast to other ways of generating diversity, we call this "temporal diversity", because the diversity is obtained from multiple structures, each of which is occupied only part of the time. Generating diverse structures from the same gene is a major principle of biological evolution and operates at multiple levels of biological organization.
Expanding the diversity of display libraries: Phage libraries generally contain about 10^14 phage with a diversity of 10^10 different sequences. It is well established that affinity chromatography can select from such a library a single sequence expressing a binding protein (i.e., a 10^10-fold enrichment). Because essentially 100% of the phage capable of high-affinity binding will be bound by the affinity column, it can be predicted that this method could also readily select a single-copy phage (a 10^14-fold enrichment).
Phage-displayed peptides generally exist in 10^3-10^6 different unstable conformations, only one of which binds to the column. Because binding to the column stabilizes the active conformation of the peptide, these peptides are enriched efficiently, giving an enrichment of 10^17-10^20. Flexibility in the scaffold therefore raises the effective library size to as much as 10^20. After the first round of panning, diversity has generally been reduced 1000-fold, so in subsequent libraries each clone is represented by 1000 or more copies, which means that all of the different transient structures a protein may adopt are statistically well represented. During further directed evolution, the aim is to select clones that spend an increasing fraction of their time in the structure with high target affinity. The aim is to use various mutagenesis methods in combination with selection to improve the affinity and stability of the protein gradually.
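The enrichment range quoted for flexible peptides follows from multiplying the number of phage particles by the number of conformations each displayed peptide samples. The following sketch simply reproduces that arithmetic; the function name is hypothetical and the calculation assumes, as the text does, that only one conformation per peptide binds.

```python
def effective_library_size(phage_count, conformations_per_peptide):
    """Effective number of distinct structures presented to the target when
    each displayed peptide samples several conformations over time."""
    return phage_count * conformations_per_peptide


if __name__ == "__main__":
    phage = 10**14  # phage particles in a typical library, per the text
    low = effective_library_size(phage, 10**3)
    high = effective_library_size(phage, 10**6)
    print(f"effective size: {low:.0e} to {high:.0e}")
```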
Target-induced folding: The structure of a microprotein can be induced by target binding (by forming the disulfide bonds after binding to the target), or the structure of a microprotein can be optimized while it is bound to its target.
Because binding to a target invariably involves some degree of induced fit, binding is expected to stabilize some disulfide bonds (those in the bound portion) and destabilize others, resulting in differential sensitivity to reducing agents. Titrating reducing and oxidizing agents (at varying concentrations and time intervals) allows the least stable disulfide bonds to be rapidly reduced and re-oxidized; if this changes the bonding pattern, the result is structural adaptation and a better fit to the bound target. This approach increases the survival of the clones with the best binding affinity.
For production, it may be necessary to evolve the protein so that its folding is independent of the target.
Optimizing the amino acid composition of microproteins: Most proteins or protein domains contain a hydrophobic core that is critical for protein stability and conformation. The hydrophobic cores of these proteins contain a high proportion of hydrophobic amino acids. Amino acids can be characterized by their hydrophobicity, and a number of scales have been developed. A commonly used scale was developed by Levitt (Levitt, M. (1976) J Mol Biol 104, 59, #3233) and is listed in Hopp, TP, et al. (1981) Proc Natl Acad Sci U S A 78, 3824, #3232. Hydrophobic residues can be further divided into the aliphatic residues leucine, isoleucine, valine and methionine and the aromatic residues tryptophan, phenylalanine and tyrosine. Fig. 1 compares the amino acid abundance in all proteins published by Brooks, DJ, et al. (2002) Mol Biol Evol 19, 1645, #3234 with the average amino acid abundance calculated for the 8550 microprotein domains contained in the database published by Gupta, A., et al. (2004) Protein Sci, 13:2045-58.
See Figure 13: abundance of amino acids in proteins. The figure shows that microproteins tend to have a significantly lower abundance of aliphatic hydrophobic amino acids than other proteins, which was not known in the art. In contrast, the abundance of the aromatic hydrophobic amino acids (W, F, Y) is similar to that of the average protein. The low abundance of aliphatic amino acids reflects the fact that several disulfide bonds stabilize the microprotein structure, which removes the need for a hydrophobic core. The figure also shows that several other amino acid residues that carry aliphatic carbon atoms (glutamic acid, lysine, alanine) have reduced abundance in microproteins relative to other proteins.
Applications of scaffolds with low hydrophobicity: Reducing the abundance of aliphatic amino acids in proteins can improve their utility in pharmaceutical and other applications. Many proteins tend to form aggregates during folding. This can be exacerbated when proteins are produced at high concentration in heterologous hosts and when proteins are renatured in vitro. Aggregation and misfolding can significantly reduce protein yield during commercial production. By reducing the fraction of aliphatic amino acids in a protein sequence, one can reduce the tendency to form aggregates and therefore increase the yield of correctly folded protein.
Proteins with a low abundance of aliphatic amino acids have lower immunogenicity than other proteins. Aliphatic amino acids tend to increase the binding of peptides to MHC, which is a key step in the development of an immune response. As a result, proteins containing a low fraction of aliphatic amino acids tend to contain fewer T cell epitopes than most other proteins.
Aliphatic residues have a tendency to form hydrophobic interactions. As a result, proteins with a high fraction of aliphatic amino acids are more likely to bind non-specifically to other proteins, membranes and other surfaces. Aliphatic residues exposed on the protein surface have a particularly high tendency to form non-specific binding interactions with other proteins. Because of the small size of microproteins, most amino acids in a microprotein have some surface exposure.
Accordingly, the invention provides a non-natural protein containing a single domain of 20-60 amino acids, which has 3 or more disulfide bonds, wherein the protein binds to a human serum-exposed protein and has less than 5% aliphatic amino acids. If desired, the non-natural protein contains less than 4%, 3%, 2% or even 1% aliphatic amino acids. In addition, the invention provides libraries of non-natural proteins with this property.
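The compositional criteria above can be checked on a candidate sequence with a few lines of code. The sketch below is illustrative: the function names and both example sequences are hypothetical, aliphatic residues are taken to be L, I, V and M as defined earlier in this section, and counting cysteines gives only a necessary condition for three disulfide bonds (it does not show that the bonds actually form). Target binding is not assessed.

```python
ALIPHATIC = set("LIVM")  # aliphatic residues as defined in this section


def aliphatic_fraction(sequence):
    """Fraction of residues that are leucine, isoleucine, valine or methionine."""
    sequence = sequence.upper()
    return sum(residue in ALIPHATIC for residue in sequence) / len(sequence)


def meets_composition_criteria(sequence, max_aliphatic=0.05,
                               min_length=20, max_length=60, min_disulfides=3):
    """Check length, cysteine count and aliphatic content (hypothetical helper).
    Two cysteines are required per disulfide bond."""
    sequence = sequence.upper()
    enough_cysteines = sequence.count("C") >= 2 * min_disulfides
    right_length = min_length <= len(sequence) <= max_length
    return right_length and enough_cysteines and aliphatic_fraction(sequence) < max_aliphatic


if __name__ == "__main__":
    # Hypothetical 30-residue sequences, not taken from the source.
    candidates = {
        "no aliphatic residues": "CSPGQTCDSNECRGDYCKPTGRCEANCSGK",
        "three aliphatic residues": "CSPGLTCDSNECRGDYCKPVGRCEAICSGK",
    }
    for name, seq in candidates.items():
        print(name, round(aliphatic_fraction(seq), 3), meets_composition_criteria(seq))
```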
Identification of scaffolds with low hydrophobicity: Although most microproteins contain fewer aliphatic amino acids than most normal proteins, the aliphatic amino acid content varies significantly between microprotein families. Table 4 lists some microprotein families that are particularly useful as starting points for engineering pharmaceutical proteins with a low abundance of aliphatic residues.
Design of proteins with low immunogenicity: Proteins with low immunogenicity are preferred as therapeutics because they are less likely to elicit an unwanted immune response when administered to humans. In some aspects, the subject microproteins with a desired target-binding specificity generally have lower immunogenicity than proteins that bind the same target but lack the required cysteine bonding pattern or fold. In one embodiment, the immunogenicity of the subject microprotein is 1-fold lower, preferably 2-fold lower, preferably 3-fold lower, preferably 5-fold lower, preferably 10-fold lower, preferably 100-fold lower, preferably 500-fold lower, and even more preferably 1000-fold lower. In certain embodiments, the low-immunogenicity microprotein is an HDD protein as described herein.
The immunogenicity of a protein can be predicted with programs such as TEPITOPE, which, on the basis of a large set of affinity measurements, calculate the binding affinity of all overlapping nine-amino-acid peptides derived from an immunogen for all major human MHC class II alleles (Sturniolo et al. 1999; www.biovation.com; www.epivax.com; www.algonomics.com). These programs are widely used to predict and remove human T-cell epitopes, and the FDA encourages their use.
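The first step of such a prediction, enumerating every overlapping nine-residue peptide, is easy to illustrate. The sketch below covers only that enumeration; it does not implement TEPITOPE or any real scoring matrix, and the function name is hypothetical. It also shows how the number of candidate peptides scales with protein length, using the median lengths quoted in this section.

```python
def overlapping_peptides(sequence, length=9):
    """Return all overlapping peptides of the given length, in order.
    A sequence of n residues yields n - length + 1 peptides (none if n < length)."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


if __name__ == "__main__":
    # Placeholder sequences of the median lengths mentioned in the text.
    microprotein = "A" * 38
    average_human_protein = "A" * 372
    print(len(overlapping_peptides(microprotein)))           # candidate nonamers
    print(len(overlapping_peptides(average_human_protein)))  # candidate nonamers
```

A prediction program would then score each peptide against each allele-specific matrix; that scoring step is deliberately left out here.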
Using these algorithms, we found that microproteins with 25-90 residues and more than 10% cysteine generally have a predicted affinity for MHC II that is 316-fold lower than that of the average protein. The red curve in Figure 166 shows the predicted immunogenicity of all 26,000 human proteins, which have a median length of 372 amino acids. The blue curve shows the predicted immunogenicity of all 10,500 microproteins, which have a median length of 38 amino acids. The green curve shows the predicted immunogenicity of non-natural protein fragments that have the same length distribution as the microproteins but consist of randomly selected human sequences. Comparison of the group averages shows that the one-log reduction in size of the microproteins alone reduces immunogenicity 67-fold, and that the amino acid composition of the microproteins produces an additional 4.7-fold reduction. The upper panel of Figure 167 shows that the aliphatic hydrophobic amino acids (I, V, M, L) are the strongest contributors to predicted immunogenicity in the TEPITOPE algorithm (Sturniolo et al. 1999). The lower panel of Figure 167 shows that these aliphatic residues are also the most under-represented residues in microproteins compared with human proteins, which accounts for most of the composition-derived log reduction in predicted immunogenicity.
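The size and composition contributions quoted above combine multiplicatively, which can be checked directly. The following lines are only an arithmetic check of the figures given in the text; the variable names are illustrative.

```python
import math

# Fold reductions in predicted immunogenicity, as quoted in the text.
size_effect = 67.0         # from the roughly one-log reduction in length
composition_effect = 4.7   # from the microprotein amino acid composition

combined = size_effect * composition_effect
print(f"combined reduction: about {combined:.0f}-fold")
print(f"in log10 units: about {math.log10(combined):.2f}")
```

The product is close to the 316-fold overall figure, which corresponds to about 2.5 log units.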
The low level of aliphatic hydrophobic residues in microproteins is possible because they lack the hydrophobic core typical of other proteins. Instead, microproteins contain a small number of cysteines, which are cross-linked to form intrachain disulfide bonds. Replacing a large number of hydrophobic amino acids with a few disulfide bonds reduces the minimum size at which a protein is stable, allowing microproteins to be smaller, and reduces the frequency of aliphatic amino acids, so that predicted immunogenicity is reduced by 3 logs.
Reduced immunogenicity can be determined using a variety of measures, including, for example: 1) the ability of antigen-presenting cells (APCs) such as dendritic cells (DCs) to release peptides from the immunizing protein (antigen processing); 2) the presence in these peptides of T-cell epitopes, which determines binding to HLA class II molecules; 3) the number of naive T cells in the blood that recognize the peptide-HLA II complexes on the APC surface; and 4) antibody levels in serum.
There are a number of ways to reduce the immunogenicity of a protein, all of which are applicable to both HDD and non-HDD proteins. One approach is to add disulfide bonds by computer modeling and rational design. Another approach is to improve the existing disulfide bonds by fine-tuning the protein using directed evolution or rational design. Disulfide bonds can be placed in the interior of the protein, or the cysteines can be flanked by amino acids whose side chains have a protective effect, to protect the disulfide bonds from chemical attack. The immunogenicity of a protein can also be predicted with programs such as TEPITOPE or Propred, which, on the basis of a large set of affinity measurements, calculate the binding affinity of all overlapping nine-amino-acid peptides derived from an immunogen for all major human MHC class II alleles (other programs are available for MHC class I). See Sturniolo, T., et al. (1999) Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nature Biotechnol, 17:555. See also www.algonomics.com, www.biovation.com, www.epivax.com and www.genencor.com. These programs are widely used to predict and remove human T-cell epitopes, and the FDA encourages their use.
Another approach to producing low-immunogenicity microproteins is intra-protein cross-linking with chemical cross-linkers. A variety of cross-linkers are available from suppliers such as Pierce. Suitable cross-linkers include arginine-reactive cross-linkers; homobifunctional cross-linkers such as amine-reactive homobifunctional cross-linkers and sulfhydryl-reactive homobifunctional cross-linkers; and heterobifunctional cross-linkers such as amine-carboxyl-reactive heterobifunctional cross-linkers and amine-reactive heterobifunctional cross-linkers.
Another approach is to make small proteins with multiple binding sites, dividing each domain into two or three binding sites. For example, one face of the domain binds one target and the other face binds a different target. The two faces can be designed in parallel (i.e., simultaneously, in separate libraries) and then merged into one domain. An alternative is to design the two faces sequentially: create a library in the residues on face 1, pan this library for binding to target 1, select one or more of the best clones, create a new library 2 in the remaining amino acids (those not used for library 1), then pan against target 2 and screen for binders to target 2 that have retained binding to target 1. Because the amino acids of face 1 tend to be interleaved with those of face 2, these libraries can easily be constructed in a group of clones with different sequences if some conserved amino acids are fixed, so that the fixed bases provide the contacts required for overlap extension by PCR. Because cysteines tend to be fixed, they are a logical choice as overlap points for the different oligonucleotides. However, overlaps work better with 4 or more bases, so it can be useful to fix an additional amino acid on either side of the cysteine. The scaffold for a two-faced library therefore has three sets of amino acids and bases: one set used for face 1/library 1, one set used for face 2/library 2, and a fixed set used to combine the two libraries by overlap extension. Restriction sites can in principle be used, but the overlap method generally works better.
Another approach is to reduce the size of the protein by minimizing the length of the loops between cysteines. A typical approach is to use a range of loop lengths in the library, some of which occur naturally and some of which are shorter than those found in nature.
Another approach is to increase hydrophilicity. Most HDD proteins are highly hydrophilic, which may be important for protein function (specificity, non-immunogenicity) and folding. Hydrophilicity can be controlled through the choice of the amino acid mixture used at each position of the protein library, by selecting the desired codon (mixture) for oligonucleotide synthesis. A common approach is to mimic the natural composition at each amino acid position, but with a bias toward certain desired residues. Clones can be screened for size and hydrophilicity by DNA sequencing. The approaches described above can be used singly or in combination.
Any of the subject microproteins can be used for further modification. Non-limiting examples include HDD proteins such as A-domains, LNR/DSL/PD, TNFR, Anato, β-integrin, Kunitz and the animal toxin families Toxin 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12, myotoxins, conotoxins, and δ- and ω-atracotoxins. The de-immunization methods described herein can be applied to a variety of human or primate proteins, such as cytokines, growth factors, receptor extracellular domains, chemokines, etc. They can also be applied to other, non-HDD scaffold proteins, such as immunoglobulins, including fibronectin III, and to ankyrins, protein A, ubiquitin, crystallins and lipocalins. If immunogenicity can be minimized, non-human scaffolds are preferred over scaffolds that are (nearly) natural human proteins and of human origin, because the potential for cross-reaction of an immune response with natural human proteins is reduced.
A number of methods can be used to measure the reduced immunogenicity of HDD proteins. For example, one can measure protein degradation by human or animal APCs. This assay involves adding the protein of interest to human or animal antigen-presenting cells, APC-derived lysosomes or APC proteases, and looking for degradation of the protein, for example by SDS-PAGE. The APCs can be dendritic cells derived from blood lymphocytes or obtained by other standard methods. Animal rather than human APCs can be used, or cell lysates rather than whole cells, or one or more purified enzymes or cell fractions such as lysosomes. Protein degradation is most easily measured by denaturing SDS-PAGE gel analysis. Degraded protein runs faster on the gel, at a lower apparent molecular weight. The protein of interest must be detected among a large number of cellular proteins. One approach is to label each clone fluorescently or radioactively (radioactivity: 3H, 14C, 35S; dyes and fluorescent labels such as FITC, rhodamine, Cy5, Cy3, etc.) or with any other suitable chemical label, so that only the protein of interest and its degradation products are visible on the gel after UV exposure or autoradiography. Peptide-tagged proteins can also be used; these can be detected with antibodies in Western blots.
Another way to assess immunogenicity is to measure the tendency of the protein to aggregate. Protein aggregation is easily measured by light scattering, which can be performed by dynamic light scattering (DLS) or with a spectrophotometer (i.e., OD300-600 versus OD280).
The level of T-cell stimulation and cytokine activation can also be measured. Cytokine activation is measured on human PBMCs by FACS, looking for the presence of activation antigens of dendritic cells (CD83, etc.), T-cell activation (CD69, IL-2r, etc.) and the presence of various co-stimulatory factors (CD28, CD80, CD86), all of which indicate that the immune system has been stimulated. Further, standard ELISA assays can be used to detect cellular production of cytokines such as IL-2, 4, 5, 6, 8, 10, TNFα, β, IFNγ, IL-1β, etc. Conventional mitogens, LPC, etc. can be used as good controls.
In addition, binding to Toll-like receptors can be measured. Binding of a therapeutic protein to Toll-like receptors 1-9 (TLR1-TLR9) is a useful indicator of innate immune activation. A number of suppliers, such as Invivogen, offer cell constructs in which each of the Toll-like receptors is transgenically linked to a reporter gene.
In addition, animal studies can be performed to assess the immunogenicity of a protein by injecting the protein directly into host animals such as rabbits and mice.
An example of engineering microproteins with low HLA class II binding affinity is provided below.
See Figure 161. Helper T-cell activation is a key step in the initiation of an immune response against foreign proteins. T-cell activation involves uptake of the antigen by antigen-presenting cells (APCs), degradation of the antigen into peptides, and display of the resulting peptides on the APC surface as complexes with proteins of the human leukocyte antigen DR group (HLA-DR). HLA-DR molecules contain several binding pockets that interact with the presented peptide. The specificity of these HLA-DR pockets can be determined in vitro, and the resulting specificity profiles can be used to predict the binding affinity of peptides for various HLA-DR types (Hammer, J. (1995) Curr Opin Immunol, 7:263-9). Computer programs that allow identification of HLA-DR-binding sequences have been described (Sturniolo, T., et al. (1999) Nat Biotechnol, 17:555-61). The present invention uses these algorithms with the aim of modifying the sequence of a microprotein in a way that reduces binding to HLA-DR while retaining the desired pharmacological and other properties of the parent microprotein. As a first step, the sequence of the parent microprotein is analyzed with an HLA-DR prediction algorithm. All possible single amino acid mutations of the non-cysteine residues in the parent sequence are compared with the parent sequence with respect to predicted binding to HLA-DR types. The aim is to identify a set of mutations that are predicted to reduce binding to the HLA-DR types that occur at high frequency in the patient population to be treated with the parent microprotein or its derivatives. Next, a combinatorial library is constructed in which the variants contain one or more mutations predicted to reduce HLA-DR binding. It may be advantageous to construct several sub-libraries containing subsets of the planned mutations. The resulting library or sub-libraries can then be screened to identify variants with suitable target binding. In addition, library members can be screened for stability, solubility, expression and other properties that are critical to the final product. Before screening, the combinatorial library can also be subjected to phage panning or a similar enrichment method to isolate combinatorial variants that retain the desired target binding affinity and specificity. This process will identify variants of the parent microprotein that retain all of the desirable properties of the parent protein but have reduced predicted binding to HLA-DR and therefore reduced immunogenicity. Optionally, the resulting improved variants can be subjected to a further round of removal of HLA-DR-binding sequences. This subsequent round can be a repeat of the steps described above. Alternatively, the second combinatorial library can be restricted to mutations that were identified during the first round of the process as compatible with the desired microprotein function and that are predicted to reduce HLA-DR binding further. By restricting the second round of the process to these preselected mutations, one can construct smaller libraries and increase the frequency with which improved variants are isolated.
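The first computational step of this procedure, comparing every single amino acid mutation of the non-cysteine residues with the parent sequence, can be sketched as below. Several things here are assumptions made for illustration: all function names are hypothetical, the parent sequence is invented, mutations that would introduce a new cysteine are excluded (the text does not say either way), and `toy_score` is a deliberately crude placeholder that counts aliphatic residues in the worst nine-residue window. It stands in for, and is not, an HLA-DR prediction algorithm.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(parent):
    """Yield (label, sequence) for every single substitution at a non-cysteine
    position. Substitutions to cysteine are skipped (an assumption) so that
    the disulfide pattern of the parent is left untouched."""
    for index, wild_type in enumerate(parent):
        if wild_type == "C":
            continue
        for replacement in AMINO_ACIDS:
            if replacement in (wild_type, "C"):
                continue
            label = f"{wild_type}{index + 1}{replacement}"
            yield label, parent[:index] + replacement + parent[index + 1:]


def toy_score(sequence, window=9):
    """Placeholder score: most aliphatic residues (L, I, V, M) found in any
    nine-residue window. NOT a real HLA-DR binding predictor."""
    starts = range(max(1, len(sequence) - window + 1))
    return max(sum(r in "LIVM" for r in sequence[s:s + window]) for s in starts)


def improving_mutations(parent, score_fn):
    """Return the single mutations whose score is lower than the parent's."""
    baseline = score_fn(parent)
    return [(label, seq) for label, seq in single_mutants(parent)
            if score_fn(seq) < baseline]


if __name__ == "__main__":
    parent = "CSPGLTCDSNECRGDYCKPTGRCEANCSGK"  # hypothetical parent sequence
    candidates = improving_mutations(parent, toy_score)
    print(len(list(single_mutants(parent))), "single mutants examined")
    print([label for label, _ in candidates])
```

In practice the placeholder would be replaced by a validated predictor, and the surviving mutations would define the combinatorial library or sub-libraries described above.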
Table 4. Microprotein families with low aliphatic amino acid abundance
Figure A20068003404901041
The average protein contains 26.1% aliphatic amino acids.
Methods for reducing the fraction of hydrophobic amino acids in therapeutic proteins
As described above, one way to generate microproteins with low aliphatic amino acid abundance is to start from scaffolds and libraries that contain few aliphatic amino acids. In addition, a variety of protein engineering techniques can be used to reduce the abundance of aliphatic amino acids in a protein. For example, protein libraries can be constructed in which one or several aliphatic amino acids are replaced by random codons that allow a number of hydrophilic amino acids. Of particular interest are ambiguous codons that allow a high fraction of hydrophilic amino acids but a low fraction of aliphatic or hydrophobic amino acids. For example, the codon VVK allows 12 amino acids (alanine, aspartic acid, glutamic acid, glycine, histidine, lysine, asparagine, proline, glutamine, arginine, serine, threonine) and avoids all aliphatic and aromatic amino acids. One can isolate proteins with the desired properties from such libraries and thereby reduce the abundance of aromatic hydrophobic and aliphatic hydrophobic amino acids. One can also construct combinatorial protein libraries that are randomized at multiple amino acid positions containing aliphatic amino acids. By determining the sequences and properties of multiple variants from such a library, one can identify positions in the protein that tolerate replacement with hydrophilic amino acids.
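The set of amino acids permitted by an ambiguous codon can be worked out by expanding it against the standard genetic code. The sketch below does this for VVK and, for contrast, NNK. The function name is hypothetical; the codon table is the standard genetic code written in the conventional TCAG order, and V (A/C/G), K (G/T) and N (any base) are the usual IUPAC codes.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes used in this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "V": "ACG", "K": "GT", "N": "ACGT"}


def encoded_amino_acids(degenerate_codon):
    """Return the set of one-letter amino acids ('*' for stop) that a
    degenerate codon can encode."""
    codons = ("".join(bases) for bases in product(*(IUPAC[s] for s in degenerate_codon)))
    return {CODON_TABLE[codon] for codon in codons}


if __name__ == "__main__":
    vvk = encoded_amino_acids("VVK")
    print("VVK:", "".join(sorted(vvk)), f"({len(vvk)} amino acids)")
    print("aliphatic or aromatic in VVK:", sorted(vvk & set("LIVMFWY")))
    nnk = encoded_amino_acids("NNK")
    print("NNK:", len(nnk - {"*"}), "amino acids; stop codon present:", "*" in nnk)
```

Because V excludes thymine in the first two positions, VVK cannot produce any stop codon or cysteine codon, which is convenient when the cysteine framework must remain fixed.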
Method for evaluating scaffolds for use
Designs are generated based on a specific family of natural sequences. At each amino acid position, an amino acid mixture is used that reflects the natural amino acid diversity at that position. This is done by selecting the most suitable single codon. An HA tag is added to the N-terminus of the protein and a His6 tag is added to the C-terminus.
Oligonucleotides encoding these protein designs are synthesized. In parallel, 1-30 different sequences are constructed, either individually or as a mixture of different designs.
Expression of the subject compositions
Intracellular and extracellular environments
Disulfide bonds are found mainly in secreted (extracytoplasmic) proteins. Their formation is catalyzed by a number of enzymes present in the endoplasmic reticulum (ER) of multicellular organisms. In contrast, disulfide bonds are generally not found in cytoplasmic proteins under non-stress conditions. This is due to the presence of reducing systems such as glutathione reductase and thioredoxin reductase, which protect free cysteines from oxidation. For example, ribonucleotide reductase forms a disulfide bond during its reaction cycle, and reduction of this disulfide bond is essential for the reaction to proceed (Prinz, J Biol Chem. 272(25): 15661).
Natural microproteins are expressed by bacteria, animals (sea anemones, snails, insects, scorpions, snakes) and plants. However, heterologous expression of recombinant microproteins is usually carried out in E. coli, although Bacillus subtilis, yeasts (Saccharomyces, Kluyveromyces, Pichia), filamentous fungi such as Aspergillus and Fusarium, and mammalian cell lines such as CHO, COS or PerC6 can also be used to express microproteins. In the examples in the literature, heterologously expressed microproteins are generally produced in the cytoplasm of E. coli.
An alternative to recombinant expression is chemical synthesis. Microproteins are small enough to allow chemical synthesis and can be prepared synthetically at an economical cost.
Unrelated disulfide-containing products (mostly products containing Ig domains, including Ab fragments and whole Abs) are usually produced in mammalian tissue culture or, in E. coli, by secretion into the periplasm or the medium. Secreted products have a signal peptide that is removed by proteolysis, leaving a non-formylated N-terminal residue. In contrast, proteins produced in the E. coli cytoplasm generally retain the N-terminal formylmethionine, depending on the amino acid that follows the fMet. The literature describes which amino acids following the fMet lead to removal of the fMet.
Although microproteins are almost completely absent from bacteria and archaea (with some exceptions), fully hydrophilic microproteins can easily be produced in E. coli.
There are several bacterial microproteins, such as the heat-stable enterotoxins from E. coli (called ST-Ia and ST-Ib) and related enterobacteria. Heat-stable enterotoxins such as STa (PFAM 02048) and STb are unrelated at the sequence level. The sequence alignment of STa shows a 72-amino-acid precursor. This protein is processed by two separate proteolytic cleavage events to yield the mature toxin, which contains three disulfide bonds with topology 142536. The motif of ST-Ia is CxxxxxxxxxxxxxxxxxxxxCCxxCCxxxCxxC.
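The topology and motif notation used for ST-Ia can be read mechanically. The following is a minimal sketch, assuming that each digit of a topology string such as 142536 is the index of one cysteine and that consecutive digits form a bonded pair (cysteine 1 with 4, 2 with 5, 3 with 6); the function names are hypothetical and are not part of the disclosure. Note that the motif as printed contains seven cysteines while the topology names six; the sketch does not attempt to decide which six are paired in the mature toxin.

```python
import re

def parse_topology(topology: str) -> list[tuple[int, int]]:
    """Split a topology string such as '142536' into cysteine pairs.

    Assumption: one digit per cysteine index, so at most 9 cysteines.
    """
    if len(topology) % 2 or not topology.isdigit():
        raise ValueError("expected an even number of digits")
    digits = [int(ch) for ch in topology]
    return [(digits[i], digits[i + 1]) for i in range(0, len(digits), 2)]

def cysteine_spacing(motif: str) -> list[int]:
    """Count the residues between consecutive cysteines in a motif like 'CxxCCxC'."""
    positions = [m.start() for m in re.finditer("C", motif)]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

ST_IA_MOTIF = "CxxxxxxxxxxxxxxxxxxxxCCxxCCxxxCxxC"

print(parse_topology("142536"))       # [(1, 4), (2, 5), (3, 6)]
print(cysteine_spacing(ST_IA_MOTIF))  # loop lengths between successive cysteines
```

Representing a scaffold as a list of loop lengths plus a list of bonded pairs makes it straightforward to compare candidate modules that might replace the 3SS 142536 module.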
A promising approach to expressing microproteins and secreting them into the culture medium is to use the ST-Ia promoter, leader peptide and precursor, but linked to a different microprotein, replacing the existing 3SS 142536 module with the different microprotein. ST-Ia is secreted into the culture medium (not into the periplasm), which is very rare for E. coli and explains how its disulfide bonds form. It probably has a specialized leader peptide that directs its secretion from E. coli through one of the 3 or 4 different specialized secretion systems. Linked to a microprotein, this leader peptide may also allow efficient secretion and disulfide bond formation of other microproteins, and could be used for rapid screening of culture supernatants.
Microproteins can be produced in a variety of expression systems, including prokaryotic and eukaryotic systems. Suitable expression hosts are, for example, yeast, fungi, mammalian cell culture and insect cells. Of particular interest are bacterial expression systems using E. coli, Bacillus and other host organisms. Heterologous expression of microproteins is generally carried out in the cytoplasm of E. coli. Disulfide bonds usually do not form in the cytoplasm, because it is a reducing environment; they form after cell lysis. Heating the cells after protein expression can facilitate the characterization and purification of microproteins. This procedure causes cell lysis and precipitation of most E. coli proteins (Silverman, J. et al. (2005) Nat Biotechnol). Expression levels of different microproteins in E. coli can be compared by colony screening if the microprotein is fused to a reporter protein such as GFP or to an enzyme such as HRP, beta-lactamase or alkaline phosphatase. Of particular interest are heat- and protease-stable enzymes, because they allow the stability of microproteins to be measured under heat or protease stress. Examples are calf intestinal alkaline phosphatase or thermostable variants of beta-lactamase (Amin, N. et al. (2004) Protein Eng Des Sel, 17:787-93). Fusion of microproteins to enzymes or reporter proteins also facilitates the analysis of their binding properties, because target-bound microprotein can be detected via the reporter enzyme. Microproteins can be expressed as fusions with one or more epitope tags, for example the HA-tag, His-tag, myc-tag, strep-tag, E-tag or T7-tag. These tags facilitate the purification of samples, and they can be used to measure binding properties by sandwich ELISA or other methods. Many other assays for detecting the binding properties of protein or peptide ligands have been described, and these methods can be applied to microproteins. Examples are surface plasmon resonance, scintillation proximity assays, ELISA, AlphaScreen (Perkin Elmer) and beta-galactosidase fragment complementation assays (CEDIA).
Heterologous expression of microproteins is generally carried out in the E. coli cytoplasm. Disulfide bonds usually do not form in the cytoplasm, because it is a reducing environment, but they form after the cells are lysed. Expression levels of different microproteins in E. coli can be compared using colony screens if the microprotein is fused to a reporter such as GFP or to an enzyme such as HRP or alkaline phosphatase (preferably a heat-stable form, for example calf intestinal alkaline phosphatase).
The present invention also encompasses fusion proteins comprising the cysteine-containing scaffolds disclosed herein and fragments thereof. Such fusions can be between two or more scaffolds of the invention and related or unrelated scaffolds. Useful fusion partners include sequences that facilitate intracellular localization of the polypeptide, or that enhance serum half-life, reactivity, or coupling of the polypeptide to an immunoassay support or a vaccine carrier.
Variation in disulfide bond stability
In general, the stability of disulfide bonds in proteins varies to some extent. For example, disulfide bonds in secreted proteins tend to be more stable than 'undesired' disulfide bonds in cytoplasmic proteins. Disulfide bonds generally resist reduction if they are buried, and according to Wedemeyer et al. disulfide bonds are usually buried. Disulfide bonds in secreted proteins are therefore more resistant to reduction when the protein is fully folded, and a low concentration of denaturant must be added to induce local unfolding and make the disulfide bonds accessible.
When a protein with multiple disulfide bonds is targeted to the cytosol in a folded state, and the protein remains folded during uptake, its disulfide bonds may resist reduction. A prerequisite is that none of the disulfide bonds is accessible to reducing agents. In the cytosol, thioredoxin and glutathione act as direct reductants of disulfide bonds. Because their molecular weight is larger than that of DTT, their access to buried disulfide bonds in folded proteins should be limited.
The accessibility of the disulfide bonds in a protein can be determined computationally from crystal structures or experimentally by NMR, and can be compared with a titration of denaturation sensitivity (i.e., D50 is the reductant concentration at which 50% of the wild-type disulfide bonds are present and 50% are absent).
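As an illustration of how a D50 value might be read from a reduction titration, the sketch below linearly interpolates the reductant concentration at which half of the disulfide bonds remain. The function name and the titration values are hypothetical and purely illustrative; they are not measurements from this disclosure, and a real analysis might fit a sigmoidal curve instead.

```python
def d50(concentrations_mM, fraction_oxidized):
    """Interpolate the reductant concentration at which 50% of disulfides remain.

    Assumes the concentrations are sorted in ascending order and that the
    oxidized fraction decreases as the reductant concentration increases.
    """
    points = list(zip(concentrations_mM, fraction_oxidized))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1 and f0 != f1:
            # Linear interpolation between the two bracketing data points.
            return c0 + (f0 - 0.5) * (c1 - c0) / (f0 - f1)
    raise ValueError("titration does not cross 50%")

# Made-up titration: reductant concentration (mM) vs. fraction of intact disulfides.
reductant_mM = [0, 5, 10, 20, 40]
intact = [1.0, 0.9, 0.7, 0.3, 0.05]
print(round(d50(reductant_mM, intact), 2))  # 15.0
```

Comparing D50 values between clones gives a single number per clone that can be set against the computed or NMR-derived accessibility of its disulfide bonds.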
Covalent binding to targets
Some proteins can bind covalently to other proteins through disulfide exchange, resulting in exceptional binding affinity. A useful example is minicollagen, in which a C-terminal tail sequence binds covalently to an N-terminal leader sequence, resulting in the formation of 6 disulfide bonds between the two proteins. See Figure 113.
Screening and characterization tools
Protein libraries and individual protein clones produced in the early cycles of the 234, 3x0-8, 4x0-8 and 4x6 methods described above tend to fold heterogeneously.
To some extent this heterogeneity can be ignored, and the proteins can continue to be evolved by directed evolution until proteins are obtained that have the desired properties, notably high affinity (typically picomolar) and high specificity, together with homogeneous folding and high expression levels, so that the protein can be manufactured.
Methods for constructing and panning phage libraries
Display formats
A variety of methods have been described that allow binding molecules to be identified in large libraries of variants. One method is chemical synthesis. Library members can be synthesized on beads such that each bead carries a different peptide sequence. Beads carrying ligands with the desired specificity can be identified using labeled binding partners. Another approach is to generate sub-libraries of peptides, which allows specific binding sequences to be identified iteratively (Pinilla, C. et al. (1992) BioTechniques, 13:901-905). More commonly used are display methods, in which a library of variants is expressed on the surface of a phage, protein or cell. These methods have in common that the DNA or RNA encoding each variant in the library is physically linked to the ligand. This allows one to detect or retrieve the ligand of interest and then determine its peptide sequence by sequencing the attached DNA or RNA. Display methods allow one of skill in the art to enrich library members with desired binding properties from large libraries of random variants. Variants with the desired binding properties can usually be identified by screening individual isolates from the enriched library for the desired property. Examples of display methods are fusion to the lac repressor (Cull, M. et al. (1992) Proc. Natl. Acad. Sci. USA, 89:1865-1869) and cell surface display (Wittrup, K.D. (2001) Curr Opin Biotechnol, 12:395-9). Of particular interest are methods in which random peptides or proteins are linked to phage particles. Commonly used are the M13 phage (Smith, G.P. et al. (1997) Chem Rev, 97:391-410) and the T7 phage (Danner, S. et al. (2001) Proc Natl Acad Sci U S A, 98:12954-9). Several methods are available for displaying peptides or proteins on M13 phage. In many cases the library sequence is fused to the N-terminus of peptide pIII of M13 phage. Phage typically carry 3-5 copies of this protein, so phage in such a library will in most cases carry 3-5 copies of a library member. This approach is called multivalent display. An alternative is phagemid display, in which the library is encoded on a phagemid. Phage particles can be formed by infecting cells carrying the phagemid with a helper phage (Lowman, H.B. et al. (1991) Biochemistry, 30:10832-10838). This process generally results in monovalent display. In some cases monovalent display is preferred in order to obtain high-affinity binders. In other cases multivalent display is preferred (O'Connell, D. et al. (2002) J Mol Biol, 321:49-56).
A variety of methods have been described for enriching sequences with desired characteristics by phage display. The target of interest can be immobilized by binding it to immunotubes, microtiter plates, magnetic beads or other surfaces. The phage library is then contacted with the immobilized target, phage lacking a binding ligand are washed away, and phage carrying target-specific ligands can be eluted under a variety of conditions. Elution can be performed with low pH, high pH, urea or other conditions that tend to disrupt protein-protein contacts. Bound phage can also be eluted by adding E. coli cells, so that the eluted phage directly infect the added E. coli host. An interesting approach is elution with proteases that degrade the phage-bound ligand or the immobilized target. Proteases can also be used as a tool to enrich protease-resistant phage-bound ligands. For example, the phage library can be incubated with one or more (human or mouse) proteases before panning on the target. This process degrades and removes protease-labile ligands from the library (Kristensen, P. et al. (1998) Fold Des, 3:321-8). Phage display libraries of ligands can also be enriched for binding to complex biological samples. Examples are panning on immobilized cell membrane fractions (Tur, M.K. et al. (2003) Int J Mol Med, 11:523-7) or on whole cells (Rasmussen, U.B. et al. (2002) Cancer Gene Ther, 9:606-12; Kelly, K.A. et al. (2003) Neoplasia, 5:437-44). In some cases the panning conditions can be optimized to improve the enrichment of cell-specific binders from phage libraries (Watters, J.M. et al. (1997) Immunotechnology, 3:21-9). Phage panning can also be performed in live patients or animals. This approach is of particular interest for identifying ligands that bind vascular targets (Arap, W. et al. (2002) Nat Med, 8:121-7).
Cloning methods for library construction
The literature describes a variety of methods that allow one of skill in the art to generate libraries of DNA sequences encoding libraries of peptide ligands. Synthetic oligonucleotides containing random mixtures of nucleotides at one or more random positions can be used. This approach allows the number of randomized positions and the degree of randomization to be controlled. In addition, random or semi-random DNA sequences can be obtained by partial digestion of DNA from biological samples. Random oligonucleotides can be used to construct phagemid or phage libraries that are randomized at predetermined positions. This can be done by PCR fusion, as described by de Kruif, J. et al. (1995) J Mol Biol, 248:97-105. Other protocols are based on DNA ligation (Felici, F. et al. (1991) J Mol Biol, 222:301-10; Kay, B.K. et al. (1993) Gene, 128:59-65). Another commonly used method is Kunkel mutagenesis, in which single-stranded circular DNA serves as the template for synthesizing a mutagenized strand of phage or phagemid. See Sidhu, S.S. et al. (2000) Methods Enzymol, 328:333-63; Kunkel, T.A. et al. (1987) Methods Enzymol, 154:367-82.
Kunkel mutagenesis uses templates containing randomly incorporated uracil bases, which can be obtained from E. coli strains such as CJ236. The uracil-containing template strand is preferentially degraded after transformation into E. coli, whereas the in vitro synthesized mutagenized strand is retained. As a result, most transformed cells carry the mutagenized form of the phagemid or phage. A valuable way to increase library diversity is to combine multiple sub-libraries. These sub-libraries can be generated by any of the methods described above, and they can be based on the same or different scaffolds.
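The number of randomized positions and the degree of randomization in a synthetic oligonucleotide translate directly into the number of DNA sequences the oligonucleotide can encode. The sketch below counts this using the standard IUPAC nucleotide ambiguity codes. The NNK codon used in the example is a common randomization scheme chosen here only for illustration; it is an assumption and is not specified in this disclosure, and the function name is hypothetical.

```python
# Number of nucleotides represented by each IUPAC code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def dna_diversity(oligo: str) -> int:
    """Count the distinct DNA sequences encoded by a degenerate oligonucleotide."""
    total = 1
    for base in oligo.upper():
        total *= IUPAC_DEGENERACY[base]
    return total

print(dna_diversity("NNK"))            # 32 codons per randomized position
print(dna_diversity("GCTNNKTGC"))      # 32: fixed flanks do not add diversity
print(dna_diversity("NNK" * 6))        # 1073741824 for six randomized codons
```

Restricting a position to a narrower code (for example K instead of N) lowers the degree of randomization at that position, while adding randomized codons multiplies the theoretical diversity.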
A useful method for generating large phage libraries of short peptides has recently been described (Scholle, M.D. et al. (2005) Comb Chem High Throughput Screen, 8:545-51). This method is related to the Kunkel method but does not require the production of single-stranded template DNA containing random uracil bases. The method starts from a template phage that carries one or more mutations close to the region to be mutagenized, which render the phage non-infective. It uses mutagenic oligonucleotides that carry randomized codons at some positions and that correct the phage-inactivating mutations in the template. As a result, only mutagenized phage particles are infective after transformation, and the library contains very few parental phage. The method can be modified further in several ways. For example, multiple mutagenic oligonucleotides can be used to mutagenize several discontinuous regions of the phage simultaneously. We have adopted this one-step method and extended it by applying it to whole microproteins of >25, 30, 35, 40, 45, 50, 55 and 60 amino acids instead of short peptides of <10, 15 or 20 amino acids, which poses additional challenges. The method now yields libraries of more than 10^10 (up to 10^11) transformants from a single transformation, so a single library with a diversity of 10^12 is expected from 10 transformations.
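The arithmetic behind the expected library size can be sketched as follows. The first function simply pools independent transformations. The second is an added assumption that is not part of this disclosure: it estimates how many distinct sequences a pool contains if transformants are drawn uniformly at random from a finite theoretical sequence space. Both function names are hypothetical.

```python
import math

def pooled_library_size(transformants_per_transformation: int, transformations: int) -> int:
    """Upper bound on library size when independent transformations are pooled."""
    return transformants_per_transformation * transformations

def expected_distinct_clones(theoretical_diversity: float, transformants: float) -> float:
    """Expected number of distinct sequences, assuming uniform random sampling
    (Poisson approximation): D * (1 - exp(-T / D))."""
    return -theoretical_diversity * math.expm1(-transformants / theoretical_diversity)

size = pooled_library_size(10**11, 10)
print(f"{size:.0e}")  # 1e+12

# Ten fully randomized amino acid positions give 20**10 possible proteins,
# so a pool of 10**12 transformants samples only part of that space.
print(f"{expected_distinct_clones(20**10, size):.2e}")
```

When the theoretical diversity greatly exceeds the number of transformants, almost every transformant is unique and the pooled size is a good estimate of the library diversity.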
Methods for re-mutagenesis
A novel variation of the Scholle method is to design the mutagenic oligonucleotides such that an amber stop codon in the template is converted to an ochre stop codon, and the ochre stop codon is converted to an amber stop codon in the next mutagenesis cycle. In this case the template phage and the mutagenized library members must be grown in different E. coli suppressor strains, alternating between ochre suppressor and amber suppressor strains. Alternating these two types of stop codon and the two suppressor strains allows several successive rounds of phage mutagenesis to be performed.
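The alternation of stop codons and suppressor strains over successive rounds can be laid out as a simple schedule. The sketch below is illustrative bookkeeping only: it uses the standard codon assignments (amber = TAG, ochre = TAA), and the function name and dictionary keys are hypothetical.

```python
STOP_CODONS = {"amber": "TAG", "ochre": "TAA"}

def mutagenesis_schedule(rounds: int, first_template: str = "amber") -> list[dict]:
    """List, for each round, the stop codon in the template, the stop codon
    introduced by the mutagenic oligonucleotide, and the suppressor strain
    in which each phage population is grown."""
    if first_template not in STOP_CODONS:
        raise ValueError("first_template must be 'amber' or 'ochre'")
    schedule = []
    template = first_template
    for n in range(1, rounds + 1):
        product = "ochre" if template == "amber" else "amber"
        schedule.append({
            "round": n,
            "template_stop": STOP_CODONS[template],
            "template_host": f"{template} suppressor strain",
            "library_stop": STOP_CODONS[product],
            "library_host": f"{product} suppressor strain",
        })
        template = product  # the new library is the template of the next round
    return schedule

for step in mutagenesis_schedule(3):
    print(step)
```

Because each round's library carries the stop codon that the previous template lacked, growth in the matching suppressor strain distinguishes mutagenized members from carried-over template in every round.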
Another novel variation of the Scholle method involves the use of megaprimers with a single-stranded phage DNA template. A megaprimer is a long ssDNA generated from the library inserts of the phage pool selected in the previous round of panning. The aim is to capture the full diversity of library inserts from the previous pool, which was mutagenized in one or more regions, and to transfer it into a new library such that an additional region can be mutagenized. The megaprimer method can be repeated for multiple cycles using the same template, which contains a stop codon in the gene of interest. The megaprimer is an ssDNA (optionally generated by PCR) that contains 1) 5' and 3' overlap regions of at least 15 bases complementary to the ssDNA template, 2) one or more previously selected library regions (1, 2, 3, 4 or more), which are copied (optionally by PCR) from the previously selected pool of clones, and 3) a newly mutagenized library region to be selected in the next round of panning. The megaprimer is optionally prepared by 1) synthesizing one or more oligonucleotides encoding the newly synthesized library regions and 2) fusing these, optionally by overlap PCR, with a DNA fragment (optionally obtained by PCR) containing any other, previously optimized library regions. Run-off or single-stranded PCR of the combined (overlap) PCR product is used to generate the single-stranded megaprimer, which contains all previously optimized regions as well as the new library for another region to be optimized in the next panning experiment. See Figure 28. This method is expected to allow affinity maturation of proteins by multiple rapid cycles of library generation, producing a diversity of 10^11 to 10^12 per cycle, each followed by panning.
A variety of methods can be used to introduce sequence diversity into (previously selected or naive) microprotein libraries or to mutate individual microprotein clones, with the aim of enhancing their binding or other properties such as manufacturing, stability or immunogenicity. In principle, all methods that can be used to generate libraries can also be used to introduce diversity into enriched (previously selected) microprotein libraries. In particular, variants with desired binding or other properties can be synthesized, and partially randomized oligonucleotides can be designed on the basis of these sequences. This approach allows the randomized positions and the degree of randomization to be controlled. A variety of computer algorithms can be used to infer the utility of individual mutations in a protein from sequence databases (Jonsson, J. et al. (1993) Nucleic Acids Res, 21:733-9; Amin, N. et al. (2004) Protein Eng Des Sel, 17:787-93). Of particular interest for the re-mutagenesis of enriched libraries is DNA shuffling (Stemmer, W.P.C. (1994) Nature, 370:389-391), which generates recombinants of individual sequences in the enriched library. Shuffling can be performed under a variety of modified PCR conditions, and the templates can be partially degraded to enhance recombination. An alternative is recombination at predetermined positions using restriction enzyme-based cloning. Of particular interest are methods using type IIS restriction enzymes, which cut DNA outside their sequence recognition site (Collins, J. et al. (2001) J Biotechnol, 74:317-38). Restriction enzymes that generate non-palindromic overhangs can be used to cut plasmids or other DNA encoding mixtures of variants at multiple positions, and complete plasmids can be reassembled by ligation (Berger, S.L. et al. (1993) Anal Biochem, 214:571-9). Another method for introducing diversity is PCR mutagenesis, in which the DNA sequences encoding the library members are subjected to PCR under mutagenic conditions. PCR conditions that lead to relatively high mutation frequencies have been described (Leung, D. et al. (1989) Technique, 1:11-15). In addition, polymerases with reduced fidelity can be used (Vanhercke, T. et al. (2005) Anal Biochem, 339:9-14). A method of particular interest is based on mutator strains (Irving, R.A. et al. (1996) Immunotechnology, 2:127-43; Coia, G. et al. (1997) Gene, 201:203-9). These are strains that carry defects in one or more DNA repair genes. Plasmids, phage or other DNA in these strains accumulate mutations during normal replication. Individual clones or enriched populations can be propagated in mutator strains to introduce genetic diversity. Many of the methods described above can be applied iteratively. Multiple rounds of mutagenesis and screening or panning can be applied to whole genes or portions of genes, or different portions of the protein can be mutagenized in each subsequent round (Yang, W.P. et al. (1995) J Mol Biol, 254:392-403).
Library treatments
Known artifacts of phage panning include: 1) non-specific binding based on hydrophobicity; 2) multivalent binding to the target, which may be due to a) the pentavalency of the pIII phage protein, b) disulfide bond formation between different microproteins, leading to multimers, or c) high-density coating of the target on the solid support; and 3) context-dependent target binding, in which the context of the target or the context of the microprotein is critical for binding or inhibitory activity. Various treatment steps can be used to minimize the magnitude of these problems. Ideally these treatments are applied to the whole library (library treatments), but some useful treatments for removing poor clones can only be applied to pools of soluble proteins or only to individual soluble proteins.
Libraries of microproteins may contain free thiols, which can complicate directed evolution through cross-linking to other proteins. One approach is to remove the worst clones from the library by passing it over a free-thiol column, which removes all clones that have one or more free thiols. Clones with free SH groups can also be reacted with biotin-SH reagents, allowing clones with reactive SH groups to be removed efficiently using streptavidin columns. Another approach is not to remove the free thiols but to inactivate them by capping them with thiol-reactive chemical reagents such as iodoacetic acid. Of particular interest are bulky or hydrophilic thiol reagents that reduce non-specific target binding of the modified variants.
Examples of context dependence are all the constant sequences that can contribute to interactions, including the pIII protein, linkers, peptide tags, biotin-streptavidin, Fc and other fusion proteins. A typical way to avoid context dependence is to switch contexts as often as is practical, so that it does not accumulate. This may include alternating between different display systems (e.g., M13 and T7, or M13 and yeast), alternating tags and linkers, alternating the (solid) supports used for immobilization, and alternating the target protein itself (different suppliers, different fusion formats).
Library treatments can also be used to select for proteins with preferred qualities. One option is to treat the library with proteases in order to remove unstable variants. The proteases used are typically those that would be encountered in the application. For pulmonary delivery, lung proteases can be used, obtained for example by lung lavage. Similarly, protease mixtures can be obtained from serum, saliva, stomach, intestine, skin, nose, etc. However, single purified proteases or mixtures of them can also be used. An extensive list of proteases is shown in Appendix E. The phage themselves are exceptionally resistant to most proteases and other harsh treatments.
For example, the library can be selected for the most stable structures, i.e. those with strong disulfide bonds, by exposure to gradually increasing concentrations of a reducing agent (i.e. DTT or beta-mercaptoethanol), so that the least stable structures are eliminated first. Reducing agent (i.e. DTT, BME, etc.) concentrations from 2.5 mM to 5 mM, 10 mM, 20 mM, 30 mM, 40 mM, 50 mM, 60 mM, 70 mM, 80 mM, 90 mM or even 100 mM are typically used, depending on the stability desired.
Clones that refold efficiently in vitro can also be selected by reducing the entire display library with a high level of reducing agent, as described above, then gradually reoxidizing the protein library to re-form the disulfide bonds, and then removing the clones that have free SH groups. This process can be applied once or several times to remove clones with low in vitro refolding efficiency.
One approach is to apply genetic selection for protein expression level, folding and solubility, as described by A.C. Fisher et al. (2006) Genetic selection for protein solubility enabled by the folding quality control feature of the twin-arginine translocation pathway. Protein Science (online).
After (optional) panning of the display library, it is desirable to avoid screening thousands of clones at the protein level for target binding, expression and folding. An alternative is to subclone the entire pool of selected inserts into a beta-lactamase fusion vector, which, when plated on beta-lactam, was shown by the authors to be fully selective for well-expressed, disulfide-bonded, soluble proteins.
After a protein library has been displayed on M13 phage and panned on the target for one or more cycles, there are several ways to proceed:
Screening of individual phage clones by phage ELISA. This determines the number of phage particles bound to the immobilized target (using an anti-M13 antibody).
Transfer from the M13 library to a T7 phage display library. Any single library format tends to favor clones that can form high-affinity contacts with the target. This is an important reason for screening soluble proteins, even though that is a tedious process. The multivalency achieved in T7 phage display may well be very different from that achieved in M13 display, and cycling between T7 and M13 may be an excellent way to reduce valency-based false positives.
Filter lifts. Filter lifts can be made from bacterial colonies grown at high density (10e2-10e5) on large agar plates. Small amounts of some proteins are secreted into the medium and end up bound to the filter membrane (nitrocellulose or nylon). The filters are then blocked with nonfat milk, 1% casein hydrolysate or a 1% BSA solution and incubated with target protein labeled (directly, or indirectly via an antibody or via biotin-streptavidin) with a fluorescent dye or an indicator enzyme. The location of each colony is determined by overlaying the filter on the back of the plate, and all positive colonies are picked for further characterization. An advantage of filter lifts is that they can be made affinity-selective by reading the signal after washing for different periods of time. The signal of high-affinity clones 'decays' slowly, whereas the signal of low-affinity clones decays rapidly. Such affinity characterization typically requires 3 measurements when well-based assays are used, and filter lifts can provide better clone-to-clone comparability than well-based assays. Gridding the colonies into an array is useful because it minimizes differences caused by colony size or position.
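The relationship between wash time and remaining signal can be sketched with a simple model. The sketch below assumes first-order dissociation of the labeled target from the immobilized clone, so that the signal decays as exp(-k_off * t); this model, the example rate constants and the function names are illustrative assumptions and are not taken from this disclosure.

```python
import math

def fraction_remaining(k_off_per_s: float, wash_time_s: float) -> float:
    """Fraction of the initial signal left after washing, assuming
    first-order dissociation with rate constant k_off."""
    return math.exp(-k_off_per_s * wash_time_s)

def apparent_k_off(signal_early: float, signal_late: float,
                   t_early_s: float, t_late_s: float) -> float:
    """Estimate k_off from two signal readings taken at different wash times."""
    if signal_early <= 0 or signal_late <= 0 or t_late_s <= t_early_s:
        raise ValueError("signals must be positive and t_late_s > t_early_s")
    return math.log(signal_early / signal_late) / (t_late_s - t_early_s)

# Hypothetical clones after a 10-minute wash.
print(round(fraction_remaining(1e-4, 600), 3))  # slow off-rate: most signal remains
print(round(fraction_remaining(1e-2, 600), 3))  # fast off-rate: signal nearly gone
```

Reading the same filter at several wash times and ranking clones by apparent off-rate is one way to make the comparison independent of the amount of protein each colony deposited.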
Pharmaceutical compositions
The present invention also provides pharmaceutical compositions comprising the cysteine-containing proteins described herein. They can be administered therapeutically by the oral, intranasal or parenteral route or by inhalation, and can take the form of tablets, lozenges, granules, capsules, pills, ampoules, suppositories or aerosols. They can also take the form of suspensions, solutions and emulsions of the active ingredient in aqueous or non-aqueous diluents, syrups, granulates or powders. In addition, the pharmaceutical compositions can contain other pharmaceutically active compounds or a plurality of compounds of the invention.
The cysteine-containing proteins of the invention can also be combined with various liquid-phase carriers, such as sterile or aqueous solutions, pharmaceutically acceptable carriers, suspensions and emulsions. Examples of non-aqueous solvents include propyl glycol, polyethylene glycol and vegetable oils.
More specifically, the pharmaceutical compositions of the invention can be administered for therapy by any suitable route, including the oral, rectal, nasal, topical (including transdermal, aerosol, buccal and sublingual), vaginal, parenteral (including subcutaneous, intramuscular, intravenous and intradermal) and pulmonary routes. It will also be appreciated that the preferred route will vary with the condition and age of the recipient and with the disease being treated.
Product formats
A variety of product formats (see, for example, Figure 159) are contemplated for a variety of uses, including reagents, diagnostics, prophylaxis, ex vivo therapy and specialized formats for different modes of administration for in vivo therapy, for example intravenous, subcutaneous, intrathecal, intraocular, transscleral, intraperitoneal, transdermal, oral, buccal, enteral, vaginal, nasal, pulmonary and other modes of administration.
These product formats include domain monomers and domain multimers (products with 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50 or even 100 domains in a single protein chain or in multiple protein chains). A domain may contain only unique sequences or structural motifs, or it may contain repeated sequences or structural motifs, or even highly repetitive sequences or structural motifs (repeat proteins). Each domain may have one contiguous or non-contiguous (spatially or sequentially defined) binding site for 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 different targets. The target can be a therapeutic, diagnostic (in vivo, in vitro), reagent or materials target, and can be a (combination of) protein, carbohydrate, lipid, metal or any other biological or non-biological material. Domain monomers and multimers may have multiple binding sites for the same target, optionally resulting in avidity. Domain multimers may also have 1, 2, 3, 4, 5, 6, 7, 8 or more binding sites for different targets, resulting in multispecificity. Domain multimers optionally contain peptide linkers with lengths of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25 or 30 amino acids. A variety of elements can be fused to these domains, such as linear or cyclic peptides containing tags (for example, for detection or purification with antibodies or Ni-NTA).
Half-life extension formats: One preferred approach is to use a fused peptide (linear, monocyclic or bicyclic, meaning that it contains 0, 1 or 2 disulfide bonds) or a protein domain that provides binding to serum albumin, immunoglobulins (i.e. IgG), erythrocytes or other blood molecules or serum-accessible molecules, in order to extend the serum elimination half-life of the product to the desired duration, which can range from 1, 2, 4, 8 or 16 hours to 1, 2, 3, 4, 5 or 6 days to 1 week, 2 weeks, 3 weeks or 1, 2 or 3 months. An alternative approach is to design a domain that uses different binding sites, which may or may not partially overlap, to bind both a drug target and a half-life extension target such as serum albumin. One desirable approach is to create a scaffold that is randomized in one region and selected for binding to a half-life target (such as HSA), and then to randomize these constructs in other regions designed to bind one or more drug targets, yielding domains that bind both the half-life target and the drug target. Domains that provide half-life extension by binding to serum proteins or serum-exposed proteins can also be fused to non-microprotein proteins such as human cytokines, growth factors and chemokines. One optional application is to extend the half-life of these human proteins or to target them to specific tissues. The preferred affinity of this interaction may be less than (or greater than) 10 uM, 1 uM, 100 nM, 10 nM, 1 nM or 0.1 nM. Another option is to fuse the domains to long, unstructured, flexible, glycine-rich sequences in order to increase their Stokes' hydrodynamic radius and thereby prolong their serum elimination half-life. Yet another option is to link domains covalently to other domains not by peptide bonds but by disulfide bonds or other chemical linkages. Yet another option is to chemically couple small molecules (including pharmaceutically active pharmacophores), radiolabels (i.e. chelates) and PEG or PEG-like molecules or carbohydrates to the protein.
Alternative delivery formats: The properties of the average microprotein (size, protease stability, solubility, hydrophilicity) make it particularly well suited to most alternative (non-injection) delivery formats, and engineering can further improve its suitability for a particularly preferred delivery format. Werle, M. et al. (2006) J. Drug Targeting 14:137-146 showed that three different microproteins were highly resistant to proteases such as elastase, pepsin, chymase, plasma proteases (serum), and intestinal membrane proteases (2/3). They also showed that the apparent transport coefficient (Papp) of two microproteins was 3-fold higher than predicted from a standard curve generated with multiple peptides and small proteins. For transport across tissue barriers, such as nasal, transdermal, oral, buccal, intestinal, or transscleral transport, efficacy and bioavailability depend primarily on the size of the protein. Various excipients, such as alkyl saccharides, have been reported to improve the transport of protein drugs by up to about 10-fold (Maggio, E. (2006) Drug Delivery Reports; Maggio, E. (2006) Expert Opinion in Drug Delivery 3:1-11). These transport enhancers are either GRAS or used as food additives, so their use in pharmaceuticals may not require a lengthy FDA review process. Some enhancers are amphiphilic/amphipathic and can form micelles because they have a hydrophilic part (i.e., carbohydrate) and a hydrophobic part (alkyl chain). It is feasible to mimic this with hydrophilic and hydrophobic protein sequences that are genetically fused to microproteins and to non-microprotein peptides or proteins. For example, the hydrophilic sequence can be rich in glycine (nonionic), glutamate and aspartate (negatively charged), or lysine and arginine (positively charged), and the hydrophobic sequence can be rich in tryptophan. Proteins with a protruding hydrophobic tail (such as 5-20 tryptophan residues) can be used to obtain an extended half-life, because the poly-tryptophan inserts into cell membranes, much as hydrophobic drugs obtain a long half-life by membrane insertion. The protein itself remains unchanged, in that its binding specificity is not expected to decrease; only a (small) change in biodistribution occurs. An alternative approach is to conjugate the microprotein to peptides or small molecules that are known to be bound and internalized by drug transporters such as PepT1, PepT2, HPT1, and ABC transporters. References are Lee, VHL (2001) Mucosal drug delivery. J Natl Cancer Inst Monogr 29:41-44; Kunta, JR and Sinko, PJ (2004) Intestinal drug transporters: in vivo function and clinical importance. Current Drug Metabolism 5:109-124; Nielsen, CU and Brodin, B (2003) Di-/Tri-peptide transporters as drug delivery targets: Regulation of transport under physiological and patho-physiological conditions. Current Drug Targets 4:373-388; Blanchette, J. et al. (2004) Principles of transmucosal delivery of therapeutic agents. Biomedicine & Pharmacotherapy 58:142-152; Dietrich, CG et al. (2005) ABC of oral bioavailability: transporters as gatekeepers in the gut. Gut 52:1788-1795; Yang, CY et al. (1999) Intestinal peptide transport systems and oral drug availability. Pharmaceutical Research 16:1331-1343.
Microproteins are ideally suited to topical administration because no half-life extension is needed. To obtain sustained release from a single administration, microproteins can be administered as depot formulations.
Depot formulations (such as implants, nanospheres, microspheres, and injectable solutions such as gels) may not require the drug (in soluble form) to have an extended half-life, although some half-life extension may still be useful.
Polymerizing microprotein domains and polypeptide spacers of various amino acid compositions into long, viscous polymers is expected to produce a depot that slowly releases soluble drug. These polymers can be fused to the microprotein, or they can be separate proteins. The viscous liquid is injected subcutaneously or intramuscularly. Instead of protein polymers, one can also mix the protein with various other biodegradable matrices, such as polyanhydrides, polyesters, PLG (poly(D,L-lactide-co-glycolide) copolymer), SAIB (sucrose acetate isobutyrate), polyethylene glycol (PEG) and other hydrogels, lipid foams, collagen, and hyaluronic acid. Their small size, high protease, mechanical, and thermal resistance, and high hydrophilicity make microproteins suitable for aggressive formulations that most other proteins cannot tolerate. Because of their small size, microproteins are well suited to iontophoresis, powered-gun delivery, acoustic delivery, and electroporation delivery (Cleland, JL et al. (2001) Emerging protein delivery methods. Current Opinion in Biotechnology 12:212-219).
Oral administration of fusion proteins: A different oral approach involves fusing the microprotein drug to an existing bacterial toxin such as Pseudomonas exotoxin (PE38, PE40), which can cross cell membranes and deliver the drug into the cytoplasm. This approach has been shown to deliver protein drugs into cells (i.e., tumor cells) and to be orally effective, meaning transfer from the intestinal lumen to the bloodstream (Mrsny, RJ et al. (2002) Bacterial toxins as tools for mucosal vaccination. Drug Discovery Today 4:247-258).
Another approach to oral (and pulmonary) administration is to fuse the microprotein to the Fc receptor and to use neonatal Fc receptor-mediated intestinal uptake and transfer to the blood by transcytosis (Low, SC et al. (2005) Oral and pulmonary delivery of FSH-Fc fusion proteins via neonatal Fc receptor-mediated transcytosis. Human Reproduction (in press)).
Intracellular delivery of microproteins: Rothbard et al. have demonstrated that naturally arginine-rich peptides such as HIV-tat can be transported across cell membranes, as can synthetic arginine-rich peptides. One analogous approach is to add an arginine-rich peptide to the N- or C-terminus of the microprotein. A second approach is to increase the arginine content of the microprotein in the library design and to favor clones with a high arginine content during screening. The arginine content can be increased to about 3%, preferably even 5%, often even 7.5%, sometimes 10%, but ideally even 15, 20, 25, 30, or 35%.
Multimeric formats: Microproteins can be multimerized for a variety of reasons, including improving affinity and extending half-life. We focus on formats in which the domains are separated by long, hydrophilic, glycine-rich spacers, but domains can also be multimerized without spacers or with naturally occurring spacers.
Long glycine-rich sequences have a large hydrodynamic radius and therefore mimic the half-life extension obtained by PEGylation. Each glycine-rich spacer can be 20, 25, 30, 35, 40, 50, 60, 70, 80, 100, 120, 140, 160, 180, 200, 240, 280, or 320 amino acids long, or longer. Multimerizing microprotein binding sites is useful for homomultimeric targets and cell-surface targets, and even for monomeric targets, (optionally) with glycine-rich spacers between the binding sites and at the N- and C-termini. In these proteins the total length of the glycine polymer can reach 100, 150, 200, 250, 300, 350, or even 400 amino acids. These proteins can contain multiple different binding sites, each binding a different site on the same target (the same copy or a different copy). In this way it may be possible, for example, to produce a protein with an extremely long half-life, partly because of its length and radius and partly because of the presence of (microprotein) binding sites for serum albumin, immunoglobulin, or other serum-exposed proteins.
Antibodies also use both size and receptor binding to obtain their long half-life, and both mechanisms may be needed for maximal half-life. Several methods and compositions can be used to produce multimers of such binding and non-binding elements:
1) Multiple copies of the binding motif are combined in a single protein chain (genetic fusion); the copies can be the same or different.
2) Single-copy (or multi-copy) binding sites are expressed as separate proteins and multimerized N-terminus to C-terminus by chemical coupling. Various chemical coupling methods can be used (see the list of coupling agents at www.pierce.com); the copies can be the same or different.
3) Multiple copies of the binding site are present in a single protein chain but are separated by non-binding linkers.
4) Binding sites and non-binding linkers are expressed as separate proteins and multimerized by chemical coupling. Various chemical coupling methods can be used (see the Pierce list of coupling agents); the copies can be the same or different.
5) Each protein contains one binding site and one non-binding linker, and these proteins are multimerized by chemical coupling. Various chemical coupling methods can be used (see www.pierce.com); the copies can be the same or different.
6) Each protein contains one binding site and an optional non-binding linker, and each protein has "binding peptides" at its N- and C-termini that bind each other, producing directional, linear protein multimers. Various peptide sequences can be used, such as SKVILF(E) or RARADADARARADADA and derivatives; the copies can be the same or different.
SKVILF(E) homodimerizes in an antiparallel manner (Bodenmuller et al. (1986) EMBO J.), and RARARA (or [RA]n) binds DADADA (or [DA]n); these sequences derive from the RARADADARARADADA peptide reported by Narmoneva, DA et al. (2005) Self-assembling short oligopeptides and the promotion of angiogenesis. Biomaterials 26:4837-4846. Placing an [RA]n polymer at one end of a domain or domain multimer and a [DA]n polymer at the other end (C- or N-terminus) produces linear, directional multimers, because the N-terminus of one protein binds the C-terminus of another copy of the same protein. If the multimers can be made so long or so cross-linked that they cannot effectively leave the subcutaneous injection site, a depot or sustained-release formulation is obtained. One approach is to design cleavage sites for serum proteases into the multimers, which then break down slowly.
Drug targets: The microproteins of the invention generally display a particular binding specificity for a particular target. In some embodiments, the microproteins of the invention can bind a target selected from the following non-limiting list: VEGF, VEGF-R1, VEGF-R2, VEGF-R3, Her-1, Her-2, Her-3, EGF-1, EGF-2, EGF-3, Alpha3, cMet, ICOS, CD40L, LFA-1, c-Met, ICOS, LFA-1, IL-6, B7.1, B7.2, OX40, IL-1b, TACI, IgE, BAFF or BLys, TPO-R, CD19, CD20, CD22, CD33, CD28, IL-1-R1, TNF, TRAIL-R1, complement receptor 1, FGFa, osteopontin, vitronectin, ephrin A1-A5, ephrin B1-B3, α-2-macroglobulin, CCL1, CCL2, CCL3, CCL4, CCL5, CCL6, CCL7, CXCL8, CXCL9, CXCL10, CXCL11, CXCL12, CCL13, CCL14, CCL15, CXCL16, CCL16, CCL17, CCL18, CCL19, CCL20, CCL21, CCL22, PDGF, TGFb, GMCSF, SCF, p40 (IL12/IL23), IL1b, IL1a, IL1ra, IL2, IL3, IL4, IL5, IL6, IL8, IL10, IL12, IL15, Fas, FasL, Flt3 ligand, 41BB, ACE, ACE-2, KGF, FGF-7, SCF, netrin 1, 2, IFNa, b, g, caspase 2, 3, 7, 8, 10, ADAM S1, S5, 8, 9, 15, TS1, TS5; adiponectin, ALCAM, ALK-1, APRIL, annexin V, angiogenin, amphiregulin, angiopoietin 1, 2, 4, Bcl-2, BAK, BCAM, BDNF, bNGF, bECGF, BMP2, 3, 4, 5, 6, 7, 8; CRP, cadherin 6, 8, 11; cathepsin A, B, C, D, E, L, S, V, X; CD11a/LFA-1, LFA-3, GP2b3a, GH receptor, RSV F protein, IL-23 (p40, p19), IL-12, CD80, CD86, CD28, CTLA-4, α4β1, α4β7, TNF/lymphotoxin, VEGF, IgE, CD3, CD20, IL-6, IL-6R, BLYS/BAFF, IL-2R, HER2, EGFR, CD33, CD52, digoxin, Rho (D), varicella, hepatitis, CMV, tetanus, vaccine, venom, botulinum toxin, Trail-R1, Trail-R2, cMet, TNF-R family members such as LA NGF-R, CD27, CD30, CD40, CD95, lymphotoxin a/b receptor, Wsl-1, TL1A/TNFSF15, BAFF-R/TNFRSF13C, TRAIL R2/TNFRSF10B, TRAIL R2/TNFRSF10B, Fas/TNFRSF6, CD27/TNFRSF7, DR3/TNFRSF25, HVEM/TNFRSF14, TROY/TNFRSF19, CD40 ligand/TNFSF5, BCMA/TNFRSF17, CD30/TNFRSF8, LIGHT/TNFSF14, 4-1BB/TNFRSF9, CD40/TNFRSF5, GITR/TNFRSF18, osteoprotegerin/TNFRSF11B, RANK/TNFRSF11A, TRAIL R3/TNFRSF10C, TRAIL/TNFSF10, TRANCE/RANK L/TNFSF11, 4-1BB ligand/TNFSF9, TWEAK/TNFSF12, CD40 ligand/TNFSF5, Fas ligand/TNFSF6, RELT/TNFRSF19L, APRIL/TNFSF13, DcR3/TNFRSF6B, TNF RI/TNFRSF1A, TRAIL R1/TNFRSF10A, TRAIL R4/TNFRSF10D, CD30 ligand/TNFSF8, GITR ligand/TNFSF18, GITR ligand/TNFSF18, TACI/TNFRSF13B, NGF R/TNFRSF16, OX40 ligand/TNFSF4, TRAIL R2/TNFRSF10B, TRAIL R3/TNFRSF10C, TWEAK R/TNFRSF12, BAFF/BLyS/TNFSF13, DR6/TNFRSF21, TNF-α/TNFSF1A, Pro-TNF-α/TNFSF1A, lymphotoxin-beta R/TNFRSF3, lymphotoxin-beta R (LTbR)/Fc chimera, TNF RI/TNFRSF1A, TNF-β/TNFSF1B, PGRP-S, TNF RI/TNFRSF1A, TNF RII/TNFRSF1B, EDA-A2, TNF-α/TNFSF1A, EDAR, XEDAR, TNF RI/TNFRSF1A.
The following examples illustrate, but do not limit, the invention; they provide methods of preparing materials useful in the methods of the invention and working embodiments of those methods.
Examples
Example 1: Randomization of CDP 6_6_12_3_2
The following example describes the design of a library based on CDP 6_6_12_3_2. The TrEMBL protein sequence database was searched for partial sequences matching CDP 6_6_12_3_2. A total of 71 sequences matched this CDP. The calculated amino acid abundance at each position is shown in Table 5. For each non-cysteine position, we chose a randomization scheme based on the following criteria: a) avoid introducing stop codons, b) avoid introducing additional cysteine residues, c) allow the amino acids observed at an abundance of >3% at the particular position, d) minimize the introduction of any amino acid that is not observed in the 71 native sequences matching this CDP.
Figure A20068003404901231
Figure A20068003404901241
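The four selection criteria of Example 1 can be illustrated with a minimal sketch. It is not the procedure used to build Table 5: the function names `expand` and `choose_scheme`, the ranking order, and the tie-breaking rule are assumptions made for illustration, and the abundances passed in are hypothetical. The sketch enumerates every degenerate codon over the IUPAC alphabet, discards those that encode a stop codon or cysteine, and ranks the remainder by how many frequent residues they cover and how few unobserved residues they introduce.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """Return the set of residues ('*' = stop) encoded by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon)
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


def choose_scheme(abundance, threshold=0.03):
    """Pick one degenerate codon for a position (hypothetical ranking rule).

    abundance maps one-letter residues to their observed fraction at the
    position. Ranking: cover as many residues above the threshold as
    possible, then encode as few unobserved residues as possible, then
    prefer the scheme with the fewest codons.
    """
    wanted = {aa for aa, f in abundance.items() if f > threshold}
    observed = {aa for aa, f in abundance.items() if f > 0}
    best_key, best_codon = None, None
    for codon in map("".join, product(IUPAC, repeat=3)):
        encoded = expand(codon)
        if "*" in encoded or "C" in encoded:  # criteria a) and b)
            continue
        n_codons = 1
        for base in codon:
            n_codons *= len(IUPAC[base])
        key = (
            -len(wanted & encoded),   # criterion c)
            len(encoded - observed),  # criterion d)
            n_codons,
            codon,
        )
        if best_key is None or key < best_key:
            best_key, best_codon = key, codon
    return best_codon


# Hypothetical position where only Asp and Glu are observed
scheme = choose_scheme({"D": 0.6, "E": 0.4})
print(scheme, sorted(expand(scheme)))
```

A real design would also weigh codon usage in the expression host and the total library size, which this sketch ignores.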
Example 2: Protein expression and folding in Escherichia coli
Oligonucleotides are cloned into an expression plasmid vector that drives protein expression in the E. coli cytoplasm. The preferred promoter is T7 (Novagen pET vector series; Kan marker) in E. coli strain BL21 DE3. One preferred method of inserting these oligos is a modified Kunkel method (Scholle, D., Kehoe, JW and Kay, B.K. (2005) Efficient construction of a large collection of phage-displayed combinatorial peptide library. Comb. Chem. & HTP Screening 8:545-551). A different approach is to perform 2-oligo PCR on (all or part of) the vector, then digest unique restriction sites in the oligo-derived fragment ends, and then ligate the matching non-palindromic overhangs (efficient intra-fragment ligation). A third approach is to assemble the insert from 2 or 4 oligos by overlap PCR, digest restriction sites at the ends of the assembled insert, and ligate it into the digested vector. The ligated DNA is transformed into competent E. coli cells and plated on LB-Kan plates. After overnight growth, individual colonies are picked and inoculated into 96-well plates containing 2xYT medium, and the cultures are grown overnight on a shaker at 37 ℃.
The plates are heated to 80 ℃ for 20 minutes and centrifuged at 6000 g to pellet the aggregated E. coli proteins.
Example 3: Design procedure for antifreeze protein
Purpose: Design a library of antifreeze repeat proteins
Strategy: The starting sequence for the library design is derived from the antifreeze protein (GenBank accession number AF160494) of Tenebrio molitor. This protein is known to express well in E. coli. Both crystal and NMR structures are available. The protein is built from repeat units that form a cylinder. The core of the structure lacks hydrophobic amino acids, but each repeat unit contains one disulfide bond and invariant serine and alanine residues. The first two repeats form a capping motif with three disulfide bonds. This capping motif is presumed to form the folding nucleus. The first two repeat units are therefore generally kept constant during in vitro evolution. See Figure 127.
The structural features of the antifreeze protein were analyzed in order to select crossover points and to find glutamine residue positions for Scholle mutagenesis.
The crossover points, shown in red, were chosen to preserve the β-sheet stacking found in the structure. Each library can therefore mutagenize two loops on opposite sides of the β-stack. Loops in the end cap can be mutagenized at a later stage using a universal forward primer site located outside the antifreeze protein open reading frame. To select codons for mutagenesis, an alignment of 215 repeat units was downloaded from the Pfam web page describing the antifreeze protein family (PF02420 in the Pfam database). The text file was analyzed with the program Profile analysis software v1.0, with the cysteine positions set to "2,8" and the total repeat length set to "12". This setting excludes the N-terminal repeat units, which contain three cysteines per 12-amino-acid repeat. As a result, the program excluded 89 sequences and analyzed the remaining 126, showing the conservation and occurrence of each amino acid in the antifreeze protein repeats. The output was pasted into an Excel spreadsheet and used as the starting point for the library design.
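The filtering and counting step described above can be sketched as follows. This is not the Profile analysis software named in the text, whose internals are not described here; the function name `repeat_profile` and the four toy repeats are hypothetical. The sketch keeps only repeats of the stated length whose cysteines sit exactly at the stated positions, then reports the fraction of each residue at each position.

```python
from collections import Counter


def repeat_profile(repeats, cys_positions=(2, 8), repeat_length=12):
    """Filter repeats by length and cysteine positions, then tabulate residues.

    Positions are 1-based. A repeat is kept only if its cysteines occur at
    exactly the given positions, so repeats with a third cysteine are dropped.
    Returns the kept repeats and, per position, a dict of residue fractions.
    """
    wanted = set(cys_positions)
    kept = [
        r for r in repeats
        if len(r) == repeat_length
        and {i for i, aa in enumerate(r, start=1) if aa == "C"} == wanted
    ]
    profile = []
    for column in zip(*kept):
        counts = Counter(column)
        profile.append({aa: n / len(kept) for aa, n in counts.items()})
    return kept, profile


# Hypothetical toy input: two valid repeats, one with three cysteines,
# and one of the wrong length
toy_repeats = ["TCTNSKHCVEAN", "TCTGSTDCNTAQ", "ACTCSTNCYKAT", "TCTDSQ"]
kept, profile = repeat_profile(toy_repeats)
print(len(kept), "repeats analyzed")
for position, fractions in enumerate(profile, start=1):
    print(position, fractions)
```

With the settings quoted in the text, the same kind of filter would account for the 89 excluded and 126 analyzed repeat units.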
Example 4: Design procedure for three-finger toxin (erabutoxin)
Purpose: Design a library using the three-finger toxin scaffold
Background: Three-finger toxins display a distinctive structure, with a core of four disulfide bonds and three long loops protruding from this core. These loops are known to participate in a variety of protein-protein interactions and can be targeted by directed evolution.
Method: The most common cysteine spacing patterns are 10-6-16-3-10-0-4, 13-6-16-1-10-0-4, and 13-5-16-1-10-0-4. The erabutoxin sequence TRICFNHQSSQPQTTKTCSPGESSCYNKQWSDFRGTIIERGCGCPTVKPGIKLSCCESEVCNNA was chosen as the starting sequence; it falls into the 13-6-16-1-10-0-4 pattern. This sequence was chosen because it can be expressed in E. coli.
Two crossover points were chosen in the loop regions to allow the maximum number of mutations.
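A cysteine spacing pattern is the number of residues between each pair of consecutive cysteines. The short sketch below, with the hypothetical helper name `cysteine_spacing`, computes the pattern for the erabutoxin starting sequence quoted in the method above, so that its assignment to the 13-6-16-1-10-0-4 class can be checked directly.

```python
def cysteine_spacing(sequence):
    """Return the number of residues between consecutive cysteines."""
    positions = [i for i, aa in enumerate(sequence) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


# Erabutoxin starting sequence as given in Example 4
ERABUTOXIN = (
    "TRICFNHQSSQPQTTKTCSPGESSCYNKQWSDFRGTIIERGCGCPTVKPGIKLSCC"
    "ESEVCNNA"
)

pattern = cysteine_spacing(ERABUTOXIN)
print("-".join(str(n) for n in pattern))
```

The eight cysteines found by the helper are consistent with the four-disulfide core described in the background.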
Example 5: Design procedure for plexin
Purpose: Design a library using the plexin or PSI scaffold.
Advantages of this scaffold: This scaffold has the distinct advantage that length variation can be introduced between the individual cysteine residues. Marked variation in the length between the cysteines of PSI folds is found in nature, which supports this design principle. Its loop-length diversity ranks highest among the microprotein families. Figure 135 shows "poly-plexins" of gradually increasing length that can be produced by adding amino acid residues.
Strategy: The Pfam database lists 468 family members. The cysteine spacings between Cys5/Cys6, Cys6/Cys7, and Cys7/Cys8 are highly variable, so it is difficult to choose a starting consensus sequence. The NMR structure of the PSI domain of the Met receptor has been solved and shows the pattern 5,2,8,2,3,5,9. This protein was expressed in E. coli, although the expression level was rather low (1 mg/9 liters of cells). A database search for members displaying the 5,2,8,2 spacing found 99 sequences. However, only 11% had the motif 5,2,8,2,3, and only three members had 5,2,8,2,3,5,9. This spacing pattern was therefore set aside, and the most common spacing pattern of the family was determined. A search with 5,2,7,2,5 produced 54 sequences. These were aligned in an Excel spreadsheet to obtain the most common codon at each position. The last spacing is the most variable; even insertions of entire protein domains are found. Among the 54 members with 5,2,7,2,5, the most common spacing at the last position is "15". In summary, the consensus sequence of the PSI fold is derived from family members with the pattern 5,2,7,2,5,15.
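The search logic of this strategy (count the family members that share a leading spacing pattern, then find the most common next spacing among them) can be sketched as below. The helper names and the five toy patterns are hypothetical; the actual Pfam member list is not reproduced here.

```python
from collections import Counter


def members_with_prefix(patterns, prefix):
    """Return the spacing patterns that start with the given prefix."""
    prefix = tuple(prefix)
    return [p for p in patterns if tuple(p[:len(prefix)]) == prefix]


def most_common_next_spacing(patterns, prefix):
    """Return the most frequent spacing that follows the prefix, or None."""
    hits = [p for p in members_with_prefix(patterns, prefix)
            if len(p) > len(prefix)]
    counts = Counter(p[len(prefix)] for p in hits)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical toy family of cysteine spacing patterns
toy_family = [
    (5, 2, 8, 2, 3, 5, 9),
    (5, 2, 8, 2, 4, 6, 11),
    (5, 2, 7, 2, 5, 15),
    (5, 2, 7, 2, 5, 15),
    (5, 2, 7, 2, 5, 22),
]

print(len(members_with_prefix(toy_family, (5, 2, 8, 2))))
print(most_common_next_spacing(toy_family, (5, 2, 7, 2, 5)))
```

Applied to the real family, the first call corresponds to the count of 99 members with 5,2,8,2 and the second to the finding that "15" is the most common final spacing.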
Structure "1ssl" shows the PSI domain from the Met receptor. The crossover point was designed to keep the most conserved motif of the family, CGWC, intact. It allows the first half of the scaffold to be randomized. A second crossover point was inserted at Cys 7. This maximizes randomization of cysteine spacings 5, 6, and 7, which show great length variation in nature. See Figure 119.
Figure 120: Alignment of the library consensus sequence with the consensus sequence for 5,2,8,2,3,5 (only 11 members) shows 25% identity. The greatest diversity lies in the last Cys spacing, which is consistent with the labeled figure and with the comparison to other members.
Example 6: Design procedure for somatomedin
Purpose: Design a library using the somatomedin scaffold
Strategy: The consensus EESCKGRCGEGFNRGKECQCDELCKYYQSCCPDYESVCKPK is derived from 44 sequences with the same cysteine spacing pattern.
A crossover point was chosen near the middle of the protein to allow mutagenesis in both halves of the sequence. See Figure 121.
Example 7: Evaluation of microprotein scaffold expression
The microprotein open reading frames for antifreeze protein (AF), three-finger toxin (TF), somatomedin (SM), and plexin (PL) were cloned into a pET30-derived vector and expressed in E. coli strain BL21 (DE3). Overnight cultures were diluted 1:200 into 20 ml LB, grown for 3 hours, induced with 2 mM IPTG, and grown for a further 4 hours. The cultures were centrifuged at 5000 × g for 10 minutes and resuspended in PBS. Samples of 250 μl were heated to 80 ℃ for 30 minutes and centrifuged at room temperature for 10 minutes. The supernatant from the heating step (50 μl sample) was mixed with 25 μl of sample buffer containing 5% BME; the resuspended cells (50 μl) were mixed directly with 25 μl of sample buffer containing 5% BME. The samples were boiled for 10 minutes and then loaded onto 16% SDS-PAGE.
Results: See Figure 122. From left to right (16% SDS-PAGE): partially purified proteins (positive control, new AF scaffold, new TF scaffold, new SM scaffold, PL (short form), control), NEB broad-range marker, then whole-cell preparations of the same proteins in the same order.
Conclusion: Proteins TF, SM, and PL are present in the supernatant at high concentration and are highly heat-resistant.
Example 8: Construction of phagemid vector pMP0003
We constructed a vector for the efficient construction of microprotein libraries. The backbone of this vector is based on the pBluescript phagemid vector. We inserted an expression cassette driven by the lacZ promoter. The coding sequence comprises the following elements: the ompA signal peptide, a short stuffer sequence flanked by SfiI and BstXI sites, a linker element, a hexahistidine tag, a hemagglutinin (HA) tag, an amber stop codon, the C-terminal fragment of the pIII protein of M13 phage, and a stop codon. The stuffer sequence is only 40 bp long. It contains double TAA and TGA stop codons and a unique BssHII site. The construction of large phagemid libraries is usually limited by the ability to obtain sufficient digested and purified vector fragment. The design of pMP0003 greatly simplifies the preparation, because the vector fragment does not have to be purified by preparative agarose gel electrophoresis. Triple digestion of plasmid pMP0003 with SfiI, BstXI, and BssHII releases two very short stuffer fragments, 19 and 21 bp long respectively, which can be removed by ultrafiltration using a YM-100 column (Microcon). The presence of the BssHII site in the stuffer also markedly reduces the frequency of non-recombinant clones in libraries based on pMP0003.
Example 9: Design and construction of library LMB0020
Random clone libraries can be constructed from a variety of microprotein sequences. The process comprises several steps: 1) identify a suitable microprotein scaffold, 2) identify residues for randomization, 3) choose a randomization scheme for each randomized position, 4) design partially random oligonucleotides according to the randomization scheme, which encode the microprotein scaffold and introduce nucleotide mixtures at specific positions, 5) assemble the microprotein fragment, 6) restriction digest and purify, 7) ligate the fragment into the digested vector fragment, 8) transform competent cells.
Library LMB0020 is based on the sequence of the trypsin inhibitor EETI-II, a member of the squash family of protease inhibitors (Christmann, A. et al. (1999) Protein Eng. 12:797-806). After inspection of the crystal structure of EETI-II, 10 positions were selected for randomization. Nine positions were randomized with the random codon NHK, which allows 16 amino acids (A, D, E, F, H, I, K, L, M, N, P, Q, S, T, V, Y). At one position the random codon VNK was used, which allows 16 amino acids (A, D, E, G, H, I, K, L, M, N, P, Q, R, S, T, V). The resulting random sequence is GCPXXXXXCKQDSDCXXGCVCZPXGXCGSP, where X represents codon NHK and Z represents codon VNK. This randomization scheme theoretically allows a diversity of more than 10^12 different amino acid sequences. The gene fragment encoding the randomized trypsin inhibitor was assembled by overlap extension of two oligonucleotides with the following sequences:
LMB0020F=CAGGCAGCGGGCCCGTCTGGCCCGGGTTGTCCTNHKNHKNHKNHKNHKTGTAAACAAGACTCTGACTG,
LMB0020R=TGTAAACAAGACTCTGACTGTNHKNHKGGTTGCGTTTGCVNKCCGNHKGGTNHKTGTGGCTCTCCGGGCCAGTCTGGTGGTTCCGGTCACGTGACCGGAACCACCAGACTGGCCCGGAGAGCCACAMDNACCMDNCGGMNBGCAAACGCAACCMDNMDNACAGTCAGAGTCTTGTTTACA.
Oligonucleotides LMB0020F and LMB0020R share a complementary region of 20 nucleotides. A two-step PCR amplification was performed: the two complementary primers were annealed and filled in, and the product was then amplified with the scaffold primers LIBPTF and LIBPTR, which contain the restriction sites.
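Two parts of this design lend themselves to a quick computational check: which residues the degenerate codons NHK and VNK encode under the standard genetic code, and whether the 3' ends of the two oligonucleotides are complementary. The sketch below is illustrative only. The helper names `encoded` and `revcomp` are hypothetical, and the two 20-nucleotide strings are the 3' ends copied from the LMB0020F and LMB0020R sequences listed above.

```python
from itertools import product

# IUPAC nucleotide codes, the bases they stand for, and their complements
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

# Standard genetic code, with codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded(degenerate_codon):
    """Return the set of residues ('*' = stop) encoded by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon)
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


def revcomp(sequence):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


nhk = encoded("NHK")
vnk = encoded("VNK")
print("NHK:", "".join(sorted(nhk)))
print("VNK:", "".join(sorted(vnk)))

# Nine NHK positions and one VNK position, counting amino acids only
diversity = len(nhk - {"*"}) ** 9 * len(vnk - {"*"})
print("theoretical diversity:", diversity)

# Last 20 nucleotides of each oligonucleotide as listed in the text
F_3PRIME = "TGTAAACAAGACTCTGACTG"
R_3PRIME = "CAGTCAGAGTCTTGTTTACA"
print("3' ends complementary:", revcomp(F_3PRIME) == R_3PRIME)
```

Under the standard code NHK also includes the amber codon TAG, which is why the stop symbol is removed before counting. How that codon is read depends on the host strain, which this sketch does not model.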
The resulting product was concentrated on a YM-30 filter (Microcon) and purified by preparative agarose gel electrophoresis on 1.2% agarose.
10 μg of product was digested with SfiI/BstXI at 50 ℃ for 5 hours and rapidly purified on a PCR column (Qiagen), yielding about 4 μg of purified fragment. Vector pMP0003 was prepared with the QIAGEN HiSpeed Maxi kit. 150 μg of vector DNA was digested with SfiI/BstXI/BssHII at 50 ℃ for 4 hours in 3 separate Eppendorf tubes and purified on a YM-100 column (Microcon). The total yield was 112.5 μg of digested vector (75%). Different ratios of insert to vector were tested in small-scale experiments to maximize the number of transformants in the library. The large-scale ligation was carried out in 7 ligation tubes. Each tube contained 3 μg of digested vector, 0.5 μg of digested insert (1:2.5 ratio), 40 μl of ligase buffer, and 20 μl of T4 DNA ligase in a total volume of 400 μl. Ligation was carried out overnight at 16 ℃. The resulting product was purified by ethanol precipitation overnight at -20 ℃, in 8 tubes per library. The ligated DNA in each tube was dissolved in 30 μl of distilled water and dispensed as 2 × 15 μl, giving 16 tubes per library for transformation.
Electrocompetent E. coli ER2738 was prepared by the following method:
1) Streak an E. coli glycerol stock freshly on LB agar (5 mg/l tetracycline), and use one colony to inoculate 15 ml of pre-warmed super broth (SB) medium in a 50-ml polypropylene tube. Add tetracycline to 30 μg/ml (90 μl of 5 mg/ml tetracycline) and grow overnight on a shaker at 250 rpm and 37 ℃.
2) Dilute 2.5 ml of the culture into each of four 2-liter shake flasks containing 500 ml of SB medium, and add 10 ml of 20% glucose, 5 ml of 1 M MgCl2, and 500 μl of 5 mg/ml tetracycline. Shake at 250 rpm and 37 ℃ until the absorbance at 600 nm is about 0.9 (2 hours 45 minutes).
3) Chill the cultures and four 500-ml bottles on ice for 15 minutes.
4) Transfer the cultures to the four 500-ml bottles and centrifuge at 4000 rpm for 20 minutes at 4 ℃.
5) Decant the supernatant and resuspend each pellet in 25 ml of pre-chilled 10% glycerol using a pre-chilled 25-ml pipette. Combine two pellets in one 250-ml bottle and add 10% glycerol to 250 ml. Centrifuge as above.
6) Decant the supernatant and repeat step 5.
7) Discard the supernatant and resuspend each pellet in the remaining volume (3.5 ml). Combine all suspensions. Use 300-μl aliquots for library electroporation. Optional: for storage, snap-freeze 320-μl aliquots in Eppendorf tubes with ethanol and dry ice. Cap the tubes and store them at -80 ℃.
8) Plate 50 μl of cell suspension on LB agar (100 mg/l carbenicillin) to test for contamination with vector phage. Plate 50 μl of cell suspension on LB agar (100 mg/l kanamycin) to test for contamination with helper phage.
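The antibiotic additions in this protocol follow the usual dilution arithmetic, in which the mass added from the stock equals the final concentration times the culture volume. The helper names below are hypothetical, and the calculation neglects the small volume contributed by the stock and other additives.

```python
def stock_volume_ul(final_ug_per_ml, culture_ml, stock_mg_per_ml):
    """Volume of stock (in microliters) needed to reach a final concentration.

    ug/ml * ml gives micrograms; dividing by mg/ml gives microliters,
    because 1 mg/ml equals 1 ug/ul.
    """
    return final_ug_per_ml * culture_ml / stock_mg_per_ml


def final_ug_per_ml(stock_ul, stock_mg_per_ml, culture_ml):
    """Final concentration (ug/ml) after adding a stock volume to a culture."""
    return stock_ul * stock_mg_per_ml / culture_ml


# Step 1: 30 ug/ml tetracycline in 15 ml from a 5 mg/ml stock
print(stock_volume_ul(30, 15, 5), "ul")

# Step 2: 500 ul of 5 mg/ml tetracycline in about 500 ml of medium
print(final_ug_per_ml(500, 5, 500), "ug/ml")
```

The first result reproduces the 90 μl stated in step 1; the second shows that the large cultures in step 2 are grown at a lower tetracycline concentration than the overnight culture.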
Library electroporation was carried out in the following steps:
1) Place the ligated DNA (usually 16 samples) and the corresponding number of cuvettes on ice for 10 minutes.
2) Add freshly prepared ER2738 cells to each ligated library sample, mix by pipetting up and down once, and transfer to a cuvette. Keep on ice for 1 minute. Electroporate at 2.5 kV, 25 μF, and 200 ohm. Immediately flush the cuvette at room temperature with 1 ml and then with 2 ml of SOC medium. Combine the 3 ml of culture in a 10-ml culture tube. Shake at 300 rpm for 1 hour at 37 ℃.
3) Combine two 3-ml samples and transfer them to a 50-ml polypropylene tube. Add 9 ml of pre-warmed (37 ℃) SB medium, 3 μl of 100 mg/ml carbenicillin, and 15 μl of 5 mg/ml tetracycline. To titer the transformed bacteria, dilute 2 μl of the culture in 200 μl of SB medium and plate 10 μl and 1 μl of this 1:100 dilution on LB agar (100 mg/l carbenicillin). Incubate the plates at
Figure A20068003404901311
℃ overnight. Calculate the total number of transformants by counting the colonies, multiplying by the culture volume, and dividing by the plated volume. Shake the 15-ml culture at 300 rpm and 37 ℃ for 1 hour, add 4.5 μl of 100 mg/ml carbenicillin, and shake for a further hour at 300 rpm and 37 ℃.
4) Combine two 15-ml samples and add 3 ml of VCSM13 helper phage. Transfer to a 500-ml polypropylene centrifuge bottle. Add 167 ml of pre-warmed (37 ℃) SB medium, 92.5 μl of 100 mg/ml carbenicillin, and 185 μl of 5 mg/ml tetracycline. Shake the 200-ml culture at 300 rpm and 37 ℃ for 1.5-2 hours.
5) Add 280 μl of 50 mg/ml kanamycin and continue shaking overnight at 300 rpm and 37 ℃.
6) Centrifuge at 4000 rpm for 15 minutes at 4 ℃. Transfer the supernatant to a clean 500-ml centrifuge bottle and add 50 ml of 20% PEG-8000/2.5 M NaCl. Keep on ice for 30 minutes.
7) Centrifuge at 9000 rpm for 15 minutes at 4 ℃. Discard the supernatant, drain by inverting the centrifuge bottle on a paper towel for at least 10 minutes, and wipe the remaining liquid from the upper part of the bottle with a paper towel.
8) Resuspend the phage pellet in 2 ml of Tris-buffered saline (TBS) containing 1% (w/v) bovine serum albumin (BSA) by pipetting up and down along the side of the centrifuge bottle, and transfer to a 2-ml microcentrifuge tube. Resuspend further by pipetting up and down with a 1-ml tip, centrifuge at full speed in a microcentrifuge for 5 minutes at 4 ℃, and pass the supernatant through a 0.2-μm filter into a sterile 2-ml microcentrifuge tube. Store the phage preparation at 4 ℃. For long-term storage, sodium azide can be added to 0.02% (w/v). The library size obtained for LMB0020 was 2.4 × 10^9 transformants.
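The titer calculation described in step 3 can be written out as a small function. The function name and the colony count used below are hypothetical, and the sketch treats the 2 μl into 200 μl dilution as exactly 1:100, as the text does.

```python
def total_transformants(colonies, culture_volume_ul, plated_volume_ul,
                        dilution_factor=1):
    """Estimate the number of transformants in a culture from one plate.

    colonies           colonies counted on the plate
    culture_volume_ul  volume of the culture at the time of sampling
    plated_volume_ul   volume of (diluted) sample spread on the plate
    dilution_factor    fold dilution of the sample before plating
    """
    return colonies * dilution_factor * culture_volume_ul / plated_volume_ul


# Hypothetical count: 150 colonies from 1 ul of the 1:100 dilution,
# sampled from the 15-ml culture of step 3
estimate = total_transformants(150, 15_000, 1, dilution_factor=100)
print(f"{estimate:.2e} transformants in this 15-ml culture")
```

Summing such estimates over all cultures of a library gives the library size; the value reported for LMB0020 in the text is the measured result, not an output of this sketch.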
Embodiment 10: Panning of library LMB0020
1) Coat the wells of a Costar 96-well ELISA plate with 0.25 μg of CD22 antigen in 25 μl of PBS. Cover the plate with a plate sealer. Coating can be carried out overnight at 4 °C or for 1 hour at 37 °C. In the first round of panning, coat 2 wells for each library to be screened; one well is sufficient in each later round. The amount of target is reduced to 0.1 μg/well in panning rounds 3-6.
2) After shaking out the coating solution, block each well by adding 150 μl of TBS/BSA 3% (Tris-buffered saline containing 3% bovine serum albumin). Seal the plate and incubate for 1 hour at 37 °C.
3) After shaking out the blocking solution, add 50 μl of freshly prepared phage library (input sample) to the wells. Seal the plate and incubate for 2 hours at 37 °C. In the meantime, inoculate 2 ml of SB medium containing 2 μl of 5 mg/ml tetracycline with 2 μl of ER2738 cell preparation and grow at 250 rpm and 37 °C for 2.5 hours. Grow one culture for each library being screened and one additional culture for the input titration.
4) Shake out the phage solution, add 150 μl of TBS/Tween-20 0.05% to the wells, and pipette vigorously up and down 5 times. Wait 5 minutes, shake out, and repeat this wash step. Wash in this way 4 times in the first round of panning, 6 times in the second round, 8 times in the third round, and so on.
5) After shaking out the last wash solution, add 50 μl of freshly prepared 10 mg/ml trypsin in TBS, seal the plate, and incubate for 30 minutes at 37 °C. Pipette vigorously up and down 10 times, transfer the eluate (2 × 50 μl in the first round, 1 × 50 μl in later rounds) to the prepared 2-ml E. coli culture, and incubate at room temperature for 15 minutes.
6) Add 6 ml of pre-warmed SB medium, 1.6 μl of 100 mg/ml carbenicillin and 6 μl of 5 mg/ml tetracycline. Transfer the culture to a 50-ml polypropylene tube. For the output titration, dilute 2 μl of the culture in 200 μl of SB medium and plate 100 μl and 10 μl of this dilution on LB agar (100 mg/l carbenicillin) (output sample). In parallel, carry out the input titration by infecting 50 μl of the prepared 2-ml E. coli culture with 1 μl of a 10^-8 dilution of the phage preparation, incubating at room temperature for 15 minutes, and plating on LB agar (100 mg/l carbenicillin).
7) Shake the 8-ml culture at 250 rpm and 37 °C for 1 hour, add 2.4 μl of 100 mg/ml carbenicillin, and shake for a further hour at 250 rpm and 37 °C.
8) Add 1 ml of VCSM13 helper phage and transfer to a 500-ml polypropylene centrifuge bottle. Add 91 ml of pre-warmed (37 °C) SB medium, 46 μl of 100 mg/ml carbenicillin and 92 μl of 5 mg/ml tetracycline. Shake the 100-ml culture at 300 rpm and 37 °C for 1.5 to 2 hours.
9) Add 140 μl of 50 mg/ml kanamycin and continue shaking overnight at 300 rpm and 37 °C.
10) Centrifuge at 4000 rpm for 15 minutes at 4 °C. Transfer the supernatant to a clean 500-ml centrifuge bottle and add 25 ml of 20% PEG-8000/2.5 M NaCl. Keep on ice for 30 minutes.
11) Centrifuge at 9000 rpm for 15 minutes at 4 °C. Discard the supernatant, drain the bottle by inverting it on a paper towel for at least 10 minutes, and wipe off the liquid remaining in the upper part of the bottle with a paper towel.
12) Resuspend the phage pellet in 2 ml of TBS/BSA 1% buffer by pipetting up and down along the side of the centrifuge bottle, and transfer to a 2-ml microcentrifuge tube. Resuspend further by pipetting up and down with a 1-ml tip, centrifuge at full speed in a microcentrifuge for 5 minutes at 4 °C, and pass the supernatant through a 0.2-μm filter into a sterile 2-ml microcentrifuge tube.
13) Continue with the next round from step 3), or store the phage preparation at 4 °C. For long-term storage, sodium azide can be added to 0.02% (w/v). Only freshly prepared phage should be used for each round.
Table 6 shows the phage titers of the input and output solutions over 6 rounds of library panning.
Round   Input (×10^11)   Output (×10^6)   Recovery (% × 10^3)   Enrichment
1       12               1.9              0.16                  -
2       0.45             0.032            0.007                 Negative
3       4.7              2.14             0.46                  2.87
4       2.5              0.064            0.032                 Negative
5       0.52             1.2              2.3                   14.37
6       0.6              2.0              3.33                  20.8
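The recovery and enrichment columns of Table 6 can be reproduced with a short calculation. The sketch below assumes that recovery is the output titer divided by the input titer, expressed in percent, and that enrichment is the recovery of a round divided by the recovery of round 1; both readings are inferred from the tabulated numbers rather than stated in the text, and the function names are illustrative.

```python
def recovery_percent(input_titer, output_titer):
    """Percentage of input phage recovered in the output of one panning round."""
    return 100.0 * output_titer / input_titer


def enrichment(round_recovery, first_round_recovery):
    """Recovery of a round relative to the recovery of round 1 (assumed definition)."""
    return round_recovery / first_round_recovery


# Input and output titers of rounds 1, 3, 5 and 6 as listed in Table 6.
titers = {1: (12e11, 1.9e6), 3: (4.7e11, 2.14e6), 5: (0.52e11, 1.2e6), 6: (0.6e11, 2.0e6)}

for rnd, (inp, out) in titers.items():
    # The table reports recovery as (% x 10^3), so scale the percentage by 1000.
    print(rnd, round(recovery_percent(inp, out) * 1e3, 2))

# The tabulated enrichment values follow from the rounded recoveries, e.g. round 3:
print(enrichment(0.46, 0.16))
```

Under this reading, a round whose recovery falls below that of round 1 has a ratio below 1, which would correspond to the entries marked "Negative".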
Embodiment 11: Screening of individual isolates for target binding
ER2738 cells are infected with output phage and plated on LB agar (100 mg/l carbenicillin). The plates are incubated overnight at 37 °C. Individual colonies can then be screened for binding to the target protein as follows:
1) Add 0.75 ml of SB medium containing 50 μg/ml carbenicillin to each well of a 96-well deep-well plate. Use sterile toothpicks to transfer individual colonies into individual wells.
2) Shake the plate containing the bacterial cultures at 300 rpm and 37 °C for several hours.
2) Six hours after inoculation, spot 1 μl of each culture onto LB agar (100 mg/l carbenicillin). Incubate the plate overnight at 37 °C, seal it with Parafilm, and store it at 4 °C. These plates are used later to recover and sequence the isolates that show a positive ELISA signal.
3) Induce the cultures by adding IPTG to 1 mM (7.5 μl of 1 M IPTG stock solution diluted 1:10 in water), then incubate overnight at 37 °C.
4) Centrifuge the induced E. coli cultures (4000 rpm; 20 min).
5) Prepare BugBuster solution (Novagen) (1.5 ml of reagent plus 13.5 ml of TBS and 15 μl of Benzonase).
6) Resuspend the pellets in 150 μl of BugBuster solution. Incubate the plate at room temperature for 30 minutes, then centrifuge at 4000 rpm for 20 minutes.
7) Transfer 50 μl of supernatant per well to a microtiter plate that has been coated overnight at 4 °C with 100 ng per well of target protein in PBS and blocked for 1 hour with 150 μl per well of TBS containing 3% BSA.
8) Incubate the plate for 2 hours at 37 °C.
9) Wash 10 times with tap water.
10) Dilute biotinylated rat anti-HA antibody (3F10, Roche Biosciences) 1:500 in TBS/BSA 1%. Add 50 μl of diluted antibody to each well and incubate for 1 hour at 37 °C.
11) Wash 10 times with tap water.
12) Dilute streptavidin/HRP 1:2500 in TBS/BSA 1%, add 50 μl per well, and incubate for 30 minutes at 37 °C.
13) Prepare ABTS solution (2.94 ml of citrate buffer + 60 μl of ABTS + 1 μl of H2O2).
14) Wash 10 times with tap water.
15) Add 50 μl of substrate solution to each well.
16) Incubate at room temperature and, after 20 minutes, read the O.D. at 405 nm with an ELISA plate reader.
The output of round 5 of library LMB0020 and the outputs of two other microprotein libraries were screened as described above. The tables below show the binding data obtained with IgG-coated and BSA-coated plates. Several isolates show binding signals on the IgG-coated plate that are significantly higher than in the BSA-coated wells.
IgG  1     2     3     4     5     6     7     8     9     10    11    12    Library
A    0.14  0.11  0.10  0.10  0.10  0.11  0.10  0.12  0.14  0.11  0.13  0.13  SMP3S5
B    0.11  0.29  0.11  0.10  0.10  0.11  0.10  0.12  0.12  0.17  0.59  0.33  SMP3S5
C    0.24  0.27  0.16  0.23  0.11  0.19  0.12  0.10  0.10  0.10  0.11  0.16  SMP3S5
D    0.12  0.10  0.10  0.14  0.12  0.11  0.09  0.15  0.09  0.09  0.10  0.10  SMP3S5
E    0.10  0.11  0.10  0.17  0.09  0.09  0.10  0.15  0.15  0.11  0.10  0.10  SMP3S5
F    0.10  0.10  0.10  0.11  0.11  0.09  0.11  0.10  0.10  0.10  0.10  0.14  SMP3S5
G    0.46  0.12  0.33  0.20  0.40  0.11  0.09  0.33  0.09  0.09  0.10  0.30  SMP4S5
H    0.12  0.12  0.11  0.10  0.13  0.07  0.09  0.41  0.09  0.12  0.48  0.15  SMP5S5
BSA  1     2     3     4     5     6     7     8     9     10    11    12    Library
A    0.10  0.10  0.10  0.10  0.09  0.10  0.10  0.10  0.12  0.10  0.10  0.10  SMP3S5
B    0.10  0.14  0.09  0.09  0.09  0.09  0.09  0.10  0.10  0.11  0.15  0.12  SMP3S5
C    0.12  0.12  0.10  0.13  0.09  0.12  0.10  0.11  0.10  0.09  0.10  0.10  SMP3S5
D    0.10  0.09  0.09  0.10  0.10  0.10  0.10  0.11  0.09  0.09  0.13  0.09  SMP3S5
E    0.09  0.10  0.09  0.12  0.09  0.09  0.09  0.10  0.12  0.09  0.09  0.10  SMP3S5
F    0.09  0.09  0.09  0.09  0.10  0.09  0.09  0.09  0.09  0.09  0.09  0.10  SMP3S5
G    0.14  0.09  0.11  0.09  0.11  0.09  0.09  0.12  0.09  0.09  0.09  0.11  SMP4S5
H    0.10  0.09  0.10  0.09  0.10  0.09  0.09  0.15  0.09  0.11  0.18  0.11  SMP5S5
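One way to pick candidate binders from such plate data is to compare each well on the IgG-coated plate with the same well on the BSA-coated plate. The sketch below uses the row B readings from the two tables above, reading the BSA rows in plate order A to H. The ratio cut-off of 2.0 and the function name are illustrative assumptions; the source only states that the IgG signal should be significantly higher than the BSA signal.

```python
# O.D. 405 nm readings for plate row B, columns 1-12.
IGG_ROW_B = [0.11, 0.29, 0.11, 0.10, 0.10, 0.11, 0.10, 0.12, 0.12, 0.17, 0.59, 0.33]
BSA_ROW_B = [0.10, 0.14, 0.09, 0.09, 0.09, 0.09, 0.09, 0.10, 0.10, 0.11, 0.15, 0.12]


def pick_hits(row, target, control, min_ratio=2.0):
    """Return the well names whose target/control signal ratio reaches min_ratio."""
    hits = []
    for col, (t, c) in enumerate(zip(target, control), start=1):
        # min_ratio is an assumed cut-off, not a value given in the source.
        if t / c >= min_ratio:
            hits.append(f"{row}{col}")
    return hits


print(pick_hits("B", IGG_ROW_B, BSA_ROW_B))
```

With this cut-off the selected wells in row B are B2, B11 and B12, which include the two row B isolates (B2 and B12) that were sequenced.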
Three IgG-binding isolates were sequenced. All isolates retain the spacing between the 6 cysteine residues of the trypsin inhibitor scaffold. All 3 isolates differ in amino acid sequence, which demonstrates that the method can yield many binding domains, each of which can serve as a starting point for further optimization.
LMB0020/SMP003S5.B2
GPSGPG C PILYAH C KQDSD C VTG C V C RPLGM C GSPGQSGGSGHHHHHH
LMB0020/SMP003S5.B12
GPSGPG C PSLPTP C KQDSD C DEG C V C KPNGT C GSPGQSGGSGHHHHHH
LMB0020/SMP003S5.C2
GPSGPG C PLYSPV C KQDSD C DNG C V C RPAGP C GSPGQSGGSGHHHHHH
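The statement that all three isolates retain the cysteine spacing of the scaffold can be checked directly from the sequences. The sketch below counts the residues between consecutive cysteines; the function and variable names are illustrative, and the sequences are those listed above with the spacing characters removed.

```python
ISOLATES = {
    "LMB0020/SMP003S5.B2": "GPSGPGCPILYAHCKQDSDCVTGCVCRPLGMCGSPGQSGGSGHHHHHH",
    "LMB0020/SMP003S5.B12": "GPSGPGCPSLPTPCKQDSDCDEGCVCKPNGTCGSPGQSGGSGHHHHHH",
    "LMB0020/SMP003S5.C2": "GPSGPGCPLYSPVCKQDSDCDNGCVCRPAGPCGSPGQSGGSGHHHHHH",
}


def cysteine_spacings(sequence):
    """Return the number of residues between each pair of consecutive cysteines."""
    positions = [i for i, residue in enumerate(sequence) if residue == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


for name, seq in ISOLATES.items():
    print(name, cysteine_spacings(seq))
```

All three sequences give the same five loop lengths (6, 5, 3, 1, 5), while the residues inside the loops differ.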
Embodiment 12: Build-up approach to microprotein design
A 1-disulfide protein (1SS) that binds VEGF can be evolved stepwise into a 2SS microprotein, the latter being more stable to proteases and less immunogenic. Fig. 1 shows ELISA results for two independent 2SS proteins ("clone 2" and "clone 7") derived from a 1SS phage-derived peptide ("VEGF peptide"). All three are specific for VEGF and show no binding to other proteins such as BSA. M13 that does not contain a microprotein does not bind to VEGF or BSA either. The 2SS proteins were produced by grafting a 1SS sequence known to bind VEGF into a natural 2SS scaffold (α-conotoxin). The resulting proteins are VEGF-specific and do not bind to irrelevant proteins such as bovine serum albumin (BSA). Wild-type phage particles (M13) show no binding to VEGF or BSA. See Figure 168.
Embodiment 13: Library construction by megaprimer mutagenesis
The megaprimer method combines two (or more) different primers into one large primer (megaprimer), which is incorporated into a plasmid through homology at its two ends in a Kunkel-type polymerase extension reaction (except that stop codon replacement can be used to make the incorporation highly efficient). The megaprimer method uses double-stranded or single-stranded DNA of 60, 70, 80, 90, 100, 110 or preferably even more than 120 nucleotides or base pairs to introduce or transfer complex pools of DNA and encoded protein sequences. In our examples these pools encode microprotein libraries, but the same method can be applied to any DNA or protein library. A megaprimer generally contains one set of previously selected sequences ("old library") and one set of newly randomized sequences ("new library"). The megaprimer method therefore allows new libraries to be generated blindly from old libraries, without the need to sequence the old library.
Typically, a PCR fragment is generated from the library region ("randomized region") of a previously selected pool of sequences, and this fragment is joined (by overlap PCR) to a synthetic oligo encoding a new randomized (unselected) library segment, giving a dsDNA fragment that contains both the new (unselected) and the old (selected) randomized regions. The same end result can be obtained with a single PCR using primers that flank the "old library" region, if one of the primers introduces the new library. This dsDNA PCR fragment is converted into a ssDNA megaprimer by asymmetric or runaway PCR. The ends of the ssDNA megaprimer are designed to have about 10-25 bases of sequence homology with the vector, to ensure insertion at the correct position.
Double-stranded megaprimers are generated from two or more PCR fragments and/or synthetic oligonucleotides using overlap PCR. Single-stranded DNA can be generated from denatured double-stranded PCR product and/or single-stranded DNA by "asymmetric PCR" (runaway PCR). Asymmetric PCR amplifies the single-stranded sequence complementary to the single-stranded DNA template. The megaprimer can contain a single sequence, but more typically contains a library of sequences (for example, microproteins) (shown in Figure 143). The single-stranded template DNA (vector or phage) can contain uridine or can encode suppressible stop codons (TAG, TAA, TGA), which are replaced by the megaprimer sequence that contains no stop codons. The annealed megaprimer then primes synthesis of the second DNA strand by the polymerase, and in the presence of buffer, DNA polymerase, DNA ligase and deoxynucleotide triphosphates (dNTPs), ligation of the synthesized strand yields covalently closed circular DNA (ccc-DNA). The resulting ccc-DNA is transformed into a bacterial cell line for expression of the microproteins as soluble proteins or fusion proteins.
An example of a megaprimer result is shown in the table below. It shows the amino acid sequences of microproteins that were mutagenized at the first 15 positions. Conserved residues that match the original microprotein template are marked with gray shading. A library of microprotein sequences (including the sequences of Fig. 2) served as the starting point for megaprimer synthesis. A PCR fragment containing the "old library" region and the new library region was generated with two DNA primers: i) a primer that anneals upstream of the microprotein, and ii) a primer that contains the newly randomized microprotein sequence ("new library"), flanked by a microprotein-specific annealing region and a DNA template annealing region. The input microprotein library was amplified by PCR with the two primers, amplified by asymmetric PCR, and cloned into the single-stranded DNA template, yielding a second microprotein library. The resulting clones (Fig. 2, bottom) show the original sequence in the first half and randomized microprotein sequence in the second half.
Input sequences for megaprimer mutagenesis or cloning
Figure A20068003404901381
Library region 1
After megaprimer mutagenesis or cloning
Library region 2
Embodiment 14: Production of microproteins
The microprotein gene was cloned into the expression vector pET30, which carries the T7 promoter, and transformed into E. coli strain BL21(DE3). 2 ml of LB (50 mg/l kanamycin) was inoculated from a frozen glycerol stock and grown for 4 hours at 37 °C. 200 μl of this starter culture was added to 250 ml of LB (50 mg/l kanamycin) and incubated overnight without shaking. The next morning the shaker was set to 250 rpm and the culture was grown for a further hour. IPTG was then added to a final concentration of 0.5 mM, and the protein was expressed for 6 hours in a shaking incubator at 37 °C. The culture was centrifuged at 3000 rpm for 15 minutes, resuspended in 5 ml of PBS, and heated at 75 °C for 20 minutes. This step lyses the cells and denatures most E. coli proteins. The suspension was centrifuged at 10,000 rpm for 30 minutes in an SS34 rotor. The resulting supernatant was loaded onto a HiTrap column charged with nickel sulfate (Pharmacia GE). The protein was eluted with imidazole as recommended by the column manufacturer. The purity of the resulting protein was judged to be >90% by SDS-PAGE under reducing conditions.
Embodiment 15: Determination of DBP complexity
Complexity is the cumulative disulfide span, which equals the cumulative distance between linked cysteines, measured in amino acids along the protein chain.
Complexity is a measure of the degree of cross-linking and therefore a measure of the rigidity of the scaffold; a higher complexity provides higher rigidity. Because rigidity is a predictor of protease resistance, it is also a useful predictor of immunogenicity. A higher complexity predicts reduced proteasomal degradation and lower immunogenicity.
Complexity=(Ca-Cb)+(Cc-Cd)+(Ce-Cf)
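The complexity formula can be evaluated from the chain positions of the cysteines and the disulfide topology. The sketch below assumes that each term of the formula is the absolute distance in residues between two linked cysteines; the function names, the example cysteine positions (7, 14, 20, 24, 26, 32) and the two topologies are hypothetical illustrations, not values assigned to any particular scaffold in the source.

```python
def parse_topology(text):
    """Parse a topology string such as '1-4 2-5 3-6' into pairs of cysteine indices."""
    return [tuple(int(n) for n in pair.split("-")) for pair in text.split()]


def complexity(cys_positions, topology):
    """Sum the chain distances (in residues) spanned by each disulfide bond.

    cys_positions -- 1-based chain positions of Cys1, Cys2, ... in order
    topology      -- list of (i, j) pairs naming the linked cysteines
    """
    total = 0
    for i, j in topology:
        total += abs(cys_positions[j - 1] - cys_positions[i - 1])
    return total


positions = [7, 14, 20, 24, 26, 32]  # hypothetical six-cysteine protein
print(complexity(positions, parse_topology("1-4 2-5 3-6")))  # interleaved bonds
print(complexity(positions, parse_topology("1-2 3-4 5-6")))  # neighbouring bonds
```

For the same cysteine positions, the interleaved topology gives a larger cumulative span than the topology that links neighbouring cysteines, which is the sense in which complexity reflects the degree of cross-linking.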
Figure A20068003404901392
Figure A20068003404901401
Embodiment 16: Scaffolds that do not contain repeated motifs
Superfamilies of toxin families
1) uPAR/Ly6/CD59/snake toxin-receptor superfamily. Comprises the families: Activin_recp; BAMBI; PLA2_inh; Toxin_1; UPAR_LY6.
2) Scorpion toxin-like knottin superfamily. Comprises the families Toxin_2; Toxin_17; Gamma-thionin; Defensin_2; Toxin_3; Toxin_5.
3) Defensin/myotoxin-like superfamily. Comprises the families BDS_I_II; Defensin_1; Defensin_beta; Toxin_4.
4) Omega toxin-like superfamily. Comprises the families Toxin_7; Toxin_30; Toxin_27; Toxin_24; Toxin_21; Toxin_16; Toxin_12; Toxin_11; Omega-toxin; Albumin_I; Toxin_9.
5) The conotoxin O-superfamily consists of 3 groups of Conus peptides that belong to the same structural group. The 3 groups differ in their pharmacological properties: ω-conotoxins inhibit calcium channels, δ-conotoxins slow the inactivation rate of voltage-sensitive sodium channels, and μO-conotoxins block voltage-sensitive sodium currents.
6) The conotoxin I-superfamily comprises only the Toxin_19 family.
7) The conotoxin T-superfamily comprises only the Toxin_26 family.
Individual toxin families:
PF00087: toxin 1
Snake toxins. A family of venom neurotoxins and cytotoxins. The structure is small, rich in disulfide bonds, and almost entirely β-sheet. See Figure 61.
1)Cxxxxx(xxxx)xxxCxxxxxxCxxxx(xxx)C(xx)xxxxxxxxCxxxC
2)Cxxxxx(xxxx)xxxCxxxxxxCYxkx(wf)(xx)C(xx)xxxxxxxGCxxxC
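Motifs written in this notation can be turned into search patterns for scanning candidate sequences. The sketch below is only an interpretation: it assumes that "x" stands for any residue, that upper-case letters are fixed residues, and that positions in parentheses are optional (present in some family members only). The source does not define the notation explicitly, and lower-case residue letters and alternatives such as "(W,Y)" that appear in some second motifs are not handled. The function name is illustrative.

```python
import re


def motif_to_regex(motif):
    """Convert a motif such as 'Cxx(xx)C' into a compiled regular expression.

    Assumed notation: 'x' = any residue, '(xx)' = up to two optional residues,
    any other character = a fixed residue.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            # A parenthesised run of n positions may contribute 0..n residues.
            parts.append(".{0,%d}" % (close - i - 1))
            i = close + 1
            continue
        # '.' also matches cysteine; a stricter pattern could use '[^C]' instead.
        parts.append("." if ch == "x" else re.escape(ch))
        i += 1
    return re.compile("".join(parts))


TOXIN_1_MOTIF = "Cxxxxx(xxxx)xxxCxxxxxxCxxxx(xxx)C(xx)xxxxxxxxCxxxC"
pattern = motif_to_regex(TOXIN_1_MOTIF)

# Artificial test sequence with the minimum loop lengths (alanine as filler).
shortest = "C" + "A" * 8 + "C" + "A" * 6 + "C" + "A" * 4 + "C" + "A" * 8 + "C" + "A" * 3 + "C"
print(bool(pattern.fullmatch(shortest)))
```

The test sequence is artificial and serves only to show that a chain with the shortest permitted loops satisfies the pattern.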
PF00451: toxin 2
Scorpion short toxins. Scorpion venoms contain a variety of peptides that are toxic to mammals, insects and crustaceans. Among these peptides, the short toxin family (30 to 40 residues) inhibits calcium-activated potassium channels. See Figure 55. The topology is 1-4 2-6 3-5.
1)CxxxxxCxxxCxxxxxxxxxxCxxxxCxC
2)CxxxxxCxxxCkxxxxxxxgKCxxxKCxC
PF00537: toxin 3
This family contains neurotoxins and plant defensins (F.M. Assadi-Porter et al. (2000) Arch Biochem Biophys, 376:259-65). The mustard trypsin inhibitor MTI-2 is a plant defensin. It is a potent inhibitor of trypsin. MTI-2 is toxic to Lepidopteran insects. The scorpion toxin (a neurotoxin) binds to sodium channels and inhibits the activation mechanism of the channel, thereby blocking neuronal transmission. See Figure 22. The topology is 1-8 2-5 3-6 4-7.
1)C(xxx)x(xx)xxxxCxxxCxx(xx)xxCxxxCxx(x)xxxxCxxxxx(xx)xxCxC
2)C(xxx)Y(xx)xxxxCxxxCxx(xx)xxCxxxCxx(x)xxGxCxxxxx(xx)xxC(W,Y)C
PF00706: toxin 4
Sea anemone neurotoxins. Sea anemones produce a variety of different neurotoxins with related structure and function. The proteins belonging to this family include several neurotoxins, among them actinocongestin and anthopleurin. The neurotoxins bind specifically to sodium channels, thereby delaying their activation during signal transduction, which results in strong stimulation of mammalian cardiac muscle contraction. Actinocongestin 1 has been found in neuromuscular preparations of crustaceans, where it increases transmitter release and causes firing of the axons. Three disulfide bonds are present in this protein. This family is a member of the defensin/myotoxin-like superfamily, which comprises the following Pfam members: BDS_I_II; Defensin_1; Defensin_beta; Toxin_4. There are 25 known family members. The topology is 1-5 2-4 3-6. See Figure 87.
1)CxCxxxxxxxxxxxxxxxx(xx)xxxxC(xxx)xxxxxxCxxxxxxxxxCC
2)CxCxxxxPxxrxxxxxGxx(xx)xxxxC(xxx)xxxWxxCxxxxxxxxxCC
PF05294: toxin 5
Scorpion short toxins. See Figure 46.
PF05453: toxin 6
See Figure 90. This family consists of toxin-like peptides isolated from the venom of the scorpion Buthus martensii Karsch. The precursor consists of 60 amino acid residues, with a putative signal peptide of 28 residues, one extra residue, and a mature peptide of 31 residues with an amidated C-terminus. These peptides share close homology with other scorpion K+ channel toxins and should adopt the common three-dimensional fold, the cysteine-stabilized alpha-beta (CSalphabeta) motif. This family acts by blocking small-conductance calcium-activated potassium channels in its victims. The topology is 1-4 2-5 3-6. The motif is CxxCxxxCxxxxxxx(xx)C(xx)xxxxxCxC
PF05980: toxin 7
This family consists of several short spider neurotoxin proteins, including a number of neurotoxins from funnel-web spiders (W.S. Skinner et al. (1989) J Biol Chem, 264:2150-55). See Figure 64.
Topology is 1-4 2-5 3-8 6-7.
1)CxxxxxxCxxxxxxxCCxxxxxCxCxxxxxCxC
2)CxxxxxxCxxWxxxxCCxgxxYCxCxxxpxCxC
PF07365: toxin 8
Alpha-conotoxins and precursors. This family consists of α-conotoxin precursor proteins from a number of Conus species. α-Conotoxins are small peptide neurotoxins from the venom of fish-hunting cone snails; they block nicotinic acetylcholine receptors (nAChRs). See Figure 72.
PF00095: toxin 9
The members of this family of spider neurotoxins are thought to be calcium channel inhibitors.
See Figure 63. The topology is 1-4 2-5 3-8 6-7.
1)Cxx(x)xxxxCxxxxxCCxxx(x)xCxCxxxxxCxC
2)Cxx(x)yxxxCxxgxxCCxrx(x)xCxCxxxxnCxC
PF07473: toxin 11
This family consists of several spasmodic peptide gm9a sequences (M.B. Lirazan et al. (2000) Biochemistry, 39:1583-8). See Figure 27. DBP: 1-5 2-4 3-6
Motif: CxxxCxxxxxCxxxCxC
PF07740: toxin 12
HaTx1 is a 35-amino-acid peptide toxin isolated from Chilean tarantula venom. It inhibits the drk1 voltage-gated K+ channel, not by blocking the pore but by altering the energetics of gating (H. Takahashi et al. (2000) J Mol Biol, 297:771-80). See Figure 50.
The topology is 1-4 2-5 3-6. The motif is CxxxxxxCxxxxx(x)CCxxxxCxxx(xxx)x(xx)xxC
PF07822: toxin 13
The members of this family resemble neurotoxin B-IV, a crustacean-selective neurotoxin produced by the marine worm Cerebratulus lacteus. This highly cationic peptide of about 55 residues is arranged as two antiparallel helices connected by a well-defined loop in a hairpin structure. The branches of the hairpin are linked by four disulfide bonds. Three residues identified as important for activity, Arg-7, -25 and -34, are found on the same face of the molecule, while another residue important for activity, Trp30, lies on the opposite face. The mode of action of this protein is not yet fully understood, but it may act on voltage-gated sodium channels, possibly by binding to a site on these proteins that has not yet been characterized. Its site of interaction may also be less specific; for example, it may interact with negatively charged membrane lipids. See Figure 65.
PF07829: toxin 14
α-A conotoxin PIVA is the major paralytic toxin found in the venom of the fish-eating cone snail Conus purpurascens. This peptide acts by blocking the acetylcholine binding site of the nicotinic acetylcholine receptor (K.J. Nielsen et al. (2002) J Biol Chem, 277:27247-55). See Figure 66.
Motif 1:CCxxxxxxxCxxCxCx (x) xxxxxC, motif 2:CCgxxpxxxChpCxCx (x) xxpxxC
PF07945: toxin 16
Janus atracotoxin family. This family comprises three peptides secreted by the spider Hadronyche versuta. They are insect-selective, excitatory neurotoxins that may act by antagonizing muscle acetylcholine receptors or acetylcholine receptor subtypes present in other invertebrate neurons. Janus atracotoxin-Hv1c is organized into a disulfide-rich globular core (residues 3-19) and a β-hairpin (residues 20-34). There are 4 disulfide bonds, one of which is a vicinal disulfide; it is known to be unimportant for maintaining the structure but important for insecticidal activity. There are 3 known family members. The topology is 1-6 2-7 3-4 5-8. See Figure 91.
1)CxxxxxxCxxCCxCCxxxxCxxxxxxxxxxC
2)CxgxxxpCxxCCpCCpgxxCxxxxxxgxxyC
PF08086: toxin 17
This family consists of the ergtoxin peptides, which are toxins secreted by scorpions. Ergtoxins can block the function of K+ channels. More than 100 ergtoxins have been found in scorpion venoms, and they are classified into three superfamilies according to their primary structure (K. Frenal et al. (2004) Proteins, 56:367-75).
There are 25 known family members. The topology is 1-4 2-6 3-7 5-8. See Figure 60.
1)CxxxxxCxxxxxxxxCxxCCxxxxxxxxxCxxxxCxC
2)drdxCxDxxxCxxygxyxxCxxCCxxxgxxxgxCxxxxCxC
PF08087: toxin 18
Conotoxin O-superfamily. This family consists of members of the conotoxin O-superfamily. The O-superfamily of conotoxins consists of 3 groups of Conus peptides that belong to the same structural group. The 3 groups differ in their pharmacological properties: ω-conotoxins inhibit calcium channels, δ-conotoxins slow the inactivation rate of voltage-sensitive sodium channels, and μO-conotoxins block voltage-sensitive sodium currents. See Figure 31.
Motif 1:CxxxxxxCxxxxxCCx (xx) xxCxxxxxxC,
Motif 2:CxxxgxxCxxxxxCCx (xx) gxCxxxfxxC
PF08088: toxin 19
Conotoxin I-superfamily. See Fig. 6. This family consists of the I-superfamily of conotoxins. This is a new class of peptides present in the venom of some Conus species. These toxins are characterized by four disulfide bonds and by inhibiting or modifying ion channels of nerve cells. I-superfamily conotoxins are found in five or six major clades of cone snails and may be present in many species.
PF08089: toxin 20
Huwentoxin-II family. This family consists of the huwentoxin-II (HWTX-II) family of toxins secreted by spiders. These toxins are found in the venom secreted by the spider Selenocosmia huwena Wang. HWTX-II adopts a novel scaffold that differs from the ICK motif found in other huwentoxins. HWTX-II consists of 37 amino acid residues, including 6 cysteines involved in three disulfide bonds. See Fig. 5.
PF08091: toxin 21
This family is a member of the omega toxin-like clan. It consists of insecticidal peptides isolated from spider venom. See Figure 58. There are 4 known family members. The topology is unknown, and no structure is available.
1)CxxxxxxCxxxxxCCxxxCxxxxxxCxxxxxxCxxxC
2)CxxxxxPCxnxxxCCxgxCxxxxWxCxxxxxxCskxC
PF08092: toxin 22
See Fig. 4. This family consists of Magi peptide toxins (Magi 1, 2 and 5) isolated from the venom of spiders of the family Hexathelidae. The insecticidal peptide toxins bind to sodium channels and induce flaccid paralysis when injected into lepidopteran larvae. However, these peptides are not toxic to mice when injected intracranially at 20 pmol/g.
PF08093: toxin 23
See Fig. 3. This family consists of toxic peptides (Magi 5) found in the venom of spiders of the family Hexathelidae. Magi 5 is the first spider toxin shown to have binding affinity for site 4 of the mammalian sodium channel. The toxin also has an insecticidal effect on larvae, causing paralysis when injected into them.
PF08094: toxin 24
Conotoxin TVIIA/GS family. This family consists of conotoxins isolated from the venom of the cone snails Conus tulipa and Conus geographus. Conotoxin TVIIA, isolated from Conus tulipa, shows little sequence homology with peptides of other well-characterized pharmacological classes, but shows similarity to conotoxin GS, a peptide from Conus geographus. Both peptides block skeletal muscle sodium channels and share several biochemical features, and they represent a distinct subgroup of the four-loop conotoxins (J.M. Hill et al. (2000) Eur J Biochem, 267:4642-8). See Figure 28.
1)CxxxxxxCxxxCCxxxxCxxxxxxxC
2)CxGxxxxCPPxCCxGxxCxxGxxxxC
PF08095: toxin 25
Hefutoxin family. This family consists of the hefutoxins found in the venom of the scorpion Heterometrus fulvipes. These toxins, κ-hefutoxin1 and κ-hefutoxin2, show no homology to any known toxin. The hefutoxins are potassium channel toxins and show a 1-4 2-3 topology. See Figure 173.
PF08097: toxin 26
Conotoxin T-superfamily. See Fig. 2. This family consists of the T-superfamily of conotoxins. Eight different T-superfamily peptides have been identified from 5 Conus species. These peptides share a consensus signal sequence and a conserved arrangement of cysteine residues. T-superfamily peptides have been found to be expressed in the venom ducts of cone snails of all major feeding types, which suggests that the T-superfamily is a large and diverse group of peptides that is widely distributed among the 500 different Conus species.
PF08099: toxin 27
Scorpion calcine family. See Fig. 1. This family consists of the calcine family of scorpion toxins. The calcine family consists of maurocalcine and imperatoxin. These toxins have been shown to be strong effectors of the ryanodine-sensitive calcium channels of skeletal muscle. They can be used to study dihydropyridine receptor/ryanodine receptor interactions.
PF08116: toxin 29
This family consists of the PhTx insecticidal neurotoxins found in the venom of the spider Phoneutria nigriventer. The venom of Phoneutria nigriventer contains a large number of neurotoxic polypeptides of 30-140 amino acids that exert a variety of biological effects. Some of these neurotoxins are lethal to mice after intracerebroventricular injection; others are extremely toxic to insects of the orders Diptera and Dictyoptera but have much weaker toxic effects on mice. See Fig. 7.
PF08117: toxin 30
Also known as the Ptu family. This family consists of toxic peptides isolated from the saliva of assassin bugs. The saliva contains a complex mixture of proteins that the assassin bug uses to immobilize or digest its prey. One of these proteins (Ptu1) has been purified and shown to block N-type calcium channels irreversibly, with lower specificity for the L-type and P/Q-type calcium channels expressed in BHK cells.
The topology is 1-4 2-5 3-6; there are 3 members. See Figure 79.
1)CxxxxxxCxxxxxxCCxxxxxCxxxxxxC
2)CxxxgxxxCxgxxkxCCxxxxxCxxyanxC
PF08119: toxin 31
This family consists of acidic α-KTx short-chain scorpion toxins. These toxins, named parabutoxins, block voltage-gated K channels and have extremely low pI values. They also lack the crucial pore-plugging lysine. In addition, the second important residue of the dyad, the hydrophobic residue (Phe or Tyr), is also missing. See Fig. 8.
PF08120: toxin 32
See Fig. 9. This family consists of the tamulustoxins found in the venom of the Indian red scorpion (Mesobuthus tamulus). Tamulustoxin shares no similarity with other scorpion venom toxins, although the positions of its 6 cysteine residues suggest that it has the same structural scaffold. Tamulustoxin acts as a potassium channel blocker.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11361010
PF08396: toxin 34
Spider venom ω-agatoxin/Tx1 family. The Tx1 family of lethal spider neurotoxins induces excitatory symptoms in mice. See Figure 10.
PF01033: somatomedin B
See Figure 14. Somatomedin B (SM-B) is a serum factor of unknown function. It is a small cysteine-rich peptide derived by proteolysis from the N-terminus of the cell-substrate adhesion protein vitronectin. The SMB domain contains 8 Cys residues arranged in four disulfide bonds (Y. Kamikubo et al. (2004) Biochemistry, 43:6519-34). It has been proposed that the active SMB domain can tolerate considerable disulfide bond heterogeneity or variability, provided that the Cys25-Cys31 disulfide bond is preserved. The three-dimensional structure of the SMB domain is extremely compact, with the disulfide bonds packed in the center of the domain to form a covalently bonded core. The protein can be expressed as a soluble fusion protein with the C-terminal domain of thioredoxin.
1)Cxx(x)xCxxxxxxxxxxCxCxxxCxxxxxCCxxxxxxC
2)Cxx(x)rCxxxxxxxxCxCxxxCxxxxxCCxDxxxxC
3)Cxx(x)RCxexxxxxxxxCxCxxxCxxxxxCCxd[yf]xxxC
A topology of 1-2 3-4 5-6 7-8 has been described, but other isomers are also possible and are consistent with the NMR structure calculations.
PF00087, PF00021: three-finger toxin family
See Figures 14-18. A family of venom neurotoxins and cytotoxins. The structure is small, rich in disulfide bonds and almost entirely β-sheet. This family is a member of the uPAR/Ly6/CD59/snake toxin-receptor superfamily clan. The clan includes the following Pfam members: Activin_recp; BAMBI; PLA2_inh; Toxin_1; UPAR_LY6.
A preferred library strategy is to randomize the three longest loops, located between Cys1-Cys2, Cys3-Cys4 and Cys5-Cys6. Two different design strategies are used: 1) the disulfide core is kept intact and only the three loops are mutated; 2) mutations are allowed in the disulfide core, which can produce loop arrangements of higher diversity. The most conserved cysteine spacings are n6=0 and n7=4 ("n6" is defined as the spacing between C6 and C7; "n7" is defined as the spacing between C7 and C8). This information is used to evaluate the remaining CDPs. The most common CDP, with 69 members, is 10, 6, 16, 3, 10, 0, 4.
1)
Cxxxxxxxxxx(xxx)Cxxxx(xx)Cxxxxxxxxxxxx(x)xxxxCx(xx)CxxxxxxxxxxCCxxxxC
2)
Cyxxxxxxxxx(xxx)Cpxgx(xx)Cyxkx(wf)xxxxxx(x)xxxxGCx(xt)CPxxxxxxxxxCCx(ts)DxC
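The cysteine distance pattern (CDP) used above for PF00087/PF00021 is the list of residue counts between consecutive cysteines. The following is a minimal Python sketch, assuming a sequence is given as a plain string in which `C` marks a cysteine. The function names are illustrative and are not part of the original disclosure. It computes a CDP and picks out the longest loops as candidates for randomization:

```python
def cdp_from_sequence(seq):
    """Return the residue counts between consecutive cysteines."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def sequence_from_cdp(cdp, filler="x"):
    """Build a placeholder sequence with the given spacings (no flanks)."""
    return "C" + "".join(filler * n + "C" for n in cdp)


def longest_loops(cdp, k=3):
    """Return the 1-based indices of the k longest inter-cysteine loops."""
    ranked = sorted(range(len(cdp)), key=lambda i: (-cdp[i], i))
    return sorted(i + 1 for i in ranked[:k])


# CDP quoted in the text as the most common one (69 members).
most_common_cdp = [10, 6, 16, 3, 10, 0, 4]

scaffold = sequence_from_cdp(most_common_cdp)
print(scaffold)
print(cdp_from_sequence(scaffold))
print(longest_loops(most_common_cdp))
```

For the quoted CDP, the three longest loops are loops 1, 3 and 5, that is Cys1-Cys2, Cys3-Cys4 and Cys5-Cys6, and the last two entries are n6 = 0 and n7 = 4, in agreement with the text.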
PF01607, PF00187: chitin-binding proteins
There are two distinct cysteine-rich chitin-binding families (Z. Shen et al. (1998) J Biol Chem, 273:17665-70; T. Suetake et al. (2000) J Biol Chem, 275:17929-32; T. Suetake et al. (2002) Protein Eng, 15:763-9). PF00187 is found in fungi and plants and includes wheat germ agglutinin. Hevein is a typical member and contains four disulfide bonds. The family comprises 382 known members with highly conserved cysteine positions and the topology 1-4 2-5 3-6 7-8. Advantages of this family as a scaffold in library design include the small number of amino acids (<3) N-terminal to the first cysteine and C-terminal to the last cysteine. The distance between cysteines is less than 10, and the domain is rich in disulfide bonds (four disulfide bonds in about 50 amino acids). The most common DBP is 1-4 2-5 3-6. In nature, the domain is found in repeats.
PF01607, also known as the Peritrophin domain, is found in animals and insects as part of extracellular matrix proteins. The domain is also present in the small peptide tachycitin. A comparison of the structures of tachycitin and hevein (PF00187) shows structural similarity (see alignment). Tachycitin contains five disulfide bonds, but members of this family generally contain three (see logo). The three relevant disulfide bonds of tachycitin show the topology 1-3 2-6 4-5. There are 1075 known family members. The cysteine positions are highly conserved. There are few (<3) amino acids N-terminal to the first cysteine and C-terminal to the last cysteine.
See Figures 19-21.
PF00187 chitin-binding protein:
CxxxxxxxxCxxxxCCxxxxxCxxxxxxCxxxCxxxC
CgxqxxxxxCxxxxCCsxxGxCGxxxxyCxxxCxxxC
PF01607 chitin-binding domain:
1)Cxxx(x)xxxxxxx(x)xxxC(x)xxxxxCxxxxxxxxxCxxxxxxxxxxxxCxxxxxxxx
2)Cxxx(x)xxgxxxx(x)xxxC(x)xx[yf]xxCxxxxxxxxxCxxgxxfxxxxxxCxxxxxxxxC
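The selection criteria stated above for these scaffolds (fewer than 3 residues flanking the first and last cysteine, fewer than 10 residues between cysteines, and a high disulfide density) can be checked mechanically. The sketch below is an illustration only: it assumes a motif without optional positions, written with uppercase `C` for cysteine, and the thresholds are simply the ones quoted in the text.

```python
def scaffold_metrics(seq):
    """Summarize flank lengths, largest cysteine spacing and disulfide density."""
    positions = [i for i, aa in enumerate(seq) if aa == "C"]
    if len(positions) < 2:
        raise ValueError("need at least two cysteines")
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    disulfides = len(positions) // 2
    return {
        "n_flank": positions[0],
        "c_flank": len(seq) - 1 - positions[-1],
        "max_gap": max(gaps),
        "disulfides": disulfides,
        "residues_per_disulfide": len(seq) / disulfides,
    }


def meets_criteria(seq, max_flank=2, max_gap=9):
    """Apply the '<3 flanking residues' and '<10 between cysteines' rules."""
    m = scaffold_metrics(seq)
    return (m["n_flank"] <= max_flank
            and m["c_flank"] <= max_flank
            and m["max_gap"] <= max_gap)


# First PF00187 motif listed above.
pf00187 = "CxxxxxxxxCxxxxCCxxxxxCxxxxxxCxxxCxxxC"
print(scaffold_metrics(pf00187))
print(meets_criteria(pf00187))
```

Motifs containing optional positions such as `(x)` would need to be expanded to their shortest and longest forms before being passed to these functions.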
PF01826: trypsin inhibitor
This family contains trypsin inhibitors as well as a domain found in many extracellular proteins [N.D. Rawlings et al. (2004) Biochem J, 378:705-16]. The domain typically contains 10 cysteine residues that form 5 disulfide bonds. The DBP is 1-7 2-6 3-5 4-10 8-9. There are 414 known family members. The cysteine positions are highly conserved. See Figure 23.
CxxxxxxxxCxxxCxxxCxxxx(xxxxx)xxxCx(xxxxxxx)xxCxxx(x)CxCxxxxxxxxx(xx)xCxxxxxC
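A disulfide bonding pattern (DBP) such as the one given above for PF01826 lists the cysteine pairs by ordinal number. As a minimal sketch (the helper names are hypothetical), a DBP string can be parsed into pairs and checked so that every cysteine is used exactly once:

```python
def parse_dbp(text):
    """Parse a DBP such as '1-7 2-6 3-5 4-10 8-9' into sorted (low, high) pairs."""
    pairs = []
    for token in text.replace("_", " ").replace(",", " ").split():
        a, b = (int(part) for part in token.split("-"))
        pairs.append((min(a, b), max(a, b)))
    return sorted(pairs)


def check_dbp(pairs, n_cys):
    """Return a list of problems; the list is empty for a complete pairing."""
    used = [c for pair in pairs for c in pair]
    problems = []
    for c in range(1, n_cys + 1):
        count = used.count(c)
        if count == 0:
            problems.append(f"Cys{c} is unpaired")
        elif count > 1:
            problems.append(f"Cys{c} appears in {count} bonds")
    for c in sorted(set(used)):
        if c < 1 or c > n_cys:
            problems.append(f"Cys{c} is out of range")
    return problems


pairs = parse_dbp("1-7 2-6 3-5 4-10 8-9")
print(pairs)
print(check_dbp(pairs, n_cys=10))
```

The parser also accepts the underscore-separated form (for example `1-5_2-4_3-6`) that appears elsewhere in this description.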
PF02428: potato proteinase inhibitor
At the gene level, this family is found in repeats. The protein is synthesized as a large precursor protein. Proteolytic cleavage occurs within the repeats rather than between them, producing the mature microproteins [E. Barta et al. (2002) Trends Genet, 18:600-3] [N. Antcheva et al. (2001) Protein Sci, 10:2280-90].
A large precursor protein is synthesized, but the disulfide topology of the precursor is unknown.
The repeat unit was expressed and its NMR structure analyzed. It folds similarly to the mature microprotein, suggesting that a circular permutation has occurred and that this unit is the ancestor. The circularly permuted protein obtained corresponds to the repeat unit, which supports this finding. The linker or protease site (EEKKN) is present in the ancestral structure as a disordered loop. See Figure 24.
1)CxxxCxxxxxxxxCxxxxxx(x)xxxxxCxxCCxxxxxCxxxxxxxxxxC
2)CxxxCxxxxxxxxCPxxxxx(x)xxxxxCxxCCxxxxGCxxxxxxGxxxC
Because of proteolytic processing, the sequence of the mature microprotein differs from the logos above:
2C2CC5C10C11C3C8C2 (mature logo, protein level)
3C3C8C12C2CC5C10C2 (repeat logo, gene level)
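The two count signatures above can be compared directly to see the circular permutation. The sketch below is an illustration under stated assumptions: each number is read as the count of residues between cysteines, the first and last numbers are read as flanks, and one unknown junction gap is allowed on each side when the two chains are closed into rings. The helper names are hypothetical.

```python
def parse_counts(signature):
    """Split '2C2CC5C10C11C3C8C2' into (n_flank, inter-cysteine gaps, c_flank)."""
    fields = [int(f) if f else 0 for f in signature.split("C")]
    return fields[0], fields[1:-1], fields[-1]


def circular_match(gaps_a, gaps_b):
    """Return the rotation at which gaps_b matches gaps_a on a ring, else None.

    Each chain is closed into a ring by appending one unknown junction gap
    (None), which is allowed to match any value.
    """
    ring_a = list(gaps_a) + [None]
    ring_b = list(gaps_b) + [None]
    if len(ring_a) != len(ring_b):
        return None
    for shift in range(len(ring_b)):
        rotated = ring_b[shift:] + ring_b[:shift]
        if all(x is None or y is None or x == y
               for x, y in zip(ring_a, rotated)):
            return shift
    return None


_, mature_gaps, _ = parse_counts("2C2CC5C10C11C3C8C2")
_, repeat_gaps, _ = parse_counts("3C3C8C12C2CC5C10C2")
print(mature_gaps)
print(repeat_gaps)
print(circular_match(mature_gaps, repeat_gaps))
```

Under this reading, the two signatures match after rotating the repeat by three cysteines. The only gaps without a counterpart are the 11-residue gap of the mature form and the 12-residue gap of the repeat, which is consistent with one of them spanning the junction between repeats and the other containing the cleavage site.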
PF00304: γ-thionin
In their mature form, these small plant proteins generally consist of about 45-50 amino acid residues. The fold of γ-purothionin is characterized by a well-defined three-stranded antiparallel β-sheet and a short helix. Three disulfide bonds lie in the hydrophobic core between the helix and the sheet, forming a cysteine-stabilized helix motif (P.B. Pelegrini et al. (2005) Int J Biochem Cell Biol, 37:2239-53). This structure is similar to that of scorpion toxins and insect defensins (C. Bloch, Jr. et al. (1998) Proteins, 32:334-49).
The domain shows a high disulfide density, with 4 disulfide bonds in about 50 amino acids, and the topology is 1-8 2-5 3-6 4-7. The spacing between cysteines is less than 10, which makes the family preferred for library design. The cysteine positions are highly conserved among the members of this family. See Figure 25.
PF00304 - γ-thionin:
Motif 1: CxxxxxxxxxCxxxxxCxxxCxxxxxx(x)xxxCxx(x)xxxxCxCxxxC
Motif 2: CxxxSxxFxGxCxxxxxCxxxCxxxxxx(x)xGxCxx(x)xxxxCxCxxxC
PF02950: omega-conotoxin
Conotoxins are small cone snail neurotoxins that block ion channels. Omega-conotoxins act on the presynaptic membrane, where they bind and block calcium channels (W.R. Gray et al. (1988) Annu Rev Biochem, 57:665-700). The domain shows a high disulfide density, with three disulfide bonds in about 24 amino acids. There are more than 380 known family members. The spacing between cysteines is less than 10, which makes the family preferred for library design. The cysteine positions are highly conserved among the members of this family, and the DBP is 1-4 2-5 3-6. See Figure 26.
See Figure 26. Motif: C(xx)xxxxxCCxx(xx)xCx(xxx)xxCC
Ziconotide is a 25-amino-acid conotoxin approved by the FDA as "Prialt". Ziconotide has been used in more than 7000 patients and is non-immunogenic (<1% incidence), making it a promising scaffold for new binding proteins for human use. The sequence and the 1-4 2-5 3-6 DBP are shown in Figure 12.
PF05374: mu-conotoxin
Mu-conotoxins are peptide inhibitors of voltage-sensitive sodium channels (K.J. Nielsen et al. (2002) J Biol Chem, 277:27247-55). See Figure 29. DBP: 1-4 2-5 3-6
Motif 1: CCxxxxxCxxxxCxxxxCC
Motif 2: CCxxpxxCxxxxCxPxxCC
PF02822:Antistasin
Peptide proteinase inhibitors can be found as single-domain proteins or as single or multiple domains within proteins; these are referred to as simple or compound inhibitors, respectively (R. Lapatto et al. (1997) Embo J, 16:5151-61). In many cases they are synthesized as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. The Pfam definition includes only 6 cysteines, and the DBP is 1-4 2-5 3-6. However, most members of this family (1bx7, 1hia) contain two additional N-terminal disulfide bonds. The family can therefore be extended at the N-terminus.
The domain shows a high disulfide density, with 3-5 disulfide bonds in 39-54 amino acids, and the topology is 1-3 2-4 5-8 6-9 7-10. The spacing between cysteines is less than 10, which makes the family preferred for library design. The cysteine positions are highly conserved among the members of this family. See Figure 32.
Members of this family are highly hydrophilic, which is preferred for library design (low non-specific binding, low number of T-cell epitopes). For example, hirustasin contains only 6 hydrophobic residues in total. The crystal structure shows almost no secondary structure elements. Combined with the large number of possible disulfide isomers for 5 disulfide bonds, this makes the family a very useful scaffold for library design.
For the form with 5 disulfide bonds, the cysteine positions are highly conserved: C4C5C6C1C4C4C10C5C1C
PF02822-Antistasin:
1)CxxxxCxxxxxCxxxxxxCxCxxxxC(x)xxxCxxxxxxxxxCx(xxx)xCxC
2)CxxxxCxxxxxCxxxxxxCxCxxxxC(x)xxxCxxGxxxdxxgCx(xxx)xCxC
3)CxxxxCxxxxxCxxxxxxCxCxxxxC(x)xxxCpyGxxxdxxgCx(xxx)xCxC
The short form, which lacks the 4 N-terminal cysteine residues:
1)CxxxxC(x)xxxCxxxxxxxxxCx(xxx)xCxC
2)CxxxxC(x)xxxCxxGxxxdxxgCx(xxx)xCxC
3)CxxxxC(x)xxxCpyGxxxdxxgCx(xxx)xCxC
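The "large number of possible disulfide isomers" mentioned above for PF02822 can be quantified. If all cysteines are paired within the chain, the number of ways to pair 2n cysteines into n disulfide bonds is (2n)! / (n! · 2^n). The short Python sketch below evaluates this standard combinatorial formula; it is an illustration and not a calculation taken from the original text.

```python
from math import factorial


def disulfide_isomers(n_bonds):
    """Number of distinct pairings of 2n cysteines into n disulfide bonds."""
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    return factorial(2 * n_bonds) // (factorial(n_bonds) * 2 ** n_bonds)


for n in (3, 4, 5):
    print(n, "disulfide bonds:", disulfide_isomers(n), "possible isomers")
```

This gives 15 isomers for three disulfide bonds, 105 for four and 945 for five, which illustrates why a 5-disulfide scaffold offers a much larger space of alternative pairings than the 3-disulfide short form.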
PF05039: Agouti-related
See Figure 33. The agouti protein regulates pigmentation in the mouse hair follicle, producing a black hair with a subapical yellow band. A highly homologous protein, agouti signaling protein (ASIP), is present in humans and is expressed at its highest levels in adipose tissue, where it may play a role in energy homeostasis and possibly in human pigmentation (J.C. McNulty et al. (2001) Biochemistry, 40:15520-7; J. Voisey et al. (2002) Pigment Cell Res, 15:10-8).
The disulfide bond between Cys5 and Cys10 is not essential for structure and function. After its removal, the DBP becomes 1-4 2-5 3-8 6-7. The first three disulfide bonds form the characteristic cystine knot motif. The receptor binding site comprises the RFF motif between Cys7 and Cys8, and the first 16 amino acids form a loop. The C-terminus is disordered and can be removed (note that Cys1 and Cys10 are not present in the Pfam logo).
The following logos are preferred for library design: PF05039 - Agouti:
1)CxxxxxCxxxxxxCCxxCxxCxCxxxxxxCxCxxxxxxxxxC
2)CxxxxSCxxxxxxCCDPCxxCxCRFFxxxCxCRxxxxxxxxC
3)CxxxxSCxGxxxPCCDPCAxCxCRFFxxxCxCRxLxxxxxxC
An engineered protein with a shorter C-terminus and lacking cysteine 5 and cysteine 10 folds into a structure similar to that of the native protein. This engineered form serves as a scaffold for library design and has the following logos:
CxxxxxCxxxxxxCCxxxxxCxCxxxxxxCxCx,
CxxxxxCxxxxxxCCDPxxxCxCRFFxxxCxCRxx,
CxGxxxCxxxxxxCCDPAxxCYCRFFxxxCxCRxx
The full-length agouti protein can be expressed in Escherichia coli as a soluble protein (R.D. Rosenfeld et al. (1998) Biochemistry, 37:16041-52).
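Removing the Cys5-Cys10 bond from the PF05039 scaffold changes the ordinal numbers of the remaining cysteines, which is why the DBP is restated above as 1-4 2-5 3-8 6-7. The sketch below shows the renumbering. The 10-cysteine pattern used as input is an assumption, obtained by inverting the renumbering described in the text; it is not quoted from the original.

```python
def remove_bond(pairs, bond):
    """Drop one disulfide bond and renumber the remaining cysteines 1..N."""
    bond = tuple(sorted(bond))
    if bond not in pairs:
        raise ValueError(f"bond {bond} is not part of the pattern")
    kept = [p for p in pairs if p != bond]
    survivors = sorted(c for p in kept for c in p)
    new_index = {old: new for new, old in enumerate(survivors, start=1)}
    return sorted((new_index[a], new_index[b]) for a, b in kept)


# Assumed numbering of the native 10-cysteine domain (inferred, see lead-in).
native = [(1, 4), (2, 6), (3, 9), (5, 10), (7, 8)]

engineered = remove_bond(native, (5, 10))
print(" ".join(f"{a}-{b}" for a, b in engineered))
```

With this input, the renumbered pattern is the 1-4 2-5 3-8 6-7 DBP stated for the engineered form.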
PF05375:PMP inhibitor/Pacifastin
The structures of members of this family show that they consist of a three-stranded antiparallel β-sheet connected by three disulfide bonds, which defines this family as a new family of serine protease inhibitors (G. Simonet et al. (2002) Comp Biochem Physiol B Biochem Mol Biol, 132:247-55; A. Roussel et al. (2001) J Biol Chem, 276:38893-8). See Figure 34.
There are 39 family members. The cysteine positions are highly conserved. The disulfide topology is 1-4 2-6 3-5. Distance between cysteines <10. The C-terminus is not visible in the structure, suggesting that it can be omitted from the library design. Two strongly conserved amino acids, N15 and T29, participate in forming and stabilizing the protease-binding loop. They can be omitted from the library design to increase binding diversity.
1)CxxxxxxxxxCxxCxCxxxx(x)xxxCxxxxC
2)CxpGxxxKxxCNxCxCxxxx(x)xxxCTxxxC
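Whether conserved positions such as N15 and T29 are kept fixed or opened up for randomization changes the theoretical size of a library. A rough estimate can be read off a motif, assuming (this is an assumption of the sketch, not a statement of the original) that every `x` is fully randomized over 20 amino acids, that all other letters are held fixed, and that `(x)` marks an optional position:

```python
def randomized_positions(motif):
    """Count 'x' positions as (minimum, maximum), honoring optional '(x)' runs."""
    required = optional = 0
    in_optional = False
    for ch in motif:
        if ch == "(":
            in_optional = True
        elif ch == ")":
            in_optional = False
        elif ch == "x":
            if in_optional:
                optional += 1
            else:
                required += 1
    return required, required + optional


def theoretical_diversity(n_positions, alphabet_size=20):
    """Number of distinct sequences for n fully randomized positions."""
    return alphabet_size ** n_positions


# Second PF05375 motif listed above, with N and T held fixed.
motif = "CxpGxxxKxxCNxCxCxxxx(x)xxxCTxxxC"
low, high = randomized_positions(motif)
print(low, high)
print(f"{theoretical_diversity(low):.2e}")
```

The result is an upper bound on sequence diversity only. The number of clones that can actually be sampled is limited by the display system and is not addressed by this estimate.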
PF01549:ShTK family and Stecrisp
Stecrisp shows a 3D structure highly similar to that of the ShTK family, but it is not part of the ShTK family (PF01549) (M. Guo et al. (2005) J Biol Chem, 280:12405-12). A BLAST search with the Stecrisp protein sequence returned 48 matches with 30-100% identity, but none of them was a ShTK family member. See Figures 35-36.
Pfam01549 is a domain of unknown function found in several Caenorhabditis elegans (C. elegans) proteins. The domain is 30 amino acids long and has 6 conserved cysteine positions that form three disulfide bonds. The domain is named after the ShK toxin (according to SMART) (M. Dauplais et al. (1997) J Biol Chem, 272:4302-9).
The domain shows a high disulfide density, with 3 disulfide bonds in 39 amino acids, and the topology is 1-6 2-4 3-5. The spacing between cysteines is less than 10, so the domain can be used for library design. The cysteine positions are highly conserved among the members of this family.
PF01549 - ShTK. See Figure 35:
1)Cx(xxx)xxx(x)xxCxxxxxx(xx)Cxxxx(x)xxxxxxxxCxxxCxxC
2)Cx(dxx)dxx(x)xxCxxxxxx(xx)Cxxxx(x)xxxxxxxCxxtCxxC
C-terminal domain of STECRISP and related sequences: see Figure 36.
PF07974: EGF2 domain
The members of this family all belong to the EGF superfamily, which is characterized by 6-8 cysteines forming 3-4 disulfide bonds in the order 1-3, 2-4, 5-6, which is critical for the folding stability of EGF. These disulfide bonds are stacked in a ladder-like arrangement. The laminin EGF family is distinguished by an additional disulfide bond. The function of the domains in this family is unclear, but they are thought to play a mainly structural role. The domains are usually arranged as tandem repeats in extracellular proteins.
PF07974 - EGF2: see Figure 37.
1)Cx(xxxxxx)Cxx(x)xxxCxxxx(xxxxxxxx)CxCxxx(xxxx)xxxxxC
2)Cx(xxxxxx)Cxx(x)xGxCxxxx(xxxxxxxx)CxCxxx(xxxx)xxGxxC
Other EGF-like domains:
PF00008 - EGF: see Figure 38.
1)CxxxxxCxxxxxCxxxxx(xx)xxxCxCxxx(xxxx)xxxxxC
2)CxxxxxCxxxgxCxxxxx(xx)xxxCxCxxg(xxxx)xxgxxC
PF00053 - Lam-EGF: see Figure 39. DBP: 1-3 2-4 5-6 7-8
1)
CxCxxxxxxxx(xx)Cxxxxxxxxx(xxxx)CxxCxxxxxxxxCxxCxxxxxxxxxx(xxxxx)C
2)
CxCxxxxxxxx(xx)Cxxxxxxxxx(xxGx)CxxCxxxxxGxxC(DE)xCxxxxxxxxxx(xxxxx)C
PF07645 - Ca-EGF: see Figure 40.
1)CxxxxxxxCxxxxxx(xx)CxxxxxxxCx(xxxx)Cxxxxxxxxxx(xxxxxxx)C
2)
CxxxxxxxCxxxxxx(xx)CxNxxGx(F,Y)xCx(xxxx)Cxx(G,Y)xxxxxxx(xxxxxxx)C
PF04863 - alliinase EGF-like: see Figure 41.
1)Cxxxxxxxxxxxxxxxx(xxxx)CxCxxCxxxxxCxxxxxxC
2)Cxxxxxxxxxxxxxxxx(xxxx)CxCxxCxxxxxCxxxxxxC
PF00323: mammalian defensin; Defensin 1
See Figure 45. DBP: 1-6 2-4 3-5
1)CxCxxxxCxxxxxxxxxCxxxxxxxxxCC
2)CxCRxxxCxxxErxxGxCxxxgxxxxxCC
PF01097: arthropod defensin; Defensin 2
See Figure 44. DBP: 1-4 2-5 3-6
1)CxxxCxxxxxxxxxCx(xxx)xxxCxC
2)CxxHCxxxgxxGGxCxx(xx)xxxCxC
PF00711: defensin B, beta-defensin
See Figure 43. DBP: 1-4 2-5 3-6 or 1-5 2-4 3-6
1)CxxxxxxCxxxxCxxxxxxxxxCxxxxxxCC
2)CxxxxgxCxxxxCxxxxxxxgxCxxxxxxCC
PF08131: defensin-like; Defensin 3. See Figure 42.
1)CxxxxGxCrxkxxxnCxxxxxxxCxnxxqkCC
2)CxsxxGxCrxkxxxnCxxxxxxxCxnxxqkCC
The defensin-like 3 family consists of defensin-like peptides (DLPs) isolated from platypus venom (A.M. Torres et al. (1999) Biochem J, 341 (Pt 3): 785-94). These DLPs show a three-dimensional fold similar to that of beta-defensin-12 and the sodium channel neurotoxin ShI. However, the side chains known to be important for the function of beta-defensin-12 and ShI are not conserved in the DLPs, which suggests a different biological function. Consistent with this view, DLPs have been shown to have no antimicrobial properties and no observable activity on rat dorsal root ganglion sodium channel currents. Only three members are known, but the similarity to beta-defensins makes this family an attractive scaffold.
The domain shows a high disulfide density, with 3 disulfide bonds in about 36 amino acids, and the topology is 1-5 2-4 3-6. The spacing between cysteines is less than 10, so the domain can be used for library design. The cysteine positions are highly conserved among the members of this family.
PF00321: thionin
Thionins are small basic plant proteins, 45 to 50 amino acids in length, that contain three or four conserved disulfide bonds. The proteins are toxic to animal cells, presumably by attacking the cell membrane and rendering it permeable; this results in the inhibition of sugar uptake and allows potassium and phosphate ions, proteins and nucleotides to leak from the cell. This family is distinct from the γ-thionins, PF00304 (P.B. Pelegrini et al. (2005) Int J Biochem Cell Biol, 37:2239-53).
The domain shows a high disulfide density, with 4 disulfide bonds in about 46 amino acids. The spacing between cysteines is less than 10, so the domain can be used for library design. The cysteine positions are highly conserved among the members of this family. See Figure 46.
For the smaller members with 6 cysteines, the cysteine positions are highly conserved, the distance between cysteines is approximately 10 or less, and the topology is 1-6 2-5 3-4.
The motifs of the members containing three disulfide bonds are:
PF00321 - thionin:
1)xxCCxxxxxxxxxxCxxxxxxxxxCxxxxxCxxxxxxxCxxxxxx
2)xxCCxxxxxRxxYxxCxxxGxxxxxCxxxxxCxIxxxxxCxxxxxx
3)xxCCxxxxxRxxYxxCRxxGxxxxxCAxxxxCxIISGxxCPxx(Y,F)xx
Members with four disulfide bonds and the topology 1-8 2-7 3-6 4-5 are characterized by the following logo: xxCCxxxxxxxCxxxCxxxxxxxxCxxxCxCxxxxxxxC
PF06360:Raikovi
Diffusible peptide pheromones. There are only 6 family members, but the amino acids between the cysteines are highly diverse (M.S. Weiss et al. (1995) Proc Natl Acad Sci U S A, 92:10172-6). The cysteine positions are highly conserved, and the topology is 1-4 2-6 3-5. Distance between cysteines <10. See Figure 47.
1)CxxxxxxCxxxxCxxxCxxxxxxxxCxxxxxxxxxC
2)CxxaxxxCxxxxCxxxCxxxxxxxxCxxxxxxxxxC
PF00683: TB domain
The transforming growth factor (TGF)-binding protein-like (TB) domain is from human fibrillin. The domain is found in fibrillins, which are confined to fibrillar structures of the extracellular matrix, and in latent TGF-binding proteins (LTBPs) (X. Yuan et al. (1997) Embo J, 16:6659-66). The domain is repeated in the sense that multiple copies are found in fibrillins and LTBPs, but not in tandem. See Figure 49.
The logo shows only 6 conserved cysteines. Three structures were analyzed (1uzq, 1apj, 1ksq): one missing cysteine is inserted between Cys1 and the Cys triplet (positions 8/12, 4/12, 9/12), and the last cysteine is missing from the logo. The topology is 1-3 2-6 4-7 5-8.
1)CxxxxxxxxxxxxxCCCxxxx(xx)xxxxxCxxCPxxxxxxxC
2)Cxxxxxxx(x)xxkxxCCCxxxx(xx)xxgxxCexCPxxxxxxxC
PF00093:von Willebrand factor C type domain
The vWF domain is found in various plasma proteins, complement factors, integrins, collagen types VI, VII, XII and XIV, and other extracellular proteins (P. Bork (1993) FEBS Lett, 327:125-30). There are 488 known family members with highly conserved cysteine residues. Comparisons of structure and sequence have revealed an evolutionary relationship between the N-terminal subdomain of the CR module and the fibronectin type 1 domain, suggesting that these domains share a common ancestor (J.M. O'Leary et al. (2004) J Biol Chem, 279:53857-66). See Figure 50.
Cysteine-rich domain of minicollagen
Minicollagens are found in the nematocyst wall of Hydra. Minicollagens contain C-terminal cysteine-rich domains that are synthesized as precursors with intramolecular disulfide bonds. The C-terminal domain is a microprotein with a unique fold (S. Meier et al. (2004) FEBS Lett, 569:112-6; E. Pokidysheva et al. (2004) J Biol Chem, 279:30395-401). Among the 16 family members, only the cysteine residues are highly conserved. It is generally thought that the disulfide bonds are reshuffled into intermolecular disulfide bonds, forming the stable matrix of the wall. The disulfide topology is 1-5 2-4 3-6. The finding that the C-terminal domains can form intermolecular disulfide bonds with one another can be used to generate combinatorial libraries of dimeric molecules linked by intermolecular disulfide bonds. See Figure 136.
Motif: C3C3C3C3CC in minicollagen and C5C3C3C3C3CC in the Hydra HOWA protein, in which the domain is repeated.
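The count notation used above (for example C3C3C3C3CC) and the `x` notation used for most other families carry the same information. A small Python sketch, with hypothetical helper names, converts between the two for motifs that contain only cysteines and unspecified positions:

```python
import re


def expand_counts(signature, filler="x"):
    """Expand count notation, e.g. 'C3C3C3C3CC' to 'CxxxCxxxCxxxCxxxCC'."""
    return re.sub(r"\d+", lambda m: filler * int(m.group()), signature)


def compress_counts(motif, filler="x"):
    """Collapse runs of the filler back into counts (inverse of expand_counts)."""
    return re.sub(re.escape(filler) + "+", lambda m: str(len(m.group())), motif)


for signature in ("C3C3C3C3CC", "C5C3C3C3C3CC"):
    expanded = expand_counts(signature)
    print(signature, "->", expanded, "->", compress_counts(expanded))
```

Adjacent cysteines carry no number in the count notation, so a spacing of zero is represented simply by `CC` in both forms.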
PF03784:Cyclotide
This family contains a group of cyclic peptides with diverse activities. The structure consists of a distorted three-stranded β-sheet and a cystine-knot arrangement of disulfide bonds (D.J. Craik et al. (1999) J Mol Biol, 294:1327-36).
See Figure 51.
Topology is 1-4 2-5 3-6
1)CxxxCxxxxCxxxxxxxCxCxxxxC
2)CxExCxxxxCxxxxxxGCxCxxxxC
PF06446:Hepcidin
Hepcidin is an antibacterial and antifungal protein expressed in the liver, and it is also a signaling molecule in iron metabolism. The hepcidin protein is rich in cysteine and forms a distorted β-sheet, with an unusual disulfide bond found at the turn of the hairpin.
See Figure 52. Topology is 1-8 2-7 3-6 4-5
Motif 1:xxxCxxCCxCCxxxxCxxCC
Motif 2:FPxCxFCCxCCxxxxCGxCC
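Topologies such as 1-8 2-7 3-6 4-5 (PF06446) and 1-4 2-5 3-6 differ in how the disulfide bonds relate to one another along the sequence. The sketch below classifies each pair of bonds as sequential, nested or crossing. It is a sequence-order classification only and is offered as an illustration; it says nothing about the three-dimensional structure, and the function names are hypothetical.

```python
from itertools import combinations


def relation(p, q):
    """Classify two disulfide bonds by the order of their cysteines."""
    (a, b), (c, d) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    if b < c:
        return "sequential"   # a-b ... c-d
    if d < b:
        return "nested"       # a ... c-d ... b
    return "crossing"         # a ... c ... b ... d


def classify(pairs):
    """Return the relation for every pair of bonds in a topology."""
    return {(p, q): relation(p, q) for p, q in combinations(sorted(pairs), 2)}


topologies = {
    "1-8 2-7 3-6 4-5": [(1, 8), (2, 7), (3, 6), (4, 5)],
    "1-4 2-5 3-6": [(1, 4), (2, 5), (3, 6)],
    "1-3 2-4 5-6": [(1, 3), (2, 4), (5, 6)],
}
for name, pairs in topologies.items():
    print(name, sorted(set(classify(pairs).values())))
```

In this scheme the 1-8 2-7 3-6 4-5 pattern is fully nested, as expected for a hairpin held together by a ladder of disulfide bonds, whereas all bonds of the 1-4 2-5 3-6 pattern cross one another.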
PF05353:δ-Atracotoxin
The structure of atracotoxin comprises a core β region containing a triple-stranded sheet, a thumb-like extension protruding from the β region, and a C-terminal helix. The β region contains a cystine knot motif, a feature found in other neurotoxic polypeptides. See Figure 53.
Topology is 1-4 2-6 3-7 5-8
Motif 1:CxxxxxxCxxxxxCCCxxxCxxxxxxxxCxxxxxxxxxC
Motif 2:CxxxxxWCxxxxxCCCPxxCxxWxxxxxCxxxxxxxxxC
PF00299: squash serine protease inhibitor
The squash inhibitors form one of a number of serine protease inhibitor families. They are approximately 30 residues long and contain 6 Cys residues that form 3 disulfide bonds. The topology is 1-4 2-5 3-6. See Figure 56.
1)CxxxxxxCxxxxxCxxxCxCxxxx(x)xC
2)CPxxxxxCxxpxpCxxxCxCxxxx(x)xCG
PF01821: anaphylatoxin-like domain
The C3a, C4a and C5a anaphylatoxins are protein fragments generated enzymatically in serum during activation of the complement molecules C3, C4 and C5. They induce smooth muscle contraction. These fragments are homologous to a three-fold repeat in fibulins. The topology is 1-4 2-5 3-6. The family has 123 known members.
See Figure 57.
1)CCxxxxxx(xxxx)xxCxxxxxxxx(xx)xxCxxxxxxCC
2)CCxxGxxx(xxxx)xxCxxxxxxxx(xx)xxCxxxFxxCC
PF05196: midkine/PTN
Several extracellular heparin-binding proteins involved in the regulation of growth and differentiation belong to a new family of growth factors (W. Iwasaki et al. (1997) Embo J, 16:6936-46). There are 33 family members. The cysteine positions are highly conserved and form the disulfide topology 1-4 2-5 3-6. Distance between cysteines <10. The NMR structure of midkine shows highly disordered N- and C-termini, suggesting that they can be omitted from the library design. Positively charged residues participate in hairpin binding and can be omitted from the library design. See Figure 59.
1)CxxxxxxxCxxxxxxCxxxxxxxCxxxxxxxxCxxxC
2)CxxWxxxxCxxxxxDCGxGRExxCxxxxxxxxCxxPCxW
PF02819:WAP " four-disulfide bond core "
Although the pattern of conserved cysteines suggests that the sequences may adopt a similar fold, the overall sequence similarity is low (L.G. Hennighausen et al. (1982) Nucleic Acids Res, 10:2677-84). There are 25 known family members. See Figure 62.
Topology is 1-6 2-7 3-5 4-8.
1)Cxxxx(xx)xxxxCxxx(xxx)CxxxxxCxxxxxCCxxxC
2)CPxxx(xx)xxxxCxxx(xxx)CxxDxxCxxxxKCCxxxC
PF02048, PF07822: toxin hairpin
Toxin 13 (PF07822) folds into an α-helical hairpin linked by four disulfide bonds. The SCOP database also lists heat-stable enterotoxin (PF02048), with the DBP 1-4 2-5 3-6, as a toxin hairpin.
Members of this family resemble neurotoxin B-IV, a crustacean-selective neurotoxin produced by the marine ribbon worm Cerebratulus lacteus. This highly cationic peptide is about 55 residues long and is arranged in a hairpin structure, forming two antiparallel helices connected by a well-defined loop. The arms of the hairpin are linked by four disulfide bonds. Three residues identified as critical for activity lie on the same face of the molecule, while another residue critical for activity, Trp30, lies on the opposite face. The mode of action of this protein is not fully understood, but it may act on voltage-gated sodium channels, possibly by binding to a site on these proteins that has not yet been characterized. See Figure 65. The topology of toxin 13 is 1-8 2-5 3-6 4-5
1)CxxxCxxxxxxCxxCxxxxxxxxxxCxxxCxxxxxxCxxxC
2)CxxxCxxxyxxCxxCxgxWxgxxgxxCxxhCxxxxxxCxxxC
PF06357:ω-atracotoxin
ω-Atracotoxin-Hv1a is an insect-specific neurotoxin whose phylogenetic specificity derives from its ability to antagonize insect, but not vertebrate, voltage-gated calcium channels (X. Wang et al. (1999) Eur J Biochem, 264:488-94). Topology is 1-6 2-7 3-4 5-8
See Figure 66. Topology is 1-4 2-5 3-6.
PxxxPCPYxxxxCCxxxCxxxxxxGxxxxxxC
PF06954: resistin
This family consists of several mammalian resistin proteins. It has been shown that raising circulating resistin levels in the presence of fixed physiological insulin levels markedly stimulates glucose production, and that insulin suppresses resistin expression.
Resistin contains an N-terminal α-helix that is involved in multimerization of the C-terminal disulfide-rich portion. See Figure 67. The topology is 1-10 2-9 3-6 4-7 5-8.
Only the disulfide-rich microprotein is shown. The N-terminal α-helix motif can be used for the multimerization of microproteins.
1)CxxxxxxxxxxxCxxxxxxxxCxCxxxCxxxxxxxxCxCxCxxxxxxxxCC
2)CxxxxxxxxxxxCPxGxxxxxCxCGxxCGxWxxxxxCxCxCxxxDWxxRCC
PF00066:Notch/DSL
The extracellular domains of these transmembrane proteins are involved in developmental processes in animals (J.C. Aster et al. (1999) Biochemistry, 38:4736-42; D. Vardar et al. (2003) Biochemistry, 42:7061-7). The DSL repeats occur in tandem (3x). There are three conserved Asp or Asn residues. In the NMR structure, D12, N15, D30 and D33 form the Ca2+ binding site. In the presence of millimolar Ca2+ only one isomer is formed, whereas multiple isomers are found in the presence of Mg2+ or EDTA. This can be exploited for the structural evolution of microproteins. There are 175 family members. The cysteine positions are highly conserved, and the topology is 1-5 2-4 3-6. There are few (<3) amino acids N-terminal to the first cysteine and C-terminal to the last cysteine. Distance between cysteines <10. See Figure 68.
1)Cx(xx)xxxCxxxxxxxxCxxxCxxxxCxxxxxxC
2)Cx(xx)xxxCxxxxxxgxCxxxCnxxxCxxDGxDC
PF00020:TNFR
A large number of proteins, some of which are known to be growth factor receptors, have been found to contain a cysteine-rich domain in their N-terminal region. This domain can be subdivided into four (or in some cases three) repeats containing 6 conserved cysteines, all of which participate in intrachain disulfide bonds (M.D. Jones et al. (1997) Biochemistry, 36:14914-23). The domain contains 6 highly conserved cysteine residues, and the topology is 1-2 3-5 4-6.
See Figure 69.
1)Cxxx(x)xxxxxxx(x)xxCx(x)CxxCxx(xx)xxxxxxxCxxxxxxxC
2)Cxxx(x)x[yf]xxxxx(x)xxCx(x)CxxCxx(xx)gxxxxxxCxxxxxtxC
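Motifs written in this notation can be turned into regular expressions and used to scan candidate sequences. The sketch below is a minimal illustration and rests on assumptions about the notation that are not spelled out here: `x` is taken to be any residue, a parenthesized run of `x` is taken to be optional, a bracketed set such as `[yf]` is taken to list alternatives, and any other letter (upper or lower case) is treated as a required residue. Parenthesized groups containing anything other than `x` are rejected rather than guessed at.

```python
import re


def motif_to_regex(motif):
    """Translate a cysteine motif into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            end = motif.index(")", i)
            body = motif[i + 1:end]
            if set(body) != {"x"}:
                raise ValueError(f"unsupported optional group: ({body})")
            parts.append("[A-Z]{0,%d}" % len(body))  # optional residues
            i = end + 1
        elif ch == "[":
            end = motif.index("]", i)
            parts.append("[" + motif[i + 1:end].upper() + "]")  # alternatives
            i = end + 1
        elif ch == "x":
            parts.append("[A-Z]")  # any residue
            i += 1
        elif ch.isalpha():
            parts.append(ch.upper())  # fixed residue
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return re.compile("".join(parts))


# Second PF00020 motif listed above.
pattern = motif_to_regex("Cxxx(x)x[yf]xxxxx(x)xxCx(x)CxxCxx(xx)gxxxxxxCxxxxxtxC")
print(pattern.pattern)

small = motif_to_regex("Cx(x)C[yf]xC")
print(bool(small.fullmatch("CACYAC")), bool(small.fullmatch("CACWAC")))
```

Treating lowercase letters as required residues is the strictest reading. A more permissive scan could map them to `[A-Z]` instead, at the cost of more matches.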
PF00039: fibronectin II type domain
Fibronectin is a multidomain glycoprotein found in soluble form in plasma; it binds cell surfaces and a variety of components including collagen, fibrin, heparin, DNA and actin.
See Figure 70. Topology is 1-3 2-4. Motif:
CxfpfxxxxxxxxxCxxxxxxxxxxwCxxxxxxxxDxxxxxC
PF02013: cellulose- or protein-binding domain
In aerobes, these domains bind cellulose (or other carbohydrates); in anaerobic fungi, however, they are protein-binding domains, referred to as dockerin domains or docking domains.
Topology is 1-2 3-4. See Figure 71.
Motif:
Cxx(xxx)xxxyxCCxxxxxxxxxxwcxxxxxxxxDxxxxxCxxxx(xxxx)xxxxxxxxwxxxxxxxC
PF00734: fungal cellulose-binding domain
Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy amino acids [N.R. Gilkes et al. (1991) Microbiol Rev, 55:303-15]. The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues, and it is found at either the N-terminus or the C-terminus of the enzyme. Members of this family have two disulfide bonds, and the topology is 1-3 2-4. See Figure 73.
Motif: qCGGxxxxGxxxCxxgxxCxxxxxxy
PF00219: insulin-like growth factor binding proteins
Insulin-like growth factors (IGF-I and IGF-II) bind with high affinity to specific binding proteins in extracellular fluids. Members of this family have two disulfide bonds, and the topology is 1-3 2-4. See Figures 74 and 75.
PF00322: Endothelin family
Endothelins (ET) are the most potent vasoconstrictors known. These peptides are 21 residues long and contain two intramolecular disulfide bonds, with the topology 1-4 2-3. See Figure 76.
PF02058: guanylin precursor
Guanylin is a 15-amino-acid peptide and an endogenous ligand of the intestinal receptor guanylate cyclase-C, known as StaR. These peptides contain two intramolecular disulfide bonds, with the topology 1-3 2-4. See Figure 77.
PF02977: carboxypeptidase inhibitor
Peptide proteinase inhibitors can exist as single-domain proteins or as single or multiple domains within proteins; these are referred to as simple or compound inhibitors, respectively. In many cases they are synthesized as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. Removal of the N-terminal inhibitor domain, either by interaction with a second peptidase or by autocatalytic cleavage, activates the zymogen.
There are 35 known family members. The topology is 1-4 2-5 3-6. See Figure 80.
1)CxxxxxxCxxxxxCxxxCxCxxxxxxC
2)CPxixxxCxxdxdCxxxCxCxxxxxxCg
PF06373:CART
CART (about 40 amino acids) consists mainly of turns and loops spanned by a compact framework composed of a few small stretches of the antiparallel β-sheet common to cystine knots. There are 13 known family members.
Topology is 1-3 2-5 4-6.Referring to Figure 81.
As if different with every other family, non--cys residue is more conservative, and this family is not randomized preferred selection.
Follistatin
Human follistatin is an FDA-approved product and is not immunogenic, so the 70-72 amino acid follistatin domain is an attractive scaffold. It contains a total of 36 cysteine residues, which are thought to be arranged in a set of non-overlapping disulfide bonds corresponding to four autonomously folding units (Figure 21 8). The first of these units, which we call Fs0, comprises the 63 N-terminal residues of the mature polypeptide and has no sequence similarity to other proteins of known structure. In contrast, the remainder of the follistatin chain folds into a series of three consecutive follistatin domains 70-74 residues long. These structural repeats, termed Fs1, Fs2 and Fs3, show homology to the follistatin-like domain of the extracellular matrix protein BM-40 and are also found in several other extracellular matrix proteins, such as agrin, tomoregulin and complement proteins C6 and C7. See Figure 151. The DBP of the 69-72 amino acid follistatin domain is 1-3 2-4 5-9 6-8 7-10.
PF00713: hirudin
The hirudin family is a group of proteinase inhibitors belonging to MEROPS inhibitor family I14, clan IM; they inhibit serine peptidases of the S1 family.
Hirudin is a potent thrombin inhibitor secreted by the salivary glands of Hirudinaria manillensis (buffalo leech) and Hirudo medicinalis (medicinal leech). It forms a stable non-covalent complex with α-thrombin, thereby abolishing its ability to cleave fibrinogen. The structure of hirudin has been solved by NMR, and the structure of a recombinant hirudin-thrombin complex has been determined by X-ray crystallography to 2.3 Å. Hirudin consists of an N-terminal globular domain and an extended C-terminal domain. Residues 1-3 form a parallel beta strand with thrombin residues 214-217, and the nitrogen atom of residue 1 forms a hydrogen bond with the Ser195 Oγ atom of the catalytic site. The C-terminal domain makes numerous electrostatic interactions with the anion-binding exosite of thrombin, and the last five residues lie in a helical loop that forms many hydrophobic contacts. See Figure 123.
PF06410:Gurmarin
Gurmarin is a 35-residue polypeptide from the asclepiad vine Gymnema sylvestre. It has been used as a pharmacological tool in the study of sweet-taste transduction because it selectively inhibits the neural response to sweet tastants in rats.
There are 2 known family members. Topology is 1-4 2-5 3-6. See Figure 82.
1)CxxxxxxCxxxxxxCCxxxxCxxxxxxxxxC
2)CxxxxxxCxxxxxxCCxxxxCxxxxwwxxxC
PF08027: albumin-1
Albumin I protein is a hormone-like peptide that stimulates kinase activity upon binding to a membrane-bound 43 kDa receptor. The structure of this domain reveals a knottin-like fold consisting of three β strands. There are 34 known family members. Topology is 1-4 2-5 3-6. See Figures 83-84.
PF08098: neurotoxin (ATXIII)
This family consists of the Anemonia sulcata toxin III (ATX III) neurotoxin family. ATX III is a neurotoxin produced by sea anemones. It adopts a compact structure containing four reverse turns and two other chain reversals, but no regular alpha-helix or beta sheet. A hydrophobic patch found on the surface of the peptide may constitute part of the sodium channel binding surface. There are 2 known family members. Topology is 1-4 2-5 3-6.
See Figure 85. Motif: CCxCxxxxxxxxCxxxxxxxxxxC
PF01147: CHH/MIH/GIH neurohormones
Arthropods express a family of neuropeptides that includes crustacean hyperglycemic hormone (CHH), molt-inhibiting hormone (MIH), gonad-inhibiting hormone (GIH) and mandibular organ-inhibiting hormone (MOIH) from crustaceans, and ion transport peptide (ITP) from locusts.
There are 131 known family members. Topology is 1-5 2-4 3-6. See Figure 86.
PF04736: eclosion hormone
Eclosion hormone is an insect neuropeptide that triggers ecdysis behavior, which results in shedding of the old cuticle at the end of a molt. There are 5 known family members. Topology is 1-5 2-4 3-6. No structure is available. See Figure 88.
1)CxxxCxxCxxxxxxxxxxxxCxxxCxxxxxxxxxxC
2)CxxnCxqCkxmxgxxfxgxxCxxxCxxxxgxxxpxC
PF01160: endogenous opioid neuropeptides
Vertebrate endogenous opioid neuropeptides are released by post-translational proteolytic cleavage of precursor proteins. The precursor consists of the following components: a signal sequence preceding a conserved region of about 50 residues; a variable-length region; and the sequence of the neuropeptide itself. Sequence analysis shows that the conserved N-terminal region of the precursor contains 6 cysteines, which probably participate in disulfide bond formation. It is speculated that this region may be important for neuropeptide processing. There are 50 known family members. Topology is 1-4 2-5 3-6. No structure is available. See Figure 89.
1)CxxxCxxCxxxxxxxxxxxxxxxCxxxCxxxxxxxxxxxxC
2)CxxxCxxCxxxxxxxxxxxxxxxCxlxCxxxxxxxxxWxxC
PF08037: mollusc pheromone
This family consists of the attractin family of water-borne pheromones. Mate attraction in Aplysia involves a long-distance water-borne signal in the form of the protein pheromone attractin, which is released during egg laying. These peptides contain 6 conserved cysteines and fold into 2 antiparallel helices. The second helix contains the IEECKTS sequence conserved in Aplysia attractins. There are 5 known family members. Topology is 1-6 2-5 3-4. See Figure 90.
1)CxxxxxxxxCxxxxxxCxxxxxCxxxxxxCxxxxxxxC
2)CdxxxxxsxCqmxxxxCxxaxxCxxxieeCktsxxexC
PF03913: Amb V protein
Amb V is a protein from species of Ambrosia (ragweed). Amb V has been shown to contain a C-terminal helix as the major T-cell epitope. Free sulfhydryl groups also play a major role in T-cell recognition of cross-reactive T-cell epitopes in these related allergens.
There are 3 known family members. Topology is 1-7 2-5 3-6 4-8. See Figure 92.
1)CxxxxxxCCxxxxxxC(x)xxxxCxxxxxxCxxxC
2)CgxxxxyCCxxxgxyC(x)xxxxCyxxxxxCxxxC
Appendix B: HDD domains containing repeat motifs
PF01437: plexin PSI
A cysteine-rich repeat found in several different extracellular receptors (J. Stamos et al. (2004) Embo J, 23:2325-35; J.P. Xiong et al. (2004) J Biol Chem, 279:40252-4). The function of the repeat is unknown. Three copies of the repeat are found in plexin. Two copies are found in mahogany protein. A related C. elegans protein contains four copies. The Met receptor contains a single copy. The Pfam alignment shows 6 highly conserved cysteine residues that may form three conserved disulfide bonds; two additional cysteines are observed at positions 5 and 7 and may form a further disulfide bond. Topology is 1-4_2-8_3-6_5-7 (structure 1shy). Semaphorins (structure 1olz) contain only three disulfide bonds, with topology 1-4_2-6_3-5. See Figure 93.
1) CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxxx)xxxxxxC
2) CxxxxxCxxCxxxxxx(x)xCxWCxxxxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxxx)xxxxxxC
The loop between Cys7 and Cys8 is highly tolerant of insertions. For example, in the integrin β subunit structure the hybrid domain is inserted between these cysteines (J.P. Xiong et al. (2004) J Biol Chem, 279:40252-4), yet Cys8 still forms a disulfide bond with Cys2. This can be exploited to insert an arbitrary sequence after Cys7.
Design: CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxxxxxx(xxxxx)("arbitrary sequence")C
This can be used to generate multi-plexins:
First insertion:
CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxxxxxx(xxxxx)("PLEX")C, where PLEX corresponds to
CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxxx)xxxxxxC.
Second insertion:
CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxxxxxx(xxxxx)("PLEX"("PLEX"))C, where ("PLEX"("PLEX")) corresponds to
CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxxx)xxxxxxC inserted after Cys7 of "PLEX".
Further insertions are made in the same way within the most recently inserted plexin sequence, after Cys7 of CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxxxxxx(xxxxx)("PLEX")C.
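The nested insertion scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PLEX`, `PREFIX`, `SUFFIX` and `multi_plexin` are hypothetical, and the insertion is modeled as plain string concatenation of the spacing motifs given above (with the spaces removed), not as a statement about how the constructs are actually assembled.

```python
# PSI motif as given in the text (x = any residue, parentheses = optional residues).
PLEX = "CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxx(xxxxxxxxxx)xxxxxxC"

# Design template from the text: everything up to the insertion point after Cys7 ...
PREFIX = "CxxxxxCxxCxxxxxx(x)xCxxCxxxxxCxxxx(xxxxxx)xCxxxxxxxx(xxxxx)"
# ... followed by the closing Cys8.
SUFFIX = "C"


def multi_plexin(insertions: int) -> str:
    """Return the motif after the given number of nested insertions.

    Hypothetical helper: 0 returns the plain motif, 1 the "first insertion",
    2 the "second insertion" ("PLEX" carrying another "PLEX" after its Cys7).
    """
    if insertions < 0:
        raise ValueError("insertions must be non-negative")
    motif = PLEX
    for _ in range(insertions):
        # Wrap the current construct in one more template, after Cys7.
        motif = PREFIX + motif + SUFFIX
    return motif


if __name__ == "__main__":
    for n in range(3):
        m = multi_plexin(n)
        print(n, m.count("C"), "cysteines")
```

Under this reading, each round of insertion adds eight cysteines (seven from the template before the insertion point plus the closing Cys8).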
PF00088: trefoil and large trefoil
A cysteine-rich module of about 45 amino acid residues found in some extracellular eukaryotic proteins (M.D. Carr et al. (1994) Proc Natl Acad Sci U S A, 91:2206-10; T. Yamazaki et al. (2003) Eur J Biochem, 270:1269-76). Human TFF3 can be expressed at high levels in the E. coli periplasm (15 mg/l culture). The module shows a high disulfide density, with 3 disulfide bonds per 45 amino acids and topology 1-5 2-4 3-6. The large trefoil consists of two adjacent modules connected by one additional disulfide bond, with connectivity 1-14 2-6 3-5 4-7 8-12 9-11 10-13. The spacing between consecutive cysteines is less than 10, so the module can be used for library design. The cysteine positions are highly conserved among the different members of this family. See Figures 94-95.
1)C(x)xxxxxxxxxCxx(x)xxxxxxxCxxxxCCxxxxx(x)xxxxxCx
2)C(x)xxxxxxRxxCxx(x)xxxxxxxCxxxxCCfxxxx(x)xxxxwCf
3)C(x)xxxxxxRxxCgx(x)xxitxxxCxxxgCC[fwy]dxxx(x)xxxxwC[fy]
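As an illustration of how spacing motifs of this kind can be checked against a candidate sequence, the following Python sketch converts a motif into a regular expression. The conventions are assumptions, since the text does not define them: `x` is any residue, a parenthesized stretch is optional (zero up to the number of residues shown), upper-case letters are strictly conserved, and lower-case letters and bracketed alternatives are only weakly conserved and therefore accept any residue. The function names are hypothetical.

```python
import re

# Motifs 1) and 3) copied from the list above.
TREFOIL_1 = "C(x)xxxxxxxxxCxx(x)xxxxxxxCxxxxCCxxxxx(x)xxxxxCx"
TREFOIL_3 = "C(x)xxxxxxRxxCgx(x)xxitxxxCxxxgCC[fwy]dxxx(x)xxxxwC[fy]"


def motif_to_regex(motif: str) -> str:
    """Translate a spacing motif into a regular expression (assumed conventions)."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            # "(xxx)" is read as an optional stretch of zero to three residues.
            end = motif.index(")", i)
            parts.append(".{0,%d}" % (end - i - 1))
            i = end + 1
        elif ch == "[":
            # Bracketed alternatives are a weak preference: accept any residue.
            i = motif.index("]", i) + 1
            parts.append(".")
        elif ch.isupper():
            # Upper-case letters are treated as strictly conserved residues.
            parts.append(ch)
            i += 1
        else:
            # "x" and lower-case (weakly conserved) letters accept any residue.
            parts.append(".")
            i += 1
    return "".join(parts)


def matches(motif: str, sequence: str) -> bool:
    """True if the whole sequence fits the motif."""
    return re.fullmatch(motif_to_regex(motif), sequence) is not None


if __name__ == "__main__":
    print(motif_to_regex(TREFOIL_1))
    print(motif_to_regex(TREFOIL_3))
```

Treating weakly conserved positions as wildcards is the most permissive choice; a stricter variant could turn lower-case letters into character classes instead.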
Sequence logo of a large trefoil variant with two adjacent modules and the additional 1-14 disulfide bond:
CxC(x)xxxxxxxxxCxx(x)xxxxxxxCxxxxCCxxxxx(x)xxxxxCxxxxxxxxxxxC(x)xxxxxxxxxCxx(x)xxxxxxxCxxxxCCxxxxx(x)xxxxxCxxxxxxxxC and derivatives.
Figure 134 shows several "poly-trefoil" structures that may be generated from the trefoil motif.
PF00090: thrombospondin 1
This module is present in thrombospondin, where it is repeated three times, and in a large number of proteins involved in the complement pathway as well as in extracellular matrix proteins. It has been shown to be involved in cell-cell interaction, inhibition of angiogenesis, and apoptosis (P. Bork (1993) FEBS Lett, 327:125-30). See Figure 96.
The domain shows a high disulfide density, with 3 disulfide bonds per approximately 50 amino acids and topology 1-5_2-6_3-4 (T.M. Misenheimer et al. (2005) J Biol Chem). The spacing between consecutive cysteines is less than 10, so the domain can be used for library design. The cysteine positions are conserved among the different members of this family.
CxxxCxxxxxxxxxxcxxxx(xxx)xxxxxCxxxxxx(xxx)xxxC(x)xxxxC
CxxxCxxGxxxRxxxcxxxx(Pxxx)xxxxxCxxxxxx(xxx)xxxC(x)xxxxC
CsvtCgxGxxxRxrxcxxxx(Pxxx)xxxxxCxxxxxx(xxx)xxxC(x)xxxxC
PF00228:Bowman Birk inhibitor
The Bowman-Birk inhibitor family is one of the numerous families of serine proteinase inhibitors. Its members have a repeated structure and usually have two distinct inhibitory sites. These inhibitors are found mainly in plants, particularly in the seeds of legumes and cereal grains (R.F. Qi et al. (2005) Acta Biochim Biophys Sin (Shanghai), 37:283-92).
There are two distinct classes: domains with 14 cysteines, with topology 1-14 2-6 3-13 4-5 7-9 8-12 10-11, and domains with 10 cysteines, with topology 1-10 2-5 3-4 6-8 7-9. Because of these subfamilies, the Cys positions do not appear to be particularly conserved in the sequence logo, although they are conserved within each subfamily.
The domain shows a high disulfide density, with 5 or 7 disulfide bonds per approximately 50 amino acids. The spacing between consecutive cysteines is less than 10, so the domain can be used for library design. The cysteine positions are highly conserved among the different members of this family. See Figures 97-98.
PF00184: neurohypophysial hormones, C-terminal domain
The nonapeptide hormones vasopressin and oxytocin are found at high concentrations in neurosecretory granules, complexed in a 1:1 ratio with disulfide-rich proteins called neurophysins (NP). Two closely related classes of NP have been identified, one complexed with vasopressin and the other with oxytocin [L.Q. Chen et al. (1991) Proc Natl Acad Sci U S A, 88:4240-4]. This family has 75 members, and the cysteine positions are highly conserved. The cysteine-rich module is repeated in the sequence logo. See Figure 99.
Both modules have a homologous disulfide topology. One disulfide bond links the two modules via Cys1 and Cys8. If this disulfide bond is omitted, the disulfide topology of each module is 1-3, 2-6, 4-5. See Figure 100.
The crystal structure of neurophysin reveals that a monomer consists of two homologous layers, each with four antiparallel beta strands. The two regions are connected by a helix followed by a long loop. Monomer-monomer contacts involve antiparallel beta-sheet interactions, which form a dimer with two layers of 8 beta strands.
PF00200: extensible, dimerizing disintegrins
Disintegrins are peptides of about 50-80 amino acid residues that contain many cysteines, all of which participate in disulfide bonds. Disintegrins contain an Arg-Gly-Asp (RGD) sequence, a recognition site of many adhesion proteins. The RGD sequence of disintegrins is postulated to interact with the glycoprotein IIb-IIIa complex.
Disintegrins are classified according to length and cysteine content (J.J. Calvete et al. (2005) Toxicon, 45:1063-74).
Short: CxxxxCCxxCxxxxxxxxCxxxxxxxxx(xx)CxxxxCxC, with 4 SS; the disulfide topology is 1-4 2-6 3-7 5-8.
Medium:
xCxxxxxxCCxxxxCxxxx(x)xxxCx(xxx)xxxCCxxCxxxxxxxxCxxxxxxxxxxxCxxxxxxxC
with 6 SS; the disulfide topology is 1-5, 2-4, 3-8, 6-8, 7-11, 10-12.
Long:
xxxxxxxxxxCxCxxxxCxxxCCxxxxCxxxx(x)xxxCx(xxx)xxxCCxxCxxxxxxxxCxxxxxxxxxxxCxxxxxxxC, with 7 SS; the disulfide topology is 1-4, 2-7, 3-6, 5-11, 8-10, 9-13, 12-14.
Dimeric:
CCxxxxCxxxx(x)xxxCx(xxx)xxxCCxxCxxxxxxxxCxxxxxxxxxxxCxxxxxxxC, with intramolecular SS of topology 1-7, 4-6, 5-10, 8-10, plus two intermolecular SS involving Cys2 and Cys3 that produce the dimeric disintegrin. See Figures 101 and 157. An evolutionary relationship has been found between these different groups, characterized by the loss or gain of disulfide bonds. This motif can therefore be extended in the course of evolution in vivo.
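Disulfide topologies written as lists of cysteine pairs can be checked mechanically for internal consistency. The short Python sketch below parses a topology string and reports cysteines that appear in more than one pair or in none; the function names and the returned dictionary layout are assumptions made for illustration only.

```python
import re
from collections import Counter


def parse_topology(topology: str) -> list[tuple[int, int]]:
    """Parse a string such as '1-4 2-6 3-7 5-8' or '1-5,2-4,3-8' into pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]


def check_topology(topology: str) -> dict:
    """Report cysteines used more than once and cysteines left unpaired.

    Cysteines are assumed to be numbered 1..N, where N is the highest
    index that occurs in the topology string.
    """
    pairs = parse_topology(topology)
    counts = Counter(c for pair in pairs for c in pair)
    highest = max(counts, default=0)
    return {
        "pairs": pairs,
        "reused": sorted(c for c, n in counts.items() if n > 1),
        "unpaired": [c for c in range(1, highest + 1) if c not in counts],
    }


if __name__ == "__main__":
    for name, topo in [
        ("short", "1-4 2-6 3-7 5-8"),
        ("medium", "1-5,2-4,3-8,6-8,7-11,10-12"),
        ("long", "1-4,2-7,3-6,5-11,8-10,9-13,12-14"),
    ]:
        print(name, check_topology(topo))
```

Applied to the topologies as printed above, the short and long patterns use every cysteine exactly once, whereas the medium pattern uses Cys8 twice and leaves Cys9 unpaired, which may indicate a transcription error in one of its pairs.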
Appendix C: scaffolds with highly repetitive motifs
Cysteine-rich repeat proteins (CRRPs)
PF00396: granulin
Granulins are a family of cysteine-rich peptides of about 6 kDa that may have multiple biological activities (A. Bateman et al. (1998) J Endocrinol, 158:145-51). A precursor protein (called acrogranin; sequence below) potentially encodes seven different forms of granulin (grnA to grnG), which are probably released by post-translational proteolytic processing. Granulins are evolutionarily related to PMP-D1, a peptide extracted from the pars intercerebralis of the migratory locust (Locusta migratoria). See Figure 103. Granulin spacing: CxxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCxx
DBP: 1-3_2-5_4-7_6-9_8-11_10-12
Design for size extension (capping motifs are underlined; one repeat is shown in italics and one repeat in bold):
3C6C5 C6C2
Introduce the design of kink: 3C6C5CC a4G3CC bP5CC c2G2CC dP4 C6C2
Either the natural 8-6-5-5 pattern or the more general 5-5-5-5 pattern can be used. Because the structure contains beta sheet, one approach is to favor amino acids that are good beta-sheet formers and to avoid amino acids that are not. The following amino acids are preferred and can be obtained with mixed codons: valine, isoleucine, phenylalanine, tyrosine, tryptophan and threonine. Figure 125 shows the granulin structure.
Design with random loops of 5 amino acids:
3C6C5 CC5CC5CC5CC5 C6C2
Minimal starting protein with only the two end caps:
C6C5C6C (17 random amino acids)
Growth by adding the minimal unit:
C6C5CC5C6C
Procedure: prepare the library, pan, add a randomized 5CC5 unit, pan, add another 5CC5 unit, and so on.
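The compact notation used in these designs (a number stands for that many randomized positions between cysteines) and the stepwise growth can be made concrete with a small Python sketch. The helper names `expand`, `random_positions` and `grow` are hypothetical, and the way `grow` places each added unit (a further `CC5` block in front of the C-terminal `C6C` cap) is an interpretation inferred from the two designs listed above.

```python
import re


def expand(compact: str, wildcard: str = "x") -> str:
    """Expand compact notation: each number becomes a run of wildcard positions."""
    return re.sub(r"\d+", lambda m: wildcard * int(m.group()), compact)


def random_positions(compact: str) -> int:
    """Count the randomized (non-cysteine) positions in a compact design."""
    return sum(int(n) for n in re.findall(r"\d+", compact))


def grow(units: int) -> str:
    """Minimal end-capped design with the given number of added repeat units.

    Assumption: the N-terminal cap is 'C6C5', the C-terminal cap is 'C6C',
    and every growth step contributes one 'CC5' block between them.
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    return "C6C5" + "CC5" * units + "C6C"


if __name__ == "__main__":
    for n in range(5):
        design = grow(n)
        print(n, design, expand(design), random_positions(design))
```

With no added units this gives the minimal protein with 17 randomized positions, and with four added units it gives the cysteine framework of the 5-amino-acid random loop design, apart from the terminal extensions of 3 and 2 residues.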
PF02420: antifreeze protein
Antifreeze protein is an 8 kDa protein that forms a β-helix structure (M.E. Daley et al. (2002) Biochemistry, 41:5515-25). The N-terminal capping motif is formed by a microprotein domain with topology 1-3 2-5 4-6. Added to this motif are 2C5C3 repeat units with disulfide connectivity 1-2. Threonine is conserved because it participates in ice binding, but it can be ignored in the design. Serine and alanine are conserved because only small side chains fit inside the helix. Notably, there is no hydrophobic core at all. Figure 104 shows some antifreeze-derived repeat proteins. Figure 104 shows some motifs. See Figure 127.
Natural sequence:
QCTGGADCTSCTG ACTGCGNCPNA (VTCTNSQHCVKA) (NTCTGSTDCNTA) (QTCTNSKDCFEA) (NTCTDSTNCYKA) (TACTNSSGCPGH)
The repeats are easier to see when the sequence is displayed as follows:
QCTGGADCTSCTG ACTGCGNCPNA
(VTCTNSQHCVKA)
(NTCTGSTDCNTA)
(QTCTNSKDCFEA)
(NTCTDSTNCYKA)
(TACTNSSGCPGH)
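The regularity of the repeats can be verified by reducing each segment to its cysteine spacing. The following Python sketch does this for the natural sequence above; the function name `spacing` is hypothetical, and joining the two halves of the capping segment into one string is an assumption about the intended sequence.

```python
import re

# Capping segment (the two space-separated halves joined) and the five repeats.
CAP = "QCTGGADCTSCTGACTGCGNCPNA"
REPEATS = [
    "VTCTNSQHCVKA",
    "NTCTGSTDCNTA",
    "QTCTNSKDCFEA",
    "NTCTDSTNCYKA",
    "TACTNSSGCPGH",
]


def spacing(sequence: str) -> str:
    """Replace every run of non-cysteine residues by its length."""
    return re.sub(r"[^C]+", lambda m: str(len(m.group())), sequence)


if __name__ == "__main__":
    print("cap    ", spacing(CAP))
    for repeat in REPEATS:
        print("repeat ", repeat, spacing(repeat))
```

Every repeat reduces to the 2C5C3 unit, and the capping segment reduces to 1C5C2C3C2C2C3.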
Different designs (the capping domain is underlined; repeat sequences are in italics):
1) 1C5C2C3C2C2C3 [remainder of design in figure]
2) 1C5C2C3C2C2C3 [remainder of design in figure]
3) QCTGGA [remainder of design in figure]
4) QCTGGA [remainder of design in figure]
PF00757: furin-like domain
Furin-like cysteine-rich regions are found in a variety of eukaryotic proteins that are involved in signal transduction by receptor tyrosine kinases, a mechanism that involves receptor aggregation. See Figure 105.
A subset of the sequence logo folds into a helix-shaped repeat and is used as a scaffold for library design: CxxxCxxxCxxxxxxCCxxxCxxxCxxxxxxxC. The topology of this motif is 1-3_2-4_5-7_6-8. Members of this family show high conservation of cysteine positions and spacing. The repeat can be extended by adding (CxxxCxxxCxxxxxxxC)n to the C-terminus of the above motif.
PF03128:CxCxCx
This repeat contains the conserved pattern CXCXC, where X can be any amino acid. Up to 5 copies of the repeat are found in vascular endothelial growth factor C. In the salivary glands of the dipteran Chironomus tentans, a specific messenger ribonucleoprotein (mRNP) particle, the Balbiani ring (BR) granule, can be visualized during its assembly on the gene and during its nucleocytoplasmic transport. The repeat is found in more than 70 copies in Balbiani ring protein 3 (see below). It is also found in some silk proteins.
The CXCXC repeat does not form internal disulfide bonds, because such a loop would span only three amino acids, and none of the microproteins in the database has a cysteine span of 3. As shown in Figure 109, the cysteines of the CxCxCx motif participate in multiple disulfide bonds that connect different copies of the repeat. Typically one cysteine is found between CxCxCx repeats (it is conserved in the sequence logo, although its position can vary). See Figures 106, 107 and 108.
Actual: C10 C1C1C8C10 C1C1C8C10 C1C1C3C10 C1C1C6C11C
Excerpt, with start and end: C1C8C10 C1C1C8C10 C1C1C8C10 C1
A structural model of the disulfide bonding is shown in Figure 109.
PF05444:DUF753
A repeated sequence in a domain of several Drosophila proteins of unknown function.
See Figure 110.
PF01508: Paramecium
A surface antigen containing 37 copies of the above repeat. A structural role has been proposed. Secondary structure prediction suggests that no α-helix is present but that β-sheet structure is (it is not known how the presence of disulfide bonds might interfere with such predictions). See Figures 111-112.
PF00526: Dicty
Several species of Dictyostelium have proteins that contain a conserved repeat. These proteins have been variously described as "extracellular matrix protein B", "cyclic nucleotide phosphodiesterase inhibitor precursor", "prestalk protein precursor", "putative calmodulin-binding protein CamBP64" and "cysteine-rich acidic integral membrane protein precursor", as well as "hypothetical protein". See Figure 113.
PF03860: DUF326
This family is a small cysteine-rich repeat. The cysteines mostly follow the pattern CxxCxxxCxxCxxxCxxC, although they often also appear at other positions in the repeat. See Figure 114.
PF02363: cysteine-rich repeat
In this family, the cysteine repeat CxxxCxxxCxxxC occurs 34 times in the family member O17970_CAEEL. The function of these repeats is unknown, as is the function of the proteins in which they occur. Most sequences in this family are from Caenorhabditis elegans.
See Figures 115-116.
References:
Artavanis-Tsakonas, S et al. (1995) Science 268:225-232.
Aster, JC et al. (1999) Biochemistry 38:4736.
Bensch, KW et al. (1995) FEBS Lett 368:331-335.
Bork, P (1993) FEBS Lett 327:125-30.
Carr, MD et al. (1994) PNAS 91:2206-2210.
Chirino AJ, Ary ML, Marshall SA (2004) Minimizing the immunogenicity of protein therapeutics. Drug Discovery Today 9:82-90.
Chong, JM et al. (2001) J. Biol. Chem. 277:5134-5144.
Chong, JM and Speicher, DW (2001) J. Biol. Chem. 276:5804-5813.
Conticello SG, Gilad Y, Avidan N, Ben-Asher E, Levy Z, Fainzilber M (2001) Mechanisms for evolving hypervariability: the case of conopeptides. Mol Biol Evol 18:120-31.
Cornet, B et al. (1995) Structure 3:435-448.
De, A et al. (1994) PNAS 91:1084-1088.
Dufton, MJ (1984) J. Mol. Evol. 20:128-134.
Fajloun, Z et al. (2000) J. Biol. Chem. 275:39394-402.
Fitzgerald, K et al. (1995) Development 121:4275-82.
Gray, WR et al. (1988) Annu Rev Biochem 57:665-700.
Guncar, G et al. (1999) EMBO J 18:793-803.
Hermeling S, Crommelin DJ, Schellekens H, Jiskoot W (2004) Structure-immunogenicity relationships of therapeutic proteins. Pharm Res 21:897-903.
Higgins, JM et al. (1995) J. Immunol. 155:5777-85.
Hoffman, W et al. (1993) Trends Biochem Sci 18:239-243.
Hugli, TE (1990) Curr Topics Microbiol Immunol 153:181-208.
Jonassen, I et al. (1995) Protein Sci 4:1587-1595.
Kamikubo, Y et al. (2004)
Kim, JI et al. (1995) J. Mol. Biol. 250:659-671.
Kimble, J et al. (1997) Annu Rev Cell Dev Biol 13:333-361.
Koduri, V and Blacklow, SC (2001) 40:12801.
Lauber, T et al. (2003) J. Mol. Biol. 328:205-219.
Léonetti et al. (1998) J. Immunol. 160:3820-3827.
Léonetti M, Thai R, Cotton J, Leroy S, Drevet P, Ducancel F, Boulain JC, Ménez A (1998) Increasing immunogenicity of antigens fused to Ig-binding proteins by cell surface targeting. J. Immunol. 160:3820-3827.
Leung-Hagesteijn, C et al. (1992) Cell 71:289-99.
Liu, L et al. (1997) Genomics 43:316-320.
Maillère B, Mourier G, Hervé M, Cotton J, Leroy S, Ménez A (1995) Immunogenicity of a disulphide-containing neurotoxin: presentation to T-cells requires a reduction step. Toxicon 4:475-482.
Maillère, B et al., unpublished data.
Maillère, B, Cotton, J, Mourier, G, Léonetti, M, Leroy, S and Ménez, A (1993) Role of thiols in the presentation of a snake toxin to murine T cells. J. Immunol. 150:5270-5280.
Martin L, Stricher F, Misse D, Sironi F, Pugniere M, Barthe P, Prado-Gotor R, Freulon I, Magne X, Roumestand C, Ménez A, Lusso P, Veas F, Vita C (2003) Rational design of a CD4 mimic that inhibits HIV-1 entry and exposes cryptic neutralization epitopes. Nat Biotechnol 21:71-6.
Ménez, A (1991) Immunology of snake toxins, p. 35-90. In: Snake Toxins. AL Harvey (Ed), Pergamon Press, Inc., New York.
Miljanich, GP (2004) Ziconotide: neuronal calcium channel blocker for treating severe chronic pain. Curr. Med. Chem. 23, 3029.
Misenheimer, TM et al. (2001) J. Biol. Chem. 276:45882.
Molina, F et al. (1996) Eur. J. Biochem. 240:125-133.
Mourier et al. (1995) Toxicon 4:475-482.
Nielsen, KJ et al. (2002) J. Biol. Chem. 277:27247-27255.
Pallaghy, PK et al. (1993) J. Mol. Biol. 234:405-420.
Pallaghy, P et al. (1994) Protein Sci 3:1833.
Pan, TC et al. (1993) J. Cell. Biol. 123:1269-1277.
Patten, PA and Schellekens, H (2003) The immunogenicity of biopharmaceuticals. In: Immunogenicity of Therapeutic Biological Products. Brown, F and Mire-Sluis, AR (eds). Dev. Biol. Basel, Karger, 112:81-97.
Pereira, CM, Guth, BEC, Sbrogio-Almeida, ME and Castilho, BA (2001) Microbiology 147:861-867.
Petersen, SV et al. (2003) Proc. Natl. Acad. Sci. USA 100:13875-80.
Rebay, I et al. (1991) Cell 67:687-699.
Roszmusz, E et al. (2002) BBRC 296:156.
Sands, BE and Podolsky, DK (1996) Annu. Rev. Physiol. 58:253-273.
Schultz-Cherry, S et al. (1995) J. Biol. Chem. 270:7304-7310.
Schultz-Cherry, S et al. (1994) J. Biol. Chem. 269:26783-8.
Schulz, A et al. (2005) Biopolymers 80:34-49.
Singh H, Raghava GP (2001) ProPred: prediction of HLA-DR binding sites. Bioinformatics 17:1236-7.
Skinner, WS et al. (1989) J. Biol. Chem. 264:2150-2155.
So, T, Ito, H, Hirata, M, Ueda, T and Imoto, T (2001) Contribution of conformational stability of hen lysozyme to induction of type 2 T-helper immune responses. Immunology 104:259-268.
Sturniolo, T et al. (1999) Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nature Biotechnol 17:555.
Tam, JP and Lu, YA (1998) Protein Sci. 7:1583.
Tax, FE et al. (1994) Nature 368:150-154.
Thai R, Moine G, Desmadril M, Servent D, Tartide JL, Ménez A, Léonetti M (2004) Antigen stability controls antigen presentation. J. Biol. Chem. 279:50257-50266.
Van den Hooven, HW et al. (2001) Biochemistry 40:3458-3466.
van Vlijmen HW, Gupta A, Narasimhan S, Singh J (2004) A novel database of disulfide patterns and its application to the discovery of distantly related homologs. J Mol Biol 335:1083-92.
Vardar, D et al. (2003) Biochemistry 42:7061.
White, CE et al. (1996) PNAS 93:10177.
Xu, Y et al. (2000) Biochemistry 39:13669-13675.
Zaffarella, GC et al. (1988) Biochemistry 27:7102-7105.
Zhu, S et al. (1999) FEBS Lett 457:509-514.
Zuiderweg, ER et al. (1989) Biochemistry 28:172-85.

Claims (44)

1. a non-natural exists contains cysteine (C) protein, and it contains to have and is no more than 35 amino acid whose polypeptide, wherein
At least 10% aminoacid is cysteine in this polypeptide,
Pairing by cysteine in the support forms at least two disulfide bond, and wherein said pairing produces the complexity index method greater than 3.
2. a non-natural exists contains cysteine (C) protein, and it contains to have and is no more than about 60 amino acid whose polypeptide, wherein
At least 10% aminoacid is cysteine in this polypeptide,
Pairing by cysteine contained in this polypeptide forms at least four disulfide bond, and wherein
Described pairing produces the complexity index method greater than 4.
3. claim 1 or 2 non-natural exist contains cysteine (C) protein, and wherein complexity index method is greater than 6.
4. claim 1 or 2 non-natural exist contains cysteine (C) protein, and wherein complexity index method is greater than 10.
5. claim 1 or 2 non-natural exist contains cysteine (C) protein, and it combines with the target molecule specificity.
6. claim 1 or 2 non-natural exist contains cysteine (C) protein, and it keeps the target binding ability after being heated to above about 50 ℃ temperature.
7. claim 1 or 2 non-natural exist contains cysteine (C) protein, and it keeps the target binding ability after being heated to above about 80 ℃ temperature.
8. claim 1 or 2 non-natural exist contains cysteine (C) protein, it be heated to above about 100 ℃ temperature and continue to surpass 0.1 second after keep the target binding ability.
9. claim 1 or 2 non-natural exist contains cysteine (C) protein, it and the part coupling that is selected from label, effector, antibody and half-life prolongation.
10. claim 1 or 2 non-natural exist contains cysteine (C) protein, and it is a kind of monomer.
Contain cysteine (C) protein 11. the non-natural of claim 1 or 2 exists, it is a kind of polymer.
Contain cysteine (C) protein 12. the non-natural of claim 1 or 2 exists, wherein this protein comprises one type support.
Contain cysteine (C) protein 13. the non-natural of claim 1 or 2 exists, wherein this protein comprises the support of more than one types.
Contain cysteine (C) protein 14. the non-natural of claim 1 or 2 exists, wherein this protein comprises target binding site and half-life prolongation.
Contain cysteine (C) protein 15. the non-natural of claim 1 or 2 exists, wherein this protein comprises and the bonded repetitive of target.
16. the non-natural of claim 1 or 2 exists contains cysteine (C) protein, wherein this protein comprise be selected from serum albumin, IgG, erythrocyte and serum can and proteinic half-life prolongation.
17. what show that non-natural at a kind of binding specificity of target exists contains cysteine (C) protein, this target is different from the corresponding naturally occurring natural target that contains cysteine (C) protein or support.
18. a non-natural protein, it contains 20-60 amino acid whose single structure territory, and it has 3 or more disulfide bond, and the protein bound that exposes with human serum, and wherein said protein contains and is lower than 5% aliphatic amino acid.
19. protein that non-natural exists, it contains 20-60 amino acid whose single structure territory, it has 3 or more disulfide bond, and with the protein bound that human serum exposes, the score of wherein said protein in the T-Epitope program less than the data base in proteinic meansigma methods 90%.
20. the proteinic library that claim 1,2,18 or 19 non-natural exist.
21. show the gene package body in the library of claim 20.
22. one kind is detected and whether has the interactional method of specificity between the allogenic polypeptide of showing on target and the gene package body, this method comprises:
(a) provide the gene package body in the library of showing claim 20;
(b) under the condition that is fit to the stable polypeptide-target complex of generation, make gene package body contact target; With
(c) formation of stable polypeptide-target complex on the detection gene package body, the interactional existence of detection specificity thus.
23. the method for claim 22 further comprises the step of separating the gene package body of showing the polypeptide with required character.
24. a pharmaceutical composition, its non-natural that contains claim 1 or 2 exists contains cysteine (C) protein and pharmaceutically acceptable carrier.
Contain cysteine (C) support 25. a non-natural exists, the binding specificity that it shows at target molecule comprises and has two according to being selected from C1 -2,3-4, C 1-3,2-4And C 1-4,2-3The polypeptide of the disulfide bond that forms by cysteine pairing in the support of pattern, wherein two numerals that connect with hyphen form disulfide bond from which two cysteine pairing of the terminal counting of polypeptide N-.
Contain cysteine (C) support 26. a non-natural exists, the binding specificity that it shows at target molecule comprises and has three according to being selected from C 1-2,3-4,5-6, C 1-2,3-5,4-6, C 1-2,3-6,4-5, C 1-3,2-4,5-6, C L-3,2-5, 4-6, C 1-3,2-6,4-5, C 1-4,2-3,5-6, C 1-4,2-6,3-5, C 1-5,2-3,4-6, C 1-5,2-4,3-6, C 1-5,2-6,3-4, C 1-6,2-3,4-5And C 1-6,2-5,3-4The polypeptide of the disulfide bond that forms by cysteine pairing in the support of pattern, wherein two numerals that connect with hyphen form disulfide bond from which two cysteine pairing of the terminal counting of polypeptide N-.
27. A non-naturally occurring cysteine (C)-containing scaffold that exhibits binding specificity toward a target molecule, comprising a polypeptide having at least four disulfide bonds formed by pairing of intra-scaffold cysteines according to a pattern selected from the following group:
1-2 3-4 5-6 7-8   1-2 3-4 5-7 6-8   1-2 3-4 5-8 6-7   1-2 3-5 4-6 7-8   1-2 3-5 4-7 6-8   1-2 3-5 4-8 6-7
1-2 3-6 4-5 7-8   1-2 3-6 4-7 5-8   1-2 3-6 4-8 5-7   1-2 3-7 4-5 6-8   1-2 3-7 4-6 5-8   1-2 3-7 4-8 5-6
1-2 3-8 4-5 6-7   1-2 3-8 4-6 5-7   1-2 3-8 4-7 5-6   1-3 2-4 5-6 7-8   1-3 2-4 5-7 6-8   1-3 2-4 5-8 6-7
1-3 2-5 4-6 7-8   1-3 2-5 4-7 6-8   1-3 2-5 4-8 6-7   1-3 2-6 4-5 7-8   1-3 2-6 4-7 5-8   1-3 2-6 4-8 5-7
1-3 2-7 4-5 6-8   1-3 2-7 4-6 5-8   1-3 2-7 4-8 5-6   1-3 2-8 4-5 6-7   1-3 2-8 4-6 5-7   1-3 2-8 4-7 5-6
1-4 2-3 5-6 7-8   1-4 2-3 5-7 6-8   1-4 2-3 5-8 6-7   1-4 2-5 3-6 7-8   1-4 2-5 3-7 6-8   1-4 2-5 3-8 6-7
1-4 2-6 3-5 7-8   1-4 2-6 3-7 5-8   1-4 2-6 3-8 5-7   1-4 2-7 3-5 6-8   1-4 2-7 3-6 5-8   1-4 2-7 3-8 5-6
1-4 2-8 3-5 6-7   1-4 2-8 3-6 5-7   1-4 2-8 3-7 5-6   1-5 2-3 4-6 7-8   1-5 2-3 4-7 6-8   1-5 2-3 4-8 6-7
1-5 2-4 3-6 7-8   1-5 2-4 3-7 6-8   1-5 2-4 3-8 6-7   1-5 2-6 3-4 7-8   1-5 2-6 3-7 4-8   1-5 2-6 3-8 4-7
1-5 2-7 3-4 6-8   1-5 2-7 3-6 4-8   1-5 2-7 3-8 4-6   1-5 2-8 3-4 6-7   1-5 2-8 3-6 4-7   1-5 2-8 3-7 4-6
1-6 2-3 4-5 7-8   1-6 2-3 4-7 5-8   1-6 2-3 4-8 5-7   1-6 2-4 3-5 7-8   1-6 2-4 3-7 5-8   1-6 2-4 3-8 5-7
1-6 2-5 3-4 7-8   1-6 2-5 3-7 4-8   1-6 2-5 3-8 4-7   1-6 2-7 3-4 5-8   1-6 2-7 3-5 4-8   1-6 2-7 3-8 4-5
1-6 2-8 3-4 5-7   1-6 2-8 3-5 4-7   1-6 2-8 3-7 4-5   1-7 2-3 4-5 6-8   1-7 2-3 4-6 5-8   1-7 2-3 4-8 5-6
1-7 2-4 3-5 6-8   1-7 2-4 3-6 5-8   1-7 2-4 3-8 5-6   1-7 2-5 3-4 6-8   1-7 2-5 3-6 4-8   1-7 2-5 3-8 4-6
1-7 2-6 3-4 5-8   1-7 2-6 3-5 4-8   1-7 2-6 3-8 4-5   1-7 2-8 3-4 5-6   1-7 2-8 3-5 4-6   1-7 2-8 3-6 4-5
1-8 2-3 4-5 6-7   1-8 2-3 4-6 5-7   1-8 2-3 4-7 5-6   1-8 2-4 3-5 6-7   1-8 2-4 3-6 5-7   1-8 2-4 3-7 5-6
1-8 2-5 3-4 6-7   1-8 2-5 3-6 4-7   1-8 2-5 3-7 4-6   1-8 2-6 3-4 5-7   1-8 2-6 3-5 4-7   1-8 2-6 3-7 4-5
1-8 2-7 3-4 5-6   1-8 2-7 3-5 4-6   1-8 2-7 3-6 4-5
wherein the two numbers joined by a hyphen indicate which two cysteines, counted from the N-terminus of the polypeptide, pair to form a disulfide bond.
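The pairing notation used in claims 25 to 27 can be illustrated with a short enumeration. The following is a minimal sketch, not part of the claims: it assumes that a pattern is any complete pairing of the cysteines numbered from the N-terminus, with each cysteine used in exactly one disulfide bond. The function names `disulfide_patterns` and `format_pattern` are hypothetical and chosen only for this illustration.

```python
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]


def disulfide_patterns(cysteines: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to pair all listed cysteines into disulfide bonds.

    Cysteines are numbered from the N-terminus. Each yielded pattern uses
    every cysteine exactly once (a perfect matching).
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # The lowest-numbered free cysteine must bond with one of the others.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in disulfide_patterns(remaining):
            yield [(first, partner)] + tail


def format_pattern(pattern: List[Pair]) -> str:
    """Render a pattern in the hyphenated notation, e.g. '1-2 3-4'."""
    return " ".join(f"{a}-{b}" for a, b in pattern)


# Eight cysteines, four disulfide bonds.
patterns_8 = [format_pattern(p) for p in disulfide_patterns(list(range(1, 9)))]

if __name__ == "__main__":
    for n_cys in (4, 6, 8):
        count = sum(1 for _ in disulfide_patterns(list(range(1, n_cys + 1))))
        print(n_cys, "cysteines:", count, "complete pairing patterns")
```

Under this assumption the number of complete pairings of 2n cysteines is (2n-1)!!, which gives 3 patterns for four cysteines, 15 for six, and 105 for eight. A check such as `"1-4 2-8 3-6 5-7" in patterns_8` can be used to confirm that a given entry is a valid pairing.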
28. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which retains target-binding ability after being heated to a temperature above about 50 °C.
29. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which retains target-binding ability after being heated to a temperature above about 80 °C.
30. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which retains target-binding ability after being heated to a temperature above about 100 °C for more than 0.1 second.
31. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which is conjugated to a moiety selected from a label, an effector and an antibody.
32. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which is a monomer.
33. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which comprises a half-life extending moiety.
34. The non-naturally occurring cysteine (C)-containing scaffold of claim 33, wherein said half-life extending moiety is selected from serum albumin, IgG, erythrocytes and serum-accessible proteins.
35. The non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27, which exhibits binding specificity toward a target that differs from the native target of the corresponding naturally occurring cysteine (C)-containing protein or scaffold.
36. A library of the non-naturally occurring cysteine (C)-containing scaffolds of claim 25, 26 or 27.
37. A genetic package displaying the library of claim 36.
38. A method of detecting the presence of a specific interaction between a target and a heterologous polypeptide displayed on a genetic package, the method comprising:
(a) providing the genetic package of claim 37;
(b) contacting the genetic package with the target under conditions suitable for producing a stable polypeptide-target complex; and
(c) detecting the formation of the stable polypeptide-target complex on the genetic package, thereby detecting the presence of the specific interaction.
39. The method of claim 38, further comprising the step of isolating the genetic package that displays a polypeptide having a desired property.
40. The method of claim 37, wherein said genetic package is a phage.
41. The method of claim 36, wherein said phage is a filamentous phage.
42. A method of producing a non-naturally occurring cysteine (C)-containing scaffold, comprising:
providing a host cell comprising a nucleic acid encoding the non-naturally occurring cysteine (C)-containing scaffold of any one of claims 25-27; and
culturing said host cell in a suitable culture medium under conditions that effect expression of said scaffold from said nucleic acid.
43. The method of claim 38, further comprising the step of recovering said scaffold from said culture medium.
44. A pharmaceutical composition comprising the non-naturally occurring cysteine (C)-containing scaffold of claim 25, 26 or 27 and a pharmaceutically acceptable carrier.
CNA2006800340492A 2005-09-27 2006-09-27 Proteinaceous pharmaceuticals and uses thereof Pending CN101583370A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US72127005P 2005-09-27 2005-09-27
US60/721,188 2005-09-27
US60/721,270 2005-09-27
US60/743,622 2006-03-21

Publications (1)

Publication Number Publication Date
CN101583370A true CN101583370A (en) 2009-11-18

Family

ID=41365153

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006800340492A Pending CN101583370A (en) 2005-09-27 2006-09-27 Proteinaceous pharmaceuticals and uses thereof

Country Status (1)

Country Link
CN (1) CN101583370A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20091118