CN102521527A - Method for predicting space epitope of protein antigen according to antibody species classification - Google Patents

Publication number
CN102521527A
Authority
CN
China
Prior art keywords
amino acid
epitope
antigen
epi
antibody
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011104126791A
Other languages
Chinese (zh)
Other versions
CN102521527B (en)
Inventor
曹志伟
孙静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201110412679.1A priority Critical patent/CN102521527B/en
Publication of CN102521527A publication Critical patent/CN102521527A/en
Application granted granted Critical
Publication of CN102521527B publication Critical patent/CN102521527B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a method for predicting the spatial epitopes of a protein antigen according to antibody species classification. The method comprises the steps of: 1) data collection: collecting antigen-antibody structure data and classifying them by the species source of the antibody (for example, mouse or human) to obtain classified data sets, and collecting descriptive features of protein interaction binding sites; 2) model building: computing the descriptive features on the classified data sets and, according to the differences between species, building spatial epitope prediction models for antigens that bind antibodies of different species sources; and 3) epitope prediction: selecting the prediction model that matches the species source of the antibody to be bound and predicting the potential spatial epitopes of an antigen whose epitope is unknown. The method combines amino acid physicochemical features with local three-dimensional structural features of the protein, is ingeniously designed, and notably improves prediction accuracy, making it suitable for large-scale application.

Description

A method for predicting spatial epitopes of protein antigens according to antibody species classification
Technical field
The present invention relates to the technical field of antigen spatial epitopes, and specifically to a method for predicting the spatial epitopes of a protein antigen according to antibody species classification. The method selects a model according to the species source of the antibody to be bound, and combines various amino acid physicochemical properties with three-dimensional structural information to predict the spatial epitopes of the protein antigen.
Background art
According to relevant statistics, with the rapid development of the global vaccine industry, the international vaccine market now exceeds 20 billion US dollars. Over the next five years the global vaccine market is expected to grow by about 14% per year, and the average annual growth rate of the Chinese vaccine market is expected to exceed 15%. According to a recent forecast published by the market research firm Datamonitor, by 2010 the influenza vaccine market in the seven major global markets alone could exceed 3 billion US dollars. Traditional vaccines are made from attenuated or inactivated pathogenic microorganisms. Their immune effect is good, but they often cause adverse reactions or even related diseases. Vaccines developed from subunits of antigenic substances, that is, protein fragments or local structures (epitopes) that carry antigenicity, can reduce this risk to some extent. Because such vaccines consist only of a few major surface proteins or parts of proteins, they avoid eliciting antibodies against many irrelevant antigens, which reduces both side effects and vaccine-related disease. The immune efficacy and safety of antigen subunit vaccines have been recognized, and such vaccines are widely used at home and abroad.
Locating the site where a protein antigen binds an antibody, that is, the epitope, is important for the design of new-generation vaccine molecules and is in great demand. At present the main approach is to locate protein antigen epitopes from crystal structures of immune complexes, but this takes a long time and is difficult, so it is inconvenient in practice. Determining epitopes computationally can greatly accelerate the related experimental work. Computer-aided epitope prediction mines existing data, finds regularities and builds models that predict potentially antigenic epitopes and exclude positions that clearly cannot bind antibody, and thus directly guides experiments. Current research on epitope prediction, both domestic and international, focuses mostly on linear epitopes, mainly T-cell epitopes. For B-cell spatial epitopes, owing to structural complexity and other factors, prediction accuracy is still low (Xu, X.L., Sun, J., Liu, Q., Wang, X.J., Xu, T.L., Zhu, R.X., Wu, D., Cao, Z.W., Evaluation of spatial epitope computational tools based on experimentally-confirmed dataset for protein antigens. Chinese Science Bulletin, 2010. 55(20): p. 6). Improving the prediction models on the basis of existing data is the key to solving this problem.
The surface of a protein antigen often carries more than one potential epitope. In inducing a host immune response, one epitope or one class of epitopes plays the major role, so that the host response is dominated by that specificity. This phenomenon is called immunodominance, and the epitope that plays the key role is called the dominant epitope. Our earlier research found that which epitope of a given protein antigen shows immunodominance in an immune response depends on information about the antibody, particularly its species source. Existing spatial epitope prediction models mostly start from the antigen and seldom consider the antibody to be bound. It is therefore important to take full account of the bound antibody in order to improve the accuracy of spatial epitope prediction and overcome the limitations of earlier research.
Summary of the invention
The object of the present invention is to overcome the shortcomings of existing spatial epitope prediction techniques by providing a method for predicting the spatial epitopes of a protein antigen according to antibody species classification. The method selects a model according to the species source of the antibody to be bound and fully combines amino acid physicochemical properties with three-dimensional structural information. It is ingeniously designed, significantly improves prediction accuracy, and is suitable for large-scale application.
To achieve this object, the method of the present invention for predicting the spatial epitopes of a protein antigen according to antibody species classification comprises the following steps:
(1) Data collection: collect antigen-antibody structure data and classify them by antibody species source (for example, mouse or human) to obtain classified data sets; collect descriptive features of protein interaction binding sites;
(2) Model building: for the collected classified data sets, compute the above descriptive features and, according to the differences between classes, build spatial epitope prediction models for antigens that bind antibodies of different species sources;
(3) Epitope prediction: select the spatial epitope prediction model according to the species source of the antibody to be bound, and predict the potential spatial epitopes of an antigen whose epitope is unknown.
Preferably, step (1) specifically comprises the following steps:
(11) collect antigen-antibody structure data from the PDB database, obtain the antibody species source from the source record of each structure, and classify the antigen-antibody structure data by antibody species source to obtain classified data sets;
(12) collect descriptive features of protein interaction binding sites from the literature.
More preferably, in step (11), when collecting antigen-antibody structure data from the PDB database, structures whose precision (resolution) value is below a precision threshold are selected first (the example threshold value is given as image BDA0000118790340000021 in the original publication). A solvent accessibility method is then used to calculate the solvent-accessible surface area of the antigen surface amino acids and determine the spatial epitope. Finally, the similarity between the spatial epitopes of different antigens is measured; among several antigens whose spatial epitopes are highly similar (for example, structural similarity of 85% or more), only the structure with the highest precision, that is, the lowest precision value, is kept.
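The filtering and classification in step (11) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Complex` record, the `build_datasets` function, and the use of a Jaccard overlap of residue labels as the epitope similarity measure are all hypothetical, and the resolution threshold is left as a parameter because its value is not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    pdb_id: str          # hypothetical identifier of the complex
    resolution: float    # precision value; lower means higher precision
    species: str         # antibody species taken from the PDB source record
    epitope: frozenset   # labels of the antigen residues forming the epitope


def jaccard(a, b):
    """Overlap of two residue sets; an assumed stand-in for epitope similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_datasets(complexes, max_resolution, similarity_cutoff=0.85):
    """Filter by precision, drop redundant epitopes, then group by antibody species."""
    kept = []
    # Visit the most precise structures first so they win over redundant ones.
    for c in sorted(complexes, key=lambda c: c.resolution):
        if c.resolution >= max_resolution:
            continue  # precision value not below the threshold
        if any(jaccard(c.epitope, k.epitope) >= similarity_cutoff for k in kept):
            continue  # a more precise structure with a similar epitope is already kept
        kept.append(c)
    datasets = {}
    for c in kept:
        datasets.setdefault(c.species, []).append(c)
    return datasets
```

Sorting by precision before the redundancy check is one simple way to guarantee that, within each group of similar epitopes, the retained structure is the most precise one.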
More preferably, in step (12), the descriptive features comprise amino acid physicochemical properties and local three-dimensional structural features.
Preferably, step (2) specifically comprises the following steps:
(21) use the above descriptive features to score all amino acids on the antigen surface, obtaining a series of score values;
(22) compare the score values with the classified data sets and select the descriptive features that have potential predictive power;
(23) build spatial epitope prediction models for antigens that bind antibodies of different species sources.
More preferably, in step (21), the calculation comprises at least one of the following five calculations:
For the classified data sets of antigens binding antibodies of different species, formula (I) is used to calculate the preference with which an amino acid triangle formed by any three residue types (the types may be identical) occurs in epitopes:
$$\mathrm{preference}_{R_i\text{-}R_j\text{-}R_k} = \frac{\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{epitope}} \big/ \sum N_{\mathrm{epitope}}}{\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{non\text{-}epitope}} \big/ \sum N_{\mathrm{non\text{-}epitope}}}, \qquad R_i, R_j, R_k = 1, 2, 3, \ldots, 20 \tag{I}$$
where $R_i$, $R_j$ and $R_k$ denote any three amino acid types; $\mathrm{preference}_{R_i\text{-}R_j\text{-}R_k}$ is the preference with which amino acid triangles of type $R_i$-$R_j$-$R_k$ occur in epitopes; $\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{epitope}}$ is the number of triangles of type $R_i$-$R_j$-$R_k$ that occur in epitopes; $\sum N_{\mathrm{epitope}}$ is the total number of triangles of all types that occur in epitopes; and $\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{non\text{-}epitope}}$ and $\sum N_{\mathrm{non\text{-}epitope}}$ are the numbers of triangles of type $R_i$-$R_j$-$R_k$ and of all types, respectively, that occur in non-epitope regions of the antigen surface. This value measures whether amino acids of type $R_i$-$R_j$-$R_k$ tend to occur in epitope regions or in surface non-epitope regions: because formula (I) is a ratio of frequencies, $\mathrm{preference}_{R_i\text{-}R_j\text{-}R_k} > 1$ indicates that the type is more likely to occur in epitope regions than in surface non-epitope regions, and otherwise it tends to occur in surface non-epitope regions; and
The relative solvent-accessible surface area (relative Accessible Surface Area) is used to measure the accessibility of an amino acid. Specifically, the solvent-accessible surface area calculated for each amino acid is divided by the intrinsic size of that amino acid type, which avoids a bias towards large residues. The intrinsic size is the solvent-accessible surface area of that amino acid type in the tripeptide ALA-X-ALA, where X denotes the amino acid type; and
Sequence conservation software is used to measure the conservation of each amino acid. The result is a score for each amino acid that is linearly related to its conservation in the sequence; and
The clustering coefficient is used to measure the topology of the network formed by the amino acids. The clustering coefficient of a vertex is the ratio of the number of edges that actually connect its neighbouring vertices to the maximum possible number of such edges. For an amino acid, this parameter measures how tightly the amino acids around any amino acid $r_i$ are clustered. It is calculated by formula (II):
$$C_i = \frac{2\,\left|\{e_{jk}\}\right|}{N_i\,(N_i - 1)}; \qquad r_j, r_k \in N_i,\; e_{jk} \in E \tag{II}$$
where $r_j$ and $r_k$ are both neighbours of $r_i$, $\left|\{e_{jk}\}\right|$ is the number of edges that actually exist between such neighbours $r_j$ and $r_k$, and $N_i(N_i - 1)/2$ is the maximum number of edges that could exist among the neighbours of $r_i$. The result ranges from 0 to 1: the closer it is to 1, the tighter the amino acid cluster centred on $r_i$, and the closer to 0, the looser the structure; and
The planarity index is used to measure how flat the spatial region around an amino acid is. For any amino acid $r_i$ on the antigen surface, an amino acid cluster is obtained with $r_i$ as its core; a least-squares plane is fitted to the coordinates of the amino acids in the cluster, and the sum of the distances from all of these amino acids to the least-squares plane is called the planarity index. In theory this index ranges from 0 to positive infinity. For surface amino acids of the same protein, a higher value indicates that the region is less flat.
It should be noted that, although the ranges of the relative solvent-accessible surface area, sequence conservation, clustering coefficient and planarity index are given above, there is no fixed threshold for deciding whether an amino acid is an epitope amino acid. These values are dispersed among the epitope amino acids of a single protein, and the scores of amino acids on different proteins may differ greatly, so no general criterion can be given. The specific threshold must be supplied by the user at prediction time, and those skilled in the art can determine it according to the case at hand.
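Two of the calculations above, the relative solvent-accessible surface area and the clustering coefficient of formula (II), can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the reference surface area of each residue type in ALA-X-ALA must be supplied by the caller, and linking two residues when their distance is below a `cutoff` is an assumed edge definition, since the text does not fix one at this point.

```python
from itertools import combinations
from math import dist


def relative_asa(asa, reference_asa):
    """Residue ASA divided by the ASA of the same residue type in ALA-X-ALA."""
    return asa / reference_asa


def clustering_coefficient(center, coords, cutoff):
    """C_i = 2|{e_jk}| / (N_i (N_i - 1)) on a distance-cutoff residue network.

    coords maps a residue label to its (x, y, z) position; two residues are
    treated as linked when their distance is below `cutoff` (an assumption).
    """
    neighbours = [r for r in coords
                  if r != center and dist(coords[r], coords[center]) < cutoff]
    n = len(neighbours)
    if n < 2:
        return 0.0  # fewer than two neighbours: no neighbour pair can exist
    # Count the edges that actually exist between pairs of neighbours.
    edges = sum(1 for a, b in combinations(neighbours, 2)
                if dist(coords[a], coords[b]) < cutoff)
    return 2 * edges / (n * (n - 1))
```

Returning 0 when a residue has fewer than two neighbours is a convention chosen here to keep the score defined for isolated surface residues.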
Preferably, step (3) specifically comprises the following steps:
(31) for an antigen whose epitope is unknown, choose the appropriate spatial epitope prediction model according to the species source of the antibody to be bound;
(32) with the chosen model, calculate the scores of all surface amino acids of the antigen and predict its potential spatial epitopes.
More preferably, in step (32), the calculation comprises scoring the epitope propensity of each amino acid with formula (III):
$$\mathrm{score}_r = \frac{\sum_{rn} \mathrm{indice}}{N_{rn}} \tag{III}$$
where $r$ denotes any amino acid on the protein antigen surface; $rn$ denotes all surface amino acids within a certain region around $r$; $\mathrm{indice}$ denotes a specific property, such as the epitope preference of amino acid triangles, the relative solvent-accessible surface area, the sequence conservation, or the clustering coefficient and planarity index of the three-dimensional structure; $\sum_{rn} \mathrm{indice}$ is the sum of the score values for property $\mathrm{indice}$ over the amino acids in $rn$; and $N_{rn}$ is the number of amino acids in $rn$. The result of formula (III) is the score of amino acid $r$ for property $\mathrm{indice}$: the higher the score, the more the amino acid tends to occur in an epitope region.
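Formula (III) is a neighbourhood average, which the following sketch makes concrete. The function name `regional_score` is hypothetical, and two details are assumptions not fixed by the text: the region around a residue is taken to be a sphere of a given `radius`, and the central residue itself is counted as part of the region.

```python
from math import dist


def regional_score(center, coords, values, radius):
    """score_r = (sum of the property over the region rn) / N_rn.

    coords maps each surface residue to (x, y, z); values maps each residue
    to its per-residue score for one property (the "indice" of formula III).
    """
    region = [r for r in coords if dist(coords[r], coords[center]) <= radius]
    return sum(values[r] for r in region) / len(region)
```

Averaging over the local region rather than using the single-residue value reflects the stated aim of scoring the micro-environment of a residue instead of the residue alone.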
The beneficial effects of the present invention are as follows:
1. The method of the present invention takes the species source of the antibody bound by the antigen as the basis of prediction. The descriptive features of epitopes are analysed separately for each species source of the bound antibody, so that different feature propensity factors, or different descriptive features altogether, are used when predicting epitopes. This treatment is consistent with what is observed in actual experiments, which makes the prediction more accurate and suitable for large-scale application.
2. The method of the present invention introduces two broad classes of descriptive features: amino acid physicochemical properties and local three-dimensional structural features. Earlier research did not capture the three-dimensional character of spatial epitopes sufficiently. This method directly uses several descriptors of local three-dimensional structure and, at prediction time, considers the score of the local micro-environment rather than the score of a single amino acid. It therefore reflects the three-dimensional structural features of protein antigen spatial epitopes more accurately, which makes the prediction more reliable and suitable for large-scale application.
Description of drawings
Fig. 1 is a schematic flow chart of the method of the present invention for predicting the spatial epitopes of a protein antigen according to antibody species classification.
Embodiment
For a clearer understanding of the technical content of the present invention, the invention is described below with reference to Fig. 1. It should be understood that the embodiment illustrates the present invention and does not limit it.
1. Data collection
The antibody species sources in the PDB database include MUS MUSCULUS, HOMO SAPIENS, RATTUS RATTUS, CRICETULUS MIGRATORIUS and others. The human (HOMO SAPIENS) and mouse (MUS MUSCULUS) sources, which have the most data, are used here as examples.
The PDB database is searched with keywords such as antibody, antigen and immu* to collect antigen-antibody complex data. Structures of lower precision are removed (for example, by setting a precision threshold). The Naccess V2.1.1 software is used to calculate the solvent-accessible surface area of the amino acids of the protein antigen and determine the spatial epitope. The similarity between the spatial epitopes of different antigens is measured, and for several entries whose spatial epitopes are highly similar (for example, structural similarity of 85% or more) only the structure with the highest precision is kept, which yields a non-redundant spatial epitope data set. The antibody species source is then extracted for each entry to obtain the classified data sets.
The descriptive features of protein interaction binding sites used in earlier studies are collected from the literature and summarized in Table 1 below. They are described as follows:
A. Amino acid physicochemical properties:
i. Solvent accessibility of the amino acid (Accessible Surface Area, ASA). This is an important parameter for describing the position of an amino acid on a protein; epitope amino acids often have higher solvent accessibility;
ii. Epitope preference of the amino acid. Different amino acid types tend to occur at different positions; the frequencies with which each of the 20 amino acid types occurs in epitope regions and in surface non-epitope regions are calculated;
iii. Epitope preference of amino acid pairs and amino acid triangles. Building on the single-residue epitope preference, the occurrence preference of amino acid pairs and amino acid triangles of different type combinations is calculated. This parameter emphasizes the way an epitope amino acid cooperates with its neighbouring amino acids when taking part in protein binding;
iv. AAindex properties (544 groups). AAindex (Kawashima, S. and M. Kanehisa, AAindex: amino acid index database. Nucleic Acids Res, 2000. (1): p. 374) is a database containing information such as amino acid physicochemical properties, sequence length information and local structural information entropy. Its score values are widely used in predicting protein binding sites, structural folding and related properties;
v. Evolutionary conservation of the sequence. Ordinary protein interaction sites often have conserved sequence and spatial structure. A large share of protein antigens, however, are viral antigens; over long-term evolution, in order to escape surveillance by the immune system, their epitopes mutate faster than the non-epitope parts of the surface, that is, they have lower sequence conservation.
B. Local three-dimensional structural features:
i. Topology parameters. A protein antigen spatial epitope can be viewed as a network whose nodes are single amino acids and whose edges are the pairwise distances between amino acids. As a simplified three-dimensional description, topology parameters can be used to describe spatial epitope features (Huang, J., S. Kawashima, and M. Kanehisa, New amino acid indices based on residue network topology. Genome Inform, 2007. 18: p. 152-61). Commonly used topology parameters include the degree and the clustering coefficient;
ii. Concavity and convexity indices. The protrusion index (Thornton JM, Edwards MS, Taylor WR, Barlow DJ: Location of 'continuous' antigenic determinants in the protruding regions of proteins. EMBO J 1986, 5(2): 409-413) and the planarity index (Taylor WR, Thornton JM, Turnell WG: An ellipsoidal approximation of protein shape. Journal of Molecular Graphics 1983, 1: 30-38) are three-dimensional parameters that directly measure the local structure of the protein surface.
C. General description of the epitope:
i. Epitope size. A spatial epitope is a special region of the protein antigen surface; the number of amino acids it contains and its solvent-accessible surface area are important parameters for assessing a potential epitope;
ii. Sequence continuity of the epitope. The amino acids in a spatial epitope are often discontinuous in sequence and are brought together by the spatial folding of the protein. The continuity of these amino acids is also an important parameter for judging whether a predicted potential epitope is a suitable combination.
Table 1. Descriptive features of protein antigen spatial epitopes
[Table 1 is provided as image BDA0000118790340000071 in the original publication.]
2. Model building
For the non-redundant classified data sets from step 1, the collected descriptive features are calculated. From the results, quantitative indices are obtained, for example the preference with which an amino acid triangle formed by any three residue types (the types may be identical) occurs in epitopes:
$$\mathrm{preference}_{R_i\text{-}R_j\text{-}R_k} = \frac{\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{epitope}} \big/ \sum N_{\mathrm{epitope}}}{\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{non\text{-}epitope}} \big/ \sum N_{\mathrm{non\text{-}epitope}}}, \qquad R_i, R_j, R_k = 1, 2, 3, \ldots, 20 \tag{I}$$
where $R_i$, $R_j$ and $R_k$ denote any three amino acid types; $\mathrm{preference}_{R_i\text{-}R_j\text{-}R_k}$ is the preference with which amino acid triangles of type $R_i$-$R_j$-$R_k$ occur in epitopes; $\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{epitope}}$ is the number of triangles of type $R_i$-$R_j$-$R_k$ that occur in epitopes; $\sum N_{\mathrm{epitope}}$ is the total number of triangles of all types that occur in epitopes; and $\left(N_{R_i\text{-}R_j\text{-}R_k}\right)_{\mathrm{non\text{-}epitope}}$ and $\sum N_{\mathrm{non\text{-}epitope}}$ are the numbers of triangles of type $R_i$-$R_j$-$R_k$ and of all types, respectively, that occur in non-epitope regions of the antigen surface. This value measures whether amino acids of type $R_i$-$R_j$-$R_k$ tend to occur in epitope regions or in surface non-epitope regions: because formula (I) is a ratio of frequencies, $\mathrm{preference}_{R_i\text{-}R_j\text{-}R_k} > 1$ indicates that the type is more likely to occur in epitope regions than in surface non-epitope regions, and otherwise it tends to occur in surface non-epitope regions.
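Formula (I) can be illustrated with a short counting sketch. The names `triangle_key` and `triangle_preference` are hypothetical, and the sketch assumes that the amino acid triangles of the epitope and non-epitope surface have already been enumerated as triples of residue types; how three residues are judged to form a triangle is not specified here. Triangle types never seen on the non-epitope surface are skipped because the ratio is undefined for them, which is a choice made for this sketch.

```python
from collections import Counter


def triangle_key(a, b, c):
    """Order-independent key for a triangle of residue types."""
    return tuple(sorted((a, b, c)))


def triangle_preference(epitope_triangles, non_epitope_triangles):
    """Formula (I): epitope frequency of each triangle type over its non-epitope frequency."""
    epi = Counter(triangle_key(*t) for t in epitope_triangles)
    non = Counter(triangle_key(*t) for t in non_epitope_triangles)
    epi_total, non_total = sum(epi.values()), sum(non.values())
    if epi_total == 0 or non_total == 0:
        return {}
    preference = {}
    for key in set(epi) | set(non):
        if non[key] == 0:
            continue  # ratio undefined; handling of unseen types is left open
        preference[key] = (epi[key] / epi_total) / (non[key] / non_total)
    return preference
```

A value above 1 for a key then marks a triangle type that is enriched in epitopes relative to the rest of the surface, matching the interpretation given for formula (I).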
Regularities can also be obtained, for example: do the 544 groups of indices in AAindex discriminate between epitope and non-epitope amino acids? For each protein antigen in the data, each of the 544 AAindex indices is used to score the group of epitope amino acids and the group of surface non-epitope amino acids, and the means of the two groups are tested for a significant difference. The discriminating power is calculated by formula (II):
$$\mathrm{proportion}_i = \frac{N_{\mathrm{significant}}}{N_{\mathrm{all}}}, \qquad i = 1, 2, 3, \ldots, 544 \tag{II}$$
where $i$ denotes any one of the 544 groups of indices, $N_{\mathrm{significant}}$ is the number of protein antigens on which index $i$ significantly distinguishes epitope amino acids from surface non-epitope amino acids, and $N_{\mathrm{all}}$ is the number of all protein antigens in the data set. The result of formula (II) expresses how well index $i$ distinguishes epitope amino acids from surface non-epitope amino acids; the top-ranked indices among the 544 groups are selected and considered to have potential predictive power.
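The ranking of indices by the proportion of formula (II) can be sketched as below. This is only an illustration under assumptions: the text does not name the statistical test or the significance level, so the sketch takes one p-value per antigen as input (from whichever two-group test is used) and treats `alpha=0.05` as a hypothetical default; the function names are likewise hypothetical.

```python
def discriminating_proportion(p_values, alpha=0.05):
    """Formula (II): fraction of antigens on which an index separates the two groups."""
    significant = sum(1 for p in p_values if p < alpha)
    return significant / len(p_values)


def rank_indices(p_values_by_index, alpha=0.05):
    """Sort index identifiers by decreasing proportion of antigens they discriminate."""
    scored = {name: discriminating_proportion(p, alpha)
              for name, p in p_values_by_index.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

Keeping the test itself outside these functions means the same ranking step works whether the per-antigen comparison is a t-test or a non-parametric alternative.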
As a further example: is the sequence conservation of epitope amino acids higher or lower? This parameter is calculated with the sequence conservation software ConSurf (Ashkenazy H, Erez E, Martz E, Pupko T, and Ben-Tal N, ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res, 2010. 38 (Web Server issue): p. W529-33). The result is a normalized score with mean 0 and standard deviation 1; an amino acid site with a higher score has lower evolutionary conservation.
The above descriptive features are analysed separately on each classified data set, that is, the antigens are classified according to the species of the antibody they bind. The features differ in their ability to distinguish the spatial epitopes of the different classes of antigen. The goal is to find the feature combination suited to predicting protein antigens for each class of binding antibody, and to build classified prediction models. When the antibody species can be supplied, this improves the accuracy and soundness of the prediction.
3. Epitope prediction
For a protein antigen whose epitope is unknown, determining the epitope is an important step in vaccine design. In this method:
a. Model selection: first determine the species source of the antibody to be bound (for example, human or mouse), and select the corresponding prediction model, that is, the corresponding feature combination.
b. Prediction scoring: using the feature combination of the model chosen in step a, calculate the scores of all surface amino acids of the protein antigen. Amino acids with higher scores are more likely to lie in an epitope region.
c. Use of the results: derive the potential spatial epitope regions, and use them to guide experiments or vaccine design.
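Steps a to c can be sketched as a small selection-and-ranking routine. Everything named here is hypothetical: `predict_epitope`, the `models` table mapping a species to its feature combination, and the `top_fraction` cut-off used to pick candidate residues (the text leaves the threshold to the user). The sketch also assumes that every feature score is oriented so that a higher value means a stronger epitope tendency.

```python
def predict_epitope(residue_features, antibody_species, models, top_fraction=0.2):
    """Rank surface residues with the feature combination chosen for the antibody species.

    residue_features: {residue: {feature_name: score}}
    models: {species: [feature_name, ...]}, a hypothetical model table
    """
    features = models[antibody_species]             # step a: model selection
    totals = {res: sum(vals[f] for f in features)   # step b: sum of feature scores
              for res, vals in residue_features.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    return ranked[:n_top], totals                   # step c: candidate residues
```

Because the feature list is looked up per species, the same antigen can yield different candidate residues for a human antibody and a mouse antibody, which is the central idea of the method.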
Evaluation of the accuracy of the prediction method
The training data set used in the present invention was collected up to April 2011. The PDB database was searched with keywords such as antibody, antigen and immu*, and the antibody species source of each entry was extracted from the Source field of the PDB database. Structures of lower precision were removed (the precision threshold is given as image BDA0000118790340000081 in the original publication), and when spatial epitopes were highly similar (similarity of 85% or more) only the structure with the highest precision was kept. In total, 161 non-redundant protein antigen-antibody complexes and their spatial epitopes were obtained. These 161 complexes were classified by the extracted antibody species source to obtain the classified data sets. The data sets for human and mouse antibodies contain 64 and 75 protein antigen-antibody complex structures respectively.
Literature on protein interaction binding sites, reviews in particular, was retrieved from the PubMed literature database on the NCBI website. The collected descriptive features fall into amino acid physicochemical properties and local three-dimensional structural features.
The following properties were scored for each surface amino acid: solvent accessibility, amino acid triangle preference and sequence conservation. With a distance threshold (given as image BDA0000118790340000082 in the original publication), a region was defined for each surface amino acid from the amino acid itself and all surrounding amino acids within the threshold, and the topology parameters and the concavity and convexity parameters were calculated on that region.
The ROC curve is drawn over a series of binary classification cut-offs (decision thresholds), with the true positive rate (sensitivity) on the vertical axis and the false positive rate (1 - specificity) on the horizontal axis. The quality of a prediction model is judged by how far the curve departs from the 45-degree line. The AUC (Area Under the ROC Curve) takes values between 0 and 1 and can be used to measure prediction results quantitatively; the higher the value, the better the prediction. The scores assigned to amino acids by the properties above are continuous values, whereas the label of an amino acid (epitope or surface non-epitope) is discrete, so the ROC curve is used here to measure the predictive power of each property. Different properties perform differently on different classified data sets; for each classified data set, the properties with predictive power (AUC of 0.6 or more) are selected.
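The feature selection by AUC can be sketched without any external library by using the rank interpretation of the AUC: the probability that a randomly chosen epitope residue scores higher than a randomly chosen non-epitope residue, with ties counted as one half. The function names `roc_auc` and `select_features` are hypothetical, labels are assumed to be 1 for epitope residues and 0 for surface non-epitope residues, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as the probability that an epitope residue outscores a non-epitope one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    # Each positive/negative pair contributes 1 for a win and 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def select_features(feature_scores, labels, min_auc=0.6):
    """Keep the features whose AUC on a classified data set reaches `min_auc`."""
    return [name for name, scores in feature_scores.items()
            if roc_auc(scores, labels) >= min_auc]
```

Running `select_features` once per classified data set gives one feature list per antibody species, which is the form in which the final models are described.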
The final prediction models are organized by species and are composed of different properties. The input is the structure of a protein antigen, and the output is, for each amino acid of that antigen, the sum of its score values over the selected properties; a higher score indicates higher antigenicity.
To evaluate the prediction models of this method, data released in the PDB database after April 2011 were used as the test data set. With the same search keywords and screening conditions, a total of 11 protein antigen-antibody complex structures were obtained, listed in Table 2 below:
Table 2. Protein structure data of the test set
[Table 2 is provided as image BDA0000118790340000091 in the original publication.]
For the test set data, the species-specific prediction model was chosen according to the antibody species source, and the potential epitopes were calculated.
DiscoTope (Haste Andersen P, Nielsen M, and Lund O. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci, 2006. 15(11): p. 2558-67), BEpro (Sweredoski MJ and Baldi P. PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics, 2008. 24(12): p. 1459-60) and SEPPA (Sun J, Wu D, Xu T, Wang X, Xu X, Tao L, Li YX, and Cao ZW. SEPPA: a computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res, 2009. 37(Web Server issue): p. W672-6) are spatial epitope prediction tools published internationally in recent years with comparatively high prediction performance. The antigens of the same test set were used to run epitope prediction and result verification with this method and with the three tools above. AUC (Area Under the Curve) values are again used to describe the prediction results. The results of the four methods are shown in Table 3:
Table 3. Prediction results of the different methods on the test set data
[Table image: Figure BDA0000118790340000101]
According to the table above, the mean AUC of this method reaches 0.77, indicating comparatively high prediction accuracy on the test set data. A one-sided paired t-test over the four sets of results shows that the prediction accuracy of this method is significantly higher than that of the other three (p<0.05).
Traditional vaccines are made from attenuated or inactivated pathogenic microorganisms; their drawback is that they often cause adverse reactions or even related diseases. Vaccines developed from subunits of the antigenic substance, that is, antigenic protein fragments or local structures (epitopes), can reduce this risk. Determining epitopes with the aid of computational methods can greatly speed up the related experimental work. Compared with earlier prediction studies, the main innovation of the method of the present invention is that it takes the species source of the binding antibody as the basis of classification and thereby directly guides the prediction of spatial epitopes. In addition, the method combines amino acid physicochemical properties with three-dimensional structural information to describe the spatial epitope of the antigen; it predicts accurately, has clear practical value, and is suitable for large-scale application.
In summary, the method of the present invention for predicting the spatial epitope of a protein antigen according to antibody species classification takes full account of the species source of the binding antibody and has high prediction accuracy; it helps in determining the spatial epitope of an antigen and can assist vaccine design and immunotherapy.
In this specification, the present invention has been described with reference to specific embodiments. It is evident, however, that various modifications and changes can be made without departing from the spirit and scope of the invention. The specification and drawings are therefore to be regarded as illustrative rather than restrictive.

Claims (8)

1. A method for predicting the spatial epitope of a protein antigen according to antibody species classification, characterized in that it comprises the following steps:
(1) data collection: collecting antigen-antibody structure data classified according to the species source information of the antibody, to obtain classified data sets; and collecting descriptive features of protein interaction binding sites;
(2) model building: computing said descriptive features for the collected classified data sets, and building, according to the differences between species, antigen spatial epitope prediction models for antigens that bind antibodies of different species sources;
(3) epitope prediction: selecting the antigen spatial epitope prediction model according to the species source of the antibody to be bound, and predicting the potential spatial epitope of an antigen whose epitope is unknown.
2. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 1, characterized in that said step (1) specifically comprises the following steps:
(11) collecting antigen-antibody structure data from the PDB database, obtaining the species source information of the antibody from the source literature of each structure, and classifying the antigen-antibody structure data according to that information to obtain classified data sets;
(12) collecting descriptive features of protein interaction binding sites from the literature.
3. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 2, characterized in that, in said step (11), when collecting antigen-antibody structure data from the PDB database, structures whose resolution value is below a resolution threshold are selected first; the solvent accessible surface area of the amino acids on the antigen surface is calculated with the solvent accessibility method to determine the spatial epitope; and the similarity between the spatial epitopes of different antigens is measured, and among several antigens whose spatial epitopes are highly similar only the structure with the highest precision, that is, the lowest resolution value, is kept.
4. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 2, characterized in that, in said step (12), said descriptive features comprise amino acid physicochemical properties and three-dimensional local structure features.
5. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 1, characterized in that said step (2) specifically comprises the following steps:
(21) computing said descriptive features for all amino acids on the antigen surface to obtain a series of scores;
(22) comparing the scores with the classified data sets, and selecting the descriptive features that have potential predictive power;
(23) building antigen spatial epitope prediction models for antigens that bind antibodies of different species sources.
6. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 5, characterized in that, in said step (21), said computing comprises at least one of the following five calculations:
for the classified data set of antigens binding antibodies of a given species, calculating with the following formula (I) the preference with which amino acid triangles composed of any three amino acid types occur in the epitope,

$$\mathrm{preference}_{R_i-R_j-R_k} = \frac{\left(N_{R_i-R_j-R_k}\right)_{epitope} \big/ \sum N_{epitope}}{\left(N_{R_i-R_j-R_k}\right)_{non\text{-}epitope} \big/ \sum N_{non\text{-}epitope}}, \quad R_i, R_j, R_k = 1, 2, 3, \ldots, 20 \qquad \text{(I)}$$

where $R_i$, $R_j$ and $R_k$ denote any three amino acid types; $\mathrm{preference}_{R_i-R_j-R_k}$ denotes the preference with which amino acid triangles of type $R_i$-$R_j$-$R_k$ occur in the epitope; $(N_{R_i-R_j-R_k})_{epitope}$ denotes the number of amino acid triangles of type $R_i$-$R_j$-$R_k$ occurring in the epitope; $\sum N_{epitope}$ denotes the total number of amino acid triangles of all types occurring in the epitope; and $(N_{R_i-R_j-R_k})_{non\text{-}epitope}$ and $\sum N_{non\text{-}epitope}$ denote the numbers of amino acid triangles of type $R_i$-$R_j$-$R_k$ and of all types, respectively, occurring in the non-epitope region of the antigen surface. This value measures whether amino acids of type $R_i$-$R_j$-$R_k$ tend to occur in the epitope region or in the surface non-epitope region: because it is a ratio of two relative frequencies, a value greater than 1 shows that the $R_i$-$R_j$-$R_k$ type is more likely to occur in the epitope region than in the surface non-epitope region, and otherwise the type tends to occur in the surface non-epitope region;
using the relative solvent accessible surface area to measure the accessibility of an amino acid; specifically, when the solvent accessible surface area of each amino acid is calculated, it is divided by the intrinsic size of that amino acid type, which avoids bias from differences in amino acid size; the intrinsic size here is the solvent accessible surface area of an amino acid of that type in the tripeptide ALA-X-ALA, where X denotes the amino acid of that type;
using sequence conservation calculation software to measure the conservation of an amino acid, the result being a score for each amino acid that is linearly related to its conservation in the sequence;
using the clustering coefficient to measure the topological features of the network formed by the amino acids; the clustering coefficient of a vertex is the ratio of the number of edges actually connecting its adjacent vertices to the maximum possible number of such edges; for amino acids, this parameter measures how tightly the amino acids around any amino acid $r_i$ are clustered, and is calculated as shown in formula (II),

$$C_i = \frac{2\,\lvert\{e_{jk}\}\rvert}{N_i\,(N_i - 1)}; \quad r_j, r_k \in N_i,\ e_{jk} \in E \qquad \text{(II)}$$

where $r_j$ and $r_k$ are both neighbours of $r_i$, $\lvert\{e_{jk}\}\rvert$ denotes the number of edges that actually exist between the neighbours $r_j$ and $r_k$, and $N_i(N_i-1)/2$ denotes the maximum number of edges that could exist around $r_i$; the result of this formula ranges from 0 to 1, and the closer it is to 1, the tighter the amino acid cluster formed around $r_i$ as core amino acid, and otherwise the looser the structure; and
using the planarity coefficient to measure the flatness of the spatial region around an amino acid: for any amino acid $r_i$ on the antigen surface, an amino acid cluster is obtained with $r_i$ as core amino acid, and a least-squares plane is fitted to the coordinates of the constituent amino acids; the sum of the distances from all constituent amino acids to this least-squares plane is called the planarity coefficient; the theoretical value of this coefficient ranges from 0 to positive infinity, and for surface amino acids of the same protein, a higher value indicates that the region in question is spatially more uneven.
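As an illustration of formula (I) in this claim, the Python sketch below computes the triangle preference from two lists of residue-type triples, one for triangles found in epitopes and one for triangles found in surface non-epitope regions. It is a hedged sketch: how triangles are enumerated from a structure is outside its scope, triangles are treated as unordered (an assumption), types never observed in the non-epitope set are skipped because the ratio is undefined there (the source does not say how that case is handled), and the function name is hypothetical.

```python
from collections import Counter


def triangle_preference(epitope_triangles, non_epitope_triangles):
    """Ratio of a triangle type's relative frequency in epitopes to its
    relative frequency in surface non-epitope regions (formula I)."""
    # Sort each triple so that (A, B, C) and (C, A, B) count as the same type.
    epi = Counter(tuple(sorted(t)) for t in epitope_triangles)
    non = Counter(tuple(sorted(t)) for t in non_epitope_triangles)
    total_epi = sum(epi.values())
    total_non = sum(non.values())
    if total_epi == 0 or total_non == 0:
        raise ValueError("both triangle lists must be non-empty")
    preference = {}
    for tri, n_non in non.items():
        freq_epi = epi.get(tri, 0) / total_epi
        freq_non = n_non / total_non
        preference[tri] = freq_epi / freq_non
    return preference
```

A value above 1 in the returned dictionary marks a triangle type that is relatively more frequent in epitopes than on the rest of the surface.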
7. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 1, characterized in that said step (3) specifically comprises the following steps:
(31) for an antigen whose epitope is unknown, choosing the suitable antigen spatial epitope prediction model according to the species source of the antibody it is to bind;
(32) calculating, with the chosen antigen spatial epitope prediction model, the scores of all surface amino acids of the antigen, and predicting its potential spatial epitope.
8. The method for predicting the spatial epitope of a protein antigen according to antibody species classification according to claim 7, characterized in that, in said step (32), said calculating comprises scoring the epitope propensity of amino acids with the following formula (III),

$$\mathrm{score}_r = \frac{\sum_{rn} \mathrm{indice}}{N_{rn}} \qquad \text{(III)}$$

where $r$ denotes any amino acid on the protein antigen surface; $rn$ denotes all surface amino acids within a certain region around amino acid $r$; $\mathrm{indice}$ denotes a specific property, such as the epitope preference of amino acid triangles, the relative solvent accessible surface area of the amino acid, amino acid sequence conservation, or the clustering coefficient and planarity coefficient of the three-dimensional structure; $\sum_{rn} \mathrm{indice}$ denotes the sum of the scores of the amino acids in $rn$ for property $\mathrm{indice}$; and $N_{rn}$ denotes the number of amino acids in $rn$. The result of formula (III) is the score of amino acid $r$ for property $\mathrm{indice}$: the higher the score, the more the amino acid tends to occur in the epitope region.
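Formula (III) is a regional average, which the short Python sketch below illustrates. Its assumptions are labelled: each surface residue is represented by one 3D coordinate, the "certain region" is taken to be a sphere of caller-supplied `radius` around residue r, and r itself is counted as part of the region. None of these choices is specified in the claim, and the function name is hypothetical.

```python
import math


def region_score(coords, index_values, r, radius):
    """Mean of one property over the surface residues within `radius` of r.

    coords:       [(x, y, z) per surface residue]
    index_values: [value of the property `indice` per surface residue]
    """
    region = [j for j in range(len(coords))
              if math.dist(coords[r], coords[j]) <= radius]
    # The region always contains r itself (distance 0), so it is never empty.
    return sum(index_values[j] for j in region) / len(region)
```

Calling `region_score` once per selected property and adding the results gives the per-residue total used by the species-specific model.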
CN201110412679.1A 2011-12-12 2011-12-12 Method for predicting space epitope of protein antigen according to antibody species classification Expired - Fee Related CN102521527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110412679.1A CN102521527B (en) 2011-12-12 2011-12-12 Method for predicting space epitope of protein antigen according to antibody species classification

Publications (2)

Publication Number Publication Date
CN102521527A true CN102521527A (en) 2012-06-27
CN102521527B CN102521527B (en) 2015-01-14

Family

ID=46292438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110412679.1A Expired - Fee Related CN102521527B (en) 2011-12-12 2011-12-12 Method for predicting space epitope of protein antigen according to antibody species classification

Country Status (1)

Country Link
CN (1) CN102521527B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868583A (en) * 2016-04-06 2016-08-17 东北师范大学 Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence
CN107056937A (en) * 2017-04-11 2017-08-18 同济大学 A kind of method based on the direct designerantibodies hypervariable region sequence of epitope structure
CN107341363A (en) * 2017-06-29 2017-11-10 河北省科学院应用数学研究所 A kind of Forecasting Methodology of proteantigen epitope
CN108959852A (en) * 2017-05-24 2018-12-07 北京工业大学 Prediction technique on protein based on the pairs of Preference information of amino acid-nucleotide with RNA binding modules
CN109326324A (en) * 2018-09-30 2019-02-12 河北省科学院应用数学研究所 A kind of detection method of epitope, system and terminal device
CN109651506A (en) * 2017-10-11 2019-04-19 上海交通大学 Method for rapidly obtaining antigen-specific antibody
WO2022057388A1 (en) * 2020-09-18 2022-03-24 上海商汤智能科技有限公司 Antibody prediction method and apparatus, electronic device, storage medium, and program
CN116386712A (en) * 2023-02-20 2023-07-04 北京博康健基因科技有限公司 Epitope prediction method and device based on antigen protein dynamic space structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080172215A1 (en) * 2007-01-12 2008-07-17 Microsoft Corporation T-cell epiotope prediction
CN101320405A (en) * 2008-07-07 2008-12-10 重庆大学 Human immunodeficiency virus protease cracking site estimation and specificity analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JASON A. GREENBAUM et al.: "Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools", Journal of Molecular Recognition, vol. 20, 5 January 2007 (2007-01-05), pages 75 - 82 *
JING SUN et al.: "SEPPA: a computational server for spatial epitope prediction of protein antigens", Nucleic Acids Research, vol. 37, 22 May 2009 (2009-05-22), pages 612 - 616 *
XU Xiaolian et al.: "Evaluation of commonly used spatial epitope prediction tools for protein antigens based on experimental data sets", Chinese Science Bulletin (科学通报), vol. 55, no. 18, 31 December 2010 (2010-12-31), pages 1810 - 1815 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868583A (en) * 2016-04-06 2016-08-17 东北师范大学 Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence
CN105868583B (en) * 2016-04-06 2018-08-10 东北师范大学 A method of it is integrated and Forecast epitope based on sequence utilization cost sensitivity
CN107056937A (en) * 2017-04-11 2017-08-18 同济大学 A kind of method based on the direct designerantibodies hypervariable region sequence of epitope structure
CN108959852A (en) * 2017-05-24 2018-12-07 北京工业大学 Prediction technique on protein based on the pairs of Preference information of amino acid-nucleotide with RNA binding modules
CN107341363A (en) * 2017-06-29 2017-11-10 河北省科学院应用数学研究所 A kind of Forecasting Methodology of proteantigen epitope
CN107341363B (en) * 2017-06-29 2020-09-22 河北省科学院应用数学研究所 Prediction method of protein epitope
CN109651506A (en) * 2017-10-11 2019-04-19 上海交通大学 Method for rapidly obtaining antigen-specific antibody
CN109651506B (en) * 2017-10-11 2021-12-07 上海交通大学 Method for rapidly obtaining antigen-specific antibody
CN109326324A (en) * 2018-09-30 2019-02-12 河北省科学院应用数学研究所 A kind of detection method of epitope, system and terminal device
CN109326324B (en) * 2018-09-30 2022-01-25 河北省科学院应用数学研究所 Antigen epitope detection method, system and terminal equipment
WO2022057388A1 (en) * 2020-09-18 2022-03-24 上海商汤智能科技有限公司 Antibody prediction method and apparatus, electronic device, storage medium, and program
CN116386712A (en) * 2023-02-20 2023-07-04 北京博康健基因科技有限公司 Epitope prediction method and device based on antigen protein dynamic space structure
CN116386712B (en) * 2023-02-20 2024-02-09 北京博康健基因科技有限公司 Epitope prediction method and device based on antigen protein dynamic space structure

Also Published As

Publication number Publication date
CN102521527B (en) 2015-01-14

Similar Documents

Publication Publication Date Title
CN102521527B (en) Method for predicting space epitope of protein antigen according to antibody species classification
Zhang et al. Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature
Liang et al. Prediction of antigenic epitopes on protein surfaces by consensus scoring
Greenbaum et al. Towards a consensus on datasets and evaluation metrics for developing B‐cell epitope prediction tools
Saha et al. Prediction of continuous B‐cell epitopes in an antigen using recurrent neural network
Haste Andersen et al. Prediction of residues in discontinuous B‐cell epitopes using protein 3D structures
CN105868583B (en) A method of it is integrated and Forecast epitope based on sequence utilization cost sensitivity
Rahman et al. Inadequate reference datasets biased toward short non-epitopes confound B-cell epitope prediction
Sun et al. Advances in in-silico B-cell epitope prediction
Shen et al. Predicting linear B-cell epitopes using amino acid anchoring pair composition
Qiu et al. CE-BLAST makes it possible to compute antigenic similarity for newly emerging pathogens
Melvin et al. Detecting remote evolutionary relationships among proteins by large-scale semantic embedding
Huang et al. Co-evolution positions and rules for antigenic variants of human influenza A/H3N2 viruses
Qiu et al. Incorporating structure context of HA protein to improve antigenicity calculation for influenza virus A/H3N2
Gao et al. bSiteFinder, an improved protein-binding sites prediction server based on structural alignment: more accurate and less time-consuming
McGuffin Prediction of global and local model quality in CASP8 using the ModFOLD server
Liu et al. Prediction of discontinuous B-cell epitopes using logistic regression and structural information
Huang et al. Using random forest to classify linear B-cell epitopes based on amino acid properties and molecular features
Panchenko et al. A comparison of position‐specific score matrices based on sequence and structure alignments
Huang et al. AbAgIntPre: A deep learning method for predicting antibody-antigen interactions based on sequence information
Reimer Prediction of linear B-cell epitopes
Xu et al. Evaluation of spatial epitope computational tools based on experimentally-confirmed dataset for protein antigens
CN100428254C (en) Cross reaction antigen computer-aided screening method
Zhang et al. The relationship between B-cell epitope and mimotope sequences
Qiu et al. A benchmark dataset of protein antigens for antigenicity measurement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150114

Termination date: 20171212