CN105868583A - Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence - Google Patents
Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence
- Publication number
- CN105868583A (application number CN201610207437.1A)
- Authority
- CN
- China
- Prior art keywords
- residue
- feature
- antigen
- epi
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The invention belongs to a computational biology information technique, and particularly relates to a method for predicting epitope through cost-sensitive integrating and clustering on the basis of a sequence. The method comprises the main steps that 1, descriptive features of antigen protein residues are constructed, wherein the features comprise the evolutionary conservation feature, the secondary structure feature, the disordered region feature, the dipeptide composition feature and physical and chemical attributes; 2, an optimal feature subset is selected through Fisher-Markov and an incremental iterative feature selection method; 3, unbalanced data sets are processed through cost-sensitive integrating learning; 4, potential epitope residues are predicted from antigenic determination residues through a spatial clustering algorithm. The method is suitable for antigen protein epitope prediction of known and unknown structure information and is also suitable for large-scale application and popularization.
Description
Technical field
The invention belongs to the field of computational biology and information technology, and specifically relates to a sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering.
Background art
With economic development and rising living standards, demand for basic necessities such as clothing, food, housing and transport is no longer unmet as it was in the era of the shortage economy. People have turned their attention to health, and the related industries have all grown rapidly. As China becomes an aging society, national and personal spending on medicine increases year by year, and the biopharmaceutical and vaccine production fields face a great opportunity. According to statistics, medical expenses incurred after the age of 60 account on average for more than 50% of a person's lifetime health care expenditure. In 2010 the global pharmaceutical vaccine market was close to 25 billion US dollars; by 2014 it had reached 50 billion US dollars, doubling in just four years. This market is estimated to grow to 200 billion US dollars by 2025.
The pharmaceutical vaccine market is at the forefront of the medical market and is one of the fields with the highest scientific and technological content. Developing a new and effective drug often takes several years or even decades. On the one hand, this requires the long-term dedication of a large scientific workforce; on the other hand, it requires substantial research funding and advanced equipment. The successful development of a new drug not only brings good news to millions of patients but also implies great wealth and social benefit. Seizing the commanding heights of the pharmaceutical and vaccine market has become a top priority of life science development in the developed countries of Europe and America. The Chinese government also attaches increasing importance to the pharmaceutical and vaccine field. In recent years, medical schools have flourished, medical equipment has been continually updated, and medical knowledge has spread widely among the general public. Over the past decade or so, the state has invested heavily in pharmaceuticals, vaccines and related fields in terms of science and technology, funding, policy and talent.
In theory, the key to drug and vaccine development is to locate epitopes accurately and, on that basis, to design the corresponding immune-intervention antibodies or artificial vaccines. At present, the most reliable way to locate an epitope is to obtain the spatial structure of the antigen-antibody complex by crystal diffraction or nuclear magnetic resonance (NMR), and then to examine the potential epitopes on the surface of that structure. However, this experimental approach is technically demanding and requires considerable manpower and funding. If the resolution of the resulting structure is too low, or the preparation fails, everything has to start over.
Accurately predicting B-cell epitopes by computational methods can help people better understand the molecular mechanism of antigen-antibody interaction, and can also bring hope for the prevention, treatment and diagnosis of some diseases; research in this area therefore has both theoretical value and practical significance. During the critical period of the SARS epidemic in 2003, research institutions including BGI (Hua Da Gene), Peking University and Fudan University computed SARS virus epitopes and produced the first vaccine within a few months. This achievement was inspiring and greatly promoted the development of the epitope prediction field. Although research on conformational epitope prediction is not yet mature, a growing number of researchers recognise its importance and have begun to work on it.
Summary of the invention
The present invention addresses the shortcomings of current epitope prediction techniques and provides a sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering. The method brings together features that accurately describe antigenic determinant residues and non-antigenic determinant residues, combines them with an efficient feature selection approach to identify potential antigenic determinant residues from the primary sequence of the antigen protein, and then uses a spatial clustering algorithm to select the clustered antigenic determinant residues as epitopes. The design is ingenious and the accuracy is high, and the method is also suitable for large-scale application.
To achieve the above objective, the sequence-based method of the present invention for predicting epitopes using cost-sensitive ensemble learning and clustering comprises the following steps:
(1) Feature construction: based on an analysis of the characteristics of antigen surface residues, compute descriptive features of antigenic determinant residues and non-antigenic determinant residues.
(2) Feature selection: from the full feature matrix thus constructed, select features with high discriminative power and accurate descriptiveness, and build the optimal feature subset on this basis.
(3) Ensemble learning: to address the imbalance of the sample data and improve predictive performance, use an ensemble learning strategy to build a group of classifiers.
(4) Epitope prediction: through analysis of the epitope predictions on the sample data, compute a spatial threshold for epitope distribution. Three or more predicted antigenic determinant residues clustered within the threshold are considered able to form a potential epitope.
The beneficial effects of the present invention are as follows:
1. The invention is based on the primary sequence of the antigen protein, so it can analyse novel proteins of unknown structure and is widely applicable. It combines multiple descriptive features of the antigen protein with Fisher-Markov feature selection and an incremental feature selection strategy to distinguish antigenic determinant residues from non-antigenic determinant residues.
2. Compared with traditional epitope prediction methods (which predict residues only and ignore their tendency to aggregate), the invention adds a further analysis of the prediction results, based on the tendency of epitope residues to cluster in practice. The method therefore reflects the spatial characteristics of protein epitopes more accurately and makes the predictions more reliable.
Brief description of the drawings
Fig. 1 is a flow chart of the sequence-based method of the present invention for predicting epitopes using cost-sensitive ensemble learning and clustering.
Fig. 2 shows the antigenic determinant residue prediction and cluster analysis for protein 1PKO in the experiments of the present invention. Two epitope groups (1 and 2) are marked in the figure; the grey part is ordinary protein residues and the black part is antigenic determinant residues. According to the set threshold, the black residues in the left circle and the black residues in the right circle belong to the two epitope groups respectively.
Detailed description of the embodiments
To make the technical content of the present invention clearer, the invention is described in detail below with reference to Fig. 1 and Fig. 2. The embodiments merely illustrate the invention and do not limit it.
The sequence-based method of the present invention for predicting epitopes using cost-sensitive ensemble learning and clustering comprises the following steps:
(1) Feature construction: based on an analysis of the characteristics of antigen surface residues, compute descriptive features of antigenic determinant residues and non-antigenic determinant residues.
(2) Feature selection: from the full feature matrix thus constructed, select features with high discriminative power and accurate descriptiveness, and build the optimal feature subset on this basis.
(3) Ensemble learning: to address the imbalance of the sample data and improve predictive performance, use an ensemble learning strategy to build a group of classifiers.
(4) Epitope prediction: through analysis of the epitope predictions on the sample data, compute a spatial threshold for epitope distribution. Three or more predicted antigenic determinant residues clustered within the threshold are considered able to form a potential epitope.
Step (1) specifically comprises the following steps:
(1.1) Use PSIBLAST to compute the position-specific scoring matrix (PSSM) of the antigen protein sequence. Each entry is the score for replacing the residue at a given sequence position with another residue, and is normalised with the logistic function:
f(x) = 1 / (1 + e^(-x))
where x is the score in the PSSM for replacing the residue at a given position with another residue. The evolutionary conservation feature of a residue consists of all the evolutionary conservation scores within a window spanning the 5 positions before and the 5 positions after its sequence position.
(1.2) Use PSIPRED to compute, for each residue of the antigen protein, the probability matrix of forming each secondary structure (helix, coil or strand). The secondary structure feature of a residue consists of all secondary structure probabilities within the window of 5 positions before and 5 positions after its sequence position.
(1.3) Use DISORDER to compute the probability that each residue of the antigen protein falls in a disordered region of the protein. Because neighbouring residues influence the centre residue, the disordered region feature of the centre residue consists of all disordered region probabilities within the window of 5 positions before and 5 positions after its sequence position.
(1.4) Residue pairs, i.e. combinations of two interacting residues, play an important role in forming protein functional groups and are widely used to analyse and predict protein structure and functional sites. There are 20 naturally occurring amino acids, so there are 20 × 20 = 400 corresponding amino acid pairs, i.e. "AA, AC, ..., VV".
(1.5) Physicochemical properties are closely related to the function of protein residues. Six physicochemical attributes are selected here: hydrophilicity, flexibility, accessibility, polarity, exposed surface and turns.
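The window-based and dipeptide features of step (1) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, the PSSM is assumed to be already parsed into a list of rows, the window is assumed to include the centre residue (11 positions), positions beyond the sequence ends are assumed to be zero-padded, and the dipeptides are ordered by one-letter code. None of these details is stated in the source.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HALF_WINDOW = 5  # 5 positions before and 5 after the centre residue


def logistic(x):
    """Map a raw PSSM score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_pssm(pssm):
    """Apply the logistic function to every entry of an L x 20 score matrix."""
    return [[logistic(score) for score in row] for row in pssm]


def window_features(per_residue_rows, index, half_window=HALF_WINDOW):
    """Concatenate the per-residue rows in the window centred on `index`.

    Assumption: positions outside the sequence are padded with zeros.
    """
    width = len(per_residue_rows[0])
    features = []
    for pos in range(index - half_window, index + half_window + 1):
        if 0 <= pos < len(per_residue_rows):
            features.extend(per_residue_rows[pos])
        else:
            features.extend([0.0] * width)
    return features


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies, ordered AA, AC, ..., YY."""
    pairs = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]
```

The same `window_features` helper would apply unchanged to the secondary structure and disordered region probabilities, since each is a per-residue row of numbers.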
Step (2) specifically comprises the following steps:
(2.1) Use Fisher-Markov to compute the relevance score between each feature from step (1) and the class label, and sort the features by relevance score in descending order. A higher score indicates a stronger correlation between the feature and the class label; a lower score indicates a weaker correlation.
(2.2) From the ranked relevance list obtained in step (2.1), select the optimal feature subset using an incremental iterative strategy. Features are added to the feature pool one at a time, from highest to lowest relevance; at each step the corresponding classifier is built, trained and used for prediction, and the predictive performance is recorded and plotted. The number of features at the peak of the plot, and the corresponding feature subset, give the optimal feature subset.
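The incremental selection in step (2.2) can be sketched as follows. This is an illustration only: the Fisher-Markov ranking itself is not reimplemented, so `ranked_features` is assumed to be already sorted by relevance, and `evaluate` is a hypothetical caller-supplied function that trains a classifier on a feature list and returns a performance score.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Add features one at a time in ranked order and keep the best prefix.

    ranked_features: feature names sorted from most to least relevant.
    evaluate: callable taking a list of feature names and returning a
        performance score (higher is better).
    Returns the best-scoring prefix and the full performance curve.
    """
    pool = []
    curve = []
    for feature in ranked_features:
        pool.append(feature)
        curve.append(evaluate(list(pool)))
    if not curve:
        return [], []
    # Index of the peak of the curve; ties resolve to the smaller subset.
    best_size = max(range(len(curve)), key=curve.__getitem__) + 1
    return list(ranked_features[:best_size]), curve
```

The returned `curve` corresponds to the performance chart described in the text, and its peak gives the number of features to keep.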
Step (3) specifically comprises the following steps:
(3.1) Conventional machine learning is built on balanced data sets: during model construction, the misclassification penalty is the same for positive and negative samples, and the algorithm obtains the best predictive performance by minimising the total penalty. On an unbalanced data set (with a severely skewed ratio of positive to negative samples), this search for the minimum penalty tends to filter out the minority class as noise, so the minority class is never learned. To address this, we introduce a cost-sensitive strategy that assigns different penalties to misclassified positive and negative samples: the penalty for misclassifying the minority class is high, and the penalty for misclassifying the majority class is low.
(3.2) Although a single weak classifier has limited discriminative power, an appropriate combination of several weak classifiers can outperform the best individual sub-classifier.
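The effect of unequal penalties in step (3) can be illustrated with a small Python sketch. It does not train any classifier and is not the patented procedure: it assumes each sub-classifier already outputs a probability that a residue is positive, averages those probabilities, and then picks the label with the lower expected misclassification cost. The function names and cost values are hypothetical.

```python
def ensemble_probability(sub_probabilities):
    """Average the positive-class probabilities of the sub-classifiers."""
    return sum(sub_probabilities) / len(sub_probabilities)


def cost_sensitive_label(p_positive, cost_fn, cost_fp):
    """Choose the label with the lower expected misclassification cost.

    cost_fn: penalty for missing a true positive (minority class).
    cost_fp: penalty for a false alarm on a negative (majority class).
    """
    expected_cost_if_negative = p_positive * cost_fn
    expected_cost_if_positive = (1.0 - p_positive) * cost_fp
    return 1 if expected_cost_if_positive < expected_cost_if_negative else 0
```

With equal costs, a residue scored at 0.2 is labelled negative; raising the false-negative cost to ten times the false-positive cost moves the decision to positive, which is how a higher minority-class penalty shifts attention toward the minority class.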
Step (4) specifically comprises the following steps:
(4.1) First obtain the three-dimensional structure data of all antigen proteins with known epitopes in the sample data, and obtain the three-dimensional coordinates of all epitope residues.
(4.2) For each epitope, compute the distances between it and the other residues. Following the principle of maximum enrichment density and minimum cluster groups, determine the radius of the average clustering sphere.
(4.3) Using the radius obtained in step (4.2), partition all predicted antigenic determinant residues into regions. Predicted residues that cluster together are regarded as potential residues that may form an epitope; one or two predicted antigenic determinant residues lying away from any clustered region are regarded as false positives.
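Step (4.3) can be sketched in Python as a distance-based grouping of predicted residues. The source does not specify the clustering algorithm, so this sketch assumes simple single-linkage grouping (two residues belong to the same group if they are within `radius` of each other, directly or through a chain of neighbours) and treats groups of fewer than three residues as false positives. The function name, coordinates and radius are hypothetical.

```python
import math


def cluster_predicted_residues(coords, radius, min_size=3):
    """Group predicted residues that lie within `radius` of one another.

    coords: dict mapping residue id -> (x, y, z) coordinate.
    Returns (epitope_groups, false_positives); groups smaller than
    `min_size` are reported as false positives.
    """
    unvisited = set(coords)
    epitope_groups = []
    false_positives = []
    for start in list(coords):
        if start not in unvisited:
            continue
        unvisited.discard(start)
        group = [start]
        stack = [start]
        while stack:
            current = stack.pop()
            near = [r for r in unvisited
                    if math.dist(coords[current], coords[r]) <= radius]
            for r in near:
                unvisited.discard(r)
                group.append(r)
                stack.append(r)
        if len(group) >= min_size:
            epitope_groups.append(sorted(group))
        else:
            false_positives.extend(group)
    return epitope_groups, sorted(false_positives)
```

In practice `radius` would come from the statistics of step (4.2); here it is simply a parameter.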
1. The data set consists of two parts: the bound data set of Rubinstein (antigen structures taken from antigen-antibody complexes) and the unbound data set of Liang (structures of the antigen alone, without antibody). Together they form a benchmark data set for conformational epitope prediction.
2. Feature description of antigen protein residues: see Table 1 for details.
Table 1. Feature description of antigen protein residues
After obtaining the optimal feature subset from the constructed feature space by Fisher-Markov ranking and incremental feature iteration, the traditional method and the ensemble learning methods were each applied to the bound and unbound data sets, and their prediction performance was compared with that of the cost-sensitive ensemble strategy. Tables 2 and 3 give the prediction results of the different ensemble learning methods on the bound and unbound data sets.
Table 2. Comparison of the results of different ensemble learning methods on the bound data set
Table 3. Comparison of the results of different ensemble learning strategies on the unbound data set
Tables 2 and 3 show that conventional machine learning methods have almost no predictive ability on the unbalanced data sets. Although their accuracy is above 90%, this results from treating almost all samples indiscriminately as negative; specificity is therefore very high (up to 99.9%) while sensitivity is very low, only about 1%.
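The gap between accuracy and sensitivity on unbalanced data can be reproduced with a short Python sketch. The data below are hypothetical (50 positives among 1000 residues, all predicted negative) and are not taken from the patent's tables; the function name is also an assumption.

```python
def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical example: a classifier that labels every residue as negative.
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000
metrics = confusion_metrics(y_true, y_pred)
```

A classifier that never predicts a positive still scores high accuracy and perfect specificity here, while its sensitivity is zero, which is the pattern described for the conventional methods.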
Compared with the traditional method, which applies no treatment to the samples, the simple ensemble considerably improves recognition of the minority class: from 0.8% to 19.6% on the bound data set and from 1.1% to 25.6% on the unbound data set. The simple ensemble strategy performs several rounds of random sampling over the whole sample set, and each sampled group yields an independent classification model. It is the simplest ensemble classification strategy; its advantages are that it is easy to implement and fast, and its drawback is limited performance.
The balance cascade ensemble strategy improves on the simple ensemble. In balance cascade sampling, majority-class data that have already been sampled no longer take part in subsequent sampling, which ensures that the samples cover as much of the majority-class data as possible. Compared with the simple ensemble strategy, balance cascade gives somewhat better predictions.
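The sampling behaviour described for balance cascade can be sketched in Python as drawing majority-class samples without replacement across rounds. This follows only the description in the text; the function name, the choice of a subset size equal to the minority count, and the fixed random seed are assumptions made for illustration.

```python
import random


def cascade_subsets(majority, minority, rounds, seed=0):
    """Build one balanced training subset per round.

    Majority-class samples drawn in one round are excluded from all later
    rounds, so successive subsets cover different parts of the majority.
    """
    rng = random.Random(seed)
    remaining = list(majority)
    rng.shuffle(remaining)
    size = len(minority)
    subsets = []
    for _ in range(rounds):
        if len(remaining) < size:
            break  # not enough unused majority samples left
        drawn, remaining = remaining[:size], remaining[size:]
        subsets.append(drawn + list(minority))
    return subsets
```

Each returned subset would train one sub-classifier; every subset contains all minority samples plus a distinct slice of the majority class.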
The cost-sensitive ensemble strategy assigns different cost values to positive and negative samples and searches for the classification with the optimal expected cost of the prediction result, so that the classifier automatically seeks the minimum misclassification penalty. In this way each sub-classifier focuses on the minority-class samples, which greatly improves the recognition rate for the minority class. The cost-sensitive strategy reached recognition rates of 64.8% and 70.4% on the bound and unbound data sets respectively, demonstrating the effectiveness of the method.
Compared with traditional drug and vaccine development, predicting antigen protein epitopes by computation can quickly provide potential candidate epitopes. This gives biologists practical help and reduces the risk associated with the heavy investment in drug research and development. Relative to previous methods, the invention has two main innovations: 1. it uses a cost-sensitive ensemble strategy for the first time, shifting the focus of prediction from overall accuracy to the minority-class (positive) samples, which markedly improves prediction; 2. it uses a spatial clustering algorithm to analyse the predictions further, excluding scattered residues and accepting that predicted antigenic determinant residues which cluster together can form potential epitopes. The method can further improve prediction accuracy and has considerable practical value.
Claims (2)
1. A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering, characterised in that it comprises the following steps:
(1) Feature construction: for the sample data, compute descriptive features of the antigen protein to obtain the feature space of the sample data;
(2) Feature selection: use Fisher-Markov and an incremental iterative feature selection method to select the optimal feature subset;
(3) Cost-sensitive ensemble learning: use a cost-sensitive ensemble strategy that assigns different misclassification penalty parameters to the severely unbalanced sample classes, markedly improving the recognition rate for the minority positive samples;
(4) Spatial clustering: apply a spatial clustering algorithm to the predicted antigenic determinant residues, and regard the antigenic determinant residues within the set threshold as epitopes.
2. The sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering according to claim 1, characterised in that step (1) specifically comprises the following steps:
(1.1) Evolutionary conservation feature: use PSIBLAST to compute the position-specific scoring matrix of the antigen sequence; normalise each amino acid substitution value in the resulting scoring matrix with the logistic function to obtain the evolutionary conservation score of that position; the evolutionary conservation feature of a residue consists of all evolutionary conservation scores within the window of 5 positions before and 5 positions after its sequence position;
(1.2) Secondary structure feature: use PSIPRED to compute, for each residue of the antigen protein, the probability matrix of forming each secondary structure, i.e. helix, coil or strand; the secondary structure feature of a residue consists of all secondary structure probabilities within the window of 5 positions before and 5 positions after its sequence position;
(1.3) Disordered region feature: use DISORDER to compute, for each residue of the antigen protein, the probability matrix of its region being an ordered or a disordered region; the disordered region feature of a residue consists of all disordered region probabilities within the window of 5 positions before and 5 positions after its sequence position;
(1.4) Dipeptide composition feature: residues in a protein often combine in pairs to form stable functional residue pairs, which are very important for analysing and predicting protein structure and function; from the pairwise combinations of the 20 amino acids, count the 400 different dipeptide compositions of a given protein;
(1.5) Physicochemical attributes: select 6 physicochemical properties shown to be closely related to the function of antigen protein residues, namely hydrophilicity, flexibility, accessibility, polarity, exposed surface and turns;
Step (2) specifically comprises the following steps:
(2.1) Rank the features with the Fisher-Markov method: use the Fisher-Markov selector to compute the relevance between each feature from step (1) and the class label, and sort the features in descending order of relevance;
(2.2) Select the optimal feature subset with the incremental feature strategy: from the ranked features, add features to the feature pool one at a time from highest to lowest relevance, build a classifier for training and prediction at each step, and select the optimal number of features according to predictive performance; the corresponding feature subset is the optimal feature subset;
Step (3) specifically comprises the following steps:
(3.1) Use the cost-sensitive ensemble method to handle the severe imbalance between positive and negative sample data: conventional machine learning methods perform poorly on classification problems with unbalanced positive and negative samples because of an inherent defect, namely that they tend to ignore the minority class in pursuit of higher accuracy; the cost-sensitive ensemble method is introduced to handle the imbalance by first assigning different costs to positive and negative samples, so that misclassified positive and negative samples are penalised differently and the classifier, in pursuing better performance, pays attention to recognising the minority class;
(3.2) Build the sub-classifiers with support vector machines: build the base classifiers with the LibSVM machine learning tool, and use gridsearchforSVM.m to find the optimal parameter c and value; combine multiple sub-classifiers into an ensemble classifier to improve the recognition accuracy of the model;
In step (4), since epitope residues generally tend to be enriched in the same region, spatial clustering is performed on the predicted antigenic determinant residues, and regions of higher cluster density are identified as regions with a higher probability of forming a potential epitope; this specifically comprises the following steps:
(4.1) From the sample data, collect the spatial coordinates of the antigenic determinant residues on the surface of antigen proteins with known epitopes; following the principle of maximum enrichment density and minimum cluster groups, cluster all antigenic determinant residues to obtain the radius of the optimal clustering sphere;
(4.2) Using the computed radius of the clustering sphere, partition the previously predicted antigenic determinant residues into clusters; all residues in regions enriched in antigenic determinant residues are identified as epitopes; regions containing only one or two predicted antigenic determinant residues are regarded as false positives, i.e. non-epitopes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610207437.1A CN105868583B (en) | 2016-04-06 | 2016-04-06 | A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105868583A true CN105868583A (en) | 2016-08-17 |
CN105868583B CN105868583B (en) | 2018-08-10 |
Family
ID=56626985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610207437.1A Expired - Fee Related CN105868583B (en) | 2016-04-06 | 2016-04-06 | A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105868583B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169312A (en) * | 2017-05-27 | 2017-09-15 | 南开大学 | A kind of Forecasting Methodology of the natural unordered protein of low complex degree |
CN107341363A (en) * | 2017-06-29 | 2017-11-10 | 河北省科学院应用数学研究所 | A kind of Forecasting Methodology of proteantigen epitope |
CN110010248A (en) * | 2019-04-17 | 2019-07-12 | 电子科技大学 | A kind of readmission's Risk Forecast Method based on cost-sensitive integrated study model |
CN110060738A (en) * | 2019-04-03 | 2019-07-26 | 中国人民解放军军事科学院军事医学研究院 | Method and system based on machine learning techniques prediction bacterium protective antigens albumen |
CN110428865A (en) * | 2019-08-14 | 2019-11-08 | 信阳师范学院 | A kind of method of high-throughput prediction Antifreeze protein |
CN110444249A (en) * | 2019-08-14 | 2019-11-12 | 信阳师范学院 | A method of the prediction fluorescence protein based on calculating |
CN110689544A (en) * | 2019-09-06 | 2020-01-14 | 哈尔滨工程大学 | Method for segmenting delicate target of remote sensing image |
CN111339165A (en) * | 2020-02-28 | 2020-06-26 | 重庆邮电大学 | Mobile user exit characteristic selection method based on Fisher score and approximate Markov blanket |
CN114242169A (en) * | 2021-12-15 | 2022-03-25 | 河北省科学院应用数学研究所 | Antigen epitope prediction method for B cells |
CN115205570A (en) * | 2022-09-14 | 2022-10-18 | 中国海洋大学 | Unsupervised cross-domain target re-identification method based on comparative learning |
CN116130005A (en) * | 2023-01-30 | 2023-05-16 | 深圳新合睿恩生物医疗科技有限公司 | Tandem design method and device for multi-epitope vaccine, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521527A (en) * | 2011-12-12 | 2012-06-27 | 同济大学 | Method for predicting space epitope of protein antigen according to antibody species classification |
CN102779240A (en) * | 2012-06-21 | 2012-11-14 | 哈尔滨工程大学 | Inherent irregular protein structure forecasting method based on kernel canonical correlation analysis |
CN104331642A (en) * | 2014-10-28 | 2015-02-04 | 山东大学 | Integrated learning method for recognizing ECM (extracellular matrix) protein |
CN105138866A (en) * | 2015-08-12 | 2015-12-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Method for identifying protein functions based on protein-protein interaction network and network topological structure features |
-
2016
- 2016-04-06 CN CN201610207437.1A patent/CN105868583B/en not_active Expired - Fee Related
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521527A (en) * | 2011-12-12 | 2012-06-27 | 同济大学 | Method for predicting space epitope of protein antigen according to antibody species classification |
CN102779240A (en) * | 2012-06-21 | 2012-11-14 | 哈尔滨工程大学 | Inherent irregular protein structure forecasting method based on kernel canonical correlation analysis |
CN104331642A (en) * | 2014-10-28 | 2015-02-04 | 山东大学 | Integrated learning method for recognizing ECM (extracellular matrix) protein |
CN105138866A (en) * | 2015-08-12 | 2015-12-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Method for identifying protein functions based on protein-protein interaction network and network topological structure features |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169312B (en) * | 2017-05-27 | 2020-05-08 | 南开大学 | Low-complexity natural disordered protein prediction method |
CN107169312A (en) * | 2017-05-27 | 2017-09-15 | 南开大学 | A kind of Forecasting Methodology of the natural unordered protein of low complex degree |
CN107341363A (en) * | 2017-06-29 | 2017-11-10 | 河北省科学院应用数学研究所 | A kind of Forecasting Methodology of proteantigen epitope |
CN107341363B (en) * | 2017-06-29 | 2020-09-22 | 河北省科学院应用数学研究所 | Prediction method of protein epitope |
CN110060738A (en) * | 2019-04-03 | 2019-07-26 | 中国人民解放军军事科学院军事医学研究院 | Method and system based on machine learning techniques prediction bacterium protective antigens albumen |
CN110010248B (en) * | 2019-04-17 | 2023-01-10 | 电子科技大学 | Readmission risk prediction method based on cost-sensitive integrated learning model |
CN110010248A (en) * | 2019-04-17 | 2019-07-12 | 电子科技大学 | Readmission risk prediction method based on cost-sensitive integrated learning model |
CN110428865A (en) * | 2019-08-14 | 2019-11-08 | 信阳师范学院 | Method for high-throughput prediction of antifreeze proteins |
CN110444249A (en) * | 2019-08-14 | 2019-11-12 | 信阳师范学院 | Method for predicting fluorescent protein based on calculation |
CN110444249B (en) * | 2019-08-14 | 2022-02-01 | 信阳师范学院 | Method for predicting fluorescent protein based on calculation |
CN110689544A (en) * | 2019-09-06 | 2020-01-14 | 哈尔滨工程大学 | Method for segmenting delicate target of remote sensing image |
CN111339165A (en) * | 2020-02-28 | 2020-06-26 | 重庆邮电大学 | Mobile user exit characteristic selection method based on Fisher score and approximate Markov blanket |
CN111339165B (en) * | 2020-02-28 | 2022-06-03 | 重庆邮电大学 | Mobile user exit characteristic selection method based on Fisher score and approximate Markov blanket |
CN114242169A (en) * | 2021-12-15 | 2022-03-25 | 河北省科学院应用数学研究所 | Antigen epitope prediction method for B cells |
CN114242169B (en) * | 2021-12-15 | 2023-10-20 | 河北省科学院应用数学研究所 | Antigen epitope prediction method for B cells |
CN115205570A (en) * | 2022-09-14 | 2022-10-18 | 中国海洋大学 | Unsupervised cross-domain target re-identification method based on comparative learning |
CN116130005A (en) * | 2023-01-30 | 2023-05-16 | 深圳新合睿恩生物医疗科技有限公司 | Tandem design method and device for multi-epitope vaccine, equipment and storage medium |
CN116130005B (en) * | 2023-01-30 | 2023-06-16 | 深圳新合睿恩生物医疗科技有限公司 | Tandem design method and device for multi-epitope vaccine, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105868583B (en) | 2018-08-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105868583A (en) | Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence | |
CN102222178B (en) | Method for screening and/or designing medicines aiming at multiple targets | |
CN109887541A (en) | Method and system for predicting target proteins that bind small molecules | |
CN107038348A (en) | Drug target prediction method based on protein-ligand interaction fingerprints | |
CN106446607A (en) | Drug target virtual screening method based on interactive fingerprints and machine learning | |
CN102779240B (en) | Inherent irregular protein structure forecasting method based on kernel canonical correlation analysis | |
CN106203377A (en) | Coal dust image recognition method | |
CN102521527A (en) | Method for predicting space epitope of protein antigen according to antibody species classification | |
CN107194207A (en) | Protein-ligand binding site estimation method based on granular support vector machine ensembles | |
CN111402967A (en) | Method for improving virtual screening capability of docking software based on machine learning algorithm | |
CN102884203A (en) | Query sequence genotype or subtype classification method | |
Wang et al. | G-DipC: an improved feature representation method for short sequences to predict the type of cargo in cell-penetrating peptides | |
CN115240762A (en) | Multi-scale small molecule virtual screening method and system | |
CN104615910B (en) | Method for predicting helix interaction relationships of α transmembrane proteins based on random forest | |
CN101110095B (en) | Method for batch detecting susceptibility gene of common brain disease | |
CN113421658B (en) | Drug-target interaction prediction method based on neighbor attention network | |
Saha et al. | Application of data mining in protein sequence classification | |
Van Buren et al. | Artificial intelligence and deep learning to map immune cell types in inflamed human tissue | |
CN109326329A (en) | Zinc-binding protein action site prediction method based on integrated learning under an unbalanced mode | |
CN110598836B (en) | Metabolic analysis method based on improved particle swarm optimization algorithm | |
CN114242159B (en) | Method for constructing antigen peptide presentation prediction model, and antigen peptide prediction method and device | |
CN108388774A (en) | Online analysis method for polypeptide spectrum matching data | |
Yu et al. | A supervised approach to detect protein complex by combining biological and topological properties | |
CN101609486A (en) | Recognition method for G protein-coupled receptor superclass and Web service system thereof | |
CN114999566B (en) | Drug repositioning method and system based on word vector characterization and attention mechanism |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
C06 | Publication | | |
PB01 | Publication | | |
C10 | Entry into substantive examination | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180810; Termination date: 20210406 | |