CN105868583A - Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence - Google Patents


Info

Publication number
CN105868583A
CN105868583A CN201610207437.1A CN201610207437A CN105868583A CN 105868583 A CN105868583 A CN 105868583A CN 201610207437 A CN201610207437 A CN 201610207437A CN 105868583 A CN105868583 A CN 105868583A
Authority
CN
China
Prior art keywords
residue
feature
antigen
epi
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610207437.1A
Other languages
Chinese (zh)
Other versions
CN105868583B (en)
Inventor
马志强
张健
柴海挺
高博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Northeast Normal University
Original Assignee
Northeast Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Normal University
Priority to CN201610207437.1A
Publication of CN105868583A
Application granted
Publication of CN105868583B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention belongs to the field of computational biology and information technology, and particularly relates to a sequence-based method for predicting epitopes through cost-sensitive ensemble learning and clustering. The method comprises the following main steps: 1, descriptive features of antigen protein residues are constructed, comprising the evolutionary conservation feature, the secondary structure feature, the disordered region feature, the dipeptide composition feature and physicochemical attributes; 2, an optimal feature subset is selected through Fisher-Markov and an incremental iterative feature selection method; 3, the imbalanced data sets are handled through cost-sensitive ensemble learning; 4, potential epitopes are identified among the predicted antigenic determinant residues through a spatial clustering algorithm. The method is suitable for epitope prediction on antigen proteins with or without known structure information and is also suitable for large-scale application.

Description

A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering
Technical field
The invention belongs to the field of computational biology and information technology, and specifically relates to a sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering.
Background
With economic development and rising living standards, the demand for basic necessities such as clothing, food, housing and transportation no longer goes unmet as it did in the era of the shortage economy. People have shifted their attention to health, and the related industries have all developed rapidly. As China becomes an aging society, national and individual spending on medicine increases year by year, and the biopharmaceutical and vaccine production fields face a huge opportunity. According to statistics, medical expenses after the age of 60 account on average for more than 50% of a person's lifetime health care expenditure. In 2010 the global pharmaceutical vaccine market was close to 25 billion US dollars; by 2014 it had reached 50 billion US dollars, doubling in just 4 years. This market is estimated to grow to 200 billion US dollars by 2025.
The pharmaceutical vaccine market is at the forefront of the medical market and is one of the fields with the highest scientific and technological content. Developing a new and effective drug often takes several years or even decades. On the one hand this requires the long-term dedicated research of a large scientific workforce; on the other hand it requires substantial research funding and sophisticated equipment. The successful development of a new drug not only benefits millions of patients but also brings great economic and social returns. Capturing the commanding heights of the pharmaceutical and vaccine markets has become a top priority in the life-science development of developed countries in Europe and America. The Chinese government also attaches increasing importance to the pharmaceutical and vaccine field. In recent years medical schools have flourished, medical equipment has been continually updated, and medical knowledge has spread widely among the general public. Over the last decade or more, the state has invested heavily in pharmaceuticals, vaccines and related fields in terms of science and technology, funding, policy and talent.
In theory, the key to drug and vaccine development is to accurately locate the antigen epitope and, on that basis, design the corresponding immune-intervention antibody or artificial vaccine. At present, the most reliable way to locate an epitope is to obtain the spatial structure of the antigen-antibody complex by crystal diffraction or nuclear magnetic resonance (NMR), and then examine that structure for potential epitopes on its surface. However, this experimental approach is technically demanding and requires substantial manpower and funding. If the resolution of the resulting structure is too low or the preparation fails, the whole process must be restarted.
Accurately predicting B-cell epitopes by computational methods can help people better understand the molecular interaction mechanism between antigen and antibody, and can also bring hope for the prevention, treatment and diagnosis of some diseases; research in this area therefore has both theoretical value and practical significance. During the critical period of the SARS epidemic in 2003, research institutions such as BGI (Huada Gene), Peking University and Fudan University computed the epitopes of the SARS virus and produced a first vaccine within a few months. This achievement was highly encouraging and greatly promoted the development of the epitope prediction field. Although research on conformational epitope prediction is not yet mature, a growing number of researchers recognize its importance and have begun to focus on this work.
Summary of the invention
The present invention mainly addresses the shortcomings of current epitope prediction techniques and provides a sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering. The method brings together features that accurately describe antigenic determinant residues and non-antigenic-determinant residues, combines them with an efficient feature selection method to identify potential antigenic determinant residues from the primary sequence of the antigen protein, and then uses a spatial clustering algorithm to select aggregated antigenic determinant residues as epitopes. The design is ingenious, the accuracy is high, and the method is suitable for large-scale application.
To achieve the above object, the sequence-based method of the present invention for predicting epitopes using cost-sensitive ensemble learning and clustering is characterized by comprising the following steps:
(1) Feature construction: based on an analysis of the characteristics of antigen surface residues, compute descriptive features of antigenic determinant residues and non-antigenic-determinant residues.
(2) Feature selection: from the full feature matrix constructed, select features that are highly discriminative and descriptively accurate, and build the optimal feature subset on that basis.
(3) Ensemble learning: to address the imbalance of the data samples and improve prediction performance, use an ensemble learning strategy to build a set of classifiers.
(4) Epitope prediction: through epitope prediction analysis of the samples, compute the spatial distribution threshold of epitopes. Where three or more predicted antigenic determinant residues aggregate within the threshold, they are deemed able to form a potential epitope.
The beneficial effects of the present invention are as follows:
1. The present invention is based on the primary sequence of the antigen protein, so it can analyze novel proteins of unknown structure and has a wide range of application. It combines various descriptive features of the antigen protein with Fisher-Markov feature selection and an incremental feature selection strategy to distinguish antigenic determinant residues from non-antigenic-determinant residues.
2. Compared with traditional epitope prediction methods (which usually predict residues only and do not consider the tendency of residues to aggregate), the present invention adds a further analysis of the prediction results, based on the tendency of real epitopes to aggregate. The method reflects the spatial characteristics of protein antigen epitopes more accurately, making the predictions more reliable.
Brief description of the drawings
Fig. 1 is a flow chart of the sequence-based method of the present invention for predicting epitopes using cost-sensitive ensemble learning and clustering.
Fig. 2 shows the antigenic determinant residue prediction and cluster analysis for protein 1PKO in the experiments of the present invention. Two epitope groups (1 and 2) are identified in the figure. Grey parts are ordinary protein residues and black parts are antigenic determinant residues. According to the set threshold, the black residues in the left circle and the black residues in the right circle belong to the two epitope groups respectively.
Detailed description
To make the technical content of the present invention more clearly understood, the invention is described in detail with reference to Fig. 1 and Fig. 2. The embodiments serve only to illustrate the invention and do not limit it.
The sequence-based method of the present invention for predicting epitopes using cost-sensitive ensemble learning and clustering comprises the following steps:
(1) Feature construction: based on an analysis of the characteristics of antigen surface residues, compute descriptive features of antigenic determinant residues and non-antigenic-determinant residues.
(2) Feature selection: from the full feature matrix constructed, select features that are highly discriminative and descriptively accurate, and build the optimal feature subset on that basis.
(3) Ensemble learning: to address the imbalance of the data samples and improve prediction performance, use an ensemble learning strategy to build a set of classifiers.
(4) Epitope prediction: through epitope prediction analysis of the samples, compute the spatial distribution threshold of epitopes. Where three or more predicted antigenic determinant residues aggregate within the threshold, they are deemed able to form a potential epitope.
Step (1) specifically comprises the following steps:
(1.1) PSIBLAST is used to compute the position-specific scoring matrix (PSSM) of the antigen protein sequence. Each entry is the score for substituting the residue at a given sequence position with another residue, and it is normalized with the logistic function, whose standard form is:
f(x) = 1 / (1 + e^(-x))
where x is the score in the PSSM for substituting the residue at a given position with another residue. The evolutionary conservation feature of a residue consists of all the evolutionary conservation scores within a window spanning the 5 positions before and the 5 positions after that residue in the sequence.
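The normalization and windowing in step (1.1) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the PSSM is already available as one row of 20 scores per residue, and it pads positions beyond the sequence ends with zeros, a choice the source does not specify. The function names are hypothetical.

```python
import math

WINDOW = 5  # residues on each side of the centre residue, as in step (1.1)

def logistic(x):
    """Standard logistic function, mapping any score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def window_features(rows, window=WINDOW, pad=0.0):
    """Concatenate per-residue rows over positions [i - window, i + window].

    rows: one list of numbers per residue (e.g. 20 normalised PSSM scores).
    Positions beyond either end of the sequence are filled with `pad`
    (an assumption of this sketch, not stated in the source).
    """
    width = len(rows[0])
    features = []
    for i in range(len(rows)):
        vec = []
        for j in range(i - window, i + window + 1):
            vec.extend(rows[j] if 0 <= j < len(rows) else [pad] * width)
        features.append(vec)
    return features

def pssm_features(pssm):
    """Logistic-normalise every PSSM score, then apply the sliding window."""
    normalised = [[logistic(x) for x in row] for row in pssm]
    return window_features(normalised)
```

With 20 scores per position and an 11-position window, each residue receives 220 values. The same `window_features` helper would apply unchanged to per-residue secondary structure or disorder probabilities.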
(1.2) PSIPRED is used to compute, for each residue of the antigen protein, the probability matrix of forming each secondary structure (helix, coil or strand). The secondary structure feature of a residue consists of all the secondary structure probabilities within a window spanning the 5 positions before and the 5 positions after that residue in the sequence.
(1.3) DISORDER is used to compute the probability that each residue of the antigen protein falls within a disordered region of the protein. Because neighboring residues can influence the central residue, the disordered region feature of the central residue consists of all the disordered region probabilities within a window spanning the 5 positions before and the 5 positions after that residue in the sequence.
(1.4) Residue pairs, that is, combinations of two residues acting together, play an important role in forming protein functional groups and are widely used to analyze and predict protein structure and functional sites. There are 20 naturally occurring amino acids, so there are 20 × 20 = 400 corresponding amino acid pairs, namely "AA, AC, ..., VV".
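A dipeptide composition vector of length 400 can be computed along the following lines. The sketch assumes that a pair means two residues adjacent in the sequence and that counts are normalized to relative frequencies; the source fixes neither detail, and the ordering of the 400 pairs here is simply the one produced by the chosen alphabet.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def dipeptide_composition(sequence):
    """Return 400 relative frequencies of adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

For example, the sequence "AAC" contains the pairs "AA" and "AC", so those two entries are 0.5 each and the remaining 398 are zero.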
(1.5) Physicochemical properties are closely related to the function of protein residues. Six physicochemical attributes are selected here: hydrophilicity, flexibility, accessibility, polarity, exposed surface and turns.
Step (2) specifically comprises the following steps:
(2.1) Fisher-Markov is used to compute the relevance score between each feature from step (1) and the class label, and the features are ranked by relevance score from high to low. A higher score indicates that the feature is more strongly correlated with the class label; a lower score indicates a weaker correlation.
(2.2) From the ranked relevance score list obtained in step (2.1), the optimal feature subset is selected with an incremental iterative strategy. Features are added to the feature pool one at a time, from highest to lowest relevance; after each addition a classifier is built on the pool for modeling and prediction, and its prediction performance is recorded and plotted. The number of features at the peak of the plot and the corresponding feature subset form the optimal feature subset.
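The rank-then-grow procedure of step (2) can be sketched as below. Note the substitution: a plain Fisher score stands in for the Fisher-Markov selector, which is a different and more elaborate method, so this shows only the overall shape of the pipeline. The `evaluate` callback is hypothetical; in practice it would train and validate a classifier on the given feature indices and return a performance score.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Between-class separation over within-class spread for one feature."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    gap = (mean(pos) - mean(neg)) ** 2
    if spread > 0:
        return gap / spread
    return float("inf") if gap > 0 else 0.0

def rank_features(X, labels):
    """Return feature indices sorted from most to least relevant."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def incremental_selection(ranked, evaluate):
    """Add features in ranked order; keep the prefix with the best score.

    Returns the selected indices and the full performance curve, whose
    peak corresponds to the optimal number of features.
    """
    best_score, best_k, curve = float("-inf"), 0, []
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        curve.append(score)
        if score > best_score:
            best_score, best_k = score, k
    return ranked[:best_k], curve
```

Returning the whole curve mirrors the plotting step in (2.2): the caller can inspect where performance peaks instead of trusting a single number.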
Step (3) specifically comprises the following steps:
(3.1) Conventional machine learning is built on the assumption of balanced data sets: during model construction, misclassified positive and negative samples incur the same penalty, and the algorithm reaches its best prediction performance by minimizing the total penalty. On an imbalanced data set (one whose ratio of positive to negative samples is severely skewed), this search for the minimum penalty tends to filter out the minority class as noise, so the minority class is never learned. To address this, a cost-sensitive strategy is introduced that assigns different penalties to the misclassification of positive and negative samples: a high penalty for misclassifying the minority class and a low penalty for misclassifying the majority class.
(3.2) Although a single weak classifier has weak discriminative power, a suitable combination of multiple weak classifiers can achieve discriminative power exceeding that of the best individual sub-classifier.
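The two ideas in step (3), class-dependent penalties and combining sub-classifiers, can be illustrated with a small sketch. The 10:1 cost ratio and the use of simple majority voting are assumptions made for illustration; the source states neither the cost values nor the combination rule.

```python
def weighted_penalty(y_true, y_pred, cost_fn=10.0, cost_fp=1.0):
    """Total misclassification penalty with class-dependent costs.

    cost_fn: penalty for missing a positive (minority) sample.
    cost_fp: penalty for flagging a negative (majority) sample.
    The 10:1 default is a placeholder, not a value from the source.
    """
    penalty = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            penalty += cost_fn
        elif t == 0 and p == 1:
            penalty += cost_fp
    return penalty

def majority_vote(predictions):
    """Combine sub-classifier outputs: positive if more than half vote 1."""
    n = len(predictions)
    return [1 if sum(votes) * 2 > n else 0 for votes in zip(*predictions)]
```

With one positive among ten samples, an all-negative prediction has a lower penalty than a prediction with two false alarms when both costs are equal, but a higher penalty once missed positives cost ten times as much. That reversal is what pushes a cost-sensitive learner toward the minority class.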
Step (4) specifically comprises the following steps:
(4.1) First, the three-dimensional structure data of all antigen proteins with known epitopes in the sample data are obtained, together with the three-dimensional coordinates of all their epitopes.
(4.2) For each epitope, the distances between it and the other residues are tallied. The radius of the average clustering sphere is determined according to the principle of maximum enrichment density and minimum cluster group.
(4.3) Using the radius obtained in step (4.2), all predicted antigenic determinant residues are partitioned into regions. Predictions that cluster together are regarded as potential epitope-forming residues; one or two predicted antigenic determinant residues lying away from any aggregated region are regarded as false positives.
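Step (4.3) can be sketched as a radius-based grouping of predicted residues. The source does not name a specific clustering algorithm, so this example assumes simple chaining: two residues share a group if they are linked by a chain of neighbours each within the radius. Groups of three or more are kept and smaller ones are flagged as false positives. The radius value is an input here; deriving it from known epitopes as in step (4.2) is not shown.

```python
import math

def cluster_residues(coords, radius, min_size=3):
    """Group predicted residues whose coordinates chain together within `radius`.

    coords: dict mapping residue id -> (x, y, z).
    Returns (epitopes, false_positives): clusters with at least `min_size`
    members, and the ids left in smaller groups.
    """
    unvisited = set(coords)
    epitopes, false_positives = [], []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [r for r in unvisited
                    if math.dist(coords[current], coords[r]) <= radius]
            for r in near:
                unvisited.remove(r)
            cluster.extend(near)
            frontier.extend(near)
        if len(cluster) >= min_size:
            epitopes.append(sorted(cluster))
        else:
            false_positives.extend(cluster)
    return epitopes, sorted(false_positives)
```

The `min_size=3` default follows the text's rule that one or two isolated predictions count as false positives.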
1. The data set comprises two parts: the bound data set of Rubinstein (antigen-antibody complex structures) and the unbound data set of Liang (single antigen structures without antibody). These are benchmark data sets for conformational epitope prediction.
2. Feature description of antigen protein residues: see Table 1 for details.
Table 1. Feature description of antigen protein residues
After Fisher-Markov selection and incremental feature iteration were applied to the constructed feature space to obtain the optimal feature subset, the traditional method and the ensemble learning methods were applied to the bound and unbound data sets respectively, and their prediction performance was compared with that of the cost-sensitive ensemble strategy. Tables 2 and 3 give the prediction results of the different ensemble learning methods on the bound and unbound data sets.
Table 2. Comparison of the results of different ensemble learning methods on the bound data set
Table 3. Comparison of the results of different ensemble learning strategies on the unbound data set
Tables 2 and 3 show that the conventional machine learning method has almost no predictive ability on the imbalanced data sets. Although its accuracy exceeds 90%, this results from treating almost all samples indiscriminately as negative; its specificity is therefore very high (reaching 99.9%) while its sensitivity is very low, only about 1%.
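The pattern of high accuracy combined with near-zero sensitivity is easy to reproduce. The sketch below computes the three metrics from label lists; the 5-positive, 95-negative toy data mentioned afterwards is invented for illustration and is not taken from the patent's tables.

```python
def confusion_metrics(y_true, y_pred):
    """Return sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
    }
```

A classifier that labels every one of 5 positive and 95 negative samples as negative scores 0.95 accuracy and 1.0 specificity but 0.0 sensitivity, which is why accuracy alone is misleading on imbalanced data.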
Compared with the traditional method, which does not process the samples in any way, the simple ensemble greatly improves recognition of the minority class: from 0.8% to 19.6% on the bound data set and from 1.1% to 25.6% on the unbound data set. The simple ensemble strategy performs multiple rounds of random sampling over the whole sample set, and each sampled group yields an independent classification model. It is the simplest ensemble classification strategy; its advantages are simplicity of implementation and speed, and its shortcoming is limited performance.
The balance cascade ensemble strategy improves on the simple ensemble. In balance cascade sampling, majority-class data that have already been sampled no longer take part in subsequent sampling, which ensures that the samples cover as much of the majority-class data as possible. Compared with the simple ensemble strategy, the balance cascade achieves somewhat better prediction performance.
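The difference between the two sampling schemes can be sketched as follows. Only the majority-class sampling is shown, following the description above. The sketch assumes each subset has the same size as the minority class, which the source does not state, and the function names are hypothetical.

```python
import random

def simple_ensemble_subsets(majority, minority_size, rounds, seed=0):
    """Each round draws an independent random subset of the majority class.

    The same sample may appear in several rounds, and some may never appear.
    """
    rng = random.Random(seed)
    return [rng.sample(majority, minority_size) for _ in range(rounds)]

def cascade_subsets(majority, minority_size, seed=0):
    """Draw without reuse: sampled majority items are excluded from later rounds.

    Every majority sample ends up in exactly one subset; the last subset
    may be smaller when the sizes do not divide evenly.
    """
    rng = random.Random(seed)
    pool = list(majority)
    rng.shuffle(pool)
    return [pool[i:i + minority_size] for i in range(0, len(pool), minority_size)]
```

Each subset would then be paired with the full minority class to train one sub-classifier.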
The cost-sensitive ensemble strategy assigns different cost values to positive and negative samples and, by finding the classification with the optimal expected prediction cost, lets the classifier automatically seek the minimum prediction error penalty. This makes every sub-classifier focus on the minority-class samples and thus greatly increases the recognition rate for minority-class samples. The cost-sensitive strategy achieved recognition rates of 64.8% and 70.4% on the bound and unbound data sets respectively, demonstrating the effectiveness of the method.
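One standard way to make the expected-cost argument concrete is the following derivation. The symbols are assumptions introduced here, not notation from the patent: $p = P(y = 1 \mid x)$ is the estimated probability that residue $x$ is positive, $C_{FN}$ is the cost of missing a positive, and $C_{FP}$ is the cost of a false alarm.

```latex
\begin{aligned}
\text{expected cost of predicting negative} &= p \, C_{FN} \\
\text{expected cost of predicting positive} &= (1 - p) \, C_{FP} \\
\text{predict positive} \iff (1 - p) \, C_{FP} < p \, C_{FN}
  &\iff p > \frac{C_{FP}}{C_{FP} + C_{FN}}
\end{aligned}
```

With equal costs the threshold is $1/2$. Raising $C_{FN}$ above $C_{FP}$ lowers the threshold, so residues with a modest positive probability are still labeled positive, which is consistent with the higher minority-class recognition described above.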
Compared with traditional drug and vaccine development approaches, predicting antigen protein epitopes computationally can provide potential candidate epitopes much faster, which offers practical help to biologists and reduces the risk associated with the huge investment in drug research and development. Compared with previous research methods, the present invention has two main innovations: 1. it uses a cost-sensitive ensemble strategy for the first time, shifting the emphasis of prediction from overall accuracy to the prediction of minority-class (positive sample) data, which significantly improves prediction performance; 2. it uses a spatial clustering algorithm to further analyze the prediction results, excluding sparsely distributed residues while deeming predicted antigenic determinant residues that cluster together able to form potential epitopes. The method can further improve prediction accuracy and has considerable practical significance.

Claims (2)

1. A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering, characterized by comprising the following steps:
(1) Feature construction: for the sample data, compute descriptive features of the antigen protein to obtain the feature space of the sample data;
(2) Feature selection: use Fisher-Markov and an incremental iterative feature selection method to select the optimal feature subset;
(3) Cost-sensitive ensemble learning: use a cost-sensitive ensemble strategy that assigns different misclassification penalty parameters to the severely imbalanced samples, significantly improving the recognition rate for the minority positive samples;
(4) Spatial clustering: apply a spatial clustering algorithm to the predicted antigenic determinant residues, and deem the antigenic determinant residues lying within the set threshold to be epitopes.
2. The sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering according to claim 1, characterized in that step (1) specifically comprises the following steps:
(1.1) Evolutionary conservation feature: use PSIBLAST to compute the position-specific scoring matrix of the antigen sequence; normalize each amino acid substitution value in the resulting scoring matrix with the logistic function to obtain the evolutionary conservation score of that position; the evolutionary conservation feature of a residue consists of all the evolutionary conservation scores within a window spanning the 5 positions before and the 5 positions after that residue in the sequence;
(1.2) Secondary structure feature: use PSIPRED to compute, for each residue of the antigen protein, the probability matrix of forming each secondary structure, i.e. helix, coil or strand; the secondary structure feature of a residue consists of all the secondary structure probabilities within a window spanning the 5 positions before and the 5 positions after that residue in the sequence;
(1.3) Disordered region feature: use DISORDER to compute, for each residue of the antigen protein, the probability matrix of belonging to an ordered region or a disordered region; the disordered region feature of a residue consists of all the disordered region probabilities within a window spanning the 5 positions before and the 5 positions after that residue in the sequence;
(1.4) Dipeptide composition feature: residues in a protein often combine in pairs to form stable functional residue pairs, which are very important for analyzing and predicting protein structure and function; from the different pairwise combinations of the 20 amino acids, the composition of the 400 different dipeptides in a given protein is counted;
(1.5) Physicochemical attributes: select 6 physicochemical properties that have been shown to be closely related to the function of antigen protein residues, namely hydrophilicity, flexibility, accessibility, polarity, exposed surface and turns;
Step (2) specifically comprises the following steps:
(2.1) Rank the features with the Fisher-Markov method: use the Fisher-Markov selector to compute the relevance between each feature from step (1) and the class label, and rank the features by relevance value from high to low;
(2.2) Select the optimal feature subset with an incremental feature strategy: from the ranked features, add features to the feature pool one at a time from highest to lowest relevance, build a classifier for modeling, learning and prediction, and select the optimal number of features according to prediction performance; the corresponding feature subset is the optimal feature subset;
Step (3) specifically comprises the following steps:
(3.1) Use a cost-sensitive ensemble to handle the severe imbalance between positive and negative sample data: conventional machine learning methods perform poorly on classification problems with imbalanced positive and negative samples because of an inherent defect, namely a tendency to ignore the minority class in pursuit of higher accuracy; a cost-sensitive ensemble is introduced to handle the imbalance: different costs are first assigned to positive and negative samples, so that misclassified positive and negative samples penalize the prediction result differently, and the classifier, in pursuit of a better result, pays attention to recognizing the minority class;
(3.2) Build sub-classifiers with support vector machines: base classifiers are built with the LibSVM machine learning tool, and gridsearchforSVM.m is used to search for the optimal values of the parameter c and its accompanying parameter; multiple sub-classifiers together form the ensemble classifier, improving the recognition accuracy of the model;
In step (4), because epitopes generally tend to be enriched in the same region, the predicted antigenic determinant residues are spatially clustered, and regions of higher cluster density are identified as regions with a higher probability of forming epitopes; this specifically comprises the following steps:
(4.1) In the sample data, tally the spatial coordinates of the antigenic determinant residues on the surface of antigen proteins with known epitopes; cluster all antigenic determinant residues according to the principle of maximum enrichment density and minimum cluster group, and obtain the radius of the average clustering sphere;
(4.2) Using the computed radius of the clustering sphere, partition the previously predicted antigenic determinant residues into clusters; within a region enriched in antigenic determinant residues, all residues are identified as epitopes; a region containing only one or two predicted antigenic determinant residues is regarded as false positive data, i.e. non-epitope.
CN201610207437.1A 2016-04-06 2016-04-06 A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering Expired - Fee Related CN105868583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610207437.1A CN105868583B (en) 2016-04-06 2016-04-06 A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610207437.1A CN105868583B (en) 2016-04-06 2016-04-06 A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering

Publications (2)

Publication Number Publication Date
CN105868583A (en) 2016-08-17
CN105868583B (en) 2018-08-10

Family

ID=56626985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610207437.1A Expired - Fee Related CN105868583B (en) 2016-04-06 2016-04-06 A sequence-based method for predicting epitopes using cost-sensitive ensemble learning and clustering

Country Status (1)

Country Link
CN (1) CN105868583B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169312B (en) * 2017-05-27 2020-05-08 南开大学 Low-complexity natural disordered protein prediction method
CN107169312A (en) * 2017-05-27 2017-09-15 南开大学 Low-complexity natural disordered protein prediction method
CN107341363A (en) * 2017-06-29 2017-11-10 河北省科学院应用数学研究所 Prediction method of protein epitope
CN107341363B (en) * 2017-06-29 2020-09-22 河北省科学院应用数学研究所 Prediction method of protein epitope
CN110060738A (en) * 2019-04-03 2019-07-26 中国人民解放军军事科学院军事医学研究院 Method and system for predicting bacterial protective antigen proteins based on machine learning techniques
CN110010248B (en) * 2019-04-17 2023-01-10 电子科技大学 Readmission risk prediction method based on cost-sensitive integrated learning model
CN110010248A (en) * 2019-04-17 2019-07-12 电子科技大学 Readmission risk prediction method based on cost-sensitive integrated learning model
CN110428865A (en) * 2019-08-14 2019-11-08 信阳师范学院 Method for high-throughput prediction of antifreeze proteins
CN110444249A (en) * 2019-08-14 2019-11-12 信阳师范学院 Method for predicting fluorescent protein based on calculation
CN110444249B (en) * 2019-08-14 2022-02-01 信阳师范学院 Method for predicting fluorescent protein based on calculation
CN110689544A (en) * 2019-09-06 2020-01-14 哈尔滨工程大学 Method for segmenting delicate target of remote sensing image
CN111339165A (en) * 2020-02-28 2020-06-26 重庆邮电大学 Mobile user exit characteristic selection method based on Fisher score and approximate Markov blanket
CN111339165B (en) * 2020-02-28 2022-06-03 重庆邮电大学 Mobile user exit characteristic selection method based on Fisher score and approximate Markov blanket
CN114242169A (en) * 2021-12-15 2022-03-25 河北省科学院应用数学研究所 Antigen epitope prediction method for B cells
CN114242169B (en) * 2021-12-15 2023-10-20 河北省科学院应用数学研究所 Antigen epitope prediction method for B cells
CN115205570A (en) * 2022-09-14 2022-10-18 中国海洋大学 Unsupervised cross-domain target re-identification method based on comparative learning
CN116130005A (en) * 2023-01-30 2023-05-16 深圳新合睿恩生物医疗科技有限公司 Tandem design method and device for multi-epitope vaccine, equipment and storage medium
CN116130005B (en) * 2023-01-30 2023-06-16 深圳新合睿恩生物医疗科技有限公司 Tandem design method and device for multi-epitope vaccine, equipment and storage medium

Also Published As

Publication number Publication date
CN105868583B (en) 2018-08-10

Similar Documents

Publication Publication Date Title
CN105868583A (en) Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence
CN102222178B (en) Method for screening and/or designing medicines aiming at multiple targets
CN109887541A (en) Method and system for predicting target proteins that bind small molecules
CN107038348A (en) Drug target prediction method based on protein-ligand interaction fingerprints
CN106446607A (en) Drug target virtual screening method based on interactive fingerprints and machine learning
CN102779240B (en) Inherent irregular protein structure forecasting method based on kernel canonical correlation analysis
CN106203377A (en) Coal dust image recognition method
CN102521527A (en) Method for predicting space epitope of protein antigen according to antibody species classification
CN107194207A (en) Protein-ligand binding site prediction method based on granular support vector machine ensemble
CN111402967A (en) Method for improving virtual screening capability of docking software based on machine learning algorithm
CN102884203A (en) Query sequence genotype or subtype classification method
Wang et al. G-DipC: an improved feature representation method for short sequences to predict the type of cargo in cell-penetrating peptides
CN115240762A (en) Multi-scale small molecule virtual screening method and system
CN104615910B (en) Method for predicting helix interactions of α transmembrane proteins based on random forest
CN101110095B (en) Method for batch detecting susceptibility gene of common brain disease
CN113421658B (en) Drug-target interaction prediction method based on neighbor attention network
Saha et al. Application of data mining in protein sequence classification
Van Buren et al. Artificial intelligence and deep learning to map immune cell types in inflamed human tissue
CN109326329A (en) Zinc-binding protein action site prediction method based on ensemble learning under class imbalance
CN110598836B (en) Metabolic analysis method based on improved particle swarm optimization algorithm
CN114242159B (en) Method for constructing antigen peptide presentation prediction model, and antigen peptide prediction method and device
CN108388774A (en) Online analysis method for peptide-spectrum match data
Yu et al. A supervised approach to detect protein complex by combining biological and topological properties
CN101609486A (en) Method for identifying G protein-coupled receptor superfamilies and Web service system thereof
CN114999566B (en) Drug repositioning method and system based on word vector characterization and attention mechanism

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2018-08-10

Termination date: 2021-04-06