CN109326329B - Zinc binding protein action site prediction method


Info

Publication number
CN109326329B
Authority
CN
China
Legal status
Active
Application number
CN201811353819.0A
Other languages
Chinese (zh)
Other versions
CN109326329A (en)
Inventor
李慧 (Li Hui)
Current Assignee
Jinling Institute of Technology
Original Assignee
Jinling Institute of Technology
Priority date
2018-11-14
Filing date
2018-11-14
Publication date
2020-07-07
Application filed by Jinling Institute of Technology
Priority to CN201811353819.0A
Publication of CN109326329A
Application granted
Publication of CN109326329B
Status: Active
Anticipated expiration

Abstract

The invention discloses a method for predicting zinc binding protein action sites. Protein source data are first preprocessed according to the characteristics of zinc binding protein action sites. The class imbalance of the action sites is then treated by random down-sampling, which yields several balanced sub-data sets. On each balanced sub-data set, discriminative biochemical features of the protein are selected and encoded as feature vectors. The feature vectors are used as input to support vector machine (SVM) base classifiers, whose results are used to calculate sample weights; a sample-weighted probabilistic neural network model is then built, and the SVM base classification models and the sample-weighted probabilistic neural network model are integrated into the prediction model. Finally, the prediction model is used to identify the zinc binding protein action sites in the target sample.

Description

Zinc binding protein action site prediction method
Technical Field
The invention relates to a zinc binding protein action site prediction method that identifies zinc binding protein action sites with an ensemble-learning classification model under imbalanced classification, and belongs to the interdisciplinary field of proteomics and computer science.
Background
With the completion of the Human Genome Project, the life sciences have entered the post-genome era, and the proteins expressed by genes have become one of the most important research subjects in the life and natural sciences. Proteins are the basic organic substances that make up cells, are the material basis of life, and play a decisive role in biological processes. This role, however, is rarely determined by a single protein: in most cases, a protein must interact with other proteins or with ligands to perform a specific biological function.
In cells, proteins are both the executors and the carriers of vital activities, and they perform key functions such as DNA synthesis, signal transduction, transcriptional activation of genes, metabolic processes and viral defense by interacting with ligands. In addition, the study of protein interactions greatly benefits the treatment of many diseases, particularly those caused by invading viral proteins such as those of the Ebola virus: it can reveal the pathogenesis of a disease, identify drug targets and guide the development of new drugs.
Metal ions bind to proteins as cofactors and play a decisive role in protein function and even in some life processes. Zinc is the second most abundant metal ion in organisms after iron, and it plays an important regulatory role in growth and development, disease control, DNA synthesis and other processes. Zinc deficiency can lead to diseases such as age-related degenerative diseases, malignancies and Wilson's disease. Zinc also has important effects on aging, apoptosis, immune function and oxidative stress. Zinc ions bind to proteins to perform catalytic, structure-stabilizing and coordinating functions.
Zinc binding protein action sites are mainly identified by biochemical experiments. Although experiments can determine the interaction sites between a protein and zinc ions, they are expensive, time-consuming and labor-intensive. Moreover, because different experiments impose different constraints and rely on different principles, their results contain a certain proportion of false negatives and false positives. Data obtained by experimental techniques alone are therefore far from meeting the needs of biological research.
With the development of information technology and the emergence of massive biological data, automatically identifying zinc binding protein action sites with computational methods such as data mining and machine learning has become an inevitable trend. Such methods are cheap and fast, can compensate for the shortcomings of experiments, and can provide direct support and guidance for expensive wet-lab experiments.
Predicting zinc binding protein action sites is a binary classification problem in which true binding sites are few and non-binding sites make up a large proportion of the data, so it is a typical imbalanced classification problem. Existing prediction methods build classification models with data mining and similar techniques that treat the two classes equally and ignore the imbalance, which leads to low prediction accuracy for zinc binding protein action sites. Studying the imbalance in zinc binding site prediction and improving the classification accuracy on the minority class is therefore of significant research value.
Disclosure of Invention
The invention aims to provide an ensemble-learning-based method for predicting zinc binding protein action sites that addresses the imbalanced classification problem in this task.
To solve the above technical problem, the invention adopts the following technical solution:
A method for predicting zinc binding protein action sites, comprising the following steps:
Step one: preprocess the protein source data according to the characteristics of zinc binding protein action sites;
Step two: balance the imbalanced zinc binding protein action site data by random down-sampling to obtain several balanced sub-data sets;
Step three: on each balanced sub-data set, select discriminative biochemical features of the protein and encode them as feature vectors;
Step four: use the feature vectors as input to SVM base classifiers, calculate sample weights, build a sample-weighted probabilistic neural network model, and finally integrate the SVM base classification models and the sample-weighted probabilistic neural network model into a prediction model;
Step five: identify the zinc binding protein action sites in the target sample with the prediction model obtained in step four.
In step one, preprocessing removes the following noisy data:
(1) peptide chains with more than 70% homology;
(2) duplicate and short protein chains, and erroneous or unreliable data;
(3) chains that meet the criterion of less than 20% sequence redundancy.
In step two, balancing is performed by random down-sampling: the majority class is randomly down-sampled to the size of the minority class, forming several balanced sub-data sets. The majority class consists of non-binding sites and the minority class consists of zinc binding sites.
In step three, the discriminative biochemical features are the position-specific scoring matrix (PSSM), the conservation score and RW-GRMTP (relative weight of gapless real matches to pseudocounts). The PSSM is normalized and processed with a histogram and a sliding window to obtain a 20-dimensional vector; the 20-dimensional conservation score is converted into a single value; RW-GRMTP is normalized into a 2-dimensional vector. Together these form a 23-dimensional feature vector.
In step four, an SVM is trained on each balanced sub-data set, and the prediction error rate $e_j$ and the importance weight $\alpha_j$ of each classification model are calculated according to equations (1) and (2):
[Equation (1)]
[Equation (2)]
where the full data set is $D = \{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, $x_i \in X$, X denotes the instance space of the classification problem, $y_i \in \{1,-1\}$, $i = 1,2,\dots,n$, and n is the number of samples; $w_{mi}$ is the sample weight, initialized to $1/n$, i.e. $w_1 = (w_{11}, w_{12}, \dots, w_{1n})$ with $w_{1i} = 1/n$, $i = 1,2,\dots,n$; $m = 1,2$. The SVM base classifier is trained on each of the k balanced data sets, giving k classification results $C_{svm\_j}(x)$, $j = 1,\dots,k$.
The weight of each sample is then updated and normalized: the weight of a correctly classified sample is decreased and the weight of a misclassified sample is increased, according to equation (3):
[Equation (3)]
A sample-weighted probabilistic neural network model is then constructed: the protein feature data are weighted, the weighted samples are used as the input of the probabilistic neural network, and the network is used for prediction. This method is denoted SWPNN (sample-weighted probabilistic neural network), and its prediction result is SWPNN(x).
The SVM base classification models and the sample-weighted probabilistic neural network model are integrated to obtain the prediction model SSWPNN = {SVM, SWPNN, kernelopt, spread, f}, where kernelopt and spread are the parameters of the SVM and SWPNN classifiers, respectively, and f is defined by equation (4). The corresponding weight $\beta_j$ is calculated from the error rate.
[Equation (4)]
where δ is a threshold, and $C_{svm\_j}(x)$ and SWPNN(x) are the outputs of the SVM and SWPNN classifiers, respectively. An output greater than 0 is predicted as a positive sample and an output less than 0 as a negative sample. If SVM(x) is positive but smaller than the threshold δ and SWPNN(x) predicts a negative example, the final integrated prediction is negative; otherwise, the SVM(x) result is taken as the final decision.
In step five, each SSWPNN integration model predicts the whole test data set, giving several classification results; these results are combined by weighted integration according to equation (5) to identify the zinc binding protein action sites in the target sample:
[Equation (5)]
Beneficial effects:
From a machine-learning perspective, the invention provides a new ensemble-learning-based method for recognizing zinc binding protein action sites under imbalanced classification. It effectively addresses prediction in the imbalanced setting and improves prediction accuracy. With extension, the invention can also be applied to predicting and identifying the action sites of other types of metal ion binding proteins.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is an overall block diagram of the method of the present invention.
FIG. 2 is a framework diagram of the zinc binding protein action site classifier based on the SVM and SWPNN models.
FIG. 3 is a diagram of the prediction process of the SSWPNN classifier.
Detailed Description
The invention will be better understood from the following examples.
The general flow of the present invention is shown in FIG. 1.
To predict zinc binding protein action sites on an imbalanced data set, the invention balances the data by down-sampling so that the two classes become balanced. Using ensemble techniques, it then builds a classifier that integrates a support vector machine with a sample-weighted probabilistic neural network, and uses this model to classify and identify zinc binding protein action sites.
The specific implementation steps are as follows:
1. Balancing treatment
Zinc binding protein action sites form the minority class (negative class), and non-binding sites form the majority class (positive class). Random under-sampling without replacement is applied to the majority class. To avoid losing useful majority-class information through a single random under-sampling, the under-sampling is repeated several times without replacement over the full data set. Each draw extracts as many majority samples as there are minority samples, so the majority class is divided into k subsets, and each subset is combined with the minority class to form a balanced data set $D_1, D_2, \dots, D_k$. The procedure is given in Algorithm 1:
Algorithm 1: data balancing
Input: protein sequence sample data D
Output: balanced sub-data sets D1, D2, …, Dk
BEGIN
  Divide(D);   // split D into MinoritySample and MajoritySample
  N = CountUp(MinoritySample);
  k = Floor(CountUp(MajoritySample) / N);
  For (i = 1; i <= k; i++)
    ExtractedSample_i = RandomExtract(MajoritySample, N);
    D_i = Merge(MinoritySample, ExtractedSample_i);
    MajoritySample = MajoritySample - ExtractedSample_i;
  End For
END
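Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: `balance_by_undersampling` is a hypothetical name, samples are treated as opaque items, and k is taken as the number of whole minority-sized chunks the majority class can supply. Shuffling the majority class once and then cutting it into consecutive chunks is equivalent to k successive random draws without replacement.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balance_by_undersampling(
    minority: Sequence[T], majority: Sequence[T], seed: int = 0
) -> List[List[T]]:
    """Build k balanced data sets D_1..D_k from an imbalanced sample set.

    The majority class is shuffled once and cut into k disjoint chunks of
    N = len(minority) samples, which matches k random draws without
    replacement. Each chunk is merged with the full minority class.
    Majority samples left over after k * N draws are not used.
    """
    n = len(minority)
    if n == 0:
        raise ValueError("the minority class is empty")
    pool = list(majority)
    random.Random(seed).shuffle(pool)  # seeded for reproducibility
    k = len(pool) // n  # number of balanced subsets
    return [list(minority) + pool[j * n:(j + 1) * n] for j in range(k)]
```

With 3 minority samples and 10 majority samples, this returns k = 3 balanced sets of 6 samples each, and one majority sample is left unused.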
2. Feature representation
The following discriminative biochemical features are selected to represent each residue and form the feature vector set: the position-specific scoring matrix (PSSM), the conservation score and RW-GRMTP (relative weight of gapless real matches to pseudocounts). The PSSM is normalized and processed with a histogram and a sliding window to obtain a 20-dimensional vector; the 20-dimensional conservation score is converted into a single value; RW-GRMTP is normalized into a 2-dimensional vector. Together these form a 23-dimensional feature vector.
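The sketch below shows one way the 20 + 1 + 2 = 23 dimensions could be assembled for a single residue. Several details are assumptions, not taken from the patent. PSSM rows are normalized with the logistic function and averaged over a sliding window, which stands in for the unspecified histogram step. The 20-dimensional conservation profile is assumed to be PSI-BLAST's weighted observed percentages and is reduced to one value through normalized Shannon entropy. The two RW-GRMTP-related columns are assumed to be supplied per residue and are min-max normalized over the chain. The function name and the window size of 7 are hypothetical.

```python
import math
from typing import List, Sequence


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def residue_features(
    pssm: Sequence[Sequence[float]],          # L x 20 raw PSSM scores
    conservation: Sequence[Sequence[float]],  # L x 20 conservation profile (assumed)
    rw: Sequence[Sequence[float]],            # L x 2 RW-GRMTP-related columns (assumed)
    i: int,                                   # index of the residue to encode
    window: int = 7,                          # hypothetical window size
) -> List[float]:
    length = len(pssm)
    half = window // 2
    # 20 dims: logistic-normalized PSSM rows averaged over the window around i
    acc = [0.0] * 20
    count = 0
    for p in range(max(0, i - half), min(length, i + half + 1)):
        for a in range(20):
            acc[a] += _sigmoid(pssm[p][a])
        count += 1
    pssm_part = [v / count for v in acc]
    # 1 dim: 1 - normalized entropy (1 = fully conserved, 0 = uniform)
    total = sum(conservation[i])
    if total > 0:
        probs = [c / total for c in conservation[i] if c > 0]
        entropy = -sum(q * math.log2(q) for q in probs)
    else:
        entropy = math.log2(20)
    cons_part = [1.0 - entropy / math.log2(20)]
    # 2 dims: min-max normalization of each column over the whole chain
    rw_part = []
    for c in range(2):
        col = [row[c] for row in rw]
        lo, hi = min(col), max(col)
        rw_part.append(0.0 if hi == lo else (rw[i][c] - lo) / (hi - lo))
    return pssm_part + cons_part + rw_part
```

Concatenating the three parts always yields 23 values, which matches the dimensionality stated above. Other normalizations would keep the same overall shape.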
3. Ensemble of the support vector machine and the sample-weighted probabilistic neural network
The SVM base classifiers are trained first, the samples are weighted according to their classification results, and a weighted probabilistic neural network model is then trained to handle the "hard" samples near the decision boundary that are easily misclassified.
Let the full data set be $D = \{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, $x_i \in X$, where X denotes the instance space of the classification problem, $y_i \in \{1,-1\}$, $i = 1,2,\dots,n$, and n is the number of samples. The procedure is as follows:
Step 1: train SVM classifiers on the balanced sub-data sets;
The SVM base classifier is trained on each of the k balanced sub-data sets, and 5-fold cross-validation yields k classification results $C_{svm\_j}(x)$, $j = 1,\dots,k$. The prediction error rate $e_j$ and the importance weight $\alpha_j$ of each classification model are calculated with equations (1) and (2). In equation (1), $w_{mi}$ is the sample weight, initialized to $1/n$, i.e. $w_1 = (w_{11}, w_{12}, \dots, w_{1n})$ with $w_{1i} = 1/n$, $i = 1,2,\dots,n$; $m = 1,2$.
[Equation (1)]
[Equation (2)]
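The bodies of equations (1) and (2) are not reproduced in this text. The surrounding description (a weighted error over samples with initial weights 1/n, followed by an importance weight for each model) matches the standard AdaBoost quantities, so the sketch below assumes those forms: $e_j = \sum_i w_{mi}\,[C_{svm\_j}(x_i) \ne y_i]$ and $\alpha_j = \tfrac{1}{2}\ln\frac{1-e_j}{e_j}$. Treat both formulas as assumptions; the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_error(
    weights: Sequence[float], y_true: Sequence[int], y_pred: Sequence[int]
) -> float:
    """e_j: summed weight of the samples the j-th SVM misclassifies.

    Assumes labels in {1, -1} and weights that sum to 1 (initially w_1i = 1/n).
    """
    return sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)


def importance_weight(e: float, eps: float = 1e-10) -> float:
    """alpha_j = 0.5 * ln((1 - e_j) / e_j), the AdaBoost form (assumed)."""
    e = min(max(e, eps), 1.0 - eps)  # clamp to avoid division by zero and log(0)
    return 0.5 * math.log((1.0 - e) / e)
```

For example, with four equally weighted samples and one error, $e_j = 0.25$ and $\alpha_j = \tfrac{1}{2}\ln 3$. A model no better than chance ($e_j = 0.5$) gets weight 0.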
Step 2: calculate and normalize the current sample weights;
After the first round of SVM prediction, the weight of a correctly classified sample is reduced in the next round, and the weight of a misclassified sample is increased. The sample weights are calculated with equation (3):
[Equation (3)]
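Equation (3) is also missing from this text. The described behavior (shrink the weights of correct samples, grow the weights of misclassified ones, then normalize) is what the AdaBoost update does, so this sketch assumes $w_{m+1,i} = w_{mi}\exp(-\alpha\, y_i h(x_i)) / Z_m$, where $Z_m$ is the normalizer. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def update_sample_weights(
    weights: Sequence[float],
    y_true: Sequence[int],
    y_pred: Sequence[int],
    alpha: float,
) -> List[float]:
    """AdaBoost-style update (assumed form of equation (3)).

    With labels in {1, -1}, y_i * h(x_i) is +1 for a correct prediction (the
    weight shrinks when alpha > 0) and -1 for an error (the weight grows).
    Dividing by Z_m renormalizes the weights to sum to 1.
    """
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(raw)
    return [r / z for r in raw]
```

After one update with $\alpha = \tfrac{1}{2}\ln 3$, the single misclassified sample from the example above carries half of the total weight, and each correct sample carries 1/6. This is how the update shifts emphasis onto the hard samples near the boundary.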
Step 3: train the sample-weighted PNN predictor SWPNN;
The feature samples are weighted with the weights calculated in Step 2, and a probabilistic neural network model is trained on the weighted data. The resulting method is denoted SWPNN, and its prediction result is SWPNN(x). The framework of the zinc binding site classifier based on the SVM and SWPNN models is shown in FIG. 2.
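The text does not say exactly how the sample weights enter the probabilistic neural network. The sketch below assumes the following: each training sample's Gaussian pattern-layer output is scaled by its weight, `spread` is the kernel width (the SWPNN parameter named in the model tuple), and the output is a signed score, so that the "greater than 0 means positive" convention used later applies. The function name and the default spread are hypothetical.

```python
import math
from typing import Dict, Sequence


def swpnn_score(
    x: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    sample_w: Sequence[float],
    spread: float = 0.5,
) -> float:
    """Signed SWPNN(x) output in [-1, 1]; > 0 means class +1, < 0 class -1.

    Pattern layer: one Gaussian kernel of width `spread` per training sample,
    scaled by that sample's weight (assumed weighting scheme).
    Summation layer: kernel outputs summed per class.
    Output: difference of the two class sums divided by their total.
    """
    sums: Dict[int, float] = {1: 0.0, -1: 0.0}
    for xi, yi, wi in zip(train_x, train_y, sample_w):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] += wi * math.exp(-d2 / (2.0 * spread ** 2))
    total = sums[1] + sums[-1]
    if total == 0.0:
        return 0.0  # no kernel support at all: undecided
    return (sums[1] - sums[-1]) / total
```

Because each balanced sub-data set has equal class sizes, summing the class kernels (rather than averaging them) does not bias the output toward either class. The sample weights from Step 2 therefore decide which neighbors dominate near the boundary.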
Step 4: integrate the SVM base classification models and the sample-weighted SWPNN classifier;
Integrating the SVM base classifiers with the sample-weighted probabilistic neural network gives a new prediction method, SSWPNN = {SVM, SWPNN, kernelopt, spread, f}, where kernelopt and spread are the parameters of the SVM and SWPNN classifiers, respectively, and f is defined by equation (4). The weight $\beta_j$ of each base classifier in the final classifier is calculated from its error rate.
[Equation (4)]
where δ is a threshold, and $C_{svm\_j}(x)$ and SWPNN(x) are the outputs of the SVM and SWPNN classifiers, respectively. An output greater than 0 is predicted as a positive sample and an output less than 0 as a negative sample. If SVM(x) is positive but smaller than the threshold δ and SWPNN(x) predicts a negative example, the final integrated prediction is negative; otherwise, the SVM(x) result is taken as the final decision.
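The rule for f is stated in words, so it can be written down directly. The only assumption is that a vetoed sample is returned as the value -1.0, since the text only says that it is judged negative. `fuse` is a hypothetical name.

```python
def fuse(svm_out: float, swpnn_out: float, delta: float) -> float:
    """f(x) as described for equation (4).

    If the SVM output is positive but below the threshold delta (a weak
    positive) and SWPNN predicts the negative class, return -1.0 (negative);
    otherwise return the SVM output unchanged.
    """
    if 0.0 < svm_out < delta and swpnn_out < 0.0:
        return -1.0
    return svm_out
```

SWPNN only acts as a veto on weak SVM positives. Confident SVM decisions and all SVM negatives pass through unchanged, which confines the extra classifier to the boundary region it was trained on.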
Step 5: each SSWPNN integration model from Step 4 predicts the whole data set, giving several classification results, which are combined by weighted integration with equation (5) to identify the zinc binding protein action sites. The framework is shown in FIG. 3.
[Equation (5)]
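Equation (5) is not reproduced either. A common way to combine k members weighted by their error rates is a signed weighted vote, $\operatorname{sign}\big(\sum_j \beta_j\, \operatorname{sign}(f_j(x))\big)$. The sketch below assumes that combination and also assumes that $\beta_j$ uses the same log-odds form as AdaBoost. Both function names are hypothetical, and ties are resolved to the negative class.

```python
import math
from typing import Sequence


def member_weight(e: float, eps: float = 1e-10) -> float:
    """beta_j from the error rate e_j; the AdaBoost log-odds form is assumed."""
    e = min(max(e, eps), 1.0 - eps)  # clamp to keep the log finite
    return 0.5 * math.log((1.0 - e) / e)


def ensemble_predict(fused: Sequence[float], betas: Sequence[float]) -> int:
    """Weighted vote over the k SSWPNN members (assumed form of equation (5)).

    Each member contributes +beta_j if its fused output f_j(x) > 0 and
    -beta_j otherwise; the sign of the total is the final label.
    """
    total = sum(b * (1 if f > 0 else -1) for f, b in zip(fused, betas))
    return 1 if total > 0 else -1
```

Members with lower error rates receive larger $\beta_j$. A single accurate member can therefore outvote several weaker ones: with outputs (0.3, -1.0, 0.6) and weights (0.5, 2.0, 0.4), the vote is negative.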
The method was tested on a data set of 392 protein chains and compared with four existing methods (Meta-zinc prediction, ZincExplorer, ZincFinder and ZincPred). It outperformed all of them both in overall prediction performance over the four residue types (C, H, E, D) and in prediction performance on each residue type individually.
The invention provides a concept and a method for predicting zinc binding protein action sites, and there are many ways to implement it. The above description is only a preferred embodiment of the invention. Those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications also fall within the protection scope of the invention. Components not specified in this embodiment can be implemented with existing technology.

Claims (1)

1. A method for predicting a zinc binding protein site of action, comprising the steps of:
step one: preprocessing protein source data according to the characteristics of zinc binding protein action sites;
step two: balancing the imbalanced zinc binding protein action site data by random down-sampling to obtain a plurality of balanced sub-data sets;
step three: selecting discriminative biochemical features of the protein on each of the balanced sub-data sets and encoding them as feature vectors;
step four: using the feature vectors as input to support vector machine base classifiers, calculating sample weights, constructing a sample-weighted probabilistic neural network model, and integrating the support vector machine base classification models with the sample-weighted probabilistic neural network model to obtain a prediction model;
step five: identifying zinc binding protein action sites in the target sample with the prediction model obtained in step four;
in step one, the preprocessing removes the following noisy data:
(1) peptide chains with more than 70% homology;
(2) duplicate and short protein chains, and erroneous or unreliable data;
(3) chains that meet the criterion of less than 20% sequence redundancy;
in step two, the balancing is random down-sampling: the majority class is randomly down-sampled to the size of the minority class, forming a plurality of balanced sub-data sets, where the majority class consists of non-binding sites and the minority class consists of zinc binding sites;
in step three, the discriminative biochemical features comprise the position-specific scoring matrix, the conservation score and RW-GRMTP; the position-specific scoring matrix is normalized and processed with a histogram and a sliding window to obtain a 20-dimensional vector; the 20-dimensional conservation score is converted into a single value; RW-GRMTP is normalized into a 2-dimensional vector; together these form a 23-dimensional feature vector;
in step four, an SVM is trained on each balanced sub-data set, and the prediction error rate $e_j$ and the importance weight $\alpha_j$ of each classification model are calculated according to equations (1) and (2):
[Equation (1)]
[Equation (2)]
where the full data set is $D = \{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, $x_i \in X$, X denotes the instance space of the classification problem, $y_i \in \{1,-1\}$, $i = 1,2,\dots,n$, and n is the number of samples; $w_{mi}$ is the sample weight, initialized to $1/n$, i.e. $w_1 = (w_{11}, w_{12}, \dots, w_{1n})$ with $w_{1i} = 1/n$, $i = 1,2,\dots,n$; $m = 1,2$; the SVM base classifier is trained on each of the k balanced sub-data sets to obtain k classification results $C_{svm\_j}(x)$, $j = 1,\dots,k$;
in step four, the current sample weights are calculated and normalized: the weight of a correctly classified sample is decreased and the weight of a misclassified sample is increased, according to equation (3):
[Equation (3)]
in step four, a sample-weighted probabilistic neural network model is constructed by weighting the protein feature data, using the weighted samples as input to the probabilistic neural network, and predicting with the network; this method is denoted SWPNN and its prediction result is SWPNN(x);
in step four, the prediction model SSWPNN = {SVM, SWPNN, kernelopt, spread, f} is obtained by integrating the support vector machine base classification models with the sample-weighted probabilistic neural network model, where kernelopt and spread are the parameters of the SVM and SWPNN classifiers, respectively, and f is defined by equation (4); the corresponding weight $\beta_j$ is calculated from the error rate:
[Equation (4)]
where δ is a threshold, and $C_{svm\_j}(x)$ and SWPNN(x) are the outputs of the SVM and SWPNN classifiers, respectively; an output greater than 0 is predicted as a positive sample and an output less than 0 as a negative sample; if SVM(x) is positive but less than the threshold δ and SWPNN(x) predicts a negative example, the final integrated prediction is negative, and otherwise the SVM(x) result is taken as the final decision;
in step five, each SSWPNN integration model predicts the whole data set, giving several classification results, which are combined by weighted integration to identify the zinc binding protein action sites in the target sample, as shown in equation (5):
[Equation (5)]

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811353819.0A CN109326329B (en) 2018-11-14 2018-11-14 Zinc binding protein action site prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811353819.0A CN109326329B (en) 2018-11-14 2018-11-14 Zinc binding protein action site prediction method

Publications (2)

Publication Number Publication Date
CN109326329A (en) 2019-02-12
CN109326329B (en) 2020-07-07

Family

ID=65257207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811353819.0A Active CN109326329B (en) 2018-11-14 2018-11-14 Zinc binding protein action site prediction method

Country Status (1)

Country Link
CN (1) CN109326329B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109979525A (en) * 2019-02-28 2019-07-05 Tianjin University Improved hormone-binding protein qualitative classification method
CN110689920B (en) * 2019-09-18 2022-02-11 Shanghai Jiao Tong University Protein-ligand binding site prediction method based on deep learning
CN111916148B (en) * 2020-08-13 2023-01-31 China Jiliang University Method for predicting protein interaction

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN104992079A (en) * 2015-06-29 2015-10-21 Nanjing University of Science and Technology Sampling-learning-based protein-ligand binding site prediction method
CN106250718A (en) * 2016-07-29 2016-12-21 於铉 N1-methyladenosine site prediction method based on an individually balanced Boosting algorithm
CN106250718B (en) * 2016-07-29 2018-03-02 於铉 N1-methyladenosine site prediction method based on individually balanced Boosting algorithms
CN107273714A (en) * 2017-06-07 2017-10-20 Nanjing University of Science and Technology ATP binding site prediction method combining protein sequence and structural information
CN107194207A (en) * 2017-06-26 2017-09-22 Nanjing University of Science and Technology Protein-ligand binding site prediction method based on granular support vector machine ensembles

Non-Patent Citations (3)

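Title
"Research on protein-vitamin binding site prediction based on imbalanced learning"; Zhu Feiyi (朱非易); China Master's Theses Full-text Database, Basic Sciences; 20160115 (No. 01); p. 27 *
"Research on protein subcellular localization prediction based on machine learning methods"; Ma Junwei (马军伟); China Doctoral Dissertations Full-text Database, Basic Sciences; 20120615 (No. 06); pp. 35, 55 *
"Research on methods for predicting protein interaction sites"; Wei Zhisen (魏志森); China Doctoral Dissertations Full-text Database, Basic Sciences; 20180715 (No. 07); pp. 37-38, 40-41, 56, 58-61, 63, 67 *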

Also Published As

Publication number Publication date
CN109326329A (en) 2019-02-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant