CN103955628A - Subspace fusion-based protein-vitamin binding location point predicting method - Google Patents


Info

Publication number
CN103955628A
CN103955628A (application CN201410164632.1A)
Authority
CN
China
Prior art keywords
protein
amino acid
feature
acid residue
subspace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410164632.1A
Other languages
Chinese (zh)
Other versions
CN103955628B (en)
Inventor
胡俊
於东军
何雪
李阳
沈红斌
杨静宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201410164632.1A priority Critical patent/CN103955628B/en
Publication of CN103955628A publication Critical patent/CN103955628A/en
Application granted granted Critical
Publication of CN103955628B publication Critical patent/CN103955628B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a subspace fusion-based protein-vitamin binding site prediction method. The method comprises: feature extraction and feature combination, in which the evolutionary information, secondary structure information and binding propensity information of a protein are extracted with PSI-BLAST, PSIPRED and a protein-vitamin binding site propensity table respectively, and the amino acid residues of the protein sequence are converted into vector form by a sliding window and serial combination; performing feature selection on the original feature space multiple times with multiple feature selection algorithms, the feature subset obtained by each feature selection forming one feature subspace, so that multiple feature subspaces are built; training one SVM (support vector machine) classifier on each feature subspace; fusing the trained SVM classifiers by weighted-average classifier fusion; and performing protein-vitamin binding site prediction on a query protein with the fused SVM predictor. The prediction method is fast in prediction speed and high in prediction accuracy.

Description

Subspace fusion-based protein-vitamin binding site prediction method
Technical field
The present invention relates to the field of protein-vitamin interactions in bioinformatics, and in particular to a subspace fusion-based protein-vitamin binding site prediction method.
Background technology
Interactions between proteins and vitamins play a vital role in metabolism and are ubiquitous and indispensable in life processes. Determining the binding sites between a protein and a vitamin by biological experiment consumes a great deal of time and money and is inefficient. With the rapid development of sequencing technologies and the continuous advance of human structural genomics, proteomics has accumulated a large number of protein sequences whose vitamin binding sites have not been annotated. There is therefore an urgent need to apply bioinformatics to research and develop intelligent prediction methods that can predict protein-vitamin binding sites quickly and accurately directly from the protein sequence; such methods are of great significance for discovering and understanding protein structure and physiological function.
At present, computational models for predicting protein-vitamin binding sites are very scarce. Only one computational model designed specifically for protein-vitamin binding site prediction has been found, namely VitaPred. VitaPred is the first predictor in the world designed specifically for locating protein-vitamin binding sites (B. Panwar, S. Gupta, and G. P. S. Raghava, "Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information," BMC Bioinformatics, vol. 14, Feb 7, 2013). VitaPred can predict binding between proteins and different kinds of vitamins (vitamin A, vitamin B, vitamin B6, etc.). Because different vitamin kinds differ from one another, VitaPred constructed four non-redundant datasets: 187 proteins that bind vitamins (this dataset does not distinguish vitamin kind), 31 proteins that bind vitamin A, 141 proteins that bind vitamin B, and 71 proteins that bind vitamin B6. VitaPred extracts the evolutionary information features represented by the position-specific scoring matrix of each amino acid residue and then inputs them into an SVM classification model to judge whether the residue is a protein-vitamin binding site. In addition, the VitaPred paper also tried other features, and combinations thereof, with SVM to predict protein-vitamin binding sites, but neither their prediction accuracy nor their extensibility was as good as combining evolutionary information features with SVM; the VitaPred method therefore represents the approach of combining evolutionary information features with SVM.
However, a comprehensive analysis of this sole prediction model shows that it applies the same method in the same feature subspace to the binding site prediction problems of proteins with different kinds of vitamins, and the resulting poor interpretability remains to be overcome; moreover, its prediction accuracy is still a considerable distance from practical application and urgently needs further improvement.
Summary of the invention
To overcome the shortcomings that mutually exclusive features in a single high-dimensional feature space keep prediction accuracy far from practical application and give poor interpretability, the object of the present invention is to propose a subspace fusion-based protein-vitamin binding site prediction method that is fast in prediction speed and high in prediction accuracy.
To achieve the above object, the technical solution adopted by the present invention is as follows:
The subspace fusion-based protein-vitamin binding site prediction method comprises the following steps:
Step 1, feature extraction and feature combination: use the PSI-BLAST algorithm and the PSIPRED algorithm to extract the evolutionary information features and the secondary structure information features of a protein respectively, and extract the binding propensity information features of the protein from a protein-vitamin binding site propensity table; these three kinds of features form the original feature space. Then use a sliding window and serial combination to convert each amino acid residue of the protein sequence into a vector representation;
Step 2, use the feature selection algorithms Joint Laplacian Feature Weights Learning, Fisher Score and Laplacian Score to perform feature selection on the original feature space several times; the feature subset obtained by each feature selection forms one feature subspace, so that multiple feature subspaces are built;
Step 3, train one SVM classifier for each feature subspace obtained in step 2;
Step 4, fuse the trained SVM classifiers by weighted-average classifier fusion; and
Step 5, perform protein-vitamin binding site prediction on a query protein with the fused SVM predictor.
Further, in an embodiment, the feature extraction and serial combination for a training protein in step 1 comprise the following steps:
Step 1-1: for a protein consisting of l amino acid residues, obtain its position-specific scoring matrix (PSSM) with the PSI-BLAST algorithm; this matrix has l rows and 20 columns, so that the primary structure (evolutionary) information of the protein is converted into a matrix representation, as in formula (1), whose columns A, C, ..., Y denote the 20 amino acid types and whose entry p_{i,j} denotes the likelihood that the i-th amino acid residue of the protein mutates into the j-th of the 20 amino acid types during evolution.
Each value in the PSSM is then normalized row by row with formula (2):
f(x) = 1/(1 + e^(-x))    (2)
This gives the normalized PSSM of formula (3). Next, a sliding window of size W extracts the feature matrix of each amino acid residue, i.e. the W normalized PSSM rows centred on that residue, as in formula (4). Finally, the feature matrix (4) is concatenated in row-major order into a feature vector of dimension 20*W:
f_i = (p_{i,1}^normalized, p_{i,2}^normalized, ..., p_{i,20W}^normalized)^T    (5)
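The sliding-window extraction of step 1-1 can be illustrated with a short Python sketch. This is a minimal illustration only: the `pssm_features` helper name, the toy input, and the zero-padding of window positions that fall outside the sequence are assumptions of this sketch, not details fixed by the patent.

```python
import math

def sigmoid(x):
    # formula (2): squashes each raw PSSM score into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def pssm_features(pssm, W):
    """Per-residue 20*W feature vectors from an l x 20 PSSM (step 1-1).

    pssm : list of l rows, each with 20 raw PSI-BLAST scores.
    W    : odd sliding-window size; rows outside the sequence are
           zero-padded here (an assumption of this sketch).
    """
    l = len(pssm)
    norm = [[sigmoid(v) for v in row] for row in pssm]   # formula (3)
    half = W // 2
    feats = []
    for i in range(l):
        vec = []
        for j in range(i - half, i + half + 1):          # window centred on residue i
            vec.extend(norm[j] if 0 <= j < l else [0.0] * 20)
        feats.append(vec)                                # formula (5): row-major, 20*W dims
    return feats

# toy example: 5 residues, all-zero scores, window size 3
toy = [[0.0] * 20 for _ in range(5)]
f = pssm_features(toy, 3)
print(len(f), len(f[0]))
```

Each of the 5 residues yields a 60-dimensional vector here (20*W with W = 3); the first 20 entries of the first residue's vector are the zero padding of the out-of-range window position.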
Step 1-2: for a protein consisting of l amino acid residues, obtain its secondary structure probability matrix with PSIPRED; this matrix has l rows and 3 columns, as in formula (6), whose columns C, H, E denote the three secondary structure states of a protein (coil, helix, strand): s_{i,1} is the probability that the secondary structure of the i-th amino acid residue of the protein is coil, s_{i,2} the probability that it is helix, and s_{i,3} the probability that it is strand.
Then, using the sliding-window extraction of step 1-1 and row-major combination, a feature vector of dimension 3*W is obtained for each residue, as in formula (7):
f_i = (s_{i,1}, s_{i,2}, ..., s_{i,3W})^T    (7)
Step 1-3: for a protein consisting of l amino acid residues, look up the protein-vitamin binding site propensity table to obtain the matrix containing its binding propensity information; this matrix has l rows and 1 column, as in formula (8):
(b_1, ..., b_i, ..., b_l)^T    (8)
where b_i denotes the propensity of the i-th amino acid residue of the protein to bind a vitamin.
Then, using the sliding-window extraction of step 1-1 and row-major combination, a feature vector of dimension 1*W is obtained for each residue, as in formula (9):
f_i = (b_{i,1}, b_{i,2}, ..., b_{i,W})^T    (9)
Step 1-4: serially combine the three feature vectors obtained in the above steps into a feature vector of length 20*W + 3*W + 1*W.
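The serial combination of step 1-4 amounts to simple concatenation of the three per-residue vectors; a minimal sketch with hypothetical per-residue vectors (the `combine` helper and the constant fill values are invented for illustration):

```python
def combine(evo, ss, prop):
    """Step 1-4: serially combine the 20*W, 3*W and 1*W per-residue vectors.

    evo, ss, prop are hypothetical per-residue feature vectors from
    steps 1-1, 1-2 and 1-3; the result has 20*W + 3*W + 1*W = 24*W dims.
    """
    return evo + ss + prop

W = 5
f24 = combine([0.5] * (20 * W), [0.1] * (3 * W), [0.9] * (1 * W))
print(len(f24))
```

With W = 5 the combined vector has 24*W = 120 dimensions, with the evolutionary block first, then the secondary structure block, then the binding propensity block.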
Further, in an embodiment, the specific implementation of building multiple feature subspaces with the three feature selection algorithms in step 2 comprises the following steps:
Step 2-1: perform feature selection on the original feature space produced by step 1 with the Joint Laplacian Feature Weights Learning algorithm, which comprises:
1) for the data X = [x_1, x_2, ..., x_M] ∈ R^(N×M) of the original feature space, construct the Laplacian matrix H_(M×M), a pairwise sample similarity matrix, with formula (10), and the diagonal matrix D_(M×M) with formula (11):
D_ii = Σ_j H_ij, 1 ≤ i ≤ M and 1 ≤ j ≤ M    (11)
where R^(N×M) denotes the size of the matrix X, i.e. X has M elements each with an N-dimensional feature vector; N is the feature dimensionality and M is the number of samples, i.e. the number of amino acid residues;
2) solve the generalized eigenvalue problem Hy = λDy for the matrices H_(M×M) and D_(M×M) obtained above, and take the eigenvector y corresponding to the largest eigenvalue below 1;
3) with the eigenvector y so obtained, update the weight of each feature dimension according to formula (12) until convergence:
w_i^(t+1) ← (2/3)·w_i^t + (1/3)·w_i^t · (2Xy + 4εw^t)_i / (2XX^T w^t + 4εw^t (w^t)^T w^t)_i, 1 ≤ i ≤ N    (12)
where w = [w_1, w_2, ..., w_i, ..., w_N] holds the weight of each feature dimension, T denotes matrix transposition, t is the iteration number, and ε is a relaxation term controlling the number of zero elements in w;
4) from the weight vector w = [w_1, w_2, ..., w_i, ..., w_N] so obtained, select the sample feature dimensions whose weight components w_i are greater than zero, combine all selected feature dimensions into a feature subspace for output, and output together the number Ñ of feature dimensions in the subspace;
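Under the assumption of a toy data matrix and a precomputed eigenvector y, one update of formula (12) can be sketched as follows. The `jlfwl_step` helper is hypothetical and illustrates only the multiplicative update rule, not the construction of H and D or the convergence test:

```python
def jlfwl_step(X, y, w, eps):
    """One iteration of the weight update, formula (12).

    X   : N x M data matrix (list of N rows over M samples)
    y   : M-vector from the generalized eigenproblem Hy = lambda*Dy
    w   : current N-vector of feature weights
    eps : relaxation term controlling how many weights go to zero
    """
    N, M = len(X), len(X[0])
    Xy = [sum(X[n][m] * y[m] for m in range(M)) for n in range(N)]
    XTw = [sum(X[n][m] * w[n] for n in range(N)) for m in range(M)]   # X^T w
    XXTw = [sum(X[n][m] * XTw[m] for m in range(M)) for n in range(N)]  # X X^T w
    wTw = sum(v * v for v in w)                                        # (w^t)^T w^t
    new_w = []
    for n in range(N):
        num = 2.0 * Xy[n] + 4.0 * eps * w[n]
        den = 2.0 * XXTw[n] + 4.0 * eps * w[n] * wTw
        new_w.append((2.0 / 3.0) * w[n] + (1.0 / 3.0) * w[n] * num / den)
    return new_w

# toy example: N=2 features, M=2 samples, uniform y and w
X = [[1.0, 0.0], [0.0, 1.0]]
w1 = jlfwl_step(X, [1.0, 1.0], [1.0, 1.0], 1.0)
print(w1)
```

In practice the update is repeated until the weight vector stops changing; dimensions whose weights stay above zero are then kept (substep 4).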
Step 2-2: perform feature selection on the original feature space produced by step 1 with the Fisher Score algorithm, which comprises:
1) for a space with c classes of original samples, let X^(i) denote the sample set of the i-th class, x the feature vectors, y the class labels, and M^(i) the number of samples of the i-th class; a sample here is one amino acid residue of a protein. Compute the mean u_n^(i) and variance (σ_n^(i))² of each feature dimension of each class according to formulas (13) and (14):
u_n^(i) = (1/M^(i)) Σ_(j=1..M^(i)) x_jn^(i), 1 ≤ n ≤ N and 1 ≤ i ≤ c    (13)
(σ_n^(i))² = (1/M^(i)) Σ_(j=1..M^(i)) (x_jn^(i) − u_n^(i))², 1 ≤ n ≤ N and 1 ≤ i ≤ c    (14)
2) with all the means and variances computed above, calculate the Fisher Score of each feature dimension according to formula (15):
H_n = Σ_(i=1..c) M^(i) (u_n^(i) − u_n)² / Σ_(i=1..c) M^(i) (σ_n^(i))², 1 ≤ n ≤ N    (15)
where u_n is the mean of the n-th feature dimension over all data and H_n is the Fisher Score of the n-th feature dimension; each of the N feature dimensions has one Fisher Score, so formula (15) yields the Fisher Score vector H = [H_1, H_2, ..., H_n, ..., H_N];
3) sort the values in H = [H_1, H_2, ..., H_n, ..., H_N] in descending order, then select the sample features corresponding to the top Ñ Fisher Score values and combine all selected features into a feature subspace for output, where Ñ, the number of retained features, is determined by step 2-1;
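Formulas (13)-(15) can be sketched in pure Python as follows (a minimal illustration; the `fisher_scores` helper and the toy data are invented, with class labels standing in for the binding / non-binding classes):

```python
def fisher_scores(X, labels):
    """Fisher Score per feature dimension, formulas (13)-(15).

    X      : list of M samples, each an N-dimensional feature vector
    labels : class label per sample
    """
    N = len(X[0])
    classes = sorted(set(labels))
    overall = [sum(x[n] for x in X) / len(X) for n in range(N)]  # u_n
    scores = []
    for n in range(N):
        between, within = 0.0, 0.0
        for c in classes:
            vals = [x[n] for x, y in zip(X, labels) if y == c]
            m = len(vals)
            u = sum(vals) / m                               # formula (13)
            var = sum((v - u) ** 2 for v in vals) / m       # formula (14)
            between += m * (u - overall[n]) ** 2
            within += m * var
        scores.append(between / within if within else 0.0)  # formula (15)
    return scores

# toy example: feature 0 separates the classes, feature 1 does not
X = [[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.0]]
fs = fisher_scores(X, [0, 0, 1, 1])
print(fs)
```

As expected, the discriminative feature receives a much larger score than the uninformative one; the top Ñ scores then determine the retained dimensions.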
Step 2-3: perform feature selection on the original feature space produced by step 1 with the Laplacian Score algorithm, which comprises:
1) for the data X = [x_1, x_2, ..., x_M] ∈ R^(N×M) of the original feature space, construct the Laplacian matrix H_(M×M) with formula (16) and the diagonal matrix D_(M×M) with formula (17):
H_ij = exp(−||x_i − x_j||² / (2σ²)), 1 ≤ i ≤ M and 1 ≤ j ≤ M    (16)
D_ii = Σ_j H_ij, 1 ≤ i ≤ M and 1 ≤ j ≤ M    (17)
where R^(N×M) denotes the size of the matrix X, i.e. X has M elements each with an N-dimensional feature vector; N is the feature dimensionality and M is the number of samples, i.e. the number of amino acid residues; σ is a Gaussian parameter: formula (16) computes the distance between two samples (two amino acid residues) in kernel space, and σ controls the width of the kernel space;
2) with the matrices H_(M×M) and D_(M×M) constructed above, calculate the Laplacian Score of each feature dimension according to formula (18):
L_n = Σ_(i=1..M) Σ_(j=1..M) (x_in − x_jn)² H_ij / Σ_(i=1..M) (x_in − x̄_n)² D_ii, 1 ≤ n ≤ N    (18)
where x_in is the value of the n-th feature of the i-th sample and x̄_n the mean of the n-th feature over all samples; L_n is the Laplacian Score of the n-th feature dimension, each of the N feature dimensions has one Laplacian Score, and formula (18) finally yields the Laplacian Score vector L = [L_1, L_2, ..., L_n, ..., L_N];
3) sort the values in L = [L_1, L_2, ..., L_n, ..., L_N] in descending order, then select the sample features corresponding to the top Ñ Laplacian Score values and combine all selected features into a feature subspace for output, where Ñ, the number of retained features, is determined by step 2-1.
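Formulas (16)-(18) admit a similar sketch. This is a minimal illustration: the exact Gaussian-kernel form of H_ij is assumed from the description of formula (16) (the original formula image is not reproduced in the text), and `laplacian_scores` is a hypothetical helper:

```python
import math

def laplacian_scores(X, sigma):
    """Laplacian Score per feature dimension, formulas (16)-(18).

    X     : list of M samples, each an N-dimensional feature vector
    sigma : Gaussian width parameter of the assumed kernel H_ij
    """
    M, N = len(X), len(X[0])
    H = [[math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * sigma ** 2))
          for j in range(M)] for i in range(M)]          # formula (16), assumed form
    D = [sum(H[i]) for i in range(M)]                    # formula (17): D_ii
    scores = []
    for n in range(N):
        mean = sum(x[n] for x in X) / M
        num = sum((X[i][n] - X[j][n]) ** 2 * H[i][j]
                  for i in range(M) for j in range(M))
        den = sum((X[i][n] - mean) ** 2 * D[i] for i in range(M))
        scores.append(num / den if den else 0.0)         # formula (18)
    return scores

# toy example: feature 1 is constant and carries no locality information
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ls = laplacian_scores(X, 1.0)
print(ls)
```

A constant feature has zero variance and gets a score of 0 here, while a varying feature gets a positive score; the top Ñ scores determine the retained dimensions, with Ñ taken from step 2-1.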
Further, in an embodiment, in step 3, according to the distribution of the aforementioned original samples in each feature subspace, one subspace SVM predictor is trained with the SVC classification algorithm of LIBSVM; three different SVM predictors are thus trained on the three feature subspaces.
Further, in an embodiment, in step 4 the SVM predictors of the three different feature subspaces trained in step 3 are fused by the weighted average method, which comprises:
Let ω_1 and ω_2 denote the binding site class and the non-binding site class respectively, and S_1, S_2 and S_3 the three SVM predictors of the different feature subspaces; an evaluation sample set {x_1, ..., x_(M_eva)}, whose amino acid residues have known classes, is used to determine the weight of the SVM model of each subspace. For each sample feature x_i, S_1, S_2 and S_3 output three 2-dimensional vectors (s_(1,1)(x_i), s_(1,2)(x_i))^T, (s_(2,1)(x_i), s_(2,2)(x_i))^T and (s_(3,1)(x_i), s_(3,2)(x_i))^T; the two elements of each 2-dimensional vector are the degrees to which x_i belongs to ω_1 and ω_2, and they sum to 1. The prediction result matrices of the evaluation sample set on S_1, S_2 and S_3 are therefore:
R_i = [s_(i,1)(x_1), s_(i,2)(x_1); s_(i,1)(x_2), s_(i,2)(x_2); ...; s_(i,1)(x_(M_eva)), s_(i,2)(x_(M_eva))], i = 1, 2, 3    (19)
First, construct the target result matrix from the true classes:
R_true = [p_1, 1−p_1; p_2, 1−p_2; ...; p_(M_eva), 1−p_(M_eva)], where p_i = 1 if y_i = ω_1, and p_i = 0 otherwise    (20)
Secondly, compute the error of the SVM classifier of each feature subspace:
E_i = ||R_true − R_i||²_2, i = 1, 2, 3    (21)
Thirdly, construct the weight of each subspace SVM predictor from its prediction error on the evaluation set:
w_i = (M_eva − E_i) / Σ_(k=1..3) (M_eva − E_k), i = 1, 2, 3    (22)
where M_eva denotes the error obtained when every sample is misclassified completely.
Finally, integrate the SVM predictors of the different subspaces with the weights computed on the evaluation sample set:
S = Σ_(i=1..3) w_i · S_i    (23)
which gives the fused SVM predictor of formula (23).
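The weighting of formulas (21)-(23) can be sketched as follows (a minimal illustration with hypothetical helpers `fuse_weights` and `fuse_predict`; real use would take the per-residue probability outputs of the three LIBSVM models, and the sketch assumes each error stays below M_eva so that the weights remain positive):

```python
def fuse_weights(R_true, R_preds):
    """Formulas (21)-(22): subspace weights from squared error on the evaluation set.

    R_true  : list of (p, 1-p) target rows, p = 1 for the binding class
    R_preds : one prediction matrix (same shape as R_true) per subspace SVM
    """
    M_eva = len(R_true)
    errs = [sum((t - p) ** 2 for rt, rp in zip(R_true, R) for t, p in zip(rt, rp))
            for R in R_preds]                                   # formula (21)
    total = sum(M_eva - e for e in errs)
    return [(M_eva - e) / total for e in errs]                  # formula (22)

def fuse_predict(probs, weights):
    """Formula (23): weighted average of the per-predictor binding probabilities."""
    return sum(w * p for w, p in zip(weights, probs))

# toy evaluation set of 2 residues: one binding, one non-binding
R_true = [(1.0, 0.0), (0.0, 1.0)]
R_preds = [
    [(1.0, 0.0), (0.0, 1.0)],   # perfect predictor
    [(0.5, 0.5), (0.5, 0.5)],   # uninformative predictor
    [(0.8, 0.2), (0.2, 0.8)],   # good predictor
]
w = fuse_weights(R_true, R_preds)
print(w)
```

The weights sum to 1, and the predictor with the smallest evaluation error receives the largest weight, so its opinion dominates the fused output of formula (23).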
Further, in an embodiment, in step 5 the fused SVM predictor is used to perform protein-vitamin binding site prediction on a query protein:
for each amino acid residue of the query protein, generate its features in the original feature space according to step 1; then apply the three feature selection algorithms of step 2 to the original features of the residue to produce three subspace feature vectors; input the three subspace feature vectors into the corresponding three SVM predictors S_1, S_2 and S_3 of step 3 to obtain three predictions given in the form of vitamin-binding probabilities; feed these three predictions into the SVM predictor integrated by the weighted average method of step 4, which outputs the probability that the residue binds, or does not bind, a vitamin. Finally, make the binding decision with the threshold T that maximizes the Matthews correlation coefficient (MCC) as the criterion: every amino acid residue whose binding probability is greater than or equal to T is predicted as a binding residue, and the remaining residues, whose binding probability is less than T, are predicted as non-binding residues, where T ∈ [0, 1].
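The threshold selection at the end of step 5 can be sketched as follows (a minimal illustration; the grid search over T and the `best_threshold` helper are assumptions of this sketch, since the patent only states that T maximizes the MCC):

```python
import math

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; 0 when any marginal count is empty
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def best_threshold(probs, labels, grid=101):
    """Pick the threshold T in [0, 1] maximizing the MCC on residues
    with known binding labels (1 = binding residue)."""
    best_T, best_m = 0.0, -1.0
    for k in range(grid):
        T = k / (grid - 1)
        pred = [1 if p >= T else 0 for p in probs]   # prob >= T -> binding
        tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
        tn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 0)
        fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
        m = mcc(tp, tn, fp, fn)
        if m > best_m:
            best_T, best_m = T, m
    return best_T, best_m

# toy fused probabilities for 4 residues with known labels
T, m = best_threshold([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(T, m)
```

On this perfectly separable toy data any threshold between the two probability groups yields an MCC of 1; the grid search returns the first such threshold.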
The above technical solution shows that the beneficial effects of the present invention are:
1. Improved training speed, prediction speed and prediction accuracy: the subspace ensemble technique based on feature selection algorithms builds more compact feature subspaces, effectively resolves the mutual exclusion phenomenon between features and reduces the dimensionality of the feature space, thereby improving training speed, prediction speed and prediction accuracy;
2. Improved interpretability of the model: after the subspace ensemble technique is applied, different feature subspaces are selected for the binding site prediction problems of proteins with different classes of vitamins, which better expresses the differences between the prediction problems for different vitamin kinds and improves the interpretability of the model.
Brief description of the drawings
Fig. 1 is a schematic diagram of the subspace fusion-based protein-vitamin binding site prediction method according to an embodiment of the present invention.
Embodiment
For a better understanding of the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawing.
As shown in Fig. 1, according to a preferred embodiment, the subspace fusion-based protein-vitamin binding site prediction method first uses PSI-BLAST and PSIPRED to obtain the PSSM (evolutionary information matrix) and the secondary structure probability matrix of a protein respectively, and generates the binding propensity matrix of the protein from the protein-vitamin binding site propensity table; secondly, a sliding window and serial combination build the feature vector of each amino acid residue from the PSSM, the secondary structure probability matrix and the protein-vitamin binding site propensity table; then the three feature selection algorithms Joint Laplacian Feature Weights Learning (Algorithm 1), Fisher Score (Algorithm 2) and Laplacian Score (Algorithm 3) build three feature subspaces in which the features within one space are not mutually exclusive and different spaces are complementary, and one SVM predictor is trained in each subspace; finally, the weighted average method integrates the SVM predictors into the final prediction model, which performs protein-vitamin binding site prediction.
A binding site is simply an amino acid residue that has bound a vitamin.
With reference to Fig. 1, the specific implementation of each step of this embodiment is described in detail below.
As an optional mode, the feature extraction and serial combination for a training protein in step 1 comprise the following steps:
Step 1-1: for a protein consisting of l amino acid residues, obtain its position-specific scoring matrix (PSSM) with the PSI-BLAST algorithm; this matrix has l rows and 20 columns, so that the primary structure (evolutionary) information of the protein is converted into a matrix representation, as in formula (1), whose columns A, C, ..., Y denote the 20 amino acid types and whose entry p_{i,j} denotes the likelihood that the i-th amino acid residue of the protein mutates into the j-th of the 20 amino acid types (A, C, ..., Y) during evolution.
Each value in the PSSM is then normalized row by row with formula (2):
f(x) = 1/(1 + e^(-x))    (2)
This gives the normalized PSSM of formula (3). Next, a sliding window of size W extracts the feature matrix of each amino acid residue, i.e. the W normalized PSSM rows centred on that residue, as in formula (4). Finally, the feature matrix (4) is concatenated in row-major order into a feature vector of dimension 20*W:
f_i = (p_{i,1}^normalized, p_{i,2}^normalized, ..., p_{i,20W}^normalized)^T    (5)
Step 1-2: for a protein consisting of l amino acid residues, obtain its secondary structure probability matrix with PSIPRED; this matrix has l rows and 3 columns, as in formula (6), whose columns C, H, E denote the three secondary structure states of a protein (coil, helix, strand): s_{i,1} is the probability that the secondary structure of the i-th amino acid residue of the protein is coil, s_{i,2} the probability that it is helix, and s_{i,3} the probability that it is strand.
Then, using the sliding-window extraction of step 1-1 and row-major combination, a feature vector of dimension 3*W is obtained for each residue, as in formula (7):
f_i = (s_{i,1}, s_{i,2}, ..., s_{i,3W})^T    (7)
Step 1-3: for a protein consisting of l amino acid residues, look up the protein-vitamin binding site propensity table to obtain the matrix containing its binding propensity information; this matrix has l rows and 1 column, as in formula (8):
(b_1, ..., b_i, ..., b_l)^T    (8)
where b_i denotes the propensity of the i-th amino acid residue of the protein to bind a vitamin.
Then, using the sliding-window extraction of step 1-1 and row-major combination, a feature vector of dimension 1*W is obtained for each residue, as in formula (9):
f_i = (b_{i,1}, b_{i,2}, ..., b_{i,W})^T    (9)
Step 1-4: serially combine the three feature vectors obtained in the above steps into a feature vector of length 20*W + 3*W + 1*W.
As optional embodiment, in described step 2, the specific implementation that uses described three kinds of feature selecting algorithm to build multiple proper subspaces comprises the following steps:
Feature selecting is carried out in step 2-1, the primitive character space that utilizes Joint Laplacian Feature Weights Learning algorithm to produce step 1, and it comprises:
1) for the data X=[x in primitive character space 1, x 2..., x m] ∈ R n × M, use following formula (10) and formula (11) structure Laplacian matrix H m × Mwith diagonal matrix D m × Mas follows:
D ii=∑ jh ij, 1≤i≤M and 1≤j≤M (11)
Wherein, R n × Mrepresent the scale of X matrix, X has M element that has N dimensional feature, N representation feature dimension, and M represents that number of samples is amino acid residue number;
2) the Laplacian matrix H to above-mentioned steps gained m × Mwith diagonal matrix D m × Msolve generalized eigenvalue decomposition problem Hy=λ Dy, obtain eigenvalue of maximum characteristic of correspondence below a 1 vector y (Hy=λ Dy is certain, and to have an eigenwert be 1, and proper vector is y=[1,1 ..., 1] t, and this y is useless for feature selecting, so need eigenwert to be less than 1, proper vector is not y=[1, and 1 ..., 1] t);
3) use the above-mentioned proper vector y trying to achieve, upgrade the weight that every one-dimensional characteristic is corresponding until restrain according to following formula (12):
w i t + 1 ← 2 3 w i t + 1 3 w i t ( 2 Xy + 4 ϵw t ) i ( 2 XX T w t + 4 ϵw t ( w t ) T w t ) i , 1 ≤ i ≤ N - - - ( 12 )
Wherein, w=[w 1, w 2..., w i..., w n] represent each characteristic dimension weight, the transposition of T representing matrix, t represents iterations, ε represents to control lax of neutral element number in w, and (above-mentioned formula (12) is an iterative formula, t represents iteration the t time, carrys out mark w different in different iterations intermediate values with t);
4) at the above-mentioned weight vectors w=[w trying to achieve 1, w 2..., w i..., w n], select all weight component w that are greater than zero icorresponding sample characteristics dimension (w iw=[w 1, w 2..., w i..., w n] in one-component), the output of the proper subspace that finally all selected characteristic dimension is combined into, simultaneously by the number of characteristic dimension in subspace output in the lump;
Feature selecting is carried out in step 2-2, the primitive character space that utilizes Fisher Score algorithm to produce step 1, and it comprises:
1) for the space with c class original sample wherein represent the sample set of i class, representation feature vector, represent classification, M (i)represent the number of samples of i class, aforementioned sample refers to an amino acid residue of protein; Calculate the average of every one-dimensional characteristic of each class data according to formula (13) and formula (14) and variance (it is worth mentioning that: the sample in original sample is to represent a concrete things; Be in protein-vitamin bindings bit point prediction in the present embodiment, a sample just represents an amino acid residue of protein, also: the i.e. element of a sample):
u n ( i ) = 1 M ( i ) Σ j = 1 M ( i ) x jn ( i ) , 1≤n≤N and 1≤i≤c (13)
( σ n ( i ) ) 2 = 1 M ( i ) Σ j = 1 M ( i ) ( x jn ( i ) - u n ( i ) ) 2 , 1≤n≤N and 1≤i≤c (14)
2) all averages that use above-mentioned middle calculating to get and variance each characteristic dimension is calculated to Fisher Score according to formula (15):
H n = Σ i = 1 c M ( i ) ( u n ( i ) - u n ) 2 Σ i = 1 c M ( i ) ( σ n ( i ) ) 2 , 1 ≤ n ≤ N - - - ( 15 )
Wherein, u nrepresent the average of n dimensional characteristics in all data, H nrepresent the Fisher Score value of n characteristic dimension, N characteristic dimension has a Fisher Score value;
Obtain a Fisher Score vector H, H=[H according to formula (15) 1, H 2..., H n... H n];
3) to above-mentioned Fisher Score vector H=[H 1, H 2..., H n... H n] in each value sort from big to small, then select before sample characteristics corresponding to individual Fisher Score value, the proper subspace output that all selected Feature Combinations are become, wherein represent to select to have stayed the number of feature, determined in (as step by step 4 of abovementioned steps 2-1) by step 2-1, exported simultaneously );
Feature selecting is carried out in step 2-3, the primitive character space that utilizes Laplacian Score algorithm to produce step 1, and it comprises:
1) for the data X=[x in primitive character space 1, x 2..., x m] ∈ R n × M, use formula (16) and formula (17) structure Laplacian matrix H m × Mwith diagonal matrix D m × Mas follows:
D ii=∑ jhi j, 1≤i≤M and 1≤j≤M (17)
Wherein, R n × Mrepresent the scale of X matrix, be that X has M element that has N dimensional feature, N representation feature dimension, M represents that number of samples is amino acid residue number, σ represents Gaussian parameter, formula (16) is for trying to achieve the distance that two samples are the nuclear space of amino acid residue, and this σ is for controlling the width of nuclear space;
2) use the Laplacian matrix H of above-mentioned structure m × Mwith diagonal matrix D m × M, calculate the Laplacian Score of each characteristic dimension according to formula (18):
L n = Σ i = 1 M Σ j = 1 M ( x in - x jn ) 2 H ij Σ i = 1 M ( x in - x ‾ n ) 2 D ij , 1 ≤ n ≤ N - - - ( 18 )
Wherein, x inrepresent the value of n dimensional characteristics of i sample, represent the average of n dimensional characteristics of all samples; L nrepresent the Laplacian Score value of n characteristic dimension, N characteristic dimension has a Laplacian Score value, finally obtains a Laplacian Score vector L, L=[L according to formula (18) 1, L 2..., L n..., L n];
3) Sort the values in the Laplacian Score vector L = [L_1, L_2, ..., L_n, ..., L_N] computed above from largest to smallest, then select the sample features corresponding to the largest Laplacian Score values and output the feature subspace formed by combining all selected features; the number of retained features is determined by step 2-1 (specifically, by sub-step 4 of the aforementioned step 2-1) and is output together with the subspace.
Because the Fisher Score algorithm and the Laplacian Score algorithm cannot determine on their own how many feature dimensions to select, in the present embodiment the number of selected feature dimensions is determined autonomously by the algorithm of step 2-1.
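The Laplacian Score of formulas (16)-(18) can be sketched as follows; the Gaussian (heat) kernel used as the similarity matrix H follows the description of formula (16), and the function name is illustrative:

```python
import numpy as np

def laplacian_score(X, sigma=1.0):
    """Laplacian Score per feature dimension, following formulas (16)-(18).

    X : (M, N) array of M samples with N features.
    The similarity matrix H is the Gaussian kernel of formula (16),
    with sigma controlling the kernel width.
    """
    M, N = X.shape
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-sq_dists / sigma ** 2)    # formula (16)
    D = H.sum(axis=1)                     # diagonal entries D_ii, formula (17)
    mean = X.mean(axis=0)                 # per-feature means x̄_n
    scores = np.empty(N)
    for n in range(N):
        f = X[:, n]
        num = ((f[:, None] - f[None, :]) ** 2 * H).sum()
        den = ((f - mean[n]) ** 2 * D).sum()
        scores[n] = num / den             # L_n, formula (18)
    return scores
```

The resulting vector would then be sorted and truncated exactly as in sub-step 3 above.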
As an optional embodiment, in step 3, according to the distribution of the aforementioned original samples in each feature subspace, one subspace SVM predictor is trained per subspace using the SVC classification algorithm in LIBSVM; finally, three different SVM predictors are trained over the three feature subspaces.
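Training one SVM per subspace can be sketched as follows. The patent specifies the SVC algorithm in LIBSVM; scikit-learn's `SVC` is a LIBSVM wrapper, and the RBF kernel and default parameters here are assumptions, not values fixed by the method:

```python
import numpy as np
from sklearn.svm import SVC

def train_subspace_svms(X, y, subspaces):
    """Train one probabilistic SVM classifier per feature subspace.

    X         : (M, N) samples in the original feature space.
    y         : (M,) binding / non-binding labels.
    subspaces : list of index arrays, one per feature selection algorithm.
    """
    models = []
    for idx in subspaces:
        # probability=True enables the probability outputs needed
        # for the weighted-average fusion of step 4.
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(X[:, idx], y)
        models.append(clf)
    return models
```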
Further, in the embodiment, in step 4 the SVM predictors of the three different feature subspaces obtained by the training of step 3 are fused using the weighted average method, which comprises:
Let ω_1 and ω_2 denote the binding-site class and the non-binding-site class respectively, and let S_1, S_2 and S_3 denote the three SVM predictors over the different feature subspaces; an evaluation sample set of M_eva amino acid residues of known class is used to determine the weight of the SVM model corresponding to each subspace. For each sample feature x_i, S_1, S_2 and S_3 output three 2-dimensional vectors (s_{1,1}(x_i), s_{1,2}(x_i))^T, (s_{2,1}(x_i), s_{2,2}(x_i))^T and (s_{3,1}(x_i), s_{3,2}(x_i))^T; the two elements of each 2-dimensional vector represent the degrees to which x_i belongs to ω_1 and ω_2 respectively, and the two elements sum to 1. Therefore, for the evaluation sample set, the prediction result matrices on S_1, S_2 and S_3 can be obtained respectively:
R_i = [ s_{i,1}(x_1) s_{i,2}(x_1); s_{i,1}(x_2) s_{i,2}(x_2); ...; s_{i,1}(x_{M_eva}) s_{i,2}(x_{M_eva}) ]^T, i = 1, 2, 3 (19)
First, construct the target result matrix from the true classes:
R_true = [ p_1 1−p_1; p_2 1−p_2; ...; p_i 1−p_i; ...; p_{M_eva} 1−p_{M_eva} ]^T, where p_i = 1 if y_i = ω_1, otherwise p_i = 0 (20)
Secondly, calculate the error of the SVM classifier under each feature subspace:
E_i = ||R_true − R_i||_2², i = 1, 2, 3 (21)
Then, construct the weight of each subspace SVM predictor from its prediction error on the evaluation sample set:
w_i = (M_eva − E_i) / Σ_{k=1}^{3} (M_eva − E_k), i = 1, 2, 3 (22)
Wherein, M_eva represents the error incurred when all samples are completely misclassified;
Finally, integrate the SVM predictors of the different subspaces according to the weights computed on the evaluation sample set:
S = Σ_{i=1}^{3} w_i · S_i (23)
yielding the fused SVM predictor as in formula (23) above.
In the present embodiment, the above evaluation sample set and the protein to be predicted are different, i.e. two different sets; the amino acid residues of the protein to be predicted are of unknown class, whereas those of the evaluation sample set are of known class. Since the evaluation sample set is used in this embodiment to determine the weights of the SVM models corresponding to the subspaces, in practical terms it still belongs to the data used for building the model.
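The weight construction of formulas (19)-(23) can be sketched as follows; reading the squared 2-norm of formula (21) as a squared Frobenius norm is an assumption, and the function names are illustrative:

```python
import numpy as np

def fusion_weights(prob_matrices, y_true):
    """Weights for the subspace SVMs from evaluation-set errors.

    prob_matrices : list of (M_eva, 2) arrays; row i gives the degrees to
                    which sample i belongs to the binding / non-binding
                    classes (rows sum to 1), as in formula (19).
    y_true        : (M_eva,) labels, 1 for the binding class omega_1.
    """
    m_eva = len(y_true)
    p = (y_true == 1).astype(float)
    R_true = np.column_stack([p, 1.0 - p])                      # formula (20)
    errors = [((R_true - R) ** 2).sum() for R in prob_matrices]  # formula (21)
    raw = np.array([m_eva - e for e in errors])
    return raw / raw.sum()                                      # formula (22)

def fuse_probabilities(prob_matrices, weights):
    """Weighted-average fusion of the subspace predictions, formula (23)."""
    return sum(w * R for w, R in zip(weights, prob_matrices))
```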
As an optional embodiment, in step 5 the fused SVM predictor is used to perform protein-vitamin binding site prediction on the protein to be predicted:
For each amino acid residue in the protein to be predicted, the features in the original feature space are produced according to step 1; then the three feature selection algorithms described in step 2 are applied to the original features of the residue to produce three subspace features; the three subspace features are input to the corresponding three SVM predictors S_1, S_2 and S_3 of step 3, yielding three prediction results in the form of vitamin-binding probabilities; these three results are input to the SVM predictor integrated by the weighted average method of step 4, which outputs the probability that the residue binds or does not bind a vitamin. Finally, the binding decision is made using the threshold T that maximizes the Matthews correlation coefficient as the criterion: every amino acid residue whose binding probability is greater than or equal to T is predicted as a binding residue, and every residue whose binding probability is less than T is predicted as a non-binding residue, where T ∈ [0, 1].
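Threshold selection by maximizing the Matthews correlation coefficient can be sketched as follows; the grid search over candidate thresholds is an illustrative choice the patent does not prescribe:

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_threshold(y_true, binding_probs, grid=np.linspace(0.0, 1.0, 101)):
    """Pick T in [0, 1] maximizing MCC; residues with prob >= T are binding."""
    return max(grid, key=lambda t: mcc(y_true, (binding_probs >= t).astype(int)))
```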
Through the above exemplary technical scheme of the present invention, the prediction method proposed in this embodiment performs protein-vitamin binding site prediction based on the evolutionary information, secondary structure information and binding propensity information of the protein, adopting a subspace ensemble technique built on multiple feature selection algorithms together with support vector machine (SVM) prediction. The PSI-BLAST algorithm (A. A. Schaffer et al., "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements," Nucleic Acids Res., vol. 29, pp. 2994-3005, 2001) generates the position-specific scoring matrix representing the protein's evolutionary information; the PSIPRED algorithm (D. T. Jones, "Protein secondary structure prediction based on position-specific scoring matrices," J Mol Biol, vol. 292, no. 2, pp. 195-202, Sep 17, 1999) extracts the protein's secondary structure information; and a binding propensity algorithm (D. Yu, J. Hu, J. Yang et al., "Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 994-1008, 2013) generates the protein's binding propensity information. Multiple feature selection algorithms (H. Yan and J. Yang, "Joint Laplacian feature weights learning," Pattern Recognition, vol. 47, no. 3, pp. 1425-1432, 2014; C. Bishop, "Neural Networks for Pattern Recognition," Clarendon Press: Oxford, 1995) are used to construct subspaces containing complementary information; the weighted average ensemble technique is used to fuse the multiple predictors; and finally a threshold segmentation based on soft classification is used to decide the binding sites. Compared with VitaPred, currently the only existing protein-vitamin binding site predictor, the method has higher prediction accuracy and better interpretability.
The following takes protein 2ZZA_A as an example of predicting vitamin binding sites without distinguishing the vitamin type; the prediction results are shown in Table 1.
The amino acid sequence of protein 2ZZA_A is as follows:
>2ZZA_A
VIVSMIAALANNRVIGLDNKMPWHLPAELQLFKRATLGKPIVMGRNTFESIGRPLPGRLNIVLSRQTDYQPEGVTVVATLEDAVVAAGDVEELMIIGGATIYNQCLAAADRLYLTHIELTTEGDTWFPDYEQYNWQEIEHESYAADDKNPHNYRFSLLERVX
This protein has 19 vitamin binding sites.
First, the original features of each amino acid residue in protein 2ZZA_A are extracted according to step 1 using the PSI-BLAST algorithm, the PSIPRED algorithm and the protein-vitamin binding site propensity table. Next, the three feature selection algorithms of step 2, namely Joint Laplacian Feature Weights Learning (algorithm 1), Fisher Score (algorithm 2) and Laplacian Score (algorithm 3), perform subspace feature selection on the original features of each residue, forming three subspace features. The three subspace features are then input to the corresponding three SVM predictors S_1, S_2 and S_3 of step 3, yielding three prediction results in the form of vitamin-binding probabilities, which are input to the SVM predictor integrated by the weighted average method of step 4 to obtain the final prediction of the vitamin binding of protein 2ZZA_A. The final prediction results are shown in Table 1:
Table 1 compares the prediction results of the method of this embodiment on 2ZZA_A with those of the only existing protein-vitamin binding site predictor.
As can be seen from Table 1, the prediction method of this embodiment correctly predicts 15 vitamin binding sites, with 0 false positive and 4 false negative vitamin binding sites; this result is clearly better than that of the only existing protein-vitamin binding site predictor in the prior art.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those of ordinary skill in the art to which the invention pertains may make various modifications and variations without departing from the spirit and scope of the invention. Therefore, the protection scope of the invention shall be determined by the appended claims.

Claims (6)

1. A subspace-fusion-based protein-vitamin binding site prediction method, characterized in that it comprises the following steps:
Step 1, feature extraction and feature combination: the PSI-BLAST algorithm and the PSIPRED algorithm are used respectively to extract the evolutionary information features and the secondary structure information features of the protein, and the binding propensity information features of the protein are extracted according to a protein-vitamin binding site propensity table; the aforementioned three kinds of features form the original feature space; then, using a sliding window and serial combination, each amino acid residue in the protein sequence is converted into a vector representation;
Step 2, feature selection algorithms, namely the Joint Laplacian Feature Weights Learning algorithm, the Fisher Score algorithm and the Laplacian Score algorithm, are used to perform feature selection on the original feature space multiple times; the feature subset obtained by each feature selection forms one feature subspace, thereby building multiple feature subspaces;
Step 3, for each feature subspace obtained in step 2, one SVM classifier is trained;
Step 4, the multiple trained SVM classifiers are fused using the weighted average classifier fusion method; and
Step 5, protein-vitamin binding site prediction is performed on the protein to be predicted based on the fused SVM predictor.
2. The subspace-fusion-based protein-vitamin binding site prediction method according to claim 1, characterized in that in step 1 the feature extraction and serial combination for a training protein comprise the following steps:
Step 1-1, for a protein consisting of l amino acid residues, its position-specific scoring matrix (PSSM) is obtained by the PSI-BLAST algorithm; this matrix has l rows and 20 columns, so that the primary structure information of the protein is converted into a matrix representation:
Wherein, A, C, ..., Y denote the 20 kinds of amino acid residues, and p_{i,j} denotes the likelihood that the i-th amino acid residue of the protein mutates into the j-th of the 20 kinds of amino acid residues during evolution;
Then each value in the PSSM is normalized row by row using the following formula (2):
f(x) = 1 / (1 + e^{−x}) (2)
The normalized PSSM is as shown in formula (3):
Afterwards, a sliding window of size W is used to extract the feature matrix of each amino acid residue:
Finally, the above feature matrix (4) is combined in row-major order into a feature vector of dimension 20*W:
f_i = (p_{i,1}^{normalized}, p_{i,2}^{normalized}, ..., p_{i,20W}^{normalized})^T (5)
Step 1-2, for a protein consisting of l amino acid residues, its secondary structure probability matrix is obtained by PSIPRED; this matrix has l rows and 3 columns, as shown in formula (6):
Wherein, C, H and E denote the three kinds of protein secondary structure, namely coil, helix and strand; s_{i,1} denotes the probability that the secondary structure of the i-th amino acid residue of the protein is coil, s_{i,2} the probability that it is helix, and s_{i,3} the probability that it is strand;
Then, using the sliding window extraction of step 1-1 above and row-major combination, the feature vector of each amino acid residue with dimension 3*W is obtained, as shown in formula (7):
f_i = (s_{i,1}, s_{i,2}, ..., s_{i,3W})^T (7)
Step 1-3, for a protein consisting of l amino acid residues, the matrix containing its binding propensity information is obtained by looking up the protein-vitamin binding site propensity table; this matrix has l rows and 1 column, as shown in formula (8):
(b_1, ..., b_i, ..., b_l)^T (8)
Wherein, b_i denotes the propensity of the i-th amino acid residue of the protein to bind a vitamin;
Then, using the sliding window extraction of step 1-1 above and row-major combination, the feature vector of each amino acid residue with dimension 1*W is obtained, as shown in formula (9):
f_i = (b_{i,1}, b_{i,2}, ..., b_{i,W})^T (9)
Step 1-4, the three feature vectors obtained in the above steps are serially combined to obtain a feature vector of length 20*W + 3*W + 1*W.
3. The subspace-fusion-based protein-vitamin binding site prediction method according to claim 1, characterized in that in step 2 the specific implementation of building multiple feature subspaces using the three feature selection algorithms comprises the following steps:
Step 2-1, feature selection is performed on the original feature space produced by step 1 using the Joint Laplacian Feature Weights Learning algorithm, which comprises:
1) for the data X=[x in primitive character space 1, x 2..., x m] ∈ R n × M, use following formula (10) and formula (11) structure Laplacian matrix H m × Mwith diagonal matrix D m × Mas follows:
D ii=∑ jh ij, 1≤i≤M and 1≤j≤M (11)
Wherein, R n × Mrepresent the scale of X matrix, X has M element that has N dimensional feature, N representation feature dimension, and M represents that number of samples is amino acid residue number;
2) Solve the generalized eigenvalue decomposition problem Hy = λDy for the Laplacian matrix H_{M×M} and the diagonal matrix D_{M×M} obtained in the above step, obtaining the eigenvector y corresponding to the largest eigenvalue below 1;
3) Using the eigenvector y obtained above, update the weight corresponding to each feature dimension according to the following formula (12) until convergence:
w_i^{t+1} ← (2/3) w_i^t + (1/3) w_i^t (2Xy + 4εw^t)_i / (2XX^T w^t + 4εw^t (w^t)^T w^t)_i, 1 ≤ i ≤ N (12)
Wherein, w = [w_1, w_2, ..., w_i, ..., w_N] represents the weight of each feature dimension, T denotes matrix transposition, t denotes the iteration number, and ε denotes a relaxation term controlling the number of zero elements in w;
4) From the weight vector w = [w_1, w_2, ..., w_i, ..., w_N] obtained above, select the sample feature dimensions corresponding to all weight components w_i greater than zero; finally, output the feature subspace formed by combining all selected feature dimensions, and output together with it the number of feature dimensions in the subspace;
Step 2-2, feature selection is performed on the original feature space produced by step 1 using the Fisher Score algorithm, which comprises:
1) For an original sample space containing c classes, where the i-th class has M^{(i)} samples, each sample being a feature vector with a class label, and a sample refers to one amino acid residue of a protein, calculate the mean u_n^{(i)} and the variance (σ_n^{(i)})² of every feature dimension of each class of data according to formula (13) and formula (14):
u_n^{(i)} = (1/M^{(i)}) Σ_{j=1}^{M^{(i)}} x_{jn}^{(i)}, 1 ≤ n ≤ N and 1 ≤ i ≤ c (13)
(σ_n^{(i)})² = (1/M^{(i)}) Σ_{j=1}^{M^{(i)}} (x_{jn}^{(i)} − u_n^{(i)})², 1 ≤ n ≤ N and 1 ≤ i ≤ c (14)
2) Using all the means u_n^{(i)} and variances (σ_n^{(i)})² calculated above, compute the Fisher Score of each feature dimension according to formula (15):
H_n = [ Σ_{i=1}^{c} M^{(i)} (u_n^{(i)} − u_n)² ] / [ Σ_{i=1}^{c} M^{(i)} (σ_n^{(i)})² ], 1 ≤ n ≤ N (15)
Wherein, u_n represents the mean of the n-th feature dimension over all data, and H_n represents the Fisher Score of the n-th feature dimension, so each of the N feature dimensions has one Fisher Score value;
A Fisher Score vector H = [H_1, H_2, ..., H_n, ..., H_N] is thus obtained according to formula (15);
3) Sort the values in the above Fisher Score vector H = [H_1, H_2, ..., H_n, ..., H_N] from largest to smallest, then select the sample features corresponding to the largest Fisher Score values and output the feature subspace formed by combining all selected features; the number of retained features is determined by step 2-1;
Step 2-3, feature selection is performed on the original feature space produced by step 1 using the Laplacian Score algorithm, which comprises:
1) for the data X=[x in primitive character space 1, x 2..., x m] ∈ R n × M, use formula (16) and formula (17) structure Laplacian matrix H m × Mwith diagonal matrix D m × Mas follows:
D ii=∑ jh ij, 1≤i≤M and 1≤j≤M (17)
Wherein, R n × Mrepresent the scale of X matrix, be that X has M element that has N dimensional feature, N representation feature dimension, M represents that number of samples is amino acid residue number, σ represents Gaussian parameter, formula (16) is for trying to achieve the distance that two samples are the nuclear space of amino acid residue, and this σ is for controlling the width of nuclear space;
2) Using the Laplacian matrix H_{M×M} and the diagonal matrix D_{M×M} constructed above, compute the Laplacian Score of each feature dimension according to formula (18):
L_n = [ Σ_{i=1}^{M} Σ_{j=1}^{M} (x_in − x_jn)² H_ij ] / [ Σ_{i=1}^{M} (x_in − x̄_n)² D_ii ], 1 ≤ n ≤ N (18)
Wherein, x_in denotes the value of the n-th feature of the i-th sample and x̄_n denotes the mean of the n-th feature over all samples; L_n denotes the Laplacian Score of the n-th feature dimension, so each of the N feature dimensions has one Laplacian Score value, and a Laplacian Score vector L = [L_1, L_2, ..., L_n, ..., L_N] is finally obtained according to formula (18);
3) Sort the values in the Laplacian Score vector L = [L_1, L_2, ..., L_n, ..., L_N] computed above from largest to smallest, then select the sample features corresponding to the largest Laplacian Score values and output the feature subspace formed by combining all selected features; the number of retained features is determined by the aforementioned step 2-1.
4. The subspace-fusion-based protein-vitamin binding site prediction method according to claim 1, characterized in that in step 3, according to the distribution of the aforementioned original samples in each feature subspace, one subspace SVM predictor is trained per subspace using the SVC classification algorithm in LIBSVM; finally, three different SVM predictors are trained over the three feature subspaces.
5. The subspace-fusion-based protein-vitamin binding site prediction method according to claim 1, characterized in that in step 4 the SVM predictors of the three different feature subspaces obtained by the training of step 3 are fused using the weighted average method, which comprises:
Let ω_1 and ω_2 denote the binding-site class and the non-binding-site class respectively, and let S_1, S_2 and S_3 denote the three SVM predictors over the different feature subspaces; an evaluation sample set of M_eva amino acid residues of known class is used to determine the weight of the SVM model corresponding to each subspace. For each sample feature x_i, S_1, S_2 and S_3 output three 2-dimensional vectors (s_{1,1}(x_i), s_{1,2}(x_i))^T, (s_{2,1}(x_i), s_{2,2}(x_i))^T and (s_{3,1}(x_i), s_{3,2}(x_i))^T; the two elements of each 2-dimensional vector represent the degrees to which x_i belongs to ω_1 and ω_2 respectively, and the two elements sum to 1. Therefore, for the evaluation sample set, the prediction result matrices on S_1, S_2 and S_3 can be obtained respectively:
R_i = [ s_{i,1}(x_1) s_{i,2}(x_1); s_{i,1}(x_2) s_{i,2}(x_2); ...; s_{i,1}(x_{M_eva}) s_{i,2}(x_{M_eva}) ]^T, i = 1, 2, 3 (19)
First, construct the target result matrix from the true classes:
R_true = [ p_1 1−p_1; p_2 1−p_2; ...; p_i 1−p_i; ...; p_{M_eva} 1−p_{M_eva} ]^T, where p_i = 1 if y_i = ω_1, otherwise p_i = 0 (20)
Secondly, calculate the error of the SVM classifier under each feature subspace:
E_i = ||R_true − R_i||_2², i = 1, 2, 3 (21)
Then, construct the weight of each subspace SVM predictor from its prediction error on the evaluation sample set:
w_i = (M_eva − E_i) / Σ_{k=1}^{3} (M_eva − E_k), i = 1, 2, 3 (22)
Wherein, M_eva represents the error incurred when all samples are completely misclassified;
Finally, integrate the SVM predictors of the different subspaces according to the weights computed on the evaluation sample set:
S = Σ_{i=1}^{3} w_i · S_i (23)
yielding the fused SVM predictor as in formula (23) above.
6. The subspace-fusion-based protein-vitamin binding site prediction method according to claim 1, characterized in that in step 5 the fused SVM predictor is used to perform protein-vitamin binding site prediction on the protein to be predicted:
For each amino acid residue in the protein to be predicted, the features in the original feature space are produced according to step 1; then the three feature selection algorithms described in step 2 are applied to the original features of the residue to produce three subspace features; the three subspace features are input to the corresponding three SVM predictors S_1, S_2 and S_3 of step 3, yielding three prediction results in the form of vitamin-binding probabilities; these three results are input to the SVM predictor integrated by the weighted average method of step 4, which outputs the probability that the residue binds or does not bind a vitamin. Finally, the binding decision is made using the threshold T that maximizes the Matthews correlation coefficient as the criterion: every amino acid residue whose binding probability is greater than or equal to T is predicted as a binding residue, and every residue whose binding probability is less than T is predicted as a non-binding residue, where T ∈ [0, 1].
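The sliding-window feature construction of claim 2 (formulas (2)-(9)) can be sketched as follows, assuming zero-padding for window positions outside the sequence, a convention the claim does not specify; function names are illustrative:

```python
import numpy as np

def sigmoid(x):
    """Element-wise normalization of formula (2)."""
    return 1.0 / (1.0 + np.exp(-x))

def window_features(matrix, i, w):
    """Row-major features of the size-w window centred on residue i.

    matrix : (l, d) per-residue feature matrix, e.g. a normalized PSSM
             with d = 20, secondary structure probabilities with d = 3,
             or a binding propensity column with d = 1.
    Rows outside the sequence are zero-padded (an assumed convention).
    """
    l, d = matrix.shape
    half = w // 2
    rows = []
    for j in range(i - half, i + half + 1):
        rows.append(matrix[j] if 0 <= j < l else np.zeros(d))
    return np.concatenate(rows)          # length d * w

def residue_feature(pssm, ss_probs, propensity, i, w):
    """Serial combination of step 1-4: a (20 + 3 + 1) * w vector."""
    return np.concatenate([
        window_features(sigmoid(pssm), i, w),   # formulas (2)-(5)
        window_features(ss_probs, i, w),        # formulas (6)-(7)
        window_features(propensity[:, None], i, w),  # formulas (8)-(9)
    ])
```

For example, with window size W = 3 the combined vector has length 24 * 3 = 72 per residue.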
CN201410164632.1A 2014-04-22 2014-04-22 The protein vitamin binding site estimation method being merged based on subspace Expired - Fee Related CN103955628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410164632.1A CN103955628B (en) 2014-04-22 2014-04-22 The protein vitamin binding site estimation method being merged based on subspace


Publications (2)

Publication Number Publication Date
CN103955628A true CN103955628A (en) 2014-07-30
CN103955628B CN103955628B (en) 2017-03-01

Family

ID=51332903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410164632.1A Expired - Fee Related CN103955628B (en) 2014-04-22 2014-04-22 The protein vitamin binding site estimation method being merged based on subspace

Country Status (1)

Country Link
CN (1) CN103955628B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1184455A1 (en) * 1999-06-08 2002-03-06 Japan Science and Technology Corporation Method for predicting binding site structure of gene regulator and apparatus therefor
CN102760210A (en) * 2012-06-19 2012-10-31 南京理工大学常熟研究院有限公司 Adenosine triphosphate binding site predicting method for protein
KR101278211B1 (en) * 2011-09-30 2013-07-01 인하대학교 산학협력단 Feature vector-based method for removing redundancy in a training dataset
CN103500292A (en) * 2013-09-27 2014-01-08 南京理工大学 Ligand specificity protein-ligand binding area forecasting method
CN103617203A (en) * 2013-11-15 2014-03-05 南京理工大学 Protein-ligand binding site predicting method based on inquiry drive


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DONG-JUN YU 等: "Designing Template-Free Predictor for Targeting Protein-Ligand Binding Sites with Classifier Ensemble and Spatial Clustering", 《IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY & BIOINFORMATICS》 *
DONG-JUN YU 等: "TargetATPsite: A Template-free Method for ATP-Binding Sites Prediction with Residue Evolution Image Sparse Representation and Classifier Ensemble", 《JOURNAL OF COMPUTATIONAL CHEMISTRY》 *
HUI YAN 等: "Joint Laplacian feature weights learning", 《PATTERN RECOGNITION》 *
YANAN ZHANG 等: "Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features", 《BMC BIOINFORMATICS》 *
陈俊芬: "多神经网络分类器的融合方法的比较", 《大众科技》 *
陈睿 等: "基于约束得分的动态集成选择算法", 《计算机应用研究》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636635A (en) * 2015-01-29 2015-05-20 南京理工大学 Protein crystallization predicting method based on two-layer SVM learning mechanism
CN104636635B (en) * 2015-01-29 2018-06-12 南京理工大学 Crystallization of protein Forecasting Methodology based on two layers of SVM study mechanism
CN104992079B (en) * 2015-06-29 2018-07-06 南京理工大学 Protein-ligand based on sampling study binds site estimation method
CN104992079A (en) * 2015-06-29 2015-10-21 南京理工大学 Sampling learning based protein-ligand binding site prediction method
CN105808975A (en) * 2016-03-14 2016-07-27 南京理工大学 Multi-core-learning and Boosting algorithm based protein-DNA binding site prediction method
CN107169312A (en) * 2017-05-27 2017-09-15 南开大学 A kind of Forecasting Methodology of the natural unordered protein of low complex degree
CN107169312B (en) * 2017-05-27 2020-05-08 南开大学 Low-complexity natural disordered protein prediction method
CN107273714A (en) * 2017-06-07 2017-10-20 南京理工大学 The ATP binding site estimation methods of conjugated protein sequence and structural information
CN107463799B (en) * 2017-08-23 2020-02-14 福建师范大学福清分校 Method for identifying DNA binding protein by interactive fusion feature representation and selective integration
CN107463799A (en) * 2017-08-23 2017-12-12 福建师范大学福清分校 Interaction fusion feature represents the DBP recognition methods with selective ensemble
CN107609352A (en) * 2017-11-02 2018-01-19 中国科学院新疆理化技术研究所 A kind of Forecasting Methodology of protein self-interaction
CN107609352B (en) * 2017-11-02 2020-07-28 中国科学院新疆理化技术研究所 Prediction method of protein self-interaction
CN108229102A (en) * 2017-12-19 2018-06-29 东软集团股份有限公司 Amino acid sequence feature extracting method, device, storage medium and electronic equipment
CN108229102B (en) * 2017-12-19 2020-06-12 东软集团股份有限公司 Amino acid sequence feature extraction method, device, storage medium and electronic equipment
CN108681659A (en) * 2018-04-02 2018-10-19 首都师范大学 The method for predicting protein complex based on sample data
CN108681659B (en) * 2018-04-02 2022-04-05 首都师范大学 Method for predicting protein complex based on sample data
CN108563922A (en) * 2018-04-04 2018-09-21 中南大学 Predict method, system and the storage medium of protein RNA conjugate hot spots
CN111091865A (en) * 2019-12-20 2020-05-01 东软集团股份有限公司 Method, device, equipment and storage medium for generating MoRFs prediction model
CN111091865B (en) * 2019-12-20 2023-04-07 东软集团股份有限公司 Method, device, equipment and storage medium for generating MoRFs prediction model

Also Published As

Publication number Publication date
CN103955628B (en) 2017-03-01

Similar Documents

Publication Publication Date Title
CN103955628A (en) Subspace fusion-based protein-vitamin binding location point predicting method
CN102521656B (en) Integrated transfer learning method for classification of unbalance samples
CN105956560B (en) A kind of model recognizing method based on the multiple dimensioned depth convolution feature of pondization
CN103093235B (en) A kind of Handwritten Numeral Recognition Method based on improving distance core principle component analysis
CN104077499A (en) Supervised up-sampling learning based protein-nucleotide binding positioning point prediction method
Li et al. Vessel traffic flow forecasting by RSVR with chaotic cloud simulated annealing genetic algorithm and KPCA
CN103116762A (en) Image classification method based on self-modulated dictionary learning
CN114169442B (en) Remote sensing image small sample scene classification method based on double prototype network
CN103679185A (en) Convolutional neural network classifier system as well as training method, classifying method and application thereof
CN104966105A (en) Robust machine fault retrieval method and system
CN109508740B (en) Object hardness identification method based on a Gaussian-mixture-noise generative adversarial network
CN108985360A (en) Hyperspectral classification method based on extended morphology and active learning
CN105808975A (en) Protein-DNA binding site prediction method based on multiple kernel learning and the Boosting algorithm
CN103258210A (en) High-definition image classification method based on dictionary learning
CN114170332A (en) Image recognition model compression method based on adversarial distillation
CN103971136A (en) Large-scale data-oriented parallel structured support vector machine classification method
CN110457677A (en) Entity-relationship recognition method and device, storage medium, computer equipment
EP4227858A1 (en) Method for determining neural network structure and apparatus thereof
CN104751175A (en) Multi-label scene classification method of SAR (Synthetic Aperture Radar) image based on incremental support vector machine
CN107330448A (en) A joint learning method based on label covariance and multi-label classification
CN103617203A (en) Query-driven protein-ligand binding site prediction method
CN101964061B (en) Binary kernel function support vector machine-based vehicle type recognition method
CN108985161B (en) Low-rank sparse representation image feature learning method based on Laplace regularization
CN107273714A (en) ATP binding site prediction method combining protein sequence and structural information
CN114821340A (en) Land utilization classification method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Hu Jun
Inventor after: Wu Xuan
Inventor after: He Xue
Inventor after: Li Yang
Inventor after: Shen Hongbin
Inventor after: Yang Jingyu
Inventor before: Hu Jun
Inventor before: Wu Dongjun
Inventor before: He Xue
Inventor before: Li Yang
Inventor before: Shen Hongbin
Inventor before: Yang Jingyu

COR Change of bibliographic data
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170301

Termination date: 20200422