CN105046106A - Protein subcellular localization and prediction method realized by using nearest-neighbor retrieval
- Publication number: CN105046106A
- Application number: CN201510411973.9A
- Authority: CN (China)
- Prior art keywords: vector, sequence, AAC, protein, protein sequence
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
A protein subcellular localization prediction method realized by nearest-neighbor retrieval comprises the following steps: (1) taking the amino acid composition (AAC) feature vector as the feature of a protein sequence, and storing the AAC feature vector of each protein sequence in the training set in a plurality of hash tables using the LSH (locality-sensitive hashing) method; (2) during prediction, calculating the hash value of the AAC feature vector of the target sequence in each hash table using the LSH method, and thereby obtaining a set of vectors of similar sequences; and (3) selecting, from that set, the Q vectors closest in Euclidean distance to the AAC feature vector of the target sequence, calculating the expected protein sequence distance between the target sequence and the sequence corresponding to each of the Q vectors by global alignment dynamic programming, and taking the subcellular compartment of the protein whose sequence has the highest expected distance to the target sequence as the predicted compartment.
Description
Technical field
The invention belongs to the field of bioinformatics and relates to a protein subcellular localization prediction method realized with machine learning techniques, specifically a protein subcellular localization prediction method realized by nearest-neighbor retrieval.
Background art
Protein subcellular localization refers to the specific site within the cell at which a protein or gene expression product resides; the prediction task is to infer that subcellular location from a given protein sequence. The subcellular location of a protein is closely related to its biological function, and knowledge of it plays a vital role in biology, cell biology, pharmacology and medicine. Although subcellular location can be determined by experiment, doing so is time-consuming and expensive. As the volume of sequenced genomic data grows, methods for predicting protein subcellular location become increasingly important, and automated, accurate tools are needed. A number of effective prediction methods have appeared in recent years, ranging from individual classifiers to ensemble machine learning. Common individual classifiers include support vector machines, neural networks, hidden Markov models, Bayesian methods and K-nearest neighbors. Ensemble learning combines multiple weak classifiers into one strong ensemble classifier, which can improve model performance. Both single and ensemble classifiers have been applied repeatedly to subcellular localization prediction, but accuracy has become difficult to improve, and most of these methods rely on a fairly complex model training process. Unless new methods or features are devised, accuracy is unlikely to improve further.
Summary of the invention
The object of the invention is to address the problem of protein subcellular localization by proposing a protein subcellular localization prediction method realized by nearest-neighbor retrieval. The method uses the simple AAC vector as the feature of a protein sequence and stores the feature vectors of the training-set sequences in multiple hash tables using the LSH algorithm. During prediction, the LSH method computes the hash value of the target sequence's AAC feature vector in each hash table, which yields a set of vectors of similar sequences. From this set, the Q vectors closest to the target vector in Euclidean distance are chosen. The expected protein sequence distance between the target sequence and the sequence of each chosen vector is then computed by global alignment dynamic programming, and the subcellular compartment of the protein whose sequence has the highest expected distance to the target sequence is the predicted compartment.
The technical scheme of the present invention is:
A protein subcellular localization prediction method realized by nearest-neighbor retrieval, comprising the following steps:
(1) using the AAC feature vector as the feature of a protein sequence, and storing the AAC feature vector of each protein sequence in the training set in multiple hash tables using the LSH method;
(2) during prediction, computing the hash value of the target sequence's AAC feature vector in each hash table using the LSH method, and obtaining a set of vectors of similar sequences;
(3) choosing, from the obtained set, the Q vectors closest in Euclidean distance to the target sequence's AAC feature vector; computing, by global alignment dynamic programming, the expected protein sequence distance between the target sequence and the sequence corresponding to each of the Q vectors; and taking the subcellular compartment of the protein whose sequence has the highest expected distance to the target sequence as the predicted compartment.
Step (1) of the present invention specifically comprises the following steps:
(A) Extract the AAC feature vector of the protein sequence:
Let the protein sequence P be:
$$P = R_1 R_2 R_3 \cdots R_t \qquad (1)$$
where t is the length of the protein sequence, i.e. the number of amino acid residues, $R_1$ is the first amino acid residue in the sequence P, $R_2$ is the second amino acid residue, and so on, up to $R_t$, the t-th amino acid residue.
AAC feature extraction: the amino acid composition information of the protein sequence P, i.e. its AAC feature vector, is:
$$v = [f_1, f_2, \ldots, f_d] \qquad (2)$$
where $f_1, f_2, \ldots, f_{20}$ are given by the following equation:
$$f_u = \frac{1}{t} \sum_{i=1}^{t} \delta(R_i, A(u)), \qquad \delta(R_i, A(u)) = \begin{cases} 1, & R_i = A(u) \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $f_u$ ($u = 1, 2, \ldots, d$) is the frequency of occurrence of each amino acid, d = 20, t is the length of the protein sequence, i is the index of the amino acid residue, and A(u) is the amino acid residue corresponding to index u.
(B) Build the hash tables:
For the n protein sequences in the training set, the d-dimensional AAC feature vector of each protein sequence is stored in L hash tables: each vector is placed, by the LSH method, into the bucket with the corresponding key in each of the L hash tables.
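The AAC extraction of step (A) can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: the function name `aac_vector` and the alphabetical ordering of the 20 residues in `AMINO_ACIDS` are assumptions, since the text only says that A(u) maps an index u to a residue.

```python
from collections import Counter

# Assumed ordering of the 20 standard amino acids (one-letter codes).
# The patent does not fix which residue corresponds to which index u.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_vector(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a protein sequence.

    Each entry is the count of one residue divided by the sequence length t.
    Non-standard letters (e.g. X) count towards t but towards no entry, so
    the entries may then sum to less than 1.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence.upper())
    t = len(sequence)
    return [counts.get(aa, 0) / t for aa in AMINO_ACIDS]
```

For example, `aac_vector("AACD")` gives 0.5 for A, 0.25 for C and 0.25 for D, with all other entries 0.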
Step (B) of the present invention specifically comprises the following steps:
(B-1) For the n protein sequences in the training set, each d-dimensional AAC feature vector v is scaled by a factor of C and rounded according to formula (4), converting it into a vector with integer coordinates:
$$v' = [C \times v] \qquad (4)$$
where [·] denotes the rounding operation;
(B-2) Each coordinate of the d-dimensional vector v' is transformed as follows: let r be a coordinate of v'; then $g(r) = 00\cdots011\cdots1$, a string of C bits in which the left end is all 0s, the right end is all 1s, and the number of 1s equals the value of r;
Using the operator | to concatenate adjacent coordinates, the vector v' is transformed by F(v'):
$$v'' = F(v') = g(f'_1) \,|\, g(f'_2) \,|\, g(f'_3) \,|\, \cdots \,|\, g(f'_d)$$
where $f'_u$ is the u-th coordinate of v';
(B-3) From the integers 0 to Cd-1, randomly choose k numbers $n_1, n_2, n_3, \ldots, n_k$. Let h(v'', n) denote the n-th coordinate (bit) of v''; then
$$v''' = G(v'') = h(v'', n_1)\, h(v'', n_2) \cdots h(v'', n_k)$$
and G(v'') is a hash value of the AAC feature vector v;
(B-4) For the n protein sequences in the training set, n hash values are obtained according to step (B-3), and one hash table is built from them;
(B-5) To raise the collision rate of similar vectors, L hash tables are built by repeating steps (B-3) to (B-4).
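Steps (B-1) to (B-5) amount to unary encoding followed by bit sampling, which can be sketched as follows. All function names are hypothetical. The sketch assumes Python's built-in `round` for the rounding in formula (4), positions sampled independently (with replacement), and a fixed seed for reproducibility; the patent specifies none of these details.

```python
import random
from collections import defaultdict


def to_integer_vector(v, C):
    # Formula (4): scale each frequency by C and round to an integer in [0, C].
    return [int(round(C * f)) for f in v]


def unary_encode(v_int, C):
    # F(v'): concatenate g(r) = (C - r) zeros followed by r ones per coordinate.
    return "".join("0" * (C - r) + "1" * r for r in v_int)


def sampled_bit(v_int, C, n):
    # h(v'', n): the n-th bit of F(v'), computed without building the string.
    # Coordinate j occupies bits [j*C, (j+1)*C); its last v_int[j] bits are 1.
    j, offset = divmod(n, C)
    return 1 if offset >= C - v_int[j] else 0


def make_hash_function(C, d, k, rng):
    # (B-3): pick k bit positions from 0 .. C*d - 1 (with replacement: assumption).
    positions = [rng.randrange(C * d) for _ in range(k)]

    def G(v_int):
        return tuple(sampled_bit(v_int, C, n) for n in positions)

    return G


def build_tables(vectors, C, k, L, seed=0):
    # (B-4), (B-5): one hash table per independently sampled hash function.
    rng = random.Random(seed)
    d = len(vectors[0])
    hash_functions = [make_hash_function(C, d, k, rng) for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for index, v in enumerate(vectors):
        v_int = to_integer_vector(v, C)
        for G, table in zip(hash_functions, tables):
            table[G(v_int)].append(index)  # bucket stores training-set indices
    return hash_functions, tables
```

Computing each sampled bit directly avoids materialising the C×d bit string for every vector. Storing indices rather than vectors in the buckets is a design choice of this sketch.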
Step (2) of the present invention specifically comprises the following steps: extract the AAC feature vector T of the target protein sequence, and compute by the LSH method the hash values of T in each hash table: $J_1, J_2, \ldots, J_L$. Retrieve the vectors stored under the corresponding hash value in each hash table to obtain the set of vectors of similar sequences. From this set, choose the Q vectors closest to T in Euclidean distance, and compute by global alignment dynamic programming the expected protein sequence distance M between the target sequence and the sequence corresponding to each of the Q vectors. The subcellular compartment of the protein whose sequence has the highest M is the predicted compartment.
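The retrieval part of this step can be sketched as below. It assumes that each hash table is a mapping from hash value to a list of training-set indices and that each hash function is a callable applied to the integer form of T. The names `retrieve_candidates` and `q_nearest` are hypothetical.

```python
import math


def retrieve_candidates(T_int, hash_functions, tables):
    """Union of the buckets that the query falls into across all L tables."""
    candidates = set()
    for G, table in zip(hash_functions, tables):
        # .get avoids creating empty buckets when the key is absent
        candidates.update(table.get(G(T_int), []))
    return candidates


def q_nearest(T, candidates, vectors, Q):
    """Indices of the Q candidates closest to T in Euclidean distance."""
    ranked = sorted(candidates, key=lambda idx: math.dist(T, vectors[idx]))
    return ranked[:Q]
```

Only the retrieved candidates are ranked, so the Euclidean distance is computed for a small subset of the training set rather than for all n vectors.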
The global alignment dynamic programming method of the present invention is as follows: let a and b be two sequences of lengths x and y, and let the expected distance between them be $M(a_x, b_y)$. By evaluating the distance $M(a_i, b_j)$ between the first i positions of sequence a and the first j positions of sequence b, for $i \in [1, x]$ and $j \in [1, y]$, the distance $M(a_x, b_y)$ is obtained recursively.
The recursive alignment of the present invention proceeds in steps. Over the ranges $i \in [1, x]$, $j \in [1, y]$, x × y steps are performed, and at each step the alignment can be extended by one of three events:
A vertical move from cell (i-1, j) to (i, j), which extends the alignment by inserting a gap in sequence b; the distance value decreases by 2;
A diagonal move from cell (i-1, j-1) to (i, j), which extends the alignment by adding the letters $a_i$ and $b_j$; if the letters are identical the distance value increases by 1, and if they differ it decreases by 1;
A horizontal move from cell (i, j-1) to (i, j), which extends the alignment by inserting a gap in sequence a; the distance value decreases by 2;
The distance of cell (i, j) is taken as the maximum over the three adjacent cells after the respective weight has been added, namely
$$M(a_i, b_j) = \max \begin{cases} M(a_{i-1}, b_j) - 2 \\ M(a_{i-1}, b_{j-1}) + S(i, j) \\ M(a_i, b_{j-1}) - 2 \end{cases}$$
where max takes the best of the three possible scores, $M(a_0, b_0) = 0$, and S(i, j) compares the i-th letter of a with the j-th letter of b: it is 1 if they are identical and -1 if they differ.
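The recurrence is the standard global alignment scheme with match +1, mismatch -1 and gap -2, and can be sketched as below. The text only states M(a_0, b_0) = 0, so the first row and column are filled on the assumption that leading gaps also cost 2 each. The function name is hypothetical.

```python
def global_alignment_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Return M(a_x, b_y), the best global alignment score of a and b."""
    x, y = len(a), len(b)
    # M[i][j] = best score for the first i letters of a and first j letters of b
    M = [[0] * (y + 1) for _ in range(x + 1)]
    # Boundary cells: assumption, each leading gap costs the same as any gap.
    for i in range(1, x + 1):
        M[i][0] = M[i - 1][0] + gap
    for j in range(1, y + 1):
        M[0][j] = M[0][j - 1] + gap
    for i in range(1, x + 1):
        for j in range(1, y + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch  # S(i, j)
            M[i][j] = max(
                M[i - 1][j] + gap,    # vertical move: gap in b
                M[i - 1][j - 1] + s,  # diagonal move: align a_i with b_j
                M[i][j - 1] + gap,    # horizontal move: gap in a
            )
    return M[x][y]
```

With these weights a higher value means a more similar pair, which is why the method selects the neighbor with the highest M.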
Beneficial effects of the present invention:
The present invention proposes a prediction model for protein subcellular compartment localization based on LSH approximate nearest-neighbor search and global alignment dynamic programming. The model does not depend on complex sequence features and adapts well: even when sequences are added to or removed from the training set, the LSH hash tables that serve as the prediction parameters need not be fully recomputed. The model achieves a relatively high overall accuracy in the jackknife test on a benchmark dataset, and the method obtains the prediction result for a target sequence quickly and effectively.
Brief description of the drawings
Fig. 1 is the MAP curve for the experiment on the number of hash tables.
Fig. 2 is the MRR curve for the experiment on the number of hash tables.
Fig. 3 is the MAP curve for the experiment on the number of hash bits.
Fig. 4 is the MRR curve for the experiment on the number of hash bits.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments.
1 Selection of the test dataset
The description uses a dataset of 317 apoptosis protein sequences obtained from the SWISS-PROT database. The 317 protein sequences are distributed over 6 subcellular compartments: 112 cytoplasmic proteins, 55 membrane proteins, 34 mitochondrial proteins, 17 secreted proteins, 52 nuclear proteins and 47 endoplasmic reticulum proteins.
2 Experimental evaluation methods and indexes
Three methods are commonly used to evaluate predictions: the self-consistency test (resubstitution), K-fold cross-validation and the jackknife test. In the self-consistency test, the sequence to be predicted is itself part of the set being searched, so the method described here would achieve a prediction success rate of 100%. Compared with K-fold cross-validation, the jackknife test predicts each sequence in turn from all the remaining sequences (leave-one-out) and is regarded in statistics as the more objective and rigorous validation method. The prediction results in the implementation steps are therefore validated with the jackknife test.
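The jackknife protocol can be sketched generically as below. The `predict` callback, which takes the remaining sequences, their labels and the held-out sequence, is a hypothetical interface chosen for this illustration. In the method described here it would rebuild or update the hash tables without the held-out sequence and then run the prediction.

```python
def jackknife_predictions(sequences, labels, predict):
    """Predict every sequence from all the others (leave-one-out)."""
    predictions = []
    for i in range(len(sequences)):
        # Hold out sequence i; everything else forms the training set.
        train_sequences = sequences[:i] + sequences[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predictions.append(predict(train_sequences, train_labels, sequences[i]))
    return predictions
```

Because the held-out sequence never appears in its own training set, the trivial 100% success rate of the self-consistency test cannot occur.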
The experiments use four evaluation indexes: sensitivity ($SN_i$), specificity ($SP_i$), correlation coefficient ($MCC_i$) and overall accuracy (OA), defined as follows:
$$SN_i = \frac{TP_i}{TP_i + FN_i}$$
$$SP_i = \frac{TP_i}{TP_i + FP_i}$$
$$MCC_i = \frac{TP_i \times TN_i - FP_i \times FN_i}{\sqrt{(TP_i + FP_i)(TP_i + FN_i)(TN_i + FP_i)(TN_i + FN_i)}}$$
$$OA = \frac{\sum_i TP_i}{\sum_i (TP_i + FP_i)}$$
In the formulas above, $TP_i$ is the number of sequences of the i-th subcellular compartment that are predicted correctly, $FN_i$ is the number of sequences of the i-th compartment that are not predicted correctly, $FP_i$ is the number of sequences that do not belong to the i-th compartment but are predicted as belonging to it, and $TN_i$ is the number of sequences outside the i-th compartment that are correctly predicted as such. These indexes assess the retrieval method objectively and effectively from complementary aspects: sensitivity ($SN_i$) reflects the accuracy of the prediction algorithm within each compartment, specificity ($SP_i$) evaluates the confidence of the algorithm, the correlation coefficient $MCC_i$ reflects the overall validity of the prediction algorithm, and the overall accuracy OA reflects its accuracy across all compartments.
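The four indexes can be computed from lists of true and predicted compartments as sketched below. The MCC expression used is the standard Matthews correlation coefficient, which is an assumption about what the source intends by "correlation coefficient". The function name and the convention of returning 0.0 when a denominator is zero are likewise choices of this sketch.

```python
import math


def evaluate(true_labels, predicted_labels, classes):
    """Return ({class: {"SN", "SP", "MCC"}}, OA) using the definitions above."""
    n = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    per_class = {}
    total_tp = total_predicted = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = n - tp - fn - fp
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        per_class[c] = {
            "SN": tp / (tp + fn) if tp + fn else 0.0,
            "SP": tp / (tp + fp) if tp + fp else 0.0,  # as defined in the text
            "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        }
        total_tp += tp
        total_predicted += tp + fp
    oa = total_tp / total_predicted if total_predicted else 0.0
    return per_class, oa
```

Note that $SP_i$ as defined in the text, TP/(TP+FP), is the quantity usually called precision; the sketch follows the text's definition.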
3 Setting of the prediction method parameters
The prediction algorithm has three parameters: the number of hash tables L, the number of hash bits k, and the number of vectors Q passed to global alignment. To examine how these three parameters affect the LSH prediction algorithm, the default settings are L=10, k=200 and Q=6. When studying the effect of one parameter, the other two are fixed at their default values, and 10 experiments are run for each parameter setting.
Figs. 1 and 2 show how the number of hash tables affects the performance of the hashing algorithm. As L increases, the mean average precision (MAP) first rises steadily and then levels off, while the mean number of rows returned by the hash-table search (mean return row, MRR) grows linearly; a larger number of returned results increases the prediction time. On both datasets, L=4 is already enough for the algorithm to give good predictions. The results show that, weighing accuracy against computational efficiency, a value of L in the interval [5, 20] is reasonable and gives a high prediction success rate.
Figs. 3 and 4 show how the number of hash bits k affects the performance of the hashing algorithm. As the figures show, MAP and MRR both decline as k grows. The reason is that a larger k lowers the collision rate of similar vectors, which reduces the number of rows the hash tables return. Weighing accuracy against efficiency, k=200 is a reasonable setting.
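The opposite effects of k and L follow from the usual bit-sampling analysis, sketched below under the assumption that the k positions are drawn independently and uniformly. Two unary-encoded vectors that differ in H of the C×d bits agree on one sampled bit with probability 1 - H/(C×d), share a bucket in one table with that probability raised to the power k, and share a bucket in at least one of L tables with the complementary probability of missing in all L. The function names are hypothetical.

```python
def bucket_collision_probability(hamming_distance, total_bits, k):
    """Probability that two vectors get the same k-bit hash in one table."""
    p_single_bit = 1 - hamming_distance / total_bits
    return p_single_bit ** k


def retrieval_probability(hamming_distance, total_bits, k, L):
    """Probability that the two vectors share a bucket in at least one table."""
    p_table = bucket_collision_probability(hamming_distance, total_bits, k)
    return 1 - (1 - p_table) ** L
```

For a fixed distance, raising k shrinks the per-table collision probability, so fewer rows are returned, while raising L increases the chance that a similar vector is retrieved from at least one table. This matches the trends described for Figs. 1 to 4.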
In a concrete implementation:
Based on the results and analysis of the parameter experiments, the final prediction parameters are set to L=10, k=200, Q=4. For the 317 sequences, the prediction procedure is as follows:
(1) Extract the AAC features of the protein sequences, obtaining 317 feature vectors of 20 dimensions.
(2) Build the hash tables: store the sample library of 317 20-dimensional feature vectors in 10 hash tables; each vector is placed, by the LSH method described above, into the bucket with the corresponding key in each of the 10 hash tables.
1) Convert the 317 20-dimensional AAC vectors into vectors with integer coordinates.
2) Each vector v' is converted into a 0/1 string v'' of length 1000×20 (that is, C=1000).
3) From the integers 0 to 1000×20-1, randomly choose 200 numbers $n_1, n_2, n_3, \ldots, n_{200}$. Let h(v'', n) denote the n-th coordinate of v''; then $v''' = G(v'') = h(v'', n_1)\, h(v'', n_2) \cdots h(v'', n_{200})$.
4) G(v'') is a hash value of the AAC feature vector v.
5) To raise the collision rate of similar vectors, build 10 hash tables by repeating steps 2) to 4).
(3) Search the sample library for the AAC feature vector T of the target sequence to be predicted. Compute by the LSH method the hash values of T in each hash table: $h_1, h_2, \ldots, h_{10}$. Retrieve the 10 sets of vectors from the 10 hash tables and take their union. From this union, choose the 4 vectors closest to T in Euclidean distance. Compute by global alignment dynamic programming the expected protein sequence distance M between the target sequence and the sequence corresponding to each of the 4 vectors; the subcellular compartment of the protein whose sequence has the highest M is the predicted compartment.
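The final selection in this procedure can be sketched as below. The function takes the Q nearest neighbors as (sequence, compartment) pairs and any scoring function in which higher means more similar, such as a global alignment score. Both the interface and the name `predict_compartment` are hypothetical.

```python
def predict_compartment(target_sequence, neighbours, score):
    """Return the compartment of the neighbour with the highest score.

    neighbours: list of (sequence, compartment) pairs for the Q nearest vectors.
    score: callable(sequence_a, sequence_b) -> number, higher = more similar.
    """
    if not neighbours:
        raise ValueError("no candidate neighbours were retrieved")
    best_sequence, best_compartment = max(
        neighbours, key=lambda pair: score(target_sequence, pair[0])
    )
    return best_compartment
```

If the LSH lookup returns no candidates at all, the sketch raises an error; the source does not describe how that case is handled.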
Table 1 Jackknife prediction results for the 317 sequences
Parts not covered by the present invention are the same as the prior art or can be implemented using the prior art.
Claims (6)
1. A protein subcellular localization prediction method realized by nearest-neighbor retrieval, characterized in that the method comprises the following steps:
(1) using the AAC feature vector as the feature of a protein sequence, and storing the AAC feature vector of each protein sequence in the training set in multiple hash tables using the LSH method;
(2) during prediction, computing the hash value of the target sequence's AAC feature vector in each hash table using the LSH method, and obtaining a set of vectors of similar sequences;
(3) choosing, from the obtained set, the Q vectors closest in Euclidean distance to the target sequence's AAC feature vector; computing, by global alignment dynamic programming, the expected protein sequence distance between the target sequence and the sequence corresponding to each of the Q vectors; and taking the subcellular compartment of the protein whose sequence has the highest expected distance to the target sequence as the predicted compartment.
2. The protein subcellular localization prediction method realized by nearest-neighbor retrieval according to claim 1, characterized in that step (1) specifically comprises the following steps:
(A) Extract the AAC feature vector of the protein sequence:
Let the protein sequence P be:
$$P = R_1 R_2 R_3 \cdots R_t \qquad (1)$$
where t is the length of the protein sequence, i.e. the number of amino acid residues, $R_1$ is the first amino acid residue in the sequence P, $R_2$ is the second amino acid residue, and so on, up to $R_t$, the t-th amino acid residue;
AAC feature extraction: the amino acid composition information of the protein sequence P, i.e. its AAC feature vector, is:
$$v = [f_1, f_2, \ldots, f_d] \qquad (2)$$
where $f_1, f_2, \ldots, f_{20}$ are given by the following equation:
$$f_u = \frac{1}{t} \sum_{i=1}^{t} \delta(R_i, A(u)), \qquad \delta(R_i, A(u)) = \begin{cases} 1, & R_i = A(u) \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $f_u$ ($u = 1, 2, \ldots, d$) is the frequency of occurrence of each amino acid, d = 20, t is the length of the protein sequence, i is the index of the amino acid residue, and A(u) is the amino acid residue corresponding to index u;
(B) Build the hash tables:
For the n protein sequences in the training set, the d-dimensional AAC feature vector of each protein sequence is stored in L hash tables: each vector is placed, by the LSH method, into the bucket with the corresponding key in each of the L hash tables.
3. The protein subcellular localization prediction method realized by nearest-neighbor retrieval according to claim 2, characterized in that step (B) specifically comprises the following steps:
(B-1) For the n protein sequences in the training set, each d-dimensional AAC feature vector v is scaled by a factor of C and rounded according to formula (4), converting it into a vector with integer coordinates:
$$v' = [C \times v] \qquad (4)$$
where [·] denotes the rounding operation;
(B-2) Each coordinate of the d-dimensional vector v' is transformed as follows: let r be a coordinate of v'; then $g(r) = 00\cdots011\cdots1$, a string of C bits in which the left end is all 0s, the right end is all 1s, and the number of 1s equals the value of r;
Using the operator | to concatenate adjacent coordinates, the vector v' is transformed by F(v'):
$$v'' = F(v') = g(f'_1) \,|\, g(f'_2) \,|\, g(f'_3) \,|\, \cdots \,|\, g(f'_d)$$
where $f'_u$ is the u-th coordinate of v';
(B-3) From the integers 0 to Cd-1, randomly choose k numbers $n_1, n_2, n_3, \ldots, n_k$. Let h(v'', n) denote the n-th coordinate (bit) of v''; then
$$v''' = G(v'') = h(v'', n_1)\, h(v'', n_2) \cdots h(v'', n_k)$$
and G(v'') is a hash value of the AAC feature vector v;
(B-4) For the n protein sequences in the training set, n hash values are obtained according to step (B-3), and one hash table is built from them;
(B-5) To raise the collision rate of similar vectors, L hash tables are built by repeating steps (B-3) to (B-4).
4. The protein subcellular localization prediction method realized by nearest-neighbor retrieval according to claim 1, characterized in that step (2) specifically comprises the following steps: extract the AAC feature vector T of the target protein sequence, and compute by the LSH method the hash values of T in each hash table: $J_1, J_2, \ldots, J_L$; retrieve the vectors stored under the corresponding hash value in each hash table to obtain the set of vectors of similar sequences; from this set, choose the Q vectors closest to T in Euclidean distance, and compute by global alignment dynamic programming the expected protein sequence distance M between the target sequence and the sequence corresponding to each of the Q vectors; the subcellular compartment of the protein whose sequence has the highest M is the predicted compartment.
5. The protein subcellular localization prediction method realized by nearest-neighbor retrieval according to claim 4, characterized in that the global alignment dynamic programming method is as follows: let a and b be two sequences of lengths x and y, and let the expected distance between them be $M(a_x, b_y)$; by evaluating the distance $M(a_i, b_j)$ between the first i positions of sequence a and the first j positions of sequence b, for $i \in [1, x]$ and $j \in [1, y]$, the distance $M(a_x, b_y)$ is obtained recursively.
6. The protein subcellular localization prediction method realized by nearest-neighbor retrieval according to claim 5, characterized in that the recursive alignment proceeds in steps: over the ranges $i \in [1, x]$, $j \in [1, y]$, x × y steps are performed, and at each step the alignment can be extended by one of three events:
A vertical move from cell (i-1, j) to (i, j), which extends the alignment by inserting a gap in sequence b; the distance value decreases by 2;
A diagonal move from cell (i-1, j-1) to (i, j), which extends the alignment by adding the letters $a_i$ and $b_j$; if the letters are identical the distance value increases by 1, and if they differ it decreases by 1;
A horizontal move from cell (i, j-1) to (i, j), which extends the alignment by inserting a gap in sequence a; the distance value decreases by 2;
The distance of cell (i, j) is taken as the maximum over the three adjacent cells after the respective weight has been added, namely
$$M(a_i, b_j) = \max \begin{cases} M(a_{i-1}, b_j) - 2 \\ M(a_{i-1}, b_{j-1}) + S(i, j) \\ M(a_i, b_{j-1}) - 2 \end{cases}$$
where max takes the best of the three possible scores, $M(a_0, b_0) = 0$, and S(i, j) compares the i-th letter of a with the j-th letter of b: it is 1 if they are identical and -1 if they differ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510411973.9A CN105046106B (en) | 2015-07-14 | 2015-07-14 | Protein subcellular localization and prediction method realized by using nearest-neighbor retrieval |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105046106A true CN105046106A (en) | 2015-11-11 |
CN105046106B CN105046106B (en) | 2018-02-23 |
Family ID: 54452646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510411973.9A Expired - Fee Related CN105046106B (en) | Protein subcellular localization and prediction method realized by using nearest-neighbor retrieval | 2015-07-14 | 2015-07-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105046106B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324933A (en) * | 2013-06-08 | 2013-09-25 | 南京理工大学常熟研究院有限公司 | Membrane protein sub-cell positioning method based on complex space multi-view feature fusion |
CN104156634A (en) * | 2014-08-14 | 2014-11-19 | 中南大学 | Key protein identification method based on subcellular localization specificity |
Non-Patent Citations (4)
Title |
---|
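Song Jie: "Nearest-neighbor algorithm for protein subcellular localization prediction", Application Research of Computers * |
Zhang Jifu et al.: "Local outlier data mining algorithm based on MapReduce and correlation subspace", Journal of Software * |
Li Liqi et al.: "Application of the KNN method to subcellular localization of fibronectin-domain-containing proteins", Shandong Medical Journal * |
Fan Yucai et al.: "Subcellular localization of apoptosis proteins based on an improved GO-PseAA method", Journal of Inner Mongolia University of Technology * |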
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108595909A (en) * | 2018-03-29 | 2018-09-28 | 山东师范大学 | TA targeting proteins prediction techniques based on integrated classifier |
CN109273054A (en) * | 2018-08-31 | 2019-01-25 | 南京农业大学 | Protein Subcellular interval prediction method based on relation map |
CN109273054B (en) * | 2018-08-31 | 2021-07-13 | 南京农业大学 | Protein subcellular interval prediction method based on relational graph |
CN112585686A (en) * | 2018-09-21 | 2021-03-30 | 渊慧科技有限公司 | Machine learning to determine protein structure |
CN112259160A (en) * | 2020-11-19 | 2021-01-22 | 广东工业大学 | Protein subcellular localization method, system, storage medium and computer equipment |
CN112259160B (en) * | 2020-11-19 | 2023-05-26 | 广东工业大学 | Protein subcellular localization method, system, storage medium and computer device |
Also Published As
Publication number | Publication date |
---|---|
CN105046106B (en) | 2018-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180223; termination date: 20210714 |