CN102043910A - Remote protein homology detection and fold recognition method based on Top-n-gram - Google Patents
- Publication number
- CN102043910A
- Authority
- CN
- China
- Prior art keywords
- amino acid
- gram
- protein sequence
- matrix
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a method for remote protein homology detection and fold recognition based on Top-n-grams. The method addresses two problems of existing remote protein homology detection and fold recognition methods that rely on the binary profile: no optimal threshold can be found, and differences in how often amino acids occur cannot be distinguished. The method comprises the following steps: 1, running PSI-BLAST on the input test protein sequence to perform multiple sequence alignment, and calculating the pseudo-count of amino acid i; 2, generating a frequency profile; 3, converting the frequency profile into Top-n-grams; 4, obtaining the latent semantic representation vector of the test protein sequence; 5, inputting that latent semantic representation vector into an SVM classifier for classification and obtaining the prediction result. The method is used in the field of remote protein homology detection and fold recognition.
Description
Technical field
The present invention relates to a method for remote protein homology detection and fold recognition.
Background art
At present, methods for remote protein homology detection, in China and abroad, fall roughly into the following types: dynamic programming algorithms, generative models, and discriminative models. Discriminative models give the best prediction performance in this field, and methods based on the support vector machine (SVM) are currently the most widely used. An effective way to improve the prediction performance of SVM-based methods is to find a suitable protein representation and then convert the protein sequence into a vector.
The multiple sequence alignment output by PSI-BLAST (Position-Specific Iterated BLAST) contains a large amount of evolutionary information. Because a frequency profile contains more information than the protein sequence itself, using the evolutionary information in the frequency profile to improve remote protein homology detection and fold recognition is of considerable value. Researchers have previously proposed a feature vector based on the binary profile; that method converts the frequency profile into a binary profile by means of a frequency threshold. Amino acids whose frequency is greater than the threshold are represented by 1, and amino acids whose frequency is less than the threshold are represented by 0. The binary profile is a kind of protein building block and has been used to solve several biological problems, for example protein domain boundary prediction, the design of mean-force potentials, and protein interaction site prediction. Although methods based on the binary profile have been successful, the binary profile has shortcomings. First, the frequency threshold used to convert the frequency profile into a binary profile is chosen empirically; there is no systematic method for optimizing it, so there is no guarantee that the optimal threshold is found. Second, the binary profile cannot distinguish differences in how often amino acids occur: every amino acid whose frequency exceeds the threshold is represented by 1, which ignores the fact that these amino acids occur with different frequencies and have different importance during evolution.
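The two shortcomings can be made concrete with a small sketch of the thresholding. It is only an illustration: the function name `binary_profile`, the four-letter toy alphabet, and the threshold value 0.15 are assumptions chosen for the example and are not taken from the prior-art method.

```python
def binary_profile(row, threshold):
    """Map one row of a frequency profile to 0/1 flags (1 = frequency above threshold)."""
    return [1 if freq > threshold else 0 for freq in row]

# Two toy profile rows over a hypothetical 4-letter alphabet (A, C, D, E).
row_a = [0.70, 0.20, 0.06, 0.04]   # A dominates, C is a distant second
row_b = [0.20, 0.70, 0.06, 0.04]   # C dominates, A is a distant second

threshold = 0.15                   # hand-picked, mirroring the empirical choice
code_a = binary_profile(row_a, threshold)
code_b = binary_profile(row_b, threshold)

# Both rows collapse to the same binary code, so the ranking of A and C is lost.
print(code_a, code_b)
```

A different threshold (say 0.5) would separate the two rows but discard the second-ranked residue entirely, which is why the choice of threshold matters.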
Summary of the invention
To solve the problem that, in existing methods for remote protein homology detection and fold recognition, the binary profile cannot find the optimal threshold and cannot distinguish differences in how often amino acids occur, the present invention provides a method for remote protein homology detection and fold recognition based on Top-n-grams. The concrete steps of the method are:
Step 1: run PSI-BLAST with the test protein sequence as input to perform multiple sequence alignment, and calculate the pseudo-count $g_i$ of amino acid i, where $f_j$ is the observed frequency of amino acid j, $p_j$ is the background frequency of amino acid j, and $q_{ij}$ is the score in the substitution matrix for the pair of amino acid i and amino acid j;
Step 2: generate the frequency profile from the pseudo-counts of amino acid i;
Step 3: convert the frequency profile into Top-n-grams;
Step 4: by counting the number of times each Top-n-gram occurs, convert the test protein sequence into a fixed-length vector, and then construct the word-document matrix W;
Step 5: perform singular value decomposition on the generated word-document matrix W to obtain the latent semantic representation vector of the test protein sequence;
Step 6: input the latent semantic representation vector of the test protein sequence into the SVM classifier for classification; the SVM classifier assigns a score to the test protein sequence, and a test protein sequence whose score is greater than 0 is predicted to be homologous or to share the fold, which gives the prediction result.
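The formula for the pseudo-count in step 1 is not reproduced in this text. Assuming it follows the standard PSI-BLAST pseudo-count, which is consistent with the three quantities defined in step 1 (observed frequency, background frequency, and substitution-matrix entry), it would take the following form. This is a reconstruction under that assumption and not a quotation of the patent's own equation.

```latex
g_i = \sum_{j=1}^{20} \frac{f_j}{p_j}\, q_{ij}
```

Here the sum runs over the 20 standard amino acids j, so an amino acid i receives a large pseudo-count when residues that substitute well for it are observed more often than their background frequency.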
The method for generating the frequency profile in step 2 is:
Calculate the target frequency $Q_i$ of each of the 20 standard amino acids at each amino acid position of the test protein sequence:
$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$
where β is a free parameter set to the PSI-BLAST default value of 10, and α is the number of distinct amino acid types occurring in the given column of the multiple sequence alignment, minus 1;
The frequency profile is expressed as a matrix M of dimension L × N, where L is the length of the protein sequence and N is the constant 20, i.e. the number of standard amino acids; the elements of M are the target frequencies $Q_i$.
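As a worked illustration of the target-frequency formula, the sketch below evaluates it for one alignment column. The function name `target_frequencies`, the three-letter toy alphabet, and the numeric values are assumptions for the example; a real profile row has 20 entries.

```python
import numpy as np

def target_frequencies(f, g, alpha, beta=10.0):
    """Q_i = (alpha * f_i + beta * g_i) / (alpha + beta) for one alignment column."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return (alpha * f + beta * g) / (alpha + beta)

# Toy column: two residue types were observed, so alpha = 2 - 1 = 1.
f = [0.5, 0.5, 0.0]     # observed frequencies
g = [0.3, 0.3, 0.4]     # pseudo-counts (made-up values)
Q = target_frequencies(f, g, alpha=1)
print(Q)
```

With only two residue types observed, α is small compared with β = 10, so the pseudo-counts dominate; columns with more diverse residues give the observed frequencies more weight.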
The method for converting the frequency profile into Top-n-grams in step 3 is:
The 20 standard amino acids in each row of the frequency profile are sorted in descending order of target frequency, and the n amino acids with the highest target frequencies are then combined, in order of frequency, into one Top-n-gram. Within a Top-n-gram, the position of each amino acid distinguishes its frequency rank. A total of L Top-n-grams is obtained, where n is an integer greater than or equal to 1 and less than or equal to 5.
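The conversion can be sketched in a few lines. The column order in `AMINO_ACIDS`, the helper `toy_row`, and the toy frequencies are assumptions for illustration; ties are broken by column order here, which the text does not specify.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the profile

def top_n_grams(profile, n):
    """Return one Top-n-gram string per profile row (one per sequence position)."""
    grams = []
    for row in profile:
        # Residue indices sorted by descending target frequency (stable for ties).
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        grams.append("".join(AMINO_ACIDS[k] for k in ranked[:n]))
    return grams

def toy_row(freqs):
    """Build a 20-entry row from a {residue: frequency} dict; other residues get 0."""
    return [freqs.get(aa, 0.0) for aa in AMINO_ACIDS]

profile = [
    toy_row({"L": 0.5, "V": 0.3, "I": 0.2}),
    toy_row({"G": 0.6, "A": 0.25, "S": 0.15}),
]
grams = top_n_grams(profile, n=3)
print(grams)  # ['LVI', 'GAS']
```

Because the letters are ordered by frequency, "LVI" and "VLI" are different Top-3-grams, which is how the representation keeps the ranking that a binary profile discards.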
In the word-document matrix W of step 4, each word corresponds to a Top-n-gram and each document corresponds to a test protein sequence.
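A minimal sketch of building such a matrix from raw occurrence counts follows. The function name `build_word_document_matrix`, the residue order, and the toy documents are assumptions; the vocabulary is enumerated as all 20^n letter strings of length n.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue order

def build_word_document_matrix(documents, n):
    """documents: one list of Top-n-grams per protein sequence.
    Returns (vocabulary, W), where W[w][d] counts word w in document d."""
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]  # 20**n words
    index = {word: row for row, word in enumerate(vocabulary)}
    W = [[0] * len(documents) for _ in vocabulary]
    for d, grams in enumerate(documents):
        for gram in grams:
            W[index[gram]][d] += 1
    return vocabulary, W

# Two toy "documents" (protein sequences) already converted to Top-2-grams.
docs = [["LV", "GA", "LV"], ["GA", "SA"]]
vocab, W = build_word_document_matrix(docs, n=2)
print(len(vocab), W[vocab.index("LV")], W[vocab.index("GA")])
```

Each column of W is the fixed-length count vector of one sequence, so sequences of different lengths become directly comparable.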
The method for performing singular value decomposition on the generated word-document matrix W in step 5 is: the word-document matrix W is decomposed into three matrices:
$$W = U S V^{T}$$
where U is the left singular matrix of dimension M × K, S is the diagonal matrix of dimension K × K whose diagonal elements are the singular values of W and satisfy $s_1 \ge s_2 \ge \dots \ge s_K > 0$, and V is the right singular matrix of dimension N × K (here M and N denote the numbers of rows and columns of W, not the frequency profile matrix M or the constant N = 20). Dimensionality reduction and noise removal are achieved by keeping the first R singular values; after reduction, the matrices U, S, and V have dimensions M × R, R × R, and N × R respectively, and the value of R is 300.
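The truncation can be sketched with NumPy's SVD routine. The function name `lsa_reduce`, the random toy matrix, and R = 5 are assumptions for illustration. Representing a document by the corresponding row of V_R S_R is one common latent semantic analysis convention and is also an assumption here, since the text does not spell out how the vector is read off.

```python
import numpy as np

def lsa_reduce(W, R):
    """Truncated SVD of a word-document matrix W (words x documents)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # s is sorted in descending order
    R = min(R, len(s))                                # cannot keep more values than exist
    U_R, S_R, V_R = U[:, :R], np.diag(s[:R]), Vt[:R, :].T
    return U_R, S_R, V_R

rng = np.random.default_rng(0)
W = rng.integers(0, 5, size=(400, 12)).astype(float)  # toy: 400 words, 12 sequences
U_R, S_R, V_R = lsa_reduce(W, R=5)

# Assumed convention: document d is represented by row d of V_R @ S_R.
doc_vectors = V_R @ S_R
print(U_R.shape, S_R.shape, V_R.shape, doc_vectors.shape)
```

With the value R = 300 given in the text, each sequence would be described by a 300-dimensional vector regardless of the 20^n size of the vocabulary.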
The SVM classifier of step 6 is obtained by the following training method:
In the training method, a plurality of training protein sequences are used as training samples, and the following procedure is carried out on each training protein sequence:
Step A: run PSI-BLAST with the training protein sequence as input to perform multiple sequence alignment, and calculate the pseudo-count $g_i$ of amino acid i, where $f_j$ is the observed frequency of amino acid j, $p_j$ is the background frequency of amino acid j, and $q_{ij}$ is the score in the substitution matrix for the pair of amino acid i and amino acid j;
Step B: generate the frequency profile from the pseudo-counts of amino acid i:
Calculate the target frequency $Q_i$ of each of the 20 standard amino acids at each amino acid position of the training protein sequence:
$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$
where β is a free parameter set to the PSI-BLAST default value of 10, and α is the number of distinct amino acid types occurring in the given column of the multiple sequence alignment, minus 1;
The frequency profile is expressed as a matrix M of dimension L × N, where L is the length of the protein sequence and N is the constant 20, i.e. the number of standard amino acids; the elements of M are the target frequencies $Q_i$;
Step C: convert the frequency profile into Top-n-grams:
The 20 standard amino acids in each row of the frequency profile are sorted in descending order of target frequency, and the n amino acids with the highest target frequencies are then combined, in order of frequency, into one Top-n-gram. Within a Top-n-gram, the position of each amino acid distinguishes its frequency rank. A total of L Top-n-grams is obtained, where n is an integer greater than or equal to 1 and less than or equal to 5;
Step D: by counting the number of times each Top-n-gram occurs, convert the training protein sequence into a fixed-length vector, and then construct the word-document matrix W;
Step E: perform singular value decomposition on the generated word-document matrix W to obtain the latent semantic representation vector of the training protein sequence;
Step F: train the SVM classifier on the latent semantic representation vectors of the training protein sequences.
The present invention uses the Gist SVM package, which is commonly used in the field of remote homology detection and fold recognition, as the implementation of the SVM algorithm. The kernel function is a Gaussian kernel; all other parameters take the default values of the Gist package.
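To show what classification with a Gaussian kernel and a score threshold of 0 amounts to, here is a small pure-Python sketch of the SVM discriminant. It does not use or imitate the Gist package; the function names, the two support vectors, their coefficients, the bias, and the kernel width `gamma` are all hypothetical values standing in for a trained model.

```python
import math

def gaussian_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def svm_score(x, support_vectors, coefficients, bias, gamma):
    """Discriminant score: sum_i coef_i * K(sv_i, x) + bias, with coef_i = alpha_i * y_i."""
    return sum(c * gaussian_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefficients)) + bias

# Hypothetical trained model: one positive and one negative support vector.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
coefficients = [1.0, -1.0]
bias, gamma = 0.0, 0.5

score = svm_score([0.9, 1.2], support_vectors, coefficients, bias, gamma)
is_member = score > 0   # a score above 0 is read as homologous / same fold
print(round(score, 4), is_member)
```

In the method itself the input vectors would be the latent semantic representation vectors, and the support vectors and coefficients would come from training.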
In the method of the invention, the process of converting the frequency profile of a test protein sequence into a fixed-length vector is shown in Figs. 2 to 5. Fig. 2 is the frequency profile of the test protein sequence; Fig. 3 is the frequency profile of Fig. 2 after sorting in descending order of target frequency; Fig. 4 shows the Top-n-grams obtained from the frequency profile of Fig. 3 for n=3; Fig. 5 is the fixed-length vector obtained from the Top-n-grams of Fig. 4.
The method of the invention converts a protein sequence into a fixed-length vector by counting the occurrences of each Top-n-gram in the protein sequence. A Top-n-gram extracts the evolutionary information in the frequency profile by combining the n amino acids with the highest frequencies, and thereby emphasizes the importance of those n amino acids. Compared with the binary profile, the Top-n-gram involves no threshold, so no parameter optimization step is needed and over-fitting is avoided; it can also distinguish the frequencies of the different amino acids in the frequency profile. The invention applies latent semantic analysis to the resulting word-document matrix to reduce its dimensionality and remove noise, which further improves the prediction performance of remote protein homology detection and fold recognition.
Description of drawings
Fig. 1 is a flow chart of the method for remote protein homology detection and fold recognition based on Top-n-grams described in embodiment one; Fig. 2 is the frequency profile of the test protein sequence; Fig. 3 is the frequency profile of Fig. 2 after sorting in descending order of target frequency; Fig. 4 shows the Top-n-grams obtained from the frequency profile of Fig. 3 for n=3; Fig. 5 is the fixed-length vector obtained from the Top-n-grams of Fig. 4.
Embodiment
The technical solution of the present invention is not limited to the embodiments listed below; it also includes any combination of the embodiments.
Embodiment one: this embodiment is described with reference to Fig. 1. A method for remote protein homology detection and fold recognition based on Top-n-grams, whose concrete steps are:
Step 1: run PSI-BLAST with the test protein sequence as input to perform multiple sequence alignment, and calculate the pseudo-count $g_i$ of amino acid i, where $f_j$ is the observed frequency of amino acid j, $p_j$ is the background frequency of amino acid j, and $q_{ij}$ is the score in the substitution matrix for the pair of amino acid i and amino acid j;
Step 2: generate the frequency profile from the pseudo-counts of amino acid i;
Step 3: convert the frequency profile into Top-n-grams;
Step 4: by counting the number of times each Top-n-gram occurs, convert the test protein sequence into a fixed-length vector, and then construct the word-document matrix W;
Step 5: perform singular value decomposition on the generated word-document matrix W to obtain the latent semantic representation vector of the test protein sequence;
Step 6: input the latent semantic representation vector of the test protein sequence into the SVM classifier for classification; the SVM classifier assigns a score to the test protein sequence, and a test protein sequence whose score is greater than 0 is predicted to be homologous or to share the fold, which gives the prediction result.
In step 1 of this embodiment, PSI-BLAST is run for 10 iterations, and the non-redundant database searched by PSI-BLAST is the nrdb90 database. Sequences in the multiple sequence alignment whose sequence similarity is below 98% are used to calculate the frequency profile, and the weight of each sequence in the multiple sequence alignment is assigned by the position-based sequence weighting method.
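For orientation, the sketch below assembles a command line with the stated settings for the BLAST+ `psiblast` program. It is an assumption that BLAST+ is the tool at hand (the text does not say which PSI-BLAST executable was used), and the file names and the local database name are hypothetical. The command is only built, not executed.

```python
def psiblast_command(query_fasta, database, pssm_out, iterations=10):
    """Assemble (but do not run) a BLAST+ psiblast invocation."""
    return [
        "psiblast",
        "-query", query_fasta,                # FASTA file with the protein sequence
        "-db", database,                      # local BLAST database name (assumed)
        "-num_iterations", str(iterations),   # 10 iterations, as in this embodiment
        "-out_ascii_pssm", pssm_out,          # text file with the position-specific matrix
    ]

cmd = psiblast_command("test_protein.fasta", "nrdb90", "test_protein.pssm")
print(" ".join(cmd))
```

The 98% similarity filter and the position-based sequence weighting described above are not expressed by these options and would have to be handled separately.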
In step 1 of this embodiment, the background frequency of an amino acid is calculated as the mean, over every protein sequence in the PDB25 database, of the frequency of occurrence of each of the 20 standard amino acids. The background frequencies of the 20 standard amino acids are:
The substitution matrix in step 1 of this embodiment is BLOSUM62, the default scoring matrix of PSI-BLAST, i.e. the substitution matrix built from blocks of sequences grouped at 62% identity or more:
The dimension of the vector in step 4 of this embodiment is $20^n$.
Embodiment two: this embodiment further describes step 2 of the method for remote protein homology detection and fold recognition based on Top-n-grams of embodiment one. The method for generating the frequency profile in step 2 is:
Calculate the target frequency $Q_i$ of each of the 20 standard amino acids at each amino acid position of the test protein sequence:
$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$
where β is a free parameter set to the PSI-BLAST default value of 10, and α is the number of distinct amino acid types occurring in the given column of the multiple sequence alignment, minus 1;
The frequency profile is expressed as a matrix M of dimension L × N, where L is the length of the protein sequence and N is the constant 20, i.e. the number of standard amino acids; the elements of M are the target frequencies $Q_i$.
The target frequency $Q_i$ represents how often a given amino acid occurs at a specific position of the protein sequence during evolution.
Embodiment three: this embodiment further describes step 3 of the method for remote protein homology detection and fold recognition based on Top-n-grams of embodiment one. The method for converting the frequency profile into Top-n-grams in step 3 is:
The 20 standard amino acids in each row of the frequency profile are sorted in descending order of target frequency, and the n amino acids with the highest target frequencies are then combined, in order of frequency, into one Top-n-gram. Within a Top-n-gram, the position of each amino acid distinguishes its frequency rank. A total of L Top-n-grams is obtained, where n is an integer greater than or equal to 1 and less than or equal to 5.
The value of n may be any integer greater than or equal to 1 and less than or equal to 20, but in practice integers from 1 to 5 give the best results.
Embodiment four: this embodiment further describes step 4 of the method for remote protein homology detection and fold recognition based on Top-n-grams of embodiment one. In the word-document matrix W of step 4, each word corresponds to a Top-n-gram and each document corresponds to a test protein sequence.
Embodiment five: this embodiment further describes step 5 of the method for remote protein homology detection and fold recognition based on Top-n-grams of embodiment one. The word-document matrix W of step 5 can be decomposed into three matrices:
$$W = U S V^{T}$$
where U is the left singular matrix of dimension M × K, S is the diagonal matrix of dimension K × K whose diagonal elements are the singular values of W and satisfy $s_1 \ge s_2 \ge \dots \ge s_K > 0$, and V is the right singular matrix of dimension N × K. Dimensionality reduction and noise removal are achieved by keeping the first R singular values; after reduction, the matrices U, S, and V have dimensions M × R, R × R, and N × R respectively, and the value of R is 300.
Embodiment six: this embodiment further describes step 6 of the method for remote protein homology detection and fold recognition based on Top-n-grams of embodiment one. The SVM classifier of step 6 is obtained by the following training method:
In the training method, a plurality of training protein sequences are used as training samples, and the following procedure is carried out on each training protein sequence:
Step A: run PSI-BLAST with the training protein sequence as input to perform multiple sequence alignment, and calculate the pseudo-count $g_i$ of amino acid i, where $f_j$ is the observed frequency of amino acid j, $p_j$ is the background frequency of amino acid j, and $q_{ij}$ is the score in the substitution matrix for the pair of amino acid i and amino acid j;
Step B: generate the frequency profile from the pseudo-counts of amino acid i:
Calculate the target frequency $Q_i$ of each of the 20 standard amino acids at each amino acid position of the training protein sequence:
$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$
where β is a free parameter set to the PSI-BLAST default value of 10, and α is the number of distinct amino acid types occurring in the given column of the multiple sequence alignment, minus 1;
The frequency profile is expressed as a matrix M of dimension L × N, where L is the length of the protein sequence and N is the constant 20, i.e. the number of standard amino acids; the elements of M are the target frequencies $Q_i$;
Step C: convert the frequency profile into Top-n-grams:
The 20 standard amino acids in each row of the frequency profile are sorted in descending order of target frequency, and the n amino acids with the highest target frequencies are then combined, in order of frequency, into one Top-n-gram. Within a Top-n-gram, the position of each amino acid distinguishes its frequency rank. A total of L Top-n-grams is obtained, where n is an integer greater than or equal to 1 and less than or equal to 5;
Step D: by counting the number of times each Top-n-gram occurs, convert the training protein sequence into a fixed-length vector, and then construct the word-document matrix W;
Step E: perform singular value decomposition on the generated word-document matrix W to obtain the latent semantic representation vector of the training protein sequence;
Step F: train the SVM classifier on the latent semantic representation vectors of the training protein sequences.
Claims (6)
1. A method for remote protein homology detection and fold recognition based on Top-n-grams, characterized in that its concrete steps are:
Step 1: run PSI-BLAST with the test protein sequence as input to perform multiple sequence alignment, and calculate the pseudo-count $g_i$ of amino acid i, where $f_j$ is the observed frequency of amino acid j, $p_j$ is the background frequency of amino acid j, and $q_{ij}$ is the score in the substitution matrix for the pair of amino acid i and amino acid j;
Step 2: generate the frequency profile from the pseudo-counts of amino acid i;
Step 3: convert the frequency profile into Top-n-grams;
Step 4: by counting the number of times each Top-n-gram occurs, convert the test protein sequence into a fixed-length vector, and then construct the word-document matrix W;
Step 5: perform singular value decomposition on the generated word-document matrix W to obtain the latent semantic representation vector of the test protein sequence;
Step 6: input the latent semantic representation vector of the test protein sequence into the SVM classifier for classification; the SVM classifier assigns a score to the test protein sequence, and a test protein sequence whose score is greater than 0 is predicted to be homologous or to share the fold, which gives the prediction result.
2. The method for remote protein homology detection and fold recognition based on Top-n-grams according to claim 1, characterized in that the method for generating the frequency profile in step 2 is:
Calculate the target frequency $Q_i$ of each of the 20 standard amino acids at each amino acid position of the test protein sequence:
$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$
where β is a free parameter set to the PSI-BLAST default value of 10, and α is the number of distinct amino acid types occurring in the given column of the multiple sequence alignment, minus 1;
The frequency profile is expressed as a matrix M of dimension L × N, where L is the length of the protein sequence and N is the constant 20, i.e. the number of standard amino acids; the elements of M are the target frequencies $Q_i$.
3. The method for remote protein homology detection and fold recognition based on Top-n-grams according to claim 1, characterized in that the method for converting the frequency profile into Top-n-grams in step 3 is:
The 20 standard amino acids in each row of the frequency profile are sorted in descending order of target frequency, and the n amino acids with the highest target frequencies are then combined, in order of frequency, into one Top-n-gram. Within a Top-n-gram, the position of each amino acid distinguishes its frequency rank. A total of L Top-n-grams is obtained, where n is an integer greater than or equal to 1 and less than or equal to 5.
4. The method for remote protein homology detection and fold recognition based on Top-n-grams according to claim 1, characterized in that, in the word-document matrix W of step 4, each word corresponds to a Top-n-gram and each document corresponds to a test protein sequence.
5. The method for remote protein homology detection and fold recognition based on Top-n-grams according to claim 1, characterized in that the method for performing singular value decomposition on the generated word-document matrix W in step 5 is: the word-document matrix W is decomposed into three matrices:
$$W = U S V^{T}$$
where U is the left singular matrix of dimension M × K, S is the diagonal matrix of dimension K × K whose diagonal elements are the singular values of W and satisfy $s_1 \ge s_2 \ge \dots \ge s_K > 0$, and V is the right singular matrix of dimension N × K. Dimensionality reduction and noise removal are achieved by keeping the first R singular values; after reduction, the matrices U, S, and V have dimensions M × R, R × R, and N × R respectively, and the value of R is 300.
6. The method for remote protein homology detection and fold recognition based on Top-n-grams according to claim 1, characterized in that the SVM classifier of step 6 is obtained by the following training method:
In the training method, a plurality of training protein sequences are used as training samples, and the following procedure is carried out on each training protein sequence:
Step A: run PSI-BLAST with the training protein sequence as input to perform multiple sequence alignment, and calculate the pseudo-count $g_i$ of amino acid i, where $f_j$ is the observed frequency of amino acid j, $p_j$ is the background frequency of amino acid j, and $q_{ij}$ is the score in the substitution matrix for the pair of amino acid i and amino acid j;
Step B: generate the frequency profile from the pseudo-counts of amino acid i:
Calculate the target frequency $Q_i$ of each of the 20 standard amino acids at each amino acid position of the training protein sequence:
$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$
where β is a free parameter set to the PSI-BLAST default value of 10, and α is the number of distinct amino acid types occurring in the given column of the multiple sequence alignment, minus 1;
The frequency profile is expressed as a matrix M of dimension L × N, where L is the length of the protein sequence and N is the constant 20, i.e. the number of standard amino acids; the elements of M are the target frequencies $Q_i$;
Step C: convert the frequency profile into Top-n-grams:
The 20 standard amino acids in each row of the frequency profile are sorted in descending order of target frequency, and the n amino acids with the highest target frequencies are then combined, in order of frequency, into one Top-n-gram. Within a Top-n-gram, the position of each amino acid distinguishes its frequency rank. A total of L Top-n-grams is obtained, where n is an integer greater than or equal to 1 and less than or equal to 5;
Step D: by counting the number of times each Top-n-gram occurs, convert the training protein sequence into a fixed-length vector, and then construct the word-document matrix W;
Step E: perform singular value decomposition on the generated word-document matrix W to obtain the latent semantic representation vector of the training protein sequence;
Step F: train the SVM classifier on the latent semantic representation vectors of the training protein sequences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010600321 CN102043910B (en) | 2010-12-22 | 2010-12-22 | Remote protein homology detection and fold recognition method based on Top-n-gram |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010600321 CN102043910B (en) | 2010-12-22 | 2010-12-22 | Remote protein homology detection and fold recognition method based on Top-n-gram |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102043910A true CN102043910A (en) | 2011-05-04 |
CN102043910B CN102043910B (en) | 2012-12-12 |
Family
ID=43910044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201010600321 Expired - Fee Related CN102043910B (en) | 2010-12-22 | 2010-12-22 | Remote protein homology detection and fold recognition method based on Top-n-gram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102043910B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077226A (en) * | 2012-12-31 | 2013-05-01 | 浙江工业大学 | Spatial search method for multi-modal protein conformations |
CN106709273A (en) * | 2016-12-15 | 2017-05-24 | 国家海洋局第海洋研究所 | Protein rapid detection method based on matched microalgae protein characteristics sequence label and system thereof |
CN113362900A (en) * | 2021-06-15 | 2021-09-07 | 邵阳学院 | Mixed model for predicting N4-acetylcytidine |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040219601A1 (en) * | 2003-01-02 | 2004-11-04 | Jinbo Xu | Method and system for more effective protein three-dimensional structure prediction |
CN101231677A (en) * | 2007-11-30 | 2008-07-30 | 中国科学院合肥物质科学研究院 | Long-distance interaction prediction method between residue base on sequence spectrum center and genetic optimization process |
CN101794351A (en) * | 2010-03-09 | 2010-08-04 | 哈尔滨工业大学 | Protein secondary structure engineering prediction method based on large margin nearest central point |
Non-Patent Citations (2)
Title |
---|
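China Doctoral Dissertations Full-text Database, 2011-08-15, Liu Bin, "Prediction of protein structure and interaction sites based on frequency profiles", pp. 33-73, 1-6, No. 8 * |
Science in China Series C, 2005-02-28, Dong Qiwen et al., "Protein secondary structure prediction: a word-based maximum entropy Markov method", pp. 87-96, 1-6, Vol. 35, No. 1 * |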
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077226A (en) * | 2012-12-31 | 2013-05-01 | 浙江工业大学 | Spatial search method for multi-modal protein conformations |
CN103077226B (en) * | 2012-12-31 | 2015-10-07 | 浙江工业大学 | A kind of multi-modal protein conformation space search method |
CN106709273A (en) * | 2016-12-15 | 2017-05-24 | 国家海洋局第海洋研究所 | Protein rapid detection method based on matched microalgae protein characteristics sequence label and system thereof |
CN106709273B (en) * | 2016-12-15 | 2019-06-18 | 国家海洋局第一海洋研究所 | The matched rapid detection method of microalgae protein characteristic sequence label and system |
CN113362900A (en) * | 2021-06-15 | 2021-09-07 | 邵阳学院 | Mixed model for predicting N4-acetylcytidine |
Also Published As
Publication number | Publication date |
---|---|
CN102043910B (en) | 2012-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zheng et al. | A rolling bearing fault diagnosis method based on multi-scale fuzzy entropy and variable predictive model-based class discrimination | |
CN102081655B (en) | Information retrieval method based on Bayesian classification algorithm | |
WO2018120077A1 (en) | Three-level inverter fault diagnosis method based on empirical mode decomposition and decision tree rvm | |
Manimala et al. | Hybrid soft computing techniques for feature selection and parameter optimization in power quality data mining | |
CN107462785A (en) | The more disturbing signal classifying identification methods of the quality of power supply based on GA SVM | |
CN104535905A (en) | Partial discharge diagnosis method based on naive bayesian classification | |
CN102915448B (en) | A kind of three-dimensional model automatic classification method based on AdaBoost | |
Guo et al. | Improved adversarial learning for fault feature generation of wind turbine gearbox | |
CN110738232A (en) | grid voltage out-of-limit cause diagnosis method based on data mining technology | |
Zhou et al. | Text categorization based on clustering feature selection | |
CN104915679A (en) | Large-scale high-dimensional data classification method based on random forest weighted distance | |
CN104809233A (en) | Attribute weighting method based on information gain ratios and text classification methods | |
CN102043910B (en) | Remote protein homology detection and fold recognition method based on Top-n-gram | |
CN103440275A (en) | Prim-based K-means clustering method | |
CN104820702A (en) | Attribute weighting method based on decision tree and text classification method | |
Ma et al. | Cluster analysis of wind turbines of large wind farm with diffusion distance method | |
CN104809229A (en) | Method and system for extracting text characteristic words | |
Wang et al. | A new process industry fault diagnosis algorithm based on ensemble improved binary‐tree SVM | |
CN115796231B (en) | Temporal analysis ultra-short term wind speed prediction method | |
CN104573331A (en) | K neighbor data prediction method based on MapReduce | |
CN115936926A (en) | SMOTE-GBDT-based unbalanced electricity stealing data classification method and device, computer equipment and storage medium | |
CN106845799B (en) | Evaluation method for typical working condition of battery energy storage system | |
CN105116323A (en) | Motor fault detection method based on RBF | |
Xu et al. | NWP feature selection and GCN-based ultra-short-term wind farm cluster power forecasting method | |
CN109753990B (en) | User electric energy substitution potential prediction method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20121212 Termination date: 20141222 |
EXPY | Termination of patent right or utility model | ||