CN109801672A - Protein-protein interaction prediction method based on multivariate mutual information and residue binding energy
- Publication number
- CN109801672A (application CN201811366481.2A)
- Authority
- CN
- China
- Prior art keywords
- amino acid
- calculated
- protein
- classification
- mutual information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to bioinformatics and proposes an accurate and efficient method for predicting protein-protein interactions. The method increases the contribution of useful information in amino acid sequences to the prediction while effectively reducing the influence of useless noise. The protein-protein interaction prediction method based on multivariate mutual information and residue binding energy comprises: step (1): group the amino acids by class; step (2): define the feature representations; step (3): build the feature frequency table; step (4): compute the 2-tuple mutual information features; step (5): compute the 3-tuple mutual information features; step (6): compute the amino acid physicochemical property features; step (7): compute the amino acid contact matrix AAC; step (8): extract features from the amino acid sequence; step (9): perform singular value decomposition; step (10): obtain the interaction between the two proteins. The invention is mainly applicable to protein-protein interaction prediction.
Description
Technical field
The present invention relates to methods in bioinformatics for predicting protein-protein interactions from amino acid sequence information, and belongs to the field of macromolecular structure prediction algorithms in proteomics. Specifically, it relates to a protein-protein interaction prediction method based on multivariate mutual information and residue binding energy.
Background art
Protein-protein interactions are at the core of many biological processes. Identifying them is essential for elucidating protein function and for understanding biological processes within the cell. Interaction information helps clarify disease mechanisms and thereby supports more efficient and accurate drug design. Over the past several years, many computational techniques have matured to the point where large-scale analysis is possible. Computational methods for detecting protein-protein interactions fall into three main classes: methods based on evolutionary information, methods based on natural language processing, and methods based on amino acid sequence features. Methods based on evolutionary information extract evolutionary information from multiple sequence alignments of homologous proteins and build phylogenetic trees to analyze the relationships between protein functions. They require large amounts of homologous protein data together with interaction labels for those proteins, which greatly limits their use in large-scale computation. Methods based on natural language processing rely on widely used natural language processing techniques to mine useful information from the many known protein-protein interactions recorded in the biological and medical literature. Because some information is missing from the literature, their predictions may be incomplete. It is therefore particularly important to improve the accuracy of interaction prediction, and to ensure that the method can be widely adopted, by combining a multivariate mutual information feature extraction method based on amino acid sequences with a residue binding energy feature extraction method.
Feature extraction is the key technique in sequence-based protein-protein interaction prediction. It defines a set of mapping functions that map an amino acid sequence of a protein to a series of numerical feature values representing that sequence. These values should capture the useful characteristics of the protein as completely as possible while excluding noise that would harm the prediction. Classical amino acid sequence feature extraction methods include auto covariance, conjoint triad, local protein sequence descriptors, multi-scale local feature descriptors, local phase quantization descriptors, and matrix-based protein sequence representations. These methods abstract amino acid sequences from different perspectives, and their prediction results differ considerably. How to design an effective feature extraction method that maps amino acid sequences to an abstract representation, increases the separation between sequences, and reduces the interference of noise with the prediction has therefore become the key problem in protein-protein interaction prediction.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims to provide an accurate and efficient method for predicting protein-protein interactions. The feature extraction functions used in the method increase the contribution of useful information in amino acid sequences to the prediction while effectively reducing the influence of useless noise. To this end, the technical solution adopted by the invention is a protein-protein interaction prediction method based on multivariate mutual information and residue binding energy, with the following steps:
Step (1): amino acid class grouping. The 20 standard amino acids are assigned to n functional groups according to their dipole and volume. The functional groups are denoted C0, C1, C2, ..., Cn, and the original amino acid sequence is converted into a sequence of group classes according to the functional group to which each amino acid belongs;
Step (2): define the 3-tuple and 2-tuple feature representations. The 3-tuple features are "C0C0C0", "C0C0C1", ..., "CnCnCn"; the 2-tuple features are "C0C0", "C0C1", ..., "CnCn";
Step (3): count the occurrences of each 3-tuple and 2-tuple feature in the group class sequence and build the feature frequency table. The frequency function f(a) = (n_a + 1)/(L + 1) is used to compute the frequency with which each of the n classes occurs in the sequence;
Step (4): compute the 2-tuple mutual information features I(ab), where f(ab) is the frequency with which classes a and b occur together in a 2-tuple;
Step (5): compute the 3-tuple mutual information features. The calculation formula is:
I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
where f(a|c) is the frequency with which class a occurs in the 2-tuples containing class c, and f(a|bc) is the frequency with which class a occurs in the 3-tuples containing classes b and c;
The first part of the feature values, the mutual information features, is obtained through the above five steps;
Step (6): compute the amino acid physicochemical property features;
Step (7): by statistical analysis of a protein complex database, compute the amino acid contact matrix AAC from residue pairing frequencies, where i and j denote two amino acid types and N_{i,j} = Σ_D n_{ij} is the number of contacts between i and j. Then compute the substitution matrix SMR, SMR_{i,l} = AAC(i, A_l), where i = 1, ..., 20 indexes the 20 amino acid types, l = 1, ..., L indexes the L positions of the given protein sequence, and A_l is the amino acid type at position l. This step yields a 20 × L substitution matrix SMR;
Step (8): extract features from the amino acid sequence using the histogram of oriented gradients (HOG) feature extraction algorithm;
Step (9): perform singular value decomposition on the transpose of the SMR matrix, which yields 20 right singular vectors;
Step (10): input the feature values obtained in steps (1) to (9) into a random forest model for prediction, thereby obtaining the interaction between the two proteins.
Step (6) is computed as follows:
Step (6.1): compute the Moreau-Broto autocorrelation feature values, where lag is the distance between residues, p is the p-th physicochemical property of the natural amino acids, and l is the position in the sequence, with l = 1, 2, ..., L-lag and lag = 1, 2, ..., lg. With six physicochemical properties, lg × 6 feature values are obtained;
Step (6.2): normalize the lg × 6 feature values obtained;
Step (6.3): count the frequency with which each of the 20 amino acids occurs in the sequence.
Step (8) is computed as follows:
Step (8.1): compute the gradient values in the horizontal and vertical directions, G_h(i, l) and G_v(i, l);
Step (8.2): compute the gradient magnitude;
Step (8.3): compute the gradient direction;
Step (8.4): divide the gradient magnitude matrix and the gradient direction matrix into 9 submatrices of equal size;
Step (8.5): compute the histogram over the gradient directions; the histogram size for each gradient direction is taken as one feature value.
Through the above steps, each sequence yields x feature values, and the two sequences together yield 2x feature values.
The features and beneficial effects of the invention are as follows:
The invention integrates the multivariate mutual information of amino acid sequences with residue binding energy information. Compared with conventional sequence information, multivariate mutual information considers both the characteristics of each amino acid together with its two neighboring amino acids and the mutual information among these components. The histogram of oriented gradients and singular value decomposition additionally extract texture features from the protein matrix. These new kinds of information and features strongly support accurate prediction of protein-protein interactions, so that when analyzing and predicting such interactions the method is more accurate than other existing methods. The method not only predicts protein-protein interactions accurately but can also discover new interactions within protein interaction networks, which is of great significance for completing the interaction networks of various proteins.
Brief description of the drawings:
Fig. 1: flow chart of the calculation process of the invention;
Fig. 2: feature representation of 2-tuples and 3-tuples and construction of the frequency table;
Fig. 3: schematic diagram of the histogram of oriented gradients calculation;
Fig. 4: accuracy of the Moreau-Broto autocorrelation features for different lg values;
Fig. 5: prediction results of the method on a single-core network;
Fig. 6: prediction results of the method on a multi-core network;
Fig. 7: prediction results of the method on a crossover network.
Detailed description of the embodiments
The object of the invention is to provide an accurate and efficient method for predicting protein-protein interactions. The feature extraction functions used in the method increase the contribution of useful information in amino acid sequences to the prediction while effectively reducing the influence of useless noise.
The invention is characterized in that it comprises the following steps in sequence:
Step (1): amino acid class grouping. The 20 standard amino acids are assigned to 7 functional groups according to their dipole and volume. The 7 functional groups are denoted C0, C1, C2, ..., C6. The original amino acid sequence is converted into a sequence of group classes according to the functional group to which each amino acid belongs.
Step (2): define the 3-tuple and 2-tuple feature representations. The 3-tuple features are "C0C0C0", "C0C0C1", ..., "C6C6C6". The 2-tuple features are "C0C0", "C0C1", ..., "C6C6".
Step (3): count the occurrences of each 3-tuple and 2-tuple feature in the group class sequence and build the feature frequency table, as shown in Fig. 2. The frequency function f(a) = (n_a + 1)/(L + 1) is used to compute the frequency with which each of the 7 classes occurs in the sequence.
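Steps (1) to (3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the membership of the seven groups is not listed in this text, so the partition below (a commonly used dipole and side-chain volume grouping) is an assumption, as are all function names. Tuples are counted without regard to order, which is an assumption that matches the 28 two-tuple and 84 three-tuple feature counts quoted in steps (4) and (5).

```python
from collections import Counter
from itertools import combinations_with_replacement

# ASSUMPTION: this seven-group partition is a commonly used dipole/volume
# grouping; the patent text itself does not list the group members.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, members in enumerate(GROUPS) for aa in members}


def to_class_sequence(seq):
    """Map an amino acid string to class indices 0..6 (C0..C6)."""
    return [AA_TO_CLASS[aa] for aa in seq]


def tuple_counts(classes, k):
    """Count k-tuples of consecutive classes, ignoring order inside a tuple."""
    windows = (tuple(sorted(classes[i:i + k]))
               for i in range(len(classes) - k + 1))
    return Counter(windows)


def class_frequency(classes, n_classes=7):
    """f(a) = (n_a + 1) / (L + 1) for every class a, as in step (3)."""
    length = len(classes)
    counts = Counter(classes)
    return [(counts[a] + 1) / (length + 1) for a in range(n_classes)]


classes = to_class_sequence("MKVLAAGRDE")   # hypothetical toy sequence
pairs = tuple_counts(classes, 2)
triples = tuple_counts(classes, 3)
freqs = class_frequency(classes)

# Number of distinct unordered tuple types over 7 classes.
n_pair_types = len(list(combinations_with_replacement(range(7), 2)))
n_triple_types = len(list(combinations_with_replacement(range(7), 3)))
```

Sorting each window before counting is what makes "C0C1" and "C1C0" the same feature; if ordered tuples were intended, the `sorted` call would be dropped and the feature counts would change.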
Step (4): compute the 28 2-tuple mutual information features I(ab), where f(ab) is the frequency with which the 2-tuple ab occurs.
Step (5): compute the 84 3-tuple mutual information features. The calculation formula is:
I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
where f(a|c) is the frequency with which class a occurs in the 2-tuples containing class c, and f(a|bc) is the frequency with which class a occurs in the 3-tuples containing classes b and c.
The above five steps yield 238 mutual information feature values.
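The mutual information step can be sketched as below. The 2-tuple formula is not spelled out in this text, so the sketch assumes the usual form I(ab) = f(ab) ln(f(ab) / (f(a) f(b))); the conditional frequencies f(a|c) = f(ac)/f(c) and f(a|bc) = f(abc)/f(bc), and the smoothing of tuple frequencies, are likewise assumptions. The split of the 238 values into 7 + 28 + 84 = 119 per sequence is one reading that is consistent with the stated total, not something the text confirms.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def mutual_information_features(classes, n_classes=7):
    """Class frequencies, 2-tuple MI and 3-tuple MI for one class sequence."""
    length = len(classes)
    c1 = Counter(classes)
    c2 = Counter(tuple(sorted(classes[i:i + 2])) for i in range(length - 1))
    c3 = Counter(tuple(sorted(classes[i:i + 3])) for i in range(length - 2))

    def f1(a):
        return (c1[a] + 1) / (length + 1)      # f(a) as given in step (3)

    def f2(a, b):
        # ASSUMPTION: tuples are smoothed the same way as single classes.
        return (c2[tuple(sorted((a, b)))] + 1) / length

    def f3(a, b, c):
        # ASSUMPTION: same smoothing, over the length - 2 windows.
        return (c3[tuple(sorted((a, b, c)))] + 1) / (length - 1)

    def i2(a, b):
        # ASSUMPTION: I(ab) = f(ab) ln(f(ab) / (f(a) f(b)))
        return f2(a, b) * math.log(f2(a, b) / (f1(a) * f1(b)))

    def i3(a, b, c):
        fa_c = f2(a, c) / f1(c)          # ASSUMPTION: f(a|c) = f(ac) / f(c)
        fa_bc = f3(a, b, c) / f2(b, c)   # ASSUMPTION: f(a|bc) = f(abc) / f(bc)
        # I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
        return i2(a, b) + fa_c * math.log(fa_c) - fa_bc * math.log(fa_bc)

    ids = range(n_classes)
    singles = [f1(a) for a in ids]
    pairs = [i2(a, b) for a, b in combinations_with_replacement(ids, 2)]
    triples = [i3(a, b, c) for a, b, c in combinations_with_replacement(ids, 3)]
    return singles + pairs + triples


# Hypothetical class sequence (indices 0..6) used only to exercise the code.
features = mutual_information_features([2, 4, 0, 1, 0, 0, 0, 4, 5, 5])
```

Because the smoothed frequencies are never zero, every logarithm is defined even for tuples that do not occur in the sequence. Concatenating the vectors of both proteins in a pair would double the length.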
Step (6): compute the amino acid physicochemical property features. Each amino acid sequence yields 200 feature values, so a pair of amino acid sequences whose interaction is to be predicted yields 400 feature values. The calculation is as follows:
Step (6.1): compute the Moreau-Broto autocorrelation feature values, where lag is the distance between residues, p is the p-th physicochemical property of the natural amino acids, and l is the position in the sequence, with l = 1, 2, ..., L-lag and lag = 1, 2, ..., lg; lg is typically set to 30. With six physicochemical properties, 30 × 6 = 180 feature values are obtained.
Step (6.2): normalize the 180 feature values obtained.
Step (6.3): count the frequency with which each of the 20 amino acids occurs in the sequence.
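A possible shape for step (6) is sketched below. The autocorrelation formula is not reproduced in this text, so the sketch assumes the commonly used normalized Moreau-Broto form AC(lag, p) = (1/(L - lag)) Σ_{l=1}^{L-lag} X_{l,p} X_{l+lag,p}. The six physicochemical property scales are not named here either; `toy_table` is a placeholder with made-up values, not real property data, and the normalization of step (6.2) is left out because its method is not specified.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # ASSUMPTION: alphabetical ordering


def moreau_broto(track, lg=30):
    """Return [AC(1), ..., AC(lg)] for one per-residue property track.

    ASSUMPTION: AC(lag) = sum_{l} X_l * X_{l+lag} / (L - lag).
    Lags that do not fit in the sequence contribute 0.0.
    """
    length = len(track)
    feats = []
    for lag in range(1, lg + 1):
        if lag >= length:
            feats.append(0.0)
            continue
        total = sum(track[l] * track[l + lag] for l in range(length - lag))
        feats.append(total / (length - lag))
    return feats


def composition(seq):
    """Frequency of each of the 20 amino acids in the sequence (step 6.3)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def physicochemical_features(seq, property_tables, lg=30):
    """lg autocorrelations per property, followed by 20 composition values."""
    feats = []
    for table in property_tables:        # one {amino acid: value} dict each
        feats.extend(moreau_broto([table[aa] for aa in seq], lg))
    return feats + composition(seq)


# Placeholder scale with made-up values, used six times to stand in for
# the six (unspecified) physicochemical properties.
toy_table = {aa: k / 19 for k, aa in enumerate(AMINO_ACIDS)}
toy_seq = AMINO_ACIDS * 2
phys = physicochemical_features(toy_seq, [toy_table] * 6, lg=30)
small = moreau_broto([1.0, 2.0, 3.0, 4.0], lg=2)
```

With lg = 30 and six properties this gives 180 autocorrelation values plus 20 composition values, matching the 200 values per sequence stated above.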
Step (7): by statistical analysis of a protein complex database, compute the amino acid contact matrix AAC from residue pairing frequencies, where i and j denote two amino acid types and N_{i,j} = Σ_D n_{ij} is the number of contacts between i and j.
Then compute the substitution matrix SMR, SMR_{i,l} = AAC(i, A_l), where i = 1, ..., 20 indexes the 20 amino acid types, l = 1, ..., L indexes the L positions of the given protein sequence, and A_l is the amino acid type at position l. This step yields a 20 × L substitution matrix SMR;
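The construction SMR_{i,l} = AAC(i, A_l) can be illustrated with a short sketch. It takes the 20 × 20 contact matrix as given, since the AAC formula is not reproduced in this text; the alphabetical amino acid ordering and the `toy_aac` values are assumptions used only to check shapes and indexing, not real contact data.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # ASSUMPTION: row/column order of AAC


def substitution_matrix(seq, aac):
    """Build the 20 x L matrix with SMR[i][l] = AAC[i][A_l]."""
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    return [[aac[i][index[aa]] for aa in seq] for i in range(20)]


# Placeholder contact matrix: entry (i, j) is i * 20 + j, NOT real data.
toy_aac = [[float(i * 20 + j) for j in range(20)] for i in range(20)]
smr = substitution_matrix("ACD", toy_aac)
```

Each column of SMR is therefore the AAC column of the residue found at that sequence position, so two positions holding the same amino acid have identical columns.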
Step (8): extract features from the amino acid sequence using the histogram of oriented gradients (HOG) feature extraction algorithm. The calculation is as follows:
Step (8.1): compute the gradient values in the horizontal and vertical directions, G_h(i, l) and G_v(i, l).
Step (8.2): compute the gradient magnitude.
Step (8.3): compute the gradient direction.
Step (8.4): divide the gradient magnitude matrix and the gradient direction matrix into 9 submatrices of equal size.
Step (8.5): compute the histogram over the gradient directions. The histogram size for each gradient direction is taken as one feature value.
Through the above steps, each sequence yields 81 feature values, and the two sequences together yield 162 feature values.
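The HOG step can be sketched as follows. The gradient, magnitude, and direction formulas are not reproduced in this text, so the sketch assumes the usual choices: finite differences for G_h and G_v, magnitude sqrt(G_h² + G_v²), and an unsigned direction from atan2. A 3 × 3 grid of blocks with 9 direction bins each is an assumption chosen to match the 81 values per sequence, as is weighting each histogram bin by gradient magnitude.

```python
import math


def hog_features(m, grid=3, bins=9):
    """Histogram-of-oriented-gradients features of a 2-D nested list."""
    rows, cols = len(m), len(m[0])
    mag = [[0.0] * cols for _ in range(rows)]
    ang = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for l in range(cols):
            # Central differences inside, one-sided differences at the borders.
            gh = m[i][min(l + 1, cols - 1)] - m[i][max(l - 1, 0)]
            gv = m[min(i + 1, rows - 1)][l] - m[max(i - 1, 0)][l]
            mag[i][l] = math.hypot(gh, gv)
            ang[i][l] = math.atan2(gv, gh) % math.pi   # unsigned direction

    feats = []
    for bi in range(grid):
        r0, r1 = bi * rows // grid, (bi + 1) * rows // grid
        for bl in range(grid):
            c0, c1 = bl * cols // grid, (bl + 1) * cols // grid
            hist = [0.0] * bins
            for i in range(r0, r1):
                for l in range(c0, c1):
                    b = min(int(ang[i][l] / math.pi * bins), bins - 1)
                    hist[b] += mag[i][l]   # ASSUMPTION: magnitude-weighted
            feats.extend(hist)
    return feats


# Toy 20 x 12 matrix that only varies along the sequence axis.
ramp = [[float(l) for l in range(12)] for _ in range(20)]
hog = hog_features(ramp)
```

Since 20 rows do not divide evenly by 3, the blocks here differ slightly in height; a real implementation would need to decide how to obtain 9 submatrices of exactly equal size, for example by padding or resizing.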
Step (9): perform singular value decomposition on the transpose of the SMR matrix, which yields 20 right singular vectors. This step yields 800 feature values.
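A minimal sketch of this step with NumPy is given below. The transpose of SMR is an L × 20 matrix, so its right singular vectors have 20 components each; flattening all 20 of them gives 400 values per sequence, which is an interpretation consistent with the 800 values stated for a pair. The function name and the random test matrix are illustrative assumptions.

```python
import numpy as np


def svd_features(smr):
    """Flatten the 20 right singular vectors of SMR^T (an L x 20 matrix)."""
    a = np.asarray(smr, dtype=float).T             # shape (L, 20)
    # full_matrices=True guarantees a 20 x 20 vh even when L < 20.
    _, _, vh = np.linalg.svd(a, full_matrices=True)
    return vh.reshape(-1)                          # 20 * 20 = 400 values


rng = np.random.default_rng(0)
toy_smr = rng.normal(size=(20, 35))                # stand-in for a 20 x L SMR
svd_feats = svd_features(toy_smr)
```

Singular vectors are only defined up to sign, so a practical pipeline may want to fix a sign convention before feeding these values to a classifier.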
Step (10): steps (1) to (9) yield a total of 238 + 400 + 162 + 800 = 1600 feature values. These feature values are input into a random forest model for prediction, thereby obtaining the interaction between the two proteins.
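The feature total can be checked with a small tally. The per-pair totals come from the text; the per-sequence breakdowns (7 + 28 + 84 for mutual information, 9 blocks × 9 bins for HOG, 20 × 20 for SVD) are assumptions that happen to reproduce those totals.

```python
# Per-sequence feature counts; each protein pair contributes two sequences.
per_sequence = {
    "mutual_information": 7 + 28 + 84,   # ASSUMED split: classes, 2-tuples, 3-tuples
    "physicochemical": 30 * 6 + 20,      # lg x 6 autocorrelations + 20 frequencies
    "hog": 9 * 9,                        # 9 submatrices; 9 bins each is assumed
    "svd": 20 * 20,                      # 20 right singular vectors of length 20
}
per_pair = {name: 2 * count for name, count in per_sequence.items()}
total = sum(per_pair.values())
```

The resulting 1600-dimensional vector is what would be passed to the random forest model for each protein pair.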
Following the above calculation method, we evaluated the performance of our prediction method with a random forest model on 12 widely accepted protein-protein interaction datasets constructed by previous researchers, including S.cerevisiae, H.pylori2918, human8161 and E.coli. We also tested the method on three real protein-protein interaction networks: the single-core network CD9, the multi-core network of the Ras-Raf-Mek-Erk-Elk-Srf pathway, and the crossover network Wnt. On the S.cerevisiae dataset, interaction prediction using 2-tuple mutual information, 3-tuple mutual information, and multivariate mutual information reached accuracies of 93.56%, 93.88%, and 94.23%, respectively. Feature extraction with the combined multivariate mutual information thus performs better than either feature type alone. For the Moreau-Broto autocorrelation features, we tested nine values of lg (lg = 5, 10, 15, 20, 25, 30, 35, 40, 45) to find the best one. Fig. 4 shows the prediction accuracy for each value. The curve shows that accuracy increases as lg grows from 5 to 30 and decreases as lg grows from 30 to 45. The best accuracy, 92.76%, is obtained at lg = 30. Used on their own, the histogram of oriented gradients features and the singular value decomposition features reach accuracies of 93.86% and 92.93%, respectively. When the four types of feature extraction are integrated, as in our method, the prediction accuracy is 94.56%. The random forest classifier used in the method gives better predictions than a support vector machine classifier. Because the random forest is an ensemble model that can also assess feature importance, it improves prediction accuracy by 2%.
The method is also highly accurate when applied to protein-protein interaction network prediction. On the single-core network CD9, our method identifies 14 of the 16 protein interactions, an accuracy of 87.50%. On the multi-core network of the Ras-Raf-Mek-Erk-Elk-Srf pathway, it correctly predicts 174 of the 189 protein interactions, an accuracy of 92.06%. The crossover network of the Wnt-related pathway is crucial in signal transduction; our method finds 91 of its 96 interactions, an accuracy of 94.79%, better than existing prediction methods. On these three network structures, other existing methods reach accuracies of 81.25%, 90.00%, and 76.04%, respectively, so our method is more accurate than existing methods. Figs. 5, 6, and 7 show the results obtained when our method was used to discover and predict protein-protein interactions on the single-core, multi-core, and crossover networks. In the figures, blue lines indicate correct predictions and red lines indicate wrong predictions. The three figures show that our method is highly reliable and practical.
In proteomics, the greatest difficulty in protein-protein interaction prediction is that the available information is not explicit enough: the useful information is hidden in sequence information that is too simple. Accurate interaction prediction cannot rely on direct sequence information alone; an effective information extraction method is needed to derive richer interaction information and physicochemical property information from the underlying sequence. Designing a general feature extraction method that extracts useful information from sequence information for protein-protein interaction prediction is the main contribution of the invention.
The basic idea of the invention is to extract and fuse multiple types of attribute information and to predict interactions with an effective classifier. The invention first computes the frequencies with which each amino acid class and the 2-tuples and 3-tuples it forms occur in the sequence. On the basis of these frequencies it derives the multivariate mutual information, mining from plain sequence data the associations between different amino acids and their tuples. In addition, the invention takes into account the influence of the physicochemical properties of amino acids on interactions, and extracts residue binding energy information from the sequence to further improve prediction accuracy.
The invention mainly comprises the following steps. Compute the multivariate mutual information of the amino acid sequences, obtaining 238 mutual information feature values. Compute the Moreau-Broto autocorrelation feature values and count the frequencies of the 20 amino acids in the sequences, obtaining 400 feature values. Compute the amino acid contact matrix from residue pairing frequencies, then compute the substitution matrix. Process the substitution matrix with the histogram of oriented gradients, obtaining 162 feature values. Perform singular value decomposition on the substitution matrix, obtaining 800 feature values. Classify the resulting 1600 feature values with a random forest classifier to determine whether the two proteins interact.
The calculation process of the invention is simple and easy to implement, and its hardware and computing resource requirements are relatively low, so it is widely applicable. The method can be implemented in C++ and MATLAB. On an ordinary computer with a 2.5 GHz 6-core CPU and 32 GB of memory, a prediction task over several thousand samples completes in a very short time. To balance performance and accuracy, the random forest classifier uses 500 decision trees and 400 features per tree. Adjusting these parameters can further speed up classification and hence prediction.
Claims (3)
1. A protein-protein interaction prediction method based on multivariate mutual information and residue binding energy, characterized by the following steps:
Step (1): amino acid class grouping. The 20 standard amino acids are assigned to n functional groups according to their dipole and volume. The functional groups are denoted C0, C1, C2, ..., Cn, and the original amino acid sequence is converted into a sequence of group classes according to the functional group to which each amino acid belongs;
Step (2): define the 3-tuple and 2-tuple feature representations. The 3-tuple features are "C0C0C0", "C0C0C1", ..., "CnCnCn"; the 2-tuple features are "C0C0", "C0C1", ..., "CnCn";
Step (3): count the occurrences of each 3-tuple and 2-tuple feature in the group class sequence and build the feature frequency table. The frequency function f(a) = (n_a + 1)/(L + 1) is used to compute the frequency with which each of the n classes occurs in the sequence;
Step (4): compute the 2-tuple mutual information features I(ab), where f(ab) is the frequency with which classes a and b occur together in a 2-tuple;
Step (5): compute the 3-tuple mutual information features. The calculation formula is:
I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
where f(a|c) is the frequency with which class a occurs in the 2-tuples containing class c, and f(a|bc) is the frequency with which class a occurs in the 3-tuples containing classes b and c;
The first part of the feature values, the mutual information features, is obtained through the above five steps;
Step (6): compute the amino acid physicochemical property features;
Step (7): by statistical analysis of a protein complex database, compute the amino acid contact matrix AAC from residue pairing frequencies, where i and j denote two amino acid types and N_{i,j} = Σ_D n_{ij} is the number of contacts between i and j. Then compute the substitution matrix SMR, SMR_{i,l} = AAC(i, A_l), where i = 1, ..., 20 indexes the 20 amino acid types, l = 1, ..., L indexes the L positions of the given protein sequence, and A_l is the amino acid type at position l. This step yields a 20 × L substitution matrix SMR;
Step (8): extract features from the amino acid sequence using the histogram of oriented gradients (HOG) feature extraction algorithm;
Step (9): perform singular value decomposition on the transpose of the SMR matrix, which yields 20 right singular vectors;
Step (10): input the feature values obtained in steps (1) to (9) into a random forest model for prediction, thereby obtaining the interaction between the two proteins.
2. The protein-protein interaction prediction method based on multivariate mutual information and residue binding energy according to claim 1, characterized in that step (6) is computed as follows:
Step (6.1): compute the Moreau-Broto autocorrelation feature values, where lag is the distance between residues, p is the p-th physicochemical property of the natural amino acids, and l is the position in the sequence, with l = 1, 2, ..., L-lag and lag = 1, 2, ..., lg. With six physicochemical properties, lg × 6 feature values are obtained;
Step (6.2): normalize the lg × 6 feature values obtained;
Step (6.3): count the frequency with which each of the 20 amino acids occurs in the sequence.
3. The protein-protein interaction prediction method based on multivariate mutual information and residue binding energy according to claim 1, characterized in that step (8) is computed as follows:
Step (8.1): compute the gradient values in the horizontal and vertical directions, G_h(i, l) and G_v(i, l);
Step (8.2): compute the gradient magnitude;
Step (8.3): compute the gradient direction;
Step (8.4): divide the gradient magnitude matrix and the gradient direction matrix into 9 submatrices of equal size;
Step (8.5): compute the histogram over the gradient directions; the histogram size for each gradient direction is taken as one feature value.
Through the above steps, each sequence yields x feature values, and the two sequences together yield 2x feature values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811366481.2A CN109801672A (en) | 2018-11-16 | 2018-11-16 | Protein-protein interaction prediction method based on multivariate mutual information and residue binding energy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109801672A true CN109801672A (en) | 2019-05-24 |
Family
ID=66556285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811366481.2A Pending CN109801672A (en) | Protein-protein interaction prediction method based on multivariate mutual information and residue binding energy | 2018-11-16 | 2018-11-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109801672A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111081312A (en) * | 2019-12-04 | 2020-04-28 | 浙江工业大学 | Ligand binding residue prediction method based on multi-sequence association information |
CN112242179A (en) * | 2020-09-09 | 2021-01-19 | 天津大学 | Method for identifying type of membrane protein |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060069519A1 (en) * | 2000-03-10 | 2006-03-30 | Daiichi Pharmaceutical Co., Ltd. | Method for predicting protein-protein interactions |
US20110238320A1 (en) * | 2010-03-24 | 2011-09-29 | Noriko Shimba | Interaction force change prediction apparatus and interaction force change prediction method |
CN104615910A (en) * | 2014-12-30 | 2015-05-13 | 中国科学院深圳先进技术研究院 | Method for predicating helix interactive relationship of alpha transmembrane protein based on random forest |
CN106778065A (en) * | 2016-12-30 | 2017-05-31 | 同济大学 | A kind of Forecasting Methodology based on multivariate data prediction DNA mutation influence interactions between protein |
CN107358064A (en) * | 2017-07-03 | 2017-11-17 | 苏州大学 | The system and method that predicted amino acid makes a variation to protein structure stability influence |
Non-Patent Citations (2)
Title |
---|
YIJIE DING et al.: "Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES * |
YIJIE DING et al.: "Predicting protein-protein interactions via multivariate mutual information of protein sequences", BMC BIOINFORMATICS * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111081312A (en) * | 2019-12-04 | 2020-04-28 | 浙江工业大学 | Ligand binding residue prediction method based on multi-sequence association information |
CN111081312B (en) * | 2019-12-04 | 2021-10-29 | 浙江工业大学 | Ligand binding residue prediction method based on multi-sequence association information |
CN112242179A (en) * | 2020-09-09 | 2021-01-19 | 天津大学 | Method for identifying type of membrane protein |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190524 |