CN109801672A - Method for predicting protein-protein interactions using multivariate mutual information and residue binding energy - Google Patents

Method for predicting protein-protein interactions using multivariate mutual information and residue binding energy

Info

Publication number
CN109801672A
Authority
CN
China
Prior art keywords
amino acid
calculated
protein
classification
mutual information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811366481.2A
Other languages
Chinese (zh)
Inventor
郭菲
邹权
丁漪杰
潘高峰
唐继军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201811366481.2A
Publication of CN109801672A
Legal status: Pending

Abstract

The present invention relates to bioinformatics and proposes a method for predicting protein-protein interactions accurately and efficiently. The method increases the contribution of useful information in the amino acid sequence to the prediction while reducing the influence of noise. The method, which predicts protein-protein interactions from multivariate mutual information and residue binding energy, comprises: step (1): group the amino acids into categories; step (2): define the feature representations; step (3): build the feature frequency table; step (4): compute the 2-tuple mutual information features; step (5): compute the 3-tuple mutual information features; step (6): compute the amino acid physicochemical property features; step (7): compute the amino acid contact matrix AAC; step (8): extract features from the amino acid sequence; step (9): perform singular value decomposition; step (10): obtain the interaction between the two proteins. The invention is mainly applicable to protein-protein interaction prediction.

Description

Method for predicting protein-protein interactions using multivariate mutual information and residue binding energy
Technical field
The present invention relates to bioinformatics methods that predict protein-protein interactions from amino acid sequence information, and belongs to the field of macromolecular structure prediction algorithms in proteomics. Specifically, it relates to a method for predicting protein-protein interactions using multivariate mutual information and residue binding energy.
Background
Protein-protein interactions are at the core of many biological processes. Identifying them is essential for elucidating protein function and the biological processes of the cell. Interaction information helps researchers understand disease mechanisms and therefore design drugs more efficiently and accurately. Over the past several years, many computational techniques have matured to the point where large-scale analysis is possible. In general, computational methods for detecting protein-protein interactions fall into three classes: methods based on evolutionary information, methods based on natural language processing, and methods based on amino acid sequence features. Methods based on evolutionary information extract that information from multiple sequence alignments of homologous proteins and build phylogenetic trees to analyse the relationships between protein functions. They require large amounts of homologous protein data together with interaction labels for those proteins, which greatly limits their use in large-scale computation. Methods based on natural language processing rely on widely used text-mining techniques and extract useful information from the many known protein-protein interactions recorded in the biological and medical literature. Because part of the information is missing from the literature, their predictions may be incomplete. It is therefore particularly important to improve the accuracy of protein-protein interaction prediction, and to keep the method widely applicable, by using sequence-based feature extraction with multivariate mutual information and residue binding energy information.
Feature extraction is the key technique in predicting protein-protein interactions from amino acid sequence information. It defines a series of mapping functions that map an amino acid sequence of a protein to a series of feature values representing that sequence. These values should capture the useful characteristics of the protein as completely as possible while excluding noise that would harm the prediction. Classical amino acid sequence feature extraction methods include auto covariance, conjoint triad, local protein sequence descriptors, multi-scale local feature descriptors, local phase quantization descriptors, and matrix-based protein sequence representations. These methods abstract the amino acid sequence from different perspectives, and their prediction results differ considerably. Designing an effective feature extraction method that maps amino acid sequences in a way that increases the differences between sequences and reduces the interference of noise has therefore become the key technique in protein-protein interaction prediction.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims to provide a method that predicts protein-protein interactions accurately and efficiently. The feature extraction functions used in the method increase the contribution of useful information in the amino acid sequence to the prediction while reducing the influence of noise. To this end, the technical scheme adopted by the invention is a method for predicting protein-protein interactions using multivariate mutual information and residue binding energy, with the following steps:
Step (1): amino acid grouping. The 20 standard amino acids are assigned to n functional groups according to their dipoles and side-chain volumes. The n groups are denoted C_0, C_1, C_2, ..., C_{n-1}. The original amino acid sequence is converted into a group category sequence by replacing each amino acid with the category of its functional group;
Step (2): define the 3-tuple and 2-tuple feature representations. The 3-tuple features are "C_0C_0C_0", "C_0C_0C_1", ..., "C_{n-1}C_{n-1}C_{n-1}"; the 2-tuple features are "C_0C_0", "C_0C_1", ..., "C_{n-1}C_{n-1}";
Step (3): count the occurrences of each 3-tuple and 2-tuple feature in the group category sequence and build the feature frequency table. The frequency function f(a) = (n_a + 1)/(L + 1), where n_a is the number of occurrences of category a and L is the sequence length, is used to compute the frequency of each of the n categories in the sequence;
Step (4): compute the 2-tuple mutual information features I(ab), where f(ab) is the frequency with which categories a and b occur together in a 2-tuple;
Step (5): compute the 3-tuple mutual information features. The formula is:
I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
where f(a|c) is the frequency with which category a occurs in 2-tuples that contain category c, and f(a|bc) is the frequency with which category a occurs in 3-tuples that contain categories b and c;
The above 5 steps yield the first part of the feature values, the mutual information features;
Step (6): compute the amino acid physicochemical property features;
Step (7): by statistically analysing a protein complex database, compute the amino acid contact matrix AAC from residue pairing frequencies, where i and j denote two amino acid types and N_{i,j} = Σ_D n_{ij} is the number of contacts between i and j.
Compute the substitution matrix SMR, SMR_{i,l} = AAC(i, A_l), where i = 1, ..., 20 indexes the 20 amino acid types, l = 1, ..., L indexes the L positions of the given protein sequence, and A_l is the amino acid type at position l. This step yields a 20 × L substitution matrix SMR;
Step (8): extract features from the amino acid sequence using the histogram of oriented gradients (HOG) feature extraction algorithm;
Step (9): perform singular value decomposition on the transpose of the SMR matrix; the decomposition yields 20 right singular vectors;
Step (10): feed the feature values obtained in steps 1 to 9 into a random forest model, which predicts the interaction between the two proteins.
Step (6) is computed as follows:
Step (6.1): compute the Moreau-Broto autocorrelation feature values, where lag is the distance between residues, p denotes the p-th physicochemical property of the natural amino acids, l is the position in the sequence, l = 1, 2, ..., L - lag, and lag = 1, 2, ..., lg. With six physicochemical properties this gives lg × 6 feature values;
Step (6.2): normalize the lg × 6 feature values;
Step (6.3): count the frequency of each of the 20 amino acids in the sequence.
Step (8) is computed as follows:
Step (8.1): compute the horizontal and vertical gradient values G_h(i, l) and G_v(i, l);
Step (8.2): compute the gradient magnitude;
Step (8.3): compute the gradient direction;
Step (8.4): divide the gradient magnitude matrix and the gradient direction matrix into 9 sub-matrices of equal size;
Step (8.5): compute the histogram of gradient directions; the size of each histogram bin serves as one feature value.
Through the above steps, each sequence yields x feature values, and the two sequences together yield 2x feature values.
The features and beneficial effects of the invention are:
The invention integrates the multivariate mutual information and the residue binding energy information of the amino acid sequence. Compared with traditional sequence information, multivariate mutual information considers the properties of each amino acid together with its two neighbouring residues, and also the mutual information of their components. In addition, the histogram of oriented gradients and the singular value decomposition extract texture features from the protein matrix. These additional types of information and features help to predict protein-protein interactions accurately, so the method achieves higher prediction accuracy than existing methods when analysing and predicting protein-protein interactions. Besides predicting interactions between proteins, the method can also discover new interactions within protein interaction networks, which is of great value for completing such networks.
Brief description of the drawings:
Fig. 1: Flow chart of the calculation process of the invention;
Fig. 2: Feature representation of 2-tuples and 3-tuples and construction of the frequency table;
Fig. 3: Schematic diagram of computing the histogram of oriented gradients;
Fig. 4: Accuracy of the Moreau-Broto autocorrelation features for different lg values;
Fig. 5: Prediction results of the method on the single-core network;
Fig. 6: Prediction results of the method on the multi-core network;
Fig. 7: Prediction results of the method on the crossover network.
Detailed description of the embodiments
The object of the present invention is to provide a method that predicts protein-protein interactions accurately and efficiently. The feature extraction functions used in the method increase the contribution of useful information in the amino acid sequence to the prediction while reducing the influence of noise.
The invention is characterized in that it comprises the following steps in order:
Step (1): amino acid grouping. The 20 standard amino acids are assigned to 7 functional groups according to their dipoles and side-chain volumes. The 7 groups are denoted C_0, C_1, C_2, ..., C_6. The original amino acid sequence is converted into a group category sequence by replacing each amino acid with the category of its functional group.
Step (2): define the 3-tuple and 2-tuple feature representations. The 3-tuple features are "C_0C_0C_0", "C_0C_0C_1", ..., "C_6C_6C_6"; the 2-tuple features are "C_0C_0", "C_0C_1", ..., "C_6C_6".
Step (3): count the occurrences of each 3-tuple and 2-tuple feature in the group category sequence and build the feature frequency table, as shown in Fig. 2. The frequency function f(a) = (n_a + 1)/(L + 1), where n_a is the number of occurrences of category a and L is the sequence length, is used to compute the frequency of each of the 7 categories in the sequence.
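Steps (1) and (3) can be sketched as follows. The text does not list which amino acids belong to which of the 7 groups, so the grouping below (the one commonly used with the conjoint triad method) is an assumption, as are the function names.

```python
from collections import Counter

# ASSUMED grouping (conjoint-triad style); the patent text does not list the groups.
GROUPS = ["AGV", "C", "MSTY", "FILP", "HNQW", "KR", "DE"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS) for aa in members}

def to_group_sequence(seq):
    """Replace each residue by the index (0..6) of its functional group."""
    return [AA_TO_GROUP[aa] for aa in seq]

def single_frequencies(groups, n_groups=7):
    """f(a) = (n_a + 1) / (L + 1) for every category a."""
    counts = Counter(groups)
    length = len(groups)
    return [(counts[a] + 1) / (length + 1) for a in range(n_groups)]

example = to_group_sequence("MKVLAAGDE")   # hypothetical 9-residue sequence
freqs = single_frequencies(example)
```

The +1 in numerator and denominator keeps every frequency strictly positive, which matters later because the mutual information formulas take logarithms of these values.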
Step (4): compute the 28 2-tuple mutual information features I(ab), where f(ab) is the frequency of the 2-tuple ab.
Step (5): compute the 84 3-tuple mutual information features. The formula is:
I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
where f(a|c) is the frequency with which category a occurs in 2-tuples that contain category c, and f(a|bc) is the frequency with which category a occurs in 3-tuples that contain categories b and c.
The above 5 steps yield 238 mutual information feature values.
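A minimal sketch of steps (4) and (5) for one group category sequence follows. Several choices are assumptions rather than statements of the patent: the 2-tuple formula is taken to be the usual form I(ab) = f(ab) ln(f(ab) / (f(a) f(b))), tuples are treated as unordered (which gives the 28 and 84 counts), tuple frequencies use the same smoothing as f(a), and the conditional frequencies are taken as f(a|c) = f(ac)/f(c) and f(a|bc) = f(abc)/f(bc).

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

def mmi_features(groups, n_groups=7):
    """Return 7 single frequencies, 28 I(ab) values and 84 I(abc) values."""
    length = len(groups)
    # Unordered tuples: sort each sliding window so "ab" and "ba" share a key.
    c1 = Counter(groups)
    c2 = Counter(tuple(sorted(groups[i:i + 2])) for i in range(length - 1))
    c3 = Counter(tuple(sorted(groups[i:i + 3])) for i in range(length - 2))

    def f(counter, key):
        # Smoothed frequency, mirroring f(a) = (n_a + 1) / (L + 1) (assumed for tuples).
        return (counter[key] + 1) / (length + 1)

    f1 = [f(c1, a) for a in range(n_groups)]

    def info2(a, b):
        # ASSUMED form: I(ab) = f(ab) * ln(f(ab) / (f(a) * f(b))).
        fab = f(c2, tuple(sorted((a, b))))
        return fab * math.log(fab / (f1[a] * f1[b]))

    pairs = list(combinations_with_replacement(range(n_groups), 2))
    triples = list(combinations_with_replacement(range(n_groups), 3))
    i2 = [info2(a, b) for a, b in pairs]

    i3 = []
    for a, b, c in triples:            # a <= b <= c, so keys are already sorted
        fa_c = f(c2, (a, c)) / f1[c]               # assumed f(a|c)
        fa_bc = f(c3, (a, b, c)) / f(c2, (b, c))   # assumed f(a|bc)
        i3.append(info2(a, b) + fa_c * math.log(fa_c) - fa_bc * math.log(fa_bc))
    return f1 + i2 + i3

features = mmi_features([2, 5, 0, 3, 0, 0, 0, 6, 6])
```

Under these assumptions each sequence gives 7 + 28 + 84 = 119 values, so a pair of sequences gives the 238 values stated in the text.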
Step (6): compute the amino acid physicochemical property features. Each amino acid sequence yields 200 feature values, so a pair of sequences whose interaction is to be predicted yields 400 feature values. The calculation is as follows:
Step (6.1): compute the Moreau-Broto autocorrelation feature values, where lag is the distance between residues, p denotes the p-th physicochemical property of the natural amino acids, l is the position in the sequence, l = 1, 2, ..., L - lag, and lag = 1, 2, ..., lg, with lg typically set to 30. With six physicochemical properties this gives 30 × 6 = 180 feature values.
Step (6.2): normalize the 180 feature values.
Step (6.3): count the frequency of each of the 20 amino acids in the sequence.
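The autocorrelation and composition features of step (6) can be sketched with NumPy. The autocorrelation is written here in its common form, the mean over l of P_l × P_{l+lag} for each property, which is an assumption because the formula is not reproduced in the text. The property table is a random placeholder, not real physicochemical data, and the normalization of step (6.2) is left out because its method is not specified.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def moreau_broto(props, lg=30):
    """props has shape (L, P): one row of P property values per residue.
    Returns lg * P values; requires L > lg."""
    feats = []
    for lag in range(1, lg + 1):
        prod = props[:-lag] * props[lag:]   # shape (L - lag, P)
        feats.append(prod.mean(axis=0))     # assumed normalization by L - lag
    return np.concatenate(feats)

def composition(seq):
    """Frequency of each of the 20 amino acids in the sequence."""
    return np.array([seq.count(aa) / len(seq) for aa in AMINO_ACIDS])

rng = np.random.default_rng(0)
scales = rng.normal(size=(20, 6))           # PLACEHOLDER property table (20 x 6)
seq = "MKVLAAGDEHNQWYTSCFIP" * 3            # hypothetical 60-residue sequence
props = scales[[AMINO_ACIDS.index(aa) for aa in seq]]
features = np.concatenate([moreau_broto(props, lg=30), composition(seq)])
```

With lg = 30 and six properties this gives 180 + 20 = 200 values per sequence, matching the count in the text.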
Step (7): by statistically analysing a protein complex database, compute the amino acid contact matrix AAC from residue pairing frequencies, where i and j denote two amino acid types and N_{i,j} = Σ_D n_{ij} is the number of contacts between i and j.
Compute the substitution matrix SMR, SMR_{i,l} = AAC(i, A_l), where i = 1, ..., 20 indexes the 20 amino acid types, l = 1, ..., L indexes the L positions of the given protein sequence, and A_l is the amino acid type at position l. This step yields a 20 × L substitution matrix SMR;
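The construction SMR_{i,l} = AAC(i, A_l) amounts to picking, for each sequence position, the AAC column of the residue found there. A small sketch, in which the AAC values are a random symmetric placeholder rather than real contact statistics and the amino acid ordering is an assumption:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # assumed row/column order of AAC

def substitution_matrix(seq, aac):
    """SMR[i, l] = AAC[i, A_l]; the result has shape (20, L)."""
    idx = [AMINO_ACIDS.index(aa) for aa in seq]
    return aac[:, idx]

rng = np.random.default_rng(0)
raw = rng.normal(size=(20, 20))
aac = (raw + raw.T) / 2                # PLACEHOLDER symmetric matrix, not real data
smr = substitution_matrix("MKVLAAGDE", aac)
```

Each protein, whatever its length L, is thereby turned into a 20-row numeric matrix that can be treated like an image in the following steps.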
Step (8): extract features from the amino acid sequence using the histogram of oriented gradients (HOG) feature extraction algorithm. The calculation is as follows:
Step (8.1): compute the horizontal and vertical gradient values G_h(i, l) and G_v(i, l).
Step (8.2): compute the gradient magnitude.
Step (8.3): compute the gradient direction.
Step (8.4): divide the gradient magnitude matrix and the gradient direction matrix into 9 sub-matrices of equal size.
Step (8.5): compute the histogram of gradient directions. The size of each histogram bin serves as one feature value.
Through the above steps, each sequence yields 81 feature values, and the two sequences together yield 162 feature values.
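The HOG steps can be sketched as below. The gradient formulas are not reproduced in the text, so this sketch assumes central differences, a magnitude of sqrt(G_h² + G_v²), a direction of atan2(G_v, G_h), a 3 × 3 grid of sub-matrices and 9 direction bins per sub-matrix (9 × 9 = 81). Because 20 rows cannot be split into three equal parts, the blocks are only approximately equal in size here.

```python
import numpy as np

def hog_features(smr, n_blocks=3, n_bins=9):
    """Return n_blocks * n_blocks * n_bins magnitude-weighted direction counts."""
    smr = np.asarray(smr, dtype=float)
    gh = np.zeros_like(smr)
    gv = np.zeros_like(smr)
    # Assumed central differences; border cells keep a zero gradient.
    gh[:, 1:-1] = smr[:, 2:] - smr[:, :-2]   # along sequence positions
    gv[1:-1, :] = smr[2:, :] - smr[:-2, :]   # along amino acid types
    magnitude = np.hypot(gh, gv)
    direction = np.arctan2(gv, gh)           # angles in [-pi, pi]

    feats = []
    row_parts = np.array_split(np.arange(smr.shape[0]), n_blocks)
    col_parts = np.array_split(np.arange(smr.shape[1]), n_blocks)
    for rows in row_parts:
        for cols in col_parts:
            block = np.ix_(rows, cols)
            hist, _ = np.histogram(direction[block], bins=n_bins,
                                   range=(-np.pi, np.pi),
                                   weights=magnitude[block])
            feats.append(hist)
    return np.concatenate(feats)

demo = hog_features(np.random.default_rng(0).normal(size=(20, 50)))
```

The output length does not depend on the sequence length L, which is what lets proteins of different lengths share one fixed-size feature vector.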
Step (9): perform singular value decomposition on the transpose of the SMR matrix. The decomposition yields 20 right singular vectors. This step yields 800 feature values.
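A sketch of step (9) using `numpy.linalg.svd`, assuming the features are the flattened right singular vectors and that the sequence has at least 20 residues. The transpose of SMR has shape L × 20, so each right singular vector has 20 entries, and 20 vectors give 400 values per sequence, or 800 per pair.

```python
import numpy as np

def svd_features(smr):
    """Flatten the right singular vectors of SMR^T (shape L x 20) into 400 values."""
    _, _, vt = np.linalg.svd(np.asarray(smr, dtype=float).T, full_matrices=False)
    return vt.reshape(-1)      # rows of vt are the right singular vectors

features = svd_features(np.random.default_rng(0).normal(size=(20, 50)))
```

Singular vectors are only defined up to sign, so a practical implementation would probably fix a sign convention to keep the features comparable between proteins; the text does not say how this is handled.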
Step (10): steps 1 to 9 yield 238 + 400 + 162 + 800 = 1600 feature values in total. These feature values are fed into a random forest model, which predicts the interaction between the two proteins.
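The feature counts can be checked with a few lines of arithmetic. The breakdown of the 238 mutual information values into single, 2-tuple and 3-tuple terms, and the 9 histogram bins per sub-matrix, are assumptions chosen because they reproduce the totals given in the text.

```python
from math import comb

n_groups = 7
# 7 single frequencies + 28 unordered 2-tuples + 84 unordered 3-tuples (assumed split)
per_seq_mi = n_groups + comb(n_groups + 1, 2) + comb(n_groups + 2, 3)
per_seq_physchem = 30 * 6 + 20   # autocorrelation (lg = 30, 6 properties) + composition
per_seq_hog = 9 * 9              # 9 sub-matrices x 9 bins (bin count assumed)
per_seq_svd = 20 * 20            # 20 right singular vectors of length 20

per_pair = 2 * (per_seq_mi + per_seq_physchem + per_seq_hog + per_seq_svd)
print(per_pair)                  # 1600
```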
Following the calculation described above, we evaluated the performance of our prediction method with a random forest model on 12 protein-protein interaction data sets that were built by earlier researchers and are widely accepted in the field, including S.cerevisiae, H.pylori2918, human8161 and E.coli. We also tested the method on three real protein-protein interaction networks: the single-core network CD9, the multi-core network of the Ras-Raf-Mek-Erk-Elk-Srf pathway, and the crossover network Wnt. On the S.cerevisiae data set, interaction prediction using 2-tuple mutual information, 3-tuple mutual information and multivariate mutual information achieved accuracies of 93.56%, 93.88% and 94.23% respectively. Combining them into multivariate mutual information clearly performs better than using either feature type alone. For the Moreau-Broto autocorrelation features, we tested nine lg values (lg = 5, 10, 15, 20, 25, 30, 35, 40, 45) to find the best one. Fig. 4 shows the prediction accuracy for each lg value. The curve shows that accuracy rises as lg increases from 5 to 30 and falls as lg increases from 30 to 45. The best accuracy, 92.76%, is obtained at lg = 30. Used on their own, the histogram of oriented gradients features and the singular value decomposition features reach accuracies of 93.86% and 92.93% respectively. Our method integrates all four types of feature extraction and reaches a prediction accuracy of 94.56%. The random forest classifier used in the method gives better predictions than a support vector machine classifier. Because the random forest is an ensemble model that can also assess feature importance, prediction accuracy improves by 2%.
The method is also highly accurate when applied to protein-protein interaction network prediction. On the single-core network CD9, it identifies 14 of the 16 protein interactions, an accuracy of 87.50%. On the multi-core network of the Ras-Raf-Mek-Erk-Elk-Srf pathway, it correctly predicts 174 of the 189 protein interactions, an accuracy of 92.06%. The crossover network of the Wnt-related pathway is crucial in signal transduction. Here the method finds 91 of the 96 interactions, an accuracy of 94.79%. On these three network structures, other existing methods reach accuracies of 81.25%, 90.00% and 76.04% respectively, so our method is more accurate than existing methods on all three. Figs. 5, 6 and 7 show the results of discovering and predicting protein-protein interactions with our method on the single-core, multi-core and crossover networks. In the figures, blue lines denote correct predictions and red lines denote wrong predictions. The three figures show that our method is both reliable and practical.
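The reported network accuracies follow directly from the counts of correctly identified interactions, as this short check shows (the dictionary layout is illustrative only):

```python
# (correctly predicted interactions, total interactions) as stated in the text
results = {
    "CD9": (14, 16),
    "Ras-Raf-Mek-Erk-Elk-Srf": (174, 189),
    "Wnt": (91, 96),
}
accuracy = {name: round(100 * hit / total, 2) for name, (hit, total) in results.items()}
```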
In proteomics, the greatest difficulty in predicting protein-protein interactions is that the available information is not explicit: the useful information is hidden in very simple sequence data. Accurate interaction prediction cannot rely on the raw sequence alone. It requires an information extraction method that derives richer interaction information and physicochemical property information from the underlying sequence. The main contribution of the invention is a general feature extraction method for protein-protein interaction prediction that extracts useful information from sequence data.
The basic idea of the invention is to extract and fuse several types of attribute information and to predict interactions with an effective classifier. The invention first computes how frequently each amino acid category, and the 2-tuples and 3-tuples formed from them, occur in the sequence. On the basis of these frequencies it then derives the multivariate mutual information, uncovering from simple sequence data the associations between different amino acids and their tuples. In addition, the invention takes full account of the influence of the physicochemical properties of amino acids on interactions, and extracts residue binding energy information from the sequence to further improve prediction accuracy.
The invention mainly comprises the following steps. Compute the multivariate mutual information of the amino acid sequences, giving 238 mutual information feature values. Compute the Moreau-Broto autocorrelation feature values and the frequencies of the 20 amino acids in the sequence, giving 400 feature values. Compute the amino acid contact matrix from residue pairing frequencies, then compute the substitution matrix. Processing the substitution matrix with the histogram of oriented gradients gives 162 feature values, and singular value decomposition of the substitution matrix gives 800 feature values. The resulting 1600 feature values are classified with a random forest classifier to decide whether the two proteins interact.
The calculation is simple and easy to implement, and its hardware and computing requirements are modest, so it is widely applicable. The method can be implemented in C++ and MATLAB. On an ordinary computer with a 2.5 GHz 6-core CPU and 32 GB of memory, a prediction task of several thousand samples completes in a very short time. To balance performance and accuracy, the random forest classifier uses 500 decision trees and 400 features per tree. Adjusting these parameters can further speed up the classification and hence the prediction.

Claims (3)

1. A method for predicting protein-protein interactions using multivariate mutual information and residue binding energy, characterized by the following steps:
Step (1): amino acid grouping. The 20 standard amino acids are assigned to n functional groups according to their dipoles and side-chain volumes. The n groups are denoted C_0, C_1, C_2, ..., C_{n-1}. The original amino acid sequence is converted into a group category sequence by replacing each amino acid with the category of its functional group;
Step (2): define the 3-tuple and 2-tuple feature representations. The 3-tuple features are "C_0C_0C_0", "C_0C_0C_1", ..., "C_{n-1}C_{n-1}C_{n-1}"; the 2-tuple features are "C_0C_0", "C_0C_1", ..., "C_{n-1}C_{n-1}";
Step (3): count the occurrences of each 3-tuple and 2-tuple feature in the group category sequence and build the feature frequency table. The frequency function f(a) = (n_a + 1)/(L + 1) is used to compute the frequency of each of the n categories in the sequence;
Step (4): compute the 2-tuple mutual information features I(ab), where f(ab) is the frequency with which categories a and b occur together in a 2-tuple;
Step (5): compute the 3-tuple mutual information features. The formula is:
I(abc) = I(ab) + f(a|c) ln f(a|c) - f(a|bc) ln f(a|bc)
where f(a|c) is the frequency with which category a occurs in 2-tuples that contain category c, and f(a|bc) is the frequency with which category a occurs in 3-tuples that contain categories b and c;
The above 5 steps yield the first part of the feature values, the mutual information features;
Step (6): compute the amino acid physicochemical property features;
Step (7): by statistically analysing a protein complex database, compute the amino acid contact matrix AAC from residue pairing frequencies, where i and j denote two amino acid types and N_{i,j} = Σ_D n_{ij} is the number of contacts between i and j.
Compute the substitution matrix SMR, SMR_{i,l} = AAC(i, A_l), where i = 1, ..., 20 indexes the 20 amino acid types, l = 1, ..., L indexes the L positions of the given protein sequence, and A_l is the amino acid type at position l. This step yields a 20 × L substitution matrix SMR;
Step (8): extract features from the amino acid sequence using the histogram of oriented gradients (HOG) feature extraction algorithm;
Step (9): perform singular value decomposition on the transpose of the SMR matrix; the decomposition yields 20 right singular vectors;
Step (10): feed the feature values obtained in steps 1 to 9 into a random forest model, thereby obtaining the interaction between the two proteins.
2. The method for predicting protein-protein interactions using multivariate mutual information and residue binding energy according to claim 1, characterized in that step (6) is computed as follows:
Step (6.1): compute the Moreau-Broto autocorrelation feature values, where lag is the distance between residues, p denotes the p-th physicochemical property of the natural amino acids, l is the position in the sequence, l = 1, 2, ..., L - lag, and lag = 1, 2, ..., lg. With six physicochemical properties this gives lg × 6 feature values;
Step (6.2): normalize the lg × 6 feature values;
Step (6.3): count the frequency of each of the 20 amino acids in the sequence.
3. The method for predicting protein-protein interactions using multivariate mutual information and residue binding energy according to claim 1, characterized in that step (8) is computed as follows:
Step (8.1): compute the horizontal and vertical gradient values G_h(i, l) and G_v(i, l);
Step (8.2): compute the gradient magnitude;
Step (8.3): compute the gradient direction;
Step (8.4): divide the gradient magnitude matrix and the gradient direction matrix into 9 sub-matrices of equal size;
Step (8.5): compute the histogram of gradient directions; the size of each histogram bin serves as one feature value.
Through the above steps, each sequence yields x feature values, and the two sequences together yield 2x feature values.
CN201811366481.2A 2018-11-16 2018-11-16 Method for predicting protein-protein interactions using multivariate mutual information and residue binding energy Pending CN109801672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811366481.2A CN109801672A (en) 2018-11-16 2018-11-16 Method for predicting protein-protein interactions using multivariate mutual information and residue binding energy

Publications (1)

Publication Number Publication Date
CN109801672A true CN109801672A (en) 2019-05-24

Family

ID=66556285

Family Applications (1)

Application Number Legal Status Publication Priority Date Filing Date Title
CN201811366481.2A Pending CN109801672A (en) 2018-11-16 2018-11-16 Method for predicting protein-protein interactions using multivariate mutual information and residue binding energy

Country Status (1)

Country Link
CN (1) CN109801672A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060069519A1 (en) * 2000-03-10 2006-03-30 Daiichi Pharmaceutical Co., Ltd. Method for predicting protein-protein interactions
US20110238320A1 (en) * 2010-03-24 2011-09-29 Noriko Shimba Interaction force change prediction apparatus and interaction force change prediction method
CN104615910A (en) * 2014-12-30 2015-05-13 中国科学院深圳先进技术研究院 Method for predicting helix interactions of alpha transmembrane proteins based on random forest
CN106778065A (en) * 2016-12-30 2017-05-31 同济大学 Method for predicting the influence of DNA mutations on protein-protein interactions based on multivariate data
CN107358064A (en) * 2017-07-03 2017-11-17 苏州大学 System and method for predicting the influence of amino acid variation on protein structure stability

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIJIE DING et al.: "Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information", 《INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES》 *
YIJIE DING et al.: "Predicting protein-protein interactions via multivariate mutual information of protein sequences", 《BMC BIOINFORMATICS》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081312A (en) * 2019-12-04 2020-04-28 浙江工业大学 Ligand binding residue prediction method based on multi-sequence association information
CN111081312B (en) * 2019-12-04 2021-10-29 浙江工业大学 Ligand binding residue prediction method based on multi-sequence association information
CN112242179A (en) * 2020-09-09 2021-01-19 天津大学 Method for identifying type of membrane protein

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190524)