CN107607723A - A method for determining protein-protein interactions based on random projection ensemble classification - Google Patents

Publication number
CN107607723A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710653339.5A
Other languages
Chinese (zh)
Inventor
宋晓宇
邱泽阳
孙向阳
赵阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou Jiaotong University
Original Assignee
Lanzhou Jiaotong University
Priority date
2017-08-02
Filing date
2017-08-02
Publication date
2018-01-19
Application filed by
Lanzhou Jiaotong University

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the field of biology, and specifically to a method for determining protein-protein interactions based on random projection ensemble classification. The method comprises the following steps: A, protein data screening; B, substitution matrix representation; C, discrete cosine transform; D, establishing a random projection ensemble model; and E, model determination. The invention obtains a prediction model of protein interactions from the classification features of protein interactions and uses that model to detect disease-related protein interactions, which solves the problem that the association between protein interactions and disease could not be predicted. The invention is applicable to verifying and screening animal and cellular proteins. Representing sequences with a substitution matrix and applying the discrete cosine transform before establishing the random projection ensemble model facilitates later analysis and gives a more accurate representation. Compared with existing conventional methods, the measured accuracy, sensitivity, positive predictive value and Matthews correlation coefficient are more stable and more accurate.

Description

Method for determining protein-protein interactions based on random projection ensemble classification
Technical Field
The invention relates to the field of biology, and in particular to a method for determining protein-protein interactions based on random projection ensemble classification.
Background
Protein-protein interactions (PPIs) arise from interactions in time and space. They are the basis on which proteins carry out their functions and are key to studying the life activities of cells. Although the prior art discloses biotechnological methods for obtaining PPI data from organisms, these methods suffer from low efficiency, high cost and a high false-positive rate, and they do not meet current technical needs. An efficient, low-cost method for determining PPIs is urgently needed.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a method for determining protein-protein interactions based on random projection ensemble classification that offers high efficiency, low cost and a high positive rate.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A method for determining protein-protein interactions based on random projection ensemble classification comprises the following steps:
A. Protein data screening
Protein interaction pairs are screened from the protein database DIP.
B. Substitution matrix representation
Using the BLOSUM62 matrix, a protein sequence of length N yields an N × 20 matrix. The substitution matrix representation (SMR) is defined as
$$SMR(i, j) = B(P(i), j), \quad i = 1, \ldots, N, \quad j = 1, \ldots, 20,$$
where P is the protein sequence of N amino acids, P(i) is the amino acid at position i, and B(P(i), j) is the BLOSUM62 score expressing how likely the amino acid at position i is to mutate to amino acid j.
C. Discrete cosine transform
The discrete cosine transform (DCT) formula is as follows:
$$DCT(i, j) = k_i k_j \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} Sig(m, n) \cos\frac{\pi (2m+1) i}{2M} \cdot \cos\frac{\pi (2n+1) j}{2N}, \quad 0 \le i \le M-1, \quad 0 \le j \le N-1,$$
where Sig is the input signal matrix and $k_i$ and $k_j$ are normalization coefficients.
D. Establishing the random projection ensemble model
An n × d original matrix $X_{n \times d}$ is selected and mapped to a low-dimensional n × k matrix. The random projection can be expressed as the product of $X_{n \times d}$ and a d × k random matrix, where k ≤ d.
The method further comprises step E, model determination:
The accuracy (Acc), the sensitivity (Sen), the positive predictive value (PE) and the Matthews correlation coefficient (MCC) are determined.
In step A, protein interaction pairs are screened from the protein database DIP: first, proteins with fewer than 50 residues are removed; second, protein pairs with more than 40% sequence identity are removed; the remaining protein pairs are retained for use.
The remaining protein pairs are used as the positive data set, and, following step A, the same number of protein pairs are selected from different cells as the negative data set.
The invention has the beneficial effects that:
The invention obtains a prediction model of protein interactions from the classification features of protein interactions and uses that model to detect disease-related protein interactions. This solves the problem that the association between protein interactions and disease could not be predicted. The method is suitable for screening animal and strain protein pairs. Establishing the random projection ensemble model through substitution matrix representation and the discrete cosine transform facilitates later analysis and yields a more accurate representation. Compared with conventional methods, the measured accuracy, sensitivity, positive predictive value and Matthews correlation coefficient are more stable and more accurate.
Drawings
FIG. 1 is a flow chart of the measurement of the present invention.
Detailed Description
A method for determining protein-protein interactions based on random projection ensemble classification is characterized by comprising the following steps:
A. Protein data screening
Protein interaction pairs are screened from the protein database DIP.
B. Substitution matrix representation
Using the BLOSUM62 matrix, a protein sequence of length N yields an N × 20 matrix. The substitution matrix representation (SMR) is defined as
$$SMR(i, j) = B(P(i), j), \quad i = 1, \ldots, N, \quad j = 1, \ldots, 20,$$
where P is the protein sequence of N amino acids, P(i) is the amino acid at position i, and B(P(i), j) is the BLOSUM62 score expressing how likely the amino acid at position i is to mutate to amino acid j.
C. Discrete cosine transform
The discrete cosine transform (DCT) formula is as follows:
$$DCT(i, j) = k_i k_j \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} Sig(m, n) \cos\frac{\pi (2m+1) i}{2M} \cdot \cos\frac{\pi (2n+1) j}{2N}, \quad 0 \le i \le M-1, \quad 0 \le j \le N-1,$$
where Sig is the input signal matrix and $k_i$ and $k_j$ are normalization coefficients.
D. Establishing the random projection ensemble model
An n × d original matrix $X_{n \times d}$ is selected and mapped to a low-dimensional n × k matrix. The random projection can be expressed as the product of $X_{n \times d}$ and a d × k random matrix, where k ≤ d.
E. Model determination
The accuracy (Acc), the sensitivity (Sen), the positive predictive value (PE) and the Matthews correlation coefficient (MCC) are determined.
In step A, protein interaction pairs are screened from the protein database DIP: first, proteins with fewer than 50 residues are removed; second, protein pairs with more than 40% sequence identity are removed; the remaining protein pairs are retained for use. The remaining protein pairs are used as the positive data set, and, following step A, the same number of protein pairs are selected from different cells as the negative data set.
Specific proteins are selected for the determination; the specific scheme is as follows:
To construct a standard data set, the 5594 selected protein pairs were used as the positive data set, and 5594 non-interacting protein pairs were selected from different subcellular locations to form the negative data set. The data set used for the experiment therefore consists of 11188 protein pairs, of which 50% are positive samples and 50% are negative samples.
To verify whether the method is applicable to other types of protein pairs, two further data sets were constructed. The first is a human protein data set. The data were collected from the Human Protein Reference Database (HPRD); human protein pairs with more than 25% sequence identity were likewise removed, and screening yielded 3899 interacting protein pairs, which were used as the positive data set. On the principle that proteins from different subcellular locations in an organism cannot interact with each other, 4262 non-interacting human protein pairs from different subcellular locations were selected as the negative data set. The final human data set thus consists of 8161 protein pairs. The second data set consists of the 2916 Helicobacter pylori protein pairs described by Martin et al., comprising 1458 interacting pairs and 1458 non-interacting pairs.
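The length filter of step A and the assembly of a balanced, labeled data set can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the function names, the dictionary of sequences and the 1/0 labels are assumptions, a pair is assumed to be dropped when either protein is too short, and the sequence-identity filter is omitted because it needs an alignment or clustering tool that the text does not name.

```python
def screen_pairs(pairs, sequences, min_length=50):
    """Drop every pair in which either protein has fewer than min_length residues.

    pairs     -- list of (protein_id_a, protein_id_b) tuples (hypothetical format)
    sequences -- dict mapping protein id to its amino acid sequence
    """
    return [(a, b) for a, b in pairs
            if len(sequences[a]) >= min_length and len(sequences[b]) >= min_length]


def build_dataset(positive_pairs, negative_pairs):
    """Attach labels: 1 for interacting pairs, 0 for non-interacting pairs (assumed coding)."""
    return ([(pair, 1) for pair in positive_pairs]
            + [(pair, 0) for pair in negative_pairs])
```

With 5594 positive and 5594 negative pairs, `build_dataset` would return the 11188 labeled pairs described above.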
B. Substitution matrix representation
Using the BLOSUM62 matrix, a protein sequence of length N yields an N × 20 matrix. The substitution matrix representation (SMR) is defined as
$$SMR(i, j) = B(P(i), j), \quad i = 1, \ldots, N, \quad j = 1, \ldots, 20,$$
where P is the protein sequence of N amino acids, P(i) is the amino acid at position i, and B(P(i), j) is the BLOSUM62 score expressing how likely the amino acid at position i is to mutate to amino acid j.
The substitution matrix method based on the BLOSUM62 matrix is illustrated with the protein sequence "MNEDIEAYFERIGYKNSRNKL" as an example. The following table is the BLOSUM62 matrix.
TABLE 1 BLOSUM62 matrix table
Based on the table above, the BLOSUM62 matrix replaces amino acid M of the sequence with the vector "-1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5 1 2 -2 0 -1 -1" and amino acid N with "-3 1 0 -2 -2 0 6 1 0 0 -1 0 0 -2 -3 -3 -3 -3 -2 -4". The substitution vectors of the remaining residues EDIEAYFERIGYKNSRNKL are obtained in the same way, which gives the substitution matrix of the whole sequence shown in the following table. Each row represents one amino acid, so the 21 amino acids are represented as a 21 × 20 matrix.
Table 2 is a protein sequence substitution matrix table
-1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5 1 2 -2 0 -1 -1
-3 1 0 -2 -2 0 6 1 0 0 -1 0 0 -2 -3 -3 -3 -3 -2 -4
-4 0 0 -1 -1 -2 0 2 5 2 0 0 1 -2 -3 -3 -3 -3 -2 -3
-3 0 1 -1 -2 -1 1 6 2 0 -1 -2 -1 -3 -3 -4 -3 -3 -3 -4
-1 -2 -2 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4 2 1 0 -1 -3
-4 0 0 -1 -1 -2 0 2 5 2 0 0 1 -2 -3 -3 -3 -3 -2 -3
0 1 -1 -1 4 0 -1 -2 -1 -1 -2 -1 -1 -1 -1 -1 -2 -2 -2 -3
-2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7 2
-2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3 0 0 0 -1 6 3 1
-4 0 0 -1 -1 -2 0 2 5 2 0 0 1 -2 -3 -3 -3 -3 -2 -3
-3 -1 -1 -2 -1 -2 0 -2 0 1 0 5 2 -1 -3 -2 -3 -3 -2 -3
-1 -2 -2 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4 2 1 0 -1 -3
-3 0 1 -2 0 6 -2 -1 -2 -2 -2 -2 -2 -3 -4 -4 0 -3 -3 -2
-2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7 2
-3 0 0 -1 -1 -2 0 -1 1 1 -1 2 5 -1 -3 -2 -3 -3 -2 -3
-3 1 0 -2 -2 0 6 1 0 0 -1 0 0 -2 -3 -3 -3 -3 -2 -4
-1 4 1 -1 1 0 1 0 0 0 -1 -1 0 -1 -2 -2 -2 -2 -2 -3
-3 -1 -1 -2 -1 -2 0 -2 0 1 0 5 2 -1 -3 -2 -3 -3 -2 -3
-3 1 0 -2 -2 0 6 1 0 0 -1 0 0 -2 -3 -3 -3 -3 -2 -4
-3 0 0 -1 -1 -2 0 -1 1 1 -1 2 5 -1 -3 -2 -3 -3 -2 -3
-1 -2 -2 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2 2 2 4 3 0 -1 -2
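The row-by-row construction of the substitution matrix can be sketched in Python. The sketch below is illustrative only: the dictionary holds just the rows that appear in Table 2 (one per distinct residue of the example sequence), copied as printed there, and the names `TABLE2_ROWS` and `substitution_matrix_representation` are hypothetical. A full implementation would look rows up in the complete BLOSUM62 table instead.

```python
# Substitution vectors copied from Table 2, keyed by residue.
# Column order follows Table 1 of the document (not reproduced here).
TABLE2_ROWS = {
    "M": [-1, -1, -1, -2, -1, -3, -2, -3, -2, 0, -2, -1, -1, 5, 1, 2, -2, 0, -1, -1],
    "N": [-3, 1, 0, -2, -2, 0, 6, 1, 0, 0, -1, 0, 0, -2, -3, -3, -3, -3, -2, -4],
    "E": [-4, 0, 0, -1, -1, -2, 0, 2, 5, 2, 0, 0, 1, -2, -3, -3, -3, -3, -2, -3],
    "D": [-3, 0, 1, -1, -2, -1, 1, 6, 2, 0, -1, -2, -1, -3, -3, -4, -3, -3, -3, -4],
    "I": [-1, -2, -2, -3, -1, -4, -3, -3, -3, -3, -3, -3, -3, 1, 4, 2, 1, 0, -1, -3],
    "A": [0, 1, -1, -1, 4, 0, -1, -2, -1, -1, -2, -1, -1, -1, -1, -1, -2, -2, -2, -3],
    "Y": [-2, -2, -2, -3, -2, -3, -2, -3, -2, -1, 2, -2, -2, -1, -1, -1, -1, 3, 7, 2],
    "F": [-2, -2, -2, -4, -2, -3, -3, -3, -3, -3, -1, -3, -3, 0, 0, 0, -1, 6, 3, 1],
    "R": [-3, -1, -1, -2, -1, -2, 0, -2, 0, 1, 0, 5, 2, -1, -3, -2, -3, -3, -2, -3],
    "G": [-3, 0, 1, -2, 0, 6, -2, -1, -2, -2, -2, -2, -2, -3, -4, -4, 0, -3, -3, -2],
    "K": [-3, 0, 0, -1, -1, -2, 0, -1, 1, 1, -1, 2, 5, -1, -3, -2, -3, -3, -2, -3],
    "S": [-1, 4, 1, -1, 1, 0, 1, 0, 0, 0, -1, -1, 0, -1, -2, -2, -2, -2, -2, -3],
    "L": [-1, -2, -2, -3, -1, -4, -3, -4, -3, -2, -3, -2, -2, 2, 2, 4, 3, 0, -1, -2],
}


def substitution_matrix_representation(sequence, rows=TABLE2_ROWS):
    """Return the N x 20 SMR: row i is the substitution vector of residue P(i)."""
    return [list(rows[residue]) for residue in sequence]


smr = substitution_matrix_representation("MNEDIEAYFERIGYKNSRNKL")
print(len(smr), len(smr[0]))  # 21 20
```

Because the lookup depends only on the residue, repeated residues (for example the three occurrences of E) produce identical rows, as in Table 2.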
C. Discrete cosine transform
The discrete cosine transform (DCT) is a transform defined on a signal; its result is the signal in the frequency domain. The DCT has a very important property, energy compaction: most of the energy of the signal is concentrated in the low-frequency part after the transform. The DCT is therefore widely used in information transformation.
The discrete cosine transform is calculated as follows:
$$DCT(i, j) = k_i k_j \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} Sig(m, n) \cos\frac{\pi (2m+1) i}{2M} \cdot \cos\frac{\pi (2n+1) j}{2N}, \quad 0 \le i \le M-1, \quad 0 \le j \le N-1,$$
where $k_i$ and $k_j$ are normalization coefficients.
Different proteins may contain amino acid sequences of different lengths. To obtain feature vectors of uniform length for different proteins, we apply the discrete cosine transform to the substitution matrix. Here $Sig \in R^{N \times M}$ is the input signal matrix, i.e. the substitution matrix above. Because the information after the DCT is concentrated in the upper-left corner of the matrix, the first 20 × 20 block, i.e. the first 400 coefficients of the DCT result, is taken as the protein feature matrix. The feature matrix of the protein sequence "MNEDIEAYFERIGYKNSRNKL" obtained by this method is shown in the following table.
Table 3 is a protein sequence feature matrix
Traversing the feature matrix row by row gives the feature vector of the protein: -20.84, -0.33, ..., -1.13, 2.11, ..., -6.47, -8.70, ..., 3.79, ..., -0.07.
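The transform and the selection of the first 400 coefficients can be sketched in pure Python, following the double-sum DCT formula of claim 1 (with index j in the second cosine). The text does not give the values of the normalization coefficients, so the sketch assumes the common orthonormal choice, sqrt(1/size) for index 0 and sqrt(2/size) otherwise. The function names are hypothetical, and the input is assumed to have at least 20 rows and 20 columns.

```python
import math


def dct2(sig):
    """2-D DCT-II of a matrix given as a list of rows (orthonormal scaling assumed)."""
    M, N = len(sig), len(sig[0])

    def k(index, size):
        # Assumed normalization coefficient, not stated in the document.
        return math.sqrt((1.0 if index == 0 else 2.0) / size)

    # Cosine tables for the row index (m -> i) and the column index (n -> j).
    cos_m = [[math.cos(math.pi * (2 * m + 1) * i / (2 * M)) for m in range(M)]
             for i in range(M)]
    cos_n = [[math.cos(math.pi * (2 * n + 1) * j / (2 * N)) for n in range(N)]
             for j in range(N)]
    # Sum over m first, then over n; together this is the double sum of the formula.
    tmp = [[sum(cos_m[i][m] * sig[m][n] for m in range(M)) for n in range(N)]
           for i in range(M)]
    return [[k(i, M) * k(j, N) * sum(tmp[i][n] * cos_n[j][n] for n in range(N))
             for j in range(N)] for i in range(M)]


def dct_feature_vector(matrix, size=20):
    """Row-wise traversal of the top-left size x size block of the DCT result."""
    coeffs = dct2(matrix)
    return [coeffs[i][j] for i in range(size) for j in range(size)]
```

Applied to the 21 × 20 substitution matrix of the example sequence, `dct_feature_vector` returns a list of 400 values regardless of the sequence length, which is what makes the features comparable across proteins.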
D. Establishing random projection integrated model
Random projection (RP) is a very effective dimensionality reduction technique that uses a random projection matrix to project high-dimensional data into a low-dimensional subspace. Ensemble learning combines several models to obtain a better result: the combined model generalizes better, and an ensemble usually outperforms a single classifier. The random projection ensemble algorithm combines the two. The high-dimensional data are projected several times to obtain low-dimensional data, and each low-dimensional data set is used to train a base classifier and classify samples. In the RP algorithm, the original d-dimensional data are projected into a k-dimensional subspace (k ≤ d) by a d × k random matrix A.
An n × d original matrix $X_{n \times d}$ is selected, and a d × k random matrix $A_{d \times k}$ is then chosen to obtain the mapped low-dimensional matrix. The random projection can be expressed as:
$$X_{n \times d} \mapsto X_{n \times d} A_{d \times k} \in R^{n \times k}.$$
The random projection ensemble algorithm is illustrated with a specific case. The training data set is initialized as a 50 × 100 matrix with a label vector, and the test data are a 100 × 100 matrix. The training set is projected, and an error estimation method is set during projection so that the projected data represent the original information with minimum error; in this example the leave-one-out (LOO) method is chosen. With a 100 × 10 projection matrix, a 50 × 10 low-dimensional training matrix and a 100 × 10 low-dimensional test matrix are obtained. Projection is performed 10 times to obtain several low-dimensional matrices, and a base classifier, such as the k-nearest-neighbor (KNN) classifier, classifies each sample on the basis of the training data and the label vector. Because the matrices are projected 10 times, each sample in the training set and the test set receives 10 classification results.
A label threshold is then set according to the classification results of the training set and the label vector: if the average of a sample's labels exceeds the threshold, the sample belongs to label 2; otherwise it belongs to label 1. In this case, the average of the 10 classification results of each sample is computed from the 50 × 10 classification results of the 50 training samples. The averages of the label-2 samples are mostly greater than 1.6 and those of the label-1 samples are mostly less than 1.6, so the threshold is set to 1.6. If the average of the 10 classification results of a test sample is greater than 1.6, the sample is assigned label 2; otherwise it is assigned label 1.
Table 4 is a partial 50 × 100 matrix training data set
TABLE 5 training data set labels
Table 6 is a partial 100 by 100 matrix test data set
Table 7 shows the classification results after projection
Table 8 shows the final classification results
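The case above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the projection matrix is filled with standard normal entries, the base classifier is a plain 3-nearest-neighbor vote, the leave-one-out error estimation and the data-driven choice of the threshold are left out, and the threshold of 1.6 from the example is simply passed in as a default. All function names are hypothetical.

```python
import random


def random_matrix(d, k, rng):
    """d x k projection matrix with standard normal entries (an assumed choice)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]


def project(X, A):
    """Compute X (n x d) times A (d x k), giving an n x k matrix."""
    k = len(A[0])
    return [[sum(x * a_row[j] for x, a_row in zip(row, A)) for j in range(k)]
            for row in X]


def knn_predict(train, labels, sample, n_neighbors=3):
    """Majority label among the n_neighbors nearest training rows (Euclidean distance)."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], sample)))
    votes = [labels[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)


def rp_ensemble_predict(X_train, y_train, X_test, k=10, n_projections=10,
                        threshold=1.6, seed=0):
    """Average the KNN labels (1 or 2) over several projections, then apply the threshold."""
    rng = random.Random(seed)
    totals = [0.0] * len(X_test)
    for _ in range(n_projections):
        A = random_matrix(len(X_train[0]), k, rng)
        low_train, low_test = project(X_train, A), project(X_test, A)
        for idx, sample in enumerate(low_test):
            totals[idx] += knn_predict(low_train, y_train, sample)
    return [2 if total / n_projections > threshold else 1 for total in totals]
```

With a 50 × 100 training matrix, a 100 × 100 test matrix and the defaults above, each projection yields 50 × 10 and 100 × 10 matrices, and every test sample receives 10 labels whose average is compared with 1.6, as in the example.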
E. Model determination
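The proposed method is evaluated with the following measures: accuracy (Acc), sensitivity (Sen), positive predictive value (PE) and the Matthews correlation coefficient (MCC):
$$Acc = \frac{TP + TN}{TP + FP + TN + FN}$$
$$Sen = \frac{TP}{TP + FN}$$
$$PE = \frac{TP}{TP + FP}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}}$$
where TP, FP, FN and TN denote the numbers of true positive, false positive, false negative and true negative samples, respectively, as shown in Table 9.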
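The four measures can be computed directly from the confusion-matrix counts, using the formulas given in claim 2. The sketch below is a minimal illustration; the function name `evaluate` is hypothetical, and no guard is included for the degenerate case in which a denominator is zero.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Return (Acc, Sen, PE, MCC) from the four confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    sen = tp / (tp + fn)                    # sensitivity
    pe = tp / (tp + fp)                     # positive predictive value
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom       # Matthews correlation coefficient
    return acc, sen, pe, mcc
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, so it stays informative even when the positive and negative classes are not balanced.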
Experiments were performed on the three data sets using the proposed method. To avoid overfitting and to assess stability, five-fold cross-validation was used: each data set was divided into 5 parts, of which 4/5 served as training samples and the remaining 1/5 as test samples. Five sets of experiments were performed per data set, with the following results:
TABLE 10 results of yeast data set experiments under five-fold cross-validation
TABLE 11 results of human data set experiments under five-fold cross-validation
TABLE 12 results of H. pylori data set experiments under five-fold cross-validation
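The five-fold split described above can be sketched as follows. The shuffling, the seed and the function name `five_fold_splits` are assumptions made for illustration; the text only states that each data set is divided into 5 parts, with 4/5 used for training and 1/5 for testing in each of the 5 experiments.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of a shuffled index list."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every n_folds-th index starting at f, so fold sizes differ by at most 1.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, folds[f]
```

For the 11188-pair yeast data set this gives test folds of 2238 or 2237 pairs, and every pair is used for testing exactly once across the five experiments.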
Comparison with SVM classifier
To evaluate the proposed method, we compared it with the mainstream support vector machine (SVM) classifier, using the LIBSVM toolbox. The parameters c and g were optimized by grid search: c = 0.5 and g = 0.6 on the yeast proteins, and c = 0.5 and g = 0.5 on the human proteins. In the Helicobacter pylori experiment, the RBF kernel function was used with c set to 0.08 and g set to 22. In the experimental results below, RPEC denotes the method proposed here, the random projection ensemble algorithm. The comparison shows that the proposed method performs better than the SVM method.
Table 13 shows the results of the SVM comparisons
Comparison with other methods
We collected the results of other prediction methods and compared them with the proposed method, as shown in Tables 14 and 15. Taking yeast protein prediction as an example, the accuracy of the other methods is generally 86.15% to 94.72%. In Table 15, we compare other ensemble algorithms with the proposed ensemble algorithm, and the experimental data show that the proposed method is superior on most of the reference indexes. The abbreviations of the conventional methods in the tables are as follows: AC (Auto Covariance), auto-covariance feature extraction; RoF (Rotation Forest Classifier), rotation forest classification; LDA (Linear Discriminant Analysis), linear discriminant analysis; RF (Random Forest Classifier), random forest classification; LD (Local Protein Sequence Descriptors), local protein sequence description; ACC (Auto Covariance), auto-covariance feature extraction; KNN (K-Nearest Neighbor Classifier), k-nearest-neighbor classification; PR-LPQ (Physicochemical Property Response Matrix combined with Local Phase Quantization Descriptor), physicochemical property response matrix combined with a local quantization descriptor; MAC, an auto-covariance-based feature extraction method; MCD, multi-scale continuous and discontinuous description; MLD, multi-scale local description; CT (Cosine Transform), cosine transform.
TABLE 14 comparison of different prediction methods for yeast proteins
TABLE 15 comparison of different prediction methods for human proteins

Claims (4)

1. A method for determining protein-protein interactions based on random projection ensemble classification, characterized by comprising the following steps:
A. Protein data screening
Protein interaction pairs are screened from the protein database DIP;
B. Substitution matrix representation
Using the BLOSUM62 matrix, a protein sequence of length N yields an N × 20 matrix, and the substitution matrix representation (SMR) is defined as
$$SMR(i, j) = B(P(i), j), \quad i = 1, \ldots, N, \quad j = 1, \ldots, 20,$$
where P is the protein sequence of N amino acids, P(i) is the amino acid at position i, and B(P(i), j) is the BLOSUM62 score expressing how likely the amino acid at position i is to mutate to amino acid j;
C. Discrete cosine transform
The discrete cosine transform (DCT) formula is as follows:
$$DCT(i, j) = k_i k_j \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} Sig(m, n) \cos\frac{\pi (2m+1) i}{2M} \cdot \cos\frac{\pi (2n+1) j}{2N}, \quad 0 \le i \le M-1, \quad 0 \le j \le N-1,$$
where Sig is the input signal matrix and $k_i$ and $k_j$ are normalization coefficients;
D. Establishing the random projection ensemble model
An n × d original matrix $X_{n \times d}$ is selected and mapped to a low-dimensional n × k matrix; the random projection can be expressed as the product of $X_{n \times d}$ and a d × k random matrix, where k ≤ d.
2. The method for determining protein-protein interactions based on random projection ensemble classification according to claim 1, further comprising step E, model determination:
determining the accuracy (Acc), the sensitivity (Sen), the positive predictive value (PE) and the Matthews correlation coefficient (MCC):
$$Acc = \frac{TP + TN}{TP + FP + TN + FN}$$
$$Sen = \frac{TP}{TP + FN}$$
$$PE = \frac{TP}{TP + FP}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}}$$
3. The method according to claim 1, wherein in step A protein interaction pairs are screened from the protein database DIP, proteins with fewer than 50 residues are removed, protein pairs with more than 40% sequence identity are removed, and the remaining protein pairs are retained for use.
4. The method according to claim 3, wherein the remaining protein pairs are used as the positive data set, and, following step A, the same number of protein pairs are selected from different cells as the negative data set.
CN201710653339.5A 2017-08-02 2017-08-02 A kind of protein-protein interaction assay method based on accidental projection Ensemble classifier Pending CN107607723A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180119
