CN108804870B - Markov random walk-based key protein identification method - Google Patents

Info

Publication number: CN108804870B
Authority: CN (China)
Legal status: Active
Application number: CN201810499870.6A
Other languages: Chinese (zh)
Other versions: CN108804870A
Inventors: 刘维, 马良玉, 陈昕
Current assignee: Yangzhou University
Original assignee: Yangzhou University
Priority date: 2018-05-23
Filing date: 2018-05-23
Publication date: 2018-11-13 (CN108804870A); grant published 2021-11-19 (CN108804870B)

Abstract

The invention provides a key protein identification method based on Markov random walk and belongs to the technical field of bioinformatics. Following the idea of Markov random walk, the method assigns each vertex a score representing its importance, and the scores of all vertices form an n-dimensional score vector. Given an initial value, the scores walk through the network with certain probabilities and are modified as they are passed along. Finally, the scores are sorted in descending order, and the k proteins with the largest scores are output as the final result. By integrating biological attributes with topological characteristics, the invention improves both the accuracy of key protein identification and the prediction efficiency.

Description

Markov random walk-based key protein identification method
Technical Field
The invention belongs to the technical field of bioinformatics. It mainly relates to identifying key proteins in protein-protein interaction (PPI) networks with a Markov random walk algorithm, and in particular to a method that identifies key proteins in a PPI network from both network topology information and the biological attributes of proteins.
Background
Proteins are indispensable to life and take part in almost every stage of cellular activity. Key proteins play an essential role in these processes: an organism cannot survive without them. Identifying key proteins in PPI networks therefore not only helps in understanding how cell growth is regulated but also supports research on the mechanisms of biological evolution. In biomedicine, key protein identification is also important for disease treatment and drug target design.
Before the present invention, key proteins were first identified from topological features of the network, for example with degree centrality (DC), betweenness centrality (BC) and local average connectivity (LAC). Li et al. fused PPI and gene expression data and proposed the centrality measure PeC, and Zhang et al. fused PPI topological features with gene co-expression information and proposed the CoEWC method. These methods have the following shortcomings in identifying key proteins: (1) they consider only the topological characteristics of the network and ignore the intrinsic biological characteristics of proteins; (2) PPI networks obtained from biological experiments are noisy, so the interaction data contain false positives.
Disclosure of Invention
The invention aims to overcome these shortcomings by providing a key protein identification method based on Markov random walk. Following the idea of Markov random walk, the method assigns each vertex a score representing its importance, and the scores of all vertices form an n-dimensional score vector. Given an initial value, the scores walk through the network with certain probabilities and are modified as they are passed along. Finally, the scores are sorted in descending order, and the k proteins with the largest scores are output as the final result.
The Markov random walk-based key protein identification method mainly comprises the following steps:
(1) input a PPI network and biological information;
(2) calculate the weights between protein vertices from the vertex attribute values and edge weights, and construct a weight matrix;
(3) normalize all attribute values and construct an attribute matrix;
(4) construct a transition matrix from the interaction relations between protein vertices;
(5) obtain the score vector r iteratively following the PageRank algorithm, with the return probability $P_0$ determined by the vertex attributes;
(6) obtain an objective function, optimize it, and iteratively update r and q from their initial values by gradient descent;
(7) sort the entries of the converged score vector $r^{(t)}=(r_1,r_2,\dots,r_n)$ in descending order; the proteins corresponding to the k largest values are the key proteins.
In step (2), the weights between proteins are obtained from the PPI network by combining the common neighbor similarity, gene expression similarity and GO semantic similarity of each protein pair.
In step (3), all attribute values are brought into the range (0,1) by Z-score or min-max normalization, and the attribute vectors of all vertices form the attribute matrix.
The method considers not only the topological characteristics of the protein interaction network but also the biological attributes of the proteins, which reduces the negative influence of the high noise level in PPI data. Fusing biological attributes with topological characteristics improves the accuracy of key protein identification and the prediction efficiency, and extends the application range and practicality of the technique in bioinformatics.
Drawings
FIG. 1 is a schematic flow chart of the Markov random walk-based key protein identification method of the present invention;
FIG. 2a is a graph comparing the number of key proteins among the top 100 proteins identified by the present invention and by other methods;
FIG. 2b is a graph comparing the number of key proteins among the top 200 proteins identified by the present invention and by other methods;
FIG. 2c is a graph comparing the number of key proteins among the top 300 proteins identified by the present invention and by other methods;
FIG. 2d is a graph comparing the number of key proteins among the top 400 proteins identified by the present invention and by other methods;
FIG. 2e is a graph comparing the number of key proteins among the top 500 proteins identified by the present invention and by other methods;
FIG. 2f is a graph comparing the number of key proteins among the top 600 proteins identified by the present invention and by other methods;
FIG. 3 is a graph comparing the statistical indicators of the present invention with those of other methods.
Detailed Description
The technical idea of the invention is as follows. Biological attributes and topological characteristics are combined using the idea of Markov random walk: each vertex is assigned a score representing its importance, the scores of all vertices form an n-dimensional score vector, and, given an initial value, the scores walk through the network with certain probabilities and are modified as they are passed along. First, the weights between proteins are obtained from common neighbor similarity, gene expression similarity and GO semantic similarity, giving the weight matrix, and the attribute vectors of all vertices form the attribute matrix. Second, the transition probabilities between vertex pairs are obtained from the protein interaction relations, giving the transition matrix. Finally, an objective function is constructed and optimized, and the key proteins are identified. Fusing biological attributes with topological characteristics also helps in understanding the functions of unknown proteins, is important for explaining the molecular mechanisms of specific functions, and can provide a theoretical basis for applications such as drug target design. The method is therefore well suited to detecting key proteins.
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the Markov random walk-based key protein identification method comprises the following steps:
Step 1: input the PPI network and biological information;
Step 2: calculate the weights $w_{ij}$ between protein vertices from the vertex attribute values and edge weights, and construct the weight matrix;
Common neighbor similarity (NTE): network topology plays an irreplaceable role in identifying key proteins. According to the centrality-lethality rule, key proteins are more likely to occur in clusters in the network, so common neighbor similarity (NTE) is used as one indicator of protein criticality. In an undirected graph G(V, E), the common neighbor similarity between proteins u and v is:
$$NTE(u,v)=|C_u\cap C_v|+1 \qquad (1)$$
where $C_u$ (or $C_v$) is the set of neighbors of node u (or v) in the PPI network, and $|C_u\cap C_v|$ is the number of common neighbors of u and v, i.e., the number of triangles containing the edge (u, v). Adding 1 makes every value strictly positive, which avoids problems during normalization.
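A minimal Python sketch of formula (1), assuming the PPI network is stored as a dictionary that maps each protein to the set of its neighbors. The function names `nte` and `build_adjacency` and the toy network are illustrative only, not part of the original method.

```python
def nte(adjacency, u, v):
    """Common neighbor similarity, formula (1): |C_u ∩ C_v| + 1.

    adjacency maps every protein to the set of its neighbors in an
    undirected PPI network.
    """
    common = adjacency[u] & adjacency[v]  # C_u ∩ C_v
    return len(common) + 1                # +1 keeps every value strictly positive


def build_adjacency(edges):
    """Turn an undirected edge list into a neighbor-set dictionary."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical toy network: A-B, A-C, B-C, B-D
adjacency = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")])
print(nte(adjacency, "A", "B"))  # A and B share the neighbor C -> 2
```

For an edge (u, v), every shared neighbor closes exactly one triangle with that edge, which is why the count equals the number of triangles mentioned above.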
Gene expression similarity (GES): gene expression data are easy to obtain and widely used for key protein identification, and proteins encoded by co-expressed genes are more likely to be key proteins, so gene expression similarity (GES) is used as another indicator. The gene expression similarity between proteins u and v is calculated as follows:
$$GES(u,v)=\frac{1}{s-1}\sum_{i=1}^{s}\left(\frac{U_i-\bar{U}}{\sigma(U)}\right)\left(\frac{V_i-\bar{V}}{\sigma(V)}\right) \qquad (2)$$
where s is the number of samples in the gene expression data, U and V are the genes encoding proteins u and v, $U_i$ and $V_i$ are the expression levels of genes U and V in sample i, $\bar{U}$ and $\bar{V}$ are the mean expression levels of genes U and V, and $\sigma(U)$ and $\sigma(V)$ are the standard deviations of the expression levels of genes U and V.
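The sketch below computes the gene expression similarity as a Pearson correlation built from sample standard deviations (divisor s - 1), matching the 1/(s - 1) form of formula (2). Returning 0 for a constant expression profile is an assumption added here so that the function never divides by zero; the profiles in the example are hypothetical.

```python
import math


def ges(u_expr, v_expr):
    """Gene expression similarity, formula (2): Pearson correlation of two profiles."""
    s = len(u_expr)
    if s != len(v_expr) or s < 2:
        raise ValueError("both profiles need the same number s >= 2 of samples")
    mean_u = sum(u_expr) / s
    mean_v = sum(v_expr) / s
    # sample standard deviations, matching the 1/(s - 1) factor
    sd_u = math.sqrt(sum((x - mean_u) ** 2 for x in u_expr) / (s - 1))
    sd_v = math.sqrt(sum((y - mean_v) ** 2 for y in v_expr) / (s - 1))
    if sd_u == 0 or sd_v == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    total = sum(((x - mean_u) / sd_u) * ((y - mean_v) / sd_v)
                for x, y in zip(u_expr, v_expr))
    return total / (s - 1)


print(ges([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # perfectly co-expressed -> 1.0
```

The value lies in [-1, 1], so perfectly anti-correlated profiles give -1.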
GO semantic similarity (GOS): the Gene Ontology (GO) provides a standardized, precise vocabulary describing the molecular functions, biological processes and cellular components of genes and gene products. Measuring the semantic similarity of GO terms is an important GO application: it reveals the functional similarity of genes from their biological annotations, and two linked key proteins are more likely to participate in the same biological process. Many GO semantic similarity measures have been proposed in recent years; here the GO semantic similarity is calculated with Lin's method, which (1) normalizes by the sum of the information content of the two concepts being compared and (2) assumes that the two concepts are independent. The GO semantic similarity between proteins u and v is defined as follows:
$$GOS(u,v)=\frac{2\log P_{ms}(c_1,c_2)}{\log P(c_1)+\log P(c_2)} \qquad (3)$$
$$P_{ms}(c_1,c_2)=\min_{c\in S(c_1,c_2)}P(c) \qquad (4)$$
where genes U and V encode the interacting proteins u and v, $c_1$ and $c_2$ are GO terms of genes U and V respectively, $S(c_1,c_2)$ is the set of nearest common ancestors of $c_1$ and $c_2$, $P(c)$ is the probability of occurrence of term c, and $P_{ms}$ is the probability of occurrence of their nearest common ancestor.
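A sketch of Lin's similarity for a pair of GO terms, assuming the term probabilities P(c) and the ancestor sets (each including the term itself) have already been computed from a GO annotation corpus. The toy terms and probabilities below are invented for illustration. Taking the minimum probability over all shared ancestors selects the most specific, most informative common ancestor, which is how $P_{ms}$ is usually obtained in Lin's measure.

```python
import math


def lin_similarity(c1, c2, prob, ancestors):
    """Lin semantic similarity of two GO terms, formulas (3)-(4).

    prob[c]      -- occurrence probability P(c) of term c
    ancestors[c] -- ancestors of c in the GO DAG, including c itself
    """
    shared = ancestors[c1] & ancestors[c2]       # common ancestors of c1 and c2
    if not shared:
        return 0.0
    p_ms = min(prob[c] for c in shared)          # formula (4)
    denominator = math.log(prob[c1]) + math.log(prob[c2])
    if denominator == 0:                         # both terms have P = 1 (the root)
        return 1.0
    return 2 * math.log(p_ms) / denominator      # formula (3)


# Hypothetical toy ontology: root -> binding -> {protein binding, DNA binding}
prob = {"root": 1.0, "binding": 0.5, "protein_binding": 0.2, "dna_binding": 0.1}
ancestors = {
    "root": {"root"},
    "binding": {"binding", "root"},
    "protein_binding": {"protein_binding", "binding", "root"},
    "dna_binding": {"dna_binding", "binding", "root"},
}
print(round(lin_similarity("protein_binding", "dna_binding", prob, ancestors), 3))  # -> 0.354
```

Proteins usually carry several GO terms; the text does not say how term-level scores are combined into GOS(u, v), so that aggregation step is left out of the sketch.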
The weight $w_{ij}$ between proteins $v_i$ and $v_j$ in the PPI network is obtained by combining these three similarities:
$$w_{ij}=a_1NTE(i,j)+a_2GES(i,j)+a_3GOS(i,j) \qquad (5)$$
where the parameters $a_1$, $a_2$, $a_3$ lie in (0,1) and sum to 1.
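The matrix $W=[w_{ij}]$ is the weight matrix of the PPI network, where $w_{ij}$ is the weight of edge $(v_i,v_j)$:
$$w_{ij}=\begin{cases}a_1NTE(i,j)+a_2GES(i,j)+a_3GOS(i,j), & (v_i,v_j)\in E\\ 0, & \text{otherwise}\end{cases} \qquad (6)$$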
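A sketch of how formulas (5) and (6) assemble the weight matrix. The three similarity measures are passed in as callables, and the equal default coefficients are an assumption, since the text only requires $a_1$, $a_2$, $a_3$ in (0, 1) with a sum of 1. The constant similarities in the example are hypothetical and only show the arithmetic.

```python
def weight_matrix(proteins, edges, nte, ges, gos, coeffs=(1 / 3, 1 / 3, 1 / 3)):
    """Weight matrix W of formulas (5)-(6); non-adjacent pairs get weight 0."""
    a1, a2, a3 = coeffs
    if not all(0 < a < 1 for a in coeffs) or abs(a1 + a2 + a3 - 1) > 1e-9:
        raise ValueError("a1, a2, a3 must lie in (0, 1) and sum to 1")
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    w = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        weight = a1 * nte(u, v) + a2 * ges(u, v) + a3 * gos(u, v)  # formula (5)
        i, j = index[u], index[v]
        w[i][j] = w[j][i] = weight  # undirected network, so W is symmetric
    return w


# Hypothetical constant similarities, just to show the arithmetic
w = weight_matrix(["A", "B", "C"], [("A", "B"), ("B", "C")],
                  nte=lambda u, v: 2, ges=lambda u, v: 0.5, gos=lambda u, v: 0.5,
                  coeffs=(0.5, 0.25, 0.25))
print(w[0][1])  # 0.5*2 + 0.25*0.5 + 0.25*0.5 = 1.25
```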
Step 3: normalize all attribute values and construct the attribute matrix;
All attribute values are normalized (Z-score or min-max normalization can bring them all into the range (0,1)), and the attribute vectors of all vertices form the attribute matrix $B=[b_{ij}]_{n\times m}$.
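A sketch of step 3 using column-wise min-max normalization, one of the two options mentioned. Min-max scaling maps each attribute into the closed interval [0, 1]; reaching the open interval (0, 1) exactly would need a small offset, which the text does not specify. The raw attribute values below are hypothetical.

```python
def min_max_columns(raw):
    """Scale each attribute column of an n x m matrix into [0, 1]."""
    n, m = len(raw), len(raw[0])
    b = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [raw[i][j] for i in range(n)]
        lo, hi = min(column), max(column)
        for i in range(n):
            # a constant column carries no ranking information, so map it to 0
            b[i][j] = (raw[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    return b


B = min_max_columns([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
print(B)  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```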
Step 4: construct the transition matrix from the interaction relations between protein vertices;
Given a constant k < n, the k most important proteins in G, i.e., the Top-k, are to be found; they are called key proteins. Following the idea of Markov random walk, each vertex $v_i$ is assigned a score $r_i^{(0)}$ representing its importance.
The scores of all vertices form the score vector
$$r=(r_1,r_2,\dots,r_n)^T$$
which is an n × 1 column vector. Given an initial value of r, the scores walk through the network with certain probabilities and are modified as they are passed along. The probability of passing from $v_i$ to $v_j$ is defined as:
$$p_{ij}=\frac{w_{ij}}{\sum_{l=1}^{n}w_{il}} \qquad (7)$$
Thus, the transition probabilities between all vertex pairs form the n × n transition matrix $P=[p_{ij}]$.
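A sketch of the transition matrix of step 4, assuming formula (7) makes the probability of moving from $v_i$ to $v_j$ proportional to the edge weight, i.e., each row of W is divided by its sum, $p_{ij}=w_{ij}/\sum_l w_{il}$. Giving an isolated vertex (an all-zero row) an all-zero row of P is an assumption of this sketch.

```python
def transition_matrix(w):
    """Row-normalize a weight matrix: p_ij = w_ij / sum_l w_il."""
    p = []
    for row in w:
        total = sum(row)
        if total > 0:
            p.append([x / total for x in row])
        else:
            p.append([0.0] * len(row))  # isolated vertex: no outgoing transitions
    return p


P = transition_matrix([[0.0, 1.0, 3.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
print(P[0])  # [0.0, 0.25, 0.75]
```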
Step 5: obtain the score vector r by PageRank-style iteration, with the return probability $P_0$ determined by the vertex attributes;
In the conventional random-walk-based PageRank algorithm, the score vector r is updated iteratively as follows:
$$r^{(t+1)}=\alpha P^{T}r^{(t)}+(1-\alpha)P_0 \qquad (8)$$
where α ∈ (0,1) is a constant and $P_0$ ∈ (0,1) is the probability that the walker returns to its starting vertex. In the proposed algorithm, the return probability $P_0$ is determined by the vertex attributes $b_i$:
$$P_0=B\,q \qquad (9)$$
Here
$q=(q_1,q_2,\dots,q_m)^T$
is an m × 1 column vector and $q_j$ is the weight of the j-th attribute, so the iteration becomes:
$$r^{(t+1)}=\alpha P^{T}r^{(t)}+(1-\alpha)P_0=\alpha P^{T}r^{(t)}+(1-\alpha)B\,q^{(t)} \qquad (10)$$
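A sketch of the iteration in formula (10) with q held fixed. The damping value alpha = 0.85, the uniform starting vector and the convergence tolerance are assumptions borrowed from common PageRank practice; the text only requires alpha in (0, 1). Because every row of P sums to at most 1, each step shrinks the error by at least a factor alpha, so the loop converges. The network and attribute values are hypothetical.

```python
def random_walk_scores(p, b, q, alpha=0.85, max_iter=500, tol=1e-12):
    """Iterate r(t+1) = alpha * P^T r(t) + (1 - alpha) * B q, formula (10)."""
    n, m = len(b), len(q)
    p0 = [sum(b[i][j] * q[j] for j in range(m)) for i in range(n)]  # P0 = B q, formula (9)
    r = [1.0 / n] * n  # uniform initial scores (assumption)
    for _ in range(max_iter):
        # (P^T r)_i = sum_j p_ji * r_j
        new_r = [alpha * sum(p[j][i] * r[j] for j in range(n)) + (1 - alpha) * p0[i]
                 for i in range(n)]
        if max(abs(a - c) for a, c in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical triangle network with one attribute per protein
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
B = [[1.0], [0.5], [0.0]]
r = random_walk_scores(P, B, q=[1.0])
print([round(x, 3) for x in r])  # approximately [0.553, 0.5, 0.447]
```

The protein with the largest attribute value ends up with the highest score, while the walk term spreads part of that score to its neighbors.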
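The objective function J(r, q) is defined as the squared error between the right-hand side of formula (10) and $r^{(t+1)}$, where at convergence $r^{(t+1)}=r^{(t)}=r$:
$$J(r,q)=\left\|\alpha P^{T}r+(1-\alpha)B\,q-r\right\|_2^{2} \qquad (11)$$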
r and q are chosen to minimize J(r, q), i.e., the following optimization problem is solved:
$$\min_{r,\,q}\ J(r,q)\quad \text{s.t. } r>0,\ q>0 \qquad (12)$$
The constraints r > 0 and q > 0 mean that all entries of r and q are positive.
Step 6: obtain the objective function, optimize it, and iteratively update r and q from their initial values by gradient descent;
After the objective function is obtained, it is optimized. First, the partial derivatives of J with respect to r and q are calculated:
$$\nabla J(r,q)=\left(\frac{\partial J}{\partial r},\ \frac{\partial J}{\partial q}\right) \qquad (13)$$
From formula (11):
$$J(r,q)=e^{T}e,\qquad e=\alpha P^{T}r+(1-\alpha)B\,q-r \qquad (14)$$
From formula (13):
$$\frac{\partial J}{\partial r}=2(\alpha P-I)\,e \qquad (15)$$
$$\frac{\partial J}{\partial q}=2(1-\alpha)B^{T}e \qquad (16)$$
where I is the n × n identity matrix.
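Using these gradients and starting from the initial values $r^{(0)}$ and $q^{(0)}$, r and q are updated iteratively by gradient descent:
$$r^{(t+1)}=r^{(t)}-\rho\,\frac{\partial J}{\partial r}\Big|_{(r^{(t)},\,q^{(t)})} \qquad (17)$$
$$q^{(t+1)}=q^{(t)}-\rho\,\frac{\partial J}{\partial q}\Big|_{(r^{(t)},\,q^{(t)})} \qquad (18)$$
where ρ is the iteration step size.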
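A sketch of step 6, assuming the objective is the squared residual of formula (10) at a fixed point, J(r, q) = ||alpha P^T r + (1 - alpha) B q - r||^2, whose gradients are 2(alpha P - I)e and 2(1 - alpha)B^T e, where e is the residual. Here rho is treated as the gradient step size. Keeping r and q positive by clipping them at a small floor after every step is an assumption, because the text states the constraints r > 0, q > 0 without saying how they are enforced. The step size, floor, step count and toy data are illustrative.

```python
def residual(p, b, r, q, alpha):
    """e = alpha * P^T r + (1 - alpha) * B q - r."""
    n, m = len(r), len(q)
    return [alpha * sum(p[j][i] * r[j] for j in range(n))
            + (1 - alpha) * sum(b[i][k] * q[k] for k in range(m)) - r[i]
            for i in range(n)]


def objective(p, b, r, q, alpha):
    """J(r, q) = e^T e."""
    return sum(x * x for x in residual(p, b, r, q, alpha))


def gradients(p, b, r, q, alpha):
    """dJ/dr = 2 (alpha P - I) e and dJ/dq = 2 (1 - alpha) B^T e."""
    n, m = len(r), len(q)
    e = residual(p, b, r, q, alpha)
    grad_r = [2 * (alpha * sum(p[i][j] * e[j] for j in range(n)) - e[i]) for i in range(n)]
    grad_q = [2 * (1 - alpha) * sum(b[i][k] * e[i] for i in range(n)) for k in range(m)]
    return grad_r, grad_q


def fit(p, b, r0, q0, alpha=0.85, rho=0.1, steps=300, floor=1e-9):
    """Projected gradient descent on J, keeping every entry of r and q positive."""
    r, q = list(r0), list(q0)
    for _ in range(steps):
        grad_r, grad_q = gradients(p, b, r, q, alpha)
        r = [max(x - rho * g, floor) for x, g in zip(r, grad_r)]
        q = [max(x - rho * g, floor) for x, g in zip(q, grad_q)]
    return r, q


P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
B = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
r0, q0 = [1 / 3] * 3, [0.5, 0.5]
r, q = fit(P, B, r0, q0)
print(objective(P, B, r0, q0, 0.85), objective(P, B, r, q, 0.85))
```

Under this assumed J, every pair (r, q) that satisfies the fixed-point equation gives J = 0, so the starting values $r^{(0)}$, $q^{(0)}$ influence where the descent ends.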
Step 7: sort the entries of the converged score vector $r^{(t)}=(r_1,r_2,\dots,r_n)$ in descending order; the proteins corresponding to the k largest values are the key proteins.
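Step 7 amounts to a descending sort. A minimal sketch with hypothetical protein names and scores:

```python
def top_k(proteins, scores, k):
    """Return the k proteins with the largest scores, highest first."""
    ranked = sorted(zip(proteins, scores), key=lambda pair: pair[1], reverse=True)
    return [protein for protein, _ in ranked[:k]]


print(top_k(["P1", "P2", "P3", "P4"], [0.12, 0.41, 0.33, 0.08], k=2))  # ['P2', 'P3']
```

Python's `sorted` is stable, so proteins with equal scores keep their input order; the text does not specify a tie-breaking rule.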
Example:
To verify the performance of the proposed algorithm, EPM, we compared the number of key proteins it identifies with that of five other methods (DC, BC, LAC, PeC and CoEWC). For each method, the top 100, 200, 300, 400, 500 and 600 ranked proteins are taken as candidate sets, and each candidate set is intersected with the standard set of key proteins to obtain the number of true key proteins it contains.
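The candidate-set comparison just described can be sketched as follows, assuming the ranking is a list of protein identifiers in descending score order and the standard key protein set is a Python set. The tiny cutoffs in the example stand in for 100 to 600, and the proteins are hypothetical.

```python
def key_proteins_in_top(ranking, essential, cutoffs=(100, 200, 300, 400, 500, 600)):
    """Number of true key proteins among the top-N proteins for each cutoff N."""
    return {n: len(set(ranking[:n]) & essential) for n in cutoffs}


ranking = ["P3", "P1", "P4", "P2", "P5"]
print(key_proteins_in_top(ranking, {"P1", "P2"}, cutoffs=(1, 2, 4)))  # {1: 0, 2: 1, 4: 2}
```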
As FIGS. 2a to 2f show, in the yeast PPI network EPM identifies key proteins better than the other methods: whether the top 100, 200, 300, 400, 500 or 600 proteins are taken as the candidate set, EPM identifies clearly more key proteins than the other methods. Compared with PeC, the accuracy of EPM improves by 16.4%, 18.8%, 19.5%, 19.4%, 20.5% and 22.6% for the top 100, 200, 300, 400, 500 and 600 proteins, respectively.
To further demonstrate the advantage of EPM in predicting key proteins, we compared EPM with the other methods on a smaller dataset, the top 200 proteins of each method. For each method, we identified the proteins among its top 200 that overlap with those of EPM and analyzed whether the remaining proteins are key proteins; the results are shown in Table 1.
Table 1 Comparative analysis of the numbers of key proteins
Table 1 compares the numbers of key and non-key proteins identified among the top 200 proteins by EPM and by the five other methods. Here $M_i$ denotes one of the five other centrality methods, $|EPM\cap M_i|$ is the overlap between the key proteins identified by EPM and by $M_i$, $|M_i-EPM|$ is the number of key proteins identified by $M_i$ but missed by EPM, and $|EPM-M_i|$ is the number of key proteins identified by EPM but missed by $M_i$. The table shows that the number of key proteins identified by EPM but not by the other methods is clearly greater than the number identified by the other methods but not by EPM, and that EPM also identifies clearly fewer non-key proteins than the other methods. These results show that by considering topology together with additional biological information, EPM effectively improves key protein prediction.
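The set comparison behind Table 1 can be sketched as below. This is one possible reading of the quantities described above, splitting each set difference into key and non-key proteins; all protein names are hypothetical.

```python
def compare_top_sets(epm_top, other_top, essential):
    """Overlap and set differences between EPM and another method M_i."""
    epm, other = set(epm_top), set(other_top)
    only_other = other - epm  # M_i - EPM
    only_epm = epm - other    # EPM - M_i
    return {
        "overlap": len(epm & other),  # |EPM ∩ M_i|
        "only_other_key": len(only_other & essential),
        "only_other_non_key": len(only_other - essential),
        "only_epm_key": len(only_epm & essential),
        "only_epm_non_key": len(only_epm - essential),
    }


stats = compare_top_sets(["P1", "P2", "P3"], ["P2", "P3", "P4"], essential={"P1", "P2"})
print(stats)
```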
To further evaluate the performance of EPM in key protein prediction, we compared it with the five other centrality methods using six statistical evaluation indices: sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), F-measure (F) and accuracy (ACC), defined as follows:
SN is the proportion of key proteins that are correctly predicted:
$$SN=\frac{TP}{TP+FN}$$
SP is the proportion of non-key proteins that are correctly excluded:
$$SP=\frac{TN}{TN+FP}$$
PPV is the proportion of predicted key proteins that are truly key proteins:
$$PPV=\frac{TP}{TP+FP}$$
NPV is the proportion of excluded proteins that are truly non-key proteins:
$$NPV=\frac{TN}{TN+FN}$$
F is the harmonic mean of sensitivity and positive predictive value:
$$F=\frac{2\cdot SN\cdot PPV}{SN+PPV}$$
ACC is the proportion of correct results among all predictions:
$$ACC=\frac{TP+TN}{TP+TN+FP+FN}$$
Here TP (true positives) is the number of key proteins correctly identified as key proteins; FP (false positives) is the number of non-key proteins wrongly identified as key proteins; TN (true negatives) is the number of non-key proteins correctly identified as non-key proteins; and FN (false negatives) is the number of key proteins wrongly identified as non-key proteins. Larger values of all six indices indicate better recognition performance.
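A sketch of the six indices, assuming the predicted key proteins are the top-k set, every other protein in the network counts as predicted non-key, and the standard key protein set is the ground truth. Returning 0 when a denominator is 0 is an assumption for degenerate cases; the protein sets are hypothetical.

```python
def evaluation_indices(predicted, essential, all_proteins):
    """SN, SP, PPV, NPV, F and ACC from a predicted key protein set."""
    predicted, essential, universe = set(predicted), set(essential), set(all_proteins)
    tp = len(predicted & essential)             # key proteins predicted as key
    fp = len(predicted - essential)             # non-key proteins predicted as key
    fn = len(essential - predicted)             # key proteins missed
    tn = len(universe - predicted - essential)  # non-key proteins correctly excluded

    def ratio(num, den):
        return num / den if den else 0.0

    sn = ratio(tp, tp + fn)
    sp = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    npv = ratio(tn, tn + fn)
    f = ratio(2 * sn * ppv, sn + ppv)
    acc = ratio(tp + tn, tp + tn + fp + fn)
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv, "F": f, "ACC": acc}


scores = evaluation_indices({"P1", "P2", "P4"}, {"P1", "P2", "P3"},
                            ["P1", "P2", "P3", "P4", "P5"])
print(scores["ACC"])  # (2 + 1) / 5 = 0.6
```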
As FIG. 3 shows, each of the six indices of EPM is clearly higher than that of any of the other five centrality measures. EPM is clearly more accurate than the topology-based DC, BC and LAC methods, and it still achieves higher accuracy than PeC, which incorporates gene expression data.

Claims (2)

1. A key protein identification method based on Markov random walk, characterized by comprising the following steps:
(1) inputting a PPI network, biological information and the number k of key proteins to be obtained;
(2) calculating the weights between protein vertices according to the attribute values and edge weights of the protein vertices, and constructing a weight matrix; according to the PPI network, the weight between protein vertices is obtained from the common neighbor similarity, gene expression similarity and GO semantic similarity;
the common neighbor similarity is expressed as:
$$NTE(u,v)=|C_u\cap C_v|+1 \qquad (1)$$
wherein $C_u$ is the set of neighbors of node u in the PPI network, $C_v$ is the set of neighbors of node v in the PPI network, and $|C_u\cap C_v|$ is the number of common neighbors of nodes u and v, i.e., the number of triangles containing the edge (u, v);
the formula for calculating the similarity of gene expression between proteins u and v is as follows:
$$GES(u,v)=\frac{1}{s-1}\sum_{i=1}^{s}\left(\frac{U_i-\bar{U}}{\sigma(U)}\right)\left(\frac{V_i-\bar{V}}{\sigma(V)}\right) \qquad (2)$$
wherein s is the number of samples in the gene expression data, U and V are the genes encoding proteins u and v, $U_i$ and $V_i$ are the expression levels of genes U and V in sample i, $\bar{U}$ and $\bar{V}$ are the mean expression levels of genes U and V, and $\sigma(U)$ and $\sigma(V)$ are the standard deviations of the expression levels of genes U and V;
the GO semantic similarity is calculated with Lin's method:
$$GOS(u,v)=\frac{2\log P_{ms}(c_1,c_2)}{\log P(c_1)+\log P(c_2)} \qquad (3)$$
$$P_{ms}(c_1,c_2)=\min_{c\in S(c_1,c_2)}P(c) \qquad (4)$$
wherein genes U and V encode the interacting proteins u and v, $c_1$ and $c_2$ are GO terms of genes U and V respectively, $S(c_1,c_2)$ is the set of nearest common ancestors of nodes $c_1$ and $c_2$, $P(c)$ is the probability of occurrence of term c, and $P_{ms}$ is the probability of occurrence of the nearest common ancestor of nodes $c_1$ and $c_2$;
the weight $w_{ij}$ of the biological similarity between proteins $v_i$ and $v_j$ in the PPI network is calculated as:
$$w_{ij}=a_1NTE(i,j)+a_2GES(i,j)+a_3GOS(i,j) \qquad (5)$$
wherein the parameters $a_1$, $a_2$, $a_3$ lie in (0,1) and $a_1+a_2+a_3=1$;
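the matrix $W=[w_{ij}]$ is the weight matrix of the PPI network, and $w_{ij}$ is the weight of edge $(v_i,v_j)$:
$$w_{ij}=\begin{cases}a_1NTE(i,j)+a_2GES(i,j)+a_3GOS(i,j), & (v_i,v_j)\in E\\ 0, & \text{otherwise}\end{cases} \qquad (6)$$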
(3) normalizing all the attribute values to construct an attribute matrix;
(4) constructing a transfer matrix according to the interaction relation between the protein vertexes;
in step (4), the transition matrix is constructed from the interaction relations between protein vertices as follows:
given a constant k < n, the k most important proteins in the PPI network, i.e., the Top-k, are found and called key proteins; following the idea of Markov random walk, each vertex $v_i$ is assigned a score $r_i^{(0)}$ representing its importance, and the scores of all vertices form the score vector
$$r=(r_1,r_2,\dots,r_n)^T$$
which is an n × 1 column vector; an initial value of r is given, and the scores walk through the network with certain probabilities and are modified as they are passed along; the probability of passing from $v_i$ to $v_j$ is defined as:
$$p_{ij}=\frac{w_{ij}}{\sum_{l=1}^{n}w_{il}} \qquad (7)$$
thus, the transition probabilities between all vertex pairs form the n × n transition matrix $P=[p_{ij}]$;
(5) obtaining the score vector r iteratively according to the PageRank algorithm, and determining the return probability $P_0$ from the attributes of the vertices;
in step (5), the score vector r is obtained iteratively according to the PageRank algorithm and the return probability $P_0$ is determined from the vertex attributes as follows:
in the conventional random-walk-based PageRank algorithm, the score vector r is updated iteratively as follows:
$$r^{(t+1)}=\alpha P^{T}r^{(t)}+(1-\alpha)P_0 \qquad (8)$$
wherein α ∈ (0,1) is a constant and $P_0$ ∈ (0,1) is the probability that the walker returns to its starting vertex; the return probability $P_0$ is determined by the vertex attributes $b_i$:
$$P_0=B\,q \qquad (9)$$
here
$q=(q_1,q_2,\dots,q_m)^T$
is an m × 1 column vector and $q_j$ is the weight of the j-th attribute, so that:
$$r^{(t+1)}=\alpha P^{T}r^{(t)}+(1-\alpha)P_0=\alpha P^{T}r^{(t)}+(1-\alpha)B\,q^{(t)} \qquad (10)$$
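the objective function J(r, q) is the squared error between the right-hand side of formula (10) and $r^{(t+1)}$, where at convergence $r^{(t+1)}=r^{(t)}=r$:
$$J(r,q)=\left\|\alpha P^{T}r+(1-\alpha)B\,q-r\right\|_2^{2} \qquad (11)$$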
r and q are chosen to minimize J(r, q), i.e., the following optimization problem is solved:
$$\min_{r,\,q}\ J(r,q)\quad \text{s.t. } r>0,\ q>0 \qquad (12)$$
the constraints r > 0 and q > 0 mean that all entries of r and q are positive;
(6) obtaining the objective function, optimizing it, and iteratively updating the initial values of r and q by gradient descent;
in step (6), the objective function is obtained and optimized, and the initial values of r and q are updated iteratively by gradient descent as follows:
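after the objective function is obtained, it is optimized: starting from the initial values $r^{(0)}$ and $q^{(0)}$ of r and q, the partial derivatives of J(r, q) with respect to r and q are first calculated:
$$\nabla J(r,q)=\left(\frac{\partial J}{\partial r},\ \frac{\partial J}{\partial q}\right) \qquad (13)$$
from formula (11):
$$J(r,q)=e^{T}e,\qquad e=\alpha P^{T}r+(1-\alpha)B\,q-r \qquad (14)$$
from formula (13):
$$\frac{\partial J}{\partial r}=2(\alpha P-I)\,e \qquad (15)$$
$$\frac{\partial J}{\partial q}=2(1-\alpha)B^{T}e \qquad (16)$$
wherein I is the n × n identity matrix;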
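from the above gradients, starting from the initial values $r^{(0)}$ and $q^{(0)}$, r and q are updated iteratively by gradient descent:
$$r^{(t+1)}=r^{(t)}-\rho\,\frac{\partial J}{\partial r}\Big|_{(r^{(t)},\,q^{(t)})} \qquad (17)$$
$$q^{(t+1)}=q^{(t)}-\rho\,\frac{\partial J}{\partial q}\Big|_{(r^{(t)},\,q^{(t)})} \qquad (18)$$
wherein ρ is the iteration step size;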
(7) sorting the entries of the converged score vector $r^{(t)}=(r_1,r_2,\dots,r_n)$ in descending order, the proteins corresponding to the k largest values being the key proteins.
2. The Markov random walk-based key protein identification method according to claim 1, wherein in step (3) all attribute values are normalized and the attribute matrix is constructed as follows: all attribute values are brought into the range (0,1) by Z-score or min-max normalization, and the attribute vectors of all vertices form the attribute matrix.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810499870.6A CN108804870B (en) 2018-05-23 2018-05-23 Markov random walk-based key protein identification method

Publications (2)

Publication Number Publication Date
CN108804870A 2018-11-13
CN108804870B 2021-11-19

Family

ID=64091419

Country Status (1)

Country Link
CN (1) CN108804870B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109637579B (en) * 2018-12-18 2022-04-15 长沙学院 Tensor random walk-based key protein identification method
CN110660448B (en) * 2019-09-20 2022-02-01 长沙学院 Key protein identification method based on topological and functional characteristics of protein
CN110910952B (en) * 2019-11-21 2023-05-12 衡阳师范学院 Method for predicting basic protein by using chemical reaction strategy
CN113436729A (en) * 2021-07-08 2021-09-24 湖南大学 Synthetic lethal interaction prediction method based on heterogeneous graph convolution neural network
CN113936743B (en) * 2021-11-12 2024-04-26 大连海事大学 Protein complex identification method based on heterogeneous PPI network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868582A (en) * 2016-03-25 2016-08-17 陕西师范大学 A method of identifying protein compounds by using a fruit fly optimization method
CN106355044A (en) * 2016-08-15 2017-01-25 上海电机学院 Protein composite identification method based on random walking model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
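An iterative weighted subgraph query algorithm based on random walk; 张小驰 et al.; Journal of Computer Research and Development (《计算机研究与发展》); 2015-12-15; Vol. 52, No. 12; pp. 2824-2833 *
An efficient prediction algorithm for key proteins based on PPI networks; 洪海燕 et al.; Computer Science (《计算机科学》); 2016-11-15; Vol. 43, No. 11A; pp. 16-20, 25 *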

Also Published As

Publication number Publication date
CN108804870A (en) 2018-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant