CN109509509B - Protein complex mining method based on dynamic weighted protein interaction network

Publication number
CN109509509B
CN109509509B CN201811145616.2A CN201811145616A CN109509509B CN 109509509 B CN109509509 B CN 109509509B CN 201811145616 A CN201811145616 A CN 201811145616A CN 109509509 B CN109509509 B CN 109509509B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811145616.2A
Other languages
Chinese (zh)
Other versions
CN109509509A (en)
Inventor
毛伊敏
朱海湾
胡健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Science and Technology
Original Assignee
Jiangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Science and Technology
Priority to CN201811145616.2A
Publication of CN109509509A
Application granted
Publication of CN109509509B


Abstract

The present disclosure provides a protein complex mining method based on a dynamic weighted protein interaction network, comprising the steps of: filtering inactive proteins using gene expression profile data to construct a dynamic protein interaction network, then weighting that network with a comprehensive weight metric and adding new interactions, thereby constructing a dynamic weighted protein interaction network; constructing protein complex cores using key proteins and the intrinsic properties of complexes; improving the pick-up rule of the ant colony algorithm with a fuzzy-granularity similarity function and optimizing the put-down rule using compactness, thereby mining protein complexes; using a local weight update strategy to pass optimal-solution information between different ant colonies, and a global weight update strategy to pass functional information between the dynamic weighted protein interaction networks at adjacent time points; and outputting the mined protein complexes.

Description

Protein complex mining method based on dynamic weighted protein interaction network
Technical Field
The disclosure relates to the field of systems biology, and in particular to a protein complex mining method based on a dynamic weighted protein interaction network.
Background
Proteins are the basis of all life activities, and their functions are generally carried out through interactions with other proteins. In a living organism, the network formed by these interactions is called a protein-protein interaction (PPI) network, and a protein complex is a set of proteins that jointly perform a certain function at the same place and time. Studying protein interactions and identifying meaningful modules in the PPI network, such as protein complexes and functional modules, helps in understanding the processes of life and in predicting the functions of uncharacterized proteins, and also provides a theoretical basis for disease diagnosis and drug development. Because interaction data generally contain high rates of false positives and false negatives, efficient protein complex detection remains one of the most important challenges of the post-genomic era. Fast and effective mining of protein complexes is therefore of great significance for revealing the basic principles of cell composition and function, for studying the roles of proteins in metabolic pathways, for understanding the behavior of organisms, and for drug design.
Currently, experimental methods for identifying protein complexes are time-consuming and costly, and are not applicable to all species. An effective computational method for mining protein complexes is therefore urgently needed to reduce experimental cost and improve efficiency.
As high-throughput PPI data and protein data have become more complete, many researchers have turned to computational complex mining and have proposed a number of traditional mining algorithms, such as the density-based molecular complex detection algorithm MCODE, the partition-based restricted neighborhood search clustering algorithm RNSC, and the hierarchy-based Jerarca algorithm. However, each of these has shortcomings: some perform poorly on sparse networks, some cannot detect overlapping complexes, and some are sensitive to noise. In recent years, researchers have proposed new complex detection methods, including methods based on flow simulation, methods based on the core-attachment structure, spectral clustering, and swarm intelligence algorithms. However, the results of flow-simulation methods depend strongly on the chosen parameters; core-attachment clustering methods have high complexity and are unsuitable for large-scale PPI networks; and spectral clustering falls back on a traditional clustering method once the data have been reduced in dimension. Swarm intelligence optimization algorithms have strong global optimization capability and robustness. The ant colony algorithm in particular has an advantage over other swarm intelligence algorithms: it can cluster directly, without relying on another clustering algorithm, and so can make full use of the strengths of swarm intelligence. The ant colony algorithm has been successfully applied to mining complexes and functional modules in PPI networks and has become a new research focus in the field.
Liu Shi et al. proposed NACO-FMD, an ant colony optimization algorithm for detecting functional modules in PPI networks, which designs a more purposeful function to guide the ant colony optimization and obtains a better clustering effect. Liuhongxin proposed ACC-FMD, an ant colony clustering algorithm for functional module detection, which clusters nodes with a pick-up and put-down model, updates the similarity function from the optimal solution, iterates so that the clustering result approaches the optimum, and finally merges and filters the clustering result. When applied to large-scale PPI networks, these ant colony clustering algorithms require a large number of pick-up, put-down, merging and filtering operations, so convergence is slow and the solving time is too long. Lujiawei et al. proposed MGRACO-FMD, an ant colony optimization algorithm based on a multi-granularity model, in an attempt to improve convergence speed, but the accuracy of its clustering results is not high. Lei et al. proposed an ant colony optimization clustering algorithm for PPI networks based on connection strength, which reduces the time overhead but has low recall. These algorithms improve running time at the cost of accuracy and recall.
The prediction accuracy of the above algorithms depends on the reliability of the PPI network, yet currently available protein interaction data contain a large amount of false positive and false negative data. In addition, these algorithms treat the PPI network as static, but a static PPI network cannot truly reflect the dynamic changes inside a cell, so mining protein complexes from a dynamic PPI network is more reasonable. With the growth of protein biological data and sequence data, some researchers have recently tried to combine such biological information to build more reliable dynamic PPI networks and thereby mine more reliable protein complexes.
Tang et al. used gene expression data and a static PPI network to construct a time-course protein interaction network (TC-PIN) with a single uniform threshold, and successfully applied it to mining protein functional modules. Because expression levels differ from protein to protein, a uniform threshold can make the constructed PPI network inaccurate, which in turn affects the clustering. Hu et al. dropped the uniform threshold, used the average expression level of each protein as the criterion for judging whether that protein is active, constructed a dynamic weighted network by combining complex information and domain information, and proposed the protein function prediction method D-PIN. Su et al. proposed GECluster, a complex mining algorithm based on a dynamic weighted PPI network, which first weights the dynamic network using GO-Slim and then mines protein complexes with a seed-node expansion strategy. This method measures functional similarity between proteins using gene ontology information alone and does not fuse multiple data sources, so it cannot reflect protein interactions well. Yi et al. proposed DCA, a core-attachment protein complex detection method that weights each protein using edge clustering coefficients and consecutive co-expression length; because this weighting incorporates the temporal characteristics of complex evolution, it describes the similarity between proteins better. In the same year, Zhao et al. proposed a new complex identification algorithm that combines the temporal function-retention property of complexes with ant colony clustering. That algorithm approaches complex mining from a new perspective, and its novelty is not limited to the clustering method.
The clustering accuracy of that method is high, but its recall is only moderate, which may be related to the weight metric and to the way the ant colony searches. Although protein complex mining based on dynamic PPI networks has achieved some success, it remains necessary to study how to use gene expression profiles to filter false positive data effectively, how to integrate PPI data with multiple sources of biological information, and how to design an effective weighting method that narrows the gap between the constructed network and the real network. In addition, applying the ant colony algorithm to clustering large-scale PPI networks requires a large number of pick-up, put-down and filtering operations, so convergence is slow; and because the algorithm is highly random, accuracy and recall are generally not high. These problems still need to be solved.
Disclosure of Invention
To address at least one of the above technical problems, the present disclosure provides a protein complex mining method based on a dynamically weighted protein interaction network.
According to one aspect of the present disclosure, a protein complex mining method based on a dynamically weighted protein interaction network includes the steps of:
constructing a dynamic weighted protein interaction network: inputting protein interaction data, gene expression profile data and gene ontology information; removing duplicates from the protein interaction data; filtering inactive proteins using the gene expression profile data to construct a dynamic protein interaction network; and weighting the dynamic protein interaction network with a comprehensive weight metric and adding new interactions, thereby constructing the dynamic weighted protein interaction network;
constructing protein complex cores: inputting the dynamic weighted protein interaction network and the set of key proteins at each time point, optimizing the selection of seed nodes using the point-edge clustering coefficient, and constructing protein complex cores using the key proteins and the intrinsic properties of complexes;
ant colony clustering: improving the pick-up rule of the ant colony algorithm with a fuzzy-granularity similarity function, continually loading protein nodes to form an initial clustering result, and correcting the initial clustering result with a compactness-optimized put-down rule, thereby mining protein complexes;
global and local weight updating: using a local weight update strategy to pass optimal-solution information between different ant colonies, and using a global weight update strategy to pass functional information between the dynamic weighted protein interaction networks at adjacent time points; and
outputting a result: outputting the mined protein complexes.
According to at least one embodiment of the present disclosure, the step of constructing a dynamically weighted protein interaction network comprises:
combining the 36 time points of the gene expression profile data into 12 time points by the following formula 1:
[Formula 1: equation image not reproduced]
where $T_u(i)$ denotes the gene expression value of protein u at time i, with 1 ≤ i ≤ 12;
filtering non-co-expressed proteins according to the following formula 2:
[Formula 2: equation image not reproduced]
where $T'_u$ denotes the mean gene expression value of protein u;
adding interactions to each dynamic subnetwork: if proteins u and v interact on the static protein interaction network and are co-expressed, an interaction is added to the network at that time point; if proteins u and v do not interact on the static protein interaction network but are co-expressed, whether to add an interaction is determined by the following formula 3:
[Formula 3: equation image not reproduced]
where CWM(u, v) denotes the comprehensive weight metric of proteins u and v, $CE_{cc}(u, v)$ denotes the point-edge clustering coefficient, FS(u, v) denotes the gene ontology functional similarity, and Pcc(u, v) denotes the Pearson correlation coefficient;
an interaction is added when CWM(u, v) is greater than 0, and is not added otherwise;
weighting the 12 dynamic subnetworks with the comprehensive weight metric according to formula 3, thereby obtaining the dynamic weighted protein interaction network.
According to at least one embodiment of the present disclosure, the point-edge clustering coefficient $CE_{cc}(u, v)$ is calculated by the following formula 4:
[Formula 4: equation image not reproduced]
where $tan_{u,v}$ denotes the number of triangles that include both network nodes u and v, $d_u$ and $d_v$ denote the degrees of network nodes u and v, respectively, and $C_u$ and $C_v$ denote the point clustering coefficients of network nodes u and v, respectively;
the gene ontology functional similarity FS(u, v) is calculated by the following formula 5:
[Formula 5: equation image not reproduced]
where $|f_u \cap f_v|$ denotes the number of gene ontology terms shared by proteins u and v, and $|f_u|$ and $|f_v|$ denote the numbers of gene ontology terms of proteins u and v, respectively;
the Pearson correlation coefficient Pcc(u, v) is calculated by the following formula 6:
$$Pcc(u, v) = \frac{1}{k-1} \sum_{i=1}^{k} \left( \frac{Exp(u, i) - \overline{Exp}(u)}{\sigma(u)} \right) \left( \frac{Exp(v, i) - \overline{Exp}(v)}{\sigma(v)} \right)$$
where k is the number of samples (time points) in the gene expression data, i indexes the time points, Exp(u, i) and Exp(v, i) denote the expression values of proteins u and v at time i, respectively, $\overline{Exp}(u)$, $\overline{Exp}(v)$ and σ(u), σ(v) denote the mean expression values and standard deviations of proteins u and v over all time points, respectively, and Pcc(u, v) ∈ [-1, 1].
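As an illustration only, the Pearson correlation coefficient between two expression profiles can be computed as in the following minimal Python sketch. The function name `pearson`, the use of the sample standard deviation (divisor k - 1), and the choice to return 0 for a flat profile are assumptions made for this example and are not taken from the disclosure.

```python
from math import sqrt

def pearson(exp_u, exp_v):
    """Pearson correlation of two equally long expression profiles."""
    k = len(exp_u)
    if k != len(exp_v) or k < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mean_u = sum(exp_u) / k
    mean_v = sum(exp_v) / k
    # Sample standard deviations (divisor k - 1); an assumption of this sketch.
    sd_u = sqrt(sum((x - mean_u) ** 2 for x in exp_u) / (k - 1))
    sd_v = sqrt(sum((y - mean_v) ** 2 for y in exp_v) / (k - 1))
    if sd_u == 0 or sd_v == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    # Average product of the standardized expression values.
    return sum(((x - mean_u) / sd_u) * ((y - mean_v) / sd_v)
               for x, y in zip(exp_u, exp_v)) / (k - 1)
```

Two profiles that rise and fall together give a value near 1, and profiles that move in opposite directions give a value near -1.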
According to at least one embodiment of the present disclosure, the step of constructing the protein complex core includes:
B1: calculating, for each key protein node, the sum $SoCE_{cc}$ of the point-edge clustering coefficients of all its incident edges, and placing the nodes into an ordered queue $Q_1$ in descending order of this sum;
B2: taking from queue $Q_1$ the key protein node with the largest sum of point-edge clustering coefficients to initialize a complex core C, and adding to complex core C the direct neighbor nodes that satisfy the interaction threshold η and whose number of consecutive co-expression times is greater than or equal to m;
B3: determining whether complex core C satisfies the density threshold d, and if not, recursively deleting the nodes with the smallest $SoCE_{cc}$ values until complex core C satisfies the density threshold d;
B4: when complex core C satisfies the density threshold d, storing complex core C in the result queue $Q_2$ and deleting all nodes of complex core C from the ordered queue $Q_1$;
B5: repeating steps B2, B3 and B4 until the ordered queue $Q_1$ is empty.
According to at least one embodiment of the present disclosure, the sum $SoCE_{cc}$ of the point-edge clustering coefficients of all incident edges of a key protein node is calculated by the following formula 7:
$$SoCE_{cc}(u) = \sum_{v \in N(u)} CE_{cc}(u, v)$$
where $SoCE_{cc}(u)$ denotes the sum of the point-edge clustering coefficients of all edges incident to key protein node u, and N(u) denotes the set of direct neighbors of u.
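For illustration, the sum over incident edges and the descending ordering used to fill queue Q1 might be sketched as follows. The names `so_cecc` and `ranked_key_proteins`, and the representation of edges as `frozenset` keys in a dictionary, are hypothetical choices for this example.

```python
def so_cecc(node, cecc):
    """Sum the point-edge clustering coefficients of all edges incident to `node`.

    `cecc` maps frozenset({u, v}) to the coefficient of edge (u, v);
    this data layout is an assumption of the sketch.
    """
    return sum(value for edge, value in cecc.items() if node in edge)

def ranked_key_proteins(key_proteins, cecc):
    """Return the key proteins in descending order of their summed coefficients."""
    return sorted(key_proteins, key=lambda u: so_cecc(u, cecc), reverse=True)
```

Scanning every edge for each node is simple but quadratic in the worst case; an adjacency list would be the natural refinement for a network of several thousand proteins.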
According to at least one embodiment of the present disclosure, the step of ant colony clustering includes:
C1: randomly selecting a complex core C from the result queue $Q_2$ as the initial position of the ant;
C2: calculating the fuzzy granularity of each node u within the ant's neighborhood, picking up a neighbor node that satisfies the condition, moving to that neighbor node, and updating the complex core and the ant's neighborhood; if no neighbor node satisfies the condition, skipping step C3 and going directly to step C4;
C3: determining whether the ant's load has reached its maximum; if not, repeating step C2 to continue clustering nodes in the ant's new neighborhood; if so, performing step C4;
C4: obtaining the initial clustering result corresponding to complex core C, deleting complex core C from the result queue $Q_2$, and determining whether $Q_2$ is empty; if it is not empty, randomly selecting another complex core as the ant's initial position and returning to step C2 to begin a new round of search; if it is empty, going to step C5;
C5: calculating the compactness of each node u with respect to its complex PC, removing nodes whose compactness is less than 1 to obtain the complex PC, and outputting the complex set CS.
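The control flow of steps C1 to C5 can be sketched roughly as below. This is a simplified reading, not the disclosed implementation: the pick-up test `similar` and the compactness function `closeness` are passed in as hypothetical callables because their formulas are not reproduced here, qualifying neighbors are picked up in batches rather than one move at a time, and pruning is applied to each cluster as soon as it is finished.

```python
import random

def ant_cluster(cores, neighbors, similar, closeness, max_load, seed=0):
    """Grow each complex core into a cluster, then prune loosely attached nodes.

    cores:     list of sets of node names (the result queue Q2)
    neighbors: dict mapping a node to the set of its direct neighbors
    similar:   callable (node, cluster) -> bool, the pick-up test (hypothetical)
    closeness: callable (node, cluster) -> float, the compactness (hypothetical)
    """
    rng = random.Random(seed)
    pending = [set(core) for core in cores]
    complexes = []
    while pending:                                            # C4: until Q2 is empty
        cluster = pending.pop(rng.randrange(len(pending)))    # C1: random core
        while len(cluster) < max_load:                        # C3: load limit
            frontier = set().union(
                *(neighbors.get(n, set()) for n in cluster)) - cluster
            picked = sorted(u for u in frontier if similar(u, cluster))  # C2
            if not picked:
                break                                         # nothing to pick up
            cluster.update(picked[: max_load - len(cluster)])
        # C5: keep only nodes whose compactness is at least 1
        complexes.append({u for u in cluster if closeness(u, cluster) >= 1})
    return complexes
```

Fixing the random seed makes a run repeatable, which helps when comparing parameter settings.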
According to at least one embodiment of the present disclosure, the fuzzy granularity is calculated by the following formula 8:
[Formula 8: equation image not reproduced]
where A(u) denotes the fuzzy granularity of node u within the ant's neighborhood, |C| is the number of nodes in complex core C, and α is the dissimilarity factor.
The compactness is calculated by the following formula 9:
[Formula 9: equation image not reproduced]
where CD(u, PC) denotes the compactness of node u with respect to complex PC, $d_{in}(u, v_1)$ denotes the weight of the edge connecting protein u to another protein $v_1$ inside complex PC, and $d_{out}(u, v_2)$ denotes the weight of the edge connecting protein u to a protein $v_2$ outside complex PC.
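One plausible reading of the compactness, given that nodes with a value below 1 are removed, is the ratio of a node's total edge weight inside the complex to its total edge weight outside it. The sketch below implements that reading; the ratio form, the function name `closeness`, the edge-weight dictionary layout and the handling of a zero denominator are all assumptions of this example rather than the disclosed formula 9.

```python
def closeness(u, complex_nodes, weights):
    """Ratio of u's edge weight inside the complex to its edge weight outside.

    `weights` maps frozenset({a, b}) to the weight of edge (a, b);
    self-loops are assumed to be absent.
    """
    inside = 0.0
    outside = 0.0
    for edge, w in weights.items():
        if u not in edge:
            continue
        (v,) = edge - {u}          # the other endpoint of the edge
        if v in complex_nodes:
            inside += w
        else:
            outside += w
    if outside == 0:
        # Assumption: no outside edges means fully internal (or isolated).
        return float("inf") if inside > 0 else 0.0
    return inside / outside
```

Under this reading, a value of at least 1 means the node is bound to the complex at least as strongly as to the rest of the network.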
According to at least one embodiment of the present disclosure, the local weight update is performed according to the following formula 10:
$$CWM(u, v) = (1 + PC_{uv}) \cdot CWM(u, v) \qquad \text{(Formula 10)}$$
where the enhancement coefficient $PC_{uv}$ is the probability that proteins u and v belong to the same complex in the optimal solution of the previous iteration.
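A minimal sketch of the local update of formula 10 follows. The enhancement coefficients are supplied as a precomputed dictionary `pc`; the function name and the choice to treat a missing coefficient as 0 (weight unchanged) are assumptions of this example.

```python
def local_update(cwm, pc):
    """Return new edge weights CWM(u, v) <- (1 + PC_uv) * CWM(u, v).

    cwm: dict mapping an edge key to its current weight
    pc:  dict mapping an edge key to its enhancement coefficient PC_uv;
         edges without an entry are left unchanged (assumption)
    """
    return {edge: (1.0 + pc.get(edge, 0.0)) * w for edge, w in cwm.items()}
```

Because the factor is at least 1 for non-negative coefficients, edges whose endpoints shared a complex in the previous optimal solution are reinforced and all other edges keep their weight.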
According to at least one embodiment of the present disclosure, the enhancement coefficient $PC_{uv}$ is calculated by the following formula 11:
[Formula 11: equation image not reproduced]
where $C_u$ and $C_v$ denote the sets of complexes to which proteins u and v belong, respectively, and $C_u \cap C_v$ denotes the set of complexes that contain both proteins u and v.
According to at least one embodiment of the present disclosure, the global weight update is performed according to the following formula 12:
[Formula 12: equation image not reproduced]
where the two count terms of the formula (symbol images not reproduced) denote the number of times proteins u and v occur in the same complex in the optimal solutions of the transient networks at times $T_{i-1}$ and $T_i$, respectively, 0 ≤ α, β ≤ 1, [symbol image not reproduced], and β is a constant.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and together with the description serve to explain the principles of the disclosure.
Fig. 1 is a schematic diagram of the construction of a dynamically weighted protein interaction network according to at least one embodiment of the present disclosure.
Fig. 2 is a flow diagram of a protein complex mining method based on a dynamically weighted protein interaction network in accordance with at least one embodiment of the present disclosure.
FIG. 3 is a graph comparing clustering results of algorithms on a dynamic protein interaction network, in accordance with at least one embodiment of the present disclosure.
FIG. 4 is a graph comparing the results of DNA-directed RNA polymerase II complex detection according to various algorithms in at least one embodiment of the present disclosure.
Detailed Description
The present disclosure will be described in further detail with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limitations of the present disclosure. It should be further noted that, for the convenience of description, only the portions relevant to the present disclosure are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Building on ant colony clustering for mining protein complexes, the present disclosure provides an ant colony clustering method for mining complexes in dynamic weighted PPI networks based on fuzzy granularity and compactness (FGCDACC-DPC). First, a dynamic protein interaction network is constructed from gene expression profile data, each dynamic subnetwork is weighted with the comprehensive weight metric CWM, and new interactions are added, yielding a dynamic weighted network. Next, a set of dense, highly co-expressed complex cores is constructed from the basic characteristics of protein complexes, and protein complexes are mined with a pick-up and put-down model (GCM) based on fuzzy granularity and compactness. Meanwhile, local and global weight update strategies pass optimal-solution and functional information between different ant colonies and between the networks at different time points.
In an alternative embodiment of the present disclosure, data analysis and experimental validation are preferably performed using yeast proteins as an example.
(I) Constructing a dynamic weighted PPI network:
The yeast protein interaction network is taken from the DIP database and, after deduplication, contains 5093 proteins and 24734 interactions. The gene expression profile data are the dataset numbered GSE3431, which contains expression values of 6777 genes at 36 time points; only 4981 of these genes are in the yeast PPI network. The standard protein complexes come from the CYC2008 set, which contains 408 standard complexes, the largest with 81 proteins and the smallest with 3. Gene Ontology (GO) functional annotation information is downloaded from the Gene Ontology database. Key protein data are obtained by integrating four databases, MIPS, SGD, DEG and SGDP, which together contain 1285 key proteins, of which only 1167 are in the yeast PPI network. Because of the limitations of experimental detection conditions and the "scale-free" and "small-world" nature of PPI networks, some of the data in protein interaction networks and biological information are inaccurate, and the accuracy of protein complex detection is easily affected by false positives and false negatives. To reduce the influence of false positive and false negative data on the experimental results, a dynamic weighted PPI network is constructed from the static PPI network by combining the topological and biological characteristics of the network, thereby improving the accuracy of protein complex mining. The static PPI network is adjusted according to the gene expression profile data to construct a dynamic PPI network; the dynamic PPI network is then weighted using the point-edge clustering coefficient, the Pearson correlation coefficient and the GO functional similarity together, and new interactions are added to construct the dynamic weighted PPI network. The detailed process of constructing the dynamic weighted PPI network is as follows:
From the gene expression profile data, the 36 time points are combined into 12 time points by the following formula 1:
[Formula 1: equation image not reproduced]
where $T_u(i)$ denotes the gene expression value of protein u at time i, with 1 ≤ i ≤ 12.
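One common way to reduce a 36-point profile to 12 points is to treat it as three consecutive cycles of 12 time points and average the matching points of each cycle. The sketch below assumes exactly that; the averaging rule, the cycle layout and the function name `merge_cycles` are assumptions of this example, since formula 1 itself is not reproduced here.

```python
def merge_cycles(profile, cycles=3):
    """Average matching time points across consecutive cycles.

    With 36 values and 3 cycles, merged point i is the mean of the
    original points i, i + 12 and i + 24 (0-based indexing).
    """
    n = len(profile) // cycles
    if n * cycles != len(profile):
        raise ValueError("profile length must be a multiple of the cycle count")
    return [sum(profile[i + c * n] for c in range(cycles)) / cycles
            for i in range(n)]
```

For example, `merge_cycles(list(range(36)))` yields 12 values whose first entry is the mean of 0, 12 and 24.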
Non-co-expressed proteins are filtered according to the following formula 2:
[Formula 2: equation image not reproduced]
where $T'_u$ denotes the mean gene expression value of protein u.
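As a hedged illustration of filtering by activity, the sketch below treats a protein as active at a time point when its expression there exceeds its own mean expression, and keeps a static interaction in the subnetwork of a time point only when both proteins are active at that time. The strict "greater than the mean" rule and both function names are assumptions of this example; the exact condition of formula 2 is not reproduced here.

```python
def active_times(profile):
    """Return the 1-based time points at which expression exceeds the profile mean."""
    mean = sum(profile) / len(profile)
    return [i for i, value in enumerate(profile, start=1) if value > mean]

def dynamic_subnetworks(edges, profiles, n_times=12):
    """Split a static edge list into one edge list per time point.

    edges:    iterable of (u, v) pairs from the static network
    profiles: dict mapping a protein to its merged expression profile
    """
    active = {p: set(active_times(t)) for p, t in profiles.items()}
    return {
        i: [(u, v) for u, v in edges
            if i in active.get(u, ()) and i in active.get(v, ())]
        for i in range(1, n_times + 1)
    }
```

Proteins without an expression profile are never active under this sketch, so their interactions drop out of every subnetwork.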
Interactions are added to each dynamic subnetwork as follows: if proteins u and v interact on the static protein interaction network and are co-expressed, an interaction is added to the network at that time point; if proteins u and v do not interact on the static protein interaction network but are co-expressed, whether to add an interaction is determined by the following formula 3:
[Formula 3: equation image not reproduced]
where CWM(u, v) denotes the comprehensive weight metric of proteins u and v, $CE_{cc}(u, v)$ denotes the point-edge clustering coefficient, FS(u, v) denotes the gene ontology functional similarity, and Pcc(u, v) denotes the Pearson correlation coefficient.
Further, the point-edge clustering coefficient $CE_{cc}(u, v)$ is calculated by the following formula 4:
[Formula 4: equation image not reproduced]
where $tan_{u,v}$ denotes the number of triangles that include both network nodes u and v, $d_u$ and $d_v$ denote the degrees of network nodes u and v, respectively, and $C_u$ and $C_v$ denote the point clustering coefficients of network nodes u and v, respectively.
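The ingredients named above (shared triangles, degrees and node clustering coefficients) can be computed from an adjacency structure as in this sketch. It only computes the inputs and does not combine them, because formula 4 is not reproduced here. The function names are hypothetical, and the node coefficient is assumed to be the standard local clustering coefficient.

```python
def edge_triangles(adj, u, v):
    """Number of triangles containing edge (u, v): the common neighbors of u and v."""
    return len(adj[u] & adj[v])

def node_clustering(adj, u):
    """Standard local clustering coefficient of node u (assumed definition)."""
    nbrs = adj[u]
    d = len(nbrs)                      # degree of u
    if d < 2:
        return 0.0
    # Count each link between two neighbors of u exactly once.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (d * (d - 1))
```

Here `adj` maps each node to the set of its neighbors, and node names are assumed to be mutually comparable (for example, strings).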
The gene ontology functional similarity FS(u, v) is calculated by the following formula 5:
[Formula 5: equation image not reproduced]
where $|f_u \cap f_v|$ denotes the number of gene ontology terms shared by proteins u and v, and $|f_u|$ and $|f_v|$ denote the numbers of gene ontology terms of proteins u and v, respectively.
The Pearson correlation coefficient Pcc(u, v) is calculated by the following formula 6:
$$Pcc(u, v) = \frac{1}{k-1} \sum_{i=1}^{k} \left( \frac{Exp(u, i) - \overline{Exp}(u)}{\sigma(u)} \right) \left( \frac{Exp(v, i) - \overline{Exp}(v)}{\sigma(v)} \right)$$
where k is the number of samples (time points) in the gene expression data, i indexes the time points, Exp(u, i) and Exp(v, i) denote the expression values of proteins u and v at time i, respectively, $\overline{Exp}(u)$, $\overline{Exp}(v)$ and σ(u), σ(v) denote the mean expression values and standard deviations of proteins u and v over all time points, respectively, and Pcc(u, v) ∈ [-1, 1].
When the combined weight metric CWM (u, v) for the proteins u, v is greater than 0, then a set of interactions is added, otherwise not. According to the formula 3, the 12 dynamic subnetworks are weighted by adopting the comprehensive weight measurement, and then the dynamic weighted protein interaction network is obtained.
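The rule for building one weighted subnetwork can be sketched as follows: every pair of proteins active at the time point is examined, static interactions are kept, and a new interaction is added for a non-interacting pair only when its comprehensive weight is positive. The weight function `cwm` is passed in as a hypothetical callable, since formula 3 is not reproduced here, and the data layout is an assumption of the example.

```python
def weighted_subnetwork(active, static_edges, cwm):
    """Build the weighted edge set of one time point.

    active:       set of proteins active (co-expressed) at this time point
    static_edges: set of frozenset({u, v}) edges of the static network
    cwm:          callable (u, v) -> float, the comprehensive weight (hypothetical)
    """
    weighted = {}
    nodes = sorted(active)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            edge = frozenset((u, v))
            w = cwm(u, v)
            # Keep static interactions; add a new one only if its weight is positive.
            if edge in static_edges or w > 0:
                weighted[edge] = w
    return weighted
```

Examining all pairs is quadratic in the number of active proteins, so a practical version would likely restrict candidate pairs, for instance to proteins sharing at least one neighbor.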
The method for constructing the dynamic weighted PPI network fully considers the situations that a large amount of false positive and false negative data exist in the PPI network due to the limitation of experimental conditions and the 'scale-free' and 'small-world' characteristics of the protein network, can effectively reduce the influence of noise data on the clustering result of protein complex mining, and can fuse the biological information of the protein to improve the accuracy of the protein complex mining.
Fig. 1 is a schematic diagram of the construction of the dynamic weighted PPI network, which reflects the dynamic characteristics of the yeast PPI network. As can be seen from Fig. 1, the activity of individual proteins and the interactions between proteins differ greatly from one time point to another. Because the real protein network changes constantly and a protein must be active to interact with other proteins, the interacting proteins in each transient network should be active. Constructing the dynamic network removes a large amount of false positive data but inevitably increases false negatives. To reduce the negative effect of these false negatives on the clustering result, the dynamic PPI network is weighted with the comprehensive weight metric and new interactions are added, which improves the reliability of the network. The analysis shows that the dynamic weighted PPI network is closer to the real yeast PPI network, which improves clustering accuracy. Meanwhile, the distribution of protein functional modules across the dynamic weighted PPI networks shows clear statistical regularity: the modules are mainly enriched in certain interaction subnetworks, which indicates that not every dynamic weighted PPI network is equally useful for mining protein complexes in the cell.
(II) Constructing protein complex cores:
The intrinsic and biological properties of protein complexes are used to construct more realistic and reliable complex cores. First, all key proteins in the subnetwork at each time point are selected as the seed node set; then each candidate core is checked against the interaction threshold, the density threshold and the number of consecutive co-expression times, and the complex cores are constructed accordingly. The detailed process of constructing the protein complex cores is as follows:
1) first, calculate the sum SoCE of the point-edge aggregation coefficients of all the related edges of the nodes of each key proteinccAnd put into an ordered queue Q in descending order1
SoCEccCalculated by the following formula 7:
Figure BDA0001816720680000111
where $SoCE_{cc}(u)$ denotes the sum of the point-edge clustering coefficients of all edges incident to the key protein node u;
2) take from queue $Q_1$ the key protein node with the largest $SoCE_{cc}$ to initialize a complex core C, and add to C every direct neighbor node that satisfies the interaction threshold η and has at least m consecutive co-expressions, where the value of m can be chosen according to actual needs;
3) judge whether the complex core C satisfies the density threshold d; if not, repeatedly delete the node with the smallest $SoCE_{cc}$ value until C satisfies the density threshold d;
4) when the complex core C satisfies the density threshold d, store C in the result queue $Q_2$ and delete all nodes of C from the ordered queue $Q_1$;
5) repeat steps 2), 3) and 4) until the ordered queue $Q_1$ is empty.
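Steps 1) to 5) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the density is assumed to be 2|E| / (|V|(|V| - 1)) on the induced subgraph, the interaction threshold η is assumed to apply to the edge weight, the seed is assumed never to be deleted during pruning, and all names (`build_cores`, `density`, `make_graph`, the dictionaries `cecc` and `coexp`) are hypothetical.

```python
def make_graph(weighted_edges):
    """Build a symmetric adjacency dict: node -> {neighbor: weight}."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def density(nodes, adj):
    """Assumed density: 2|E| / (|V| (|V| - 1)) on the induced subgraph."""
    n = len(nodes)
    if n < 2:
        return 1.0
    twice_edges = sum(1 for u in nodes for v in adj[u] if v in nodes)
    return twice_edges / (n * (n - 1))


def build_cores(key_proteins, adj, cecc, coexp, eta, m, d):
    """cecc / coexp map frozenset({u, v}) to the point-edge clustering
    coefficient and to the consecutive co-expression count of that edge."""
    # Step 1: SoCEcc(u) = sum of CEcc over all edges incident to u.
    so_cecc = {u: sum(cecc[frozenset((u, v))] for v in adj[u]) for u in adj}
    q1 = sorted(key_proteins, key=lambda u: so_cecc[u], reverse=True)
    q2 = []
    while q1:  # Step 5: repeat until Q1 is empty.
        seed = q1[0]
        # Step 2: seed plus neighbors passing both thresholds.
        core = {seed} | {
            v for v, w in adj[seed].items()
            if w >= eta and coexp[frozenset((seed, v))] >= m
        }
        # Step 3: drop the lowest-SoCEcc node until the core is dense enough.
        while len(core) > 1 and density(core, adj) < d:
            core.remove(min(core - {seed}, key=lambda u: so_cecc[u]))
        # Step 4: store the core and remove its nodes from Q1.
        q2.append(core)
        q1 = [u for u in q1 if u not in core]
    return q2


# Hypothetical toy network: a triangle A-B-C with a pendant node D.
adj = make_graph([("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0), ("A", "D", 1.0)])
cecc = {frozenset(("A", "B")): 1.0, frozenset(("A", "C")): 1.0,
        frozenset(("B", "C")): 1.0, frozenset(("A", "D")): 0.0}
coexp = {e: 3 for e in cecc}
cores = build_cores(["A", "D"], adj, cecc, coexp, eta=0.5, m=2, d=1.0)
print(cores)
```

With these toy values the pendant node D is pruned from the first core because it has the smallest SoCEcc, and D then seeds a second, two-node core.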
(III) ant colony clustering based on fuzzy granularity and compactness:
Data are loaded continuously using a fuzzy-granularity-based pick-up rule (FGP) to form an initial clustering result, which is then corrected using the compactness. Specifically, an ant randomly selects a complex core, initializes a cluster, and searches the nodes within its visual range; if the fuzzy granularity similarity of a node is greater than the initial granularity P, the ant picks up the node and moves to its position. When the ant has traversed all qualifying nodes in the neighborhood of the current complex core, or has reached its maximum load, it randomly selects the next complex core and starts the next round of search. This process is repeated until the ants have traversed all complex cores, giving the initial clustering result. The initial result is then corrected according to a closeness-based put-down rule (CDD), which discards nodes that are tightly connected to the outside of the cluster but loosely connected inside it, thereby mining the protein complexes. The detailed process of ant colony clustering is as follows:
1) randomly select a complex core C from the result queue $Q_2$ as the initial position of the ant;
2) calculate the fuzzy granularity of each node u within the ant's neighborhood (its direct neighbors), pick up a neighbor node that satisfies the condition, advance to that node, and update the complex core and the ant's neighborhood; if no neighbor node satisfies the condition, skip step 3) and go directly to step 4). The fuzzy granularity is calculated by the following formula 8:
Figure BDA0001816720680000121
where A(u) denotes the fuzzy granularity of node u within the ant's neighborhood, |C| is the number of nodes in the complex core C, and α is the dissimilarity factor.
3) judge whether the ant's load (bounded by the maximum size of a standard complex) has reached its maximum; if not, repeat step 2) and continue clustering nodes in the ant's new neighborhood; if so, go to step 4);
4) obtain the initial clustering result corresponding to the complex core C, delete C from the result queue $Q_2$, and check whether $Q_2$ is empty: if it is not empty, randomly select a complex core as the ant's initial position and return to step 2) to start a new round of search; if it is empty, go to step 5);
5) calculate the compactness between each node u and the complex PC, remove the nodes whose compactness is less than 1 to obtain the final complex PC, and output the complex set CS;
The compactness is calculated by the following formula 9:
Figure BDA0001816720680000122
where CD(u, PC) denotes the compactness between node u and complex PC, $d_{in}(u, v_1)$ denotes the weight of the edge connecting protein u to another protein $v_1$ inside complex PC, and $d_{out}(u, v_2)$ denotes the weight of the edge connecting protein u to a protein $v_2$ outside complex PC.
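The pick-up and put-down phases can be sketched in Python as below. Because formulas 8 and 9 are only available as images here, two things are assumptions: the pick-up similarity is passed in as a caller-supplied function (the example uses a hypothetical mean link weight, not formula 8), and the compactness is assumed to be the ratio of in-cluster edge weight to out-of-cluster edge weight, which is consistent with discarding nodes whose compactness is below 1. The sketch also simplifies the ant's neighborhood to the neighbors of the whole current cluster.

```python
def pick_up(core, adj, similarity, p, max_load):
    """Grow a cluster from a complex core, one node per move."""
    cluster = set(core)
    while len(cluster) < max_load:
        neighbors = {v for u in cluster for v in adj[u]} - cluster
        candidates = [v for v in sorted(neighbors) if similarity(v, cluster) > p]
        if not candidates:
            break  # no qualifying neighbor left: this round ends
        cluster.add(candidates[0])  # the ant picks up the node and moves on
    return cluster


def closeness(u, cluster, adj):
    """Assumed ratio form: in-cluster weight over out-of-cluster weight."""
    w_in = sum(w for v, w in adj[u].items() if v in cluster and v != u)
    w_out = sum(w for v, w in adj[u].items() if v not in cluster)
    return float("inf") if w_out == 0 else w_in / w_out


def put_down(cluster, adj):
    """Discard nodes whose closeness to the cluster is below 1."""
    return {u for u in cluster if closeness(u, cluster, adj) >= 1}


# Hypothetical toy network: node -> {neighbor: weight}.
edges = [("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
         ("C", "D", 0.5), ("D", "E", 1.0), ("D", "F", 1.0)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w


def mean_link_weight(v, cluster):
    # Hypothetical stand-in for the fuzzy granularity similarity.
    return sum(adj[v].get(u, 0.0) for u in cluster) / len(cluster)


picked = pick_up({"A", "B"}, adj, mean_link_weight, p=0.1, max_load=4)
final = put_down(picked, adj)
print(sorted(picked), sorted(final))
```

In the toy run, node D is picked up because it clears the low threshold, and is then put down again because most of its edge weight points outside the cluster.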
(IV) global and local weight updating:
Local weight updating uses a functional-information transfer mechanism together with the optimal-solution information within the population. Through information transfer among different ant colonies, the optimal-solution information of the previous iteration is passed on via the weights, which raises the probability that similar data are assigned to the same cluster in the next iteration and lowers the probability that dissimilar data are assigned to the same cluster.
Local weight update is performed according to the following equation 10:
$CWM(u, v) = (1 + PC_{uv}) \cdot CWM(u, v)$ (formula 10)
where the enhancement factor $PC_{uv}$ denotes the probability that proteins u and v share a complex in the optimal solution of the previous iteration.
$PC_{uv}$ is calculated by the following formula 11:
Figure BDA0001816720680000131
where $C_u$ and $C_v$ are the sets of complexes to which proteins u and v respectively belong, and $C_u \cap C_v$ is the set of complexes that contain both u and v.
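The local update of formula 10 can be illustrated with a short Python sketch. Formula 11 is only available as an image here, so the normalization of the enhancement factor is an assumption: the sketch divides the number of complexes shared by u and v by the number of complexes containing either of them. The function names and the toy data are hypothetical.

```python
def sharing_probability(u, v, complexes):
    """Assumed form of PC_uv: |C_u ∩ C_v| / |C_u ∪ C_v|."""
    c_u = {i for i, c in enumerate(complexes) if u in c}
    c_v = {i for i, c in enumerate(complexes) if v in c}
    union = c_u | c_v
    return len(c_u & c_v) / len(union) if union else 0.0


def local_update(cwm, complexes):
    """Formula 10: CWM(u, v) <- (1 + PC_uv) * CWM(u, v) for every edge."""
    return {
        (u, v): (1 + sharing_probability(u, v, complexes)) * w
        for (u, v), w in cwm.items()
    }


# Hypothetical best solution of the previous iteration and current weights.
complexes = [{"A", "B", "C"}, {"A", "B"}, {"C", "D"}]
cwm = {("A", "B"): 0.5, ("B", "C"): 0.4, ("A", "D"): 0.3}
updated = local_update(cwm, complexes)
print(updated)
```

Edges whose endpoints were never clustered together keep their weight, while edges inside shared complexes are reinforced, which matches the positive-feedback role described for the enhancement factor.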
Weight updating between the PPI networks at adjacent time points is realized by a global weight updating strategy based on temporal correlation and functional transitivity. This strategy passes the clustering result of the network at the previous time point to the network at the next time point through positive feedback on the CWM, which effectively strengthens the interaction between two proteins belonging to the same cluster and accelerates convergence.
The global weight update formula is shown in equation 12 below:
Figure BDA0001816720680000132
where
Figure BDA0001816720680000133
and
Figure BDA0001816720680000134
denote the numbers of times that proteins u and v occur in the same complex in the optimal solutions of the instantaneous networks at times $T_{i-1}$ and $T_i$, respectively; $0 \le \alpha, \beta \le 1$;
Figure BDA0001816720680000135
and β are constants. Preferably,
Figure BDA0001816720680000136
and β are set to 0.1 and 0.2, respectively.
(V) outputting the result: all protein complexes mined by the above method are output.
FIG. 2 shows a flow chart of the FGCDACC-DPC method. According to FIG. 2, the method can be summarized as follows. First, a dynamic weighting model that builds on the static PPI network and combines gene expression profile data with Gene Ontology information is used to construct a more realistic and reliable dynamic weighted protein interaction network. Second, a set of dense, highly co-expressed complex cores is constructed; a pick-up and put-down model based on fuzzy granularity and compactness (FGCDM) is then used to mine the protein complexes, and the quality of the solution is evaluated by the modularity M once clustering is finished. Finally, to improve clustering accuracy and speed up clustering, the interactions between proteins are updated with global and local weight updating strategies based on functional information transfer and temporal functional correlation, and all mined protein complexes are output.
To verify the effectiveness and superior performance of the FGCDACC-DPC method, it was compared with the MCODE, RNSC, MCL, COACH, JSACO, ACC-FMD and ACC-DPC methods in terms of the precision and recall of the mined protein complexes, the clustering performance of functional module mining, and execution efficiency. Preferably, all of these methods are applied to the yeast protein interaction network for experimental validation.
1) Comparison of precision, recall and F-measure for the protein functional modules mined by FGCDACC-DPC and by other methods:
To verify the effectiveness of the FGCDACC-DPC algorithm on the dynamic PPI network, its clustering performance was evaluated using precision, recall and F-measure. The FGCDACC-DPC method and the other 7 methods were each run independently 20 times, and the averages of the experimental results were used for analysis and comparison. Fig. 3 compares the three metrics across the algorithms. The FGCDACC-DPC algorithm has the highest F-measure, which is 144.3%, 61.06%, 19.24%, 37.58%, 17.49%, 42.161% and 25.52% higher than that of the MCODE, MCL, COACH, RNSC, ACC-DPC, JSACO and ACC-FMD algorithms, respectively. There are two main reasons for this result: on the one hand, the dynamic weighted PPI network constructed by the FGCDACC-DPC algorithm is closer to the real PPI network, which reduces the influence of false positives and false negatives on clustering accuracy; on the other hand, the improved pick-up and put-down strategy and the weight updating strategy effectively raise the F-measure of the algorithm. In precision, the FGCDACC-DPC algorithm ranks second, behind only the JSACO algorithm, which indicates that the dynamic network it constructs contains few false positives. The FGCDACC-DPC algorithm also performs well on recall, which is 252.2%, 38.025%, 7.08%, 14.01%, 27.17%, 95.758% and 40.157% higher than that of the MCODE, MCL, COACH, RNSC, ACC-DPC, JSACO and ACC-FMD algorithms, respectively. Although the dynamic network constructed by the FGCDACC-DPC algorithm lacks a certain number of proteins, which could lower recall, the effective weighting scheme leaves fewer false negatives in the network, so recall improves overall.
Taking the three metrics (precision, recall and F-measure) together, the FGCDACC-DPC algorithm performs better overall.
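The way the three metrics and the quoted percentage gains relate can be illustrated with a small Python sketch. It assumes the usual harmonic-mean definition of the F-measure and a relative improvement of the form (new - old) / old; the text does not state either definition explicitly, and the numeric inputs below are hypothetical, not values from Fig. 3.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, old):
    """Percentage by which `new` exceeds `old` (assumed definition)."""
    return 100.0 * (new - old) / old


# Hypothetical values chosen only to show the arithmetic.
f_new = f_measure(0.6, 0.4)
gain = relative_improvement(f_new, 0.40)
print(round(f_new, 4), round(gain, 2))
```

Because the F-measure is a harmonic mean, a method that ranks second in precision can still rank first in F-measure when its recall is clearly higher, which is the pattern reported above.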
2) Comparison of the clustering performance of FGCDACC-DPC and other methods on the mined protein complexes:
To further evaluate the clustering performance of the FGCDACC-DPC algorithm, four aspects were analyzed for each algorithm: the number of identified complexes, the average cluster size, the number of covered proteins, and the running time.
As can be seen from Table 1 below, the average size of the complexes identified by the FGCDACC-DPC algorithm and the number of proteins they cover are closer to those of the standard set than for the other algorithms. The number of complexes it identifies, 637, is second only to that of the MCL algorithm, but the MCL algorithm covers 4096 proteins, so its precision is lower than that of the FGCDACC-DPC algorithm.
To verify the time efficiency of the FGCDACC-DPC algorithm, it was compared experimentally with several ant-colony-clustering-based algorithms. Table 1 shows that the FGCDACC-DPC algorithm has good time performance. First, because it clusters small-scale dynamic weighted PPI networks, it avoids the slow convergence that the ant colony algorithm suffers on a large-scale PPI network. Second, the improved pick-up and put-down rules and the weight updating effectively reduce the amount of computation and the number of visits that do not end in a pick-up, which shortens the clustering time. The FGCDACC-DPC algorithm is therefore more time-efficient than the ACC-DPC and ACC-FMD algorithms. Although its running time is slightly longer than that of the JSACO algorithm, its other indicators are better than those of JSACO.
TABLE 1 comparison of Performance of various algorithms for mining protein complexes
Figure BDA0001816720680000151
Whether measured by average size, number of clusters, or number of proteins covered, the protein complexes identified by the FGCDACC-DPC algorithm are very close to the standard set, and its clustering time is also low, second only to that of the JSACO algorithm. Overall, the FGCDACC-DPC algorithm achieves high clustering performance and a good optimization effect.
3) Comparison of the clustering results of protein complexes mined by FGCDACC-DPC and by other methods:
The clustering results of the FGCDACC-DPC algorithm were analyzed; Table 2 shows 6 protein complexes identified by the algorithm. The clustering effect of the algorithm is evaluated by analyzing the correctly and incorrectly clustered proteins in the predicted complexes.
As can be seen from Table 2, the predicted complexes 2, 3, 5 and 6 match the standard complexes perfectly, indicating that the protein complexes detected by the FGCDACC-DPC algorithm are close to the true protein complexes and biologically meaningful.
To analyze the clustering result more intuitively, the detection result for the DNA-directed RNA polymerase II complex was visualized. FIG. 4 shows the results of detecting the DNA-directed RNA polymerase II complex with different algorithms, where the grey nodes represent wrongly clustered proteins. FIG. 4(a) is the standard complex. FIG. 4(b) shows the result of the FGCDACC-DPC algorithm, which correctly detects all proteins of the complex. FIG. 4(c) shows the result of the ACC-DPC algorithm: 11 proteins were correctly detected, and only protein YHR143W-A was missed, because that node is linked only to YIL021W inside the cluster and is linked more tightly to nodes outside the cluster. FIG. 4(d) shows the result of the ACC-FMD algorithm: 10 proteins were detected and two proteins that do not belong to the complex were wrongly included, with YPL203W wrongly replacing YHR143W-A because YPL203W is tightly linked to all proteins in the cluster. The clustering results in FIG. 4(c) and (d) show that, for the same type of algorithm, complexes mined from the dynamic network are more accurate. FIGS. 4(e) and (f) show the results of the MCL and MCODE algorithms, both of which correctly detected only 9 proteins; in the MCL result YPR110C wrongly replaced YPR187W, and the MCODE algorithm wrongly included two proteins. The detection result of the FGCDACC-DPC algorithm based on the dynamic weighted PPI network is therefore closer to the standard complex, which further demonstrates the effectiveness of the algorithm.
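The per-complex comparison described for FIG. 4 (correctly detected, missed, and wrongly included proteins) amounts to three set operations, sketched below in Python. The function name `compare_complex` and the protein identifiers are hypothetical placeholders, not the membership of the DNA-directed RNA polymerase II complex.

```python
def compare_complex(predicted, standard):
    """Classify proteins of a predicted complex against a standard complex."""
    predicted, standard = set(predicted), set(standard)
    return {
        "correct": predicted & standard,  # detected and in the standard
        "missed": standard - predicted,   # in the standard, not detected
        "wrong": predicted - standard,    # detected, not in the standard
    }


# Hypothetical identifiers used only to show the bookkeeping.
standard = {"P1", "P2", "P3", "P4"}
predicted = {"P1", "P2", "P3", "X9"}
report = compare_complex(predicted, standard)
print({k: sorted(v) for k, v in report.items()})
```

A "wrongly replaced" protein in the sense used above shows up as one entry in `missed` paired with one entry in `wrong`.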
TABLE 2 analysis of the results of 6 complexes identified by the FGCDACC-DPC algorithm
Figure BDA0001816720680000161
Figure BDA0001816720680000171
In conclusion, the ant-colony-clustering-based method for mining protein complexes from a dynamic weighted PPI network markedly improves the precision of the mined protein complexes, as well as their matching accuracy with the standard protein complexes, the recall, and the clustering effect.
Compared with existing protein complex identification methods based on dynamic PPI networks, the technical solution disclosed herein is clearly improved in prediction precision, recall, and matching rate with known protein complexes, and helps provide biologists with valuable reference information for experiments that predict unknown protein functions and for further research.
It will be understood by those skilled in the art that the foregoing embodiments are merely for clarity of illustration of the disclosure and are not intended to limit the scope of the disclosure. Other variations or modifications may occur to those skilled in the art, based on the foregoing disclosure, and are still within the scope of the present disclosure.

Claims (11)

1. A protein complex mining method based on a dynamic weighted protein interaction network, characterized by comprising the following steps:
constructing a dynamic weighted protein interaction network: inputting protein interaction data, gene expression profile data and Gene Ontology information, removing duplicates from the protein interaction network data, and filtering out inactive proteins using the gene expression profile data so as to construct a dynamic protein interaction network; then weighting the dynamic protein interaction network with a comprehensive weight measure and adding new interactions, thereby constructing the dynamic weighted protein interaction network;
constructing protein complex cores: inputting the dynamic weighted protein interaction network and the key protein set at each time point, optimizing the selection of seed nodes using the point-edge clustering coefficient, and constructing the protein complex cores using the key properties of the proteins and the intrinsic properties of complexes;
ant colony clustering: improving the pick-up rule of the ant colony algorithm with a fuzzy-granularity similarity function, continuously loading protein nodes to form an initial clustering result, and correcting the initial clustering result with a compactness-optimized put-down rule so as to mine the protein complexes; wherein, under the pick-up rule, an ant randomly selects a complex core, initializes a cluster and searches the nodes within its visual range; if the fuzzy granularity similarity of a node is greater than the initial granularity P, the ant picks up the node and moves to its position; when the ant has traversed all qualifying nodes in the neighborhood of the current complex core or has reached its maximum load, it randomly selects the next complex core to start the next round of search; this process is repeated until the ants have traversed all complex cores, giving the initial clustering result; and wherein correcting the initial clustering result with the compactness-optimized put-down rule discards nodes that are tightly connected externally and loosely connected internally, thereby mining the protein complexes;
global and local weight updating: transferring optimal-solution information among different ant colonies using a local weight updating strategy, and transferring functional information between the dynamic weighted protein interaction networks at adjacent time points using a global weight updating strategy; and
outputting a result: outputting the mined protein complexes.
2. The method of claim 1, wherein the step of constructing a dynamically weighted protein interaction network comprises:
the 36 time points of the gene expression profile data were combined into 12 time points by the following formula 1:
Figure FDA0002646417340000021
where $T_u(i)$ denotes the gene expression value of protein u at time point i, $1 \le i \le 12$;
non-co-expressed proteins were filtered according to the following formula 2:
Figure FDA0002646417340000022
where $T'_u$ denotes the mean gene expression value of protein u;
add interactions for each dynamic subnetwork: assuming that the proteins u, v are interacting and co-expressed on a static protein interaction network, a set of interactions is added to the network at that moment; assuming that the proteins u, v are not interacting but co-expressed on the static protein interaction network, whether or not an interaction is added is judged by the following formula 3:
Figure FDA0002646417340000023
where CWM(u, v) denotes the comprehensive weight measure of proteins u and v, $CE_{cc}(u, v)$ denotes the point-edge clustering coefficient, FS(u, v) denotes the Gene Ontology functional similarity, and Pcc(u, v) denotes the Pearson correlation coefficient;
adding a set of interactions when the CWM (u, v) is greater than 0, otherwise not adding;
according to the formula 3, the 12 dynamic subnetworks are weighted by adopting the comprehensive weight measurement, and then the dynamic weighted protein interaction network is obtained.
3. The method of claim 2,
the point-edge clustering coefficient $CE_{cc}(u, v)$ is calculated by the following formula 4:
Figure FDA0002646417340000024
where $tan_{u,v}$ denotes the number of triangles jointly formed by network nodes u and v, $d_u$ and $d_v$ denote the degrees of network nodes u and v, and $C_u$ and $C_v$ denote the point clustering coefficients of network nodes u and v, respectively;
the gene ontology functional similarity FS (u, v) was calculated using the following formula 5:
Figure FDA0002646417340000025
where $|f_u \cap f_v|$ denotes the number of Gene Ontology terms shared by proteins u and v, and $|f_u|$ and $|f_v|$ denote the numbers of Gene Ontology terms of proteins u and v, respectively;
the pearson correlation coefficient Pcc (u, v) is calculated using the following equation 6:
Figure FDA0002646417340000031
where k is the number of samples, i indexes the time points in the gene expression data, Exp(u, i) and Exp(v, i) denote the expression values of proteins u and v at time point i, respectively,
Figure FDA0002646417340000033
and σ(u), σ(v) denote the mean expression values and the standard deviations of proteins u and v over all time points, respectively, and $Pcc(u, v) \in [-1, 1]$.
4. The method of claim 1, wherein the step of constructing the protein complex core comprises:
B1: calculating the sum $SoCE_{cc}$ of the point-edge clustering coefficients of all edges incident to each key protein node, and putting the key proteins into an ordered queue $Q_1$ in descending order of $SoCE_{cc}$;
B2: taking from queue $Q_1$ the key protein node with the largest $SoCE_{cc}$ to initialize a complex core C, and adding to C every direct neighbor node that satisfies the interaction threshold η and has at least m consecutive co-expressions;
B3: judging whether the complex core C satisfies the density threshold d, and if not, repeatedly deleting the node with the smallest $SoCE_{cc}$ value until C satisfies the density threshold d;
B4: when the complex core C satisfies the density threshold d, storing C in a result queue $Q_2$ and deleting all nodes of C from the ordered queue $Q_1$;
B5: repeating steps B2, B3 and B4 until the ordered queue $Q_1$ is empty.
5. The method according to claim 4, wherein the sum $SoCE_{cc}$ of the point-edge clustering coefficients of all edges incident to a key protein node is calculated by the following formula 7:
Figure FDA0002646417340000032
where $SoCE_{cc}(u)$ denotes the sum of the point-edge clustering coefficients of all edges incident to the key protein node u.
6. The method of claim 4, wherein the step of ant colony clustering comprises:
C1: randomly selecting a complex core C from the result queue $Q_2$ as the initial position of the ant;
C2: calculating the fuzzy granularity of each node u within the ant's neighborhood, picking up a neighbor node that satisfies the condition, advancing to that node, and updating the complex core and the ant's neighborhood; if no neighbor node satisfies the condition, skipping step C3 and going directly to step C4;
C3: judging whether the ant's load has reached its maximum; if not, repeating step C2 and continuing to cluster nodes in the ant's new neighborhood; if so, going to step C4;
C4: obtaining the initial clustering result corresponding to the complex core C, deleting C from the result queue $Q_2$, and checking whether $Q_2$ is empty; if it is not empty, randomly selecting a complex core as the ant's initial position and returning to step C2 to start a new round of search; if it is empty, going to step C5;
C5: calculating the compactness between each node u and the complex PC, removing the nodes whose compactness is less than 1 to obtain the complex PC, and outputting the complex set CS.
7. The method of claim 6,
the fuzzy granularity is calculated by the following formula 8:
Figure FDA0002646417340000041
where CWM(u, v) denotes the comprehensive weight measure of proteins u and v, A(u) denotes the fuzzy granularity of node u within the ant's neighborhood, |C| is the number of nodes in the complex core C, and α is the dissimilarity factor;
the compactness is calculated by the following formula 9:
Figure FDA0002646417340000042
where CD(u, PC) denotes the compactness between node u and complex PC, $d_{in}(u, v_1)$ denotes the weight of the edge connecting protein u to another protein $v_1$ inside complex PC, and $d_{out}(u, v_2)$ denotes the weight of the edge connecting protein u to a protein $v_2$ outside complex PC.
8. The method according to claim 1 or 7,
local weight update is performed according to the following equation 10:
$CWM(u, v) = (1 + PC_{uv}) \cdot CWM(u, v)$ (formula 10)
where CWM(u, v) denotes the comprehensive weight measure of proteins u and v, and the enhancement factor $PC_{uv}$ denotes the probability that proteins u and v share a complex in the optimal solution of the previous iteration.
9. The method of claim 8,
the enhancement factor $PC_{uv}$ is calculated by the following formula 11:
Figure FDA0002646417340000051
where $C_u$ and $C_v$ are the sets of complexes to which proteins u and v respectively belong, and $C_u \cap C_v$ is the set of complexes that contain both u and v.
10. The method of claim 1,
global weight update is performed according to the following equation 12:
Figure FDA0002646417340000052
where CWM(u, v) denotes the comprehensive weight measure of proteins u and v,
Figure FDA0002646417340000053
and
Figure FDA0002646417340000054
denote the numbers of times that proteins u and v occur in the same complex in the optimal solutions of the instantaneous networks at times $T_{i-1}$ and $T_i$, respectively; $0 \le \alpha, \beta \le 1$;
Figure FDA0002646417340000055
and β are constants.
11. The method of claim 8,
global weight update is performed according to the following equation 12:
Figure FDA0002646417340000056
where CWM(u, v) denotes the comprehensive weight measure of proteins u and v,
Figure FDA0002646417340000057
and
Figure FDA0002646417340000058
denote the numbers of times that proteins u and v occur in the same complex in the optimal solutions of the instantaneous networks at times $T_{i-1}$ and $T_i$, respectively; $0 \le \alpha, \beta \le 1$;
Figure FDA0002646417340000059
and β are constants.
CN201811145616.2A 2018-09-29 2018-09-29 Protein compound mining method based on dynamic weighted protein interaction network Active CN109509509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811145616.2A CN109509509B (en) 2018-09-29 2018-09-29 Protein compound mining method based on dynamic weighted protein interaction network

Publications (2)

Publication Number Publication Date
CN109509509A CN109509509A (en) 2019-03-22
CN109509509B true CN109509509B (en) 2020-12-22

Family

ID=65746318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811145616.2A Active CN109509509B (en) 2018-09-29 2018-09-29 Protein compound mining method based on dynamic weighted protein interaction network

Country Status (1)

Country Link
CN (1) CN109509509B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517729B (en) * 2019-09-02 2021-05-04 Jilin University Method for excavating protein compound from dynamic and static protein interaction network
CN111128301A (en) * 2019-12-06 2020-05-08 Beibu Gulf University Overlapped protein compound identification method based on fuzzy clustering
CN111667886B (en) * 2020-04-22 2023-04-18 Dalian University of Technology Dynamic protein compound identification method
CN112506999B (en) * 2020-12-17 2021-07-16 Fujian Apex Software Co., Ltd. Cloud computing and artificial intelligence based big data mining method and digital content server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006107864A1 (en) * 2005-04-04 2006-10-12 Blueshift Biotechnologies, Inc. Screening using polarization anisotropy in fret emissions
CN102176223A (en) * 2011-01-12 2011-09-07 Central South University Protein complex identification method based on key protein and local adaptation
CN105590039A (en) * 2015-03-05 2016-05-18 Central China Normal University Method for identifying protein complex based on BSO (Brain Storm Optimization)
CN105868582A (en) * 2016-03-25 2016-08-17 Shaanxi Normal University A method of identifying protein compounds by using a fruit fly optimization method
CN107784196A (en) * 2017-09-29 2018-03-09 Shaanxi Normal University Method based on Artificial Fish Swarm Optimization Algorithm identification key protein matter

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
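Clustering PPI Data Based on Ant Colony Optimization Algorithm; LEI Xiujuan et al.; Chinese Journal of Electronics; 20130131; Vol. 22, No. 1; 118-123 *
Complex mining method for dynamic PPI networks based on topological potential weighting; LEI Xiujuan et al.; Acta Electronica Sinica; 20180131; Vol. 46, No. 1; 145-151 *
Complex identification in dynamic PPI networks combining time-sequence-preserving features and ant colony clustering; ZHAO Xuewu et al.; Journal of Chinese Computer Systems; 20170630; Vol. 38, No. 6; 1311-1316 *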

Also Published As

Publication number Publication date
CN109509509A (en) 2019-03-22

Similar Documents

Publication Publication Date Title
CN109509509B (en) Protein compound mining method based on dynamic weighted protein interaction network
Li et al. An ant colony optimization based dimension reduction method for high-dimensional datasets
CN108733976B (en) Key protein identification method based on fusion biology and topological characteristics
CN108319812B (en) Method for identifying key protein based on cuckoo search algorithm
Zhang et al. Protein complex prediction in large ontology attributed protein-protein interaction networks
CN107885971B (en) Method for identifying key protein by adopting improved flower pollination algorithm
CN104992078B (en) A kind of protein network complex recognizing method based on semantic density
Ribeiro et al. Efficient parallel subgraph counting using g-tries
CN106372458A (en) Critical protein identification method based on NCCO (Neighbor Closeness Centrality and Orthology) information
Pizzuti et al. A coclustering approach for mining large protein-protein interaction networks
CN109326328B (en) Pedigree clustering-based ancient organism pedigree evolution analysis method
CN111145830A (en) Protein function prediction method based on network propagation
Džeroski et al. Analysis of time series data with predictive clustering trees
CN113539479B (en) Similarity constraint-based miRNA-disease association prediction method and system
Ji et al. ACC–FMD: ant colony clustering for functional module detection in protein–protein interaction networks
Zhao et al. I/O-efficient calculation of H-group closeness centrality over disk-resident graphs
Hvidsten A tutorial-based guide to the ROSETTA system: A Rough Set Toolkit for Analysis of Data
Wu et al. Algorithms for detecting protein complexes in PPI networks: an evaluation study
Tan et al. Combining multiple types of biological data in constraint-based learning of gene regulatory networks
Oucheikh et al. Data Clustering using Two-Stage Eagle Strategy Based on Slime Mould Algorithm
CN115631799B (en) Sample phenotype prediction method and device, electronic equipment and storage medium
Wang et al. Detecting Protein Complexes by an Improved Affinity Propagation Algorithm in Protein-Protein Interaction Networks.
Zhou et al. Protein Complex Identification Based on Heterogeneous Protein Information Network
Carter et al. Deployment and retrieval simulation of a single tether satellite system
Zhou et al. Heterogeneous PPI network representation learning for protein complex identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant