CN107798215B - PPI-based network hierarchy prediction function module and function method - Google Patents

Publication number
CN107798215B
CN107798215B CN201711153530.XA CN201711153530A CN107798215B CN 107798215 B CN107798215 B CN 107798215B CN 201711153530 A CN201711153530 A CN 201711153530A CN 107798215 B CN107798215 B CN 107798215B
Authority
CN
China
Prior art keywords
hierarchical structure
network
structure tree
tree
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711153530.XA
Other languages
Chinese (zh)
Other versions
CN107798215A (en)
Inventor
刘维
马良玉
陈昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University
Priority to CN201711153530.XA
Publication of CN107798215A
Application granted
Publication of CN107798215B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention relates to a method for predicting functional modules and interactions based on the hierarchical structure of a protein-protein interaction (PPI) network. The technical scheme comprises: inputting a PPI network and biological information; constructing a hierarchical structure tree T from the protein interaction network; calculating the likelihood value of the protein interaction network; encoding the hierarchical structure tree T; searching for the hierarchical structure tree T with the maximum likelihood value by means of a genetic algorithm; and mining functional modules and predicting interactions. The invention overcomes the poor performance of existing methods on sparse, low-density PPI networks and the difficulties caused by randomness. Functional modules are mined and interactions are predicted from the maximum-likelihood hierarchical structure tree T, so that both tasks are accomplished together through the likelihood calculation of the network.

Description

PPI-based network hierarchy prediction function module and function method
Technical Field
The invention belongs to the technical field of bioinformatics. It mainly relates to a technology for mining functional modules and predicting interactions in a protein interaction network by means of a network hierarchical structure analysis algorithm, and in particular to a method for predicting functional modules and interactions based on the network hierarchical structure of a PPI network.
Background
Protein-protein interaction (PPI) networks play an important role in life activities and have important application value in areas such as the study of living organisms, drug target design, and disease treatment and prediction. Although some progress has been made in mining functional modules in protein interaction networks, the high complexity and randomness of living systems mean that methods that are highly successful in other fields do not always achieve ideal results in PPI network analysis, so the accuracy of protein prediction is low.
Before the present invention, most existing methods relied on the density of the protein network: closely connected functional regions in the PPI network are detected by calculating density, the node with the maximum local neighborhood density is selected as an initial functional module, and this node is then expanded outwards to form the final functional module. The shortcomings of this kind of functional module mining and interaction prediction are: (1) existing methods can effectively detect high-density functional modules, but perform poorly on sparse, low-density PPI networks; (2) owing to the high complexity and randomness of living systems, mining functional modules by calculating network density does not always give ideal results, and because the interactions and connections between proteins in a PPI network are subject to randomness, the optimal solution is even harder to obtain.
Disclosure of Invention
The invention aims to overcome the above defects by developing a method for predicting functional modules and interactions based on the hierarchical structure of a PPI network.
The technical scheme of the invention is as follows:
The method for predicting functional modules and interactions based on the PPI network hierarchical structure is mainly characterized by comprising the following steps:
(1) inputting a PPI network and biological information;
(2) constructing a hierarchical structure tree T from the protein interaction network;
(3) calculating the likelihood value of the protein interaction network: the likelihood value corresponding to the original network G is obtained from the combination of the hierarchical structure tree T and the probability values assigned to its internal nodes;
(4) encoding the hierarchical structure tree T: the tree is encoded by in-order traversal, i.e. the left child node is traversed first, then the root node, and finally the right child node;
(5) finding the hierarchical structure tree T with the maximum likelihood value by a genetic algorithm: a pair of individuals that have not yet been crossed is selected with a given probability for the crossover operation, and one individual is selected with a given probability for the mutation operation;
(6) functional module mining and interaction prediction: the modularity of each module is calculated from the maximum-likelihood hierarchical structure tree T, the functional modules are mined, and the interaction probabilities are obtained.
In the likelihood calculation of step (3): with the hierarchical structure tree T constructed from the PPI network in step (2), the interaction probability between protein vertices is readily obtained from the number of edges in the network whose two endpoints have a given internal node as their nearest common ancestor; this simplifies the calculation of the likelihood value of the network.
In the genetic algorithm of step (5) for finding the maximum-likelihood hierarchical structure tree T: a pair of individuals that have not yet been crossed is selected with a given probability for the crossover operation, and one individual is selected with a given probability for the mutation operation; at the same time a global search modelled on biological evolution is carried out to find the maximum-likelihood hierarchical structure tree T, from which the modularity value of each module is obtained.
The advantages and effects of the method are as follows. Functional modules are mined and interactions are predicted from the maximum-likelihood hierarchical structure tree T, so that both are achieved together through the likelihood calculation of the network. The corresponding biological information is fused with the network topology, which makes the prediction results more accurate and more reliable. The method can also describe the hierarchical structure of the network completely and reflect the internal relationships among the network nodes. The hierarchical structure tree T with the maximum likelihood value for a network is found by a genetic algorithm, which avoids many unnecessary density calculations; the division into functional modules is obtained from the hierarchical division of the tree, and the probability that two nodes interact is obtained from their common ancestor in the tree. This improves the efficiency of protein function mining and interaction prediction and extends the application range and practicability of the technology in bioinformatics.
Network hierarchical structure analysis, as used in the invention, is a modular analysis of the protein interaction network: a hierarchical structure tree T is first constructed from the protein interaction network, the hierarchical structure tree T with the maximum likelihood value is then obtained by a genetic algorithm, and finally the tree is divided by layer to mine the functional modules it contains.
Drawings
FIG. 1 is a schematic diagram of the functional module mining and action prediction process of the present invention.
FIG. 2 compares the present invention with the MCODE method in identifying the Rpd3S functional module; (a) is the functional module identified by the MCODE algorithm and (b) is the functional module identified by the present invention, where the black circles mark the real functional module.
FIG. 3 compares the prediction performance of the four algorithms; Pr denotes precision, Sn denotes sensitivity, Acc denotes accuracy, and FMM-HS denotes the method of the present invention.
Detailed Description
The technical idea of the invention is as follows:
The invention provides an efficient method for functional module mining and interaction prediction that exploits the deep-mining capability of a network hierarchical structure analysis algorithm. A hierarchical structure tree T is first constructed from a given protein interaction network; the hierarchical structure tree T with the maximum likelihood value is then obtained by a genetic algorithm; the tree is then divided by layer, and the optimal division scheme is selected according to the modularity value. Network hierarchical structure analysis helps in understanding the function of unknown proteins, is significant for explaining the molecular mechanisms of specific functions, and can provide an important theoretical basis for drug target design and similar work. A method based on network hierarchical analysis is naturally suited to detecting protein functional modules, and at the same time enables the prediction of interactions.
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Step 1: inputting PPI networks and biological information
Step 2: constructing a hierarchical tree T according to a protein interaction network
In a protein interaction network, the connections between proteins can be represented as an undirected graph G(V, E), whose hierarchy can be represented by a hierarchical structure tree T, where V is the set of N proteins in the undirected graph and E is the set of interactions between them. The N nodes are the leaf nodes of the tree and are connected into a binary tree by N−1 non-leaf nodes. Each non-leaf node r is assigned a probability value $P_r$, and the probability of an edge between any pair of leaf nodes equals the probability value $P_r$ of their nearest common ancestor r.
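The structure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Internal`, the helper names, the toy protein names and the probability values are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Internal:
    """A non-leaf node r of the tree T, carrying its probability P_r."""
    left: "Node"
    right: "Node"
    p: float


Node = Union[str, Internal]  # a leaf is simply a protein name


def leaves(node):
    """Set of leaf names below (or equal to) the given node."""
    if isinstance(node, str):
        return {node}
    return leaves(node.left) | leaves(node.right)


def edge_probability(root, u, v):
    """P_r of the nearest common ancestor r of two distinct leaves u and v."""
    if u == v:
        raise ValueError("u and v must be different leaves")
    node = root
    while True:
        u_left = u in leaves(node.left)
        v_left = v in leaves(node.left)
        if u_left != v_left:      # u and v separate here: node is the ancestor
            return node.p
        node = node.left if u_left else node.right


# Toy tree with N = 5 leaves and N - 1 = 4 internal nodes (made-up values).
tree = Internal(
    Internal("a", "b", 1.0),
    Internal("c", Internal("d", "e", 0.5), 0.75),
    0.1,
)
print(edge_probability(tree, "a", "d"))  # the pair separates at the root
```

Recomputing the leaf sets at every step keeps the sketch short; a real implementation would more likely cache them or store parent pointers.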
Step 3: likelihood value calculation for the protein interaction network
Given an undirected network G and a hierarchical structure tree T (with the probabilities at the internal vertices still unknown), for each internal node r let $P_r$ denote the probability value at that vertex, let $E_r$ denote the number of edges in G whose two endpoints have exactly r as their nearest common ancestor (that is, edges joining a leaf of the left subtree of r to a leaf of its right subtree), and let $L_r$ and $R_r$ denote the numbers of leaf nodes in the left and right subtrees of r. For the combination of the hierarchical structure tree T and the probability values $\{P_r\}$ assigned to the internal nodes, the likelihood value corresponding to the original network G is:
$$L(T, \{P_r\}) = \prod_{r} P_r^{E_r} \, (1 - P_r)^{L_r R_r - E_r}$$
To make $L(T, \{P_r\})$ reach its maximum, set
$$\frac{\partial \ln L(T, \{P_r\})}{\partial P_r} = 0 \quad (r = 1, 2, \ldots, N-1),$$
i.e.
$$\frac{E_r}{P_r} - \frac{L_r R_r - E_r}{1 - P_r} = 0,$$
which can be solved to obtain
$$P_r = \frac{E_r}{L_r R_r}.$$
This combination of $\{P_r\}$ values maximizes the likelihood value.
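As a hedged illustration of this step, the sketch below computes $E_r$, $L_r$, $R_r$, the maximizing value $P_r = E_r / (L_r R_r)$ and the resulting log-likelihood for a toy network. It assumes the likelihood has the product form $\prod_r P_r^{E_r}(1-P_r)^{L_r R_r - E_r}$ used for hierarchical random graphs. The nested-tuple tree representation, the function names and the toy edges are assumptions made for the example.

```python
import math


def leaves(tree):
    """Leaf names of a nested-pair tree; a leaf is a plain string."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def log_likelihood(tree, edges):
    """Return (ln L, stats), where stats lists (E_r, L_r, R_r, P_r) per internal node.

    `edges` is a set of frozensets {u, v}. Each P_r is set to its maximizing
    value E_r / (L_r * R_r).
    """
    if isinstance(tree, str):
        return 0.0, []
    left, right = leaves(tree[0]), leaves(tree[1])
    pairs = len(left) * len(right)                      # L_r * R_r
    e_r = sum(frozenset((u, v)) in edges for u in left for v in right)
    p_r = e_r / pairs
    term = 0.0                                          # P_r in {0, 1} contributes ln 1 = 0
    if 0.0 < p_r < 1.0:
        term = e_r * math.log(p_r) + (pairs - e_r) * math.log(1.0 - p_r)
    ll_left, stats_left = log_likelihood(tree[0], edges)
    ll_right, stats_right = log_likelihood(tree[1], edges)
    stats = [(e_r, len(left), len(right), p_r)] + stats_left + stats_right
    return term + ll_left + ll_right, stats


# Toy network of five proteins (hypothetical data).
edges = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]}
tree = (("a", "b"), ("c", ("d", "e")))
ll, stats = log_likelihood(tree, edges)
print(round(ll, 4), stats[0])
```

Working with the logarithm avoids numerical underflow on larger networks and does not change which tree has the largest likelihood.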
Step 4: encoding the hierarchical structure tree T:
In the algorithm, each individual is a hierarchical structure tree, so the tree T must first be encoded. The hierarchical structure tree T (a binary tree) is encoded by in-order traversal. The internal nodes are assigned label values from 1 to n−1: the root node is given 1, and for each internal node, the label value of any child that is itself an internal node is greater than its own label value. The leaf nodes are given the values $v_1, \ldots, v_n$, corresponding to the network vertices $v_1, \ldots, v_n$.
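The encoding can be sketched as follows. The decoder relies on one reading of the labelling rule, stated here as an assumption: since every internal child carries a larger label than its parent, the smallest label in any segment of the code is the root of that segment. Leaves are represented as strings and labels as integers, which is a choice made for this example.

```python
def encode(node):
    """In-order traversal: left subtree, then the node's label, then right subtree."""
    if isinstance(node, str):
        return [node]
    label, left, right = node
    return encode(left) + [label] + encode(right)


def decode(code):
    """Rebuild the tree from its code, taking the smallest label as the root."""
    if len(code) == 1:
        return code[0]
    root = min(x for x in code if isinstance(x, int))
    i = code.index(root)
    return (root, decode(code[:i]), decode(code[i + 1:]))


# The individual written a2b4d3e1c in the crossover example of step 5.
code = ["a", 2, "b", 4, "d", 3, "e", 1, "c"]
tree = decode(code)
print(tree)
```

In a code of length 2n−1 the leaves occupy the odd positions and the internal labels the even positions (counting from 1), which is what makes the position-wise genetic operators of step 5 workable.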
Step 5: genetic algorithm for finding the hierarchical structure tree T with the maximum likelihood value:
The genetic algorithm generates an initial population of m individuals according to the encoding rule, then applies crossover, mutation and selection operations to produce each new generation, using the likelihood value as the fitness value on which individual selection is based.
Crossover operation: given two individuals $S_1 = (s_1, s_2, \ldots, s_{2n-1})$ and $S_2 = (r_1, r_2, \ldots, r_{2n-1})$, two positions $l_1$, $l_2$ with $1 \le l_1 < l_2 \le 2n-1$ are chosen at random, and the segment $(s_{l_1}, s_{l_1+1}, \ldots, s_{l_2})$ of $S_1$ is exchanged with the segment $(r_{l_1}, r_{l_1+1}, \ldots, r_{l_2})$ of $S_2$ to obtain
$$S_1' = (s_1, \ldots, s_{l_1-1}, r_{l_1}, \ldots, r_{l_2}, s_{l_2+1}, \ldots, s_{2n-1})$$
$$S_2' = (r_1, \ldots, r_{l_1-1}, s_{l_1}, \ldots, s_{l_2}, r_{l_2+1}, \ldots, r_{2n-1})$$
Let the exchange region in $S_1$ be $W_1 = \{s_{l_1}, \ldots, s_{l_2}\}$ and the exchange region in $S_2$ be $W_2 = \{r_{l_1}, \ldots, r_{l_2}\}$.
After the exchange, in $S_1'$ the elements of the set $W_1 \cup W_2 - W_1$ are duplicated, while the elements of the set $W_1 \cup W_2 - W_2$ are missing. Therefore, outside the exchange region of $S_1'$, the elements of $W_1 \cup W_2 - W_1$ must be replaced one by one with the elements of $W_1 \cup W_2 - W_2$.
Similarly, in $S_2'$ the elements of the set $W_1 \cup W_2 - W_2$ are duplicated, while the elements of the set $W_1 \cup W_2 - W_1$ are missing. Therefore, outside the exchange region of $S_2'$, the elements of $W_1 \cup W_2 - W_2$ must be replaced one by one with the elements of $W_1 \cup W_2 - W_1$.
For example, let $S_1$ = a2b4d3e1c, $S_2$ = d4e1a3c2b, $l_1 = 4$ and $l_2 = 6$. Then $W_1 = \{4, d, 3\}$ and $W_2 = \{1, a, 3\}$, and the exchange yields $S_1'$ = a2b1a3e1c and $S_2'$ = d4e4d3c2b, with $W_1 \cup W_2 - W_1 = \{1, a\}$ and $W_1 \cup W_2 - W_2 = \{4, d\}$. That is, in $S_1'$ the elements {1, a} are repeated while {4, d} are missing, and in $S_2'$ the elements {4, d} are repeated while {1, a} are missing. Outside the exchange region of $S_1'$, a is changed to d and 1 to 4, giving $S_1''$ = d2b1a3e4c; outside the exchange region of $S_2'$, d is changed to a and 4 to 1, giving $S_2''$ = a1e4d3c2b. Both $S_1''$ and $S_2''$ are then legal codes.
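The crossover with repair can be sketched as below and checked against the worked example. The text does not say how duplicated and missing elements are paired when there are several of them, so the pairing rule used here is an assumption: leaf with leaf and label with label, in order of appearance. The helpers `parse` and `show` are hypothetical and only handle single-character symbols.

```python
def crossover(s1, s2, l1, l2):
    """Swap positions l1..l2 (1-based, inclusive) of two codes and repair both children."""
    a, b = l1 - 1, l2
    w1, w2 = s1[a:b], s2[a:b]

    def repair(parent, incoming, outgoing):
        duplicated = [x for x in incoming if x not in outgoing]
        missing = [x for x in outgoing if x not in incoming]
        mapping = {}
        for kind in (str, int):          # leaves are str, internal labels are int
            dup = [x for x in duplicated if isinstance(x, kind)]
            mis = [x for x in missing if isinstance(x, kind)]
            mapping.update(zip(dup, mis))
        # Replace duplicates only outside the exchange region.
        head = [mapping.get(x, x) for x in parent[:a]]
        tail = [mapping.get(x, x) for x in parent[b:]]
        return head + incoming + tail

    return repair(s1, w2, w1), repair(s2, w1, w2)


def parse(text):
    """Turn a string such as 'a2b4d3e1c' into a list of leaves and integer labels."""
    return [int(ch) if ch.isdigit() else ch for ch in text]


def show(code):
    return "".join(str(x) for x in code)


child1, child2 = crossover(parse("a2b4d3e1c"), parse("d4e1a3c2b"), 4, 6)
print(show(child1), show(child2))
```

Because both parents keep leaves and labels at the same positions, the two exchanged segments always contain the same number of each kind, so the lists being paired have equal length.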
Mutation operation: given $S = (s_1, s_2, \ldots, s_{2n-1})$, two positions $l_1$, $l_2$ are selected with $1 \le l_1 < l_2 \le 2n-1$ and $l_1 - l_2$ even (i.e., $l_1$ and $l_2$ have the same parity), and $s_{l_1}$ and $s_{l_2}$ are exchanged to obtain $S'$, i.e.
$$S' = (s_1, \ldots, s_{l_1-1}, s_{l_2}, s_{l_1+1}, \ldots, s_{l_2-1}, s_{l_1}, s_{l_2+1}, \ldots, s_{2n-1})$$
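A minimal sketch of the mutation operation follows. Drawing the position pair uniformly from all same-parity pairs is an assumption, since the text does not state how the two positions are selected; the function name and the `rng` parameter are hypothetical.

```python
import random


def mutate(code, rng=random):
    """Return a copy of `code` with two same-parity positions swapped."""
    size = len(code)
    # All index pairs (i, j) with i < j and j - i even, so that a leaf is
    # only ever swapped with a leaf and a label only with a label.
    pairs = [(i, j) for i in range(size) for j in range(i + 2, size, 2)]
    i, j = rng.choice(pairs)
    child = list(code)
    child[i], child[j] = child[j], child[i]
    return child


parent = ["a", 2, "b", 4, "d", 3, "e", 1, "c"]
child = mutate(parent, random.Random(0))
print(child)
```

The parity condition is what keeps the mutated code legal: leaves stay on odd positions and internal labels on even positions.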
Step 6: functional module mining and role prediction
Functional module mining is performed on the maximum-likelihood hierarchical structure tree T. First, each vertex in T is labelled with a layer number: the root node is on the first layer, the children of the root node are on the second layer, and so on; in general, the children of a node on the i-th layer are on the (i+1)-th layer. Suppose the functional modules are divided at the k-th layer, and let the k-th layer contain $n_k$ internal nodes, corresponding to $n_k$ subtrees. The leaf nodes of each subtree constitute a functional module, so that the nodes of the network G are divided into $n_k$ modules. To determine the optimal division, k is set to 2, 3, …, $k_{max}$ in turn, where $k_{max}$ is the maximum number of layers; this yields one division scheme for each value of k. The modularity value of each scheme is calculated, and the scheme with the maximum modularity is the required result.
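The layer-wise division can be sketched as follows. Two points are assumptions rather than statements from the patent: modularity is computed with Newman's formula $Q = \sum_c [e_c/m - (d_c/2m)^2]$, because the text does not give one, and a leaf lying above layer k is treated as a module of its own. The tree representation, function names and toy network are likewise hypothetical.

```python
def leaves(tree):
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def height(tree):
    """Number of layers, counting the root as layer 1."""
    return 1 if isinstance(tree, str) else 1 + max(height(tree[0]), height(tree[1]))


def modules_at_layer(tree, k, depth=1):
    """Leaf sets of the subtrees rooted on layer k."""
    if isinstance(tree, str):
        return [{tree}]                  # assumption: a lone leaf is its own module
    if depth == k:
        return [set(leaves(tree))]
    return (modules_at_layer(tree[0], k, depth + 1)
            + modules_at_layer(tree[1], k, depth + 1))


def modularity(modules, edges):
    """Newman modularity of a partition; `edges` is a list of (u, v) pairs."""
    m = len(edges)
    q = 0.0
    for mod in modules:
        inside = sum(1 for u, v in edges if u in mod and v in mod)
        degree = sum((u in mod) + (v in mod) for u, v in edges)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def best_partition(tree, edges):
    schemes = [modules_at_layer(tree, k) for k in range(2, height(tree) + 1)]
    return max(schemes, key=lambda mods: modularity(mods, edges))


# Two triangles joined by a single edge (hypothetical data).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
tree = ((("a", "b"), "c"), (("d", "e"), "f"))
best = best_partition(tree, edges)
print(best, round(modularity(best, edges), 4))
```

On this toy network the division at layer 2 separates the two triangles and has the highest modularity of the three candidate layers.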
When protein interactions are predicted from the maximum-likelihood hierarchical structure tree T, each internal node r of T carries a probability $P_r$. For each vertex pair $v_i$, $v_j$, once their nearest common ancestor r is found in the tree, $P_r$ is the probability that they interact.
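One way to turn this into a ranked list of candidate interactions is sketched below. Instead of looking up the nearest common ancestor pair by pair, it visits each internal node r once and assigns $P_r$ to every pair formed by one leaf from its left subtree and one from its right subtree. It assumes $P_r$ is the fraction of such pairs that are observed edges; the function name and toy data are hypothetical.

```python
def predicted_probabilities(tree, edges):
    """Map every leaf pair (a frozenset) to P_r of its nearest common ancestor r."""
    def walk(node):
        if isinstance(node, str):
            return [node], {}
        left, probs_left = walk(node[0])
        right, probs_right = walk(node[1])
        cross = [frozenset((u, v)) for u in left for v in right]
        p_r = sum(pair in edges for pair in cross) / len(cross)
        probs = {**probs_left, **probs_right}
        probs.update({pair: p_r for pair in cross})
        return left + right, probs

    return walk(tree)[1]


# Toy network of four proteins (hypothetical data).
edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("c", "d")]}
tree = ((("a", "b"), "c"), "d")
probs = predicted_probabilities(tree, edges)

# Rank the pairs that are not observed edges by their predicted probability.
ranked = sorted((pair for pair in probs if pair not in edges), key=probs.get, reverse=True)
print([(sorted(pair), probs[pair]) for pair in ranked])
```

In this toy case the unobserved pair b, c receives the highest score because its nearest common ancestor joins a densely connected part of the tree.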
Example:
Each predicted functional module is compared with the reference functional modules. The degree of matching between a predicted functional module and a reference functional module is measured by the overlap ratio (OR), calculated as follows:
OR=2×O/(A+B)
where O is the number of proteins shared by the predicted functional module and the reference functional module, A is the number of proteins in the predicted functional module, and B is the number of proteins in the reference functional module. The overlap ratio lies between 0 and 1: OR = 0 indicates that the predicted and reference functional modules have no protein in common, OR = 1 indicates that they are identical, and a larger overlap ratio indicates a higher degree of matching and hence greater significance of the mined module. A reasonable threshold should ensure sufficient similarity without being overly strict; here the threshold is set to 0.4, and a prediction whose overlap ratio reaches this threshold is considered successful.
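The overlap ratio and the 0.4 threshold translate directly into code. In the sketch below the function names and the protein identifiers are hypothetical; the comparison uses "greater than or equal to", following the definition of TP given later in the text.

```python
def overlap_ratio(predicted, reference):
    """OR = 2 * O / (A + B) for two sets of protein identifiers."""
    shared = len(predicted & reference)
    return 2 * shared / (len(predicted) + len(reference))


def is_successful(predicted, references, threshold=0.4):
    """True if the predicted module matches at least one reference module."""
    return any(overlap_ratio(predicted, ref) >= threshold for ref in references)


predicted = {"P1", "P2", "P3"}
references = [{"P2", "P3", "P4", "P5"}, {"P6", "P7"}]
print(overlap_ratio(predicted, references[0]))
print(is_successful(predicted, references))
```

With two shared proteins out of module sizes 3 and 4, the ratio is 4/7, which clears the threshold.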
Compared with the MCODE, CFinder and ClusterONE algorithms, the proposed algorithm FMM-HS identifies some functional modules more accurately. For example, only FMM-HS completely identifies the Rpd3S functional module, as shown in FIG. 2(b). The functional module identified by the MCODE algorithm is shown in FIG. 2(a), where the black circles mark the real functional module; the other two methods cannot accurately identify this functional module.
In order to evaluate the effectiveness of the FMM-HS algorithm, three indexes of Precision (Precision), Sensitivity (Sensitivity) and Accuracy (Accuracy) are used as evaluation parameters.
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
Here TP is the number of identified functional modules whose overlap ratio with a reference functional module is greater than or equal to 0.4; FP is the number of non-modules incorrectly predicted as functional modules, equal to the total number of identified functional modules minus TP; FN is the number of functional modules incorrectly predicted as non-modules; and TN is the number of non-modules correctly predicted as non-modules. FIG. 3 shows the prediction performance of each algorithm, where Pr denotes precision, Sn denotes sensitivity, and Acc denotes accuracy. As can be seen from FIG. 3, the proposed algorithm FMM-HS outperforms the other three methods, MCODE, CFinder and ClusterONE. Four yeast interaction network data sets were selected: Gavin, Krogan core, Collins and BioGRID. The results are shown in Table 1:
Table 1: Accuracy of the four methods on the 4 data sets
[Table image BSA0000154071160000071 in the original; its values are not reproduced here]
"N/A" for the CFinder algorithm indicates that no result was obtained after 24 hours of running on the BioGRID data set. The proposed method can exhibit advantages on different data sets.

Claims (2)

1. A method for predicting functional modules and interactions based on a PPI network hierarchical structure, characterized by comprising the following steps:
(1) inputting a PPI network and biological information;
(2) constructing a hierarchical structure tree T from the protein interaction network;
(3) calculating the likelihood value of the protein interaction network: the likelihood value corresponding to the original network G is obtained from the combination of the hierarchical structure tree T and the probability values assigned to its internal nodes;
(4) encoding the hierarchical structure tree T: the tree is encoded by in-order traversal, i.e. the left child node is traversed first, then the root node, and finally the right child node;
(5) finding the hierarchical structure tree T with the maximum likelihood value by a genetic algorithm: a pair of individuals that have not yet been crossed is selected with a given probability for the crossover operation, and one individual is selected with a given probability for the mutation operation;
(6) functional module mining and interaction prediction: the modularity of each module is calculated from the maximum-likelihood hierarchical structure tree T, the functional modules are mined, and the interaction probabilities are obtained.
2. The method of claim 1, wherein in step (5) the genetic algorithm for finding the maximum-likelihood hierarchical structure tree T comprises: selecting, with a given probability, a pair of individuals that have not yet been crossed for the crossover operation, and selecting, with a given probability, one individual for the mutation operation; and at the same time carrying out a global search modelled on biological evolution, whose convergence is used to find the maximum-likelihood hierarchical structure tree T, from which the modularity value of each module is obtained.
CN201711153530.XA 2017-11-15 2017-11-15 PPI-based network hierarchy prediction function module and function method Active CN107798215B (en)


Publications (2)

Publication Number Publication Date
CN107798215A (en) 2018-03-13
CN107798215B (en) 2021-07-23

Family

ID=61536294


Country Status (1)

Country Link
CN (1) CN107798215B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681659B (en) * 2018-04-02 2022-04-05 首都师范大学 Method for predicting protein complex based on sample data
CN108647490B (en) * 2018-05-04 2022-06-17 安徽大学 Large-scale protein functional module identification method and system based on multi-objective evolutionary algorithm
CN109376842B (en) * 2018-08-20 2022-04-05 安徽大学 Functional module mining method based on attribute optimization protein network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298674A (en) * 2010-06-25 2011-12-28 清华大学 Method for determining medicament target and/or medicament function based on protein network
EP2971290A2 (en) * 2013-03-14 2016-01-20 The Governing Council Of The University Of Toronto Scaffolded peptidic libraries and methods of making and screening the same
CN106909807A (en) * 2017-02-14 2017-06-30 同济大学 A kind of Forecasting Methodology that drug targeting interactions between protein is predicted based on multivariate data
CN107092812A (en) * 2017-03-06 2017-08-25 扬州大学 A kind of method based on genetic algorithm in identification key protein matter in PPI networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2001233276A1 (en) * 2000-02-03 2001-08-14 Immunomatrix Inc. Method and apparatus for signal transduction pathway profiling

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
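Title
"A Novel Link Prediction Algorithm Based on Spatial Mapping in PPI Network"; Qiang-Mei Wu et al.; IDEAL 2016; 20161014; 106-113 *
"Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction"; Daniela Stojanova et al.; BMC Bioinformatics; 20130926; 1-18 *
"Research and Application of Protein Interaction Prediction Based on Network Analysis"; 章月阳; China Master's Theses Full-text Database, Basic Sciences; 20170215; Vol. 2017 (No. 2); A006-495 *
李满生. "Research on Literature Mining Methods, Annotation Systems and Mining Platforms for Protein Interactions". China Doctoral Dissertations Full-text Database, Basic Sciences. 2016, Vol. 2016 (No. 8), A006-136. *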

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant