CN109376842B - Functional module mining method based on attribute optimization protein network - Google Patents
Functional module mining method based on attribute optimization protein network
- Publication number
- CN109376842B (CN201810946353.9A)
- Authority
- CN
- China
- Prior art keywords
- protein
- node
- individual
- population
- pop
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/002—Biomolecular computers, i.e. using biomolecules, proteins, cells
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Abstract
The invention discloses a functional module mining method based on an attribute-optimized protein network. The method comprises the following steps: S1, extracting candidate protein node pairs; S2, initializing the population and the functional module set of each individual in the population from the extracted candidate protein node pairs, and calculating the fitness value of each individual according to the modularity Q_g and the attribute density SA_g; S3, performing crossover and mutation among population individuals to generate an offspring population; S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the differences between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual; S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population; and S6, repeating steps S3 to S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the Pareto-optimal solution set of the population.
Description
Technical Field
The invention relates to the technical field of functional module identification, in particular to a functional module mining method based on an attribute optimization protein network.
Background
Thousands of proteins in an organism form protein modules with a wide variety of functions at different times and in different spatial stages. Among the cellular functions of biological interest, the protein functional module, one of the most basic building blocks, plays a very important role in the binding of the respective gene products. How to mine protein functional modules closely related to biological functions from protein interaction data has therefore become an important step toward uncovering the relationship between protein interactions and biological function. Existing schemes use only the structure of the protein network, so their detection results may not be accurate enough for incomplete protein networks. A functional module identification method that optimizes the protein network with attribute information can therefore mine better protein module combinations and provide more candidate combinations to choose from.
Disclosure of Invention
In view of the technical problems described in the background, the invention provides a functional module mining method based on an attribute-optimized protein network.
The functional module mining method based on an attribute-optimized protein network provided by the invention comprises the following steps:
S1, extracting candidate protein node pairs;
S2, initializing the population and the functional module set of each individual in the population from the extracted candidate protein node pairs, and calculating the fitness value of each individual according to the modularity Q_g and the attribute density SA_g;
S3, performing crossover and mutation among population individuals to generate an offspring population;
S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the differences between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual;
S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population;
S6, repeating steps S3, S4, and S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of each individual is a protein functional module partition set.
Preferably, step S1 specifically includes:
S11, defining the protein network as G = (V, E, A), where V = {v_1, v_2, …, v_i, …, v_n} denotes the set of all protein nodes in the protein network, v_i represents the i-th protein node, and n is the total number of protein nodes;
S12, calculating the attribute similarity S_uv of any two protein nodes u and v [formula omitted in source];
S13, placing the protein node pairs, according to their attribute similarity, into 100 buckets of width 0.01 covering the range [0, 1], and counting the number Buck_i of protein node pairs in each bucket;
S14, sorting the buckets in descending order of Buck_i, where the first bucket corresponds to a value Value_1 in [0, 1] and the second bucket corresponds to a value Value_2 in [0, 1], and taking the average T of Value_1 and Value_2 as the attribute similarity threshold;
S15, taking a node pair (u, v) from the set of protein node pairs; if S_uv is not less than T, adding the node pair (u, v) to the candidate protein node pair set Nodepair, and removing the node pair (u, v) from the set of protein node pairs;
S16, repeating step S15 for the remaining protein node pairs to obtain the extracted candidate protein node pair set Nodepair = {P_1, P_2, ..., P_k}, where P_r represents the r-th protein node pair.
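Steps S13 to S16 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the attribute similarities are taken as a precomputed dictionary because the similarity formula is not available in the text, and the "value" of a bucket is assumed to be its lower bound. The function names `similarity_threshold` and `extract_candidates` are hypothetical.

```python
from collections import Counter

def similarity_threshold(similarities, n_buckets=100):
    """Return T, the mean of the values of the two most populated buckets."""
    counts = Counter()
    for s in similarities.values():
        # Clamp s == 1.0 into the last bucket so indices stay in 0..n_buckets-1.
        idx = min(int(s * n_buckets), n_buckets - 1)
        counts[idx] += 1
    # Bucket indices by descending count; ties go to the lower index.
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    # ASSUMPTION: a bucket's value is its lower bound, idx / n_buckets.
    values = [b / n_buckets for b in ranked[:2]]
    return sum(values) / len(values)

def extract_candidates(similarities, threshold):
    """Keep every node pair whose similarity is not less than the threshold."""
    return [pair for pair, s in similarities.items() if s >= threshold]

sims = {
    ("a", "b"): 0.105, ("a", "c"): 0.107, ("a", "d"): 0.109,
    ("b", "c"): 0.305, ("b", "d"): 0.307, ("c", "d"): 0.905,
}
T = similarity_threshold(sims)
nodepair = extract_candidates(sims, T)
```

In this toy input the two fullest buckets start at 0.10 and 0.30, so T is about 0.2 and three pairs survive as candidates.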
Preferably, step S2 specifically includes:
S201, defining the maximum number of iterations as maxgen, the initial iteration count as t = 1, and the number of individuals in the population as pop, so that the population contains pop individuals {X_1, X_2, …, X_g, …, X_pop}, where X_g represents the g-th individual;
S202, taking a protein node pair (u, v) from the set of protein node pairs, generating a random number R between 0 and 1, and calculating the i-th gene coefficient ζ_i = 0.5 + S_uv − avg(S); if R ≤ ζ_i, the value of the i-th gene of the individual is 1, otherwise it is 0, where S_uv represents the attribute similarity of the protein node pair (u, v), and avg(S) represents the average attribute similarity of the candidate protein node pairs;
S203, repeating step S202 for the remaining protein node pairs in the set until the set is empty, obtaining the code of the individual X = {g_1, g_2, ..., g_i, ..., g_m};
S204, repeating steps S202 and S203 pop times to obtain the initial population codes {X_1, X_2, ..., X_pop};
S205, taking an individual from {X_1, X_2, ..., X_pop} and letting i = 1; if the i-th gene value of the individual is g_i = 1, establishing a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S206, repeating step S205 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S207, repeating steps S205 and S206 for the remaining individuals of the population {X_1, X_2, ..., X_pop} to obtain the corresponding protein networks G = {G_1, G_2, ..., G_pop};
S208, selecting a network G_i = {V, E, A} from G = {G_1, G_2, ..., G_pop} and calculating the node priority of each node in the network [formula omitted in source], where n_i represents the number of edges connecting the neighbor nodes of protein node i, and k represents the number of neighbors of protein node i;
S209, selecting the protein node v with the maximum node priority from V, and calculating the similarity between v and each of its neighbor nodes [formula omitted in source]; selecting the neighbor node u with the maximum similarity, forming a functional module C_i from u, v, and the common neighbors of u and v, removing u, v, and their common neighbors from V, and recalculating the node priority of the nodes in V, where N_r denotes the neighbor nodes of protein node r;
S210, repeating step S209 until [termination condition omitted in source], obtaining the functional module partition of the network;
S211, repeating steps S208, S209, and S210 for the remaining networks in G = {G_1, G_2, ..., G_pop} to obtain pop protein functional module partition sets;
S212, calculating the two objective functions of the g-th individual X_g in the initialized parent population:
the modularity Q_g [formula omitted in source], where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree in the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
the attribute density SA_g [formula omitted in source], where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes within the k-th protein module;
S213, executing step S212 for the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the parent population.
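The encoding in steps S202 to S206 can be sketched as below. This is a hedged illustration only: it reads the gene coefficient as ζ_i = 0.5 + S_uv − avg(S), where the minus sign is this sketch's reading of the formula, and the names `encode_individual` and `build_network` are hypothetical helpers, not names from the patent.

```python
import random

def encode_individual(candidates, similarities, rng):
    """Draw one 0/1 gene per candidate pair (steps S202 and S203)."""
    avg_s = sum(similarities[p] for p in candidates) / len(candidates)
    genes = []
    for pair in candidates:
        # ASSUMPTION: zeta_i = 0.5 + S_uv - avg(S).
        zeta = 0.5 + similarities[pair] - avg_s
        genes.append(1 if rng.random() <= zeta else 0)
    return genes

def build_network(base_edges, candidates, genes):
    """Add edge (u, v) for every gene equal to 1 (steps S205 and S206)."""
    edges = {frozenset(e) for e in base_edges}
    for gene, (u, v) in zip(genes, candidates):
        if gene == 1:
            edges.add(frozenset((u, v)))
    return edges

candidates = [("a", "b"), ("c", "d")]
similarities = {("a", "b"): 1.0, ("c", "d"): 0.0}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
genes = encode_individual(candidates, similarities, rng)
network = build_network([("a", "c")], candidates, genes)
```

Pairs whose similarity is above the candidate average get a coefficient above 0.5 and are therefore more likely to contribute an edge, which is how attribute information biases the initial networks.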
Preferably, step S3 specifically includes:
S31, with t = 1, selecting an individual g and an individual j from the population P by binary tournament, and performing crossover and mutation on individual g and individual j to obtain an offspring individual child;
S32, executing step S31 pop times to obtain the offspring population O = {X_1, X_2, ..., X_pop}.
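A possible shape for step S3 is sketched below. The text does not state how the tournament winner is decided or which crossover and mutation operators are used, so this sketch assumes Pareto dominance (maximization) with a random tie-break, uniform crossover, and bit-flip mutation. All function names and the `mutation_rate` parameter are hypothetical.

```python
import random

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def binary_tournament(population, fitness, rng):
    """Pick two distinct individuals and return the dominating one."""
    i, j = rng.sample(range(len(population)), 2)
    if dominates(fitness[i], fitness[j]):
        return population[i]
    if dominates(fitness[j], fitness[i]):
        return population[j]
    return population[rng.choice((i, j))]  # ASSUMPTION: random tie-break

def make_child(parent_a, parent_b, rng, mutation_rate=0.05):
    """Uniform crossover followed by bit-flip mutation (both assumed)."""
    child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
    return [1 - g if rng.random() < mutation_rate else g for g in child]

def make_offspring(population, fitness, rng, mutation_rate=0.05):
    """Run the tournament and variation pop times (step S32)."""
    return [
        make_child(
            binary_tournament(population, fitness, rng),
            binary_tournament(population, fitness, rng),
            rng,
            mutation_rate,
        )
        for _ in range(len(population))
    ]

rng = random.Random(1)
population = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
fitness = [(0.3, 0.4), (0.5, 0.2), (0.1, 0.1), (0.6, 0.6)]  # (Q_g, SA_g) pairs
offspring = make_offspring(population, fitness, rng)
```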
Preferably, step S4 specifically includes:
S41, taking an individual X_K from the offspring population O = {X_1, X_2, ..., X_pop}, comparing X_K with its corresponding parent individual, finding the protein node pairs among the candidate protein node pairs that correspond to the gene positions whose values have changed, and extracting the protein nodes in those pairs to obtain a protein node set V_cg;
S42, for individual X_K with code X_K = {g_1, g_2, ..., g_i, ..., g_m}, if the i-th gene value of the individual is g_i = 1, establishing a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S43, repeating step S42 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S44, extracting the subgraph of the protein network G_n formed by the protein nodes in V_cg;
S45, sorting the protein nodes in V_cg in increasing order of their number of neighbors in the subgraph, selecting the first protein node v, calculating the modularity change of moving v from its current functional module i to any functional module j [formula omitted in source], adding protein node v to the module k corresponding to the maximum modularity change, and removing the protein node from V_cg, where L represents the total number of edges in the k-th protein network of the offspring population, k_v^r represents the number of neighbors of protein node v in the r-th protein functional module, k_v represents the number of neighbors of protein node v, and K_r represents the total number of the r-th protein functional module;
S46, executing step S45 |V_cg| times to obtain the protein functional module partition set of individual X_K;
S47, executing steps S41, S42, S43, S44, S45, and S46 pop times to obtain the pop protein functional module partition sets of the offspring population;
S48, calculating the two objective functions of the g-th individual X_g in the offspring population:
the modularity Q_g [formula omitted in source], where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree in the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
the attribute density SA_g [formula omitted in source], where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes within the k-th protein module, thereby obtaining the modularity and attribute density corresponding to the pop individuals of the offspring population;
S49, executing step S48 for the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the offspring population.
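Steps S41 and S44 amount to a diff between parent and offspring codes followed by an induced-subgraph extraction. A minimal sketch, with hypothetical helper names `changed_nodes` and `induced_subgraph`:

```python
def changed_nodes(parent_genes, child_genes, candidates):
    """Collect the nodes of every candidate pair whose gene value changed (S41)."""
    v_cg = set()
    for p, c, (u, v) in zip(parent_genes, child_genes, candidates):
        if p != c:
            v_cg.update((u, v))
    return v_cg

def induced_subgraph(edges, nodes):
    """Keep only the edges whose two endpoints both lie in `nodes` (S44)."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]

candidates = [("a", "b"), ("b", "c"), ("c", "d")]
parent = [1, 0, 1]
child = [1, 1, 0]
v_cg = changed_nodes(parent, child, candidates)
sub = induced_subgraph([("a", "b"), ("b", "c"), ("b", "d")], v_cg)
```

Only nodes touched by a changed gene are revisited, so the offspring can reuse the parent's partition for everything else instead of rebuilding it from scratch.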
Preferably, step S5 specifically includes:
merging the parent population and the offspring population to obtain a population P_union, and, using non-dominated sorting and crowding distance under maximization, selecting pop individuals from P_union as the new population P.
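The selection in step S5 reads like the environmental selection of NSGA-II. The sketch below assumes exactly that scheme (fronts filled in rank order, the last front truncated by crowding distance) with both objectives maximized; it is an illustration under that assumption, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(fitness):
    """Peel off successive non-dominated fronts; returns lists of indices."""
    remaining = set(range(len(fitness)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(fitness[j], fitness[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(front, fitness):
    """Crowding distance per index; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(fitness[front[0]])):
        ordered = sorted(front, key=lambda i: fitness[i][m])
        lo, hi = fitness[ordered[0]][m], fitness[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (fitness[nxt][m] - fitness[prev][m]) / (hi - lo)
    return dist

def environmental_selection(fitness, pop):
    """Return the indices of the pop individuals kept from P_union."""
    chosen = []
    for front in non_dominated_fronts(fitness):
        if len(chosen) + len(front) <= pop:
            chosen.extend(front)
        else:
            dist = crowding_distance(front, fitness)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[:pop - len(chosen)])
            break
    return chosen

# (Q_g, SA_g) pairs for a merged population of six individuals.
union_fitness = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9),
                 (0.4, 0.4), (0.2, 0.2), (0.05, 0.05)]
kept = environmental_selection(union_fitness, 3)
```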
Preferably, step S6 specifically includes:
setting t = t + 1 and repeating steps S3, S4, and S5 until t > maxgen, then outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of each individual is a protein functional module partition set.
The invention considers both the attribute information unique to protein nodes and the interactions between proteins. First, by extracting useful attribute information to continuously optimize the structure of the protein network, and by adjusting the module membership of some protein nodes, it obtains combinations of protein functional modules, which improves the accuracy and effectiveness of functional module mining in the protein network and partitions the protein network better. Second, protein node pairs are extracted before evolution, which reduces the individual coding length, so that combinations of protein network functional modules can be obtained quickly and mining efficiency improves. Finally, a multi-objective evolutionary algorithm is used to mine functional modules in the protein network, which exploits the advantages of multi-objective evolutionary algorithms, offers decision makers multiple choices, and diversifies the mining results.
Drawings
Fig. 1 is a schematic flow chart of a functional module mining method for optimizing a protein network based on attributes according to the present invention.
Detailed Description
Referring to fig. 1, the functional module mining method for optimizing a protein network based on attributes provided by the invention comprises the following steps:
Step S1, extracting candidate protein node pairs, which specifically includes:
S11, defining the protein network as G = (V, E, A), where V = {v_1, v_2, …, v_i, …, v_n} denotes the set of all protein nodes in the protein network, v_i represents the i-th protein node, and n is the total number of protein nodes;
S12, calculating the attribute similarity S_uv of any two protein nodes u and v [formula omitted in source];
S13, placing the protein node pairs, according to their attribute similarity, into 100 buckets of width 0.01 covering the range [0, 1], and counting the number Buck_i of protein node pairs in each bucket;
S14, sorting the buckets in descending order of Buck_i, where the first bucket corresponds to a value Value_1 in [0, 1] and the second bucket corresponds to a value Value_2 in [0, 1], and taking the average T of Value_1 and Value_2 as the attribute similarity threshold;
S15, taking a node pair (u, v) from the set of protein node pairs; if S_uv is not less than T, adding the node pair (u, v) to the candidate protein node pair set Nodepair, and removing the node pair (u, v) from the set of protein node pairs;
S16, repeating step S15 for the remaining protein node pairs to obtain the extracted candidate protein node pair set Nodepair = {P_1, P_2, ..., P_k}, where P_r represents the r-th protein node pair.
In this scheme, the individual coding length is the number of node pairs in the protein network, so extracting protein node pairs before evolution reduces the individual coding length. Combinations of protein network functional modules can therefore be obtained quickly, and the efficiency of protein module mining improves.
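To give a sense of scale, suppose (as an assumption of this note, not a statement from the patent) that without extraction every unordered pair of the n protein nodes would need its own gene. The unfiltered coding length can then be compared with the candidate set size k = |Nodepair|. The value n = 1000 is illustrative only.

```latex
m_{\text{all}} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 1000 \;\Rightarrow\; m_{\text{all}} = \frac{1000 \cdot 999}{2} = 499\,500, \qquad
m = k = |\mathrm{Nodepair}| \le m_{\text{all}}.
```

Because only pairs with S_uv ≥ T are kept, the actual code length m is bounded by this figure and shrinks as the threshold T rises.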
Step S2, initializing the population and the functional module set of each individual in the population from the extracted candidate protein node pairs, and calculating the fitness value of each individual, which specifically includes:
S201, defining the maximum number of iterations as maxgen, the initial iteration count as t = 1, and the number of individuals in the population as pop, so that the population contains pop individuals {X_1, X_2, …, X_g, …, X_pop}, where X_g represents the g-th individual;
S202, taking a protein node pair (u, v) from the set of protein node pairs, generating a random number R between 0 and 1, and calculating the i-th gene coefficient ζ_i = 0.5 + S_uv − avg(S); if R ≤ ζ_i, the value of the i-th gene of the individual is 1, otherwise it is 0, where S_uv represents the attribute similarity of the protein node pair (u, v), and avg(S) represents the average attribute similarity of the candidate protein node pairs;
S203, repeating step S202 for the remaining protein node pairs in the set until the set is empty, obtaining the code of the individual X = {g_1, g_2, ..., g_i, ..., g_m};
S204, repeating steps S202 and S203 pop times to obtain the initial population codes {X_1, X_2, ..., X_pop};
S205, taking an individual from {X_1, X_2, ..., X_pop} and letting i = 1; if the i-th gene value of the individual is g_i = 1, establishing a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S206, repeating step S205 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S207, repeating steps S205 and S206 for the remaining individuals of the population {X_1, X_2, ..., X_pop} to obtain the corresponding protein networks G = {G_1, G_2, ..., G_pop};
S208, selecting a network G_i = {V, E, A} from G = {G_1, G_2, ..., G_pop} and calculating the node priority of each node in the network [formula omitted in source], where n_i represents the number of edges connecting the neighbor nodes of protein node i, and k represents the number of neighbors of protein node i;
S209, selecting the protein node v with the maximum node priority from V, and calculating the similarity between v and each of its neighbor nodes [formula omitted in source]; selecting the neighbor node u with the maximum similarity, forming a functional module C_i from u, v, and the common neighbors of u and v, removing u, v, and their common neighbors from V, and recalculating the node priority of the nodes in V, where N_r denotes the neighbor nodes of protein node r;
S210, repeating step S209 until [termination condition omitted in source], obtaining the functional module partition of the network;
S211, repeating steps S208, S209, and S210 for the remaining networks in G = {G_1, G_2, ..., G_pop} to obtain pop protein functional module partition sets;
S212, calculating the two objective functions of the g-th individual X_g in the initialized parent population:
the modularity Q_g [formula omitted in source], where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree in the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
the attribute density SA_g [formula omitted in source], where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes within the k-th protein module;
S213, executing step S212 for the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the parent population.
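The greedy construction in steps S208 to S210 can be sketched as follows. The node priority and similarity formulas are not available in the text, so this sketch substitutes two common choices that fit the listed symbols: the local clustering coefficient 2·n_i / (k·(k − 1)) for priority and the Jaccard index of neighbor sets for similarity. It also assumes the loop ends once V is empty. These are assumptions of the sketch, and the function names are hypothetical.

```python
def adjacency(nodes, edges):
    """Build an undirected adjacency map."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def priority(node, adj, alive):
    """ASSUMED priority: clustering coefficient among not-yet-assigned nodes."""
    nbrs = adj[node] & alive
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))

def jaccard(u, v, adj, alive):
    """ASSUMED similarity: Jaccard index of the two neighbor sets."""
    nu, nv = adj[u] & alive, adj[v] & alive
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def seed_modules(nodes, edges):
    """Repeat S209 until no unassigned node remains (assumed stop rule)."""
    adj = adjacency(nodes, edges)
    alive = set(nodes)
    modules = []
    while alive:
        v = max(sorted(alive), key=lambda n: priority(n, adj, alive))
        nbrs = adj[v] & alive
        if not nbrs:  # isolated node forms its own module
            modules.append({v})
            alive.discard(v)
            continue
        u = max(sorted(nbrs), key=lambda n: jaccard(n, v, adj, alive))
        module = {u, v} | (adj[u] & adj[v] & alive)
        modules.append(module)
        alive -= module
    return modules

# Two triangles joined by the single edge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
modules = seed_modules(nodes, edges)
```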
Step S3, performing crossover and mutation among population individuals to generate an offspring population, which specifically includes:
S31, with t = 1, selecting an individual g and an individual j from the population P by binary tournament, and performing crossover and mutation on individual g and individual j to obtain an offspring individual child;
S32, executing step S31 pop times to obtain the offspring population O = {X_1, X_2, ..., X_pop}.
Step S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the differences between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual, which specifically includes:
S41, taking an individual X_K from the offspring population O = {X_1, X_2, ..., X_pop}, comparing X_K with its corresponding parent individual, finding the protein node pairs among the candidate protein node pairs that correspond to the gene positions whose values have changed, and extracting the protein nodes in those pairs to obtain a protein node set V_cg;
S42, for individual X_K with code X_K = {g_1, g_2, ..., g_i, ..., g_m}, if the i-th gene value of the individual is g_i = 1, establishing a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S43, repeating step S42 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S44, extracting the subgraph of the protein network G_n formed by the protein nodes in V_cg;
S45, sorting the protein nodes in V_cg in increasing order of their number of neighbors in the subgraph, selecting the first protein node v, calculating the modularity change of moving v from its current functional module i to any functional module j [formula omitted in source], adding protein node v to the module k corresponding to the maximum modularity change, and removing the protein node from V_cg, where L represents the total number of edges in the k-th protein network of the offspring population, k_v^r represents the number of neighbors of protein node v in the r-th protein functional module, k_v represents the number of neighbors of protein node v, and K_r represents the total number of the r-th protein functional module;
S46, executing step S45 |V_cg| times to obtain the protein functional module partition set of individual X_K;
S47, executing steps S41, S42, S43, S44, S45, and S46 pop times to obtain the pop protein functional module partition sets of the offspring population;
S48, calculating the two objective functions of the g-th individual X_g in the offspring population:
the modularity Q_g [formula omitted in source], where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree in the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
the attribute density SA_g [formula omitted in source], where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes within the k-th protein module, thereby obtaining the modularity and attribute density corresponding to the pop individuals of the offspring population;
S49, executing step S48 for the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the offspring population.
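The node reassignment in step S45 can be illustrated without the closed-form modularity change, which is not available in the text. The sketch below instead recomputes modularity for each trial move, using the standard Newman form Q = Σ_k [l_k / L − (d_k / (2L))²], which matches the symbols l_k, d_k, and L defined in step S48 but is an assumption of this sketch. Recomputation is slower than an incremental formula and is used here only for clarity; `modularity` and `best_move` are hypothetical names.

```python
def modularity(edges, modules):
    """ASSUMED form: sum over modules of l_k / L - (d_k / (2L))**2."""
    L = len(edges)
    if L == 0:
        return 0.0
    member = {n: idx for idx, mod in enumerate(modules) for n in mod}
    inside = [0] * len(modules)   # l_k: edges with both ends in module k
    degree = [0] * len(modules)   # d_k: total degree of module k
    for u, v in edges:
        degree[member[u]] += 1
        degree[member[v]] += 1
        if member[u] == member[v]:
            inside[member[u]] += 1
    return sum(l / L - (d / (2 * L)) ** 2 for l, d in zip(inside, degree))

def best_move(node, edges, modules):
    """Try `node` in every module; return (best module index, resulting Q)."""
    current = next(i for i, m in enumerate(modules) if node in m)
    best_idx, best_q = current, modularity(edges, modules)
    for j in range(len(modules)):
        if j == current:
            continue
        trial = [set(m) for m in modules]
        trial[current].discard(node)
        trial[j].add(node)
        q = modularity(edges, trial)
        if q > best_q:
            best_idx, best_q = j, q
    return best_idx, best_q

# Two triangles joined by c-d, with node c initially in the wrong module.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
start = [{"a", "b"}, {"c", "d", "e", "f"}]
target, q_after = best_move("c", edges, start)
```

Moving c back to the first triangle raises Q from about 0.122 to 5/14, so the sketch selects module 0.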
Step S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population, which specifically includes:
merging the parent population and the offspring population to obtain a population P_union, and, using non-dominated sorting and crowding distance under maximization, selecting pop individuals from P_union as the new population P.
Step S6, repeating steps S3, S4, and S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the population, where the functional module set of an individual is a protein functional module partition set, which specifically includes: setting t = t + 1 and repeating steps S3, S4, and S5 until t > maxgen, then outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of each individual is a protein functional module partition set.
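The final output of step S6 is the set of individuals that no other individual dominates. A minimal sketch of that filter, assuming both objectives (Q_g and SA_g) are maximized; the names `dominates` and `pareto_optimal` are hypothetical:

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_optimal(fitness):
    """Return the indices of individuals not dominated by any other."""
    return [
        i for i, fi in enumerate(fitness)
        if not any(dominates(fj, fi) for j, fj in enumerate(fitness) if j != i)
    ]

# (Q_g, SA_g) pairs of a final population.
final_fitness = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
front = pareto_optimal(final_fitness)
```

Each index in `front` identifies one partition of the protein network, so a decision maker can choose between partitions that favor topological modularity and partitions that favor attribute density.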
This embodiment considers both the attribute information unique to protein nodes and the interactions between proteins. First, by extracting useful attribute information to continuously optimize the structure of the protein network, and by adjusting the module membership of some protein nodes, it obtains combinations of protein functional modules, which improves the accuracy and effectiveness of functional module mining in the protein network and partitions the protein network better. Second, protein node pairs are extracted before evolution, which reduces the individual coding length, so that combinations of protein network functional modules can be obtained quickly and mining efficiency improves. Finally, a multi-objective evolutionary algorithm is used to mine protein modules in the protein network, which exploits the advantages of multi-objective evolutionary algorithms, offers decision makers multiple choices, and diversifies the mining results.
The above describes only a preferred embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the invention.
Claims (6)
1. A functional module mining method based on an attribute-optimized protein network, characterized by comprising the following steps:
S1, extracting candidate protein node pairs;
S2, initializing the population and the functional module set of each individual in the population from the extracted candidate protein node pairs, and calculating the fitness value of each individual according to the modularity Q_g and the attribute density SA_g;
S3, performing crossover and mutation among population individuals to generate an offspring population;
S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the differences between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual;
S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population;
S6, repeating steps S3, S4, and S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of each individual is a protein functional module partition set;
wherein step S2 specifically includes:
S201, defining the maximum number of iterations as maxgen, the initial iteration count as t = 1, and the number of individuals in the population as pop, so that the population contains pop individuals {X_1, X_2, …, X_g, …, X_pop}, where X_g represents the g-th individual;
S202, taking a protein node pair (u, v) from the set of protein node pairs, generating a random number R between 0 and 1, and calculating the i-th gene coefficient ζ_i = 0.5 + S_uv − avg(S); if R ≤ ζ_i, the value of the i-th gene of the individual is 1, otherwise it is 0, where S_uv represents the attribute similarity of the protein node pair (u, v), and avg(S) represents the average attribute similarity of the candidate protein node pairs;
S203, repeating step S202 for the remaining protein node pairs in the set until the set is empty, obtaining the code of the individual X = {g_1, g_2, ..., g_i, ..., g_m};
S204, repeating steps S202 and S203 pop times to obtain the initial population codes {X_1, X_2, ..., X_pop};
S205, taking an individual from {X_1, X_2, ..., X_pop} and letting i = 1; if the i-th gene value of the individual is g_i = 1, establishing a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S206, repeating step S205 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S207, repeating steps S205 and S206 for the remaining individuals of the population {X_1, X_2, ..., X_pop} to obtain the corresponding protein networks G = {G_1, G_2, ..., G_pop};
S208, selecting a network G_i = {V, E, A} from G = {G_1, G_2, ..., G_pop} and calculating the node priority of each node in the network [formula omitted in source], where n_i represents the number of edges connecting the neighbor nodes of protein node i, and k represents the number of neighbors of protein node i;
S209, selecting the protein node v with the maximum node priority from V, and calculating the similarity between v and each of its neighbor nodes [formula omitted in source]; selecting the neighbor node u with the maximum similarity, forming a functional module C_i from u, v, and the common neighbors of u and v, removing u, v, and their common neighbors from V, and recalculating the node priority of the nodes in V, where N_r denotes the neighbor nodes of protein node r;
S210, repeating step S209 until [termination condition omitted in source], obtaining the functional module partition of the network;
S211, repeating steps S208, S209, and S210 for the remaining networks in G = {G_1, G_2, ..., G_pop} to obtain pop protein functional module partition sets;
S212, calculating the two objective functions of the g-th individual X_g in the initialized parent population:
the modularity Q_g [formula omitted in source], where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree in the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
the attribute density SA_g [formula omitted in source], where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes within the k-th protein module;
S213, executing step S212 for the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the parent population.
2. The method for mining functional modules based on attribute-optimized protein networks according to claim 1, wherein the step S1 specifically comprises:
S11, defining the protein network as $G = (V, E, A)$, where $V = \{v_1, v_2, \ldots, v_i, \ldots, v_n\}$ denotes the set of all protein nodes in the protein network, $v_i$ represents the $i$-th protein node, and $n$ is the total number of protein nodes;
S12, calculating the attribute similarity of any two protein nodes [formula missing in source];
S13, according to attribute similarity, placing the protein node pairs into 100 buckets of width 0.01 covering the range $[0, 1]$, and counting the number $Buck_i$ of protein node pairs in each bucket;
S14, sorting the buckets in descending order of $Buck_i$, where the first bucket corresponds to a value $Value_1$ in $[0, 1]$ and the second bucket corresponds to a value $Value_2$ in $[0, 1]$, and using the average $T$ of $Value_1$ and $Value_2$ as the attribute similarity threshold;
S15, taking a node pair $(u, v)$ out of the protein node pair set; if $S_{uv} \geq T$, adding the node pair $(u, v)$ to the candidate protein node pair set Nodepair, and removing the node pair $(u, v)$ from the protein node pair set;
S16, repeating step S15 for the remaining protein node pairs to obtain the extracted candidate protein node pair set $Nodepair = \{P_1, P_2, \ldots, P_k\}$, where $P_r$ represents the $r$-th protein node pair.
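Steps S13 to S16 can be sketched as follows. The claim does not say which value inside a bucket serves as $Value_1$ or $Value_2$; this sketch assumes the lower bound of the bucket, and breaks ties between equally full buckets by the lower bucket index. The function names and the dictionary input are hypothetical.

```python
def similarity_threshold(similarities, n_buckets=100):
    """Threshold T: mean of the values of the two fullest buckets.

    Assumption: a bucket's value is its lower bound (index / n_buckets).
    """
    counts = [0] * n_buckets
    for s in similarities:
        # Similarity 1.0 is placed in the last bucket.
        idx = min(int(s * n_buckets), n_buckets - 1)
        counts[idx] += 1
    # Descending by count; ties resolved by the lower bucket index.
    order = sorted(range(n_buckets), key=lambda b: (-counts[b], b))
    value1 = order[0] / n_buckets
    value2 = order[1] / n_buckets
    return (value1 + value2) / 2


def extract_candidate_pairs(pair_similarity):
    """Return (Nodepair, T): all pairs whose similarity is at least T."""
    t = similarity_threshold(pair_similarity.values())
    nodepair = [pair for pair, s in pair_similarity.items() if s >= t]
    return nodepair, t
```

Passing a dictionary that maps each node pair to its attribute similarity yields the candidate set and the threshold in one call. The length of the resulting list is the individual encoding length $m$ used later.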
3. The method for mining functional modules based on attribute-optimized protein networks according to claim 1, wherein the step S3 specifically comprises:
S31, letting $t = 1$, selecting an individual $g$ and an individual $j$ from the population $P$ by binary tournament, and performing crossover and mutation on individual $g$ and individual $j$ to obtain an offspring individual child;
S32, executing step S31 pop times to obtain the offspring population $O = \{X_1, X_2, \ldots, X_{POP}\}$.
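A minimal sketch of steps S31 and S32 is given below. The claim names binary tournament selection and crossover with mutation but does not specify the operators, so the sketch assumes a tournament decided by Pareto dominance over both objectives (treated as maximized), uniform crossover, and bit-flip mutation with a hypothetical `mutation_rate`. All function names are illustrative.

```python
import random


def dominates(f_a, f_b):
    """True if f_a is no worse in every objective and better in at least one."""
    return (all(a >= b for a, b in zip(f_a, f_b))
            and any(a > b for a, b in zip(f_a, f_b)))


def binary_tournament(population, fitness, rng):
    """Pick two distinct individuals at random and keep the dominating one."""
    a, b = rng.sample(range(len(population)), 2)
    if dominates(fitness[a], fitness[b]):
        return population[a]
    if dominates(fitness[b], fitness[a]):
        return population[b]
    return population[rng.choice((a, b))]  # neither dominates: pick at random


def crossover_mutate(parent_g, parent_j, rng, mutation_rate=0.05):
    """Uniform crossover followed by bit-flip mutation (assumed operators)."""
    child = [g if rng.random() < 0.5 else j for g, j in zip(parent_g, parent_j)]
    return [1 - bit if rng.random() < mutation_rate else bit for bit in child]


def make_offspring(population, fitness, rng, mutation_rate=0.05):
    """Run the selection and variation step once per population slot."""
    offspring = []
    for _ in range(len(population)):
        g = binary_tournament(population, fitness, rng)
        j = binary_tournament(population, fitness, rng)
        offspring.append(crossover_mutate(g, j, rng, mutation_rate))
    return offspring
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing parameter settings.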
4. The method for mining functional modules based on attribute-optimized protein networks according to claim 3, wherein the step S4 specifically comprises:
S41, selecting an individual $X_K$ from the offspring population $O = \{X_1, X_2, \ldots, X_{POP}\}$, comparing the individual $X_K$ with its corresponding parent individual, finding the candidate protein node pairs that correspond to the gene positions whose gene values have changed, and extracting the protein nodes in those protein node pairs to obtain a protein node set $V_{cg}$;
S42, for the individual $X_K$ with encoding $X_K = \{g_1, g_2, \ldots, g_i, \ldots, g_m\}$, if the $i$-th gene value $g_i = 1$, establishing a connecting edge between node $u$ and node $v$ in the protein network $G$, where the $i$-th gene of the individual corresponds to the $i$-th candidate protein node pair $(u, v)$ in the candidate protein node pair set;
S43, letting $i = i + 1$ and repeating step S42 until $i > m$ to obtain a new protein network $G_n$, where $m$ represents the individual encoding length;
S44, extracting the subgraph of the protein network $G_n$ composed of the protein nodes in $V_{cg}$;
S45, sorting the protein nodes in $V_{cg}$ in ascending order of their number of neighbors in the subgraph, selecting the first protein node $v$, and calculating the modularity change of moving $v$ from its current functional module $i$ to any functional module $j$ [formula missing in source]; adding protein node $v$ to the module $k$ corresponding to the maximum modularity change and removing the protein node from $V_{cg}$, where $L$ represents the total number of edges in the $k$-th protein network of the offspring population, [symbol missing in source] represents the number of neighbors of protein node $v$ in the $r$-th protein functional module, $k_v$ represents the number of neighbors of protein node $v$, and $K_r$ represents the total number of the $r$-th protein functional module;
S46, executing step S45 $|V_{cg}|$ times to obtain the protein functional module partition set of the individual $X_K$;
S47, executing steps S41, S42, S43, S44, S45 and S46 pop times to obtain pop protein functional module partition sets of the offspring population;
S48, calculating the two objective functions of the $g$-th individual $X_g$ in the offspring population:
modularity $Q_g$ [formula missing in source], where $l_k$ represents the number of connecting edges in the $k$-th functional module, $d_k$ represents the total degree in the $k$-th functional module, and $L$ represents the total number of edges in the $g$-th protein network $G_g$;
attribute density $SA_g$ [formula missing in source], where $S(i, j)$ represents the attribute similarity of protein node $i$ and protein node $j$, and $r_k$ represents the number of protein nodes in the $k$-th protein module, thereby obtaining the modularity and attribute density corresponding to the pop individuals of the offspring population;
S49, executing step S48 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the offspring population.
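The decoding part of this claim (steps S41 to S44) can be sketched as below. The sketch assumes a binary list encoding aligned with the candidate node pair list and an undirected network stored as a set of `frozenset` edges; these representations and the function names are illustrative choices, not taken from the patent. The node reassignment of step S45 is not sketched because its modularity-change formula is not available in this text.

```python
def changed_nodes(child, parent, node_pairs):
    """S41: collect the nodes of candidate pairs whose gene value changed."""
    v_cg = set()
    for gene_c, gene_p, (u, v) in zip(child, parent, node_pairs):
        if gene_c != gene_p:
            v_cg.update((u, v))
    return v_cg


def decode_network(base_edges, individual, node_pairs):
    """S42-S43: add an edge (u, v) for every gene equal to 1."""
    edges = {frozenset(e) for e in base_edges}
    for gene, (u, v) in zip(individual, node_pairs):
        if gene == 1:
            edges.add(frozenset((u, v)))
    return edges


def induced_subgraph(edges, nodes):
    """S44: keep only the edges whose endpoints both lie in `nodes`."""
    return {e for e in edges if e <= nodes}
```

Only the nodes returned by `changed_nodes` need to be reconsidered for reassignment, which is why the offspring can inherit the remainder of the parent's module set unchanged.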
5. The method for mining functional modules based on attribute-optimized protein networks according to claim 4, wherein the step S5 specifically comprises:
merging the parent population and the offspring population to obtain a population $P_{union}$, and selecting pop individuals from $P_{union}$ as the new population $P$ according to non-dominated sorting and crowding distance, preferring individuals with a larger crowding distance.
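The environmental selection described here resembles the NSGA-II scheme, and a minimal sketch under that assumption follows. It treats both objectives as maximized, fills the new population front by front, and truncates the last admitted front by descending crowding distance. The function names are hypothetical, and the simple front extraction is quadratic per front rather than the faster bookkeeping variant.

```python
def dominates(f_a, f_b):
    """True if f_a is no worse in every objective and better in at least one."""
    return (all(a >= b for a, b in zip(f_a, f_b))
            and any(a > b for a, b in zip(f_a, f_b)))


def non_dominated_fronts(fitness):
    """Peel off successive non-dominated fronts (lists of indices)."""
    remaining = set(range(len(fitness)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(fitness[j], fitness[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(front, fitness):
    """Crowding distance per index; boundary points get infinity."""
    distance = {i: 0.0 for i in front}
    for m in range(len(fitness[front[0]])):
        ordered = sorted(front, key=lambda i: fitness[i][m])
        lo, hi = fitness[ordered[0]][m], fitness[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[cur] += (fitness[nxt][m] - fitness[prev][m]) / (hi - lo)
    return distance


def environmental_selection(fitness, pop):
    """Return the indices of the pop individuals kept from the merged population."""
    selected = []
    for front in non_dominated_fronts(fitness):
        if len(selected) + len(front) <= pop:
            selected.extend(front)
        else:
            dist = crowding_distance(front, fitness)
            front.sort(key=lambda i: dist[i], reverse=True)
            selected.extend(front[: pop - len(selected)])
            break
    return selected
```

Here `fitness` is the list of (modularity, attribute density) pairs for the merged population, and the returned indices identify the survivors.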
6. The method for mining functional modules based on attribute-optimized protein networks according to claim 5, wherein the step S6 specifically comprises:
letting $t = t + 1$ and repeating steps S3, S4 and S5; when $t > maxgen$, outputting the functional module set of each individual in the Pareto optimal solution set of the population, where the functional module set of each individual is a protein functional module partition set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810946353.9A CN109376842B (en) | 2018-08-20 | 2018-08-20 | Functional module mining method based on attribute optimization protein network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109376842A (en) | 2019-02-22 |
CN109376842B (en) | 2022-04-05 |
Family
ID=65403779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810946353.9A, Active, CN109376842B (en) | Functional module mining method based on attribute optimization protein network | 2018-08-20 | 2018-08-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109376842B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103208027A (en) * | 2013-03-13 | 2013-07-17 | 北京工业大学 | Method for genetic algorithm with local modularity for community detecting |
CN106874708A (en) * | 2017-01-23 | 2017-06-20 | 陕西师范大学 | The method that key protein matter is recognized using the artificial bee colony optimized algorithm of the mechanism of looking for food |
CN106991295A (en) * | 2017-03-31 | 2017-07-28 | 安徽大学 | A kind of protein network module method for digging based on multiple-objection optimization |
CN107798215A (en) * | 2017-11-15 | 2018-03-13 | 扬州大学 | Method based on PPI network hierarchical structure forecast function modules and effect |
CN108388769A (en) * | 2018-03-01 | 2018-08-10 | 安徽大学 | The protein function module recognition method of label propagation algorithm based on side driving |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030044866A1 (en) * | 2001-08-15 | 2003-03-06 | Charles Boone | Yeast arrays, methods of making such arrays, and methods of analyzing such arrays |
US20080228699A1 (en) * | 2007-03-16 | 2008-09-18 | Expanse Networks, Inc. | Creation of Attribute Combination Databases |
Also Published As
Publication number | Publication date |
---|---|
CN109376842A (en) | 2019-02-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102413029B (en) | Method for partitioning communities in complex dynamic network by virtue of multi-objective local search based on decomposition | |
CN108334949B (en) | Image classifier construction method based on optimized deep convolutional neural network structure fast evolution | |
Kassahun et al. | Efficient reinforcement learning through Evolutionary Acquisition of Neural Topologies. | |
CN106991295B (en) | A kind of protein network module method for digging based on multiple-objection optimization | |
CN103745258B (en) | Complex network community mining method based on the genetic algorithm of minimum spanning tree cluster | |
CN111898689A (en) | Image classification method based on neural network architecture search | |
Dürr et al. | Neuroevolution with analog genetic encoding | |
WO2011135410A1 (en) | Optimization technique using evolutionary algorithms | |
CN112085161B (en) | Graph neural network method based on random information transmission | |
CN112084877A (en) | NSGA-NET-based remote sensing image identification method | |
CN116964594A (en) | Neural network structure searching method and system based on evolution learning | |
CN116167617A (en) | Geological disaster risk assessment method and system integrating random forest and attention | |
CN109376842B (en) | Functional module mining method based on attribute optimization protein network | |
CN114999635A (en) | circRNA-disease association relation prediction method based on graph convolution neural network and node2vec | |
CN107577918A (en) | The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model | |
CN108509764B (en) | Ancient organism pedigree evolution analysis method based on genetic attribute reduction | |
Kim et al. | A modified genetic algorithm for fast training neural networks | |
CN111669288B (en) | Directional network link prediction method and device based on directional heterogeneous neighbor | |
CN111352650A (en) | Software modularization multi-objective optimization method and system based on INSGA-II | |
US11915155B2 (en) | Optimization calculation apparatus and optimization calculation method | |
CN109918659B (en) | Method for optimizing word vector based on unreserved optimal individual genetic algorithm | |
Chandra et al. | Modularity adaptation in cooperative coevolution of feedforward neural networks | |
CN110048945B (en) | Node mobility clustering method and system | |
Murthy | Genetic Algorithms: Basic principles and applications | |
CN109390057B (en) | Disease module detection method based on multi-objective optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||