CN109376842B - Functional module mining method based on attribute optimization protein network - Google Patents


Info

Publication number
CN109376842B
Authority
CN
China
Prior art keywords
protein
node
individual
population
pop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810946353.9A
Other languages
Chinese (zh)
Other versions
CN109376842A (en)
Inventor
张兴义
刘振杰
田野
程凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Application filed by Anhui University
Priority to CN201810946353.9A
Publication of CN109376842A
Application granted
Publication of CN109376842B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/002 Biomolecular computers, i.e. using biomolecules, proteins, cells
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The invention discloses a functional module mining method based on an attribute optimization protein network. The method comprises the following steps: S1, extracting protein candidate node pairs; S2, initializing the population and the functional module set of each individual in the population through the extracted protein candidate node pairs, and calculating the fitness value of each individual from the modularity Q_g and the attribute density SA_g; S3, performing crossover and mutation among population individuals to generate an offspring population; S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the difference between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual; S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population; and S6, repeating steps S3 to S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the Pareto-optimal solution set of the population.

Description

Functional module mining method based on attribute optimization protein network
Technical Field
The invention relates to the technical field of functional module identification, in particular to a functional module mining method based on an attribute optimization protein network.
Background
Thousands of proteins in an organism form protein modules with a wide variety of functions at different times and in different spatial locations. Among the cellular functions of biological interest, the protein functional module is one of the most basic building blocks and plays a very important role in the combination of the respective gene products. How to mine protein functional modules closely related to biological functions from protein interaction data has therefore become an important breakthrough point for uncovering the relationship between protein interactions and biological functions. Existing schemes use only the structure of the protein network, so their detection results may not be accurate enough for incomplete protein networks. A functional module identification method that optimizes the protein network with attribute information can therefore mine better protein module combinations and provide more module combinations to choose from.
Disclosure of Invention
To address the technical problems described in the background, the invention provides a functional module mining method based on an attribute optimization protein network.
The functional module mining method based on an attribute optimization protein network provided by the invention comprises the following steps:
S1, extracting protein candidate node pairs;
S2, initializing the population and the functional module set of each individual in the population through the extracted protein candidate node pairs, and calculating the fitness value of each individual from the modularity Q_g and the attribute density SA_g;
S3, performing crossover and mutation among population individuals to generate an offspring population;
S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the difference between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual;
S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population;
S6, repeating steps S3, S4 and S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of an individual is a protein functional module partition set.
Preferably, step S1 specifically includes:
S11, define the protein network as G = (V, E, A), where V = {v_1, v_2, …, v_i, …, v_n} denotes the set of all protein nodes in the protein network, v_i represents the i-th protein node, and n is the total number of protein nodes;
S12, calculate the attribute similarity of any two protein nodes:
Figure GDA0003461248730000021
where A_u and A_v represent the attribute sets of protein node u and protein node v, respectively;
S13, according to attribute similarity, place the protein node pairs into 100 buckets that divide the range [0, 1] in steps of 0.01, and count the number Buck_i of protein node pairs in each bucket;
S14, sort the buckets in descending order of Buck_i; the first bucket corresponds to a value Value_1 in [0, 1] and the second bucket to a value Value_2 in [0, 1]; the average T of Value_1 and Value_2 is used as the attribute similarity threshold;
S15, take a node pair (u, v) from the protein node pair set; if S_uv ≥ T, add the node pair (u, v) to the candidate protein node pair set Nodepair; remove the node pair (u, v) from the protein node pair set;
S16, repeat step S15 for the remaining protein node pairs to obtain the extracted candidate protein node pair set Nodepair = {P_1, P_2, …, P_k}, where P_r represents the r-th protein node pair.
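Steps S12 to S16 can be illustrated with a short Python sketch. Because the similarity formula survives only as an image reference, the sketch assumes a Jaccard similarity over the attribute sets A_u and A_v; it also assumes that each bucket is represented by its lower edge when Value_1 and Value_2 are averaged. The names `jaccard`, `extract_candidate_pairs` and the `attrs` mapping are hypothetical and do not come from the patent.

```python
from itertools import combinations

def jaccard(a, b):
    """Attribute similarity of two attribute sets (assumed Jaccard form)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def extract_candidate_pairs(attrs, n_buckets=100):
    """attrs maps each protein node to its attribute set (hypothetical input)."""
    # S12: similarity of every unordered node pair.
    sims = {(u, v): jaccard(attrs[u], attrs[v])
            for u, v in combinations(sorted(attrs), 2)}
    # S13: count pairs per bucket of width 1 / n_buckets over [0, 1].
    counts = [0] * n_buckets
    for s in sims.values():
        counts[min(int(s * n_buckets), n_buckets - 1)] += 1
    # S14: take the two fullest buckets; each bucket is represented by its
    # lower edge (an assumption), and the two values are averaged to give T.
    first, second = sorted(range(n_buckets), key=lambda i: -counts[i])[:2]
    threshold = (first + second) / (2 * n_buckets)
    # S15-S16: keep every pair whose similarity is at least T.
    candidates = [pair for pair, s in sims.items() if s >= threshold]
    return threshold, candidates, sims
```

The individual coding length m used in the later steps is then `len(candidates)`, which is why a higher threshold shortens the encoding.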
Preferably, step S2 specifically includes:
S201, define the maximum number of iterations as maxgen and the initial iteration count as t = 1; the population contains pop individuals {X_1, X_2, …, X_g, …, X_pop}, where X_g represents the g-th individual;
S202, take a protein node pair (u, v) from the protein node pair set, generate a random number R between 0 and 1, and calculate the i-th gene coefficient ζ_i = 0.5 + S_uv − avg(S); if R ≤ ζ_i, the value of the i-th gene of the individual is 1, otherwise it is 0, where S_uv represents the attribute similarity of the protein node pair (u, v) and avg(S) represents the average attribute similarity of the candidate protein node pairs;
S203, repeat step S202 for the remaining protein node pairs in the protein node pair set until the set is empty, obtaining the code of the individual X = {g_1, g_2, …, g_i, …, g_m};
S204, repeat steps S202 and S203 pop times to obtain the initial population code {X_1, X_2, …, X_pop};
S205, take an individual from {X_1, X_2, …, X_pop} and let i = 1; if the i-th gene value of the individual g_i = 1, establish a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S206, repeat step S205 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S207, repeat steps S205 and S206 for the remaining individuals of the population {X_1, X_2, …, X_pop} to obtain the corresponding protein networks G = {G_1, G_2, …, G_pop};
S208, select a network G_i = {V, E, A} from G = {G_1, G_2, …, G_pop} and calculate the node priority of each node in the network:
Figure GDA0003461248730000031
where n_i represents the number of edges between the neighbor nodes of protein node i, and k represents the number of neighbors of protein node i;
S209, select the protein node v with the maximum node priority from V, and calculate the similarity between protein node v and each of its neighbor nodes:
Figure GDA0003461248730000041
select the neighbor node u with the maximum similarity; u, v and the common neighbors of u and v form a functional module C_i; remove u, v and the common neighbors of u and v from V, and calculate the node priority of the nodes remaining in V, where N_r represents the neighbor nodes of protein node r;
S210, repeat step S209 until
Figure GDA0003461248730000042
to obtain the functional module partition of the network;
S211, repeat steps S208, S209 and S210 for the remaining networks in G = {G_1, G_2, …, G_pop} to obtain pop protein functional module partition sets;
S212, calculate the two objective functions of the g-th individual X_g in the initialized parent population:
modularity
Figure GDA0003461248730000043
where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree of the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
attribute density
Figure GDA0003461248730000044
where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes in the k-th protein module;
S213, execute step S212 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the parent population.
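The encoding in steps S202 to S206 can be sketched as follows. The gene coefficient is read here as ζ_i = 0.5 + S_uv − avg(S), which is an assumption because the operator between S_uv and avg(S) is not legible in the source text. The function names and the representation of edges as `frozenset` pairs are hypothetical.

```python
import random

def init_individual(candidates, sims, rng):
    """S202-S203: one bit per candidate pair, biased by attribute similarity."""
    avg = sum(sims[p] for p in candidates) / len(candidates)
    genes = []
    for pair in candidates:
        zeta = 0.5 + sims[pair] - avg  # assumed reading of the gene coefficient
        genes.append(1 if rng.random() <= zeta else 0)
    return genes

def build_network(base_edges, candidates, genes):
    """S205-S206: add an edge for every candidate pair whose gene is 1."""
    edges = {frozenset(e) for e in base_edges}
    edges.update(frozenset(p) for p, g in zip(candidates, genes) if g == 1)
    return edges

def init_population(candidates, sims, pop, rng):
    """S204: repeat the per-individual initialization pop times."""
    return [init_individual(candidates, sims, rng) for _ in range(pop)]
```

Under this reading, a candidate pair whose similarity is above the average is switched on with probability greater than 0.5, so the initial networks favor edges between proteins with similar attributes.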
Preferably, step S3 specifically includes:
S31, let t = 1; select an individual g and an individual j from the population P by binary tournament, and apply crossover and mutation to individual g and individual j to obtain an offspring individual child;
S32, execute step S31 pop times to obtain the offspring population O = {X_1, X_2, …, X_pop}.
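Steps S31 and S32 can be sketched as below. The text does not specify the crossover or mutation operators, nor how the binary tournament compares two individuals with two objectives. The sketch assumes uniform crossover, bit-flip mutation with rate 1/m, and a tournament decided by Pareto dominance; all three choices, and every function name, are illustrative assumptions.

```python
import random

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def binary_tournament(objs, rng):
    """Draw two random indices; return the second only if it dominates the first."""
    i, j = rng.randrange(len(objs)), rng.randrange(len(objs))
    return j if dominates(objs[j], objs[i]) else i

def make_child(pop, objs, rng, p_mut=None):
    """S31: select two parents, then apply uniform crossover and bit-flip mutation."""
    g, j = binary_tournament(objs, rng), binary_tournament(objs, rng)
    m = len(pop[g])
    p_mut = 1.0 / m if p_mut is None else p_mut
    child = [a if rng.random() < 0.5 else b for a, b in zip(pop[g], pop[j])]
    child = [1 - bit if rng.random() < p_mut else bit for bit in child]
    return child, g  # g is kept so step S4 can find the corresponding parent

def make_offspring(pop, objs, rng):
    """S32: repeat S31 once per population member."""
    return [make_child(pop, objs, rng) for _ in pop]
```

Returning the parent index alongside each child is a design choice made for this sketch: step S4 compares each offspring with its corresponding parent, so the pairing has to be recorded somewhere.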
Preferably, step S4 specifically includes:
S41, take an individual X_K from the offspring population O = {X_1, X_2, …, X_pop}; compare X_K with its corresponding parent individual, find the candidate protein node pairs corresponding to the gene positions whose gene values have changed, and extract the protein nodes in those node pairs to obtain a protein node set V_cg;
S42, for individual X_K, whose code is X_K = {g_1, g_2, …, g_i, …, g_m}, if the i-th gene value of the individual g_i = 1, establish a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S43, repeat step S42 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S44, extract the subgraph of the protein network G_n composed of the protein nodes in V_cg;
S45, sort the protein nodes in V_cg in ascending order of their number of neighbors in the subgraph, select the first protein node v, and calculate the modularity change when v moves from its current functional module i to any functional module j:
Figure GDA0003461248730000051
add protein node v to the module k corresponding to the maximum modularity change, and remove the protein node from V_cg, where L represents the total number of edges in the K-th protein network of the offspring population,
Figure GDA0003461248730000052
represents the number of neighbors of protein node v in the r-th protein functional module, k_v represents the number of neighbors of protein node v, and K_r represents the total degree of the r-th protein functional module;
S46, execute step S45 |V_cg| times to obtain the protein functional module partition set of individual X_K;
S47, execute steps S41, S42, S43, S44, S45 and S46 pop times to obtain the pop protein functional module partition sets of the offspring population;
S48, calculate the two objective functions of the g-th individual X_g in the offspring population:
modularity
Figure GDA0003461248730000053
where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree of the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
attribute density
Figure GDA0003461248730000061
where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes in the k-th protein module; this yields the modularity and attribute density corresponding to the pop individuals of the offspring population;
S49, execute step S48 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the offspring population.
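The local adjustment in steps S44 to S46 can be sketched as follows. The closed-form modularity change is available only as an image reference, so this sketch does not reproduce it; instead it recomputes the standard Newman modularity before and after each trial move, which should select the same target module at a higher computational cost. The labelling dictionary and the function names are hypothetical.

```python
def modularity(edges, label):
    """Newman modularity of a labelling {node: module id}."""
    L = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        degree[label[u]] = degree.get(label[u], 0) + 1
        degree[label[v]] = degree.get(label[v], 0) + 1
        if label[u] == label[v]:
            inside[label[u]] = inside.get(label[u], 0) + 1
    return sum(inside.get(m, 0) / L - (d / (2 * L)) ** 2 for m, d in degree.items())

def reassign_changed_nodes(edges, label, changed):
    """S44-S46: revisit only the nodes touched by changed genes."""
    label = dict(label)  # work on a copy; the inherited partition stays intact
    # Neighbour counts inside the subgraph induced by the changed nodes.
    sub_deg = {n: 0 for n in changed}
    for u, v in edges:
        if u in sub_deg and v in sub_deg:
            sub_deg[u] += 1
            sub_deg[v] += 1
    # Visit nodes in ascending order of subgraph neighbour count.
    for node in sorted(changed, key=lambda n: (sub_deg[n], n)):
        best_module, best_q = label[node], modularity(edges, label)
        for module in sorted(set(label.values())):
            q = modularity(edges, {**label, node: module})
            if q > best_q:
                best_module, best_q = module, q
        label[node] = best_module
    return label
```

Only the nodes in V_cg are revisited, which reflects the point of step S4: the offspring inherits the parent's partition and repairs it locally rather than being partitioned again from scratch.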
Preferably, step S5 specifically includes:
merging the parent population and the offspring population to obtain a population P_union, and selecting pop individuals from P_union as the new population P by non-dominated sorting for maximization and by crowding distance.
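The environmental selection of step S5 can be sketched as below, assuming the usual NSGA-II-style procedure: whole non-dominated fronts are accepted in order, and the last front that does not fit is truncated by descending crowding distance. The text names only non-dominated sorting and crowding distance, so the details (boundary points receiving infinite distance, per-objective normalization) are assumptions, as are the function names.

```python
def dominates(a, b):
    """Pareto dominance for maximization."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_fronts(objs):
    """Split indices into successive non-dominated fronts."""
    remaining, fronts = set(range(len(objs))), []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i]) for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(front, objs):
    """Crowding distance per index in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        order = sorted(front, key=lambda i: objs[i][k])
        span = objs[order[-1]][k] - objs[order[0]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if span == 0:
            continue
        for lo, mid, hi in zip(order, order[1:], order[2:]):
            dist[mid] += (objs[hi][k] - objs[lo][k]) / span
    return dist

def environmental_selection(objs, pop_size):
    """Return the indices of the pop_size survivors of the merged population."""
    chosen = []
    for front in nondominated_fronts(objs):
        if len(chosen) + len(front) <= pop_size:
            chosen.extend(front)
        else:
            dist = crowding_distance(front, objs)
            front.sort(key=lambda i: -dist[i])
            chosen.extend(front[:pop_size - len(chosen)])
            break
    return chosen
```

Here `objs` would hold the (modularity, attribute density) pair of every individual in P_union, and the returned indices identify the individuals that form the new population P.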
Preferably, step S6 specifically includes:
letting t = t + 1 and repeating steps S3, S4 and S5 until t > maxgen, then outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of an individual is a protein functional module partition set.
The invention considers both the attribute information specific to protein nodes and the interactions between proteins. First, it obtains combinations of protein functional modules by extracting useful attribute information to continuously optimize the structure of the protein network and by adjusting the module membership of some protein nodes, which improves the accuracy and effectiveness of functional module mining in the protein network and yields a better partition of the network. Second, protein node pairs are extracted before evolution, which reduces the individual coding length, so that combinations of protein network functional modules can be obtained quickly and mining efficiency is improved. Finally, a multi-objective evolutionary algorithm is used to mine functional modules in the protein network, which exploits the strengths of such algorithms, offers decision makers multiple choices, and diversifies the mining results.
Drawings
Fig. 1 is a schematic flow chart of a functional module mining method for optimizing a protein network based on attributes according to the present invention.
Detailed Description
Referring to fig. 1, the functional module mining method for optimizing a protein network based on attributes provided by the invention comprises the following steps:
Step S1, extracting protein candidate node pairs, specifically includes:
S11, define the protein network as G = (V, E, A), where V = {v_1, v_2, …, v_i, …, v_n} denotes the set of all protein nodes in the protein network, v_i represents the i-th protein node, and n is the total number of protein nodes;
S12, calculate the attribute similarity of any two protein nodes:
Figure GDA0003461248730000071
where A_u and A_v represent the attribute sets of protein node u and protein node v, respectively;
S13, according to attribute similarity, place the protein node pairs into 100 buckets that divide the range [0, 1] in steps of 0.01, and count the number Buck_i of protein node pairs in each bucket;
S14, sort the buckets in descending order of Buck_i; the first bucket corresponds to a value Value_1 in [0, 1] and the second bucket to a value Value_2 in [0, 1]; the average T of Value_1 and Value_2 is used as the attribute similarity threshold;
S15, take a node pair (u, v) from the protein node pair set; if S_uv ≥ T, add the node pair (u, v) to the candidate protein node pair set Nodepair; remove the node pair (u, v) from the protein node pair set;
S16, repeat step S15 for the remaining protein node pairs to obtain the extracted candidate protein node pair set Nodepair = {P_1, P_2, …, P_k}, where P_r represents the r-th protein node pair.
In this scheme, the individual coding length equals the number of node pairs encoded from the protein network. Extracting protein node pairs before evolution therefore reduces the individual coding length, so that combinations of protein network functional modules can be obtained quickly and mining efficiency is improved.
Step S2, initializing the population and the functional module set of each individual in the population through the extracted protein candidate node pairs and calculating the fitness value of each individual, specifically includes:
S201, define the maximum number of iterations as maxgen and the initial iteration count as t = 1; the population contains pop individuals {X_1, X_2, …, X_g, …, X_pop}, where X_g represents the g-th individual;
S202, take a protein node pair (u, v) from the protein node pair set, generate a random number R between 0 and 1, and calculate the i-th gene coefficient ζ_i = 0.5 + S_uv − avg(S); if R ≤ ζ_i, the value of the i-th gene of the individual is 1, otherwise it is 0, where S_uv represents the attribute similarity of the protein node pair (u, v) and avg(S) represents the average attribute similarity of the candidate protein node pairs;
S203, repeat step S202 for the remaining protein node pairs in the protein node pair set until the set is empty, obtaining the code of the individual X = {g_1, g_2, …, g_i, …, g_m};
S204, repeat steps S202 and S203 pop times to obtain the initial population code {X_1, X_2, …, X_pop};
S205, take an individual from {X_1, X_2, …, X_pop} and let i = 1; if the i-th gene value of the individual g_i = 1, establish a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S206, repeat step S205 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S207, repeat steps S205 and S206 for the remaining individuals of the population {X_1, X_2, …, X_pop} to obtain the corresponding protein networks G = {G_1, G_2, …, G_pop};
S208, select a network G_i = {V, E, A} from G = {G_1, G_2, …, G_pop} and calculate the node priority of each node in the network:
Figure GDA0003461248730000091
where n_i represents the number of edges between the neighbor nodes of protein node i, and k represents the number of neighbors of protein node i;
S209, select the protein node v with the maximum node priority from V, and calculate the similarity between protein node v and each of its neighbor nodes:
Figure GDA0003461248730000092
select the neighbor node u with the maximum similarity; u, v and the common neighbors of u and v form a functional module C_i; remove u, v and the common neighbors of u and v from V, and calculate the node priority of the nodes remaining in V, where N_r represents the neighbor nodes of protein node r;
S210, repeat step S209 until
Figure GDA0003461248730000093
to obtain the functional module partition of the network;
S211, repeat steps S208, S209 and S210 for the remaining networks in G = {G_1, G_2, …, G_pop} to obtain pop protein functional module partition sets;
S212, calculate the two objective functions of the g-th individual X_g in the initialized parent population:
modularity
Figure GDA0003461248730000094
where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree of the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
attribute density
Figure GDA0003461248730000095
where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes in the k-th protein module;
S213, execute step S212 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the parent population.
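The greedy partition of steps S208 to S210 can be sketched as follows. Both formulas in these steps survive only as image references. From the definitions of n_i and k, the sketch assumes the node priority is the local clustering coefficient 2·n_i / (k·(k − 1)); from the definition of N_r, it assumes the node similarity is the Jaccard overlap of closed neighborhoods. It further assumes that a node left without neighbors becomes a module of its own and that the loop ends when V is empty. All names are hypothetical.

```python
def clustering_priority(adj, node):
    """Assumed priority: edges among neighbours over the number of neighbour pairs."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    n_i = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * n_i / (k * (k - 1))

def neighbour_similarity(adj, u, v):
    """Assumed similarity: Jaccard overlap of the closed neighbourhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def greedy_modules(adj):
    """S208-S210: repeatedly seed a module from the highest-priority node."""
    remaining = set(adj)
    modules = []
    while remaining:
        # Restrict the adjacency to nodes not yet assigned to a module.
        sub = {n: adj[n] & remaining for n in remaining}
        v = max(sorted(remaining), key=lambda n: clustering_priority(sub, n))
        if not sub[v]:
            modules.append({v})  # isolated node: singleton module (assumption)
            remaining.discard(v)
            continue
        u = max(sorted(sub[v]), key=lambda n: neighbour_similarity(sub, n, v))
        module = {u, v} | (sub[u] & sub[v])
        modules.append(module)
        remaining -= module
    return modules
```

For a triangle a-b-c attached to a tail c-d-e, the sketch first seeds the module {a, b, c} from the triangle and then groups the leftover nodes d and e together.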
Step S3, performing crossover and mutation among population individuals to generate an offspring population, specifically includes:
S31, let t = 1; select an individual g and an individual j from the population P by binary tournament, and apply crossover and mutation to individual g and individual j to obtain an offspring individual child;
S32, execute step S31 pop times to obtain the offspring population O = {X_1, X_2, …, X_pop}.
Step S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the difference between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population and calculating the fitness value of each individual, specifically includes:
S41, take an individual X_K from the offspring population O = {X_1, X_2, …, X_pop}; compare X_K with its corresponding parent individual, find the candidate protein node pairs corresponding to the gene positions whose gene values have changed, and extract the protein nodes in those node pairs to obtain a protein node set V_cg;
S42, for individual X_K, whose code is X_K = {g_1, g_2, …, g_i, …, g_m}, if the i-th gene value of the individual g_i = 1, establish a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S43, repeat step S42 until i > m to obtain a new protein network G_n, where m represents the individual code length;
S44, extract the subgraph of the protein network G_n composed of the protein nodes in V_cg;
S45, sort the protein nodes in V_cg in ascending order of their number of neighbors in the subgraph, select the first protein node v, and calculate the modularity change when v moves from its current functional module i to any functional module j:
Figure GDA0003461248730000101
add protein node v to the module k corresponding to the maximum modularity change, and remove the protein node from V_cg, where L represents the total number of edges in the K-th protein network of the offspring population, k_v^r represents the number of neighbors of protein node v in the r-th protein functional module, k_v represents the number of neighbors of protein node v, and K_r represents the total degree of the r-th protein functional module;
S46, execute step S45 |V_cg| times to obtain the protein functional module partition set of individual X_K;
S47, execute steps S41, S42, S43, S44, S45 and S46 pop times to obtain the pop protein functional module partition sets of the offspring population;
S48, calculate the two objective functions of the g-th individual X_g in the offspring population:
modularity
Figure GDA0003461248730000111
where l_k denotes the number of connecting edges in the k-th functional module, d_k represents the total degree of the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
attribute density
Figure GDA0003461248730000112
where S(i, j) represents the attribute similarity of protein node i and protein node j, and r_k represents the number of protein nodes in the k-th protein module; this yields the modularity and attribute density corresponding to the pop individuals of the offspring population;
S49, execute step S48 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the offspring population.
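The two objective functions of step S48 can be sketched as follows. The formulas themselves are only image references here. For the modularity, the sketch assumes the standard form Q = Σ_k [ l_k / L − (d_k / (2L))² ], which matches the definitions of l_k, d_k and L given above. For the attribute density, it assumes the sum of pairwise similarities S(i, j) inside each module divided by the module size r_k; that normalization is a guess and may differ from the patent's figure. The function names and the `sim` callable are hypothetical.

```python
def modularity(edges, modules):
    """Assumed form: Q = sum_k [ l_k / L - (d_k / (2L))^2 ]."""
    L = len(edges)
    q = 0.0
    for module in modules:
        # l_k: edges with both endpoints inside the module.
        l_k = sum(1 for u, v in edges if u in module and v in module)
        # d_k: total degree of the module's nodes.
        d_k = sum(1 for u, v in edges for n in (u, v) if n in module)
        q += l_k / L - (d_k / (2 * L)) ** 2
    return q

def attribute_density(sim, modules):
    """Assumed form: sum over modules of (sum of pairwise S(i, j)) / r_k."""
    total = 0.0
    for module in modules:
        nodes = sorted(module)
        pair_sum = sum(sim(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:])
        total += pair_sum / len(nodes)
    return total
```

Both values are maximized: the first rewards modules that are dense in edges, the second rewards modules whose proteins share attributes, and the pair (Q, SA) is what the selection step compares.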
Step S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population, specifically includes:
merging the parent population and the offspring population to obtain a population P_union, and selecting pop individuals from P_union as the new population P by non-dominated sorting for maximization and by crowding distance.
Step S6, repeating steps S3, S4 and S5 until the maximum number of iterations is reached and outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of an individual is a protein functional module partition set, specifically includes: letting t = t + 1 and repeating steps S3, S4 and S5 until t > maxgen, then outputting the functional module set of each individual in the Pareto-optimal solution set of the population.
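The iteration of steps S3 to S6 can be summarized as a generic driver. This is a skeleton under stated assumptions, not the patented implementation: `evaluate`, `vary` and `select` are hypothetical caller-supplied callables standing in for the objective calculation, the variation with module adjustment (S3 and S4), and the environmental selection (S5). The final filter keeps the individuals that no other individual dominates.

```python
def dominates(a, b):
    """Pareto dominance for maximization."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def evolve(population, evaluate, vary, select, maxgen):
    """Skeleton of S3-S6; evaluate, vary and select are supplied by the caller."""
    objs = [evaluate(x) for x in population]
    t = 1
    while t <= maxgen:
        offspring = vary(population, objs)               # S3 and S4
        union = population + offspring
        union_objs = objs + [evaluate(x) for x in offspring]
        keep = select(union_objs, len(population))       # S5
        population = [union[i] for i in keep]
        objs = [union_objs[i] for i in keep]
        t += 1                                           # S6
    # Output only the individuals that no other individual dominates.
    return [(x, f) for x, f in zip(population, objs)
            if not any(dominates(g, f) for g in objs)]
```

Because the result is a set of mutually non-dominated individuals rather than a single best one, the caller receives several alternative partitions that trade modularity against attribute density.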
This embodiment considers both the attribute information specific to protein nodes and the interactions between proteins. First, it obtains combinations of protein functional modules by extracting useful attribute information to continuously optimize the protein network structure and by adjusting the module membership of some protein nodes, which improves the accuracy and effectiveness of functional module mining in the protein network and yields a better partition of the network. Second, protein node pairs are extracted before evolution, which reduces the individual coding length, so that combinations of protein network functional modules can be obtained quickly and mining efficiency is improved. Finally, a multi-objective evolutionary algorithm is used to mine protein modules in the protein network, which exploits the strengths of such algorithms, offers decision makers multiple choices, and diversifies the mining results.
The above describes only preferred embodiments of the present invention, but the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed herein, according to the technical solution and inventive concept of the invention, shall fall within the scope of protection of the present invention.

Claims (6)

1. A functional module mining method based on an attribute optimization protein network, characterized by comprising the following steps:
S1, extracting protein candidate node pairs;
S2, initializing the population and the functional module set of each individual in the population through the extracted protein candidate node pairs, and calculating the fitness value of each individual from the modularity Q_g and the attribute density SA_g;
S3, performing crossover and mutation among population individuals to generate an offspring population;
S4, having each offspring individual inherit the functional module set of its parent individual, adjusting the functional modules of the offspring individual according to the difference between its gene values and those of the parent individual, obtaining the functional module set of each individual in the offspring population, and calculating the fitness value of each individual;
S5, performing environmental selection according to the fitness values of the parent population and the offspring population to obtain a new population;
S6, repeating steps S3, S4 and S5 until the maximum number of iterations is reached, and outputting the functional module set of each individual in the Pareto-optimal solution set of the population, where the functional module set of an individual is a protein functional module partition set;
the step S2 specifically includes:
s201, defining the maximum iteration number as max gen, defining the initial iteration number as t 1, wherein the number of population individuals is pop, and pop individuals { X ] exist in the population1,X2,…,Xg,…,Xpop},XgRepresents the g-th individual;
s202, taking out a protein node pair (u, v) from the protein node pair set, randomly generating a random number R between 0 and 1, and calculating the ith gene coefficient zetai=0.5+SuvAvg (S), if R ≦ ζiThe value of the ith gene of the individual is 1, otherwise, the value is 0, wherein Suv(ii) represents the attribute similarity of the protein node pair (u, v), and avg(s) represents the average attribute similarity of the candidate protein node pair;
s203, repeating step S202 for the remaining protein node pairs in the set of protein node pairs until the set of protein node pairs is equal to the empty set, and obtaining the code X ═ g of the individual1,g2,...,gi,...,gm};
S204, repeatedly executing the pop steps S202 and S203 to obtain an initial population code { X }1,X2,...,XPOP};
S205, obtaining { X1,X2,...,XPOPAn individual, let i equal 1 if the individual's ith gene value giEstablishing a connecting edge between a node u and a node v in the protein network G, wherein the ith gene of an individual corresponds to the ith candidate protein node pair (u, v) in the candidate protein set;
s206, repeating step S205 until i > m to obtain a new protein network GnM represents an individual code length;
s207, pairing the population { X1,X2,...,XPOPRepeat steps S205, S206 for the remaining individuals to obtain the corresponding protein network G ═ G {1,G2,...,GPOP};
S208, slave G ═ G1,G2,...,GPOPSelect a network GiCalculating the node priority of each node in the network { V, E, A }
Figure FDA0003461248720000021
where n_i denotes the number of edges connecting the neighbor nodes of protein node i to one another, and k denotes the number of neighbors of protein node i;
S209, selecting from V the protein node v with the maximum node priority, and calculating the similarity between protein node v and each of its neighbor nodes:
Figure FDA0003461248720000022
selecting the neighbor node u with the maximum similarity, so that u, v and the common neighbors of u and v form a functional module C_i; removing u, v and the common neighbors of u and v from V, and recalculating the node priority of the nodes remaining in V, where N_r denotes the set of neighbor nodes of protein node r;
s210, repeatedly executing the step S209 until
Figure FDA0003461248720000023
thereby obtaining the functional module partition of the network;
S211, repeatedly executing steps S208, S209 and S210 for the remaining networks in G = {G_1, G_2, ..., G_pop} to obtain pop protein functional module partition sets;
S212, calculating the two objective functions of the g-th individual X_g in the initialized parent population:
modularity
Figure FDA0003461248720000024
where l_k denotes the number of connecting edges within the k-th functional module, d_k denotes the total degree of the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
attribute density
Figure FDA0003461248720000031
where S(i, j) denotes the attribute similarity between protein node i and protein node j, and r_k denotes the number of protein nodes in the k-th protein module;
S213, executing step S212 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the parent population.
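Steps S201 to S207 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`init_individual`, `init_population`, `decode`) are hypothetical, the use of a seeded `random.Random` is a convenience, and the minus sign in the gene coefficient is an assumption about the operator that was lost before Avg(S) in the translated text.

```python
import random


def init_individual(pair_similarity, rng):
    """Steps S202-S203: one binary gene per candidate protein node pair."""
    avg = sum(pair_similarity) / len(pair_similarity)
    genes = []
    for s_uv in pair_similarity:
        # ASSUMPTION: zeta_i = 0.5 + S_uv - Avg(S); the operator is garbled in the source.
        zeta = 0.5 + s_uv - avg
        genes.append(1 if rng.random() <= zeta else 0)
    return genes


def init_population(pair_similarity, pop, seed=0):
    """Step S204: repeat the individual initialization pop times."""
    rng = random.Random(seed)
    return [init_individual(pair_similarity, rng) for _ in range(pop)]


def decode(base_edges, candidate_pairs, genes):
    """Steps S205-S206: add the edge (u, v) for every gene equal to 1."""
    edges = {frozenset(e) for e in base_edges}
    for (u, v), g in zip(candidate_pairs, genes):
        if g == 1:
            edges.add(frozenset((u, v)))
    return edges


# Toy usage: two existing interactions and three candidate pairs.
base = [("a", "b"), ("c", "d")]
pairs = [("a", "c"), ("b", "d"), ("a", "d")]
similarity = [0.9, 0.5, 0.1]
population = init_population(similarity, pop=4)
networks = [decode(base, pairs, x) for x in population]  # step S207
```

Under this reading, a candidate pair whose attribute similarity is above the average is more likely than not to be added as an edge, which biases the initial population toward attribute-consistent networks.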
2. The method for mining functional modules based on attribute-optimized protein networks according to claim 1, wherein the step S1 specifically comprises:
S11, defining the protein network as G = (V, E, A), where V = {v_1, v_2, …, v_i, …, v_n} denotes the set of all protein nodes in the protein network, v_i denotes the i-th protein node, and n is the total number of protein nodes;
S12, calculating the attribute similarity of any two protein nodes:
Figure FDA0003461248720000032
where A_u and A_v denote the attribute sets of protein node u and protein node v, respectively;
S13, placing the protein node pairs, according to their attribute similarity, into 100 buckets that divide the range [0, 1] in steps of 0.01, and counting the number Buck_i of protein node pairs in each bucket;
S14, sorting the buckets in descending order of Buck_i, the first bucket corresponding to a value Value_1 in [0, 1] and the second bucket corresponding to a value Value_2 in [0, 1], and using the average T of Value_1 and Value_2 as the attribute similarity threshold;
S15, taking a node pair (u, v) out of the protein node pair set; if S_uv ≥ T, adding the node pair (u, v) to the candidate protein node pair set Nodepair; and removing the node pair (u, v) from the protein node pair set;
S16, repeating step S15 for the remaining protein node pairs to obtain the extracted candidate protein node pair set Nodepair = {P_1, P_2, ..., P_k}, where P_r denotes the r-th protein node pair.
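The bucket-based threshold of steps S12 to S16 might look as follows in Python. This is a sketch under stated assumptions: the similarity formula is shown only as a figure in the claim, so the Jaccard index over the attribute sets A_u and A_v is assumed here, and a bucket's "value" is assumed to be its lower bound. All function names are hypothetical.

```python
from itertools import combinations


def jaccard(a_u, a_v):
    """ASSUMPTION: attribute similarity taken as the Jaccard index of two attribute sets."""
    union = a_u | a_v
    return len(a_u & a_v) / len(union) if union else 0.0


def similarity_threshold(similarities, n_buckets=100):
    """Steps S13-S14: average the values of the two most populated buckets."""
    counts = [0] * n_buckets
    for s in similarities:
        idx = min(int(s * n_buckets), n_buckets - 1)  # s == 1.0 falls in the last bucket
        counts[idx] += 1
    order = sorted(range(n_buckets), key=lambda i: counts[i], reverse=True)
    # ASSUMPTION: a bucket is represented by its lower bound i / n_buckets.
    value_1, value_2 = order[0] / n_buckets, order[1] / n_buckets
    return (value_1 + value_2) / 2


def extract_candidate_pairs(attributes):
    """Steps S15-S16: keep every node pair whose similarity reaches the threshold T."""
    pairs = list(combinations(sorted(attributes), 2))
    sims = {p: jaccard(attributes[p[0]], attributes[p[1]]) for p in pairs}
    t = similarity_threshold(list(sims.values()))
    return [p for p in pairs if sims[p] >= t], t


# Toy usage with three proteins and their annotation sets.
attrs = {"p1": {"a", "b"}, "p2": {"a", "b"}, "p3": {"c"}}
nodepair, threshold = extract_candidate_pairs(attrs)
```

Enumerating all node pairs is quadratic in the number of proteins, so a real network would likely need a sparser candidate generation step; the sketch ignores this.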
3. The method for mining functional modules based on attribute-optimized protein networks according to claim 1, wherein the step S3 specifically comprises:
S31, letting t = 1, selecting an individual g and an individual j from the population P by binary tournament selection, and applying crossover and mutation to individual g and individual j to obtain an offspring individual child;
S32, executing step S31 pop times to obtain the offspring population O = {X_1, X_2, ..., X_pop}.
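A possible reading of steps S31 and S32 is sketched below. The claim does not name the crossover or mutation operators, nor the criterion used to compare the two tournament contestants, so uniform crossover, bit-flip mutation with rate 1/m, and a precomputed `rank` list (lower is better) are all assumptions made for illustration.

```python
import random


def binary_tournament(population, rank, rng):
    """Draw two distinct individuals and keep the one with the better (lower) rank."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if rank[a] <= rank[b] else population[b]


def crossover_and_mutate(parent_g, parent_j, rng, p_mut=None):
    """ASSUMPTION: uniform crossover followed by bit-flip mutation."""
    m = len(parent_g)
    p_mut = 1.0 / m if p_mut is None else p_mut
    child = [g if rng.random() < 0.5 else j for g, j in zip(parent_g, parent_j)]
    return [1 - x if rng.random() < p_mut else x for x in child]


def make_offspring(population, rank, seed=0):
    """Step S32: repeat selection and variation once per population slot."""
    rng = random.Random(seed)
    offspring = []
    for _ in range(len(population)):
        g = binary_tournament(population, rank, rng)
        j = binary_tournament(population, rank, rng)
        offspring.append(crossover_and_mutate(g, j, rng))
    return offspring


# Toy usage: four individuals of code length 5, ranked 0 (best) to 3.
parents = [[1, 0, 1, 0, 1], [0, 0, 1, 1, 0], [1, 1, 1, 0, 0], [0, 1, 0, 1, 1]]
children = make_offspring(parents, rank=[0, 1, 2, 3])
```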
4. The method for mining functional modules based on attribute-optimized protein networks according to claim 3, wherein the step S4 specifically comprises:
S41, taking an individual X_K from the offspring population O = {X_1, X_2, ..., X_pop}, comparing the individual X_K with its corresponding parent individual, finding among the candidate protein node pairs those that correspond to the gene positions whose gene values have changed, and extracting the protein nodes of these protein node pairs to obtain a protein node set V_cg;
S42, for the individual X_K with encoding X_K = {g_1, g_2, ..., g_i, ..., g_m}, letting i = 1; if the i-th gene value of the individual is g_i = 1, establishing a connecting edge between node u and node v in the protein network G, where the i-th gene of the individual corresponds to the i-th candidate protein node pair (u, v) in the candidate protein node pair set;
S43, letting i = i + 1 and repeating step S42 until i > m to obtain a new protein network G_n, where m denotes the encoding length of the individual;
S44, extracting from the protein network G_n the subgraph formed by the protein nodes in V_cg;
S45, sorting the protein nodes in V_cg in ascending order of their number of neighbors in the subgraph, selecting the first protein node v, and calculating the modularity change of moving v from its current functional module i to any functional module j:
Figure FDA0003461248720000041
adding the protein node v to the module k corresponding to the maximum modularity change, and removing the protein node v from V_cg, where L denotes the total number of edges in the k-th protein network of the offspring population,
Figure FDA0003461248720000042
denotes the number of neighbors of the protein node v in the r-th protein functional module, k_v denotes the number of neighbors of the protein node v, and K_r denotes the total number of the r-th protein functional module;
S46, executing step S45 |V_cg| times to obtain the protein functional module partition set of the individual X_K;
S47, executing steps S41, S42, S43, S44, S45 and S46 pop times to obtain the pop protein functional module partition sets of the offspring population;
S48, calculating the two objective functions of the g-th individual X_g in the offspring population:
modularity
Figure FDA0003461248720000051
where l_k denotes the number of connecting edges within the k-th functional module, d_k denotes the total degree of the k-th functional module, and L denotes the total number of edges in the g-th protein network G_g;
attribute density
Figure FDA0003461248720000052
where S(i, j) denotes the attribute similarity between protein node i and protein node j, and r_k denotes the number of protein nodes in the k-th protein module, thereby obtaining the modularity and attribute density corresponding to the pop individuals of the offspring population;
S49, executing step S48 on the pop protein functional module partition sets to obtain the modularity and attribute density of the functional module sets of the offspring population.
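Two parts of claim 4 lend themselves to a short Python sketch: collecting the node set V_cg of step S41 and evaluating the modularity objective of step S48. The modularity formula itself appears only as a figure, so the standard Newman-Girvan form Q = Σ_k [l_k / L - (d_k / 2L)^2], which matches the symbols l_k, d_k and L defined in the claim, is assumed here. The function names are hypothetical, and the attribute density objective is left out because its formula cannot be recovered from the text.

```python
def changed_nodes(parent, child, candidate_pairs):
    """Step S41: nodes of the candidate pairs whose gene value differs from the parent."""
    v_cg = set()
    for g_parent, g_child, (u, v) in zip(parent, child, candidate_pairs):
        if g_parent != g_child:
            v_cg.update((u, v))
    return v_cg


def modularity(edges, modules):
    """ASSUMPTION: Q = sum_k [ l_k / L - (d_k / (2 L))^2 ] (Newman-Girvan form)."""
    total_edges = len(edges)  # L
    module_of = {node: k for k, members in enumerate(modules) for node in members}
    inner = [0] * len(modules)   # l_k: edges with both ends inside module k
    degree = [0] * len(modules)  # d_k: sum of node degrees in module k
    for u, v in edges:
        degree[module_of[u]] += 1
        degree[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            inner[module_of[u]] += 1
    return sum(l_k / total_edges - (d_k / (2 * total_edges)) ** 2
               for l_k, d_k in zip(inner, degree))


# Toy usage: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
q = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
```

Restricting the repair to V_cg means only nodes touched by changed genes are reassigned, which is presumably why the claim avoids recomputing the whole partition for every offspring.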
5. The method for mining functional modules based on attribute-optimized protein networks according to claim 4, wherein the step S5 specifically comprises:
merging the parent population and the offspring population to obtain a population P_union, and selecting pop individuals from P_union as the new population P according to non-dominated sorting and maximization of the crowding distance.
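The selection described in this claim resembles the environmental selection of NSGA-II. The sketch below assumes that both objectives (modularity and attribute density) are maximized and that the usual normalized crowding distance is meant; the function names are hypothetical and the quadratic front computation is chosen for brevity, not efficiency.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective and better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Peel off successive non-dominated fronts; returns lists of indices into objs."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(front, objs):
    """Sum over objectives of the normalized gap between each point's two neighbors."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # boundary points are always kept
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


def select(objs, pop):
    """Fill the new population front by front; break ties in the last front by crowding."""
    chosen = []
    for front in non_dominated_fronts(objs):
        if len(chosen) + len(front) <= pop:
            chosen.extend(front)
        else:
            dist = crowding_distance(front, objs)
            front.sort(key=lambda i: dist[i], reverse=True)
            chosen.extend(front[:pop - len(chosen)])
            break
    return chosen


# Toy usage: (modularity, attribute density) pairs of a merged population.
p_union = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 2)]
survivors = select(p_union, pop=3)
```

The function returns indices into P_union rather than the individuals themselves, so the caller can carry over encodings and module partitions together.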
6. The method for mining functional modules based on attribute-optimized protein networks according to claim 5, wherein the step S6 specifically comprises:
letting t = t + 1 and repeatedly executing steps S3, S4 and S5; when t is greater than maxgen, outputting the functional module set of each individual in the Pareto-optimal solution set of the population, the functional module set of each individual being a protein functional module partition set.
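The outer loop of steps S3 to S6 can be outlined as a small driver. The three callbacks (`reproduce`, `evaluate`, `environmental_select`) are hypothetical placeholders for steps S3, S4 and S5, and the loop bound is one plausible reading of "t greater than maxgen"; the sketch assumes maximization of both objectives.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective and better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def run(population, reproduce, evaluate, environmental_select, max_gen):
    """Iterate S3-S5 for max_gen generations, then return the Pareto-optimal individuals."""
    pop = len(population)
    t = 1
    while t <= max_gen:
        offspring = reproduce(population)                      # step S3
        union = population + offspring
        scores = [evaluate(x) for x in union]                  # step S4 (objective values)
        population = environmental_select(union, scores, pop)  # step S5
        t += 1                                                 # step S6: t = t + 1
    scores = [evaluate(x) for x in population]
    return [x for x, s in zip(population, scores)
            if not any(dominates(other, s) for other in scores)]


# Toy usage with stand-in callbacks: individuals are their own objective vectors.
result = run(
    population=[(0, 0), (1, 0)],
    reproduce=lambda p: [(a + 1, b) for a, b in p],
    evaluate=lambda x: x,
    environmental_select=lambda union, scores, pop: [
        x for x, _ in sorted(zip(union, scores), key=lambda p: sum(p[1]), reverse=True)[:pop]
    ],
    max_gen=3,
)
```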
CN201810946353.9A 2018-08-20 2018-08-20 Functional module mining method based on attribute optimization protein network Active CN109376842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810946353.9A CN109376842B (en) 2018-08-20 2018-08-20 Functional module mining method based on attribute optimization protein network


Publications (2)

Publication Number Publication Date
CN109376842A CN109376842A (en) 2019-02-22
CN109376842B true CN109376842B (en) 2022-04-05

Family

ID=65403779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810946353.9A Active CN109376842B (en) 2018-08-20 2018-08-20 Functional module mining method based on attribute optimization protein network

Country Status (1)

Country Link
CN (1) CN109376842B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103208027A (en) * 2013-03-13 2013-07-17 北京工业大学 Method for genetic algorithm with local modularity for community detecting
CN106874708A (en) * 2017-01-23 2017-06-20 陕西师范大学 The method that key protein matter is recognized using the artificial bee colony optimized algorithm of the mechanism of looking for food
CN106991295A (en) * 2017-03-31 2017-07-28 安徽大学 A kind of protein network module method for digging based on multiple-objection optimization
CN107798215A (en) * 2017-11-15 2018-03-13 扬州大学 Method based on PPI network hierarchical structure forecast function modules and effect
CN108388769A (en) * 2018-03-01 2018-08-10 安徽大学 The protein function module recognition method of label propagation algorithm based on side driving

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030044866A1 (en) * 2001-08-15 2003-03-06 Charles Boone Yeast arrays, methods of making such arrays, and methods of analyzing such arrays
US20080228699A1 (en) * 2007-03-16 2008-09-18 Expanse Networks, Inc. Creation of Attribute Combination Databases




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant