CN104200272A - Complex network community mining method based on improved genetic algorithm - Google Patents


Publication number
CN104200272A
Authority
CN
China
Prior art keywords
population
pop
community
node
individuality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410429721.4A
Other languages
Chinese (zh)
Inventor
杨新武
杨丽军
李�瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a complex network community mining method based on an improved genetic algorithm, and belongs to the technical field of complex network community mining. The method mines communities in a complex network with an improved genetic algorithm that fuses clustering with the dual-population idea. Normalized mutual information is used as the measure of similarity between individuals in the population. The clustering idea is introduced first: a minimum spanning tree clustering method divides the population into classes. The dual-population idea is then introduced: the clusters are assigned to a main class and a secondary class. The main class maintains the evolution direction of the population, moving toward the optimal solution of the objective function. The secondary class supplies diversity to the main class in a timely manner, so that when the main class is trapped in a local optimum it can escape and search other regions of the solution space, thereby achieving complex network community mining.

Description

A complex network community mining method based on an improved genetic algorithm
Technical field
The invention belongs to the technical field of complex network community mining. Specifically, it uses an improved genetic algorithm based on the fusion of clustering and the dual-population idea to mine the communities in a complex network. It is a new method that realizes complex network community mining by means of computer technology, genetic algorithms, clustering, and the dual-population idea.
Background
In the real world, many complex systems exist in the form of networks or can be abstracted as complex networks. Research on complex networks has become a hot topic in many fields. Many real-world systems can be regarded as the result of interactions among numerous subsystems. A complex network is an abstraction of a complex system: if the vertices of the network are regarded as the individuals of the system, then the edges can be regarded as the relationships between those individuals. As a simplified, abstract structure of the system, a complex network helps us better understand the properties and functions of the complex system.
Since the 1990s, the rapid development of computer and Internet technology has brought humanity into the network era. People now live in a world of various complex networks, such as large-scale power grids, global transportation networks, metabolic networks, and social relationship networks. Because the study of complexity is closely tied to the complexity of networks, network science has become a multidisciplinary field and has attracted great attention. A large body of research shows that complex networks commonly exhibit basic statistical properties such as the small-world effect and the scale-free property. As the study of complex network properties has deepened, another feature common to many real networks has been found: community structure. Community structure has received wide attention in complex network research and is currently an active field. A community in a real network represents a set of particular objects. For example, in a social network, a community is a real social group formed around a certain interest or background; in a citation network, a community represents papers on the same topic; on the World Wide Web, a community represents websites that discuss related topics. Mining community structure has very wide application in real life. It helps people understand more deeply the relationship between the structure and function of a complex network, and thereby discover regularities hidden in the network and predict its behavior. Research on complex network community mining therefore has important theoretical significance and practical value.
In recent years, methods for complex network community mining have emerged one after another, drawing on theory and techniques from physics, mathematics, computer science, and other fields. By underlying principle they can be divided into methods based on division, modularity optimization, label propagation, dynamics, and bionic computation. Division-based methods include the well-known GN method proposed by Girvan and Newman. Modularity-optimization methods include the FN algorithm, an improvement on the GN algorithm; the SA algorithm, based on the simulated annealing principle; and the fast modularity optimization method FUA. Label-propagation methods include the well-known label propagation algorithm LPA.
In 2004, Newman et al. proposed the network modularity function (Q), which quantitatively evaluates the quality of a community structure. Combinatorial optimization with the Q function as the objective function subsequently became one of the main approaches to detecting network community structure. The community discovery problem is thereby transformed into an objective function optimization problem, but maximizing the Q function is NP-complete. Genetic algorithms are an effective approach to NP-hard problems: they do not depend on the specific application, need no initial information about the particular problem, provide a general framework for solving complex optimization problems, and are robust and parallelizable, so they are widely used in many fields. Applying genetic algorithms to complex network community mining therefore has theoretical significance and practical value.
Gong et al. proposed MA (memetic algorithm), a GA-based community mining algorithm. This method is prone to local optima and has difficulty finding the global optimum. Ronghua Shang proposed MIGA, a community discovery algorithm based on modularity and an improved genetic algorithm, to address this problem. However, MIGA requires prior knowledge of the number of communities in the complex network, which degrades its performance on community mining problems where the number of communities is unknown.
To address the above defects of genetic-algorithm-based community mining, and inspired by the dual-population idea, the present invention proposes a complex network community mining method based on an improved genetic algorithm: the MGACD (Modified Genetic Algorithm for Community Detecting) method. The method largely resolves the tendency of the population to fall into local optima and lose diversity during genetic evolution, and improves the search performance of the invention.
Summary of the invention
The present invention proposes a new method for complex network community mining based on an improved genetic algorithm. The method uses normalized mutual information as the measure of similarity between individuals in the population, and fuses clustering with the dual-population idea. First, the clustering idea is introduced: a minimum spanning tree clustering method divides the population into classes. Then the dual-population idea is introduced: the clusters are assigned to a main class and a secondary class. The main class maintains the evolution direction of the population, approaching the optimal solution of the objective function. The secondary class supplies diversity to the main class in a timely manner, so that the main class can escape when it is trapped in a local optimum and search other regions of the solution space.
The technical scheme of the method is as follows:
A complex network community mining method based on an improved genetic algorithm, characterized by comprising the following steps:
Step 1: computer initialization;
Step 2: population initialization. For each gene position of each individual, the number of a randomly chosen neighbor of the node represented by that gene position is used as the allele of the gene position, yielding the parent population. The steps are as follows:
(1) each individual is initialized as a code of length n in which the allele of every gene position is 0, where n is the individual code length;
(2) for each gene position v of the individual, find the neighbor set N(v) = {u | node u is directly connected to node v} of the node numbered v in the network;
(3) randomly select a node number u′ from the neighbor set N(v) as the allele of gene position v. Repeat the individual-initialization steps Popsize (the population size) times to complete population initialization;
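The initialization in steps (1) to (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 0-based node numbers, a hypothetical `neighbors` list in which `neighbors[v]` holds N(v), and that every node has at least one neighbor. The function names and the toy graph are invented for the example.

```python
import random

def init_individual(neighbors, rng):
    """Build one individual in locus-based adjacency form."""
    n = len(neighbors)
    individual = [0] * n                       # step (1): all alleles start at 0
    for v in range(n):                         # step (2): look up N(v)
        individual[v] = rng.choice(neighbors[v])  # step (3): random neighbor u'
    return individual

def init_population(neighbors, pop_size, seed=0):
    """Repeat the individual initialization pop_size times."""
    rng = random.Random(seed)
    return [init_individual(neighbors, rng) for _ in range(pop_size)]

# Toy graph (assumption): two triangles joined by the edge 2-3.
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
population = init_population(neighbors, pop_size=4)
```

Every allele produced this way is a real neighbor of its gene position, so each individual decodes to a division in which connected nodes are grouped together.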
Step 3: calculate the fitness Q of all individuals in the parent population as follows:
$$Q = \frac{1}{2E} \sum_{uv} \left[ A_{uv} - \frac{k_u k_v}{2E} \right] \delta(r(u), r(v))$$
where $A = (A_{uv})_{n \times n}$ is the adjacency matrix of the nodes in network G: $A_{uv} = 1$ if an edge connects nodes u and v, and $A_{uv} = 0$ otherwise; $\delta(r(u), r(v))$ is the community identification function, where r(u) is the community containing u and r(v) is the community containing v: if r(u) = r(v) its value is 1, meaning nodes u and v are in the same community, and otherwise its value is 0, meaning they are not; $k_u$ is the degree of node u and $k_v$ is the degree of node v; E is the total number of edges in network G, defined as $E = \frac{1}{2} \sum_{uv} A_{uv}$;
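As an illustration of the Q function, the following sketch evaluates it directly from the double sum. It assumes 0-based node numbers, an undirected edge list without duplicates, and a list `community` with `community[u]` = r(u); these names are hypothetical and chosen only for the example.

```python
def modularity(edges, community):
    """Network modularity Q of a division, computed from the double sum."""
    n = len(community)
    total_edges = len(edges)                   # E
    degree = [0] * n                           # k_u
    adjacent = set()                           # pairs (u, v) with A_uv = 1
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    q = 0.0
    for u in range(n):
        for v in range(n):
            if community[u] == community[v]:   # delta(r(u), r(v)) = 1
                a_uv = 1 if (u, v) in adjacent else 0
                q += a_uv - degree[u] * degree[v] / (2 * total_edges)
    return q / (2 * total_edges)

# Toy graph (assumption): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
q_two = modularity(edges, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_one = modularity(edges, [0, 0, 0, 0, 0, 0])   # everything in one community
```

On this toy graph the two-triangle division scores 5/14, while putting all nodes in one community scores 0, which matches the intent that a larger Q indicates a better division. The double loop is quadratic in n; it is written for clarity, not speed.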
Step 4: perform minimum spanning tree clustering on the population, label the classes, and determine the main class and the secondary class;
Step 5: select two individuals from the main class by tournament selection and apply crossover and mutation, generating Popsize/2 offspring individuals Pop_m; select two individuals from the secondary class by tournament selection and apply crossover and mutation, generating Popsize/2 offspring individuals Pop_r; select one individual from the main class and one individual from the secondary class by tournament selection and apply crossover and mutation, generating Popsize/2 offspring individuals Pop_c; here Popsize = 100, and Pop_m and Pop_c each contain 50 individuals;
Step 6: Pop_m and Pop_c form the candidate solutions O_m of the main class population, and Pop_r and Pop_c form the candidate solutions O_r of the secondary class population. According to the objective function of the main class population, Popsize/2 individuals are selected from O_m with the μ+λ selection strategy as the next-generation main class population; according to the fitness function of the secondary class population, Popsize/2 individuals are selected from O_r with the μ+λ selection strategy as the next-generation secondary class population. The μ+λ selection strategy takes μ individuals from the parents and λ individuals from the offspring, and then selects μ individuals from these μ+λ individuals;
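The selection in step 6 can be sketched as below. The text does not state how the μ survivors are picked from the pooled μ+λ individuals; this sketch assumes the common truncation rule (keep the μ fittest), and the function name and the `fitness` callable are hypothetical.

```python
def mu_plus_lambda_select(parents, offspring, fitness, mu):
    """Pool parents with offspring and keep the mu fittest (assumed rule)."""
    pool = list(parents) + list(offspring)
    ranked = sorted(pool, key=fitness, reverse=True)   # best first
    return ranked[:mu]

# Toy usage: individuals are plain numbers and fitness is the number itself.
survivors = mu_plus_lambda_select([1, 5, 3], [4, 2, 6], fitness=lambda x: x, mu=3)
```

For the main class, `fitness` would be the Q function; for the secondary class it would be the secondary-class fitness function.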
Step 7: judge whether δ has stopped decreasing for 50 consecutive generations. If so, randomly generate some individuals to enter the next generation's genetic operations; if not, go to step 8. δ is updated as follows:
$$\delta_{t+1} = \delta_t + \alpha \cdot \Delta_t$$
where α is a decimal between 0 and 1, t is the current generation, and $\Delta_t$ takes the value:
$$\Delta_t = \begin{cases} \dfrac{1}{|G_t|} \sum_{(p,q) \in G_t} Dis(p, q) - \delta_t, & \text{if } |G_t| > 0 \\ 0, & \text{otherwise} \end{cases}$$
where $G_t$ is the set of parent pairs that reproduced successfully in generation t; $|G_t|$ is the number of such parent pairs; Dis(p, q) is the distance between individual p and individual q; $\frac{1}{|G_t|} \sum_{(p,q) \in G_t} Dis(p, q)$ is the mean distance between successfully reproducing parents; $\delta_t$ is the distance threshold in generation t; and $\Delta_t$ is the difference between that mean distance and the distance threshold in generation t;
Successful reproduction covers two cases. In the first, both parents come from the main class, and reproduction is successful if an offspring individual is better than the parents. In the second, one parent comes from the main class and the other from the secondary class, and reproduction is successful if an offspring individual is better than the parent from the main class;
During population evolution δ keeps decreasing; when the population tends to converge, δ essentially stops decreasing. In the present invention, when δ has not decreased for 50 generations, 15 individuals are randomly generated and enter the next generation's genetic operations;
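The δ update rule of step 7 can be sketched as follows. The value `alpha=0.5` is an assumption (the text only requires a decimal between 0 and 1), and `successful_pairs` and `dist` are hypothetical names for the set G_t and the distance Dis.

```python
def update_delta(delta_t, successful_pairs, dist, alpha=0.5):
    """Return delta_{t+1} = delta_t + alpha * Delta_t."""
    if successful_pairs:                       # |G_t| > 0
        mean_dist = sum(dist(p, q) for p, q in successful_pairs) / len(successful_pairs)
        change = mean_dist - delta_t           # Delta_t
    else:
        change = 0.0                           # no successful reproduction
    return delta_t + alpha * change

# Toy usage with a lookup table standing in for Dis(p, q).
table = {("a", "b"): 0.4, ("a", "c"): 0.6}
new_delta = update_delta(1.0, list(table), lambda p, q: table[(p, q)])
same_delta = update_delta(1.0, [], lambda p, q: 0.0)
```

Here the mean parent distance is 0.5, so δ moves halfway from 1.0 toward it and becomes 0.75; with no successful pairs δ is unchanged.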
Step 8: repeat steps 4 to 7 until the number of population iterations reaches T, obtaining the optimal community division;
The complex network is represented by a graph G(V, E), where V is the set of nodes v and E is the set of edges e. If V contains n nodes and E contains m edges, the nodes are numbered (1, 2, ..., v, ..., n), with v ∈ (1, 2, ..., v, ..., n) and e ∈ (1, 2, ..., e, ..., m);
The population is denoted by Pop and refers to a number of possible community division results of the complex network. The ways of dividing communities are called community mining methods and are denoted by S; s is one division method in S, s ∈ (1, 2, ..., s, ..., S), where S also denotes the total number of division methods. Any one division result is called an individual and is denoted by Pop(s); the number of all possible division results is called the population size and is denoted by Popsize;
The individual coding uses the locus-based adjacency representation, in which a gene represents a vertex in the network and the allele of a gene is one of its neighbor nodes.
The determination of the main class and the secondary class after minimum spanning tree clustering of the population is characterized by comprising the following steps:
Step S4-1: use the normalized mutual information I(Pop(s_A), Pop(s_B)) to measure the distance between two individuals Pop(s_A) and Pop(s_B) in the population. The normalized mutual information is computed as follows:
$$I(Pop(s_A), Pop(s_B)) = \frac{-2 \sum_{i=1}^{I} \sum_{j=1}^{J} V_{ij} \log\left( \frac{V_{ij} V}{C_{i.} C_{.j}} \right)}{\sum_{i=1}^{I} C_{i.} \log\left( \frac{C_{i.}}{V} \right) + \sum_{j=1}^{J} C_{.j} \log\left( \frac{C_{.j}}{V} \right)}$$
where C is the confusion matrix with I rows and J columns; I is the number of communities in the first division method s_A and J is the number of communities in the second division method s_B; $C_{i.}$ is the sum of the elements in row i of C and $C_{.j}$ is the sum of the elements in column j of C; $V_{ij}$, the entry in row i and column j of C, is the number of nodes shared by community i of s_A and community j of s_B: when the two communities have no common node, $V_{ij} = 0$; when they share some nodes, $V_{ij}$ is the number of nodes in their intersection; when all their nodes are the same, $V_{ij}$ is the number of nodes in community i (or community j); V is the total number of nodes in the complex network;
When the result of the first division method s_A and the result of the second division method s_B are identical, I(Pop(s_A), Pop(s_B)) = 1; when the two results are entirely dissimilar, I(Pop(s_A), Pop(s_B)) = 0;
Step S4-2: the distance d between the results Pop(s_A) and Pop(s_B) of the two division methods is computed as d = 1 − I(Pop(s_A), Pop(s_B));
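Steps S4-1 and S4-2 can be sketched as follows. The sketch assumes each division has already been decoded into a list of community labels, one per node; the function names are hypothetical. Entries with $V_{ij} = 0$ are skipped, following the usual convention that 0·log 0 = 0, and returning 1 when both divisions consist of a single community is an assumption made to avoid dividing by zero.

```python
import math

def nmi(labels_a, labels_b):
    """Normalized mutual information between two community divisions."""
    total = len(labels_a)                      # V, total number of nodes
    shared, row_sum, col_sum = {}, {}, {}      # V_ij, C_i., C_.j
    for a, b in zip(labels_a, labels_b):
        shared[(a, b)] = shared.get((a, b), 0) + 1
        row_sum[a] = row_sum.get(a, 0) + 1
        col_sum[b] = col_sum.get(b, 0) + 1
    numerator = -2.0 * sum(
        v * math.log(v * total / (row_sum[a] * col_sum[b]))
        for (a, b), v in shared.items()
    )
    denominator = (sum(c * math.log(c / total) for c in row_sum.values())
                   + sum(c * math.log(c / total) for c in col_sum.values()))
    if denominator == 0:                       # both divisions are one community
        return 1.0
    return numerator / denominator

def division_distance(labels_a, labels_b):
    """Distance d = 1 - I between two divisions."""
    return 1.0 - nmi(labels_a, labels_b)

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # same division, labels renamed
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])    # statistically independent
```

Renaming community labels does not change the value, so two individuals that decode to the same division are at distance 0 even if their label numbering differs.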
Step S4-3: use Prim's method to obtain the minimum spanning tree containing all individuals of the population;
Step S4-4: disconnect the edges of the minimum spanning tree whose weights exceed the threshold, obtaining a forest of the population and thus a clustering of the population. The threshold is the largest edge weight in the minimum spanning tree that is less than 0.88·avg, where avg is the mean weight of all edges in the minimum spanning tree; the threshold lies in (0, 1);
Step S4-5: traverse the forest depth-first and label the classes of the individuals, that is, set the class member variable of each individual to the number of the class it belongs to, and save it in the classid array;
Step S4-6: the class containing the best individual is taken as the main class, and all other classes are merged into the secondary class.
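Steps S4-3 to S4-6 can be sketched as follows. The sketch assumes a full symmetric distance matrix `dist` over at least two individuals and a list `fitness` of their fitness values; both names and the fallback used when no edge lies below 0.88·avg are assumptions.

```python
def prim_mst(dist):
    """Prim's method on a full distance matrix; returns edges (u, v, weight)."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return edges

def cluster_population(dist, fitness, factor=0.88):
    """Cut long MST edges, label classes by DFS, pick the main class."""
    edges = prim_mst(dist)
    avg = sum(w for _, _, w in edges) / len(edges)
    below = [w for _, _, w in edges if w < factor * avg]
    threshold = max(below) if below else factor * avg   # fallback is an assumption
    links = {i: [] for i in range(len(dist))}
    for u, v, w in edges:
        if w <= threshold:                     # edges above the threshold are cut
            links[u].append(v)
            links[v].append(u)
    classid = [-1] * len(dist)
    next_class = 0
    for start in range(len(dist)):
        if classid[start] != -1:
            continue
        classid[start] = next_class
        stack = [start]                        # iterative depth-first traversal
        while stack:
            u = stack.pop()
            for v in links[u]:
                if classid[v] == -1:
                    classid[v] = next_class
                    stack.append(v)
        next_class += 1
    best_individual = max(range(len(dist)), key=lambda i: fitness[i])
    return classid, classid[best_individual]

# Toy population: individuals 0,1 are close, 2,3 are close, the pairs are far apart.
dist = [[0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.1],
        [0.9, 0.9, 0.1, 0.0]]
classid, main_class = cluster_population(dist, fitness=[0.2, 0.3, 0.5, 0.1])
```

In the toy example the single long edge of the tree is cut, giving two classes; the class holding individual 2 (the fittest) becomes the main class and the rest form the secondary class.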
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention;
Fig. 2 is the flow chart of clustering in the method of the invention;
Fig. 3 is the flow chart of the individual mutation operation in the method of the invention.
Detailed description of the embodiments
The specific embodiment of the invention is described in detail below with reference to the American political books network (Polbooks network) and the flow charts.
Step 1, computer initialization; set the following parameters:
Complex network: represented by a graph G(V, E), where V is the set of nodes v and E is the set of edges e. If V contains n nodes and E contains m edges, the nodes are numbered (1, 2, ..., v, ..., n), with v ∈ (1, 2, ..., v, ..., n) and e ∈ (1, 2, ..., e, ..., m).
Gene: a node represents a gene.
Population: denoted by Pop; refers to a number of possible community division results of the complex network. The ways of dividing communities are called community mining methods and are denoted by S; s is one division method in S, s ∈ (1, 2, ..., s, ..., S), where S also denotes the total number of division methods. Any one division result is called an individual and is denoted by Pop(s); the number of all possible division results is called the population size and is denoted by Popsize.
Individual coding: an array or bit string that represents one division result, also called a chromosome. The position of a gene in the chromosome is called a locus or gene position, and each gene also represents a node of the complex network. A chromosome corresponds to one division method of the complex network, and the solution space of chromosomes corresponds to all possible division methods. Mapping from the solution space to a chromosome is called encoding; mapping from a chromosome to the solution space is called decoding.
Step 2, population initialization
The coding used in this method is the locus-based adjacency representation. In this representation each genotype g has n genes, and each gene represents a node in network G. Each gene u can take as its allele a value v ∈ (1, 2, ..., v, ..., n) such that node u is directly connected to node v. The locus-based adjacency representation is a graph representation: if the graph represented by genotype g has an edge between u and v, then after genotype g is decoded, nodes u and v belong to the same community.
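The decoding implied by this representation, where every gene links its node to the allele node and linked nodes end up in one community, can be sketched with a union-find structure. Union-find is a substitute chosen here for brevity; it is not the traversal procedure the patent describes. The sketch assumes 0-based node numbers, and the function name is hypothetical.

```python
def decode(individual):
    """Turn a locus-based adjacency individual into community labels."""
    n = len(individual)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    for v, u in enumerate(individual):         # gene v with allele u: edge v-u
        root_v, root_u = find(v), find(u)
        if root_v != root_u:
            parent[root_v] = root_u

    ids = {}                                   # root -> consecutive community id
    labels = []
    for v in range(n):
        labels.append(ids.setdefault(find(v), len(ids)))
    return labels

# Genes 0-2 point at each other and genes 3-5 point at each other.
labels = decode([1, 0, 1, 4, 5, 3])
```

The toy individual decodes to two communities, {0, 1, 2} and {3, 4, 5}; the number of communities falls out of the decoding and does not have to be fixed in advance.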
When the population is initialized, every gene of an individual takes one of its neighbor nodes as its allele. This greatly reduces the search space of community divisions, places the initial solutions closer to the optimal region of the solution space, and speeds up evolution.
An arbitrary community division of the complex network is represented by an individual Pop(s), which is generated by the following steps:
(1) each individual is initialized as a code of length n in which the allele of every gene position is 0, where n is the individual code length.
(2) for each gene position v of the individual, find the neighbor set N(v) = {u | node u is directly connected to node v} of the node numbered v in the network.
(3) randomly select a node number u′ from the neighbor set N(v) as the allele of gene position v, that is, Pop(s, v) = u′.
The individual-initialization steps are repeated Popsize (the population size) times to complete population initialization.
Step 3, calculate the fitness function Q in the parent population
This method uses the widely accepted network modularity function (Q function) as the fitness function of individuals in the main class. The Q function is defined as the difference between the fraction of the network's edges that fall within communities and the expected fraction of such edges under random connection. The expression of the Q function is:
$$Q = \frac{1}{2E} \sum_{uv} \left[ A_{uv} - \frac{k_u k_v}{2E} \right] \delta(r(u), r(v)) \tag{1}$$
where $A = (A_{uv})_{n \times n}$ is the adjacency matrix of the nodes in network G: $A_{uv} = 1$ if an edge connects nodes u and v, and $A_{uv} = 0$ otherwise; $\delta(r(u), r(v))$ is the community identification function, where r(u) is the community containing u and r(v) is the community containing v: if r(u) = r(v) its value is 1, meaning nodes u and v are in the same community, and otherwise its value is 0, meaning they are not; $k_u$ is the degree of node u and $k_v$ is the degree of node v; E is the total number of edges in network G, defined as $E = \frac{1}{2} \sum_{uv} A_{uv}$.
The larger the fitness value, the better the community mining result; the network modularity function (Q function) is therefore also a widely used standard for measuring the quality of network community mining.
This step computes, according to formula (1), the fitness of each of the Popsize individuals in the population and stores the results in the fitness array Pop_Q.
Step 4, cluster the population and determine the main class and the secondary class
(1) Use the normalized mutual information I(Pop(s_A), Pop(s_B)) to measure the distance d between two individuals Pop(s_A) and Pop(s_B) in the population, as follows:
1. Compute the normalized mutual information I(Pop(s_A), Pop(s_B)) by formula (2):
$$I(Pop(s_A), Pop(s_B)) = \frac{-2 \sum_{i=1}^{I} \sum_{j=1}^{J} V_{ij} \log\left( \frac{V_{ij} V}{C_{i.} C_{.j}} \right)}{\sum_{i=1}^{I} C_{i.} \log\left( \frac{C_{i.}}{V} \right) + \sum_{j=1}^{J} C_{.j} \log\left( \frac{C_{.j}}{V} \right)} \tag{2}$$
where C is the confusion matrix with I rows and J columns; I is the number of communities in the first division method s_A and J is the number of communities in the second division method s_B. $C_{i.}$ is the sum of the elements in row i of C and $C_{.j}$ is the sum of the elements in column j of C; $V_{ij}$, the entry in row i and column j of C, is the number of nodes shared by community i of s_A and community j of s_B: when the two communities have no common node, $V_{ij} = 0$; when they share some nodes, $V_{ij}$ is the number of nodes in their intersection; when all their nodes are the same, $V_{ij}$ is the number of nodes in community i (or community j); V is the total number of nodes in the complex network.
When the result of the first division method s_A and the result of the second division method s_B are identical, I(Pop(s_A), Pop(s_B)) = 1; when the two results are entirely dissimilar, I(Pop(s_A), Pop(s_B)) = 0.
2. Compute by formula (3) the distance d between the results Pop(s_A) and Pop(s_B) of the two division methods:
$$d = 1 - I(Pop(s_A), Pop(s_B)) \tag{3}$$
(2) Cluster the population Pop using the minimum spanning tree
Because the construction of a minimum spanning tree always joins the two nearest vertices by an edge, the parts obtained by cutting the tree at a set threshold have high similarity within each part and low similarity between parts, which matches the criterion for clustering individuals in the population. We therefore use Prim's method to obtain the minimum spanning tree over all divisions in the population; cutting the edges of the minimum spanning tree whose weights exceed the threshold yields the clustering of the population. Using Prim's method ensures that individuals in the same class are highly similar and individuals in different classes are less similar.
The procedure for clustering the population and determining the main class and the secondary class is as follows:
1) Using distance formula (3), compute the distance matrix between the individuals Pop(s) in population Pop. This matrix is lower triangular.
2) From the result of step 1), use Prim's method to generate the minimum spanning tree with S-1 directed edges, as follows:
1. Define a structure array edge[S-1] to store the information of the S-1 edges, including each edge's start point fromvex, end point endvex, and weight weight; the weight between the start point fromvex and the end point endvex is inversely proportional to the distance d.
2. In the first column j_1 of the distance matrix, find among the remaining individuals the individual Pop(s_1′) nearest to individual Pop(s_1).
3. In the second column j_2 of the distance matrix, find among the remaining individuals the individual Pop(s_2′) nearest to individual Pop(s_1′); in the third column j_3, find among the remaining individuals the individual Pop(s_3′) nearest to the individuals (Pop(s_1′), Pop(s_2′)); and so on up to column S, obtaining S-1 shortest edges.
4. Compute the mean distance avg of the S-1 shortest edges of the minimum spanning tree obtained in 3, and take the largest distance among those S-1 edges that is less than 0.88*avg as the weight lower limit.
5. Starting from individual Pop(s_1), traverse the S-1 shortest edges and remove every edge whose weight is greater than the weight lower limit, so that the minimum spanning tree breaks into a forest; this completes the clustering of the population. Then label the classes of the individuals in each subtree of the forest and store the labels in the class array classid[S]; a class label comprises the class number and the index of the individual Pop(s).
6. The class containing the best individual is the main class Pop_zhu; all other classes are merged into the secondary class Pop_fu, and the fitness of each individual in Pop_fu is computed with the fitness function of the secondary class.
The fitness function of the secondary class is given by formula (4):
$$f_\delta(x) = 1 - |\delta - Dis(best, x)| \tag{4}$$
where δ is the desired distance between the main class and the secondary class, initialized to 1 (0 ≤ δ ≤ 1); Dis(best, x) is the distance between individual x in the secondary class and the best individual best in the main class (0 < Dis(best, x) < 1), and is computed by formula (3).
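Formula (4) rewards secondary-class individuals whose distance to the best main-class individual is close to the desired distance δ. A minimal sketch, assuming the distance Dis(best, x) has already been computed with formula (3); the function name is hypothetical:

```python
def secondary_fitness(delta, dist_to_best):
    """f_delta(x) = 1 - |delta - Dis(best, x)|."""
    return 1.0 - abs(delta - dist_to_best)

# With delta = 0.5, an individual exactly 0.5 away from the best scores highest.
on_target = secondary_fitness(0.5, 0.5)
too_close = secondary_fitness(0.5, 0.2)
too_far = secondary_fitness(0.5, 0.9)
```

Individuals that are too similar to the best individual and individuals that are too far from it are both penalized, which keeps the secondary class at a controlled distance from the main class.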
Step 5, algorithm of tournament selection individuality carry out cross and variation operation
As the reproductive patterns in biological evolution process, by the exchange of two genes of individuals, combine, produce the individuality making new advances, inherited father and mother both sides' portion gene, form the new assortment of genes.In interlace operation, add algorithm of tournament selection, make the individuality intersecting have higher fitness value, add the animal migration in large search candidate solution space, accelerate the generation of optimal dividing, mutation operation is the key that produces new gene, has local search ability.Concrete cross and variation operation is as follows:
1. two individual mate1 in algorithm of tournament selection Pop_zhu, mate2, first sees with Bernoulli trials whether crossover probability occurs, the behaviour of intersecting if occur does: produce at random two point of crossing jcross1, jcross2, jcross1, jcross2 ∈ (1,2, ..., V), if jcross1 > is jcross2, two values exchange, and make jcross1 < jcross2; Exchange mate1, the jcross1 position of mate2, to jcross2 bit position, forms two new individualities, is kept in Popm, otherwise does not carry out interlace operation
2. the individuality in Popm is carried out to mutation operation, and calculate each
3. repeat 1. and 2., until produce Popsize/2 offspring's individuality
4. two individual mate1 in algorithm of tournament selection Pop_fu, mate2, first sees with Bernoulli trials whether crossover probability occurs, the behaviour of intersecting if occur does: produce at random two point of crossing jcross1, jcross2, jcross1, jcross2 ∈ (1,2, ..., V), if jcross1 > is jcross2, two values exchange, and make jcross1 < jcross2; Exchange mate1, the jcross1 position of mate2, to jcross2 bit position, forms two new individualities, is kept in Popr, otherwise does not carry out interlace operation
5. the individuality in Popr is carried out to mutation operation
6. repeat 4. and 5., until produce Popsize/2 offspring's individuality
7. Select one individual mate1 from Pop_zhu and one individual mate2 from Pop_fu by tournament selection. First use a Bernoulli trial on the crossover probability to decide whether crossover occurs. If it does, perform crossover: randomly generate two crossover points jcross1, jcross2 ∈ (1, 2, ..., V); if jcross1 > jcross2, swap the two values so that jcross1 < jcross2. Exchange the segments of mate1 and mate2 from position jcross1 to position jcross2 to form two new individuals, which are saved in Popc. Otherwise, no crossover is performed
8. Apply the mutation operation to the individuals in Popc
9. Repeat 7 and 8 until Popsize/2 offspring individuals have been produced
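The tournament selection and two-point crossover described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the tournament size k, and the 0-based positions are assumptions, and individuals are taken to be plain lists of alleles.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Return the fittest of k randomly drawn individuals (k is an assumed tournament size)."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]

def two_point_crossover(mate1, mate2, pc, rng=random):
    """Swap positions jcross1..jcross2 (inclusive) of two parents with probability pc.

    Returns a pair of children, or None when the Bernoulli trial fails.
    """
    if rng.random() >= pc:              # Bernoulli trial on the crossover probability
        return None
    V = len(mate1)
    jcross1 = rng.randrange(V)          # 0-based here; the text numbers positions 1..V
    jcross2 = rng.randrange(V)
    if jcross1 > jcross2:               # order the two crossover points
        jcross1, jcross2 = jcross2, jcross1
    child1 = mate1[:jcross1] + mate2[jcross1:jcross2 + 1] + mate1[jcross2 + 1:]
    child2 = mate2[:jcross1] + mate1[jcross1:jcross2 + 1] + mate2[jcross2 + 1:]
    return child1, child2
```

Because every allele stays at its own gene position, each child still stores a neighbor of node v at position v, so the children remain valid codes under the locus-based adjacency representation.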
To elaborate, the mutation operation described above is the key to producing new genes and provides local search capability. Based on the specific properties of community structure in complex networks, and on the definition of a weak community (the total number of edges inside the community is greater than the total number of edges connecting the community to the rest of the network), local modularity is defined on the basis of the weak-community definition:
$$M_l = \frac{edge_{in}}{edge_{out}} \qquad (3)$$
where M_l is the ratio of the total number of edges inside the community to the total number of edges connecting the community to the rest of the network, edge_in is the number of edges inside the community, and edge_out is the total number of edges between the community and the rest of the network. The larger the value of M_l, the more reasonable the community. This mutation operation is targeted: it strengthens the local search capability of the mutation operator and improves the search performance of the invention.
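As a small worked sketch of formula (3), the following Python function computes M_l for a set of nodes. The adjacency-dictionary input format and the handling of edge_out = 0 are assumptions made for the illustration; the patent does not state them.

```python
def local_modularity(adj, community):
    """M_l = edge_in / edge_out for a set of nodes.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    community: iterable of nodes forming the community.
    """
    members = set(community)
    edge_in = 0
    edge_out = 0
    for u in members:
        for w in adj[u]:
            if w in members:
                edge_in += 1        # each internal edge is seen from both endpoints
            else:
                edge_out += 1
    edge_in //= 2                   # undo the double counting
    if edge_out == 0:
        return float("inf")         # assumed convention for a community with no outgoing edges
    return edge_in / edge_out
```

For a triangle {0, 1, 2} with one extra edge from node 2 to node 3, the triangle has edge_in = 3 and edge_out = 1, so M_l = 3, and it satisfies the weak-community condition edge_in > edge_out.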
The following steps are executed in turn for each individual P that takes part in mutation:
(1) Decode individual P into its community partition by the following steps:
1) Obtain all directed edges in P, and sort these directed edges by the node numbers of the edges
2) Initialize the traversal state of all the directed edges, setting:
1. the access vector visited for all the directed edges, a 1 × V vector whose components are 0 or 1, where 1 means traversed and 0 means not traversed; initially all components are 0
2. the community number vector lables for all the directed edges, a 1 × V vector whose components give the community number of each node, i.e. the community partition result; all components are 0 at initialization
3. the loop control variable, expressed as the node number v, with v = 0 initially
3) Start the traversal from the loop control variable v_1 of P; if it has not been traversed, visited[v_1] = 0 and the community number is l = 1; after traversal, lables[v_1] = l and visited[v_1] = 1
4) Continue executing step 3), traversing in node-number order until v = V, then execute step 5)
5) Find all nodes that are connected to node v_1 by an edge but not yet traversed, forming the node number set {u}; repeat steps 3) to 4) and mark node u_1: lables[u_1] = l, visited[u_1] = 1; then execute step 6)
6) Find all nodes that are connected to node u_1 by an edge but not yet traversed, forming the node number set {w}; execute step 5) for each node w in {w} until every node number in {w} has been traversed, then execute step 4), finishing at node V
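The decoding procedure amounts to labelling the connected components of the graph formed by the links v to P[v]. The Python sketch below is one way to do this, under stated assumptions: nodes are numbered from 0, the name decode is hypothetical, and a single labels list plays the role of both visited and lables (a label of 0 means "not yet traversed").

```python
def decode(genome):
    """Map a locus-based adjacency genome to community labels.

    genome[v] is the neighbour chosen for node v (0-based). Nodes joined by
    the links v -> genome[v], ignoring direction, receive the same label.
    """
    V = len(genome)
    links = [[] for _ in range(V)]
    for v, u in enumerate(genome):
        links[v].append(u)
        links[u].append(v)
    labels = [0] * V                # 0 means "not yet labelled", as in the text
    label = 0
    for start in range(V):          # traverse in node-number order
        if labels[start]:
            continue
        label += 1                  # open a new community
        labels[start] = label
        stack = [start]
        while stack:                # spread the label to everything reachable
            v = stack.pop()
            for u in links[v]:
                if not labels[u]:
                    labels[u] = label
                    stack.append(u)
    return labels
```

For example, the genome [1, 0, 3, 2, 3] links nodes 0 and 1 together and nodes 2, 3 and 4 together, so it decodes to two communities.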
(2) Set the mutation probability Pm = 0.2 and randomly generate a decimal r between 0 and 1; mutation is performed when r < Pm
(3) Check whether gene position v of individual P is less than the code length of the gene. If gene position v is equal to or greater than the code length V, exit; otherwise, obtain each allele u that is a neighbor node of gene position v, together with its community label lables, and execute step (4)
(4) Traverse the community labels of the alleles u obtained in step (3), and calculate the local modularity M_l obtained when each allele u belongs to its respective community
(5) From the results of step (4), find the community label that maximizes M_l, then randomly take a node of that community as the mutated value
(6) Repeat steps (3) to (5) until individual P has completed the mutation operation
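Steps (2) to (6) can be sketched in Python as below. Several details are assumptions rather than statements from the patent: the Bernoulli trial is applied per gene position, community labels are computed once from the unmutated individual, the new allele is drawn only from those neighbors of v that lie in the best community (so the code stays valid), and every node has at least one neighbor. All function names are hypothetical.

```python
import random

def community_labels(genome):
    """Label the connected components of the v -- genome[v] links (union-find)."""
    parent = list(range(len(genome)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for v, u in enumerate(genome):
        parent[find(v)] = find(u)
    return [find(v) for v in range(len(genome))]

def local_modularity(adj, members):
    """M_l = edge_in / edge_out; infinite when there are no outgoing edges (assumed)."""
    edge_in = sum(1 for u in members for w in adj[u] if w in members) // 2
    edge_out = sum(1 for u in members for w in adj[u] if w not in members)
    return float("inf") if edge_out == 0 else edge_in / edge_out

def mutate(genome, adj, pm=0.2, rng=random):
    """Return a mutated copy of genome; adj maps node -> set of neighbours."""
    labels = community_labels(genome)
    child = list(genome)
    for v in range(len(genome)):
        if rng.random() >= pm:          # mutate this position only when r < Pm
            continue
        scores = {}                     # community label -> M_l
        for u in adj[v]:
            lab = labels[u]
            if lab not in scores:
                members = {x for x, l in enumerate(labels) if l == lab}
                scores[lab] = local_modularity(adj, members)
        best = max(scores, key=scores.get)
        # Assumption: choose among v's neighbours inside the best community.
        child[v] = rng.choice(sorted(u for u in adj[v] if labels[u] == best))
    return child
```

Restricting the draw to neighbors of v is a design choice for the sketch: it keeps the mutated individual decodable with the same locus-based adjacency rules.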
Step 6: perform the selection operation as follows
From the candidate solutions Om of the main-class population, formed by the Popm and Popc obtained in Step 5, select Popsize/2 individuals with the μ+λ selection strategy according to the main-class population objective function, as the next-generation main-class individuals. From the candidate solutions Or formed by Popr and Popc, select Popsize/2 individuals with the μ+λ selection strategy according to the diversity function of the secondary-class population, as the next-generation secondary-class individuals
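A minimal Python sketch of this selection step follows. It assumes that "selecting with μ+λ" reduces to keeping the best-scoring individuals of the merged candidate pool, and that the objective and diversity functions are supplied by the caller; the function names are hypothetical.

```python
def truncate_best(pool, score, mu):
    """Keep the mu highest-scoring individuals of a candidate pool."""
    return sorted(pool, key=score, reverse=True)[:mu]

def select_next_generation(pop_m, pop_r, pop_c, objective, diversity, popsize):
    """Build O_m and O_r and keep Popsize/2 individuals from each."""
    o_m = list(pop_m) + list(pop_c)     # candidates for the main class
    o_r = list(pop_r) + list(pop_c)     # candidates for the secondary class
    half = popsize // 2
    next_main = truncate_best(o_m, objective, half)
    next_secondary = truncate_best(o_r, diversity, half)
    return next_main, next_secondary
```

Popc appears in both pools, so an offspring of a main-class and a secondary-class parent can survive into either class, depending on which function ranks it highly.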
Step 7: perform the selection operation according to the following steps
(1) Adapt δ according to formula (5) below:
$$\delta_{t+1} = \delta_t + \alpha \cdot \Delta t \qquad (5)$$
where α is a decimal between 0 and 1, t is the current generation number, and Δt is given by formula (6):
$$\Delta t = \begin{cases} \dfrac{1}{|G_t|} \sum_{(p,q) \in G_t} \mathrm{Dis}(p,q) - \delta_t, & \text{if } |G_t| > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
where G_t is the set of parent pairs that reproduced successfully in generation t; |G_t| is the number of such parent pairs; Dis(p, q) is the distance between individual p and individual q; $\frac{1}{|G_t|} \sum_{(p,q) \in G_t} \mathrm{Dis}(p,q)$ is the mean distance between successfully reproducing parents; δ_t is the distance threshold in generation t; and Δt is the difference between the mean distance between successfully reproducing parents in generation t and the distance threshold
Successful reproduction covers two cases. In the first, both parents come from the main class and an offspring individual is better than its parents. In the second, one parent comes from the main class and the other from the secondary class, and an offspring individual is better than the parent from the main class.
Although update rule (5) is not monotonic, δ in (5) tends to decrease as the main-class population converges, because the mean distance between parents tends to decrease. The secondary-class population therefore converges as well and eventually becomes very similar to the main-class population. Once δ has become very small, it no longer decreases. When both populations have converged and no satisfactory solution has been found, δ is set to 1 and the search is redirected.
(2) Check whether δ has stopped decreasing for 50 consecutive generations. If so, randomly generate some individuals to enter the next generation's genetic operations; if not, execute Step 8
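The adaptive update of δ and the 50-generation check can be sketched in Python as below. The default α = 0.5 is an assumed value (the text only requires a decimal between 0 and 1), and the reading of "no longer decreases" as "no value in the last 50 generations is lower than the value just before them" is one possible interpretation; the function names are hypothetical.

```python
def update_delta(delta_t, successful_pairs, dist, alpha=0.5):
    """delta_{t+1} = delta_t + alpha * Delta_t, following formulas (5) and (6)."""
    if successful_pairs:
        mean = sum(dist(p, q) for p, q in successful_pairs) / len(successful_pairs)
        delta_change = mean - delta_t       # Delta_t when |G_t| > 0
    else:
        delta_change = 0.0                  # Delta_t when no pair reproduced successfully
    return delta_t + alpha * delta_change

def stalled(history, window=50):
    """True if delta has not decreased during the last `window` generations."""
    if len(history) <= window:
        return False
    return min(history[-window:]) >= history[-window - 1]
```

When successfully reproducing parents are closer together than the current threshold, Δt is negative and δ shrinks toward their mean distance; when no pair succeeds, δ is left unchanged.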
Step 8: repeat Step 4 to Step 7 to obtain the optimal community partition
(1) Set the number of iterations T = 100,
(2) Perform the iteration,
(3) Check the iteration count t:
If t ≤ n, return to Step 4, where n = 20 and 0 < n < T
If n < t < T, return to Step 5
(4) When t = 100, the best community partition of the complex network is obtained.
The experimental results of the invention are described in detail below:
Table 1 lists the community partition results of each method on the Polbooks network. The results for BGLL, CNM, PL and MOGA are taken from the experimental results published by Clara Pizzuti in IEEE Transactions on Evolutionary Computation. The table shows that the method of the invention (MGACD) reaches the highest modularity Q value (0.5225) among the compared methods.
Table 1: Comparison of the community partition results of each method (the values are the modularity function Q of each method)
Method   FN     GN      BGLL   CNM    PL     MOGA   MGACD
Q value  0.502  0.5168  0.515  0.502  0.515  0.518  0.5225

Claims (2)

1. A complex network community mining method based on an improved genetic algorithm, characterized by comprising the following steps:
Step 1: computer initialization;
Step 2: population initialization, in which each gene position of each individual randomly selects the number of one of the neighbor nodes of the node it represents as the allele of that gene position, giving the parent population; the steps are as follows:
(1) each individual is initialized as a code of length n, with the allele of every gene position set to 0, where n is the individual code length;
(2) for each gene position v of the individual, find the neighbor node number set N(v) = {u | node u is directly connected to node v} of the node numbered v in the network;
(3) randomly select a node number u′ from the neighbor node number set N(v) as the allele of gene position v; repeat the individual initialization steps Popsize (population size) times to complete population initialization;
Step 3: calculate the fitness Q of all individuals in the parent population, as follows:
$$Q = \frac{1}{2E} \sum_{uv} \left[ A_{uv} - \frac{k_u k_v}{2E} \right] \delta(r(u), r(v))$$
where A = (A_uv)_{n×n} is the adjacency matrix of the nodes in network G, with A_uv = 1 if an edge connects nodes u and v and A_uv = 0 otherwise; δ(r(u), r(v)) is the community indicator function, where r(u) is the community containing u and r(v) is the community containing v: its value is 1 if r(u) = r(v), meaning that nodes u and v are in the same community, and 0 otherwise, meaning that they are not; k_u is the degree of node u and k_v is the degree of node v; E is the total number of edges in network G;
Step 4: perform minimum spanning tree clustering on the population, label the classes, and determine the main class and the secondary classes;
Step 5: select two individuals from the main class by tournament selection and apply crossover and mutation to generate Popsize/2 offspring individuals Pop_m; select two individuals from the secondary class by tournament selection and apply crossover and mutation to generate Popsize/2 offspring individuals Pop_r; select one individual from the main class and one individual from the secondary class by tournament selection and apply crossover and mutation to generate Popsize/2 offspring individuals Pop_c; here Popsize = 100, and Pop_m and Pop_c each have size 50;
Step 6: Pop_m and Pop_c form the candidate solutions O_m of the main-class population, and Pop_r and Pop_c form the candidate solutions O_r of the secondary-class population; according to the main-class population objective function, select Popsize/2 individuals from O_m with the μ+λ selection strategy as the next-generation main-class individuals; according to the fitness function of the secondary-class population, select Popsize/2 individuals from O_r with the μ+λ selection strategy as the next-generation secondary-class individuals; the μ+λ selection strategy takes μ individuals from the parent generation and λ individuals from the offspring generation, and then selects μ individuals from these μ+λ individuals;
Step 7: check whether δ has stopped decreasing for 50 consecutive generations; if so, randomly generate some individuals to enter the next generation's genetic operations; if not, execute Step 8; δ is updated as follows:
$$\delta_{t+1} = \delta_t + \alpha \cdot \Delta t$$
where α is a decimal between 0 and 1, t is the current generation number, and Δt is given by:
$$\Delta t = \begin{cases} \dfrac{1}{|G_t|} \sum_{(p,q) \in G_t} \mathrm{Dis}(p,q) - \delta_t, & \text{if } |G_t| > 0 \\ 0, & \text{otherwise} \end{cases}$$
where G_t is the set of parent pairs that reproduced successfully in generation t; |G_t| is the number of such parent pairs; Dis(p, q) is the distance between individual p and individual q; $\frac{1}{|G_t|} \sum_{(p,q) \in G_t} \mathrm{Dis}(p,q)$ is the mean distance between successfully reproducing parents; δ_t is the distance threshold in generation t; and Δt is the difference between the mean distance between successfully reproducing parents in generation t and the distance threshold;
Successful reproduction covers two cases: in the first, both parents come from the main class and an offspring individual is better than its parents; in the second, one parent comes from the main class and the other from the secondary class, and an offspring individual is better than the parent from the main class;
During population evolution δ keeps decreasing, and when the population tends to converge δ essentially stops decreasing; in the invention, when δ has not decreased for 50 generations, 15 individuals are randomly generated to enter the next generation's genetic operations;
Step 8: repeat Step 4 to Step 7 until the population iteration count T is reached, obtaining the optimal community partition;
The complex network is represented by a graph G(V, E), where V is the set of nodes v and E is the set of edges e; if the number of nodes in V is n and the number of edges in E is m, the nodes are numbered (1, 2, ..., v, ..., n), with v ∈ (1, 2, ..., v, ..., n) and e ∈ (1, 2, ..., e, ..., m);
The population is denoted Pop and refers to a number of possible community partition results of the complex network; the partitioning methods, called community mining methods, are denoted S, and s is one partitioning method in S, i.e. s ∈ (1, 2, ..., s, ..., S), where S is the total number of partitioning methods; any one of the partition results is called an individual and is denoted Pop(s); the number of all possible partition results is called the population size and is denoted Popsize;
The individual coding uses the locus-based adjacency representation, in which a gene represents a node of the network and the allele of a gene is one of that node's neighbor nodes.
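For illustration only, Steps 2 and 3 of claim 1 can be sketched in Python as follows. The sketch assumes an undirected network given as an adjacency dictionary with nodes numbered from 0, every node having at least one neighbor, and 2E computed as the sum of node degrees; the function names are hypothetical.

```python
import random

def init_population(adj, popsize, rng=random):
    """Locus-based adjacency coding: position v stores a random neighbour of node v."""
    n = len(adj)
    return [[rng.choice(adj[v]) for v in range(n)] for _ in range(popsize)]

def modularity(adj, labels):
    """Q = 1/(2E) * sum_uv [A_uv - k_u k_v / (2E)] * delta(r(u), r(v))."""
    n = len(adj)
    two_e = sum(len(adj[v]) for v in range(n))      # sum of degrees equals 2E
    q = 0.0
    for u in range(n):
        for v in range(n):
            if labels[u] != labels[v]:              # delta(r(u), r(v)) = 0
                continue
            a_uv = 1 if v in adj[u] else 0
            q += a_uv - len(adj[u]) * len(adj[v]) / two_e
    return q / two_e
```

For two triangles joined by a single edge, placing each triangle in its own community gives Q = 5/14, while placing all six nodes in one community gives Q = 0.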
2. The method according to claim 1, wherein determining the main class and the secondary classes after minimum spanning tree clustering of the population is characterized by comprising the following steps:
Step S4-1: use the normalized mutual information I(Pop(s_A), Pop(s_B)) to measure the distance between two individuals Pop(s_A) and Pop(s_B) in the population; the normalized mutual information is computed as follows:
$$I(Pop(s_A), Pop(s_B)) = \frac{-2 \sum_{i=1}^{I} \sum_{j=1}^{J} V_{ij} \log\left( \frac{V_{ij} V}{C_{i.} C_{.j}} \right)}{\sum_{i=1}^{I} C_{i.} \log\left( \frac{C_{i.}}{V} \right) + \sum_{j=1}^{J} C_{.j} \log\left( \frac{C_{.j}}{V} \right)}$$
where C is the confusion matrix with I rows and J columns; I is the number of communities in the first partitioning method s_A and J is the number of communities in the second partitioning method s_B; C_i. is the sum of the elements in row i of the confusion matrix C and C_.j is the sum of the elements in column j of C; V_ij is the number of nodes shared by community i of the first partitioning method s_A and community j of the second partitioning method s_B: when they share no nodes, V_ij = 0; when they share some nodes, V_ij is the number of nodes in their intersection; when all their nodes are the same, V_ij is the number of nodes in community i or community j; V is the total number of nodes in the complex network;
When the result of the first partitioning method s_A and the result of the second partitioning method s_B are identical, I(Pop(s_A), Pop(s_B)) = 1; when the two results are entirely different, I(Pop(s_A), Pop(s_B)) = 0;
Step S4-2: the distance d between the results Pop(s_A) and Pop(s_B) of two partitioning methods is computed as d = 1 − I(Pop(s_A), Pop(s_B));
Step S4-3: use Prim's algorithm to obtain the minimum spanning tree containing all individuals of the population;
Step S4-4: disconnect the edges of the minimum spanning tree whose weights exceed a threshold, giving a forest of the population and hence a clustering of the population; the threshold is the largest edge weight in the minimum spanning tree that is less than 0.88*avg, where avg is the mean weight of all edges in the minimum spanning tree, and the threshold lies in the range (0, 1);
Step S4-5: traverse the forest depth-first and label the class of each individual, i.e. set the class member variable of each individual to the number of the class it belongs to, and save it in the classid array;
Step S4-6: take the class containing the best individual as the main class; all remaining classes are classified as secondary classes.
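For illustration only, Steps S4-1 to S4-5 can be sketched in Python as follows. The sketch takes V_ij to be the entries of the confusion matrix, uses a union-find pass instead of a depth-first traversal to label the trees of the forest, and adds two fallbacks that the claims do not state (the value returned when the denominator is zero, and the threshold used when no edge is below 0.88*avg). All function names are hypothetical.

```python
import math

def nmi(labels_a, labels_b):
    """Normalized mutual information of two partitions of the same V nodes."""
    V = len(labels_a)
    joint, row, col = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        joint[(a, b)] = joint.get((a, b), 0) + 1    # V_ij
        row[a] = row.get(a, 0) + 1                  # C_i.
        col[b] = col.get(b, 0) + 1                  # C_.j
    num = -2.0 * sum(c * math.log(c * V / (row[a] * col[b]))
                     for (a, b), c in joint.items())
    den = (sum(c * math.log(c / V) for c in row.values())
           + sum(c * math.log(c / V) for c in col.values()))
    return 1.0 if den == 0 else num / den           # den == 0: assumed convention

def prim_mst(dist):
    """Prim's algorithm on a full distance matrix; returns edges (i, j, weight)."""
    n = len(dist)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n                     # (cheapest weight, tree endpoint)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        in_tree[i] = True
        if best[i][1] >= 0:
            edges.append((best[i][1], i, best[i][0]))
        for j in range(n):
            if not in_tree[j] and dist[i][j] < best[j][0]:
                best[j] = (dist[i][j], i)
    return edges

def cluster(dist, factor=0.88):
    """Cut MST edges above the threshold and return a class id per individual."""
    edges = prim_mst(dist)
    avg = sum(w for _, _, w in edges) / len(edges)
    below = [w for _, _, w in edges if w < factor * avg]
    threshold = max(below) if below else factor * avg   # fallback is an assumption
    classid = list(range(len(dist)))

    def find(x):
        while classid[x] != x:
            x = classid[x]
        return x

    for i, j, w in edges:
        if w <= threshold:                          # keep edges not exceeding the threshold
            classid[find(i)] = find(j)
    return [find(i) for i in range(len(dist))]
```

In use, dist[a][b] would hold d = 1 − nmi(...) for the decoded partitions of individuals a and b; the class that contains the individual with the best fitness is then taken as the main class.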
CN201410429721.4A 2014-08-28 2014-08-28 Complex network community mining method based on improved genetic algorithm Pending CN104200272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410429721.4A CN104200272A (en) 2014-08-28 2014-08-28 Complex network community mining method based on improved genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410429721.4A CN104200272A (en) 2014-08-28 2014-08-28 Complex network community mining method based on improved genetic algorithm

Publications (1)

Publication Number Publication Date
CN104200272A true CN104200272A (en) 2014-12-10

Family

ID=52085561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410429721.4A Pending CN104200272A (en) 2014-08-28 2014-08-28 Complex network community mining method based on improved genetic algorithm

Country Status (1)

Country Link
CN (1) CN104200272A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141210
