CN104200272A

CN104200272A - Complex network community mining method based on improved genetic algorithm

Info

Publication number: CN104200272A
Application number: CN201410429721.4A
Authority: CN
Inventors: 杨新武; 杨丽军; 李�瑞
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2014-08-28
Filing date: 2014-08-28
Publication date: 2014-12-10

Abstract

The invention discloses a complex network community mining method based on an improved genetic algorithm and belongs to the technical field of complex network community mining. The method mines communities in a complex network with an improved genetic algorithm that fuses clustering with the dual-population idea. Normalized mutual information is used as the measure of similarity between individuals in the population. Clustering is introduced first: a minimum spanning tree clustering method divides the population into classes. The dual-population idea is then introduced to designate a main class and secondary classes among the clusters. The main class maintains the evolution direction of the population so that it approaches the optimal solution of the objective function; the secondary classes supply diversity to the main class in a timely manner, so that when the main class is trapped in a local optimum it can escape and search other regions of the solution space, thereby realizing complex network community mining.

Description

A complex network community mining method based on an improved genetic algorithm

Technical field

The invention belongs to the technical field of complex network community mining. Specifically, it uses an improved genetic algorithm that fuses clustering with the dual-population idea to mine the communities in a complex network. It is a new method that realizes complex network community mining by means of computer technology, a genetic algorithm, clustering, and the dual-population idea.

Background technology

In the real world, many complex systems exist in the form of networks or can be abstracted as complex networks. Research on complex networks has become a hot topic in many fields. Many real-world systems can be regarded as the result of interactions among numerous subsystems. A complex network is an abstraction of a complex system: if the vertices of the network are regarded as the individuals in the system, then the edges of the network can be regarded as the relations between those individuals. A complex network is thus a simplified, abstract structure of a system that helps us better understand the properties and functions of the complex system.

Since the 1990s, the rapid development of computer and Internet technology has brought humanity into the network era. People live in a world of complex networks of all kinds, such as large-scale power grids, global transportation networks, metabolic networks, and social networks. Because research on complexity problems is closely tied to the complexity of networks, network science has become a multidisciplinary field and has attracted great attention. A large body of research shows that complex networks commonly exhibit basic statistical properties such as the small-world effect and the scale-free property. As research into the properties of complex networks has deepened, a feature common to many real networks has been found: community structure, another important property of complex networks. Community structure has received wide attention in complex network research and is currently an active field. A community in a real network represents a set of particular objects. For example, in a social network, a community is a real social group formed around some interest or background; in a citation network, a community represents papers on the same topic; in the World Wide Web, a community represents websites that discuss related topics. Community mining has very wide application in real life. It helps people understand more deeply the relationship between the structure and the function of a complex network, and thereby discover the regularities hidden in the network and predict its behavior. Research on complex network community mining therefore has important theoretical significance and practical value.

In recent years, methods for complex network community mining have emerged one after another, drawing on theory and techniques from physics, mathematics, computer science, and other fields. By underlying principle, they can be divided into methods based on division, modularity optimization, label propagation, dynamics, and bionic computation. Division-based methods include the well-known GN algorithm proposed by Girvan and Newman. Modularity-optimization methods include the FN algorithm, an improvement on the GN algorithm; the SA algorithm, based on the simulated annealing principle; and the fast modularity optimization method FUA. Label-propagation methods include the well-known label propagation algorithm LPA.

In 2004, Newman et al. proposed the network modularity function (Q), which quantitatively evaluates the quality of a community structure. Combinatorial optimization with Q as the objective function subsequently became one of the main approaches to detecting network community structure. Complex network community discovery is thereby transformed into an objective function optimization problem, but maximizing the Q function is NP-complete. The genetic algorithm is an effective method for NP-hard problems: it does not depend on the specific application and needs no initial information about the particular problem, it provides a general framework for solving complex optimization problems, and it is highly robust and parallel, so it is widely used in many fields. Applying genetic algorithms to complex network community mining therefore has theoretical significance and practical value.

Gong et al. proposed MA (memetic algorithm), a GA-based community mining algorithm. The method is prone to local optima and has difficulty finding the globally optimal solution. Ronghua Shang proposed MIGA, a community discovery algorithm based on modularity and an improved genetic algorithm, to solve this problem. However, MIGA requires prior knowledge of the number of communities in the complex network, which degrades its performance on community mining problems where the number of communities is unknown.

To address the above defects of genetic algorithms in community mining, and inspired by the dual-population idea, the present invention proposes a complex network community mining method based on an improved genetic algorithm: the MGACD (Modified Genetic Algorithm for Community Detecting) method. The method largely resolves the tendency of the population to become trapped in local optima and to lose diversity during genetic evolution, and improves the search performance of the invention.

Summary of the invention

The present invention proposes a new method for complex network community mining based on an improved genetic algorithm. The method uses normalized mutual information as the measure of similarity between individuals in the population and fuses clustering with the dual-population idea. Clustering is introduced first, dividing the population into classes with a minimum spanning tree clustering method; the dual-population idea is then introduced to designate a main class and secondary classes among the clusters. The main class maintains the evolution direction of the population and approaches the optimal solution of the objective function. The secondary classes supply diversity to the main class in a timely manner, so that the main class can escape when trapped in a local optimum and search other regions of the solution space.

The technical scheme of the inventive method is as follows:

A complex network community mining method based on an improved genetic algorithm, characterized in that it comprises the following steps:

Step 1: computer initialization;

Step 2: population initialization. For each gene position of each individual, the number of a randomly chosen neighbor of the node represented by that gene position is taken as the allele of that gene position, yielding the parent population. The steps are as follows:

(1) initialize each individual as a code of length n in which the allele of every gene position is 0, where n is the individual code length;

(2) for each gene position v of the individual, find the neighbor set N(v) = {u | node u is directly connected to node v} of the node numbered v in the network;

(3) randomly select a node number u′ from the neighbor set N(v) as the allele of gene position v; repeat the individual initialization steps Popsize (population size) times to complete the population initialization;

Step 3: calculate the fitness Q of all individuals in the parent population as follows:

Q = \frac{1}{2E} \sum_{u,v} \left[ A_{uv} - \frac{k_{u} k_{v}}{2E} \right] \delta(r(u), r(v))

where A = (A_{uv})_{n×n} is the adjacency matrix of network G: A_{uv} = 1 if an edge connects nodes u and v, and A_{uv} = 0 otherwise. δ(r(u), r(v)) is the community membership function, where r(u) is the community containing u and r(v) is the community containing v: its value is 1 if r(u) = r(v), meaning that nodes u and v are in the same community, and 0 otherwise. k_{u} and k_{v} are the degrees of nodes u and v. E is the total number of edges in network G, defined as E = \frac{1}{2} \sum_{u,v} A_{uv}.

Step 4: perform minimum spanning tree clustering on the population, label the classes, and determine the main class and the secondary classes;

Step 5: select two individuals from the main class by tournament selection and apply crossover and mutation, generating Popsize/2 offspring individuals Pop_{m}; select two individuals from the secondary classes by tournament selection and apply crossover and mutation, generating Popsize/2 offspring individuals Pop_{r}; select one individual from the main class and one individual from the secondary classes by tournament selection and apply crossover and mutation, generating Popsize/2 offspring individuals Pop_{c}; here Popsize = 100, and Pop_{m} and Pop_{c} each contain 50 individuals;

Step 6: Pop_{m} and Pop_{c} form the candidate solutions O_{m} of the main-class population, and Pop_{r} and Pop_{c} form the candidate solutions O_{r} of the secondary-class population. Using the μ+λ selection strategy, select Popsize/2 individuals from O_{m} according to the objective function of the main-class population as the next-generation main-class population, and select Popsize/2 individuals from O_{r} according to the fitness function of the secondary-class population as the next-generation secondary-class population. The μ+λ selection strategy takes μ individuals from the parent generation and λ individuals from the offspring, and then selects μ individuals from these μ+λ individuals;

Step 7: judge whether δ has stopped decreasing for 50 consecutive generations. If so, randomly generate some individuals to enter the next generation's genetic operations; if not, go to step 8. δ is updated as follows:

\delta_{t+1} = \delta_{t} + \alpha \cdot \Delta t

where α is a decimal between 0 and 1, t is the current generation number, and Δt is given by:

\Delta t = \begin{cases} \frac{1}{|G_{t}|} \sum_{(p, q) \in G_{t}} Dis(p, q) - \delta_{t}, & \text{if } |G_{t}| > 0 \\ 0, & \text{otherwise} \end{cases}

where G_{t} is the set of successfully reproducing parent pairs in generation t, |G_{t}| is the number of such pairs, and Dis(p, q) is the distance between individuals p and q; \frac{1}{|G_{t}|} \sum_{(p, q) \in G_{t}} Dis(p, q) is the mean distance between successfully reproducing parents; δ_{t} is the distance threshold in generation t; and Δt is the difference between that mean distance and the distance threshold in generation t;

Successful reproduction covers two cases. In the first, both parents come from the main class, and reproduction is successful if an offspring individual is better than both parents. In the second, one parent comes from the main class and the other from a secondary class, and reproduction is successful if an offspring individual is better than the parent from the main class;

During population evolution δ keeps decreasing, and once the population tends to converge δ essentially stops decreasing. In the present invention, if δ has not decreased for 50 generations, 15 individuals are randomly generated to enter the next generation's genetic operations;

Step 8: repeat steps 4 to 7 until the number of population iterations T is reached, obtaining the optimal community division;
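The survivor selection of Step 6 and the threshold adaptation of Step 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `mu_plus_lambda`, `update_delta` and `has_stagnated` are hypothetical, the μ+λ step is assumed to keep the μ fittest individuals of the merged pool, and the value of α in the usage lines is arbitrary.

```python
def mu_plus_lambda(parents, offspring, fitness, mu):
    """Merge mu parents with lambda offspring and keep the mu fittest.

    Assumption: 'select mu from mu+lambda' means truncation by fitness.
    """
    pool = list(parents) + list(offspring)
    return sorted(pool, key=fitness, reverse=True)[:mu]


def update_delta(delta_t, successful_distances, alpha):
    """Return delta_{t+1} = delta_t + alpha * Delta_t.

    successful_distances holds Dis(p, q) for every successfully
    reproducing parent pair of generation t (the set G_t).
    """
    if successful_distances:
        mean_dist = sum(successful_distances) / len(successful_distances)
        change = mean_dist - delta_t
    else:
        change = 0.0  # |G_t| = 0: the threshold stays unchanged
    return delta_t + alpha * change


def has_stagnated(delta_history, window=50):
    """True if delta has not decreased during the last `window` generations."""
    if len(delta_history) <= window:
        return False
    return min(delta_history[-window:]) >= delta_history[-window - 1]


# Toy usage: integers stand in for individuals, fitness is the value itself.
survivors = mu_plus_lambda([3, 1], [5, 2, 4], fitness=lambda x: x, mu=2)
print(survivors)                                    # [5, 4]
print(update_delta(1.0, [0.25, 0.75], alpha=0.5))   # 0.75
```

When `has_stagnated` returns true, the method described above would inject randomly generated individuals (15 in the text) into the next generation.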

The complex network is represented by a graph G(V, E), where V is the set of nodes v and E is the set of edges e. If V contains n nodes and E contains m edges, the nodes are numbered (1, 2, ..., v, ..., n), with v ∈ (1, 2, ..., v, ..., n) and e ∈ (1, 2, ..., e, ..., m);

The population is denoted Pop and refers to a number of possible community division results of the complex network. The set of community division methods is called the community mining method and is denoted S; s is one division method in S, i.e. s ∈ (1, 2, ..., s, ..., S), where S is the total number of division methods. Any one division result is called an individual and is denoted Pop(s); the number of all possible division results is called the population size and is denoted Popsize;

The individual coding uses the locus-based adjacency representation, in which a gene represents a vertex of the network and the allele of a gene is one of that vertex's neighbor nodes.

Determining the main class and the secondary classes after minimum spanning tree clustering of the population is characterized in that it comprises the following steps:

Step S4-1: use the normalized mutual information I(Pop(s_{A}), Pop(s_{B})) to measure the distance between two individuals Pop(s_{A}) and Pop(s_{B}) in the population. The normalized mutual information is computed as follows:

I(Pop(s_{A}), Pop(s_{B})) = \frac{-2 \sum_{i=1}^{I} \sum_{j=1}^{J} V_{ij} \log\left( \frac{V_{ij} V}{C_{i.} C_{.j}} \right)}{\sum_{i=1}^{I} C_{i.} \log(C_{i.} / V) + \sum_{j=1}^{J} C_{.j} \log(C_{.j} / V)}

where C is the confusion matrix with I rows and J columns; I is the number of communities in the first division method s_{A}, and J is the number of communities in the second division method s_{B}. C_{i.} is the sum of the elements in row i of C, and C_{.j} is the sum of the elements in column j of C. V_{ij} is the number of nodes shared by community i of s_{A} and community j of s_{B}: V_{ij} = 0 when they have no node in common; V_{ij} is the number of nodes in their intersection when they share some nodes; and V_{ij} is the number of nodes in community i (equivalently, community j) when all their nodes are the same. V is the total number of nodes in the complex network;

When the result of the first division method s_{A} and the result of the second division method s_{B} are identical, I(Pop(s_{A}), Pop(s_{B})) = 1; when the two results are completely different, I(Pop(s_{A}), Pop(s_{B})) = 0;

Step S4-2: the distance d between the results Pop(s_{A}) and Pop(s_{B}) of the two division methods is calculated as d = 1 - I(Pop(s_{A}), Pop(s_{B}));

Step S4-3: use Prim's algorithm to obtain the minimum spanning tree containing all individuals of the population;

Step S4-4: disconnect the edges of the minimum spanning tree whose weights exceed a threshold, obtaining a forest and hence a clustering of the population. The threshold is the largest edge weight in the minimum spanning tree that is less than 0.88*avg, where avg is the mean weight of all edges in the minimum spanning tree; the threshold lies in the range (0, 1);

Step S4-5: traverse the forest depth-first and label the class of each individual, i.e. set the class member variable of each individual to the number of the class it belongs to, and save it in the classid array;

Step S4-6: take the class containing the best individual as the main class; all other classes are secondary classes.
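Steps S4-1 and S4-2 can be illustrated with a short Python sketch. It assumes each individual has already been decoded into a list of community labels, one per node; the function names `nmi` and `distance` are hypothetical, and the handling of the degenerate case (both partitions consisting of one single community) is an assumption that the text does not cover.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information of two partitions given as label lists."""
    n = len(labels_a)                          # V: total number of nodes
    joint = Counter(zip(labels_a, labels_b))   # V_ij: shared node counts
    rows = Counter(labels_a)                   # C_i.: row sums
    cols = Counter(labels_b)                   # C_.j: column sums
    num = -2.0 * sum(v * math.log(v * n / (rows[i] * cols[j]))
                     for (i, j), v in joint.items())
    den = (sum(c * math.log(c / n) for c in rows.values())
           + sum(c * math.log(c / n) for c in cols.values()))
    if den == 0.0:
        # Assumption: both partitions are one single community, so identical.
        return 1.0
    return num / den


def distance(labels_a, labels_b):
    """d = 1 - I, the distance used to build the minimum spanning tree."""
    return 1.0 - nmi(labels_a, labels_b)


# Same partition under different label names: distance close to 0.
print(distance([0, 0, 1, 1], [1, 1, 0, 0]))
# Statistically independent partitions: distance close to 1.
print(distance([0, 0, 1, 1], [0, 1, 0, 1]))
```

Only non-zero V_ij entries appear in the `joint` counter, which avoids evaluating the logarithm of zero.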

Brief description of the drawings

Fig. 1 is a flowchart of the method of the invention;

Fig. 2 is a flowchart of the clustering in the method of the invention;

Fig. 3 is a flowchart of the individual mutation operation in the method of the invention.

Embodiment

The specific embodiment of the invention is described in detail below with reference to the American political books network (Polbooks network) and the flowcharts.

Step 1, computer initialization. Set the following parameters:

Complex network: represented by a graph G(V, E), where V is the set of nodes v and E is the set of edges e. If V contains n nodes and E contains m edges, the nodes are numbered (1, 2, ..., v, ..., n), with v ∈ (1, 2, ..., v, ..., n) and e ∈ (1, 2, ..., e, ..., m).

Gene: one node represents one gene.

Population: denoted Pop; refers to a number of possible community division results of the complex network. The set of community division methods is called the community mining method and is denoted S; s is one division method in S, i.e. s ∈ (1, 2, ..., s, ..., S), where S is the total number of division methods. Any one division result is called an individual and is denoted Pop(s); the number of all possible division results is called the population size and is denoted Popsize.

Individual coding: an array or bit string representing one division result, also called a chromosome. The position of a gene in the chromosome is called the locus or gene position, and each gene also represents a node of the complex network. A chromosome corresponds to one division method of the complex network, and the solution space of chromosomes corresponds to all possible division methods. The mapping from the solution space to a chromosome is called encoding, and the mapping from a chromosome to the solution space is called decoding.

Step 2, initialization of population

The method uses the locus-based adjacency representation. In this representation each genotype g has n genes, and each gene represents a node of network G. Each gene u takes as its allele a value v ∈ (1, 2, ..., v, ..., n) such that node u is directly connected to node v. The locus-based adjacency representation is a graph representation: if the graph represented by genotype g has an edge between u and v, then nodes u and v are in the same community after g is decoded.
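A small worked example may make the representation concrete. The sketch below decodes a genotype into community labels by treating each pair (u, allele of u) as an edge and taking connected components with a union-find structure. The function name `decode`, the use of union-find, and the 0-based node numbering are assumptions made for illustration; the text numbers nodes from 1.

```python
def decode(genotype):
    """Map a locus-based adjacency genotype to community labels.

    genotype[u] = v means the decoded graph has an edge u-v; every
    connected component of that graph is one community.
    """
    n = len(genotype)
    parent = list(range(n))

    def find(x):
        # Find the component root, halving the path on the way up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in enumerate(genotype):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(u), len(ids)) for u in range(n)]


# Six nodes: genes 0, 1, 2 point at each other, genes 3, 4, 5 likewise.
print(decode([1, 2, 0, 4, 3, 4]))   # [0, 0, 0, 1, 1, 1]
```

The number of communities is not fixed in advance; it falls out of the decoding, which is why this representation suits networks whose community count is unknown.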

When the population is initialized, every gene of an individual takes one of its neighbor nodes as its allele. This greatly reduces the search space of community division solutions, brings the initial solution space closer to the optimal solution space to some extent, and accelerates the evolution process.

An arbitrary community division result of the complex network is represented by an individual Pop(s), which is generated as follows:

(1) Initialize each individual as a code of length n in which the allele of every gene position is 0, where n is the individual code length.

(2) For each gene position v of the individual, find the neighbor set N(v) = {u | node u is directly connected to node v} of the node numbered v in the network.

(3) Randomly select a node number u′ from the neighbor set N(v) as the allele of gene position v, i.e. Pop(s, v) = u′.

Repeat the individual initialization steps Popsize (population size) times to complete the population initialization.
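The initialization steps above can be sketched as follows. The function name `init_population`, the adjacency-list input format, the 0-based node numbers and the toy graph (two triangles joined by one edge) are all assumptions for illustration; the sketch also assumes every node has at least one neighbor.

```python
import random


def init_population(adj, popsize, rng=None):
    """Build popsize individuals in locus-based adjacency encoding.

    adj maps each node number 0..n-1 to the list of its neighbors.
    """
    rng = rng or random.Random()
    n = len(adj)
    population = []
    for _ in range(popsize):
        individual = [0] * n                     # step (1): all-zero code of length n
        for v in range(n):
            individual[v] = rng.choice(adj[v])   # steps (2)-(3): random neighbor of v
        population.append(individual)
    return population


# Toy network: triangle 0-1-2 and triangle 3-4-5 joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
pop = init_population(adj, popsize=4, rng=random.Random(0))
for individual in pop:
    print(individual)
```

Because every allele is drawn from N(v), each individual decodes to a partition whose communities are connected in the original network.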

Step 3, calculate the fitness function Q of the parent population

The method uses the widely accepted network modularity function (Q function) as the fitness function of individuals in the main class. The Q function is defined as the difference between the fraction of the network's edges that fall within communities and the expected fraction of such edges under random connection. The expression of the Q function is:

Q = \frac{1}{2E} \sum_{u,v} \left[ A_{uv} - \frac{k_{u} k_{v}}{2E} \right] \delta(r(u), r(v)) \qquad (1)

The larger the fitness value, the better the community mining result; the network modularity function (Q function) is therefore also a widely used standard for measuring the quality of network community mining.

This step computes the fitness of the Popsize individuals in the population according to formula (1) and stores it in the fitness array Pop_Q.
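Formula (1) translates directly into code. The sketch below is a straightforward O(n²) evaluation, assuming an undirected network given as adjacency lists and a list of community labels obtained by decoding an individual; the function name `modularity` and the toy graph are illustrative assumptions.

```python
def modularity(adj, labels):
    """Q = 1/(2E) * sum_{u,v} [A_uv - k_u k_v / (2E)] * delta(r(u), r(v))."""
    n = len(adj)
    degree = [len(adj[u]) for u in range(n)]
    two_e = sum(degree)              # 2E: every edge is counted at both ends
    q = 0.0
    for u in range(n):
        for v in range(n):
            if labels[u] != labels[v]:
                continue             # delta(r(u), r(v)) = 0
            a_uv = 1.0 if v in adj[u] else 0.0
            q += a_uv - degree[u] * degree[v] / two_e
    return q / two_e


# Toy network: triangle 0-1-2 and triangle 3-4-5 joined by edge 2-3 (E = 7).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(modularity(adj, [0, 0, 0, 1, 1, 1]))   # about 0.357 (= 5/14)
print(modularity(adj, [0, 0, 0, 0, 0, 0]))   # about 0: one community has no modular structure
```

For the two-triangle split, each community contributes 6 - 49/14 = 2.5 to the sum, giving Q = 5/14.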

Step 4, cluster the population and determine the main class and the secondary classes

(1) Use the normalized mutual information I(Pop(s_{A}), Pop(s_{B})) to measure the distance d between two individuals Pop(s_{A}) and Pop(s_{B}) in the population, as follows:

1. Calculate the normalized mutual information I(Pop(s_{A}), Pop(s_{B})) by formula (2) below:

I(Pop(s_{A}), Pop(s_{B})) = \frac{-2 \sum_{i=1}^{I} \sum_{j=1}^{J} V_{ij} \log\left( \frac{V_{ij} V}{C_{i.} C_{.j}} \right)}{\sum_{i=1}^{I} C_{i.} \log(C_{i.} / V) + \sum_{j=1}^{J} C_{.j} \log(C_{.j} / V)} \qquad (2)

where C is the confusion matrix with I rows and J columns; I is the number of communities in the first division method s_{A}, and J is the number of communities in the second division method s_{B}. C_{i.} is the sum of the elements in row i of C, and C_{.j} is the sum of the elements in column j of C. V_{ij} is the number of nodes shared by community i of s_{A} and community j of s_{B}: V_{ij} = 0 when they have no node in common; V_{ij} is the number of nodes in their intersection when they share some nodes; and V_{ij} is the number of nodes in community i (equivalently, community j) when all their nodes are the same. V is the total number of nodes in the complex network.

When the result of the first division method s_{A} and the result of the second division method s_{B} are identical, I(Pop(s_{A}), Pop(s_{B})) = 1; when the two results are completely different, I(Pop(s_{A}), Pop(s_{B})) = 0.

2. Calculate the distance d between the results Pop(s_{A}) and Pop(s_{B}) of the two division methods by formula (3) below:

d = 1 - I(Pop(s_{A}), Pop(s_{B})) \qquad (3)

(2) Use the minimum spanning tree to cluster the population Pop

Because the minimum spanning tree always connects the nearest pair of vertices during its construction, the similarity within each part obtained by cutting at the set threshold is high and the similarity between parts is low, which matches the criterion for clustering the individuals of the population. We therefore use Prim's algorithm to obtain the minimum spanning tree over all divisions in the population; disconnecting the edges of the minimum spanning tree whose weights exceed the threshold yields the clustering of the population. Using Prim's algorithm ensures that individuals in the same class are highly similar and individuals in different classes are less similar.

The procedure for clustering the population and determining the main class and the secondary classes is as follows:

1) Using distance formula (3) above, calculate the distance matrix between all individuals Pop(s) in the population Pop; this matrix is a lower triangular matrix.

2) From the result of step 1), use Prim's algorithm to generate the minimum spanning tree with S-1 directed edges, as follows:

1. Define a structure array edge[S-1] to store the information of the S-1 edges, including the start point fromvex, the end point endvex, and the weight weight of each edge; the weight between start point fromvex and end point endvex is inversely proportional to the distance d.

2. In the first column j_{1} of the distance matrix, find among the remaining individuals the individual Pop(s_{1}′) nearest to individual Pop(s_{1}).

3. In the second column j_{2} of the distance matrix, find among the remaining individuals the individual Pop(s_{2}′) nearest to individual Pop(s_{1}′); in the third column j_{3}, find among the remaining individuals the individual Pop(s_{3}′) nearest to (Pop(s_{1}′), Pop(s_{2}′)); and so on up to column S, obtaining the S-1 shortest edges.

4. Calculate the mean distance avg of the S-1 shortest edges of the minimum spanning tree obtained in 3., and take the largest distance among those S-1 edges that is less than 0.88*avg as the weight lower limit.

5. Starting from individual Pop(s_{1}), traverse the S-1 shortest edges downward and remove every edge whose weight exceeds the weight lower limit, so that the minimum spanning tree breaks into a forest, completing the clustering of the population. Then label the class of the individuals in each small spanning tree of the forest and save the labels in the class array classid[S]; each class label comprises the class number and the number of the individual Pop(s).

6. The class containing the best individual is the main class Pop_zhu; all other classes are the secondary classes Pop_fu. Calculate the fitness of each individual in Pop_fu with the fitness function of the secondary class.
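The clustering procedure above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented code: the function name `mst_cluster` is hypothetical, the edge weight is taken to be the distance d itself, the input is a full symmetric distance matrix instead of a lower triangle, and at least two individuals are assumed. If no edge is shorter than 0.88*avg, the sketch falls back to 0.88*avg as the threshold, a case the text does not address.

```python
def mst_cluster(dist, fitness, factor=0.88):
    """Cluster individuals by cutting long edges of a minimum spanning tree.

    Returns (classid, main_class): a class number per individual and the
    class that contains the fittest individual.
    """
    n = len(dist)
    # Prim's algorithm on a dense distance matrix, starting from individual 0.
    in_tree = [False] * n
    best = [float("inf")] * n
    link = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((link[u], u, dist[link[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], link[v] = dist[u][v], u
    # Threshold: the largest tree edge that is shorter than factor * avg.
    avg = sum(w for _, _, w in edges) / len(edges)
    shorter = [w for _, _, w in edges if w < factor * avg]
    threshold = max(shorter) if shorter else factor * avg
    # Keep only the edges that do not exceed the threshold (the forest).
    forest = [[] for _ in range(n)]
    for a, b, w in edges:
        if w <= threshold:
            forest[a].append(b)
            forest[b].append(a)
    # Depth-first traversal labels every tree of the forest.
    classid = [-1] * n
    current = 0
    for start in range(n):
        if classid[start] != -1:
            continue
        classid[start] = current
        stack = [start]
        while stack:
            x = stack.pop()
            for y in forest[x]:
                if classid[y] == -1:
                    classid[y] = current
                    stack.append(y)
        current += 1
    fittest = max(range(n), key=lambda i: fitness[i])
    return classid, classid[fittest]


# Two tight groups {0, 1, 2} and {3, 4} that are far from each other.
near, far = 0.1, 0.9
dist = [[0.0 if i == j else (near if (i < 3) == (j < 3) else far)
         for j in range(5)] for i in range(5)]
classid, main_class = mst_cluster(dist, fitness=[0.2, 0.5, 0.3, 0.4, 0.1])
print(classid, main_class)   # [0, 0, 0, 1, 1] 0
```

In the toy run the tree has three edges of length 0.1 and one of length 0.9, so avg = 0.3, the threshold is 0.1, and only the long edge between the groups is cut.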

The fitness function of the secondary class is given by formula (4) below:

f_{\delta}(x) = 1 - |\delta - Dis(best, x)| \qquad (4)

where δ is the desired distance between the main class and the secondary class, initialized to 1 (0 ≤ δ ≤ 1); Dis(best, x) is the distance between individual x in the secondary class and the best individual best in the main class (0 < Dis(best, x) < 1), calculated by formula (3).
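Formula (4) rewards secondary individuals whose distance to the best main-class individual is close to the target δ, rather than those that are simply closest or farthest. A minimal sketch, with the hypothetical function name `secondary_fitness` and made-up distances:

```python
def secondary_fitness(delta, dist_to_best):
    """f_delta(x) = 1 - |delta - Dis(best, x)|, formula (4)."""
    return 1.0 - abs(delta - dist_to_best)


# With delta = 0.5, an individual at distance 0.5 from the best scores highest.
distances = [0.25, 0.5, 0.75]
scores = [secondary_fitness(0.5, d) for d in distances]
print(scores)   # [0.75, 1.0, 0.75]
```

As δ shrinks during evolution, the secondary classes are drawn toward the main class while still staying a controlled distance away from it.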

Step 5, select individuals by tournament selection and apply crossover and mutation

Like reproduction in biological evolution, crossover exchanges and recombines the genes of two individuals to produce new individuals that inherit part of the genes of both parents, forming new gene combinations. Adding tournament selection to the crossover operation gives the crossed individuals higher fitness values, increases mobility through the candidate solution space, and speeds up the generation of the optimal division. The mutation operation is the key to producing new genes and provides local search ability. The crossover and mutation operations are as follows:

1. Select two individuals mate1 and mate2 from Pop_zhu by tournament selection. First use a Bernoulli trial to decide whether crossover occurs. If it occurs, perform the crossover: randomly generate two crossover points jcross1, jcross2 ∈ (1, 2, ..., V); if jcross1 > jcross2, swap the two values so that jcross1 < jcross2; exchange the segments of mate1 and mate2 from position jcross1 to position jcross2, forming two new individuals, and save them in Popm. Otherwise, do not perform crossover.

2. Apply the mutation operation to the individuals in Popm, and calculate each

3. Repeat 1. and 2. until Popsize/2 offspring individuals have been produced.

4. Select two individuals mate1 and mate2 from Pop_fu by tournament selection. First use a Bernoulli trial to decide whether crossover occurs. If it occurs, perform the crossover: randomly generate two crossover points jcross1, jcross2 ∈ (1, 2, ..., V); if jcross1 > jcross2, swap the two values so that jcross1 < jcross2; exchange the segments of mate1 and mate2 from position jcross1 to position jcross2, forming two new individuals, and save them in Popr. Otherwise, do not perform crossover.

5. Apply the mutation operation to the individuals in Popr.

6. Repeat 4. and 5. until Popsize/2 offspring individuals have been produced.

7. Select one individual mate1 from Pop_zhu and one individual mate2 from Pop_fu by tournament selection. First use a Bernoulli trial to decide whether crossover occurs. If it occurs, perform the crossover: randomly generate two crossover points jcross1, jcross2 ∈ (1, 2, ..., V); if jcross1 > jcross2, swap the two values so that jcross1 < jcross2; exchange the segments of mate1 and mate2 from position jcross1 to position jcross2, forming two new individuals, and save them in Popc. Otherwise, do not perform crossover.

8. Apply the mutation operation to the individuals in Popc.

9. Repeat 7. and 8. until Popsize/2 offspring individuals have been produced.
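The selection and crossover part of these steps can be sketched as below. The function names `tournament_select` and `two_point_crossover` are hypothetical; the tournament size k = 2 and the crossover probability used in the usage lines are assumptions, since the text does not state them, and positions are 0-based.

```python
import random


def tournament_select(population, fitness, rng, k=2):
    """Pick k distinct individuals at random and return the fittest one."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]


def two_point_crossover(mate1, mate2, pc, rng):
    """Swap the segment jcross1..jcross2 (inclusive) between two parents.

    Returns None when the Bernoulli trial with probability pc fails.
    """
    if rng.random() >= pc:
        return None
    n = len(mate1)
    jcross1, jcross2 = rng.randrange(n), rng.randrange(n)
    if jcross1 > jcross2:
        jcross1, jcross2 = jcross2, jcross1
    child1 = mate1[:jcross1] + mate2[jcross1:jcross2 + 1] + mate1[jcross2 + 1:]
    child2 = mate2[:jcross1] + mate1[jcross1:jcross2 + 1] + mate2[jcross2 + 1:]
    return child1, child2


rng = random.Random(1)
population = [[1, 2, 0, 4, 3, 4], [2, 0, 1, 5, 5, 3], [1, 0, 3, 2, 5, 4]]
fitness = [0.36, 0.36, 0.12]
mate1 = tournament_select(population, fitness, rng)
mate2 = tournament_select(population, fitness, rng)
print(two_point_crossover(mate1, mate2, pc=0.8, rng=rng))
```

With the locus-based adjacency representation, every allele of a child comes from one parent at the same locus, so a child of two valid individuals still lists a true neighbor at every gene position and needs no repair.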

To elaborate, the mutation operation above is the key to producing new genes and provides local search ability. Based on the specific properties of community structure in complex networks and on the definition of a weak community (the total number of edges inside the community is greater than the total number of edges connecting the community to the rest of the network), we introduce a local modularity definition on the basis of the weak community definition:

M_{l} = \frac{edge_{in}}{edge_{out}} \qquad (5)

where M_{l} is the ratio of the total number of edges inside the community to the total number of edges connecting the community to the rest of the network, edge_{in} is the number of edges inside the community, and edge_{out} is the total number of edges connecting the community to the rest of the network. The larger M_{l} is, the more reasonable the community. This targeted mutation operation strengthens the local search ability of the mutation operator and improves the search performance of the invention.
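The local modularity of a single community can be computed as in the sketch below. The function name `local_modularity`, the adjacency-list input and the toy graph are assumptions; returning infinity for a community with no outgoing edges is also an assumption, since the text does not define M_{l} when edge_{out} = 0.

```python
def local_modularity(adj, community):
    """M_l = edge_in / edge_out for one community (a collection of nodes)."""
    members = set(community)
    edge_in = 0
    edge_out = 0
    for u in members:
        for v in adj[u]:
            if v in members:
                edge_in += 1      # internal edges are seen from both ends
            else:
                edge_out += 1
    edge_in //= 2
    if edge_out == 0:
        return float("inf")       # assumption: an isolated community is ideal
    return edge_in / edge_out


# Toy network: triangle 0-1-2 and triangle 3-4-5 joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(local_modularity(adj, [0, 1, 2]))   # 3.0 (3 internal edges, 1 external)
print(local_modularity(adj, [0, 1]))      # 0.5 (1 internal edge, 2 external)
```

A value above 1 corresponds to the weak community condition quoted above: more edges inside the community than leaving it.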

For each individual P that is to undergo mutation, perform the following steps in turn:

(1) Decode individual P to obtain its community division result, according to the following steps:

1) Obtain all directed edges in P, and arrange these directed edges in order of the node numbers on the edges

2) Initialize the traversal state of all the directed edges, setting:

1. The access vector visited of all the directed edges, a 1 × V vector whose components take the value 0 or 1, where 1 means traversed and 0 means not traversed; it is initially 0

2. The community number vector lables of all the directed edges, a 1 × V vector whose components give the community number of each node number and thus represent the community division result; it is 0 at initialization

3. The loop control variable, denoted by the node number v, with v = 0 initially

3) Start the traversal from the loop control variable v₁ of P, which has not yet been traversed (visited[v₁]=0), with community number l=1; after it has been traversed, set lables[v₁]=l and visited[v₁]=1

4) Continue to perform step 3), traversing in node number order until v=V, then perform step 5)

5) Find all node numbers that are connected by an edge to node v₁ but have not yet been traversed, forming the node number set {u}; repeat steps 3) to 4) to mark node u₁, setting lables[u₁]=l and visited[u₁]=1; then perform step 6)

6) Find all nodes that are connected by an edge to node u₁ but have not yet been traversed, forming the node number set {w}; perform step 5) on each node w in {w} until all node numbers in {w} have been traversed; then perform step 4), finishing at node V

(2) Set the mutation probability Pm = 0.2 and randomly generate a decimal r between 0 and 1; mutation is carried out when r < Pm

(3) Judge whether the gene position v of individual P is less than the code length of the gene. If gene position v is equal to or greater than the code length V, exit; otherwise, obtain each allele u at gene position v, i.e. each neighbor node, together with its community label in lables, and perform step (4)

(4) Traverse the community labels of the alleles u obtained in step (3), and calculate the local modularity M_{l} for the community to which each allele u belongs

(5) From the results of step (4), find the community label that maximizes M_{l}, then randomly take a node of that community as the mutated value

(6) Repeat step (3) to step (5) until individual P has completed the mutation operation
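The decoding and mutation steps above can be sketched roughly as follows. Several points are assumptions made for the example and are not stated in the text: the individual P is taken to be a list in which P[v] is a neighbor of node v (so each gene defines a directed edge v → P[v]), node numbers are 0-based, the Bernoulli test with Pm is applied per gene position, the local modularity of a candidate community is evaluated with node v included, and the mutated value is drawn only from the neighbors of v inside the chosen community. The names decode, m_l, mutate, neighbours and edges are hypothetical.

```python
import random


def decode(P):
    """Step (1): give every node a community number by traversing the edges v -> P[v]."""
    V = len(P)
    adj = [[] for _ in range(V)]
    for v, u in enumerate(P):          # treat each directed edge as undirected
        adj[v].append(u)
        adj[u].append(v)
    visited = [0] * V                  # 0 = not traversed, 1 = traversed
    lables = [0] * V                   # community numbers (spelling follows the text)
    l = 0
    for v in range(V):                 # traverse in node number order
        if visited[v]:
            continue
        l += 1                         # start a new community
        visited[v] = 1
        stack = [v]
        while stack:                   # mark everything reachable from v
            x = stack.pop()
            lables[x] = l
            for w in adj[x]:
                if not visited[w]:
                    visited[w] = 1
                    stack.append(w)
    return lables


def m_l(edges, members):
    """Local modularity of formula (3); infinity when there are no external edges."""
    edge_in = sum(1 for a, b in edges if a in members and b in members)
    edge_out = sum(1 for a, b in edges if (a in members) != (b in members))
    return float("inf") if edge_out == 0 else edge_in / edge_out


def mutate(P, neighbours, edges, pm=0.2, rng=random):
    """Steps (2) to (6): move genes toward the neighbouring community with the largest M_l."""
    lables = decode(P)                 # labels of the individual before mutation
    child = list(P)
    for v in range(len(P)):
        if rng.random() >= pm:         # mutate this position only when r < Pm
            continue
        best_label, best_score = None, -1.0
        for label in sorted({lables[u] for u in neighbours[v]}):
            members = {x for x, lab in enumerate(lables) if lab == label} | {v}
            score = m_l(edges, members)
            if score > best_score:
                best_label, best_score = label, score
        options = [u for u in neighbours[v] if lables[u] == best_label]
        if options:
            child[v] = rng.choice(options)
    return child
```

Restricting the mutated value to neighbors of v keeps every gene a valid edge of the network, which is why that assumption was chosen here.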

Step 6, perform the selection operation according to the following steps

From the candidate solution set Om formed by Popm and Popc obtained in step 5, select the individuals of the next-generation main class population with the μ+λ selection strategy, according to the objective function of the main class population; from the candidate solution set Or formed by Popr and Popc, select Popsize/2 individuals with μ+λ selection, according to the diversity function of the secondary class population, as the individuals of the next-generation secondary class population
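A μ+λ style selection from a merged candidate set can be sketched as follows. This is a minimal sketch under stated assumptions: the function name select_best and its parameters are hypothetical, a larger score is assumed to be better, and the score callable stands in for either the main class objective function or the secondary class diversity function, neither of which is reproduced here.

```python
def select_best(pool_a, pool_b, score, k):
    """Merge two candidate pools and keep the k individuals with the highest score.

    pool_a, pool_b: lists of individuals (for example Popr and Popc).
    score: callable mapping an individual to a number; larger is assumed better.
    k: number of survivors (for example Popsize // 2).
    """
    candidates = list(pool_a) + list(pool_b)
    # sorted() is stable, so ties keep their original relative order.
    return sorted(candidates, key=score, reverse=True)[:k]
```

For example, select_best(Popr, Popc, diversity, Popsize // 2) would yield the next secondary class population, where diversity is a hypothetical callable implementing the diversity function.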

Step 7, perform the selection operation according to the following steps

(1) Adaptively update δ according to formula (5) below:

δ_{t+1} = δ_{t} + α \cdot Δt \qquad (5)

Wherein α is a decimal between 0 and 1, t denotes the current generation, and Δt is given by formula (6) below

Δt = \begin{cases} \frac{1}{|G_{t}|} \cdot Σ_{(p, q) ∈ G_{t}} Dis(p, q) - δ_{t}, & \text{if } |G_{t}| > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (6)

Wherein G_{t} denotes the set of parent pairs that reproduced successfully in generation t; |G_{t}| denotes the number of such parent pairs; Dis(p, q) denotes the distance between individual p and individual q; \frac{1}{|G_{t}|} Σ_{(p, q) ∈ G_{t}} Dis(p, q) denotes the mean distance between successfully reproducing parents; δ_{t} denotes the distance threshold in generation t; and Δt denotes the difference between that mean distance in generation t and the distance threshold

Successful reproduction covers two cases. In the first, both parents come from the main class and an offspring individual is better than both parents. In the second, one parent comes from the main class and the other from the secondary class, and an offspring individual is better than the parent from the main class.

Although update rule (5) is not monotonic, δ in (5) tends to decrease as the main class population converges, because the mean distance between parents tends to decrease. The secondary class population therefore also converges and eventually becomes very similar to the main class population. Once δ becomes very small, it no longer decreases. When both populations are observed to have converged and no satisfactory solution has been found, δ is set to 1 and the search is redirected.

(2) Judge whether δ has stopped decreasing for 50 consecutive generations. If so, randomly generate a portion of the individuals that enter the genetic operations of the next generation; if not, perform step 8
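The adaptive update of formulas (5) and (6) can be sketched as follows. The function name update_delta, the parameter names and the passing of the distance measure as a callable dis are assumptions for the example; the actual distance Dis(p, q) between individuals is defined elsewhere in the description.

```python
def update_delta(delta_t, successful_pairs, dis, alpha):
    """Return delta_{t+1} = delta_t + alpha * Delta_t.

    delta_t: distance threshold of the current generation.
    successful_pairs: list of parent pairs (p, q) that reproduced successfully (G_t).
    dis: callable returning the distance between two individuals.
    alpha: decimal between 0 and 1.
    """
    if len(successful_pairs) > 0:
        mean_distance = sum(dis(p, q) for p, q in successful_pairs) / len(successful_pairs)
        delta_change = mean_distance - delta_t      # formula (6), case |G_t| > 0
    else:
        delta_change = 0.0                          # formula (6), otherwise
    return delta_t + alpha * delta_change           # formula (5)
```

With alpha = 0.5, a threshold of 0.5 and a mean parent distance of 0.3, the threshold moves halfway toward the mean distance and becomes 0.4; when no pair reproduces successfully, the threshold is unchanged.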

Step 8, repeat step 4 to step 7 to obtain the optimal community division

(1) Set the number of iterations T = 100,

(2) Carry out the iterative operation,

(3) Judge the iteration count t:

If t ≤ n, return to step 4, where n = 20 and 0 < n < T

If n < t < T, return to step 5

(4) When t = 100, the best community division of the complex network is obtained.

The experimental results of the present invention are described in detail below:

Table 1 lists the community division results of each method on the Polbooks network. The experimental results of BGLL, CNM, PL and MOGA are taken from the results published by Clara Pizzuti in IEEE Transactions on Evolutionary Computation. As the table shows, the method of the invention (MGACD) obtains the highest modularity Q value among the compared methods.

Table 1 Comparison of the community division results of each method (the values listed are the modularity function Q value of each method)

Method	FN	GN	BGLL	CNM	PL	MOGA	MGACD
Q value	0.502	0.5168	0.515	0.502	0.515	0.518	0.5225

Claims

1. A complex network community mining method based on an improved genetic algorithm, characterized by comprising the following steps:

Step 1: computer initialization;

Q = \frac{1}{2 E} \underset{uv}{Σ} [A_{uv} - \frac{k_{u} k_{v}}{2 E}] δ (r (u), r (v))

δ_{t+1} = δ_{t} + α \cdot Δt

Δt = \begin{cases} \frac{1}{|G_{t}|} \cdot Σ_{(p, q) ∈ G_{t}} Dis(p, q) - δ_{t}, & \text{if } |G_{t}| > 0 \\ 0, & \text{otherwise} \end{cases}

2. The method according to claim 1, wherein the main class and the secondary class are determined after minimum spanning tree clustering of the population, characterized by comprising the following steps:

I (Pop (s_{A}), Pop (s_{B})) = \frac{- 2 Σ_{i = 1}^{I} Σ_{j = 1}^{J} V_{ij} \log (V_{ij} V / C_{i .} C_{. j})}{Σ_{i = 1}^{I} C_{i .} \log (C_{i .} / V) + Σ_{j = 1}^{J} C_{. j} \log (C_{. j} / V)}