CN103605793A - Heterogeneous social network community detection method based on genetic algorithm - Google Patents

Info

Publication number
CN103605793A
Authority
CN
China
Prior art keywords
community
node
population
individual
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310651893.1A
Other languages
Chinese (zh)
Inventor
刘静
焦李成
曾玉洁
马文萍
马晶晶
李阳阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN201310651893.1A
Publication of CN103605793A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a heterogeneous social network community detection method based on a genetic algorithm, which mainly addresses the problem in the prior art that the accuracy of the detected community structure drops markedly when the social network data and relationships are large in scale. The method is implemented as follows: constructing an adjacency matrix that describes the heterogeneous social network from the number of nodes in the network and the relationship information among the nodes; randomly generating symbol-coded individuals according to the adjacency matrix; evaluating the quality of the individuals using the improved modularity density as the fitness function; optimizing the individuals with a genetic algorithm according to their fitness function values; and mapping the optimized individual with the highest fitness function value back to the corresponding heterogeneous network and decoding it to obtain the partitioned community structure. Experimental results show that the method can effectively detect the community structure of a heterogeneous social network with high detection accuracy, and that it can be used for community detection in large-scale heterogeneous social networks.

Description

Heterogeneous social network community detection method based on genetic algorithm
Technical field
The invention belongs to the technical field of social network computing, and in particular relates to a community detection method for heterogeneous social networks, which can be used for structural research on complex social systems and large-scale social networks.
Background art
A social system is a system formed by the economic, political, and cultural relations among people in society; families, political parties, and communities, for example, are all social systems at different levels. A social system is a typical complex system and can be abstracted and treated as a complex network: entities in the system are abstracted as nodes, and relations between entities are abstracted as edges between nodes, which yields a social network composed of nodes and edges. A complex network obtained by abstracting a social system is called a social network.
Community detection is an important research direction in complex networks. In recent years it has attracted wide attention in fields such as computer science, biology, sociology, and economics, and has shown practical value. A community in a complex network is a group of nodes that are similar to one another and differ markedly from most other nodes in the network. The community structure of a complex network is characterized by dense connections within communities and sparse connections between communities. The goal of community detection is to discover and reveal the intrinsic community structure of a complex network; community structure helps in understanding and inferring the structure and function of the whole network. Community structure can be applied to practical problems such as protein function identification, metabolic pathway prediction, web community mining, and link prediction.
The social networks usually studied consist of a single type of node, but social networks in real life are more complex and may contain more than one type of node. A social network containing more than one type of node is called a heterogeneous social network. For example, in a movie rating and tagging system, three types of entities (movies, tags, and users) make up the whole system. When a user rates a movie and adds a tag to it, the three entities become related and the corresponding nodes are joined by an edge. By contrast, there is no relation between different users, between different movies, or between different tags, so nodes of the same type are not joined by edges.
The community detection problem for a heterogeneous social network can be stated as follows. A heterogeneous social network containing k types of entities is represented by a graph G(V, E) composed of a node set V and an edge set E. The node set V can be regarded as consisting of k node subsets V_1, V_2, ..., V_k, one per type; nodes of different types may be connected, whereas nodes of the same type are not. In such a network, communities are divided according to the links between nodes of different types, so that in the resulting community structure nodes of different types are densely connected within a community and sparsely connected between communities.
Most community detection methods proposed in the existing literature study traditional social networks with a single node type. They mainly include graph partitioning methods, spectral methods, and fast algorithms. What these methods have in common is that they build a similarity matrix and obtain the community partition by solving for eigenvectors. None of them considers the case of multiple node types. In practice, however, node diversity in social networks is very common and cannot be ignored, so traditional social network community detection methods cannot be applied to community detection in heterogeneous social networks. In addition, there is an existing community detection method for heterogeneous social networks, namely the multi-relational clustering method, which computes similarity from the relations among multiple kinds of entities and clusters the entities according to that similarity. Although this method can partition a heterogeneous social network into communities, social networks have many types, large volumes of data, and complex relations, so the method suffers from a scalability problem: as the scale of the network data and relations grows, the accuracy of the detected community structure drops markedly.
Summary of the invention
The object of the invention is to address the above deficiencies of the prior art by proposing a heterogeneous social network community detection method based on a genetic algorithm, so as to classify or cluster heterogeneous social network data, thereby enabling the detection and prediction of community function and obtaining a heterogeneous social network community structure with higher accuracy.
The technical solution of the invention is realized as follows:
To achieve the above object, the invention is implemented by the following steps:
1) Count the number of node types k in the heterogeneous network and the number of nodes of each type n_1, n_2, ..., n_k, and obtain the total number of nodes n = n_1 + n_2 + ... + n_k; from the number of nodes of each type and the connection information between nodes, build the k-dimensional adjacency matrix A that describes the heterogeneous social network, where the size of A is n_1 × n_2 × ... × n_k;
2) Set the initial population size pn = 50; according to the total number of nodes n, randomly generate pn symbol-coded individuals, which form the initial population p_0; set the crossover probability pc = 0.8, the mutation probability pm = 0.2, the initial generation g_0 = 1, the maximum number of generations mg = 50, and the current generation g = g_0; let the g-th generation parent population p_g equal the initial population p_0, i.e. p_g = p_0;
3) Compute the fitness function value D of each individual in the g-th generation parent population p_g:
$$ D = \sum_{c=1}^{m} \frac{L(V_c) - L(\overline{V}_c)}{|V_c|} - \frac{L(V)}{n}, $$
where m is the number of distinct gene values in the individual; V_1, V_2, ..., V_m are the node sets of the m communities obtained from the individual; V is the set of all nodes in the network, i.e. V = V_1 ∪ V_2 ∪ ... ∪ V_m; $\overline{V}_c$ is the complement of node set V_c in V, i.e. $\overline{V}_c = V - V_c$, 1 ≤ c ≤ m; |V_c| is the number of nodes in community V_c; L(V_c) is the number of edges inside the c-th community; $L(\overline{V}_c)$ is the number of edges connecting the c-th community to the other communities in the network; and L(V) is the total number of edges in the network;
4) According to the fitness function value D of each individual in the g-th generation parent population p_g, apply the elite retention operation and the tournament selection operation to p_g to obtain the updated g-th generation parent population p_g';
5) Randomly generate a first random number rand_1 and compare it with the crossover probability pc: if rand_1 < pc, go to step 6); otherwise, set the g-th generation offspring population chp_g equal to the updated g-th generation parent population p_g', i.e. chp_g = p_g', and go to step 7);
6) Apply the one-way crossover operation to the updated g-th generation parent population p_g' to obtain the g-th generation offspring population chp_g;
7) Apply the mutation operation to each individual R in the g-th generation offspring population chp_g to obtain the updated g-th generation offspring population chp_g';
8) Set the current generation g = g + 1 and compare g with the maximum number of generations mg: if g ≤ mg, let the g-th generation parent population p_g equal the updated (g-1)-th generation offspring population chp_{g-1}', i.e. p_g = chp_{g-1}', and return to step 3); otherwise, go to step 9);
9) Take the individual with the highest fitness function value in the updated (g-1)-th generation offspring population chp_{g-1}', map it back to the corresponding heterogeneous network, and decode it to obtain the partitioned community structure.
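The generation loop of steps 3) to 9) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used by the inventors: the function name `evolve` and its operator arguments (`fitness_fn`, `select_fn`, `crossover_fn`, `mutate_fn`) are hypothetical placeholders for the fitness, selection, crossover, and mutation operations described in this document.

```python
import random


def evolve(population, fitness_fn, select_fn, crossover_fn, mutate_fn,
           pc=0.8, mg=50, rng=random):
    """Run mg generations and return the fittest individual of the last one.

    All operator arguments are hypothetical callables supplied by the caller:
      fitness_fn(individual) -> float
      select_fn(population, scores) -> population
      crossover_fn(population) -> population
      mutate_fn(individual) -> individual
    """
    parents = [list(ind) for ind in population]
    for _ in range(mg):                                   # generations g = 1 .. mg
        scores = [fitness_fn(ind) for ind in parents]     # step 3
        parents = select_fn(parents, scores)              # step 4
        if rng.random() < pc:                             # step 5
            children = crossover_fn(parents)              # step 6
        else:
            children = parents                            # chp_g = p_g'
        parents = [mutate_fn(ind) for ind in children]    # steps 7 and 8
    return max(parents, key=fitness_fn)                   # step 9
```

A single random number is drawn per generation, so crossover is either applied to the whole population or skipped for that generation, which is how steps 5) and 6) read.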
Compared with the prior art, the invention has the following advantages:
1. Targeting the characteristics of heterogeneous social networks, the invention designs a fitness function that measures the quality of a heterogeneous social network community structure. This not only solves the problem that the conventional modularity density function cannot measure the quality of a heterogeneous network community structure, but also ensures that the community structure obtained with the invention has higher accuracy.
2. The mutation operation designed in the invention has local search capability, which makes it easier for the evolutionary process to escape local optima; this improves evolutionary efficiency and allows the optimal solution to be obtained efficiently.
3. Compared with an existing heterogeneous network community detection method, the invention obtains a community structure of higher accuracy in a test on a user-commodity network.
Experimental results show that the genetic-algorithm-based community detection method proposed by the invention can effectively detect the community structure of heterogeneous networks.
Brief description of the drawings
Fig. 1 is the overall flowchart of the implementation of the invention;
Fig. 2 is the sub-flowchart of the mutation operation in the invention;
Fig. 3 is the topology of the user-commodity network used in the simulation of the invention;
Fig. 4 is a schematic diagram of the adjacency matrix describing the user-commodity network;
Fig. 5 is a schematic diagram of the community structure detected in the user-commodity network by the invention.
Detailed description of the embodiments
To describe the invention clearly, this example takes the user-commodity network topology shown in Fig. 3 as an example. This does not limit the invention in any way; the invention is applicable to all heterogeneous social networks.
With reference to Fig. 1, the invention is implemented by the following steps:
Step 1. Count the number of node types k in the heterogeneous network and the number of nodes of each type n_1, n_2, ..., n_k, and obtain the total number of nodes n = n_1 + n_2 + ... + n_k.
As shown in Fig. 3, the heterogeneous network of this example is a user-commodity heterogeneous network. The network has 2 types of nodes, so the number of node types is k = 2. Nodes of type 1 represent users and nodes of type 2 represent commodities. The squares in the figure represent type 1 nodes, of which there are 50, i.e. n_1 = 50; the circles represent type 2 nodes, of which there are 50, i.e. n_2 = 50. The total number of nodes in the network is 100, i.e. n = 100;
Step 2. From the number of nodes of each type and the connection information between nodes, build the k-dimensional adjacency matrix A that describes the heterogeneous social network.
2a) According to the number of nodes of each type n_1, n_2, ..., n_k in the network, set the size of the adjacency matrix A to n_1 × n_2 × ... × n_k;
In the heterogeneous network of this example, the number of type 1 nodes n_1 is 50 and the number of type 2 nodes n_2 is 50, so the size of the 2-dimensional adjacency matrix A is set to 50 × 50;
2b) Determine the value of each element of the adjacency matrix A from the connection information between the nodes of the network:
If the type 1 node v_{i_1}, the type 2 node v_{i_2}, ..., and the type k node v_{i_k} are mutually connected, then an edge joins these k nodes and the corresponding element of the adjacency matrix is A(i_1, i_2, ..., i_k) = 1;
If the type 1 node v_{i_1}, the type 2 node v_{i_2}, ..., and the type k node v_{i_k} are not mutually connected, then no edge joins these k nodes and the corresponding element of the adjacency matrix is A(i_1, i_2, ..., i_k) = 0, where 1 ≤ i_1 ≤ n_1, 1 ≤ i_2 ≤ n_2, ..., 1 ≤ i_k ≤ n_k;
In this example, if the i_1-th user in the user-commodity heterogeneous network has a purchase record for the i_2-th commodity, then A(i_1, i_2) = 1; if the i_1-th user has no purchase record for the i_2-th commodity, then A(i_1, i_2) = 0, where 1 ≤ i_1 ≤ 50 and 1 ≤ i_2 ≤ 50, as shown in Fig. 4. In Fig. 4, the horizontal axis gives the label of the type 1 node, the vertical axis gives the label of the type 2 node, and a black dot indicates that the element at the corresponding coordinates is 1.
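For the two-type (k = 2) case of this example, the construction of A can be illustrated with a short Python sketch. The function name `build_adjacency` and the toy purchase records are hypothetical and serve only as an illustration; indices are 0-based in the code, whereas the text uses 1-based labels.

```python
def build_adjacency(n1, n2, purchases):
    """Return an n1 x n2 0/1 matrix A with A[i1][i2] = 1 if user i1 bought item i2."""
    A = [[0] * n2 for _ in range(n1)]
    for user, item in purchases:
        if not (0 <= user < n1 and 0 <= item < n2):
            raise ValueError(f"record out of range: {(user, item)}")
        A[user][item] = 1  # repeated purchases still give a single edge
    return A


# Hypothetical toy data: 3 users, 4 commodities, 4 purchase records.
records = [(0, 0), (0, 1), (1, 1), (2, 3)]
A = build_adjacency(3, 4, records)
```

A nested list is enough at this scale; for a large network a sparse representation would be a more natural choice.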
Step 3. Set the initial population size pn = 50, and according to the total number of nodes n randomly generate pn symbol-coded individuals, which form the initial population p_0.
A symbol-coded individual is an individual R = (r_1, r_2, ..., r_i, ..., r_n) in which each gene r_i takes a random integer between 1 and n, where 1 ≤ i ≤ n; the i-th gene value r_i is the label of the community to which the i-th node v_i belongs.
In the heterogeneous network of this example, symbol coding is used to generate 50 individuals, where each gene r_i of each individual R = (r_1, r_2, ..., r_i, ..., r_100) takes a random integer between 1 and 100, with 1 ≤ i ≤ 100.
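The symbol coding of step 3 can be sketched in Python as follows. The function names and the `seed` parameter are hypothetical additions for reproducibility; the sizes pn = 50 and n = 100 are the values used in this example.

```python
import random


def random_individual(n, rng):
    """Symbol-coded individual: gene i holds the community label (1..n) of node i."""
    return [rng.randint(1, n) for _ in range(n)]


def initial_population(pn, n, seed=None):
    """Generate pn random individuals for a network with n nodes in total."""
    rng = random.Random(seed)
    return [random_individual(n, rng) for _ in range(pn)]


population = initial_population(pn=50, n=100, seed=0)
```

Two nodes belong to the same community exactly when their genes hold the same label, so the number of communities in an individual is the number of distinct gene values.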
Step 4. Set the crossover probability pc = 0.8, the mutation probability pm = 0.2, the initial generation g_0 = 1, the maximum number of generations mg = 50, and the current generation g = g_0; let the g-th generation parent population p_g equal the initial population p_0, i.e. p_g = p_0;
Step 5. Compute the fitness function value D of each individual in the g-th generation parent population p_g:
5a) From an individual in the g-th generation parent population p_g, obtain the m distinct gene values in the individual and, at the same time, the node sets of the m communities: V_1, V_2, ..., V_c, ..., V_m, 1 ≤ c ≤ m. The union of the m node sets gives the set V of all nodes in the network, i.e. V = V_1 ∪ V_2 ∪ ... ∪ V_m, where ∪ is the union operator. From the c-th node set V_c and the set of all nodes V, obtain the complement of V_c in V, i.e. $\overline{V}_c = V - V_c$;
5b) From the adjacency matrix A, compute the number of edges L(V_c) inside the c-th community:
$$ L(V_c) = \sum_{v_{i_1}, v_{i_2}, \cdots, v_{i_k} \in V_c} A(i_1, i_2, \cdots, i_k); $$
5c) From the adjacency matrix A, compute the number of edges $L(\overline{V}_c)$ connecting the c-th community to the other communities in the network:
$$ L(\overline{V}_c) = \sum_{v_{i_1}, v_{i_2}, \cdots, v_{i_k}} A(i_1, i_2, \cdots, i_k), $$
where, among the k nodes v_{i_1}, v_{i_2}, ..., v_{i_k}, at least one node is in the c-th node set V_c and at least one other node is in the complement $\overline{V}_c$ of V_c in V;
5d) From the adjacency matrix A, compute the total number of edges L(V) in the network:
$$ L(V) = \sum_{v_{i_1}, v_{i_2}, \cdots, v_{i_k} \in V} A(i_1, i_2, \cdots, i_k); $$
5e) From the number of edges L(V_c) inside the c-th community, the number of edges $L(\overline{V}_c)$ connecting the c-th community to the other communities, and the total number of edges L(V) computed above, compute the fitness function value D of each individual in the g-th generation parent population p_g:
$$ D = \sum_{c=1}^{m} \frac{L(V_c) - L(\overline{V}_c)}{|V_c|} - \frac{L(V)}{n}, $$
where |V_c| is the number of nodes in the c-th node set V_c.
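The fitness computation of step 5 can be illustrated for the two-type (k = 2) case with the following Python sketch. It rests on several assumptions that are not stated in the source: the function name `fitness` is hypothetical; nodes are numbered so that the n_1 users come first and the n_2 commodities follow; indices are 0-based; and the term L(V)/n is read as being subtracted once, outside the sum over communities (the flattened formula in the source could also be read with that term inside the sum).

```python
from collections import defaultdict


def fitness(individual, A):
    """Improved modularity density D of a symbol-coded individual (k = 2 sketch).

    individual[i] is the community label of node i; users are nodes 0..n1-1
    and commodities are nodes n1..n1+n2-1 (an assumed ordering).
    """
    n1, n2 = len(A), len(A[0])
    n = n1 + n2
    size = defaultdict(int)       # |V_c|
    inside = defaultdict(int)     # L(V_c): edges with both ends in community c
    outside = defaultdict(int)    # L(complement of V_c): edges leaving community c
    for label in individual:
        size[label] += 1
    total = 0                     # L(V): all edges in the network
    for i1 in range(n1):
        for i2 in range(n2):
            if A[i1][i2]:
                total += 1
                cu, cv = individual[i1], individual[n1 + i2]
                if cu == cv:
                    inside[cu] += 1
                else:
                    outside[cu] += 1
                    outside[cv] += 1
    density = sum((inside[c] - outside[c]) / size[c] for c in size)
    return density - total / n    # assumption: L(V)/n subtracted once


# Toy network: user 0 bought commodity 0, user 1 bought commodity 1.
A = [[1, 0], [0, 1]]
good = fitness([1, 2, 1, 2], A)   # each user grouped with its own commodity
bad = fitness([1, 2, 2, 1], A)    # each user grouped with the other commodity
```

On this toy network the partition that keeps each purchase inside one community scores higher than the one that splits every purchase across communities, which is the behaviour the fitness function is meant to reward.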
Step 6. According to the fitness function value D of each individual in the g-th generation parent population p_g, apply the elite retention operation and the tournament selection operation to p_g to obtain the updated g-th generation parent population p_g'.
In the prior art, the selection operations of genetic algorithms include elite retention, roulette wheel selection, tournament selection, stochastic universal sampling, and truncation selection. Because elite retention preserves the best individual of the parent population, and tournament selection gives the resulting parent population greater diversity, the invention uses the elite retention operation together with the tournament selection operation.
The details are as follows:
6a) Let the new population pnew be the empty set and perform the elite retention operation once, i.e. put the individual with the largest fitness function value in the parent population p_g into the new population pnew;
6b) Perform the tournament selection operation pn - 1 times: each time, randomly choose two individuals from the parent population p_g, compare their fitness function values, and put the individual with the larger fitness function value into the new population pnew;
6c) Let the updated g-th generation parent population p_g' equal the new population pnew, i.e. p_g' = pnew.
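Steps 6a) to 6c) can be sketched in Python as follows. The function name `select` is hypothetical. The source does not say whether the two tournament contestants must be distinct or how ties are broken; this sketch assumes distinct contestants and lets the first contestant win a tie.

```python
import random


def select(population, fitness_values, rng=random):
    """Elite retention followed by pn - 1 binary tournaments (sketch)."""
    pn = len(population)
    # 6a) elite retention: copy the fittest individual unchanged.
    best = max(range(pn), key=lambda i: fitness_values[i])
    new_population = [list(population[best])]
    # 6b) pn - 1 tournaments between two randomly chosen individuals.
    for _ in range(pn - 1):
        a, b = rng.sample(range(pn), 2)   # assumption: contestants are distinct
        winner = a if fitness_values[a] >= fitness_values[b] else b
        new_population.append(list(population[winner]))
    # 6c) the new population becomes the updated parent population.
    return new_population
```

With distinct contestants the least fit individual can never win a tournament, so it drops out of the population after one round of selection.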
Step 7. Randomly generate a first random number rand_1 and compare it with the crossover probability pc: if rand_1 < pc, go to step 8; otherwise, set the g-th generation offspring population chp_g equal to the updated g-th generation parent population p_g', i.e. chp_g = p_g', and go to step 9.
Step 8. Apply the one-way crossover operation to the updated g-th generation parent population p_g' to obtain the g-th generation offspring population chp_g.
In the prior art, the crossover operations of genetic algorithms include single-point crossover, multi-point crossover, uniform crossover, discrete crossover, and one-way crossover. Because one-way crossover is better suited to symbol-coded individuals, the invention uses one-way crossover.
The one-way crossover operation proceeds as follows:
8a) Set the iteration counter q = 1 and let the g-th generation offspring population chp_g be the empty set;
8b) Randomly select two individuals R_1 and R_2 from the updated g-th generation parent population p_g', taking R_1 as the source individual and R_2 as the destination individual. For example, suppose the total number of nodes is n = 5 and the two individuals randomly selected from p_g' are R_1 = (3, 2, 4, 4, 2) and R_2 = (4, 1, 3, 4, 1); R_1 is the source individual and R_2 is the destination individual;
8c) Randomly choose an integer i between 1 and n and let the target gene value e equal the i-th gene value r_1^i of the source individual R_1, i.e. e = r_1^i. Let the node label set U be the empty set and compare each gene value r_1^j of the source individual R_1 with the target gene value e: if r_1^j = e, put the node label j into U. This gives the set of node labels to be changed, V_e = U, where 1 ≤ j ≤ n. In the example, an integer i is chosen at random between 1 and 5, giving i = 3; the target gene value e equals the 3rd gene value of the source individual R_1, i.e. e = r_1^3 = 4. The genes of R_1 whose value equals e are the 3rd and the 4th, so node labels 3 and 4 are put into U, giving the set of node labels to be changed V_4 = U = {3, 4};
8d) For each node label j in the set V_e, change the corresponding j-th gene value r_2^j of the destination individual R_2 to the target gene value e, i.e. r_2^j = e, obtaining the new individual R_2' after crossover, and put R_2' into the g-th generation offspring population chp_g. In the example, for node labels 3 and 4 in V_4, the 3rd and 4th gene values of R_2 are both changed to e, i.e. r_2^3 = e and r_2^4 = e, giving the new individual R_2' = (4, 1, 4, 4, 1), which is put into chp_g;
8e) Set q = q + 1 and compare the iteration counter q with the parent population size pn: if q ≤ pn, return to step 8b); otherwise, stop.
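A single one-way crossover (steps 8c and 8d) can be sketched in Python and checked against the worked example with R_1 = (3, 2, 4, 4, 2) and R_2 = (4, 1, 3, 4, 1). The function name `one_way_crossover` is hypothetical, and the position i is passed in explicitly (0-based) so that the example is reproducible; in the method it is drawn at random.

```python
def one_way_crossover(source, destination, i):
    """Copy the community of source node i onto the destination individual.

    Every node that shares the label e = source[i] in the source individual
    receives label e in a copy of the destination individual.
    """
    e = source[i]                                        # target gene value
    to_change = [j for j, gene in enumerate(source) if gene == e]
    child = list(destination)
    for j in to_change:
        child[j] = e
    return child


R1 = [3, 2, 4, 4, 2]   # source individual
R2 = [4, 1, 3, 4, 1]   # destination individual
child = one_way_crossover(R1, R2, i=2)   # i = 3 in the text's 1-based numbering
# child == [4, 1, 4, 4, 1]
```

The operation transfers a whole community from the source to the destination instead of swapping gene segments, which keeps co-clustered nodes together.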
Step 9. Apply the mutation operation to each individual R in the g-th generation offspring population chp_g to obtain the updated g-th generation offspring population chp_g'.
With reference to Fig. 2, this step is described in detail as follows:
9a) Set the iteration counter q = 1;
9b) Randomly generate a second random number rand_2 and compare it with the mutation probability pm: if rand_2 < pm, go to step 9c); otherwise, go to step 9f);
9c) Let the gene value set L be the empty set. From the adjacency matrix A, obtain every node v_j that is connected to the q-th node v_q, and put the corresponding j-th gene value r_j of individual R into the gene value set L, where 1 ≤ j ≤ n. Set the maximum local modularity max to negative infinity, i.e. max = -∞;
9d) For each gene value r_j in the gene value set L, suppose that the q-th gene value r_q of individual R equals r_j, compute the local modularity f_{r_j} of the r_j-th community under this assumption, and compare f_{r_j} with the maximum local modularity max: if f_{r_j} > max, set max = f_{r_j} and set the community label with the maximum local modularity to l_q = r_j;
The local modularity f_{r_j} of the r_j-th community is computed as:
$$ f_{r_j} = \sum_{v_{i_1}, v_{i_2}, \cdots, v_{i_t}, \cdots, v_{i_k} \in V_{r_j}} \left( A(i_1, i_2, \cdots, i_t, \cdots, i_k) - \frac{d_{i_1} d_{i_2} \cdots d_{i_t} \cdots d_{i_k}}{(2|V|)^{k-1}} \right), $$
where V_{r_j} is the node set of the r_j-th community, with 1 ≤ r_j ≤ n; d_{i_t} is the degree of node v_{i_t}, i.e. the number of edges connected to node v_{i_t}, with 1 ≤ i_t ≤ n_t and 1 ≤ t ≤ k; and |V| is the total number of edges in the network;
9e) If the local modularity f_{r_j} has been computed for every gene value r_j in the gene value set L, set the q-th gene value r_q of individual R to the community label l_q with the maximum local modularity, i.e. r_q = l_q, and go to step 9f); otherwise, return to step 9d);
9f) Set q = q + 1 and compare the iteration counter q with the total number of nodes n: if q ≤ n, return to step 9b); otherwise, stop.
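The mutation of step 9 can be sketched in Python for the two-type (k = 2) case, where the denominator (2|V|)^(k-1) reduces to 2|V|. The function names, the 0-based node numbering (users first, then commodities), and the rule that candidate labels are tried in ascending order are assumptions made for this illustration only.

```python
import random


def neighbours(q, A):
    """Nodes connected to node q (users are 0..n1-1, commodities n1..n1+n2-1)."""
    n1, n2 = len(A), len(A[0])
    if q < n1:
        return [n1 + i2 for i2 in range(n2) if A[q][i2]]
    return [i1 for i1 in range(n1) if A[i1][q - n1]]


def local_modularity(label, individual, A, degree, n_edges):
    """f for the community with the given label, k = 2 form of the formula."""
    n1, n2 = len(A), len(A[0])
    users = [i for i in range(n1) if individual[i] == label]
    items = [j for j in range(n2) if individual[n1 + j] == label]
    return sum(A[i][j] - degree[i] * degree[n1 + j] / (2 * n_edges)
               for i in users for j in items)


def mutate(individual, A, pm, rng=random):
    """Move each node, with probability pm, to its best neighbouring community."""
    n1, n2 = len(A), len(A[0])
    degree = ([sum(row) for row in A]
              + [sum(A[i][j] for i in range(n1)) for j in range(n2)])
    n_edges = sum(degree[:n1])
    child = list(individual)
    for q in range(n1 + n2):
        if rng.random() >= pm:            # 9b) skip this node
            continue
        best_label, best_f = child[q], float("-inf")
        # 9c) candidate labels are those of the neighbours of node q;
        # ascending order is an assumed tie-breaking rule.
        for label in sorted({child[v] for v in neighbours(q, A)}):
            trial = child[:q] + [label] + child[q + 1:]
            f = local_modularity(label, trial, A, degree, n_edges)   # 9d)
            if f > best_f:
                best_label, best_f = label, f
        child[q] = best_label             # 9e) an isolated node keeps its label
    return child
```

Because a node can only adopt a label already held by one of its neighbours, the operator acts as a local search that merges connected nodes rather than as a purely random perturbation.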
Step 10. Set the current generation g = g + 1 and compare g with the maximum number of generations mg: if g ≤ mg, let the g-th generation parent population p_g equal the updated (g-1)-th generation offspring population chp_{g-1}', i.e. p_g = chp_{g-1}', and return to step 5; otherwise, go to step 11.
Step 11. Take the individual with the highest fitness function value in the updated (g-1)-th generation offspring population chp_{g-1}', map it back to the corresponding heterogeneous network, and decode it to obtain the partitioned community structure.
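The decoding in step 11 amounts to grouping nodes by gene value. A minimal Python sketch for the two-type case follows; the function name `decode`, the dictionary layout, and the convention that the first n_1 genes belong to users are assumptions made for illustration.

```python
from collections import defaultdict


def decode(individual, n1):
    """Group nodes by community label; the first n1 genes are assumed to be users."""
    communities = defaultdict(lambda: {"users": [], "items": []})
    for node, label in enumerate(individual):
        if node < n1:
            communities[label]["users"].append(node)
        else:
            communities[label]["items"].append(node - n1)
    return dict(communities)


# Hypothetical best individual for a network with 2 users and 2 commodities.
structure = decode([3, 4, 3, 4], n1=2)
# structure == {3: {"users": [0], "items": [0]}, 4: {"users": [1], "items": [1]}}
```

The label values themselves carry no meaning; only the grouping they induce matters, so two individuals that differ by a renaming of labels decode to the same community structure.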
The effect of the invention can be verified by the following simulation experiments:
1. Experimental environment and evaluation criterion
Experimental environment: the processor is an Intel(R) Core(TM)2 Duo CPU E6550 at 2.33 GHz, the memory is 1.99 GB, the hard disk is 120 GB, the operating system is Microsoft Windows 7, and the programming environment is MATLAB 7.13.
This experiment uses the normalized mutual information (NMI) as the evaluation criterion to assess the quality of the community structure detected by each method:
$$ NMI = \frac{H(C_0) + H(C_e) - H(C_0, C_e)}{\sqrt{H(C_0)\, H(C_e)}}, $$
In the above formula, C_0 denotes the true community structure, C_e denotes the community structure detected in the experiment, and H(C) denotes the Shannon entropy of community structure C. If the community structure found by the experiment is identical to the true community structure, NMI takes its maximum value 1; if the community structure found by the experiment is completely independent of the true community structure, NMI takes its minimum value 0.
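The evaluation criterion can be illustrated with a short Python sketch. It assumes the geometric-mean normalization, i.e. the mutual information divided by the square root of H(C_0) H(C_e), which is consistent with the stated maximum of 1 for identical structures; the function names and the handling of the degenerate single-community case are assumptions of this sketch.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its block sizes."""
    return -sum(c / n * math.log(c / n) for c in counts if c)


def nmi(true_labels, found_labels):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(true_labels)
    h_true = entropy(Counter(true_labels).values(), n)
    h_found = entropy(Counter(found_labels).values(), n)
    h_joint = entropy(Counter(zip(true_labels, found_labels)).values(), n)
    if h_true == 0 or h_found == 0:
        # Assumed convention: two single-community labelings agree perfectly.
        return 1.0 if h_true == h_found else 0.0
    return (h_true + h_found - h_joint) / math.sqrt(h_true * h_found)


same = nmi([1, 1, 2, 2], [5, 5, 7, 7])          # same grouping, renamed labels
independent = nmi([1, 1, 2, 2], [1, 2, 1, 2])   # statistically independent grouping
```

The measure depends only on how nodes are grouped, not on the label values, so a detected structure that matches the true one up to renaming still scores 1.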
2. Experimental content and analysis of results
Simulation 1: the method of the invention is used to detect the community structure of the user-commodity network shown in Fig. 3. The detection result is shown in Fig. 5, where nodes of different gray levels belong to different communities.
As can be seen from Fig. 5, 3 communities are detected in the user-commodity heterogeneous social network. Comparing Fig. 5 with Fig. 3 shows that the community structure obtained in the experiment places densely connected nodes of different types in the same community and sparsely connected nodes of different types in different communities.
Using the above evaluation criterion, the normalized mutual information NMI_1 of the community structure detected by the invention is computed, giving NMI_1 = 1, i.e. the community structure detected by the invention has very high accuracy. The invention is therefore an effective community detection method.
Simulation 2: the Multicomm method proposed by Xutao Li et al. is used to perform community detection on the user-commodity network shown in Fig. 3, and the above evaluation criterion is used to compute the normalized mutual information NMI_2 of the community structure detected by the Multicomm method, giving NMI_2 = 0.9315.
Comparing the normalized mutual information NMI_2 = 0.9315 obtained by the Multicomm method with NMI_1 = 1 obtained by the invention shows that the community structure detected by the invention has higher accuracy, and that the invention outperforms the Multicomm method in this experiment.

Claims (6)

1. A heterogeneous social network community detection method based on a genetic algorithm, comprising the following steps:
1) Count the number k of node types in the heterogeneous network and the number of nodes of each type n_1, n_2, ..., n_k, and obtain the total number of nodes n = n_1 + n_2 + ... + n_k; using the number of nodes of each type and the connection information between nodes, build the k-dimensional adjacency matrix A that describes the heterogeneous social network, where A has size n_1 × n_2 × ... × n_k;
2) Set the initial population size pn = 50; according to the total number of nodes n, randomly generate pn individuals using symbol encoding, and let these individuals form the initial population p_0; set the crossover probability pc = 0.8, the mutation probability pm = 0.2, the initial generation g_0 = 1, the maximum number of generations mg = 50 and the current generation g = g_0; let the parent population p_g of generation g equal the initial population p_0, i.e. p_g = p_0;
3) Compute the fitness function value D of each individual in the generation-g parent population p_g:
$$D = \sum_{c=1}^{m} \frac{L(V_c) - L(\bar{V}_c)}{|V_c|} - \frac{L(V)}{n},$$
where m is the number of distinct gene values in the individual; V_1, V_2, ..., V_m are the node sets of the m communities obtained from the individual; V is the set of all nodes in the network, i.e. V = V_1 ∪ V_2 ∪ ... ∪ V_m; V̄_c is the complement of the node set V_c in V, i.e. V̄_c = V − V_c, 1 ≤ c ≤ m; |V_c| is the number of nodes in community V_c; L(V_c) is the number of edges inside the c-th community; L(V̄_c) is the number of edges that connect the c-th community to the other communities in the network; and L(V) is the total number of edges in the network;
4) According to the fitness function value D of each individual in the generation-g parent population p_g, perform the elite retention operation and the tournament selection operation on p_g to obtain the updated generation-g parent population p_g';
5) Randomly generate a first random number rand_1 and compare it with the crossover probability pc: if rand_1 < pc, go to step 6); otherwise, let the generation-g offspring population chp_g equal the updated generation-g parent population p_g', i.e. chp_g = p_g', and go to step 7);
6) Perform the one-way crossover operation on the updated generation-g parent population p_g' to obtain the generation-g offspring population chp_g;
7) Perform the mutation operation on each individual R in the generation-g offspring population chp_g to obtain the updated generation-g offspring population chp_g';
8) Let the current generation g = g + 1 and compare g with the maximum number of generations mg: if g ≤ mg, let the generation-g parent population p_g equal the updated generation-(g−1) offspring population chp_{g−1}', i.e. p_g = chp_{g−1}', and return to step 3); otherwise, go to step 9);
9) Take the individual with the highest fitness function value in the updated generation-(g−1) offspring population chp_{g−1}', map this individual back to the corresponding heterogeneous network, and decode it to obtain the detected community structure.
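As an illustration of the fitness function D in step 3), the following Python sketch evaluates one individual. It rests on several assumptions that the claim does not state: edges are given as tuples of 0-based node indices, an edge counts toward L(V_c) only when all of its endpoints carry label c, an edge that spans several communities counts once as an outgoing edge for each community it touches, and the L(V)/n term is subtracted once, outside the sum. The name `fitness_d` is hypothetical.

```python
from collections import defaultdict


def fitness_d(labels, edges):
    """Fitness D of one individual.

    labels[i] is the community label of node i; each edge is a tuple of the
    node indices it connects (k nodes, one per type, in the heterogeneous case).
    """
    n = len(labels)
    size = defaultdict(int)      # |V_c|
    internal = defaultdict(int)  # L(V_c): edges inside community c
    external = defaultdict(int)  # edges from community c to other communities
    for label in labels:
        size[label] += 1
    for edge in edges:
        touched = {labels[v] for v in edge}
        if len(touched) == 1:
            internal[next(iter(touched))] += 1
        else:
            for c in touched:
                external[c] += 1
    density = sum((internal[c] - external[c]) / size[c] for c in size)
    # Assumption: the L(V)/n term is subtracted once, not once per community.
    return density - len(edges) / n
```

With this reading, a partition whose communities contain many internal edges and few outgoing edges relative to their size receives a larger D.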
2. The heterogeneous social network community detection method based on a genetic algorithm as claimed in claim 1, wherein building the k-dimensional adjacency matrix A that describes the heterogeneous social network from the number of nodes of each type and the connection information between nodes, as described in step 1), is carried out according to the following rules:
If the type-1 node v_{i_1}, the type-2 node v_{i_2}, ..., and the type-k node v_{i_k} are all mutually connected, an edge connects these k nodes and the corresponding adjacency matrix element is A(i_1, i_2, ..., i_k) = 1, where 1 ≤ i_1 ≤ n_1, 1 ≤ i_2 ≤ n_2, ..., 1 ≤ i_k ≤ n_k;
If the type-1 node v_{i_1}, the type-2 node v_{i_2}, ..., and the type-k node v_{i_k} are not all mutually connected, no edge connects these k nodes and the corresponding adjacency matrix element is A(i_1, i_2, ..., i_k) = 0.
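The two rules above can be sketched in Python as follows. The claim describes a dense n_1 × n_2 × ... × n_k array; storing only the index tuples whose entry is 1 is an assumption made here to keep the example small, and the names `build_adjacency` and `entry` are hypothetical.

```python
def build_adjacency(sizes, connections):
    """Return the set of index tuples (i_1, ..., i_k) with A(i_1, ..., i_k) = 1.

    sizes = (n_1, ..., n_k); every tuple absent from the set is an implicit 0.
    Indices are 1-based, as in the claim.
    """
    k = len(sizes)
    ones = set()
    for conn in connections:
        if len(conn) != k:
            raise ValueError("each connection needs one node per type")
        for t, (i_t, n_t) in enumerate(zip(conn, sizes), start=1):
            if not 1 <= i_t <= n_t:
                raise ValueError(f"index {i_t} out of range for type {t}")
        ones.add(tuple(conn))
    return ones


def entry(ones, index):
    """Look up A(i_1, ..., i_k) in the sparse representation."""
    return 1 if tuple(index) in ones else 0
```

For a user-commodity network (k = 2) with 2 users and 3 commodities, `build_adjacency((2, 3), [(1, 1), (2, 3)])` records that user 1 is linked to commodity 1 and user 2 to commodity 3; every other entry reads as 0.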
3. The heterogeneous social network community detection method based on a genetic algorithm as claimed in claim 1, wherein randomly generating pn individuals using symbol encoding according to the total number of nodes n, as described in step 2), means that each gene r_i of each individual R = (r_1, r_2, ..., r_i, ..., r_n) takes a random integer between 1 and n, where 1 ≤ i ≤ n and the i-th gene value r_i is the label of the community that contains the i-th node v_i.
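A minimal Python sketch of this encoding is given below. The function names and the optional `rng` parameter are assumptions added for illustration; the claim only fixes that each gene is a random integer between 1 and n and that gene i is the community label of node v_i.

```python
import random


def random_individual(n, rng=random):
    """One individual R = (r_1, ..., r_n): each gene is a label in 1..n."""
    return [rng.randint(1, n) for _ in range(n)]


def initial_population(n, pn=50, rng=random):
    """Initial population p_0 made of pn random individuals."""
    return [random_individual(n, rng) for _ in range(pn)]


def decode(individual):
    """Group 1-based node indices by community label."""
    communities = {}
    for node, label in enumerate(individual, start=1):
        communities.setdefault(label, []).append(node)
    return communities
```

Decoding the individual `[2, 2, 5]` places nodes 1 and 2 in community 2 and node 3 in community 5, so the number of communities is simply the number of distinct gene values.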
4. The heterogeneous social network community detection method based on a genetic algorithm as claimed in claim 1, wherein performing the elite retention operation and the tournament selection operation on the generation-g parent population p_g, as described in step 4), is carried out according to the following steps:
4a) Let the new population pnew be the empty set and perform the elite retention operation once, i.e. put the individual with the largest fitness function value in the parent population p_g into the new population pnew;
4b) Perform the tournament selection operation pn−1 times: each time, randomly choose two individuals from the parent population p_g, compare their fitness function values, and put the individual with the larger fitness function value into the new population pnew;
4c) Let the updated generation-g parent population p_g' equal the new population pnew, i.e. p_g' = pnew.
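Steps 4a) to 4c) can be sketched in Python as shown below. Two details are assumptions, since the claim leaves them open: the two tournament contestants are drawn as distinct individuals, and ties are resolved in favor of the first one drawn. The name `select` and its parameters are hypothetical.

```python
import random


def select(population, fitness, rng=random):
    """Elite retention once, then pn - 1 binary tournaments (steps 4a to 4c)."""
    pn = len(population)
    scores = [fitness(ind) for ind in population]
    # 4a) keep the individual with the largest fitness value
    best = max(range(pn), key=scores.__getitem__)
    new_population = [list(population[best])]
    # 4b) pn - 1 tournaments between two randomly chosen individuals
    for _ in range(pn - 1):
        a, b = rng.sample(range(pn), 2)  # assumption: two distinct contestants
        winner = a if scores[a] >= scores[b] else b
        new_population.append(list(population[winner]))
    # 4c) the new population replaces the parent population
    return new_population
```

Because the elite individual is copied before any tournament is held, the best fitness value in the population cannot decrease during selection.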
5. The heterogeneous social network community detection method based on a genetic algorithm as claimed in claim 1, wherein performing the one-way crossover operation on the updated generation-g parent population p_g', as described in step 6), is carried out according to the following steps:
5a) Let the iteration count q = 1 and let the generation-g offspring population chp_g be the empty set;
5b) Randomly select two individuals R_1 and R_2 from the updated generation-g parent population p_g', taking R_1 as the source individual and R_2 as the destination individual;
5c) Randomly choose an integer i between 1 and n and let the target gene value e equal the i-th gene value r^1_i of the source individual R_1, i.e. e = r^1_i; let the node label set U be the empty set; compare each gene value r^1_j of the source individual R_1 with the target gene value e, and if r^1_j = e, put the node label j into U, obtaining the set of node labels to be changed V_e = U, where 1 ≤ j ≤ n;
5d) For each node label j in the set V_e, change the corresponding j-th gene value r^2_j of the destination individual R_2 to the target gene value e, i.e. r^2_j = e, obtaining the new individual R_2' after crossover, and put R_2' into the generation-g offspring population chp_g;
5e) Let q = q + 1 and compare the iteration count q with the parent population size np: if q ≤ np, return to step 5b); otherwise, stop.
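The one-way crossover of steps 5a) to 5e) can be sketched in Python as follows. Gene positions are 0-based in the code, and drawing the source and destination as two distinct individuals is an assumption; the function names are hypothetical.

```python
import random


def one_way_crossover(source, dest, rng=random):
    """Steps 5c and 5d: copy one community of `source` onto `dest`."""
    n = len(source)
    i = rng.randrange(n)   # random gene position (0-based here)
    e = source[i]          # target gene value e
    to_change = [j for j in range(n) if source[j] == e]
    child = list(dest)     # the destination individual itself is left untouched
    for j in to_change:
        child[j] = e
    return child


def crossover_population(parents, rng=random):
    """Steps 5a, 5b and 5e: build an offspring population of the same size."""
    offspring = []
    for _ in range(len(parents)):
        r1, r2 = rng.sample(parents, 2)  # source and destination individuals
        offspring.append(one_way_crossover(r1, r2, rng))
    return offspring
```

The effect is that a whole community of the source individual is transplanted into the destination individual, while the destination's remaining genes keep their labels.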
6. The heterogeneous social network community detection method based on a genetic algorithm as claimed in claim 1, wherein performing the mutation operation on each individual R in the generation-g offspring population chp_g, as described in step 7), is carried out according to the following steps:
6a) Let the iteration count q = 1;
6b) Randomly generate a second random number rand_2 and compare it with the mutation probability pm: if rand_2 < pm, go to step 6c); otherwise, go to step 6f);
6c) Let the gene value set L be the empty set; from the adjacency matrix A obtain every node v_j that is connected to the q-th node v_q, and put the corresponding j-th gene value r_j of individual R into the gene value set L, where 1 ≤ j ≤ n; set the maximum local modularity max to negative infinity, i.e. max = −∞;
6d) For each gene value r_j in the gene value set L, suppose that the q-th gene value r_q of individual R equals r_j, and compute the local modularity f_{r_j} of the r_j-th community under this assumption; compare f_{r_j} with the maximum local modularity max, and if f_{r_j} > max, let max = f_{r_j} and let the community label that gives the maximum local modularity be l_q = r_j;
where the local modularity f_{r_j} of the r_j-th community is computed as follows:
$$f_{r_j} = \sum_{v_{i_1}, v_{i_2}, \ldots, v_{i_t}, \ldots, v_{i_k} \in V_{r_j}} \left( A(i_1, i_2, \ldots, i_t, \ldots, i_k) - \frac{d_{i_1} d_{i_2} \cdots d_{i_t} \cdots d_{i_k}}{(2|V|)^{k-1}} \right),$$
where V_{r_j} is the node set of the r_j-th community, 1 ≤ c ≤ n; d_{i_t} is the degree of node v_{i_t}, i.e. the number of edges connected to node v_{i_t}, with 1 ≤ i_t ≤ n_t and 1 ≤ t ≤ k; and |V| is the total number of edges in the network;
6e) If the local modularity f_{r_j} of every gene value r_j in the gene value set L has been computed, let the q-th gene value r_q of individual R equal the community label l_q that gives the maximum local modularity, i.e. r_q = l_q, and go to step 6f); otherwise, return to step 6d);
6f) Let q = q + 1 and compare the iteration count q with the total number of nodes n: if q ≤ n, return to step 6b); otherwise, stop.
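A hedged Python sketch of the mutation in steps 6a) to 6f) follows. It assumes a representation the claim does not prescribe: nodes carry 0-based global indices, `types[v]` gives the type (0 to k−1) of node v, and each edge is a tuple of k node indices ordered by type. The exponent k−1 in the normalization follows the formula as reconstructed from the claim text. The names `local_modularity` and `mutate` are hypothetical.

```python
import random
from itertools import product
from math import prod


def local_modularity(label, labels, types, k, edge_set, degree):
    """f for community `label`: sum over k-tuples with one member per type."""
    members = [[] for _ in range(k)]
    for v, lab in enumerate(labels):
        if lab == label:
            members[types[v]].append(v)
    norm = (2 * len(edge_set)) ** (k - 1)
    total = 0.0
    for combo in product(*members):
        a = 1 if combo in edge_set else 0  # adjacency entry A(i_1, ..., i_k)
        total += a - prod(degree[v] for v in combo) / norm
    return total


def mutate(individual, types, k, edges, pm=0.2, rng=random):
    """Steps 6a to 6f applied to one individual; returns a mutated copy."""
    labels = list(individual)
    edge_set = set(edges)
    degree = [0] * len(labels)
    neighbours = [set() for _ in labels]
    for edge in edge_set:
        for v in edge:
            degree[v] += 1
            neighbours[v].update(u for u in edge if u != v)
    for q in range(len(labels)):
        if rng.random() >= pm:
            continue                                     # 6b) gene q is kept
        candidates = {labels[j] for j in neighbours[q]}  # 6c) neighbour labels
        best, best_label = float("-inf"), labels[q]
        for cand in sorted(candidates):                  # 6d) try each label
            trial = labels[:q] + [cand] + labels[q + 1:]
            f = local_modularity(cand, trial, types, k, edge_set, degree)
            if f > best:
                best, best_label = f, cand
        labels[q] = best_label                           # 6e) adopt best label
    return labels
```

Restricting the candidate labels to those of connected nodes means a mutated node can only join a community it already has an edge into, and a node without any edges keeps its label.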

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310651893.1A CN103605793A (en) 2013-12-04 2013-12-04 Heterogeneous social network community detection method based on genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310651893.1A CN103605793A (en) 2013-12-04 2013-12-04 Heterogeneous social network community detection method based on genetic algorithm

Publications (1)

Publication Number Publication Date
CN103605793A true CN103605793A (en) 2014-02-26

Family

ID=50124015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310651893.1A Pending CN103605793A (en) 2013-12-04 2013-12-04 Heterogeneous social network community detection method based on genetic algorithm

Country Status (1)

Country Link
CN (1) CN103605793A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942308A (en) * 2014-04-18 2014-07-23 中国科学院信息工程研究所 Method and device for detecting large-scale social network communities
CN104199884A (en) * 2014-08-19 2014-12-10 东北大学 Social networking service viewpoint selection method based on R coverage rate priority
CN104318306A (en) * 2014-10-10 2015-01-28 西安电子科技大学 Non-negative matrix factorization and evolutionary algorithm optimized parameter based self-adaption overlapping community detection method
CN104484365A (en) * 2014-12-05 2015-04-01 华中科技大学 Method and system for predicting social relation in multi-source heterogeneous networks
CN109150237A (en) * 2018-08-15 2019-01-04 桂林电子科技大学 A kind of robust multi-user detector design method
CN109166022A (en) * 2018-08-01 2019-01-08 浪潮通用软件有限公司 Screening technique based on fuzzy neural network and genetic algorithm
CN109726001A (en) * 2018-12-29 2019-05-07 中山大学 A kind of genetic algorithm for heterogeneous system
CN110334264A (en) * 2019-06-27 2019-10-15 北京邮电大学 A kind of community detection method and device for isomery dynamic information network
CN112351033A (en) * 2020-11-06 2021-02-09 北京石油化工学院 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network
WO2021227130A1 (en) * 2020-05-13 2021-11-18 深圳计算科学研究院 Heterogeneous network community detection method, device, computer apparatus, and storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942308A (en) * 2014-04-18 2014-07-23 中国科学院信息工程研究所 Method and device for detecting large-scale social network communities
CN103942308B (en) * 2014-04-18 2017-04-05 中国科学院信息工程研究所 The detection method and device of extensive myspace
CN104199884B (en) * 2014-08-19 2017-09-22 东北大学 A kind of social networks point of observation choosing method preferential based on R coverage rates
CN104199884A (en) * 2014-08-19 2014-12-10 东北大学 Social networking service viewpoint selection method based on R coverage rate priority
CN104318306A (en) * 2014-10-10 2015-01-28 西安电子科技大学 Non-negative matrix factorization and evolutionary algorithm optimized parameter based self-adaption overlapping community detection method
CN104318306B (en) * 2014-10-10 2017-03-15 西安电子科技大学 Self adaptation based on Non-negative Matrix Factorization and evolution algorithm Optimal Parameters overlaps community detection method
CN104484365A (en) * 2014-12-05 2015-04-01 华中科技大学 Method and system for predicting social relation in multi-source heterogeneous networks
CN104484365B (en) * 2014-12-05 2017-12-12 华中科技大学 In a kind of multi-source heterogeneous online community network between network principal social relationships Forecasting Methodology and system
CN109166022A (en) * 2018-08-01 2019-01-08 浪潮通用软件有限公司 Screening technique based on fuzzy neural network and genetic algorithm
CN109150237A (en) * 2018-08-15 2019-01-04 桂林电子科技大学 A kind of robust multi-user detector design method
CN109726001A (en) * 2018-12-29 2019-05-07 中山大学 A kind of genetic algorithm for heterogeneous system
CN110334264A (en) * 2019-06-27 2019-10-15 北京邮电大学 A kind of community detection method and device for isomery dynamic information network
WO2021227130A1 (en) * 2020-05-13 2021-11-18 深圳计算科学研究院 Heterogeneous network community detection method, device, computer apparatus, and storage medium
CN112351033A (en) * 2020-11-06 2021-02-09 北京石油化工学院 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network
CN112351033B (en) * 2020-11-06 2022-09-13 北京石油化工学院 Deep learning intrusion detection method based on double-population genetic algorithm in industrial control network

Similar Documents

Publication Publication Date Title
CN103605793A (en) Heterogeneous social network community detection method based on genetic algorithm
Corizzo et al. Anomaly detection and repair for accurate predictions in geo-distributed big data
Fronza et al. Failure prediction based on log files using random indexing and support vector machines
Huang et al. A broader picture of random-walk based graph embedding
Olteanu et al. On-line relational and multiple relational SOM
Liang et al. Structure of the global virtual carbon network: revealing important sectors and communities for emission reduction
CN104731962A (en) Method and system for friend recommendation based on similar associations in social network
US9075734B2 (en) Method for updating betweenness centrality of graph
CN103810288B (en) Method for carrying out community detection on heterogeneous social network on basis of clustering algorithm
CN104134159A (en) Method for predicting maximum information spreading range on basis of random model
CN104200272A (en) Complex network community mining method based on improved genetic algorithm
De Andrés et al. A HYBRID DEVICE OF SELF ORGANIZING MAPS (SOM) AND MULTIVARIATE ADAPTIVE REGRESSION SPLINES (MARS) FOR THE FORECASTING OF FIRMS'BANKRUPTCY.
Deshpande et al. PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets
CN104268629A (en) Complex network community detecting method based on prior information and network inherent information
Achar et al. RNA motif discovery: a computational overview
CN102722578A (en) Unsupervised cluster characteristic selection method based on Laplace regularization
CN109783805A (en) A kind of network community user recognition methods and device
CN105740949A (en) Group global optimization method based on randomness best strategy
Huang et al. The Mahalanobis–Taguchi system–Neural network algorithm for data-mining in dynamic environments
Wind et al. Link prediction in weighted networks
Wu et al. Automatic network clustering via density-constrained optimization with grouping operator
Király et al. Geodesic distance based fuzzy c-medoid clustering–searching for central points in graphs and high dimensional data
Kuang et al. Coarformer: Transformer for large graph via graph coarsening
CN106651461A (en) Film personalized recommendation method based on gray theory
CN104899283A (en) Frequent sub-graph mining and optimizing method for single uncertain graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140226
