CN102571431B - Group concept-based improved Fast-Newman clustering method applied to complex network - Google Patents


Info

Publication number
CN102571431B
Authority
CN
China
Prior art keywords
community
network
node
edge
clustering
Legal status
Expired - Fee Related
Application number
CN201210004690.9A
Other languages
Chinese (zh)
Other versions
CN102571431A (en)
Inventor
童超
戴彬
牛建伟
韩军威
Current Assignee
Henan Zhongcheng information Polytron Technologies Inc
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN201210004690.9A
Publication of CN102571431A
Application granted
Publication of CN102571431B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an improved Fast-Newman clustering method for complex networks based on the group concept. The method introduces the concept of a group and defines adjacent clusters according to the characteristics of cluster structure in complex networks. It improves the modularity evaluation function proposed by Newman and retains the maximum value of the improved function. This addresses the problem that clustering precision is not highest when the original function reaches its global maximum, so the clustering result reflects the real cluster structure of the network more accurately. Compared with the conventional FN clustering method, the method greatly improves the precision of cluster analysis for large-scale complex networks. The improvement is especially pronounced for common complex networks that are large, sparsely connected and unevenly connected.

Description

Group concept-based improved Fast-Newman clustering method for complex networks
Technical field
The invention belongs to the field of data mining of community networks and addresses the clustering of cluster structure in complex networks. Specifically, it relates to an optimization-based clustering method that improves the objective function using the group concept.
Background art
With the development of disciplines such as computer science, mathematics, physics, biology, sociology and complexity science, it has been found that many systems in the real world take the form of complex networks, such as the Internet, mobile telephone networks and neural networks. Because nodes and their connections in such networks are heterogeneous, cluster structure has become one of the most common and most important topological properties of complex networks. In a network cluster structure, nodes within a cluster are densely connected and nodes in different clusters are sparsely connected. Studying complex-network clustering algorithms and revealing the real cluster structure of networks underpins many problems. These include analyzing how relationships between nodes evolve over time, analyzing the speed and range of signal or information propagation through the network, and predicting node behavior. The work therefore has important theoretical significance. Clustering algorithms have also been applied in many fields with broad application prospects, including terrorist-organization identification, social network analysis and organizational management, prediction of unknown protein functions, identification of master regulatory genes, Web community mining and search engines.
Early complex-network clustering algorithms include the spectral method and the Kernighan-Lin algorithm (KL algorithm). The spectral method models the complex network as a graph and turns the clustering problem into a quadratic-form optimization problem. It minimizes a predefined "cut function" by computing the eigenvectors of a special matrix, and this partitions the network. The spectral method relies on prior knowledge to decide when to terminate, and its recursive balanced-bisection strategy is clearly disadvantageous for networks with many clusters. The KL algorithm is likewise based on graph partitioning. Its optimization objective is to minimize the difference between the number of inter-cluster connections and the number of intra-cluster connections. It repeatedly modifies the cluster structure and selects and accepts candidate solutions that minimize the objective function. The KL algorithm also relies on prior knowledge in application and is very sensitive to the initial solution; a poor initial solution leads to slow convergence and poor results.
In 2002, Flake et al. proposed the heuristic clustering algorithm Maximum Flow Community (MFC algorithm), based on the max-flow min-cut theorem. Flake argued that, in a network with cluster structure, the network's "bottlenecks" consist of inter-cluster connections. The MFC algorithm identifies these bottlenecks by computing minimum cut sets and deletes the inter-cluster connections, gradually dividing the network into clusters. However, because the MFC algorithm clusters on the basis of connections, it is not suited to networks with heterogeneous nodes. In the same year, Girvan and Newman proposed the Girvan-Newman algorithm (GN algorithm). This algorithm also uses a heuristic rule. By repeatedly computing edge betweenness in the network, it identifies and deletes inter-cluster connections and produces a top-down hierarchical clustering tree. The main drawbacks of the GN algorithm are its heavy computational cost and slow convergence, which make it unsuitable for large-scale networks.
In 2004, Newman proposed the Fast-Newman algorithm (FN algorithm). It is an optimization algorithm whose objective is the well-known network modularity evaluation function (also called the Q function), proposed by Newman and Girvan in the same year. In the initial state, the FN algorithm treats each node as a cluster. Through merge operations that maximize the Q function during iteration, it builds a bottom-up cluster-structure tree that records the hierarchical clustering process. Building on the Q function, Guimera and Amaral proposed the Guimera-Amaral algorithm (GA algorithm), which incorporates simulated annealing. It evaluates the quality of each candidate solution by the corresponding Q value and decides whether to accept the candidate using the Metropolis criterion of the simulated annealing strategy. It is currently the algorithm with the highest clustering precision. In addition, many complex-network clustering algorithms take maximization of the Q function as their optimization objective. This class of algorithms overcomes the excessive dependence on the initial solution and the slow convergence of heuristic algorithms.
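The conventional FN procedure described above can be sketched as a greedy agglomeration. It starts from singleton clusters and repeatedly merges the connected pair with the largest gain in Q, recording the best partition seen. This is a minimal, unoptimized illustration: it rescans all connected pairs each round instead of using the heap-based bookkeeping of faster implementations. The function name and the edge-list input format are assumptions, not part of the patent. It uses the standard FN merge gain ΔQ = e_ij + e_ji - 2 a_i a_j = 2(e_ij - a_i a_j), where e_ij is half the fraction of all edges joining clusters i and j, and a_i is the fraction of edge ends attached to cluster i.

```python
from collections import defaultdict


def fast_newman(edges):
    """Greedy agglomerative maximization of Newman's Q (a sketch of the FN idea).

    edges: list of (u, v) pairs of an undirected simple graph with integer node ids.
    Returns (best_q, best_partition) over all levels of the merge tree.
    """
    m = len(edges)
    comm = {v: {v} for edge in edges for v in edge}  # cluster id -> member nodes
    a = defaultdict(float)  # a[c]: fraction of edge ends attached to cluster c
    e = defaultdict(float)  # e[(c, d)]: half the fraction of edges joining c and d
    for u, v in edges:
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
        e[(u, v)] += 1 / (2 * m)
        e[(v, u)] += 1 / (2 * m)
    q = -sum(x * x for x in a.values())  # singletons contain no internal edges
    best_q, best = q, [set(s) for s in comm.values()]
    while len(comm) > 1:
        # Only clusters joined by at least one edge are merge candidates.
        pairs = [(c, d) for (c, d) in e if c < d]
        if not pairs:
            break
        c, d = max(pairs, key=lambda p: e[p] - a[p[0]] * a[p[1]])
        q += 2 * (e[(c, d)] - a[c] * a[d])  # FN gain: e_cd + e_dc - 2 a_c a_d
        comm[c] |= comm.pop(d)              # merge cluster d into cluster c
        a[c] += a.pop(d)
        for k in list(comm):
            if k != c:
                w = e.pop((d, k), 0.0)      # re-attribute d's inter-cluster edges to c
                e.pop((k, d), None)
                if w:
                    e[(c, k)] += w
                    e[(k, c)] += w
        e.pop((c, d), None)                 # the c-d edges are now internal
        e.pop((d, c), None)
        if q > best_q:
            best_q, best = q, [set(s) for s in comm.values()]
    return best_q, best
```

On two triangles joined by a single edge, the recorded best level is the two triangles, with Q = 5/14 (up to floating-point rounding).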
However, optimizing the Q function still has two defects. First, the quality of the cluster structure identified by an optimization-based clustering algorithm depends entirely on the objective function being optimized, and a "biased" objective function leads to a "biased" solution. Because the Q function is biased, clustering precision is not highest when the Q function reaches its global maximum. The result produced by the optimization algorithm at that point therefore cannot fully and accurately capture the real cluster structure of the network. Second, as complex networks keep growing in scale, the time complexity of computing the objective function value and of the iterative process itself keeps increasing. Clustering therefore consumes more and more time and resources.
Summary of the invention
Optimizing the Q function in the current FN algorithm has two defects. First, clustering precision is not highest when the Q function reaches its global maximum, so the clustering result at that point cannot fully and accurately capture the real cluster structure of the network. Second, clustering consumes more and more time and resources as complex networks grow. To address these defects, the present invention proposes a group concept-based improved Fast-Newman clustering method for complex networks.
The group concept-based improved Fast-Newman clustering method for complex networks proposed by the present invention specifically comprises the following steps:
Step 1: Count all nodes in the network and number them sequentially. Let N be the total number of nodes and i the number of a node, 1 ≤ i ≤ N. For each node i in the network, set the number of the community it belongs to as i;
Step 2: Create a community structure for each node i, and give each community a survival flag alive that indicates whether the community exists. Add node i to the members of community i and set the alive flag of this community structure to true, where true means the community exists and false means it does not. Set the total number nalive of communities existing in the current network to the total number of nodes N in the network;
Step 3: For each community i, determine its number of internal edges in_edge[i] and its internal degree degree[i];
Step 4: For every pair of communities i, j, determine the number of edges between them, cross_edge[i][j], where 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j;
Step 5: Determine the modularity evaluation function value Q'[i] of each community i:
$$Q'[i] = \sum_{i=1}^{nalive} \left[ \frac{m_i}{m} - \frac{d_i}{2m_q} \cdot \frac{d_q}{2m} \right] \qquad (1)$$
where m is the number of edges in the whole network, m_i is the number of edges in community i (in_edge[i]), and d_i is the sum of the degrees of all nodes in community i (degree[i]). q is the group corresponding to community i, m_q is the number of edges in group q, and d_q is the sum of the degrees of all nodes in group q. The group q corresponding to community i is the set consisting of community i and the communities adjacent to it. Adjacent communities are defined as follows: if at least one node in community i has at least one edge to some node in community p, then communities i and p are adjacent;
Step 6: Set a variable maxQ' to hold the maximum Q' value of the current network communities;
Step 7: Determine whether the current network contains more than one community. If so, enumerate all community pairs i, j in the current network and go to Step 8; otherwise, go to Step 12. Here 1 ≤ i ≤ nalive, 1 ≤ j ≤ nalive, and i ≠ j;
Step 8: Determine whether all community pairs in the current network have been taken. If not, take any pair i, j that has not yet been taken; if all have been taken, go to Step 12;
Step 9: Determine whether there is a connecting edge between community i and community j. If so, go to Step 10; if not, go to Step 8;
Step 10: Suppose communities i and j are merged into a new community i', where i' is the new community number. Determine the total number of internal edges in_edge[i'] and the total internal degree degree[i'] of the new community i', and then determine its modularity evaluation function value Q'[i']:
$$Q'[i'] = \sum_{i'=1}^{nalive'} \left[ \frac{m_{i'}}{m} - \frac{d_{i'}}{2m_{q'}} \cdot \frac{d_{q'}}{2m} \right] \qquad (2)$$
The symbols are defined as follows:
nalive' is the total number of communities that would exist in the current network if communities i and j were merged, equal to nalive - 1;
q' is the group corresponding to community i';
m is the number of edges in the whole network;
m_{i'} is the number of edges in community i' (in_edge[i']);
m_{q'} is the number of edges in group q';
d_{i'} is the sum of the degrees of all nodes in community i';
d_{q'} is the sum of the degrees of all nodes in group q';
Step 11: Compare the obtained modularity evaluation function value Q'[i'] with the current maximum maxQ'. If it is not greater, make no update and go to Step 8. If it is greater, update maxQ' to the new community's value Q'[i'], merge community j into community i, and go to Step 7;
Step 12: Save the maximum Q' value currently held in maxQ' and the final community partition, and end the method.
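Steps 1 to 12 can be sketched compactly as below. The sketch recomputes formulas (1) and (2) from scratch for every trial merge instead of maintaining in_edge, degree and cross_edge incrementally. It reads Q'[i] and Q'[i'] as the network-wide sum over the current or hypothetical partition, which is an interpretation of the notation. It evaluates the expected term as written, (d_S/2m_q)·(d_q/2m). It also assumes a simple undirected graph given as an edge list with no isolated nodes, so every group contains at least one edge. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def q_prime(parts, edges):
    """Network-wide Q' for a partition (list of node sets), as in formulas (1) and (2)."""
    m = len(edges)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    label = {v: k for k, s in enumerate(parts) for v in s}
    total = Fraction(0)
    for k, s in enumerate(parts):
        # Group: community k plus every community joined to it by at least one edge.
        adj = {label[v] for u, v in edges if label[u] == k}
        adj |= {label[u] for u, v in edges if label[v] == k}
        group = set().union(*(parts[c] for c in adj | {k}))
        m_s = sum(1 for u, v in edges if u in s and v in s)
        m_q = sum(1 for u, v in edges if u in group and v in group)
        d_s = sum(deg[v] for v in s)
        d_q = sum(deg[v] for v in group)
        total += Fraction(m_s, m) - Fraction(d_s, 2 * m_q) * Fraction(d_q, 2 * m)
    return total


def group_fast_newman(edges):
    nodes = sorted({v for edge in edges for v in edge})
    parts = [{v} for v in nodes]                 # Steps 1-4: one community per node
    max_q = q_prime(parts, edges)                # Steps 5-6
    while len(parts) > 1:                        # Step 7: more than one community left
        label = {v: k for k, s in enumerate(parts) for v in s}
        linked = {tuple(sorted((label[u], label[v]))) for u, v in edges
                  if label[u] != label[v]}
        for i, j in combinations(range(len(parts)), 2):  # Step 8: next untaken pair
            if (i, j) not in linked:             # Step 9: no edge between i and j
                continue
            trial = [s for k, s in enumerate(parts) if k not in (i, j)]
            trial.append(parts[i] | parts[j])    # Step 10: hypothetical merge
            q = q_prime(trial, edges)
            if q > max_q:                        # Step 11: accept, then back to Step 7
                max_q, parts = q, trial
                break
        else:
            break                                # no improving pair left: Step 12
    return max_q, parts
```

The loop accepts the first improving merge it finds and then re-enumerates pairs, mirroring the jump from Step 11 back to Step 7. Each pass either merges two communities or stops, so the loop always terminates.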
The advantages and beneficial effects of the invention are as follows. By introducing the group concept, the method redefines the network modularity evaluation function (Q function). This avoids the problem that clustering precision is not highest when the function reaches its global maximum, and the resulting clustering captures the real cluster structure of the network more accurately. For cluster analysis of large-scale complex networks, the method is more precise than clustering methods that use the traditional network modularity evaluation function (Q function). The clustering effect is especially pronounced in networks that are large, sparsely connected, or unevenly connected.
Brief description of the drawings
Fig. 1 is the overall flow chart of the clustering method of the invention;
Fig. 2 compares the clustering effect of the clustering method of the invention with that of a method that clusters using the traditional Q function, on the data set "Neural Network", where (a) is evaluated with the Conductance function and (b) with the Expansion function;
Fig. 3 compares the clustering effect of the clustering method of the invention with that of a method that clusters using the traditional Q function, on the data set "Political Blogs", where (a) is evaluated with the Conductance function and (b) with the Expansion function;
Fig. 4 compares the clustering effect of the clustering method of the invention with that of a method that clusters using the traditional Q function, on the data set "Email", where (a) is evaluated with the Conductance function and (b) with the Expansion function;
Fig. 5 is the member relationship graph abstracted from the karate club network.
Embodiment
The method of the invention is described below with reference to the drawings and specific embodiments.
As a network grows, the probability that a node has global information decreases. In a large-scale complex network, a node has global information only with very small probability; normally, a node has local information centered on itself. A clustering algorithm that uses knowledge and strategies based on a global context can find the theoretical global optimum, but it cannot obtain the most realistic cluster structure. To obtain the real cluster structure of a network, it is therefore necessary to simulate the range of local information a node holds and its strategy settings during clustering, and to search for the optimal clustering result under that premise.
Based on the local information environment of nodes, the present invention proposes and defines the concept of a "group". The group concept is intended to describe accurately the range of local information held by a node in a complex network and to provide the node with strategy settings during clustering. With this concept, the region within which a node makes decisions during clustering in the real network cluster structure can be described more accurately. Based on this concept, the invention improves the modularity evaluation function (Q function) to reduce its bias and applies it in the improved FN clustering method. To define the scope of a group accurately, the invention traced how real cluster structures form in different systems. Through comparative study of a large number of complex network data sets, it reached the following conclusions:
(1) The range of local information held by a node is equivalent to the effective range of the node's established edges and potential edges. From the standpoint of network formation, the driving force behind the continuous evolution of network cluster structure is precisely the establishment and updating of edges. The range of local information held by a node can therefore be found from the probability distribution of edge establishment between nodes.
(2) In complex networks of physical or biological systems, for example neural networks, the nodes that can effectively establish an edge with a target node are concentrated in two places: the cluster to which the target node belongs and the clusters physically adjacent to it. Edges with nodes in clusters beyond these physically adjacent ones are possible, but their probability is small and has no significant effect on the clustering process.
(3) In complex networks of social or communication systems, for example social networks such as "Renren", the people who may become friends with a target individual are concentrated among the target individual's "second-degree friends" (friends of friends). The study found that second-degree friends are concentrated in the cluster where the target individual currently is and in clusters directly connected to that cluster. In theory, second-degree friends can also lie in two clusters that are not directly connected. In that case, however, the probability that a friendship forms is small and random, and such relationships have little effect on the evolution of the cluster structure.
Fig. 5 shows the member relationship graph abstracted from the karate club network (Zachary's karate club). The network has 34 nodes and 78 edges, representing the club's 34 members and the 78 friendships between them. Node 1 has degree 16 and node 17 has degree 2, meaning that the member corresponding to node 1 has 16 friends and the member corresponding to node 17 has 2 friends. Finally, node 6 (or node 7) is a friend of node 1, and node 17 is a friend of node 6 (or node 7), so nodes 1 and 17 are second-degree friends.
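The notion of second-degree friends can be illustrated with the small fragment of Fig. 5 named in the text: nodes 1, 6, 7 and 17. The sketch assumes that node 17's two friends are nodes 6 and 7 and includes only these four edges, so it does not reproduce the full 78-edge network. The helper names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def second_degree_friends(adj, node):
    """Friends of friends of `node`, excluding the node itself and its direct friends."""
    direct = adj.get(node, set())
    friends_of_friends = set().union(*(adj[f] for f in direct))
    return friends_of_friends - direct - {node}


# Fragment of Fig. 5: nodes 6 and 7 are friends of node 1; node 17's friends are 6 and 7.
adj = build_adjacency([(1, 6), (1, 7), (6, 17), (7, 17)])
print(second_degree_friends(adj, 1))  # {17}
```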
The invention defines a group as follows. The group of a node is the set of nodes with which the node has established, or may establish, effective connections under the current cluster structure. It is the union of the network cluster to which the node currently belongs and all clusters adjacent to that cluster, as shown in expression (1).
$$Com_i = \{Clu_i \mid i \in Clu_i\} \cup \{Clu_k,\ k = 1, \dots, n \mid Clu_i \sim Clu_k\} \qquad (1)$$
where Com_i is the group corresponding to node i, Clu_i is the cluster to which node i belongs, and Clu_i ~ Clu_k means that the two clusters are adjacent.
Adjacent clusters are defined as follows. Cluster A (Clu_A) and cluster B (Clu_B) are adjacent if and only if at least one node in cluster A has at least one edge to some node in cluster B, as shown in formula (2).
$$Clu_A \sim Clu_B \iff \exists\, i \in Clu_A,\ j \in Clu_B : i \sim j \qquad (2)$$
where i ~ j means that nodes i and j are connected by an edge.
Note that nodes within a cluster are densely connected, physically adjacent, and share information at a high rate. At any given moment, therefore, all nodes in the same cluster have the same group environment, and the group of any single node in a cluster can be used to define the group of the whole cluster. Formula (3) states that the group of cluster c (c = Clu_i) is identical to the group of any node i in that cluster.
$$\{Com_c \mid c = Clu_i\} = \{Com_i \mid i \in Clu_i\} \qquad (3)$$
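Formulas (1) to (3) can be sketched for a partition given as a list of node sets. The edge-list representation and the function names are assumptions for illustration. The example is a hypothetical chain of three triangles in which only the middle cluster is adjacent to both of the others.

```python
def adjacent(ca, cb, edges):
    """Formula (2): clusters are adjacent iff some edge joins a node of ca to a node of cb."""
    return any((u in ca and v in cb) or (u in cb and v in ca) for u, v in edges)


def group_of(node, clusters, edges):
    """Formula (1): the node's own cluster plus every cluster adjacent to it."""
    home = next(c for c in clusters if node in c)
    return [home] + [c for c in clusters if c is not home and adjacent(home, c, edges)]


# Hypothetical example: triangles {0,1,2}, {3,4,5}, {6,7,8} linked by edges 2-3 and 5-6.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 7), (7, 8), (6, 8), (2, 3), (5, 6)]
clusters = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
print(group_of(0, clusters, edges))  # [{0, 1, 2}, {3, 4, 5}]
print(group_of(4, clusters, edges))  # [{3, 4, 5}, {0, 1, 2}, {6, 7, 8}]
```

Formula (3) shows up here as the fact that group_of returns the same group for every node of a cluster, because the result depends only on the node's home cluster.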
Modularity evaluation function: the Q function is defined as the difference between the actual number of intra-cluster edges and the expected number of intra-cluster edges under random connection. The Q function is given by formula (4):
$$Q = \sum_{S=1}^{K} \left[ \frac{m_S}{m} - \left( \frac{d_S}{2m} \right)^2 \right] \qquad (4)$$
where K is the total number of network clusters, m is the total number of edges in the network, m_S is the total number of edges within network cluster S, and d_S is the sum of the degrees of the nodes in network cluster S.
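Formula (4) can be evaluated directly for a given partition. The following is a sketch using exact fractions; it assumes an undirected simple graph given as an edge list, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def modularity(clusters, edges):
    """Formula (4): Q = sum over clusters S of m_S/m - (d_S/2m)^2."""
    m = len(edges)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    q = Fraction(0)
    for s in clusters:
        m_s = sum(1 for u, v in edges if u in s and v in s)  # edges inside S
        d_s = sum(deg[v] for v in s)                          # degree sum of S
        q += Fraction(m_s, m) - Fraction(d_s, 2 * m) ** 2
    return q


# Two triangles joined by one edge: m = 7, each triangle has m_S = 3 and d_S = 7.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity([{0, 1, 2}, {3, 4, 5}], edges))  # 5/14
```

Here 2 × (3/7 - (7/14)^2) = 5/14. The expected term (d_S/2m)^2 uses the degree sum of the whole network, which is the global assumption discussed next.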
Formula (4) shows that the expected number of intra-cluster edges is based on the ratio of the cluster's degree sum to the degree sum of the whole network. In other words, it assumes that any node may connect to every other node in the whole network. This global assumption biases the Q function, so the various optimization algorithms based on it cannot obtain the optimal solution when Q reaches its global maximum.
Based on the group concept, the invention improves the Q function by redefining the expected value of random connections of nodes in a cluster using the range of local information held by those nodes. The new Q function Q' is given by formula (5):
$$Q' = \sum_{S=1}^{K} \left[ \frac{m_S}{m} - \frac{d_S}{2m_q} \cdot \frac{d_q}{2m} \right] \qquad (5)$$
where S denotes a cluster and K the total number of clusters in the whole network. q denotes the group corresponding to cluster S, and m is the number of edges in the whole network. m_S is the number of edges in cluster S, and m_q is the number of edges in group q. d_S is the sum of the degrees of all nodes in cluster S, and d_q is the sum of the degrees of all nodes in group q.
The second term of formula (5) describes the expected probability of connection for the nodes of a cluster within the scope of the group, that is, under a local information environment. Under random connection, a node in the cluster can therefore connect only to nodes inside the group. It has no possibility of connecting to nodes outside the group elsewhere in the network, because those nodes lie beyond the local information range held by the nodes of the cluster. The improved objective function thus embodies the group concept in complex network clustering.
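Formula (5) can be evaluated in the same way as formula (4). The sketch below reads the second term as written, (d_S/2m_q)·(d_q/2m). It builds each group q from the cluster and its adjacent clusters as in formulas (1) and (2), and assumes a simple undirected graph without isolated nodes so that m_q > 0. The example graph (a hypothetical chain of three triangles) and the function name are for illustration only.

```python
from collections import Counter
from fractions import Fraction


def q_prime(clusters, edges):
    """Formula (5): sum over clusters S of m_S/m - (d_S / 2m_q) * (d_q / 2m)."""
    m = len(edges)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    label = {v: k for k, s in enumerate(clusters) for v in s}
    total = Fraction(0)
    for k, s in enumerate(clusters):
        # Group q: cluster S plus every cluster joined to it by at least one edge.
        adj = {label[v] for u, v in edges if label[u] == k}
        adj |= {label[u] for u, v in edges if label[v] == k}
        group = set().union(*(clusters[c] for c in adj | {k}))
        m_s = sum(1 for u, v in edges if u in s and v in s)
        m_q = sum(1 for u, v in edges if u in group and v in group)
        d_s = sum(deg[v] for v in s)
        d_q = sum(deg[v] for v in group)
        total += Fraction(m_s, m) - Fraction(d_s, 2 * m_q) * Fraction(d_q, 2 * m)
    return total


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 7), (7, 8), (6, 8), (2, 3), (5, 6)]
print(q_prime([{0, 1, 2}, {3, 4, 5}, {6, 7, 8}], edges))  # -5/22
```

For the first triangle, the group is the first two triangles (m_q = 7, d_q = 15), giving 3/11 - (7/14)(15/22) = -3/44. The middle triangle's group is the whole network, giving -1/11. The last triangle mirrors the first, so the total is -5/22.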
The method of the invention is particularly suited to human relationship networks in social systems and to fixed telephone networks in communication systems. In these application scenarios, analysis starts from the individuals in the network and describes the dependencies between them. The method collects statistics on how individuals aggregate in a human relationship network or fixed telephone network, with higher statistical partitioning precision as the goal. In this way it can faithfully reflect the whole topology of the network and provide users with the best experience. In such scenarios, the network can be expressed by an objective function whose parameters are the nodes and the edges between them. The problem of making the statistically partitioned network structure more accurate is thereby converted into optimizing this objective function. The goal is to make the objective function approach an unbiased one and to cluster each node in the network accurately, yielding an approximately optimal solution.
The method of the invention defines the communities and the network of the application scenario using the data structures shown in Table 1.
Table 1 Data structures of communities and the network
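As a hedged reconstruction of the kind of structures Table 1 describes, the fields named in the steps below (id, member, alive, in_edge, degree, node_map, cmtys, nalive, cross_edge) can be grouped into two records. Steps 1 to 4 then initialize one community per node. The field types, the class and function names, and the 1-based node numbering are assumptions. In this sketch degree holds d_i, the sum of member-node degrees used in formula (6), so a singleton community starts with its node's degree.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Community:
    id: int
    member: set = field(default_factory=set)  # node numbers in this community
    alive: bool = True                        # survival flag
    in_edge: int = 0                          # edges with both ends inside the community
    degree: int = 0                           # assumed: sum of member-node degrees (d_i)


@dataclass
class Network:
    cmtys: dict              # community id -> Community
    node_map: dict           # node number -> id of the community containing it
    cross_edge: defaultdict  # cross_edge[i][j]: number of edges between communities i and j
    nalive: int              # number of communities that currently exist


def init_network(n, edges):
    """Steps 1-4 for nodes numbered 1..n and an undirected simple edge list."""
    deg = Counter()
    cross_edge = defaultdict(Counter)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        cross_edge[u][v] += 1  # with singleton communities every edge is inter-community
        cross_edge[v][u] += 1
    cmtys = {i: Community(id=i, member={i}, degree=deg[i]) for i in range(1, n + 1)}
    node_map = {i: i for i in range(1, n + 1)}
    return Network(cmtys, node_map, cross_edge, nalive=n)


net = init_network(6, [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)])
print(net.nalive, net.cross_edge[3][4], net.cmtys[3].degree)  # 6 1 3
```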
Fig. 1 shows the overall flow chart of the group concept-based improved Fast-Newman clustering method for complex networks of the invention, which specifically comprises the following steps.
Step 1: Count the number N of nodes in the network and number each node sequentially. For each node i in the network, set the id of the community it belongs to as i, and set node i's entry in node_map to i. Here i is the node number, 1 ≤ i ≤ N.
Step 2: For each node i, create a community structure and add node i to the community's member set member. Mark the community as currently existing by setting its survival flag alive to true, where true means the community exists and false means it does not. Add all created communities to the set cmtys. The number nalive of communities existing in the current network equals the number of nodes N in the network.
Step 3: For each community i, determine its total number of internal edges in_edge[i] and its total internal degree degree[i]. In the initial state each community contains only one node, so in_edge[i] = 0 and degree[i] = 0.
Step 4: For every pair of communities i, j, determine the number of edges between them, cross_edge[i][j], where 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j.
Step 5: For each community i, determine the modularity evaluation function Q' according to formula (5), in which a cluster is a community. Specifically, the value Q'[i] for community i is obtained with formula (6):
$$Q'[i] = \sum_{i=1}^{nalive} \left[ \frac{m_i}{m} - \frac{d_i}{2m_q} \cdot \frac{d_q}{2m} \right] \qquad (6)$$
where nalive is the total number of communities existing in the current network, and q is the group corresponding to community i. m is the number of edges in the whole network, and m_i is the number of edges in community i (in_edge[i]). m_q is the number of edges in group q. d_i is the sum of the degrees of all nodes in community i, and d_q is the sum of the degrees of all nodes in group q.
According to definition formula (1), the group q corresponding to a community i is the set consisting of community i and its adjacent communities. According to definition formula (2), if at least one node in community i has at least one edge to some node in community p, then communities i and p are adjacent.
Whether two communities i and j are adjacent can be determined from the number of edges between them, cross_edge[i][j], obtained in Step 4. If cross_edge[i][j] ≠ 0, communities i and j are adjacent; if cross_edge[i][j] = 0, they are not.
Step 6: Set the variable maxQ' to the current maximum Q' value:
$$maxQ' = \max_{1 \le i \le nalive} Q'[i]$$
Step 7: If the current network contains more than one community, i.e. nalive > 1, enumerate all community pairs i, j in the current network and go to Step 8; otherwise, go to Step 12. Here 1 ≤ i ≤ nalive, 1 ≤ j ≤ nalive, and i ≠ j.
Step 8: Determine whether all community pairs in the current network have been taken. If not, take any pair i, j that has not yet been taken; if all pairs have been taken, go to Step 12.
Step 9: Determine from the value of cross_edge[i][j] whether there is a connecting edge between community i and community j. If cross_edge[i][j] ≠ 0, set the flag found to true, indicating that two communities that may be merged have been found, and go to Step 10. If cross_edge[i][j] = 0, there is no edge between communities i and j, so go to Step 8.
Step 10: Suppose communities i and j are merged, and let i' be the id of the resulting new community. Determine the total number of internal edges in_edge[i'] and the total internal degree degree[i'] of the new community i', and determine its modularity evaluation function value Q'[i'], again according to formula (5):
$$Q'[i'] = \sum_{i'=1}^{nalive'} \left[ \frac{m_{i'}}{m} - \frac{d_{i'}}{2m_{q'}} \cdot \frac{d_{q'}}{2m} \right] \qquad (7)$$
The symbols are defined as follows:
nalive' is the total number of communities that would exist in the current network if communities i and j were merged, equal to nalive - 1;
q' is the group corresponding to community i';
m is the number of edges in the whole network;
m_{i'} is the number of edges in community i' (in_edge[i']);
m_{q'} is the number of edges in group q';
d_{i'} is the sum of the degrees of all nodes in community i';
d_{q'} is the sum of the degrees of all nodes in group q'.
The group q' of community i' is the set consisting of community i' and its adjacent communities. According to definition formula (2), if at least one node in community i' has at least one edge to some node in community p, then communities i' and p are adjacent.
Specifically, the total number of internal edges of the new community i', in_edge[i'], is the number of internal edges of community i, plus that of community j, plus the number of edges between communities i and j. The total internal degree of the new community, degree[i'], is the degree of community i plus the degree of community j.
Step 11: whether the modularity evaluation function value Q ' [i '] relatively obtaining is greater than the variable maxQ ' of current maximum Q ' value, otherwise, do not do and upgrade, execution step 8; If so, the modularity evaluation function value Q ' [i] that the value of upgrading maxQ ' is new communities, and the j of community is merged in the i of community, then perform step 7.Specifically just the j of community merges to the i of community and comprises following operation: the id i ' that merges the new community obtaining is set and equals i, the value that the survival mark alive of the j of community is set is false, affiliated society's area code node_map value of node in the j of community is revised as to i, node in the j of community is joined in the community member member of the i of community, in_edge[i is counted in the limit of the inside of the i of new communities more] and inner total number of degrees degree[i], other intercommunal limit numbers that more exist in the i of new communities and current whole network, the modularity evaluation function value Q ' [i] of the current i of community is exactly the Q ' [i '] that step 10 obtains.
The internal edge count in_edge[i] of community i is the in_edge[i'] obtained in Step 10, and its total internal degree degree[i] is the degree[i'] obtained in Step 10. The edge counts cross_edge[][] between the new community i and the other communities in the network are updated as follows: for every community other than i that is connected to community j, its edge count with community j is added to its edge count with community i.
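The merge operations of Step 11 can be sketched as an in-place update of the counters. This is an illustrative sketch under stated assumptions: communities are dict keys, cross_edge is a symmetric dict of dicts, and the function name `merge_into` is hypothetical. It reuses the small example of two triangles {0,1,2} and {3,4,5} joined by edge (2,3), with node 6 attached to node 5.

```python
def merge_into(i, j, alive, node_map, members, in_edge, degree, cross_edge):
    """Merge community j into community i in place (Step 11, claim 2).

    cross_edge is a symmetric dict of dicts: cross_edge[a][b] = edges a-b.
    """
    for node in members[j]:              # relabel j's nodes as members of i
        node_map[node] = i
    members[i].extend(members[j])
    members[j] = []
    alive[j] = False                     # community j no longer exists
    in_edge[i] += in_edge[j] + cross_edge[i].pop(j, 0)
    degree[i] += degree[j]
    in_edge[j] = degree[j] = 0
    # Every edge from j to a third community k now runs from i to k.
    for k, count in cross_edge.pop(j, {}).items():
        if k == i:
            continue
        cross_edge[i][k] = cross_edge[i].get(k, 0) + count
        cross_edge[k][i] = cross_edge[k].get(i, 0) + count
        del cross_edge[k][j]


node_map = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
members = {0: [0, 1, 2], 1: [3, 4, 5], 2: [6]}
alive = {0: True, 1: True, 2: True}
in_edge = {0: 3, 1: 3, 2: 0}
degree = {0: 7, 1: 8, 2: 1}
cross_edge = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1}}
merge_into(0, 1, alive, node_map, members, in_edge, degree, cross_edge)
print(members[0], in_edge[0], degree[0], cross_edge)
# [0, 1, 2, 3, 4, 5] 7 15 {0: {2: 1}, 2: {0: 1}}
```

After the merge, the former edge between communities 1 and 2 is recorded as an edge between communities 0 and 2, which is exactly the cross_edge update described above.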
Step 12: Save the maximum Q' value held in maxQ' together with the final community partition, and terminate.
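The control flow of Steps 6 to 12 (merge the first connected pair that raises the objective, then restart the enumeration) can be sketched as below. Because the group-based Q' is not given here in closed form, this sketch plugs in the traditional Newman modularity Q = Σ_c [l_c/m − (d_c/2m)²] as a stand-in `score`; the function names, the choice of a whole-partition score, and initializing maxQ' to the score of the singleton partition are all assumptions. Scores are recomputed from scratch for clarity rather than updated incrementally.

```python
from itertools import combinations


def newman_q(edges, partition):
    """Traditional modularity Q = sum_c [l_c/m - (d_c/2m)^2]; needs >= 1 edge."""
    m = len(edges)
    comm = {v: c for c, nodes in partition.items() for v in nodes}
    l = dict.fromkeys(partition, 0)   # internal edges per community
    d = dict.fromkeys(partition, 0)   # degree sum per community
    for u, v in edges:
        d[comm[u]] += 1
        d[comm[v]] += 1
        if comm[u] == comm[v]:
            l[comm[u]] += 1
    return sum(l[c] / m - (d[c] / (2 * m)) ** 2 for c in partition)


def greedy_cluster(nodes, edges, score=newman_q):
    partition = {v: {v} for v in nodes}              # Steps 1-2: singletons
    max_q = score(edges, partition)                  # Step 6 (initial value assumed)
    improved = True
    while improved and len(partition) > 1:           # Step 7
        improved = False
        for i, j in combinations(sorted(partition), 2):   # Step 8
            linked = any((u in partition[i] and v in partition[j]) or
                         (u in partition[j] and v in partition[i])
                         for u, v in edges)
            if not linked:                           # Step 9
                continue
            trial = {c: s for c, s in partition.items() if c != j}
            trial[i] = partition[i] | partition[j]   # Step 10: tentative i' = i ∪ j
            q = score(edges, trial)
            if q > max_q:                            # Step 11: accept, restart
                max_q, partition, improved = q, trial, True
                break
    return max_q, partition                          # Step 12


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q, part = greedy_cluster(range(6), edges)
print(round(q, 4), sorted(sorted(s) for s in part.values()))
# 0.3571 [[0, 1, 2], [3, 4, 5]]
```

Swapping in a different `score` function is the only change needed to experiment with another objective, which matches the text's point that the FN framework's result depends entirely on the objective function.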
The final community partition is the clustering result of the method of the invention; it reflects the topology of the whole network more faithfully and can thus provide users with a better experience. The advantages of the method are illustrated by the experiments below. In Fig. 2 to Fig. 4, FN denotes the traditional FN clustering method using the traditional Q function of formula (4), and FN-group denotes the clustering method of the invention using the group-based Q' function.
To verify the effect of the group-based improved modularity evaluation function, both in the optimization and in the clustering process, the method of the invention is compared with the traditional FN clustering method that uses the traditional Q function of formula (4), for the following reasons:
1) The traditional FN clustering method is the earliest and most direct clustering method that optimizes a modularity evaluation function as its objective, and most later methods of this kind build on it. Since the invention improves the objective function of the traditional FN method, using the traditional FN method as the baseline is representative of this whole class of FN-based methods.
2) The core idea of the FN clustering method is optimization of the objective function. Its strategies for searching and accepting candidate solutions are relatively simple, so the clustering result depends entirely on how the objective function is defined. Clustering precision therefore reflects the improvement in the objective function accurately and directly.
The final output of the method is a hierarchical tree of the network's cluster structure. The tools provided by the Stanford Network Analysis Project (SNAP) were used during development; the time complexity of the algorithm is O(mn).
To compare the improvement objectively and comprehensively, clustering was run on three data sets that differ in scale and properties (Neural Network, Political Blogs and Email), and the results were evaluated with two clustering-quality functions from the Network Community Profile (NCP): Conductance and Expansion.
The Conductance and Expansion functions are defined as follows:
Conductance: $f(S) = \frac{c_S}{2 m_S + c_S}$
Expansion: $f(S) = \frac{c_S}{n_S}$
where c_S is the number of edges between nodes inside cluster S and nodes outside it, m_S is the number of edges inside cluster S, and n_S is the number of nodes in cluster S. The lower the value of either function, the higher the clustering precision and the better the result.
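Both NCP scores follow directly from the three counts c_S, m_S and n_S. A minimal sketch for an undirected edge list; the helper names are hypothetical, and it assumes S is non-empty and touches at least one edge so that neither denominator is zero.

```python
def cut_counts(edges, cluster):
    """Return (c_S, m_S, n_S) for node set S in an undirected edge list."""
    S = set(cluster)
    c_s = m_s = 0
    for u, v in edges:
        inside = (u in S) + (v in S)
        if inside == 2:
            m_s += 1      # edge inside S
        elif inside == 1:
            c_s += 1      # edge crossing the boundary of S
    return c_s, m_s, len(S)


def conductance(edges, cluster):
    c_s, m_s, _ = cut_counts(edges, cluster)
    return c_s / (2 * m_s + c_s)   # assumes S touches at least one edge


def expansion(edges, cluster):
    c_s, _, n_s = cut_counts(edges, cluster)
    return c_s / n_s               # assumes S is non-empty


# Two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}), expansion(edges, {0, 1, 2}))
# 0.14285714285714285 0.3333333333333333
```

A triangle cut off by one bridge gives c_S = 1, m_S = 3, n_S = 3, hence 1/7 and 1/3; both would rise if the cluster had more boundary edges.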
Experimental results show that the method of the invention substantially improves the precision of cluster analysis on large complex networks compared with the traditional FN clustering method, and the improvement is especially pronounced for large networks with sparse or uneven connections.
Fig. 2 shows the results of clustering the "Neural Network" data set with the FN algorithm before and after the improvement, evaluated with the Conductance and Expansion functions. The "Neural Network" data set is a complex network of neurons in a living system; its nodes and edges have real physical meaning. Its basic parameters are listed in Table 2.
Table 2: Attributes of the Neural Network data set
Attribute | Description | Value
Number of nodes | Total number of nodes | 297
Average clustering coefficient | Average clustering coefficient | 0.2924
Number of edges | Total number of edges in the network | 2359
Diameter | Network diameter | 5
Number of triangles | Number of node triples whose edges form a triangle | 3241
Average shortest path length | Average shortest-path length | 2.4553
In Fig. 2(a), evaluating the clustering result of the method of the invention with Conductance gives a mean Conductance of 0.521, whereas the traditional FN clustering method gives a mean of 0.7633. The Conductance of the method of the invention is lower than that of the traditional FN method in 87.1% of cases, and the gap keeps widening along the X axis. This shows that as clustering proceeds and clusters grow larger, the improvement in clustering precision becomes increasingly marked.
Fig. 2(b) shows that the mean Expansion of the method of the invention is 13.48, versus 15.4232 for the traditional FN method. The Expansion of the method of the invention is lower than that of the FN algorithm in 80.65% of cases. Its Expansion values also fluctuate more widely and differ noticeably across cluster sizes. This indicates that the method captures differences in the local information environment during clustering and can find more realistic cluster structures based on local optima.
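The figures quoted for each panel (a mean score per method plus the share of cases in which FN-group scores lower) can be derived from two NCP curves. In this sketch the "cases" are read as the cluster sizes shared by both curves; that reading, the dict-based curve format and the function name `compare_ncp` are assumptions, and the curves in the example are toy values for illustration, not the data behind Fig. 2.

```python
def compare_ncp(ncp_new, ncp_base):
    """Compare two NCP curves given as {cluster_size: score}, lower is better.

    Returns (mean_new, mean_base, share of shared sizes where new < base).
    """
    sizes = sorted(set(ncp_new) & set(ncp_base))
    mean_new = sum(ncp_new[k] for k in sizes) / len(sizes)
    mean_base = sum(ncp_base[k] for k in sizes) / len(sizes)
    wins = sum(ncp_new[k] < ncp_base[k] for k in sizes)
    return mean_new, mean_base, wins / len(sizes)


# Toy curves for illustration only.
fn_group = {2: 0.50, 4: 0.40, 8: 0.60, 16: 0.30}
fn = {2: 0.60, 4: 0.70, 8: 0.50, 16: 0.90}
print(compare_ncp(fn_group, fn))
```

Only sizes present in both curves are compared, so two methods that produce different cluster-size ranges are still compared like for like.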
Fig. 3 compares the clustering results of the method of the invention and the traditional FN clustering method on the "Political Blogs" data set. This data set is a complex network of political blogs in a social system, in which nodes and edges have social meaning. It is larger than the "Neural Network" data set, with about 4.1 times as many nodes and about 7.1 times as many edges (see Tables 2 and 3). Its nodes are therefore more tightly connected, its clustering coefficient is higher, and it contains more short loops (triangles). At the same time, however, the average shortest path between nodes is longer, which shows that the gain in connection density within this data set is limited and cannot offset the effect of the larger network size. Its basic parameters are listed in Table 3.
Table 3: Attributes of the Political Blogs data set
Attribute | Description | Value
Number of nodes | Total number of nodes | 1222
Average clustering coefficient | Average clustering coefficient | 0.3203
Number of edges | Total number of edges in the network | 16717
Diameter | Network diameter | 8
Number of triangles | Number of node triples whose edges form a triangle | 101043
Average shortest path length | Average shortest-path length | 2.7375
Fig. 3(a) shows that the mean Conductance of the method of the invention is 0.5499, versus 0.8969 for the traditional FN method; the method's Conductance is lower than that of the traditional FN method in 85.53% of cases. Compared with Fig. 2(a), the precision gain of the method grows more markedly with cluster size, showing that the larger the complex network, the stronger the boost to clustering precision from the local group concept embodied in the method.
Fig. 3(b) shows that the mean Expansion of the method of the invention is 24.9263, versus 34.5256 for the traditional FN method; the method's Expansion is lower than that of the traditional FN method in 77.63% of cases. Compared with Fig. 2(b), the method's Expansion values fluctuate more sharply, and the proportion of cases in which it beats the traditional FN result drops somewhat. This shows that as the complex network grows, local structural differences increase; the method fully reflects these local differences and improves the clustering result in most cases.
Fig. 4 compares the clustering results of the method of the invention and the traditional FN clustering method on the "Email" data set, a complex network describing e-mail exchanges within a community. Compared with the first two data sets, the "Email" data set has many nodes but sparse relations, so its clustering coefficient is lower and its average shortest path is longer. In this setting the nodes are more local: the amount of global information they can access, and the likelihood of obtaining it, are smaller. Its basic parameters are listed in Table 4.
Table 4: Attributes of the Email data set
Attribute | Description | Value
Number of nodes | Total number of nodes | 1133
Average clustering coefficient | Average clustering coefficient | 0.2202
Number of edges | Total number of edges in the network | 5452
Diameter | Network diameter | 8
Number of triangles | Number of node triples whose edges form a triangle | 5453
Average shortest path length | Average shortest-path length | 3.6060
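The size and sparsity comparisons between the data sets can be checked with simple arithmetic on the node and edge counts of Tables 2 to 4. Average degree (2m/n) and edge density (m divided by n(n−1)/2) are standard measures added here for illustration; they are not reported in the experiments above, and the helper name `summarize` is hypothetical.

```python
# (nodes, edges) as listed in Tables 2-4.
DATASETS = {
    "Neural Network": (297, 2359),
    "Political Blogs": (1222, 16717),
    "Email": (1133, 5452),
}


def summarize(n, m):
    """Average degree 2m/n and edge density m / (n(n-1)/2) of a simple graph."""
    return 2 * m / n, m / (n * (n - 1) / 2)


for name, (n, m) in DATASETS.items():
    avg_deg, density = summarize(n, m)
    print(f"{name:16s} avg degree {avg_deg:6.2f}  density {density:.4f}")

n0, m0 = DATASETS["Neural Network"]
n1, m1 = DATASETS["Political Blogs"]
print(f"node ratio {n1 / n0:.1f}, edge ratio {m1 / m0:.1f}")  # node ratio 4.1, edge ratio 7.1
```

On both measures the Email network comes out as the sparsest of the three, consistent with its description as a network with many nodes but sparse relations.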
Fig. 4(a) shows that the mean Conductance of the method of the invention is 0.521, versus 0.7633 for the traditional FN method; the method's Conductance is lower than that of the traditional FN method in 92.31% of cases. Fig. 4(b) shows that the mean Expansion of the method of the invention is 9.3233, versus 9.4885 for the traditional FN method; the method's Expansion is lower than that of the traditional FN method in 92.31% of cases. Compared with Fig. 2 and Fig. 3, the improvement in clustering precision is even more pronounced. This shows that in larger complex networks with relatively sparse relations and lower clustering coefficients, the method simulates a locally limited information environment more faithfully and thus improves clustering precision to the greatest extent.
The experimental results on the three data sets show that the method of the invention substantially improves the precision of cluster analysis on large complex networks compared with the traditional FN clustering method, especially for large networks with sparse or uneven connections.

Claims (2)

  1. A group-concept-based improved Fast-Newman clustering method for complex networks, characterized in that it comprises the following steps:
    Step 1: Count all nodes in the network and number them sequentially. Let N be the total number of nodes and i the number of a node, 1≤i≤N. For each node i in the network, set the number of the community it belongs to to i;
    Step 2: Create a community structure for each node i, and give each community a survival flag alive indicating whether the community exists. Add node i to the members of community i and set the alive parameter of this community structure to true, where true means the community exists and false means it does not. Set the number nalive of existing communities in the current network to the total number of nodes N;
    Step 3: For each community i, determine its internal edge count in_edge[i] and its internal degree degree[i];
    Step 4: For every pair of communities i, j, determine the number of edges between them, cross_edge[i][j], 1≤i≤N, 1≤j≤N, i≠j;
    Step 5: Determine the modularity evaluation value Q'[i] of each community i:
    [Formula image FDA0000470138460000011: definition of Q'[i]]
    where m is the number of edges in the whole network; m_i is the number of edges in community i, i.e. in_edge[i]; d_i is the sum of the degrees of all nodes in community i, i.e. degree[i]; q is the group corresponding to community i; m_q is the number of edges in group q; and d_q is the sum of the degrees of all nodes in group q. The group q corresponding to community i is the set consisting of community i and its adjacent communities, where adjacency is defined as follows: if at least one node in community i has at least one edge to some node in community p, then communities i and p are adjacent;
    Step 6: Set a variable maxQ' to hold the maximum Q' value of the communities in the current network;
    Step 7: Determine whether more than one community exists in the current network. If so, enumerate all community pairs i, j in the current network and go to Step 8; otherwise, go to Step 12; 1≤i≤nalive, 1≤j≤nalive, i≠j;
    Step 8: Determine whether all community pairs in the current network have been taken. If not, take any pair of communities i, j not yet taken; if all have been taken, go to Step 12;
    Step 9: Determine whether there is an edge between community i and community j. If there is, go to Step 10; if not, go to Step 8;
    Step 10: Suppose that communities i and j are merged into a new community i', where i' is the new community number. Determine the total internal edge count in_edge[i'] and total internal degree degree[i'] of the new community i', then determine its modularity evaluation value Q'[i']:
    [Formula image FDA0000470138460000012: definition of Q'[i']]
    where nalive' is the number of communities that would exist in the current network if communities i and j were merged, its value being nalive − 1, the number of communities currently in the network; q' is the group corresponding to the new community i'; m is the number of edges in the whole network; m_i' is the number of edges in the new community i', i.e. in_edge[i']; m_q' is the number of edges in group q'; d_i' is the sum of the degrees of all nodes in the new community i'; and d_q' is the sum of the degrees of all nodes in group q';
    the total internal edge count in_edge[i'] of the new community i' is obtained by adding the internal edge count of community j to that of community i, plus the number of edges between communities i and j, and the total internal degree degree[i'] of the new community i' is the sum of the degrees of communities i and j;
    Step 11: Determine whether the resulting modularity evaluation value Q'[i'] is greater than the variable maxQ' holding the current maximum Q' value. If not, make no update and go to Step 8. If so, set maxQ' to the modularity evaluation value Q'[i'] of the new community, merge community j into community i, and go to Step 7;
    Step 12: Save the maximum Q' value held in maxQ' and the final community partition, and terminate.
  2. The group-concept-based improved Fast-Newman clustering method for complex networks according to claim 1, characterized in that merging community j into community i in Step 11 specifically comprises the following operations: adding the nodes in community j to the members of community i; changing the community number of the nodes in community j to i; setting the survival flag alive of community j to false; updating the internal edge count in_edge[i] and total internal degree degree[i] of the new community i; and updating the edge counts between the new community i and the other communities in the current network.
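Steps 1 to 4 of claim 1 amount to building per-community counters for a partition in which every node is its own community. A minimal sketch, assuming 0-based node ids (the claim numbers nodes 1..N), an undirected graph without self-loops, and dicts in place of arrays; the function name `initialize` is hypothetical.

```python
from collections import defaultdict


def initialize(n, edges):
    """Steps 1-4 of claim 1 for nodes 0..n-1 (the claim numbers them 1..N)."""
    node_map = {v: v for v in range(n)}      # Step 1: node v starts in community v
    members = {v: [v] for v in range(n)}     # Step 2: one community per node
    alive = {v: True for v in range(n)}
    nalive = n
    in_edge = {v: 0 for v in range(n)}       # Step 3: a singleton has no internal
    degree = {v: 0 for v in range(n)}        #   edges (no self-loops assumed)
    cross_edge = defaultdict(dict)           # Step 4: edges between communities
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        cross_edge[u][v] = cross_edge[u].get(v, 0) + 1
        cross_edge[v][u] = cross_edge[v].get(u, 0) + 1
    return node_map, members, alive, nalive, in_edge, degree, cross_edge


node_map, members, alive, nalive, in_edge, degree, cross = initialize(3, [(0, 1), (1, 2)])
print(nalive, degree, dict(cross[1]))  # 3 {0: 1, 1: 2, 2: 1} {0: 1, 2: 1}
```

Because each community starts as a single node, cross_edge[i][j] initially equals the number of edges between nodes i and j, and degree[i] equals the node degree.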
CN201210004690.9A 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network Expired - Fee Related CN102571431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210004690.9A CN102571431B (en) 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201110396200 2011-12-02
CN201110396200.X 2011-12-02
CN201210004690.9A CN102571431B (en) 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network

Publications (2)

Publication Number Publication Date
CN102571431A CN102571431A (en) 2012-07-11
CN102571431B true CN102571431B (en) 2014-06-18

Family

ID=46415957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210004690.9A Expired - Fee Related CN102571431B (en) 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network

Country Status (1)

Country Link
CN (1) CN102571431B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819611B (en) * 2012-08-27 2015-04-15 方平 Local community digging method of complicated network
CN104156462B (en) * 2014-08-21 2017-07-28 上海交通大学 Complex network community method for digging based on cellular Learning Automata
CN104598927A (en) * 2015-01-29 2015-05-06 中国科学院深圳先进技术研究院 Large-scale graph partitioning method and system
CN107376357A (en) * 2016-05-17 2017-11-24 蔡小华 A kind of good friend's interaction class internet game method
CN106789285B (en) * 2016-12-28 2020-08-14 西安交通大学 Online social network multi-scale community discovery method
CN107888431B (en) * 2017-12-25 2020-06-16 北京理工大学 Dynamic core-edge network centralization algorithm and model construction method thereof
CN110110177B (en) * 2019-04-10 2020-09-25 中国人民解放军战略支援部队信息工程大学 Graph-based malicious software family clustering evaluation method and device
CN112580916B (en) * 2019-09-30 2024-05-28 深圳无域科技技术有限公司 Data evaluation method, device, computer equipment and storage medium
CN113011471A (en) * 2021-02-26 2021-06-22 山东英信计算机技术有限公司 Social group dividing method, social group dividing system and related devices

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101572961A (en) * 2009-05-31 2009-11-04 北京航空航天大学 Mobile scale-free self-organizing network model building method
CN101771964A (en) * 2010-01-06 2010-07-07 北京航空航天大学 Information correlation based opportunistic network data distributing method
CN101594697B (en) * 2009-05-08 2011-01-05 北京航空航天大学 Method for data communication in community-based opportunistic network
GB2477921A (en) * 2010-02-17 2011-08-24 Sidonis Ltd Analysing a network using a network model with simulated changes

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
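An improved fast Newman algorithm; An Na et al.; Science Technology and Engineering; 2010-10-28; Vol. 10, No. 30; pp. 7550-7553 *
A message transmission algorithm based on community opportunistic networks; Niu Jianwei et al.; Journal of Computer Research and Development; 2009-12-15; No. 12; pp. 2068-2075 *
An Na et al. An improved fast Newman algorithm. Science Technology and Engineering, 2010, Vol. 10, No. 30.
Niu Jianwei et al. A message transmission algorithm based on community opportunistic networks. Journal of Computer Research and Development, 2009, No. 12.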

Also Published As

Publication number Publication date
CN102571431A (en) 2012-07-11

Similar Documents

Publication Publication Date Title
CN102571431B (en) Group concept-based improved Fast-Newman clustering method applied to complex network
CN110532436B (en) Cross-social network user identity recognition method based on community structure
CN102810113B (en) A kind of mixed type clustering method for complex network
CN104102745B (en) Complex network community method for digging based on Local Minimum side
CN102571954B (en) Complex network clustering method based on key influence of nodes
CN104008165B (en) Club detecting method based on network topology and node attribute
Kundu et al. Fuzzy-rough community in social networks
CN113378913B (en) Semi-supervised node classification method based on self-supervised learning
CN104199852B (en) Label based on node degree of membership propagates community structure method for digging
CN107784598A (en) A kind of network community discovery method
CN103678671A (en) Dynamic community detection method in social network
CN108304380A (en) A method of scholar's name disambiguation of fusion academic
CN105719191A (en) System and method of discovering social group having unspecified behavior senses in multi-dimensional space
CN113807520A (en) Knowledge graph alignment model training method based on graph neural network
CN103617259A (en) Matrix decomposition recommendation method based on Bayesian probability with social relations and project content
CN106991614A (en) The parallel overlapping community discovery method propagated under Spark based on label
Li et al. Community detection for multi-layer social network based on local random walk
CN114462623B (en) Data analysis method, system and platform based on edge calculation
CN112464107B (en) Social network overlapping community discovery method and device based on multi-label propagation
Long Overlapping community detection with least replicas in complex networks
CN115456093A (en) High-performance graph clustering method based on attention-graph neural network
CN104700311B (en) A kind of neighborhood in community network follows community discovery method
CN114417177A (en) Label propagation overlapping community discovery method based on node comprehensive influence
Xu et al. Integration of migration and attention flow data to reveal association of virtual–real dual intercity network structure
Ma et al. Opportunistic networks link prediction method based on Bayesian recurrent neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20161121

Address after: 450063 Henan Province, Zhengzhou North Sanhuan Henan province university science and Technology Park Building 7, 13 floor

Patentee after: Henan Zhongcheng information Polytron Technologies Inc

Address before: 100191 Haidian District, Xueyuan Road, No. 37,

Patentee before: Beihang University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140618

Termination date: 20200109