CN102571431A - Group concept-based improved Fast-Newman clustering method applied to complex network - Google Patents


Publication number
CN102571431A
Authority
CN
China
Prior art keywords
community
network
node
edge
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100046909A
Other languages
Chinese (zh)
Other versions
CN102571431B (en
Inventor
童超
戴彬
牛建伟
韩军威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Zhongcheng information Polytron Technologies Inc
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201210004690.9A priority Critical patent/CN102571431B/en
Publication of CN102571431A publication Critical patent/CN102571431A/en
Application granted granted Critical
Publication of CN102571431B publication Critical patent/CN102571431B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a group concept-based improved Fast-Newman clustering method for complex networks. The invention introduces the group concept, defines adjacent clusters according to the characteristics of complex network cluster structure, and improves the modularity evaluation function proposed by Newman. The maximum value of the modularity evaluation function is saved, and the problem that clustering precision is not at its highest at the global maximum is addressed, so the clustering result reflects the real network cluster structure more accurately. Compared with the conventional FN clustering method, the method greatly improves the precision of cluster analysis on large-scale complex networks; the improvement is most marked for the common kind of complex network that is large, sparsely connected and unevenly connected.

Description

Group concept-based improved Fast-Newman clustering method for complex networks
Technical field
The invention belongs to the field of data mining for social networks. It specifically relates to an optimization-based clustering method for the cluster structure of complex networks that improves the objective function on the basis of the group concept.
Background art
With the continuing development of disciplines such as computer science, mathematics, physics, biology, sociology and complexity science, it has been found that many systems in the real world exist in the form of complex networks, for example the Internet, mobile telephone networks, protein interaction networks and neural networks. Because the nodes and connections in such networks are heterogeneous, cluster structure has become one of the most common and most important topological properties of complex networks. A network cluster structure is characterized by dense connections among the nodes within a cluster and sparse connections between clusters. Studying complex network clustering algorithms and revealing the real cluster structure of a network is the basis for analyzing how node relationships evolve over time, how fast and how far signals or information propagate in the network, and for predicting the behavior of nodes in the network; it therefore has important theoretical significance. Clustering algorithms have also been applied in many fields, including the identification of terrorist organizations, social network analysis and organizational management, function prediction for unknown proteins, identification of master regulatory genes, and Web community mining and search engines, and they have broad application prospects.
Early complex network clustering algorithms include spectral methods and the Kernighan-Lin algorithm (KL algorithm). Spectral methods model the complex network as a graph, convert the clustering problem into a quadratic optimization problem, and minimize a predefined "cut function" by computing the eigenvectors of a special matrix, thereby partitioning the network. Spectral methods rely on prior knowledge to decide when to stop, and their recursive balanced bisection strategy is at a clear disadvantage on networks with many clusters. The KL algorithm is likewise based on graph partitioning. Its optimization objective is to minimize the difference between the number of inter-cluster connections and the number of intra-cluster connections; it repeatedly adjusts the cluster to which nodes belong and selects and accepts candidate solutions that reduce the objective function. The KL algorithm also relies on prior knowledge in practice and is very sensitive to the initial solution: a poor initial solution leads to slow convergence and poor results.
In 2002, Flake et al. proposed the heuristic clustering algorithm Maximum Flow Community (MFC algorithm), based on the max-flow min-cut theorem. Flake argued that in a network with cluster structure the network "bottleneck" consists of the inter-cluster connections. The MFC algorithm computes minimum cut sets to identify the bottleneck, deletes the inter-cluster connections, and gradually divides the network into clusters. However, the MFC algorithm clusters on the basis of connections and is not suitable for networks with heterogeneous nodes. In the same year, Girvan and Newman proposed the Girvan-Newman algorithm (GN algorithm). This algorithm also uses a heuristic rule: by repeatedly computing the edge betweenness in the network, it identifies and deletes inter-cluster connections and produces a top-down hierarchical clustering tree. The main drawback of the GN algorithm is its excessive computational cost; it converges slowly and is not suitable for large-scale networks.
In 2004, Newman proposed the Fast-Newman algorithm (FN algorithm). It is an optimization algorithm whose objective is the well-known network modularity evaluation function (also called the Q function) proposed by Newman and Girvan in the same year. In the initial state, the FN algorithm treats each node as a cluster; during iteration it performs the merge operations that maximize the Q function, building a bottom-up cluster relation tree that records the hierarchical clustering process. Also based on the Q function, Guimera and Amaral proposed the Guimera-Amaral algorithm (GA algorithm), which incorporates simulated annealing. This algorithm evaluates the quality of a candidate solution by computing its Q value and decides whether to accept the candidate by the Metropolis criterion of the simulated annealing strategy; it is currently the algorithm with the highest clustering precision. Many other complex network clustering algorithms also take maximization of the Q function as their objective. Algorithms of this type solve the problems of excessive dependence on the initial solution and of the slow convergence of heuristic algorithms.
However, optimization of the Q function still has defects. First, the quality of the network cluster structure identified by an optimization-based clustering algorithm depends entirely on the objective function being optimized, and a biased objective function leads to a biased solution. Because the Q function is a biased objective function, clustering precision is not at its highest when the Q function reaches its global maximum, and the clustering result at that point cannot portray the real network cluster structure with complete accuracy. Second, as complex networks keep growing, the time complexity of computing the objective function value and of the iterative process itself keeps increasing, so clustering consumes more and more time and resources.
Summary of the invention
The optimization of the Q function in the existing FN algorithm has the following defects: clustering precision is not at its highest when the Q function reaches its global maximum, so the clustering result at that point cannot portray the real network cluster structure with complete accuracy; and as complex networks keep growing, clustering consumes more and more time and resources. To address these defects, the present invention proposes a group concept-based improved Fast-Newman clustering method for complex networks.
The group concept-based improved Fast-Newman clustering method for complex networks proposed by the invention specifically comprises the following steps:
Step 1: Count all nodes in the network and number them sequentially. Let the total number of nodes be N and let i be the number of a node, 1≤i≤N. For each node i in the network, set the id of the community to which it belongs to i.
Step 2: Create a community structure for each node i, and give each community a survival flag alive that indicates whether the community exists. Add node i to the members of community i and set the alive parameter of this community structure to true; true indicates that the community exists and false that it does not. Set the number nalive of communities existing in the current network to the total number N of nodes in the network.
Step 3: For each community i, determine its internal edge count in_edge[i] and its internal degree degree[i].
Step 4: For each pair of communities i, j, determine the number of edges between them, cross_edge[i][j], where 1≤i≤N, 1≤j≤N and i≠j.
Step 5: Determine the modularity evaluation function value Q′[i] of each community i:
$$Q'[i] = \sum_{i=1}^{\mathit{nalive}} \left[ \frac{m_i}{m} - \frac{d_i}{2m_q} \cdot \frac{d_q}{2m} \right] \tag{1}$$
where $m$ is the number of edges in the whole network, $m_i$ is the number of edges within community i, in_edge[i], $d_i$ is the sum of the degrees of all nodes in community i, degree[i], $q$ is the group corresponding to community i, $m_q$ is the number of edges within group q, and $d_q$ is the sum of the degrees of all nodes in group q. The group q corresponding to community i is the set consisting of community i and the communities adjacent to community i. Adjacent communities are defined as follows: if at least one node in community i has at least one edge to some node in community p, then community i and community p are adjacent communities.
Step 6: Set a variable maxQ′ to hold the maximum Q′ value of the communities in the current network.
Step 7: Determine whether more than one community exists in the current network. If so, enumerate all community pairs i, j in the current network, where 1≤i≤nalive, 1≤j≤nalive and i≠j, and then execute step 8; otherwise, execute step 12.
Step 8: Determine whether all community pairs in the current network have been taken. If not, take any pair i, j that has not yet been taken; if all have been taken, go to step 12.
Step 9: Determine whether an edge connects community i and community j. If so, execute step 10; if not, go to step 8.
Step 10: Hypothetically merge community i and community j to obtain a new community i′, where i′ is the id of the new community. Determine the total internal edge count in_edge[i′] and the total internal degree degree[i′] of the new community i′, then determine its modularity evaluation function value Q′[i′]:
$$Q'[i'] = \sum_{i'=1}^{\mathit{nalive}'} \left[ \frac{m_{i'}}{m} - \frac{d_{i'}}{2m_{q'}} \cdot \frac{d_{q'}}{2m} \right] \tag{2}$$
where nalive′ is the total number of communities in the current network under the hypothetical merge of community i and community j, i.e. nalive−1; $q'$ is the group corresponding to community i′; $m$ is the number of edges in the whole network; $m_{i'}$ is the number of edges within community i′, in_edge[i′]; $m_{q'}$ is the number of edges within group q′; $d_{i'}$ is the sum of the degrees of all nodes in community i′; and $d_{q'}$ is the sum of the degrees of all nodes in group q′.
Step 11: Compare the modularity evaluation function value Q′[i′] obtained with the current maximum Q′ value stored in maxQ′. If Q′[i′] is not greater, make no update and go to step 8. If it is greater, update maxQ′ to the modularity evaluation function value Q′[i] of the new community, merge community j into community i, and go to step 7.
Step 12: Save the maximum Q′ value held in the variable maxQ′ together with the final community partition, then end the method.
The advantages and positive effects of the invention are as follows. By introducing the group, the method redefines the network modularity evaluation function (Q function) and thereby avoids the problem that clustering precision is not at its highest when the function reaches its global maximum. The clustering result portrays the real network cluster structure more accurately, and the precision of cluster analysis on large-scale complex networks is higher than that of clustering methods using the traditional network modularity evaluation function (Q function). The clustering effect is especially good in large networks whose connections are sparse or unevenly distributed.
Description of drawings
Fig. 1 is the overall flow chart of the steps of the clustering method of the invention;
Fig. 2 compares the clustering results on the data set "Neural Network" of the clustering method of the invention and of the method that clusters with the traditional Q function, where (a) is evaluated with the Conductance function and (b) with the Expansion function;
Fig. 3 compares the clustering results on the data set "Political Blogs" of the clustering method of the invention and of the method that clusters with the traditional Q function, where (a) is evaluated with the Conductance function and (b) with the Expansion function;
Fig. 4 compares the clustering results on the data set "Email" of the clustering method of the invention and of the method that clusters with the traditional Q function, where (a) is evaluated with the Conductance function and (b) with the Expansion function;
Fig. 5 is the member relationship graph abstracted from the karate club network.
Detailed description of the embodiments
The method of the invention is described below with reference to the drawings and specific embodiments.
As a network grows, the probability that a node possesses global information gradually decreases. In a large-scale complex network a node has global information only with very small probability; in general, a node has local information centered on itself. Applying knowledge and strategies that assume a global context in a clustering algorithm may find the theoretical global optimum, but it cannot recover the most realistic cluster structure. Simulating, during clustering, the range of local information that a node holds and its decision environment, and searching for the optimal clustering result under that environment, is therefore a necessary condition for obtaining the real network cluster structure.
Based on the local information environment of a node, the invention proposes and defines the concept of a "group". The group concept is intended to describe accurately the range of local information held by a node in a complex network and to provide the node's decision environment during clustering. Introducing the "group" concept allows the decision region of a node during clustering to be described more accurately for real network cluster structures. Based on this concept, the invention improves the modularity evaluation function (Q function), reduces its bias, and applies it in an improved FN clustering method. To define the scope of a group accurately, the invention traced and analyzed the formation of real cluster structures in different systems and, through a comparative study of a large number of complex network data sets, reached the following conclusions:
(1) The range of local information held by a node is equivalent to the effective range within which the node has established, or may establish, edges. From the viewpoint of how network structure forms, the driving force behind the continual evolution of a network's cluster structure is precisely the creation and updating of edges; the range of local information held by a node can therefore be found from the probability distribution of edge creation between nodes.
(2) In complex networks of physical or biological systems, for example neural networks, the nodes that have an effective probability of establishing an edge with a target node are concentrated in the cluster to which the target node belongs and in the clusters physically adjacent to it. Edges that reach beyond the physically adjacent clusters are possible, but their probability is small and they have no noticeable effect on the clustering process.
(3) In complex networks of social or communication systems, for example interpersonal relationship networks such as Renren, the people who may become friends with a target individual are concentrated among the target's "two-degree friends" (friends of friends). The study found that two-degree friends are concentrated in the cluster where the target individual currently resides and in the clusters that have direct edges to that cluster. In theory a two-degree friend can also lie in a cluster that is not directly connected to the target's cluster, but the probability of establishing a friend relation in that case is small and random, and such relations have little influence on the evolution of the cluster structure.
Fig. 5 shows the member relationship graph abstracted from Zachary's karate club network. The network has 34 nodes and 78 edges, representing the 34 members of the club and the 78 friend relations between members. Node 1 has degree 16 and node 17 has degree 2, meaning that the member corresponding to node 1 has 16 friends and the member corresponding to node 17 has 2 friends. Finally, because node 6 (or node 7) is a friend of node 1, and node 17 is a friend of node 6 (or node 7), node 1 and node 17 are two-degree friends.
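The two-degree friend relation described above can be illustrated with a short Python sketch. It is only an illustration: the edge list contains just the four friend relations named in the text (1-6, 1-7, 6-17, 7-17), not the full 78-edge karate club network, and the function names `build_adjacency` and `two_degree_friends` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def two_degree_friends(adj, node):
    """Return friends of friends that are neither `node` nor its direct friends."""
    direct = set(adj[node])
    reachable = set()
    for friend in direct:
        reachable |= adj[friend]
    return reachable - direct - {node}


# Only the four relations mentioned in the text; an assumption-level fragment,
# not the complete karate club data set.
edges = [(1, 6), (1, 7), (6, 17), (7, 17)]
adj = build_adjacency(edges)
print(two_degree_friends(adj, 1))  # {17}
```

On this fragment node 17 is the only two-degree friend of node 1, reached through node 6 or node 7.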
The invention defines a group as follows: the group of a node is the set of nodes with which the node has established, or may establish, effective connections under the current cluster structure, that is, the union of the cluster to which the node currently belongs and all clusters adjacent to that cluster, as expressed by formula (1).
$$\mathrm{Com}_i = \{\mathrm{Clu}_i \mid i \in \mathrm{Clu}_i\} \cup \{\mathrm{Clu}_k,\ k = 1 \ldots n \mid \mathrm{Clu}_i \sim \mathrm{Clu}_k\} \tag{1}$$
where $\mathrm{Com}_i$ denotes the group of node i, $\mathrm{Clu}_i$ denotes the cluster to which node i belongs, and $\mathrm{Clu}_i \sim \mathrm{Clu}_k$ indicates that the two clusters are adjacent.
Adjacent clusters are defined as follows: cluster A ($\mathrm{Clu}_A$) and cluster B ($\mathrm{Clu}_B$) are adjacent if and only if at least one node in cluster A has at least one edge to some node in cluster B, as shown in formula (2).
Figure BDA0000129370860000051
where $i \sim j$ indicates that an edge exists between node i and node j.
Note that, because the nodes within a cluster are densely connected, physically adjacent and share information at a high rate, all nodes in the same cluster have the same group environment at any given moment. The group scope of any node in a cluster can therefore be used to define the group scope of the cluster itself. Formula (3) states that the group scope of c, where c is the cluster $\mathrm{Clu}_i$, is identical to the group scope of any node i in that cluster.
$$\{\mathrm{Com}_c \mid c = \mathrm{Clu}_i\} = \{\mathrm{Com}_i \mid i \in \mathrm{Clu}_i\} \tag{3}$$
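The definitions of adjacent clusters and of a group can be sketched in Python as follows. This is a minimal illustration under stated assumptions: clusters are represented as Python sets of node ids, edges as undirected pairs, and the names `are_adjacent` and `group_of` as well as the six-node example graph are hypothetical, not taken from the patent.

```python
def are_adjacent(cluster_a, cluster_b, edges):
    """Two clusters are adjacent if at least one edge joins a node of each."""
    return any(
        (u in cluster_a and v in cluster_b) or (u in cluster_b and v in cluster_a)
        for u, v in edges
    )


def group_of(index, clusters, edges):
    """Indices of the clusters forming the group of cluster `index`:
    the cluster itself plus every cluster adjacent to it."""
    members = {index}
    for k, other in enumerate(clusters):
        if k != index and are_adjacent(clusters[index], other, edges):
            members.add(k)
    return members


# Hypothetical example: a path 1-2-3-4-5-6 split into three clusters.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
clusters = [{1, 2}, {3, 4}, {5, 6}]
print(group_of(0, clusters, edges))  # {0, 1}
print(group_of(1, clusters, edges))  # {0, 1, 2}
```

Because every node of a cluster shares the same group, the sketch computes the group once per cluster rather than once per node.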
The modularity evaluation function, i.e. the Q function, is defined as the difference between the actual number of connections within clusters and the expected number of connections within clusters under random connection. The Q function is given by formula (4):
$$Q = \sum_{S=1}^{K} \left[ \frac{m_S}{m} - \left( \frac{d_S}{2m} \right)^{2} \right] \tag{4}$$
where $K$ is the total number of clusters in the network, $m$ is the total number of edges in the network, $m_S$ is the total number of edges within cluster S, and $d_S$ is the sum of the degrees of the nodes in cluster S.
Formula (4) shows that the expected number of intra-cluster connections uses the ratio of the degree sum of the nodes in the cluster to the degree sum of the whole network. It therefore assumes that any node may connect to every other node in the whole network. This global assumption biases the Q function, so that optimization algorithms based on it cannot obtain the optimal solution when the Q value reaches its global maximum.
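As an illustration of formula (4), the following Python sketch evaluates the traditional Q function on a small graph. The function name `modularity` and the example graph (two triangles joined by one bridge edge) are assumptions made for this illustration only.

```python
def modularity(clusters, edges):
    """Traditional Q of formula (4): sum over clusters of m_S/m - (d_S/2m)^2."""
    m = len(edges)
    q = 0.0
    for cluster in clusters:
        # m_S: edges with both endpoints inside the cluster
        m_s = sum(1 for u, v in edges if u in cluster and v in cluster)
        # d_S: sum of node degrees in the cluster (each endpoint counts once)
        d_s = sum((u in cluster) + (v in cluster) for u, v in edges)
        q += m_s / m - (d_s / (2 * m)) ** 2
    return q


# Hypothetical example: two triangles joined by the bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(modularity([{1, 2, 3}, {4, 5, 6}], edges))   # 5/14, about 0.357
print(modularity([{1, 2, 3, 4, 5, 6}], edges))     # 0.0
```

Here each triangle has $m_S = 3$ and $d_S = 7$ with $m = 7$, so each term is $3/7 - (7/14)^2 = 5/28$.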
Based on the group concept, the invention improves the Q function. The expected value of random connections for the nodes in a cluster is redefined using the range of local information held by the nodes, as shown in formula (5). The new Q function Q′ is:
$$Q' = \sum_{S=1}^{K} \left[ \frac{m_S}{m} - \frac{d_S}{2m_q} \cdot \frac{d_q}{2m} \right] \tag{5}$$
where $S$ denotes a cluster, $K$ is the total number of clusters in the whole network, $q$ is the group corresponding to cluster S, $m$ is the number of edges in the whole network, $m_S$ is the number of edges within cluster S, $m_q$ is the number of edges within group q, $d_S$ is the sum of the degrees of all nodes in cluster S, and $d_q$ is the sum of the degrees of all nodes in group q.
Here, the term
Figure BDA0000129370860000062
describes the expected probability that a node in the cluster forms a connection within the scope of the group, that is, under the local information environment. This shows that, under random connection, a node in the cluster can connect only to nodes inside the group; it cannot connect to nodes of the whole network that lie outside the group, i.e. outside the range of local information held by the nodes of the cluster. The improved optimization objective function thus embodies the application of the group concept in the complex network clustering method.
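A possible reading of formula (5) can be sketched in Python as below. Several points are assumptions of this sketch rather than statements of the patent: $m_q$ is taken to count every edge with both endpoints inside the group (including edges between its member clusters), $d_q$ is the sum of the full network degrees of the group's nodes, a cluster whose group has no edges contributes 0, and the names `group_nodes` and `improved_modularity` and the example graph are hypothetical.

```python
def group_nodes(index, clusters, edges):
    """Nodes of the group of cluster `index`: the cluster plus all adjacent clusters."""
    own = clusters[index]
    nodes = set(own)
    for other in clusters:
        if any((u in own and v in other) or (u in other and v in own)
               for u, v in edges):
            nodes |= other
    return nodes


def improved_modularity(clusters, edges):
    """Q' read as: sum over clusters of m_S/m - (d_S / 2 m_q) * (d_q / 2 m)."""
    m = len(edges)
    total = 0.0
    for s, cluster in enumerate(clusters):
        q_nodes = group_nodes(s, clusters, edges)
        m_s = sum(1 for u, v in edges if u in cluster and v in cluster)
        d_s = sum((u in cluster) + (v in cluster) for u, v in edges)
        m_q = sum(1 for u, v in edges if u in q_nodes and v in q_nodes)
        d_q = sum((u in q_nodes) + (v in q_nodes) for u, v in edges)
        # Assumption: a group without edges contributes no expected connections.
        expected = (d_s / (2 * m_q)) * (d_q / (2 * m)) if m_q else 0.0
        total += m_s / m - expected
    return total


# Hypothetical example: three triangles in a chain, joined by two bridge edges.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6),
         (7, 8), (8, 9), (7, 9), (3, 4), (6, 7)]
clusters = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
print(improved_modularity(clusters, edges))
```

In this example the group of the first triangle contains only the first two triangles, so its expectation term is computed from $m_q = 7$ and $d_q = 15$ instead of the whole-network values used by formula (4).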
The method of the invention is particularly suitable for interpersonal relationship networks in social systems and fixed telephone networks in communication systems, that is, for application scenarios that take the individuals in the network as the starting point and analyze the dependencies between them. By collecting statistics on how individuals aggregate in an interpersonal relationship network or a fixed telephone network, and aiming at higher partitioning precision, the method can reflect the topology of the whole network faithfully, so as to provide the best possible user experience. In these scenarios the network can be represented by an objective function whose parameters are the nodes and the edges between them. The practical problem of partitioning the network topology and improving the accuracy of the identified structure is thereby converted into the following process: taking the nodes and the edges between them as parameters, optimize the objective function so that it comes closer to an unbiased objective function, cluster every node in the network accurately, and obtain an approximately optimal solution.
The method defines the communities and the network in the application scenario using the data structures shown in Table 1.
Table 1. Data structures of the community and the network
Figure BDA0000129370860000063
Figure BDA0000129370860000071
Fig. 1 is the overall flow chart of the group concept-based improved Fast-Newman clustering method for complex networks, which specifically comprises the following steps.
Step 1: Count all N nodes in the network and number them sequentially. For each node i in the network, set the id of the community to which it belongs to i, i.e. set node_map[id]=i for node i. Here i denotes the number of a node, 1≤i≤N.
Step 2: Create a community structure for each node i, add node i to the community's member set member, and mark the community as currently existing. Specifically, set the survival flag alive of the community to true; true indicates that the community exists and false that it does not. Add all created communities to the set cmtys. The number nalive of communities existing in the current network is the number N of nodes in the network.
Step 3: For each community i, determine its total internal edge count in_edge[i] and its total internal degree degree[i]. Initially, because each community contains only one node, in_edge[i]=0 and degree[i]=0.
Step 4: For each pair of communities i, j, determine the number of edges between them, cross_edge[i][j], where 1≤i≤N, 1≤j≤N and i≠j.
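Steps 1 to 4 can be sketched in Python as follows. Table 1 is not reproduced in this text, so the layout below is an assumption: the state is a plain dictionary whose keys reuse the identifiers named in the steps (node_map, member, alive, in_edge, degree, cross_edge, nalive), and cross_edge is stored sparsely. The sketch also assumes that degree[i] holds the sum of the degrees of the community's nodes, matching the definition of $d_i$ used in the formulas, rather than the initial value 0 stated in step 3. The function name `init_communities` is hypothetical.

```python
def init_communities(nodes, edges):
    """Steps 1-4: one community per node, plus per-community edge statistics."""
    state = {
        "node_map": {v: v for v in nodes},      # node -> community id (step 1)
        "member": {v: {v} for v in nodes},      # community id -> member nodes (step 2)
        "alive": {v: True for v in nodes},      # survival flag (step 2)
        "in_edge": {v: 0 for v in nodes},       # internal edge count (step 3)
        "degree": {v: 0 for v in nodes},        # degree sum, see assumption above
        "cross_edge": {v: {} for v in nodes},   # sparse pairwise edge counts (step 4)
        "nalive": len(nodes),
    }
    for u, v in edges:                          # assumes an undirected graph, no self-loops
        state["degree"][u] += 1
        state["degree"][v] += 1
        state["cross_edge"][u][v] = state["cross_edge"][u].get(v, 0) + 1
        state["cross_edge"][v][u] = state["cross_edge"][v].get(u, 0) + 1
    return state


# Hypothetical example: the path 1-2-3.
state = init_communities([1, 2, 3], [(1, 2), (2, 3)])
print(state["nalive"], state["cross_edge"][2])  # 3 {1: 1, 3: 1}
```

Storing cross_edge as nested dictionaries keeps only non-zero entries, which suits the large sparse networks the method targets; a dense N×N array would follow the text more literally.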
Step 5: For each community i, determine the modularity evaluation function Q′ according to formula (5). The clusters in formula (5) correspond to communities; specifically, the value of the modularity evaluation function Q′[i] for community i is obtained from formula (6):
$$Q'[i] = \sum_{i=1}^{\mathit{nalive}} \left[ \frac{m_i}{m} - \frac{d_i}{2m_q} \cdot \frac{d_q}{2m} \right] \tag{6}$$
where nalive is the total number of communities existing in the current network, $q$ is the group corresponding to community i, $m$ is the number of edges in the whole network, $m_i$ is the number of edges within community i, in_edge[i], $m_q$ is the number of edges within group q, $d_i$ is the sum of the degrees of all nodes in community i, and $d_q$ is the sum of the degrees of all nodes in group q.
According to the definition in formula (1), the group q corresponding to a community i is the set consisting of community i and the communities adjacent to community i. According to the definition in formula (2), if at least one node in community i has at least one edge to some node in community p, then community i and community p are adjacent communities.
Specifically, whether two communities i, j are adjacent can be determined from the edge count cross_edge[i][j] obtained in step 4: if cross_edge[i][j] is not equal to 0, community i and community j are adjacent communities; if cross_edge[i][j] equals 0, they are not.
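The adjacency test on cross_edge and the group quantities $m_q$ and $d_q$ can be sketched in Python from the per-community counters alone. This is a sketch under assumptions: cross_edge is a nested dictionary of non-zero counts, community ids are comparable integers, $m_q$ is taken to be the internal edges of the member communities plus the edges running between member communities, and the names `group_of` and `group_totals` are hypothetical.

```python
def group_of(i, alive, cross_edge):
    """Community i plus every existing community j with cross_edge[i][j] != 0."""
    return {i} | {j for j, count in cross_edge[i].items() if count != 0 and alive[j]}


def group_totals(i, alive, in_edge, degree, cross_edge):
    """Return (group, m_q, d_q) for community i from per-community counters."""
    group = group_of(i, alive, cross_edge)
    # Edges inside each member community ...
    m_q = sum(in_edge[c] for c in group)
    # ... plus edges between two member communities, each pair counted once.
    m_q += sum(count
               for a in group
               for b, count in cross_edge[a].items()
               if b in group and a < b)
    d_q = sum(degree[c] for c in group)
    return group, m_q, d_q


# Hypothetical example: singleton communities on the path 1-2-3.
alive = {1: True, 2: True, 3: True}
in_edge = {1: 0, 2: 0, 3: 0}
degree = {1: 1, 2: 2, 3: 1}
cross_edge = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
print(group_totals(1, alive, in_edge, degree, cross_edge))  # ({1, 2}, 1, 3)
```

Working from the counters avoids rescanning the edge list each time a candidate merge is evaluated.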
Step 6: Set the variable maxQ′ that represents the current maximum Q′ value:
maxQ′=max(Q′[i], 1≤i≤nalive)
Step 7: If more than one community exists in the current network, i.e. nalive>1, enumerate all community pairs i, j in the current network, where 1≤i≤nalive, 1≤j≤nalive and i≠j, and then execute step 8; otherwise, execute step 12.
Step 8: Determine whether all community pairs in the current network have been taken. If not, take any pair i, j that has not yet been taken; if all pairs have been taken, go to step 12.
Step 9: Determine whether an edge connects community i and community j, using the value of cross_edge[i][j]. If cross_edge[i][j]≠0, set the flag found to true, indicating that two communities that may be merged have been found, and then execute step 10. If cross_edge[i][j]=0, no edge connects this pair of communities i, j; go to step 8.
Step 10: Hypothetically merge community i and community j, and set the id of the resulting new community to i′. Determine the total internal edge count in_edge[i′] and the total internal degree degree[i′] of the new community i′, and determine the modularity evaluation function value Q′[i′], again according to formula (5). Specifically:
$$Q'[i'] = \sum_{i'=1}^{\mathit{nalive}'} \left[ \frac{m_{i'}}{m} - \frac{d_{i'}}{2m_{q'}} \cdot \frac{d_{q'}}{2m} \right] \tag{7}$$
where nalive′ is the total number of communities in the current network under the hypothetical merge of community i and community j, i.e. nalive−1; $q'$ is the group corresponding to community i′; $m$ is the number of edges in the whole network; $m_{i'}$ is the number of edges within community i′, in_edge[i′]; $m_{q'}$ is the number of edges within group q′; $d_{i'}$ is the sum of the degrees of all nodes in community i′; and $d_{q'}$ is the sum of the degrees of all nodes in group q′. The group q′ corresponding to community i′ is the set consisting of community i′ and the communities adjacent to community i′. According to the definition in formula (2), if at least one node in community i′ has at least one edge to some node in community p, then community i′ and community p are adjacent communities.
Specifically, the total internal edge count in_edge[i′] of the new community i′ is the internal edge count of community i plus the internal edge count of community j plus the number of edges connecting community i and community j. The total internal degree degree[i′] of the new community is the degree of community j plus the degree of community i.
Step 11: Compare the obtained modularity evaluation function value Q′[i′] with the current maximum Q′ value stored in the variable maxQ′. If Q′[i′] is not greater, make no update and execute step 8. If it is greater, update maxQ′ to the value Q′[i′] of the new community, merge community j into community i, and then execute step 7. Merging community j into community i comprises the following operations: set the id i′ of the new community equal to i; set the survival flag alive of community j to false; change the community number node_map of every node in community j to i; add the nodes of community j to the member list member of community i; update the internal edge count in_edge[i] and the total internal degree degree[i] of the new community i; and update the edge counts between the new community i and the other communities existing in the current network. The modularity evaluation function value Q′[i] of the current community i is the Q′[i′] obtained in step 10.
The internal edge count in_edge[i] of community i is the in_edge[i′] obtained in step 10, and the total internal degree degree[i] of community i is the degree[i′] obtained in step 10. The edge counts cross_edge[][] between the new community i and the other communities existing in the current network are updated as follows: every inter-community edge incident to community j, except the edges linking j to community i, is added to community i, and the edge counts are recomputed.
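The merge bookkeeping of steps 10 and 11 could look like the sketch below. The `state` dictionary, the dict-of-dicts layout of `cross_edge` and the function name `merge_communities` are illustrative assumptions; only the field names alive, node_map, member, in_edge, degree, cross_edge and nalive come from the text.

```python
def merge_communities(state, i, j):
    """Merge community j into community i, updating the bookkeeping in place."""
    alive, node_map, member = state["alive"], state["node_map"], state["member"]
    in_edge, degree, cross_edge = state["in_edge"], state["degree"], state["cross_edge"]

    # Internal edges of the merged community: both interiors plus the i-j edges.
    in_edge[i] += in_edge[j] + cross_edge[i].get(j, 0)
    # Internal degree of the merged community: sum of both degrees.
    degree[i] += degree[j]

    # Relabel the nodes of j and move them into the member list of i.
    for node in member[j]:
        node_map[node] = i
    member[i].extend(member[j])
    member[j] = []
    alive[j] = False

    # Fold the edges between j and every other community p into i's counts.
    for p, count in cross_edge[j].items():
        if p == i:
            continue  # i-j edges became internal edges above
        cross_edge[i][p] = cross_edge[i].get(p, 0) + count
        cross_edge[p][i] = cross_edge[p].get(i, 0) + count
        del cross_edge[p][j]
    cross_edge[i].pop(j, None)
    cross_edge[j] = {}
    in_edge[j] = degree[j] = 0
    state["nalive"] -= 1
```

Storing `cross_edge` as nested dictionaries rather than an N×N array is a design choice of this sketch: it keeps the update proportional to the number of communities adjacent to j, which suits the sparse networks discussed in the text.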
Step 12: Save the maximum Q′ value held in the variable maxQ′ together with the final community partition structure, then end the method.
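The control flow of steps 7 to 12 can be sketched as a greedy loop with a pluggable evaluation function. This is a hedged illustration only: `greedy_cluster` and the `score` callable are hypothetical names, maxQ′ is assumed to start at the score of the initial singleton partition, and the loop is assumed to stop once a single community remains or no adjacent pair improves the score.

```python
from itertools import combinations


def greedy_cluster(nodes, edges, score):
    """Greedy merging loop; score(node_map) returns the evaluation value."""
    node_map = {v: v for v in nodes}   # every node starts as its own community
    max_q = score(node_map)            # assumed initial value of maxQ'
    merged = True
    # Step 7 (assumed threshold): continue while more than one community is left.
    while merged and len(set(node_map.values())) > 1:
        merged = False
        # Step 8: walk through the community pairs that have not been tried yet.
        for i, j in combinations(sorted(set(node_map.values())), 2):
            # Step 9: only pairs joined by at least one edge are candidates.
            if not any({node_map[u], node_map[v]} == {i, j} for u, v in edges):
                continue
            # Step 10: hypothetical merge of j into i, scored on a copy.
            trial = {v: (i if c == j else c) for v, c in node_map.items()}
            q = score(trial)
            # Step 11: accept an improving merge and restart the enumeration.
            if q > max_q:
                max_q, node_map, merged = q, trial, True
                break
    # Step 12: return the best value and the final partition.
    return node_map, max_q
```

Passing the evaluation function as a parameter keeps the loop independent of how Q′ is computed, so the traditional Q and the group-based Q′ can be compared with the same driver.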
The final community partition structure obtained is the clustering result of the method of the present invention. This result reflects the topology of the whole network more faithfully, with the aim of providing the best user experience. The advantages of the method are illustrated by the following tests. In Figs. 2 to 4, FN denotes the traditional FN clustering method using the traditional Q function of formula (4); FN-group denotes the clustering method of the present invention, which uses the group-based Q′ function.
To verify the effect of the improved, group-based modularity evaluation function on the optimization and on the clustering process, the method of the present invention is compared with the traditional FN clustering method using the traditional Q function of formula (4), mainly for the following reasons:
1) The traditional FN clustering method is the earliest and most direct clustering method that optimizes the modularity evaluation function as its objective function, and most later clustering methods of this kind are based on it. Improving the objective function of the traditional FN clustering method therefore has fundamental significance for this whole class of FN clustering methods.
2) The core idea of the FN clustering method is the optimization of the objective function. Its candidate-solution search strategy and acceptance strategy are relatively simple, so the clustering quality depends entirely on the definition of the objective function. Clustering precision therefore reflects the effect of the improved objective function accurately and directly.
The method of the present invention finally outputs a hierarchical tree of the network cluster structure. Tools provided by the Stanford Network Analysis Project (SNAP) were used during development, and the time complexity of the method is O(mn).
To compare the improvement objectively and comprehensively, clustering was run on three data sets that differ in scale and properties (Neural Network, Political Blogs and Email), and the clustering results were evaluated with two clustering quality functions from the Network Community Profile (NCP), Conductance and Expansion.
The Conductance and Expansion functions are defined as follows:
$$\text{Conductance: } f(S) = \frac{c_S}{2m_S + c_S} \qquad \text{Expansion: } f(S) = \frac{c_S}{n_S}$$
Here $c_S$ is the number of edges connecting nodes inside cluster S to nodes outside cluster S, $m_S$ is the number of edges inside cluster S, and $n_S$ is the number of nodes in cluster S. For both evaluation functions, a lower value indicates higher clustering precision and a better result.
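The two evaluation functions can be computed directly from an edge list, as in the short sketch below. The function names and the edge-list representation are assumptions made for illustration; the formulas follow the definitions above, and a cluster with no incident edges would cause a division by zero in `conductance`.

```python
def cut_statistics(edges, cluster):
    """Return (c_S, m_S, n_S) for a cluster given as an iterable of nodes."""
    cluster = set(cluster)
    # c_S: edges with exactly one endpoint inside the cluster.
    c_s = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    # m_S: edges with both endpoints inside the cluster.
    m_s = sum(1 for u, v in edges if u in cluster and v in cluster)
    return c_s, m_s, len(cluster)


def conductance(edges, cluster):
    """c_S / (2 * m_S + c_S); lower is better."""
    c_s, m_s, _ = cut_statistics(edges, cluster)
    return c_s / (2 * m_s + c_s)


def expansion(edges, cluster):
    """c_S / n_S; lower is better."""
    c_s, _, n_s = cut_statistics(edges, cluster)
    return c_s / n_s
```

For two triangles joined by a single edge, taking one triangle as the cluster gives c_S = 1, m_S = 3 and n_S = 3, so the conductance is 1/7 and the expansion is 1/3.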
The experimental results show that the method of the present invention achieves markedly higher precision than the traditional FN clustering method in the cluster analysis of large-scale complex networks. The effect is especially pronounced for networks that are large and sparsely or unevenly connected.
Fig. 2 shows the results of clustering the "Neural Network" data set with the FN algorithm before and after the improvement, with the clustering quality evaluated by the Conductance and Expansion functions. The "Neural Network" data set is a neuronal complex network from a living system; the nodes and edges of the network have real physical meaning. Its basic parameters are given in Table 2.
Table 2 Attributes of the Neural Network data set

| Attribute | Description | Value |
| --- | --- | --- |
| Number of nodes | Total number of nodes in the network | 297 |
| Average clustering coefficient | Average clustering coefficient | 0.2924 |
| Number of edges | Total number of edges in the network | 2359 |
| Diameter | Network diameter | 5 |
| Number of triangles | Number of triangles formed by connected nodes in the network | 3241 |
| Average shortest path length | Average shortest path length | 2.4553 |
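The attributes listed in the table can be computed from an edge list as sketched below. This is an illustration using only the Python standard library rather than the SNAP tools mentioned in the text; the function name `graph_attributes` is hypothetical, and the graph is assumed to be undirected, connected, non-empty and free of self-loops (nodes without edges are ignored).

```python
from collections import deque
from itertools import combinations


def graph_attributes(edges):
    """Basic statistics of an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2

    # Triangles through a node = pairs of its neighbours that are linked.
    tri_at = {v: sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
              for v in adj}
    triangles = sum(tri_at.values()) // 3  # each triangle is seen at 3 nodes
    clustering = sum(
        2 * tri_at[v] / (len(adj[v]) * (len(adj[v]) - 1)) if len(adj[v]) > 1 else 0.0
        for v in adj
    ) / n

    # Breadth-first search from every node yields all shortest path lengths.
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1
        diameter = max(diameter, max(dist.values()))

    return {"nodes": n, "edges": m, "triangles": triangles,
            "avg_clustering": clustering, "diameter": diameter,
            "avg_shortest_path": total / pairs}
```

The all-pairs breadth-first search costs O(n·m) time, which is acceptable for networks of the size reported here but would need sampling or a dedicated library for much larger graphs.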
In Fig. 2(a), the clustering result of the method of the present invention is evaluated with the Conductance function. The mean Conductance obtained is 0.521, whereas the mean Conductance obtained for the traditional FN clustering method is 0.7633. The Conductance of the method of the present invention is lower than that of the traditional FN clustering method in 87.1% of cases, and the gap widens steadily along the X axis. This shows that as clustering proceeds and the cluster structures grow, the improvement in clustering precision becomes increasingly pronounced.
Fig. 2(b) shows that the mean Expansion of the method of the present invention is 13.48, whereas that of the traditional FN clustering method is 15.4232. The Expansion of the method of the present invention is lower than that of the FN algorithm in 80.65% of cases. The Expansion values of the method fluctuate considerably and differ markedly across cluster sizes. This shows that the method reflects differences in the local information environment during clustering and can find more realistic cluster structures on the basis of local optimization.
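The two summary figures quoted for each plot, the mean value per method and the share of cases in which one method scores lower, can be derived from paired curve values as in this sketch. The function name `compare_curves` is hypothetical, and it is assumed that the "cases" are the points of the two curves taken at the same cluster sizes; the text does not spell this out.

```python
def compare_curves(ours, baseline):
    """Summarise two metric curves sampled at the same cluster sizes."""
    if not ours or len(ours) != len(baseline):
        raise ValueError("curves must be non-empty and of equal length")
    # Count the points at which our value is strictly lower (better).
    wins = sum(1 for a, b in zip(ours, baseline) if a < b)
    return {
        "mean_ours": sum(ours) / len(ours),
        "mean_baseline": sum(baseline) / len(baseline),
        "share_lower": wins / len(ours),
    }
```

Because lower Conductance and Expansion values are better, `share_lower` corresponds to the percentages reported for each figure under the stated assumption.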
Fig. 3 shows the clustering results of the method of the present invention and of the traditional FN clustering method on the "Political Blogs" data set. The "Political Blogs" data set is a complex network of political blogs from a social system; its nodes and edges have social meaning. Compared with the "Neural Network" data set, the "Political Blogs" data set is larger: the number of nodes has increased by 3.1 times and the number of edges has surged to about 7 times. As a result, the nodes are more tightly connected, the clustering coefficient is higher, and the number of short loops (triangles) in the network is larger. At the same time, however, the average shortest path between network nodes is longer, which shows that in this data set the gain in closeness between nodes is limited and cannot offset the effect of the growth in network scale. Its basic parameters are given in Table 3.
Table 3 Attributes of the Political Blogs data set

| Attribute | Description | Value |
| --- | --- | --- |
| Number of nodes | Total number of nodes in the network | 1222 |
| Average clustering coefficient | Average clustering coefficient | 0.3203 |
| Number of edges | Total number of edges in the network | 16717 |
| Diameter | Network diameter | 8 |
| Number of triangles | Number of triangles formed by connected nodes in the network | 101043 |
| Average shortest path length | Average shortest path length | 2.7375 |
Fig. 3(a) shows that the mean Conductance of the method of the present invention is 0.5499, whereas that of the traditional FN clustering method is 0.8969. The Conductance of the method of the present invention is lower than that of the traditional FN clustering method in 85.53% of cases. Compared with Fig. 2(a), the clustering precision of the method improves more markedly as the cluster structures grow, which shows that the larger the complex network, the more clearly the local group concept embodied in the method improves clustering precision.
Fig. 3(b) shows that the mean Expansion of the method of the present invention is 24.9263, whereas that of the traditional FN clustering method is 34.5256. The Expansion of the method of the present invention is lower than that of the traditional FN clustering method in 77.63% of cases. Compared with Fig. 2(b), the Expansion values of the method fluctuate more strongly, while the share of cases in which it outperforms the traditional FN clustering method decreases somewhat. This shows that as the complex network grows and the differences between local network structures increase, the method fully reflects these local differences and improves the clustering result in most cases.
Fig. 4 shows the clustering results of the method of the present invention and of the traditional FN clustering method on the "Email" data set. The "Email" data set describes a complex network of e-mail exchanges within a community. Compared with the previous two data sets, it has many nodes but sparse relations, so its clustering coefficient is lower and its average shortest path length is higher. In this situation the nodes are more local, and both the amount of global information they can grasp and the likelihood of grasping it are smaller. Its basic parameters are given in Table 4.
Table 4 Attributes of the Email data set

| Attribute | Description | Value |
| --- | --- | --- |
| Number of nodes | Total number of nodes in the network | 1133 |
| Average clustering coefficient | Average clustering coefficient | 0.2202 |
| Number of edges | Total number of edges in the network | 5452 |
| Diameter | Network diameter | 8 |
| Number of triangles | Number of triangles formed by connected nodes in the network | 5453 |
| Average shortest path length | Average shortest path length | 3.6060 |
Fig. 4(a) shows that the mean Conductance of the method of the present invention is 0.521, whereas that of the traditional FN clustering method is 0.7633. The Conductance of the method of the present invention is lower than that of the traditional FN clustering method in 92.31% of cases. Fig. 4(b) shows that the mean Expansion of the method of the present invention is 9.3233, whereas that of the traditional FN clustering method is 9.4885. The Expansion of the method of the present invention is lower than that of the traditional FN clustering method in 92.31% of cases. Compared with Figs. 2 and 3, the method improves clustering precision even more markedly. This shows that in complex networks that are relatively large, sparsely connected and low in clustering coefficient, the method simulates the locally limited information environment more realistically and thereby raises clustering precision to the greatest extent.
The experimental results on the three data sets show that the method of the present invention achieves markedly higher precision than the traditional FN clustering method in the cluster analysis of large-scale complex networks. The effect is especially pronounced for networks that are large and sparsely or unevenly connected.

Claims (3)

  1. An improved Fast-Newman clustering method for complex networks based on the group concept, characterized in that it comprises the following steps:
    Step 1: count all nodes in the network and number them sequentially; let the total number of nodes be N and let i be the node number, 1 ≤ i ≤ N; for each node i in the network, set the number of the community to which it belongs to i;
    Step 2: create a community structure for each node i, and give each community a survival flag alive indicating whether the community exists; add node i to the members of community i and set the alive flag of this community structure to true, where true means that the community exists and false means that it does not; set the total number nalive of communities existing in the current network to the total number of nodes N in the network;
    Step 3: for each community i, determine its number of internal edges in_edge[i] and its internal degree degree[i];
    Step 4: for each pair of communities i, j, determine the number of edges between them, cross_edge[i][j], where 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j;
    Step 5: determine the modularity evaluation function value Q′[i] of each community i:
    $$Q'[i] = \sum_{i=1}^{nalive} \left[ \frac{m_i}{m} - \frac{d_i}{2m_q} \cdot \frac{d_q}{2m} \right] \qquad (1)$$
    where m is the number of edges in the whole network, $m_i$ is the number of edges inside community i, i.e. in_edge[i], $d_i$ is the sum of the degrees of all nodes in community i, i.e. degree[i], q denotes the group corresponding to community i, $m_q$ is the number of edges inside group q, and $d_q$ is the sum of the degrees of all nodes in group q; the group q corresponding to community i is the set consisting of community i and the communities adjacent to community i; adjacent communities are defined as follows: if at least one node in community i is connected by at least one edge to any node in community p, then community i and community p are adjacent communities;
    Step 6: set a variable maxQ′ for storing the maximum Q′ value of the communities in the current network;
    Step 7: determine whether the number of communities existing in the current network is greater than […]; if so, enumerate all community pairs i, j in the current network and then execute step 8; otherwise, execute step 12; here 1 ≤ i ≤ nalive, 1 ≤ j ≤ nalive, and i ≠ j;
    Step 8: determine whether all community pairs in the current network have been taken; if not, take any community pair i, j that has not yet been taken; if all have been taken, go to step 12;
    Step 9: determine whether an edge connects community i and community j; if so, execute step 10; if not, go to step 8;
    Step 10: suppose that community i and community j are merged to obtain a new community i′, where i′ is the number of the new community; determine the total number of internal edges in_edge[i′] and the total internal degree degree[i′] of the new community i′, and then determine the modularity evaluation function value Q′[i′] of the new community i′:
    $$Q'[i'] = \sum_{i'=1}^{nalive'} \left[ \frac{m_{i'}}{m} - \frac{d_{i'}}{2m_{q'}} \cdot \frac{d_{q'}}{2m} \right] \qquad (2)$$
    where nalive′ is the total number of communities that would exist in the current network if community i and community j were merged, its value being nalive − 1, with nalive the number of communities currently existing in the network; q′ denotes the group corresponding to community i′, m is the number of edges in the whole network, $m_{i'}$ is the number of edges inside community i′, i.e. in_edge[i′], $m_{q'}$ is the number of edges inside group q′, $d_{i'}$ is the sum of the degrees of all nodes in community i′, and $d_{q'}$ is the sum of the degrees of all nodes in group q′;
    Step 11: compare the obtained modularity evaluation function value Q′[i′] with the current maximum Q′ value stored in the variable maxQ′; if Q′[i′] is not greater, make no update and go to step 8; if it is greater, update maxQ′ to the value Q′[i′] of the new community, merge community j into community i, and then go to step 7;
    Step 12: save the maximum Q′ value held in the variable maxQ′ together with the final community partition structure, then end the method.
  2. The improved Fast-Newman clustering method for complex networks based on the group concept according to claim 1, characterized in that the total number of internal edges in_edge[i′] of the new community i′ in step 10 is obtained by adding the number of internal edges of community i, the number of internal edges of community j, and the number of edges connecting community i and community j; and the total internal degree degree[i′] of the new community i′ is obtained by adding the degree of community j to the degree of community i.
  3. The improved Fast-Newman clustering method for complex networks based on the group concept according to claim 1, characterized in that merging community j into community i in step 11 comprises the following operations: adding the nodes of community j to the members of community i; changing the community number of the nodes of community j to i; setting the survival flag alive of community j to false; updating the internal edge count in_edge[i] and the total internal degree degree[i] of the new community i; and updating the edge counts between the new community i and the other communities existing in the current network.
CN201210004690.9A 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network Expired - Fee Related CN102571431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210004690.9A CN102571431B (en) 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201110396200 2011-12-02
CN201110396200.X 2011-12-02
CN201210004690.9A CN102571431B (en) 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network

Publications (2)

Publication Number Publication Date
CN102571431A true CN102571431A (en) 2012-07-11
CN102571431B CN102571431B (en) 2014-06-18

Family

ID=46415957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210004690.9A Expired - Fee Related CN102571431B (en) 2011-12-02 2012-01-09 Group concept-based improved Fast-Newman clustering method applied to complex network

Country Status (1)

Country Link
CN (1) CN102571431B (en)

Also Published As

Publication number Publication date
CN102571431B (en) 2014-06-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20161121

Address after: 450063 Henan Province, Zhengzhou North Sanhuan Henan province university science and Technology Park Building 7, 13 floor

Patentee after: Henan Zhongcheng information Polytron Technologies Inc

Address before: 100191 Haidian District, Xueyuan Road, No. 37,

Patentee before: Beihang University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140618

Termination date: 20200109
