CN103325061A - Community discovery method and system - Google Patents

Community discovery method and system

Info

Publication number
CN103325061A
Authority
CN
China
Prior art keywords
community
node
attribute
network
obtains
Prior art date
Legal status
Granted
Application number
CN2013102012988A
Other languages
Chinese (zh)
Other versions
CN103325061B (en)
Inventor
徐冰莹
贾焰
杨树强
周斌
韩伟红
李爱平
韩毅
李莎莎
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201310201298.8A
Publication of CN103325061A
Application granted
Publication of CN103325061B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a community discovery method. The method performs community division on a plurality of nodes in a network based on modularity maximization, and then adjusts the community boundary nodes obtained in that step based on community attribute entropy minimization. If the community division obtained after the adjustment satisfies a termination condition, it is used as the final community division; otherwise, the communities obtained after the adjustment are treated as nodes, community division is performed on these nodes again, and the community boundary nodes are readjusted. The method takes into account both the structure of the network and the attribute features of the nodes, which improves the accuracy of community discovery. In addition, its time complexity is close to linear, making it suitable for large-scale online social network data.

Description

Community discovery method and system
Technical field
The present invention relates to complex network analysis and data mining, and in particular to a community discovery method and system.
Background
With further study of the properties and mathematical characteristics of social networks, researchers have found that many networks share a common feature, community structure: a network is made up of several "groups" or "clusters", the nodes within each group are densely connected, and the connections between groups are relatively sparse. Discovering network communities helps people understand the structural characteristics of a network more effectively and thus provide more effective and more personalized services, for example information recommendation, user classification, and behavioral analysis of Internet groups. On online social networks, individuals act as self-media and tend to produce large amounts of text content, which reflects the topics the authors care about and their opinion tendencies. An individual's own tags and personal information, such as age, occupation, and interests, reflect that the individual has certain social characteristics, and homophily in social networks makes people with the same social characteristics more likely to cluster together. Therefore, using this information can improve on the quality of communities found from network structure alone.
At present, much research discovers communities by analyzing network structure. Among these, Blondel et al., building on the hierarchical community structure of real large-scale networks, proposed an iterative, two-phase fast modularity maximization algorithm (the BGL algorithm) for community discovery. The algorithm has two steps. In the first step, nodes are exchanged locally between communities so that the modularity of the community division is maximized. In the second step, each community produced by the first step becomes a node in a new network, and the weight of the edge between two new nodes is the sum of the weights of the edges between the two communities they represent. These two steps are repeated until modularity can no longer increase. The modularity used by the BGL algorithm is defined by the following formula, which applies to weighted networks:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \qquad (1)$$
where $A_{ij}$ is the weight of the edge between node $i$ and node $j$; $k_i = \sum_j A_{ij}$ is the sum of the weights of all edges attached to node $i$; $c_i$ is the community to which node $i$ belongs; the function $\delta(u, v)$ equals 1 when $u = v$ and 0 otherwise; and $m = \frac{1}{2}\sum_{i,j} A_{ij}$ is the sum of the weights of all edges in the network.
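To make formula (1) concrete, here is a minimal Python sketch (an illustration, not the patent's implementation) that computes $Q$ for a weighted undirected graph from a node-to-community map. It uses the algebraically equivalent per-community form $Q = \sum_c [\Sigma^{c}_{in}/2m - (\Sigma^{c}_{tot}/2m)^2]$, where $\Sigma^{c}_{in}$ sums $A_{ij}$ over ordered pairs inside community $c$ and $\Sigma^{c}_{tot}$ sums $k_i$ over its members. The function name `modularity` and the edge-dictionary input format are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity Q of formula (1) for an undirected graph.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
           A self-loop (u, u) of weight w is counted twice in k_u (the usual
           Louvain convention, which keeps Q unchanged when communities are
           collapsed into supernodes).
    community: dict mapping node -> community label (c_i in formula (1)).
    """
    degree = defaultdict(float)  # k_i = sum_j A_ij
    m = 0.0                      # total edge weight
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        m += w

    internal = defaultdict(float)  # sum of A_ij over i, j in the same community
    for (u, v), w in edges.items():
        if community[u] == community[v]:
            internal[community[u]] += 2 * w  # A_uv and A_vu

    total = defaultdict(float)  # sum of k_i over the members of each community
    for node, k in degree.items():
        total[community[node]] += k

    # Q = sum_c [ internal_c / 2m - (total_c / 2m)^2 ], equivalent to formula (1)
    return sum(internal[c] / (2 * m) - (total[c] / (2 * m)) ** 2 for c in total)
```

For two triangles joined by a single edge, putting each triangle in its own community gives $Q = 5/14 \approx 0.357$, while a single community containing all six nodes gives $Q = 0$.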
However, the BGL algorithm does not use the attribute information of network nodes. Research shows that in real online social networks, node attribute information can serve as one criterion for judging a community: given a tight structure, the more similar the node attributes within the same community, the better. In addition, although many existing graph clustering methods consider both the network structure and the attribute features of the nodes (also called node attributes or node attribute information), for example by constructing a new network that weights attributes and structure and then performing community division on that new network, their clustering results often contain communities that are structurally loose or even disconnected, which makes community discovery inaccurate. Moreover, these methods have high time complexity and are unsuitable for processing large-scale data.
Therefore, a method is needed that improves the accuracy of community discovery while also being applicable to large-scale online social network data.
Summary of the invention
According to one embodiment of the present invention, a community discovery method is provided, comprising:
Step 1) performing community division on a plurality of nodes in a network based on modularity maximization;
Step 2) adjusting the community boundary nodes obtained from step 1) based on community attribute entropy minimization;
Step 3) if the community division obtained from step 2) satisfies a termination condition, using this community division as the final community division; otherwise, treating the communities obtained from step 2) as nodes, re-executing step 1) to perform community division on these nodes, and re-executing step 2) to adjust the community boundary nodes.
In one embodiment, the termination condition is: the modularity of the community division obtained after the processing of step 1) and step 2) has not increased compared with the modularity before that processing, and the whole community attribute entropy of the community division obtained after the processing of step 1) and step 2) has not decreased compared with the whole community attribute entropy before that processing.
In another embodiment, the termination condition is: the number of repetitions of step 1) and step 2) has reached a predetermined threshold.
In one embodiment, step 1) comprises: for each node in the network, moving the node to the community of the neighbor node that yields the largest positive modularity increment, until no move of any node can bring a positive modularity increment.
In one embodiment, step 2) comprises:
Step 21) randomly selecting a community boundary node;
Step 22) calculating, for each neighbor community of the community boundary node, the community attribute entropy increment produced when the node is moved from its current community into that neighbor community;
Step 23) selecting the neighbor community with the smallest community attribute entropy increment and determining whether this neighbor community is the community where the community boundary node currently resides; if not, moving the community boundary node from its current community to this neighbor community;
Step 24) if the whole community attribute entropy changed between before and after the move of the community boundary node, returning to step 21).
In a further embodiment, step 21) comprises:
Step 211) randomly selecting a community from the communities obtained in step 1);
Step 212) randomly selecting a node from the selected community such that not all endpoints of the edges attached to this node are nodes of its own community.
In one embodiment, step 2) further comprises, before step 21):
Step 20) restoring the community division of step 1) to a community division of the original nodes, where the original nodes are the nodes of the network before the first community division based on modularity maximization.
According to one embodiment of the present invention, a community discovery system is provided, comprising a community division module and a community adjustment module. The community division module is configured to perform community division on a plurality of nodes in a network based on modularity maximization. The community adjustment module is configured to adjust the community boundary nodes obtained by the community division module based on community attribute entropy minimization. If the community division obtained after the adjustment satisfies a predetermined condition, that division is used as the final community division; otherwise, the communities obtained after the adjustment are treated as nodes, the community division module performs community division on these nodes again, and the community adjustment module readjusts the community boundary nodes.
The beneficial effects of the present invention are as follows:
By using the attribute features of the nodes to optimize the boundary nodes of the communities produced by the modularity maximization method, communities with more distinct characteristics are discovered, which improves the accuracy of community discovery. In addition, the time complexity of the present invention is close to linear, making it suitable for large-scale online social network data.
Brief description of the drawings
Fig. 1 is a flowchart of a community discovery method according to an embodiment of the present invention;
Fig. 2 is a tree diagram of an embodiment of a network comprising a plurality of nodes;
Fig. 3 is a tree diagram of the community division obtained by performing a first community division on the nodes shown in Fig. 2 based on modularity maximization and adjusting the community boundary nodes based on community attribute entropy minimization;
Fig. 4 is a tree diagram of the community division obtained by performing a second community division on the nodes shown in Fig. 2 based on modularity maximization and adjusting the community boundary nodes based on community attribute entropy minimization;
Fig. 5a is a schematic diagram of an embodiment of a network structure;
Fig. 5b is a schematic diagram of the attribute matrix of the nodes in the network structure shown in Fig. 5a;
Fig. 5c is a schematic diagram of the result of community division that considers only the network structure;
Fig. 5d is a schematic diagram of the result of community division according to the community discovery method provided by the present invention; and
Fig. 6 is a schematic diagram comparing the community division results of the community discovery method provided by the present invention with those of existing methods.
Detailed description
The present invention is described in detail below in conjunction with the drawings and specific embodiments.
For convenience of the following description, several concepts are first introduced:
1. Network graph
Consider a network graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes and $E = \{e_1, e_2, \ldots, e_m\}$ is the set of edges, with $|V| = n$ and $|E| = m$. The edge connecting nodes $v_i$ and $v_j$ is usually written $\langle v_i, v_j \rangle$, where $1 \le i \le n$, $1 \le j \le n$, and $i \neq j$; an edge can thus also be written $e_k = \langle v_i, v_j \rangle$, $1 \le k \le m$. In a social network, an edge may indicate that the nodes (users) it connects are friends, followers, and so on, depending on the goal of the analysis. The network graphs in this document are undirected: the node pair representing an edge in $G$ is unordered, i.e., $\langle v_i, v_j \rangle$ and $\langle v_j, v_i \rangle$ denote the same edge.
2. Network graph with node attribute information
As noted above, taking online social networks as an example, attributes (i.e., the attribute features or attribute information mentioned above) may be user tags, user age, user nationality, user hobbies, and so on. Dividing different networks may require different attributes, and attributes may be discrete or continuous. Discrete attributes include, for example, user hobbies such as sports, literature, and politics; continuous attributes include, for example, user age. A continuous attribute can nevertheless be discretized, e.g., a user age of 10 to 15 can be mapped to the discrete attribute "youth". Attributes can be represented as feature vectors, for example a triple consisting of address (at province level), age (by age bracket), and interest (content or tags assigned to specific categories by manual or computer-aided analysis): (Hunan, youth, literature), (Zhejiang, middle-aged, literature).
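A small sketch of the discretization and feature-vector idea above. The age brackets are hypothetical (only the mapping of ages 10 to 15 to "youth" comes from the text), and `discretize_age` and `attribute_vector` are illustrative helper names.

```python
# Hypothetical half-open age brackets [low, high); only 10-15 -> "youth" is
# taken from the description, the other boundaries are illustrative.
AGE_BRACKETS = [
    (0, 10, "child"),
    (10, 16, "youth"),
    (16, 40, "adult"),
    (40, 60, "middle-aged"),
    (60, 150, "senior"),
]


def discretize_age(age):
    """Map a continuous age to a discrete bracket label."""
    for low, high, label in AGE_BRACKETS:
        if low <= age < high:
            return label
    raise ValueError(f"age out of range: {age}")


def attribute_vector(province, age, interest):
    """Build the (address, age bracket, interest) triple used as a feature vector."""
    return (province, discretize_age(age), interest)


print(attribute_vector("Hunan", 13, "literature"))     # ('Hunan', 'youth', 'literature')
print(attribute_vector("Zhejiang", 48, "literature"))  # ('Zhejiang', 'middle-aged', 'literature')
```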
A network graph whose nodes carry attributes can be defined as a graph with node attribute information (also called a multi-attribute graph) $G = \{V, E, A\}$, where $A = \{a_1, a_2, \ldots, a_l\}$ is the set of attributes of the nodes and $|A| = l$ is the total number of attributes. Each node $v_i \in V$ corresponds to an attribute vector $[a_{i1}, \ldots, a_{il}]$, where $a_{ij}$ is the value of node $v_i$ on attribute $a_j$.
The goal of community discovery on a multi-attribute graph is to divide its nodes into $k$ communities (i.e., groups or clusters), denoted $G_i = (V_i, E_i, A_i)$, where $\bigcup_{i=1}^{k} V_i = V$ and $V_i \cap V_j = \emptyset$ for $i \neq j$, such that the nodes in the same community are both densely interconnected and highly similar.
3. Neighbor node set of a node
The neighbor node set of node $v_i$, $N(v_i) = \{v_j \mid \langle v_i, v_j \rangle \in E, v_j \in V\}$, is the set of endpoints of the edges directly attached to $v_i$ in the network.
4. Neighbor community set of a node
The neighbor community set of node $v_i$, $NV(v_i) = \{V_j \mid V_j \text{ is the community of } v_j,\ v_j \in N(v_i)\}$, is the set of communities directly connected to $v_i$.
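The two neighbor sets can be sketched in Python as follows, assuming an edge list of node pairs and a `community` dictionary from node to community label; `build_adjacency`, `neighbor_nodes`, and `neighbor_communities` are illustrative names, not taken from the patent.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets: <v_i, v_j> and <v_j, v_i> are the same edge."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # a self-loop does not make a node its own neighbour
            adj[u].add(v)
            adj[v].add(u)
    return adj


def neighbor_nodes(adj, v):
    """N(v): endpoints of the edges directly attached to v."""
    return set(adj.get(v, ()))


def neighbor_communities(adj, community, v):
    """NV(v): labels of the communities containing at least one neighbour of v."""
    return {community[u] for u in adj.get(v, ())}
```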
5. Community boundary nodes
The boundary node set of community $V_m$ is $O(V_m) = \{v_i \mid \exists\, v_j \in N(v_i),\ v_j \notin V_m,\ v_i \in V_m\}$; that is, a node $v_i$ of the community is a boundary node of that community if not all endpoints of the edges directly attached to $v_i$ lie in $V_m$. In social networks, community boundary nodes often act as bridges and channels for information propagation, and they may also be nodes that belong to several communities.
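A sketch of the boundary node set $O(V_m)$, together with the random selection of steps 211) and 212). The function names are hypothetical, and skipping communities that have no boundary node is an assumption of this example rather than something the patent specifies.

```python
import random


def boundary_nodes(edges, members):
    """O(V_m): nodes of V_m with at least one edge leaving V_m."""
    members = set(members)
    boundary = set()
    for u, v in edges:
        if u in members and v not in members:
            boundary.add(u)
        elif v in members and u not in members:
            boundary.add(v)
    return boundary


def pick_boundary_node(edges, communities, rng=random):
    """Steps 211)-212): pick a random community, then a random boundary node in it.

    Communities without boundary nodes are skipped (an assumption of this sketch).
    """
    candidates = [c for c in communities if boundary_nodes(edges, c)]
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    return rng.choice(sorted(boundary_nodes(edges, chosen)))
```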
6. Community attribute entropy
Entropy is a key concept in Shannon's information theory. In data mining it can be used to measure the similarity within a data set: the higher the similarity between the nodes of a set, the lower its overall entropy. Given a community $V_m$ containing $|V_m| = M$ nodes, the attribute entropy $H(V_m)$ of the community can be defined by the following formula:
$$H(V_m) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\left(\frac{s_{i,j}}{2}\ln\frac{s_{i,j}}{2} + \left(1-\frac{s_{i,j}}{2}\right)\ln\left(1-\frac{s_{i,j}}{2}\right)\right) \qquad (2)$$
In formula (2), $s_{i,j}$ is the similarity between nodes $i$ and $j$ on their attributes, which reflects how close the two nodes are in terms of attributes.
On this basis, the attribute entropy of the whole community division (also called the whole community attribute entropy) can be expressed as:
$$H = \sum_{i=1}^{K} H(V_i) \qquad (3)$$
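The following Python sketch evaluates formulas (2) and (3). It assumes the pairwise term of formula (2) is $p\ln p + (1-p)\ln(1-p)$ with $p = s_{i,j}/2$ and $s_{i,j} \in [0, 1]$; under that reading every term is non-positive and decreases as the similarity grows, consistent with the statement that more similar communities have lower entropy. The function names are illustrative.

```python
import math


def plogp(p):
    """p ln p, using the convention lim_{p->0} p ln p = 0."""
    return p * math.log(p) if p > 0 else 0.0


def pair_term(s):
    """One term of formula (2), assuming p = s/2 with similarity s in [0, 1]."""
    p = s / 2.0
    return plogp(p) + plogp(1.0 - p)


def community_entropy(members, sim):
    """H(V_m), formula (2): sum over unordered pairs i < j inside the community."""
    nodes = list(members)
    return sum(pair_term(sim[nodes[a]][nodes[b]])
               for a in range(len(nodes))
               for b in range(a + 1, len(nodes)))


def partition_entropy(communities, sim):
    """Whole community attribute entropy H, formula (3)."""
    return sum(community_entropy(c, sim) for c in communities)


sim = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(partition_entropy([{0, 1}, {2}], sim))  # about -0.693: similar nodes grouped
print(partition_entropy([{0, 2}, {1}], sim))  # 0.0: dissimilar nodes grouped
```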
According to one embodiment of the present invention, various similarity measures can be used to compute the attribute similarity between nodes, for example the cosine measure or the generalized Jaccard coefficient. Attributes may be continuous, discrete, or text content. In an embodiment where the attributes are categorical, the attribute similarity can be computed with a feature-vector-based method or with a Cartesian similarity method. Continuous attributes can first be converted to discrete ones (as in the age conversion described above) and then processed.
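Two of the similarity measures mentioned above, sketched for non-negative numeric attribute vectors (for example the 0/1 indicators of Fig. 5b). The generalized Jaccard coefficient is implemented here in its common extended (Tanimoto) form, $x \cdot y / (\|x\|^2 + \|y\|^2 - x \cdot y)$; that choice and the helper names are assumptions, since the patent gives no formulas for these measures.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two attribute vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def extended_jaccard(x, y):
    """Generalized (extended, Tanimoto) Jaccard: x.y / (|x|^2 + |y|^2 - x.y).

    For 0/1 vectors this reduces to |X & Y| / |X | Y|.
    """
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom else 0.0


def similarity_matrix(vectors, measure):
    """Pairwise similarity matrix s_{i,j} used by the attribute entropy."""
    return [[measure(u, v) for v in vectors] for u in vectors]
```

For non-negative vectors both measures lie in $[0, 1]$, which is the range the attribute entropy expects.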
According to one embodiment of present invention, provide a kind of community discovery method.As shown in Figure 1, the method comprises following three steps:
Step 1, use modularity maximization approach are carried out community and are divided.
Weighting network has among the embodiment of N node therein, and when initial, each node in this weighting network represents a different community, that is to say, has what nodes just to represent that what communities are arranged in this network in the network.Then, suppose that node i from the community that the community at own place shifted out and transferred to its neighbor node j place, calculates respectively the increment that it moves to the modularity of different communities.Finding out in the neighbor node set of node i can be so that the node j' of the positive increment maximum of modularity, and really node i is transferred to the community at node j' place.If can not find the neighbor node j' that satisfies condition in this process, it is constant that then node i remains on original community.This process lasts till in this network, and the movement of any node all can not bring till the positive increment of modularity.After this process finishes, obtain the local modularity maximization of network.
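A minimal sketch of this local-moving pass, in the style of the Louvain method on which the BGL algorithm is based. Removing node $i$ costs the same whatever its destination, so candidate communities can be compared by the insertion gain $k_{i,in} - \Sigma_{tot}\,k_i/2m$ (the modularity increment multiplied by $m$), where $k_{i,in}$ is the edge weight from $i$ into the community and $\Sigma_{tot}$ is the community's total degree. The function name and input format are illustrative assumptions.

```python
from collections import defaultdict


def local_moving(edges, nodes):
    """Local-moving pass of step 1 (Louvain-style modularity maximization).

    edges: dict (u, v) -> weight of an undirected weighted graph.
    nodes: all nodes; each starts in its own community.
    Returns a dict node -> community label.
    """
    nodes = list(nodes)
    adj = defaultdict(dict)
    degree = defaultdict(float)
    m = 0.0
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        m += w
        if u != v:  # self-loops move with the node, so they never count as a link
            adj[u][v] = adj[u].get(v, 0.0) + w
            adj[v][u] = adj[v].get(u, 0.0) + w

    community = {v: v for v in nodes}
    total = {v: degree[v] for v in nodes}  # Sigma_tot: summed degree per community

    improved = True
    while improved:  # stop once no move yields a positive modularity increment
        improved = False
        for i in nodes:
            current = community[i]
            links = defaultdict(float)  # edge weight from i into each community
            for j, w in adj[i].items():
                links[community[j]] += w
            total[current] -= degree[i]  # take i out of its community
            # gain of reinserting i into its own community is the baseline
            best = current
            best_gain = links[current] - total[current] * degree[i] / (2 * m)
            for c, k_in in links.items():
                gain = k_in - total[c] * degree[i] / (2 * m)
                if gain > best_gain:  # strictly better means a positive increment
                    best, best_gain = c, gain
            total[best] += degree[i]
            if best != current:
                community[i] = best
                improved = True
    return community
```

On two triangles joined by one edge, this pass ends with each triangle as one community.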
Step 2: perform community attribute entropy minimization.
As noted above, the higher the similarity between the nodes of a community, the lower its attribute entropy. Starting from the community division obtained in step 1, the whole community attribute entropy is minimized by adjusting the community boundary nodes.
In one embodiment, if step 1 did not operate on the original nodes, the community division obtained in step 1 is first restored to a community division of the original nodes before this step. The reason is that, at each subsequent iteration (i.e., each subsequent community division with the modularity maximization method), the communities obtained from the current division (which comprises modularity maximization and community attribute entropy minimization) are regarded as nodes, and community division is performed again on these nodes (described in more detail below). Consequently, in the community division obtained in step 1, each node of a community may itself be a community obtained from a previous division, whereas the community attribute entropy must be computed from the attributes of the original nodes in the community.
In a further embodiment, this restoration can be achieved by creating and maintaining the community divisions produced by two adjacent rounds of the modularity maximization method and the correspondence between them; the structure can be represented as a tree, as shown in Figs. 2-4. Figs. 2-4 respectively show the community structure of a network with 10 original nodes in its initial state, after the first iteration (each iteration comprises modularity maximization and community attribute entropy minimization), and after the second iteration. Although the network structure itself is not shown, it should be understood that the 10 nodes may be connected by edges and that each node has attributes. In Fig. 2, 0_0 to 0_9 are the numbers of the 10 original nodes, where the digit before "_" indicates which iteration produced the tree (or which network it belongs to, the network in its initial state being the 0th network). The "-" node above the 10 nodes is the root of the whole tree, and each of its subtrees represents a community (initially, each node is a community). After the modularity maximization method is applied to this network for the first time and the resulting division is adjusted based on community attribute entropy minimization, the network is divided into 4 communities (also called supernodes): original nodes 0_0, 0_1, 0_2, and 0_3 form one community, original nodes 0_4, 0_5, and 0_6 form one community, original node 0_7 forms a community by itself, and original nodes 0_8 and 0_9 form another community. Fig. 3 shows the correspondence between the communities obtained after the first iteration and the original nodes; the 4 communities in Fig. 3 are numbered 1_0, 1_1, 1_2, and 1_3. Community 1_0 corresponds to original nodes 0_0, 0_1, 0_2, and 0_3; community 1_1 corresponds to original nodes 0_4, 0_5, and 0_6; community 1_2 corresponds to original node 0_7; and community 1_3 corresponds to original nodes 0_8 and 0_9. If the predetermined termination condition is not satisfied, the next step further divides the new network formed by these 4 supernodes (i.e., performs another iteration, comprising the modularity maximization process and the community attribute entropy minimization process). Fig. 4 shows the correspondence between communities and nodes after the second division: community 2_0 contains supernodes 1_0 and 1_1, and community 2_1 contains supernodes 1_2 and 1_3. The current community division on the original nodes is therefore {0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6} and {0_7, 0_8, 0_9}.
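Restoring a division to the original nodes (resetCommunity) amounts to following each original node up the tree of Figs. 2-4. A sketch, assuming each level is stored as a dictionary from a node of one network to its community in the next network; `flatten` and `group` are hypothetical helper names.

```python
def flatten(levels):
    """Resolve each original node to its community at the top level.

    levels[k] maps each node of network k to its community, which is a node
    of network k + 1 (e.g. "0_3" -> "1_0", then "1_0" -> "2_0").
    """
    assignment = {}
    for node in levels[0]:
        label = node
        for mapping in levels:
            label = mapping[label]
        assignment[node] = label
    return assignment


def group(assignment):
    """Invert node -> label into label -> set of nodes."""
    groups = {}
    for node, label in assignment.items():
        groups.setdefault(label, set()).add(node)
    return groups


# Correspondences of Fig. 3 (original nodes -> level-1 communities)
level0 = {f"0_{k}": "1_0" for k in range(4)}
level0.update({f"0_{k}": "1_1" for k in range(4, 7)})
level0.update({"0_7": "1_2", "0_8": "1_3", "0_9": "1_3"})
# and of Fig. 4 (level-1 supernodes -> level-2 communities)
level1 = {"1_0": "2_0", "1_1": "2_0", "1_2": "2_1", "1_3": "2_1"}
print(group(flatten([level0, level1])))
```

Run on the correspondences of Figs. 3 and 4, this yields the two communities {0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6} and {0_7, 0_8, 0_9}.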
The use of a tree to represent the correspondence between original nodes and community divisions is described above by way of example; it should be understood that other techniques well known in the art can also be used to create and maintain the community divisions produced by two adjacent rounds of the modularity maximization method and their correspondence.
In one embodiment, given a multi-attribute graph and an initial community division of it, the community attribute entropy minimization method comprises:
A) Randomly select a community $V_m$ from the current community division and obtain its boundary node set according to the definition of community boundary nodes given above.
B) Randomly select a community boundary node $i$ from this set; assuming it is moved into each of its neighbor communities, calculate its contribution increment to the attribute entropy of that neighbor community (i.e., the community attribute entropy increment).
The contribution increment of node $i$ to the attribute entropy of community $V_m$ (the neighbor community it moves into) is calculated according to the following formula:
$$\Delta H = H(V_m) - H(V_m \setminus \{i\}) = \sum_{j \in V_m \setminus \{i\}}\left(\frac{s_{i,j}}{2}\ln\frac{s_{i,j}}{2} + \left(1-\frac{s_{i,j}}{2}\right)\ln\left(1-\frac{s_{i,j}}{2}\right)\right) \qquad (4)$$
where node $i$ is the selected community boundary node; $H(V_m)$ is the attribute entropy of the community (neighbor community) $V_m$ containing node $i$; $s_{i,j}$ is the similarity of nodes $i$ and $j$ on their attributes; and $\lim_{s_{i,j} \to 0} s_{i,j} \ln s_{i,j} = 0$.
C) Select the neighbor community with the smallest community attribute entropy increment (i.e., the neighbor community corresponding to the minimum $\Delta H$ computed with formula (4)), and determine whether it is the community where node $i$ currently resides; if not, move the node from its current community to the new community.
D) Determine whether the end condition is satisfied, i.e., whether the whole community attribute entropy no longer changes; if it still changes, repeat the above steps.
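Steps B) and C) can be sketched as follows, assuming the pairwise term of formulas (2) and (4) uses $p = s_{i,j}/2$. Because pairs that do not involve node $i$ are unaffected by its move, moving $i$ from community $A$ to community $B$ changes the whole entropy by $\Delta H_B - \Delta H_A$, so choosing the smallest $\Delta H$, with the current community included as a candidate, picks the move that lowers $H$ the most. The function names are illustrative.

```python
import math


def pair_term(s):
    """Pairwise term of formulas (2) and (4), assuming p = s/2 with s in [0, 1]."""
    p = s / 2.0
    # a zero probability contributes nothing because lim_{p->0} p ln p = 0
    return sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)


def entropy_increment(i, members, sim):
    """Delta H of formula (4): node i's contribution to the entropy of `members`."""
    return sum(pair_term(sim[i][j]) for j in members if j != i)


def choose_target(i, current, communities, sim):
    """Step C): label with the smallest Delta H; on a tie, i stays where it is.

    communities: dict label -> set of nodes, i.e. i's current community plus
    its neighbour communities.
    """
    best, best_delta = current, entropy_increment(i, communities[current], sim)
    for label, members in communities.items():
        delta = entropy_increment(i, members, sim)
        if delta < best_delta:
            best, best_delta = label, delta
    return best
```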
In one embodiment, the community attribute entropy minimization process can be described by the following algorithm:
Algorithm: entropyMin
Input: community division V, similarity matrix
Output: community division V'
1. repeat
2.   V_m <- random community from V              // randomly select a community
3.   V_out <- outer of V_m                       // obtain the boundary node set O(V_m) of community V_m
4.   v <- random node of V_out                   // randomly select a community boundary node v
5.   V_outer <- v's neighbor communities         // obtain the neighbor community set of v (including its current community)
6.   for V' ∈ V_outer ∪ {V_m}, calculate ΔH_v(V') // community attribute entropy increment of moving v into each neighbor community
7.   V_new <- the V' with minimum ΔH_v(V')       // the community with the smallest increment becomes the target
8.   if V_new ≠ V_m then                         // if the target differs from the current community, move v; otherwise v stays
9.     move v to community V_new
until the community attribute entropy no longer changes
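A runnable Python rendering of entropyMin, offered as a sketch under assumptions: it reads the pairwise term of formula (2) with $p = s_{i,j}/2$, and instead of drawing one random boundary node per iteration it visits all boundary nodes in random order and stops after a full sweep makes no move, which is one way to implement "until the community attribute entropy no longer changes". The name `entropy_min` and the round cap `max_rounds` (an added safeguard) are hypothetical.

```python
import math
import random


def pair_term(s):
    p = s / 2.0  # assumed reading of formula (2), similarity s in [0, 1]
    return sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)


def entropy_min(adj, community, sim, rng=random, max_rounds=100):
    """Move boundary nodes while the whole community attribute entropy drops.

    adj: node -> set of neighbours (original nodes); community: node -> label;
    sim: sim[u][v] is the attribute similarity. Returns a new node -> label dict.
    """
    community = dict(community)
    members = {}
    for v, c in community.items():
        members.setdefault(c, set()).add(v)

    def delta(v, label):  # Delta H of formula (4)
        return sum(pair_term(sim[v][j]) for j in members[label] if j != v)

    for _ in range(max_rounds):
        moved = False
        boundary = [v for v in community
                    if any(community[u] != community[v] for u in adj[v])]
        rng.shuffle(boundary)  # random visiting order
        for v in boundary:
            current = community[v]
            neighbours = {community[u] for u in adj[v]} - {current}
            target, best = current, delta(v, current)
            for label in sorted(neighbours, key=repr):
                d = delta(v, label)
                if d < best:  # strictly lowers the whole entropy
                    target, best = label, d
            if target != current:  # line 8: move only if the target differs
                members[current].discard(v)
                members[target].add(v)
                community[v] = target
                moved = True
        if not moved:  # a full sweep left the entropy unchanged
            break
    return community
```

Each accepted move strictly lowers the whole entropy, so the loop cannot cycle; the round cap only bounds the running time.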
After the community attribute entropy minimization process, each community obtained by the minimization process is replaced by a node (supernode) in a new network, and the weight of the edge between two nodes is the sum of the weights of the edges between the two corresponding communities; the weight of a node's self-loop is the sum of the weights of the edges among the nodes within the community it represents.
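Building the supernode network can be sketched as a single pass over the edges; the function name `aggregate` and the dictionary edge format are assumptions. Intra-community edges accumulate on the supernode's self-loop and inter-community edges on the edge between the two supernodes, as described above.

```python
from collections import defaultdict


def aggregate(edges, community):
    """Collapse each community into a supernode.

    edges: dict (u, v) -> weight, each undirected edge listed once.
    community: dict node -> community label (labels must be mutually comparable).
    Returns a dict (c1, c2) -> weight with c1 <= c2; (c, c) is the self-loop of c.
    """
    super_edges = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        key = (cu, cv) if cu <= cv else (cv, cu)
        # inter-community edges add to the edge between the two supernodes;
        # intra-community edges (and existing self-loops) add to the self-loop
        super_edges[key] += w
    return dict(super_edges)
```

If a self-loop is counted twice in a node's degree (the usual Louvain convention), each supernode's weighted degree equals the summed degree of its members, so the modularity of the division is preserved on the new network.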
Step 3: determine whether the termination condition is satisfied; if not, repeat the above process.
In one embodiment, since the ultimate goal is for the final communities to strike a balance between modularity maximization and minimization of the community attribute entropy, which characterizes how homogeneous a community is, the method ends when modularity can no longer increase and the whole community attribute entropy can no longer decrease. In another embodiment, the granularity and scale of the community division can be controlled through the number of iterations, so that community structure at different levels is reflected and a hierarchical community structure is discovered.
In one embodiment, the process of community division based on modularity maximization and community attribute entropy minimization can be described by the following algorithm:
Algorithm: ACD
Input: imax
Output: V
1: i <- 0                        // initialization; records the number of iterations
2: Q_old <- modularityCalc()     // initial modularity value
3: C_old <- entropyCalc()        // initial community attribute entropy
4: Q_new <- modularityMax()      // modularity maximization; record the modularity value
5: V_0 <- entropyMin()           // entropy minimization
6: C_new <- entropyCalc()        // compute the new community attribute entropy
7: while (Q_new - Q_old > 0 or C_old - C_new > 0) and i < imax do   // termination: stop when modularity no longer increases and entropy no longer decreases, or when the maximum number of iterations is reached
8:   newNetwork(V_i)             // build the supernode network
9:   Q_old <- Q_new              // save the modularity value
10:  C_old <- C_new              // save the community attribute entropy
11:  i <- i + 1                  // count iterations
12:  Q_new <- modularityMax()    // modularity maximization; save the modularity value
13:  resetCommunity()            // obtain the community division on the original nodes
14:  V_i <- entropyMin()         // community attribute entropy minimization on the division of the original nodes
15:  C_new <- entropyCalc()      // compute the new entropy
16: end while
17: return V_i                   // return the final community division
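The control flow of ACD can be sketched in Python with the per-round work left to a caller-supplied function. The names `acd` and `iterate` are hypothetical, and the loop condition follows the termination conditions of claims 2 and 3: stop once modularity no longer increases and entropy no longer decreases, or after `imax` rounds.

```python
def acd(initial_q, initial_h, iterate, imax):
    """Outer loop of ACD (a sketch of the termination logic only).

    iterate(previous) is a caller-supplied function that performs one round
    (supernode network, modularityMax, resetCommunity, entropyMin) and returns
    (partition, modularity Q, whole attribute entropy H); previous is None on
    the first call, when no supernode network exists yet.
    """
    q_old, h_old = initial_q, initial_h
    partition, q_new, h_new = iterate(None)
    i = 0
    # continue while modularity still rises or entropy still falls (claim 2)
    # and the iteration cap (claim 3) has not been reached
    while (q_new > q_old or h_new < h_old) and i < imax:
        q_old, h_old = q_new, h_new
        i += 1
        partition, q_new, h_new = iterate(partition)
    return partition, i


# Stubbed rounds: the third round brings no improvement, so the loop stops there.
rounds = iter([("p0", 0.30, -1.0), ("p1", 0.40, -1.5), ("p2", 0.40, -1.5)])
print(acd(0.0, 0.0, lambda prev: next(rounds), imax=10))  # ('p2', 2)
```

As in the pseudocode, the partition from the last executed round is returned.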
According to one embodiment of the present invention, a community discovery system is also provided, comprising a community division module and a community adjustment module. The community division module is configured to perform community division on a plurality of nodes in a network based on modularity maximization. The community adjustment module is configured to adjust the community boundary nodes obtained by the community division module based on community attribute entropy minimization. If the community division obtained after the adjustment satisfies a predetermined condition, that division is used as the final community division; otherwise, the communities obtained after the adjustment are treated as nodes, the community division module performs community division on these nodes again, and the community adjustment module readjusts the community boundary nodes.
Figs. 5a-5d show a network example and compare the accuracy of the communities found by the community discovery method provided by the present invention with those found by an existing method based only on network structure. Fig. 5a shows the network structure of this example, and Fig. 5b shows the attribute matrix of the nodes. For simplicity, A1, A2, and A3 denote the node attribute information required for community division, and 0 or 1 indicates whether a node has the attribute. As Fig. 5b shows, nodes 1, 2, and 3 have identical attributes, nodes 4, 5, and 6 have identical attributes, and nodes 7, 8, 9, 10, 11, and 12 have identical attributes. Fig. 5c shows the community division obtained with an existing method that considers only network structure, and Fig. 5d shows the final community division obtained with the community discovery method provided by the present invention, which considers both structure and node attributes. In Fig. 5c, the nodes within each community do not all have identical attributes. In the division of Fig. 5d, by contrast, the node attributes within each community are identical: nodes 1, 2, and 3 form one community, nodes 4, 5, and 6 form another, and the remaining nodes form a third. The result obtained with the community discovery method provided by the present invention is therefore more accurate.
To further verify the accuracy of the community discovery method provided by the present invention, the inventors collected data on U.S. political blogs, comprising 1490 blogs and their 19090 hyperlinks; each blog has its own attribute ("Democratic" or "Republican"). Fig. 6 shows the results of community discovery on these political blogs with different methods. As shown in Fig. 6, with the BGL method the modularity and the whole community attribute entropy are 0.472 and 0.407 respectively, whereas with the community discovery method provided by the present invention (denoted ACD in the figure) they are 0.411 and 0.03, decreases of 4.6% and 92.1% respectively compared with the BGL method (as noted above, lower entropy indicates higher node similarity within communities). The SA-Cluster method yields a whole community attribute entropy close to that of the present method, but its modularity is lower. This experimental analysis on real social network data, shown in Fig. 6, demonstrates that the community discovery method provided by the present invention obtains communities with more distinct characteristics: the nodes within the same community are not only tightly connected structurally but also more similar to one another.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from its spirit and scope shall all fall within the scope of the claims of the present invention.

Claims (12)

1. A community discovery method, comprising:
Step 1): performing community division on a plurality of nodes in a network based on modularity maximization;
Step 2): adjusting, based on community attribute entropy minimization, the community boundary nodes obtained in step 1);
Step 3): if the community division obtained in step 2) satisfies a termination condition, taking this community division as the final community division; otherwise, taking the communities obtained in step 2) as nodes, re-executing step 1) to perform community division on these nodes, and re-executing step 2) to adjust the community boundary nodes.
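Step 3) treats each community as a node for the next round. A minimal Python sketch of that collapse, assuming (our choice, not the patent's) that the weighted network is stored as a symmetric dict-of-dicts `adj[i][j] = A_ij` and that the weights of edges between two communities are summed; the function name `aggregate` is hypothetical:

```python
from collections import defaultdict


def aggregate(adj, community):
    """Collapse every community into one super-node (hypothetical helper).

    adj: symmetric weighted adjacency {i: {j: A_ij}}; community: {i: label}.
    Returns a dict-of-dicts keyed by community label.
    """
    agg = defaultdict(lambda: defaultdict(float))
    for i, nbrs in adj.items():
        row = agg[community[i]]  # created even if node i has no edges
        for j, w in nbrs.items():
            # Both orientations of every edge are visited, so an internal edge
            # of weight w becomes a self-loop of weight 2w, and an external
            # edge adds w in each direction between the two super-nodes.
            row[community[j]] += w
    return {c: dict(nbrs) for c, nbrs in agg.items()}
```

With this convention each super-node's total weight equals the sum of its members' weighted degrees, and the total weight of the network is unchanged, so modularity computed on the collapsed graph with singleton communities matches the modularity of the division it came from.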
2. The method according to claim 1, wherein the termination condition is:
after the processing of step 1) and step 2), the modularity of the resulting community division has not increased compared with the modularity before the processing, and the overall community attribute entropy of the resulting community division has not decreased compared with the overall community attribute entropy before the processing.
3. The method according to claim 1, wherein the termination condition is:
the number of repetitions of step 1) and step 2) has reached a predetermined threshold.
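The alternative termination conditions of claims 2 and 3 can be written as a small predicate. This is a sketch only; the function name `should_terminate`, the parameter names and the floating-point tolerance `eps` are our assumptions:

```python
def should_terminate(q_before, q_after, h_before, h_after,
                     rounds=0, max_rounds=None, eps=1e-12):
    """Return True when the outer loop of claim 1 should stop (hypothetical helper).

    Claim 3 variant: stop once `rounds` reaches the preset `max_rounds`.
    Claim 2 variant: stop when modularity did not increase AND the overall
    community attribute entropy did not decrease during the last round.
    """
    if max_rounds is not None and rounds >= max_rounds:
        return True
    modularity_not_increased = q_after <= q_before + eps
    entropy_not_decreased = h_after >= h_before - eps
    return modularity_not_increased and entropy_not_decreased
```

Under the claim 2 reading, a round that improves either criterion keeps the loop running.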
4. The method according to any one of claims 1-3, wherein step 1) comprises:
for each node in the network, moving the node to the community of the neighbor node that yields the largest positive modularity gain, until no node move can yield a positive modularity gain.
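A minimal sketch of the local-moving rule in claim 4, assuming a symmetric weighted dict-of-dicts graph, singleton initial communities and a seeded random visiting order (none of which the claim fixes; the name `local_moving` is hypothetical). For each candidate community C it compares k_{i,C} - tot_C * k_i / (2m), where k_{i,C} is the edge weight from node i into C and tot_C is the total degree of C without i; this differs from the modularity gain of inserting i into C only by the positive factor 1/m and a term that does not depend on C:

```python
import random
from collections import defaultdict


def local_moving(adj, seed=0, eps=1e-12):
    """Greedy node moves driven by modularity gain (sketch of claim 4).

    adj: symmetric weighted adjacency {i: {j: A_ij}}. Returns {node: label}.
    """
    rng = random.Random(seed)
    community = {i: i for i in adj}                     # singleton start (assumption)
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(k.values())
    if two_m == 0:
        return community
    tot = defaultdict(float)                            # sum of k_i per community
    for i in adj:
        tot[community[i]] += k[i]
    moved = True
    while moved:                                        # until a full pass moves nobody
        moved = False
        order = list(adj)
        rng.shuffle(order)
        for i in order:
            old = community[i]
            k_in = defaultdict(float)                   # weight from i into each community
            for j, w in adj[i].items():
                if j != i:
                    k_in[community[j]] += w
            tot[old] -= k[i]                            # take i out of its community
            best, best_gain = old, k_in[old] - tot[old] * k[i] / two_m
            for c in list(k_in):
                gain = k_in[c] - tot[c] * k[i] / two_m  # m * (modularity gain) + const
                if gain > best_gain + eps:              # move only on a strict improvement
                    best, best_gain = c, gain
            tot[best] += k[i]
            community[i] = best
            moved = moved or best != old
    return community
```

Because every accepted move strictly increases modularity and there are finitely many divisions, the loop terminates.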
5. The method according to any one of claims 1-3, wherein the network is a weighted network.
6. The method according to claim 5, wherein the modularity is calculated according to the following formula:
$$Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j),$$
where $A_{ij}$ denotes the weight of the edge between node $i$ and node $j$; $k_i = \sum_j A_{ij}$ denotes the sum of the weights of all edges incident to node $i$; $c_i$ denotes the community to which node $i$ belongs; $\delta(c_i, c_j)$ equals 1 when $c_i = c_j$ and 0 otherwise; and
$$m = \frac{1}{2}\sum_{ij} A_{ij}$$
denotes the sum of the weights of all edges in the network.
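Grouping the double sum by community, the modularity of claim 6 can be evaluated as Q = sum over communities c of [in_c / (2m) - (tot_c / (2m))^2], where in_c sums A_ij over ordered pairs inside c and tot_c sums k_i over the members of c. A sketch, assuming a symmetric dict-of-dicts weight map (the name `modularity` is ours):

```python
from collections import defaultdict


def modularity(adj, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j).

    adj: symmetric weighted adjacency {i: {j: A_ij}}; community: {i: c_i}.
    """
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}  # k_i = sum_j A_ij
    two_m = sum(k.values())                                  # 2m = sum_ij A_ij
    if two_m == 0:
        return 0.0
    inside = defaultdict(float)  # sum of A_ij over ordered pairs within a community
    tot = defaultdict(float)     # sum of k_i over the members of a community
    for i, nbrs in adj.items():
        tot[community[i]] += k[i]
        for j, w in nbrs.items():
            if community[j] == community[i]:
                inside[community[i]] += w
    # Only pairs with delta(c_i, c_j) = 1 contribute: one term per community.
    return sum(inside[c] / two_m - (tot[c] / two_m) ** 2 for c in tot)
```

For two unweighted triangles joined by a single edge (2m = 14), splitting at the bridge gives Q = 2 * (6/14 - (7/14)^2) = 5/14, about 0.357, while placing every node in one community gives Q = 0.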
7. The method according to any one of claims 1-3, wherein step 2) comprises:
Step 21): randomly selecting a community boundary node;
Step 22): calculating the community attribute entropy increment that the community boundary node produces in each neighbor community when it moves from its current community into that neighbor community;
Step 23): selecting the neighbor community with the smallest community attribute entropy increment and determining whether this neighbor community is the community in which the community boundary node is located; if not, moving the community boundary node from its current community into this neighbor community;
Step 24): if the overall community attribute entropy changed as a result of moving the community boundary node, returning to step 21).
8. The method according to claim 7, wherein step 21) comprises:
Step 211): randomly selecting a community from the communities obtained in step 1);
Step 212): randomly selecting a node from the selected community such that the edges connected to this node do not all have their other endpoint in the node's own community.
9. The method according to claim 7, further comprising, before step 21):
Step 20): restoring the community division obtained in step 1) to a community division of the original nodes, the original nodes being the nodes of the network before the first community division based on modularity maximization.
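Step 20) maps communities found on aggregated nodes back onto the original nodes. If each round of step 1) is recorded as a mapping from that round's nodes to its communities (a bookkeeping convention we assume; the patent does not prescribe one, and `flatten` is a hypothetical name), the restoration is a chain of lookups:

```python
def flatten(levels):
    """Map original nodes to their communities at the last recorded round.

    levels[0]: original node -> community of the first round;
    levels[k]: round-k node (i.e. a round-(k-1) community) -> round-k community.
    """
    membership = dict(levels[0])
    for mapping in levels[1:]:
        # Follow each original node's current community up one level.
        membership = {node: mapping[c] for node, c in membership.items()}
    return membership
```

For example, if the first round groups nodes 0 and 1 into "a" and the second round merges "a" and "b" into "X", every node of "a" and "b" ends up labeled "X".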
10. The method according to claim 7, wherein the overall community attribute entropy is calculated according to the following formula:
$$H = \sum_{m=1}^{K} H(V_m),$$
where $K$ denotes the number of communities and $H(V_m)$ denotes the attribute entropy of community $V_m$, with
$$H(V_m) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\left(\frac{s_{i,j}}{2}\ln\frac{s_{i,j}}{2} + \left(1-\frac{s_{i,j}}{2}\right)\ln\left(1-\frac{s_{i,j}}{2}\right)\right),$$
where $M$ denotes the number of nodes contained in community $V_m$, and $s_{i,j}$ denotes the attribute similarity between nodes $i$ and $j$.
11. The method according to claim 7, wherein the community attribute entropy increment is calculated according to the following formula:
$$\Delta H = H(V_m) - H(V_m - i) = \sum_{j \in V_m - i}\left(\frac{s_{i,j}}{2}\ln\frac{s_{i,j}}{2} + \left(1-\frac{s_{i,j}}{2}\right)\ln\left(1-\frac{s_{i,j}}{2}\right)\right),$$
where node $i$ denotes the selected community boundary node; $H(V_m)$ denotes the attribute entropy of neighbor community $V_m$, including node $i$, after node $i$ moves into it; $V_m - i$ denotes community $V_m$ without node $i$; and $s_{i,j}$ denotes the attribute similarity between nodes $i$ and $j$.
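The entropy formulas of claims 10 and 11 and the boundary-node loop of claims 7 and 8 can be sketched together in Python. Several details are assumptions rather than the patent's specification: the pair term is taken as p ln p + (1 - p) ln(1 - p) with p = s_{i,j}/2; `matching_similarity` (the share of attribute positions on which two nodes agree) is a hypothetical stand-in, because s_{i,j} is not defined in this part of the text; a random community is drawn only among communities that contain at least one boundary node; and the loop stops the first time a selected node does not move, which is the literal reading of step 24). All function names are ours.

```python
import math
import random
from itertools import combinations


def pair_term(s):
    """p ln p + (1 - p) ln(1 - p) with p = s / 2 (assumed reading); 0 ln 0 := 0."""
    p = s / 2.0
    return sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)


def community_entropy(members, sim):
    """H(V_m): pair_term summed over every unordered pair i < j in V_m (claim 10)."""
    return sum(pair_term(sim(i, j)) for i, j in combinations(members, 2))


def overall_entropy(community, sim):
    """H: sum of H(V_m) over all communities; community maps node -> label."""
    groups = {}
    for node, label in community.items():
        groups.setdefault(label, []).append(node)
    return sum(community_entropy(g, sim) for g in groups.values())


def entropy_increment(i, label, community, sim):
    """Delta H = H(V_m) - H(V_m - i), V_m = community `label` plus node i (claim 11)."""
    return sum(pair_term(sim(i, j))
               for j, c in community.items() if c == label and j != i)


def pick_boundary_node(adj, community, rng):
    """Claim 8 sketch: random community, then a random member with an outside edge."""
    boundary = {}
    for i, nbrs in adj.items():
        if any(community[j] != community[i] for j in nbrs):
            boundary.setdefault(community[i], []).append(i)
    if not boundary:
        return None
    return rng.choice(boundary[rng.choice(list(boundary))])


def adjust_boundary(adj, community, sim, seed=0, eps=1e-12):
    """Claim 7 sketch: move boundary nodes while the overall entropy keeps changing."""
    rng = random.Random(seed)
    community = dict(community)
    while True:
        i = pick_boundary_node(adj, community, rng)             # step 21
        if i is None:
            return community
        own = community[i]
        best, best_dh = own, entropy_increment(i, own, community, sim)
        for c in dict.fromkeys(community[j] for j in adj[i]):   # step 22
            dh = entropy_increment(i, c, community, sim)
            if dh < best_dh - eps:                              # step 23
                best, best_dh = c, dh
        if best == own:                                         # step 24: H unchanged
            return community
        community[i] = best


def matching_similarity(attrs):
    """Hypothetical s_ij: fraction of attribute positions where i and j agree."""
    return lambda i, j: sum(a == b for a, b in zip(attrs[i], attrs[j])) / len(attrs[i])
```

A node moves only when its entropy increment toward the new community is strictly smaller than toward its own, so each move lowers H and the loop always terminates. With p = s/2 the pair term decreases as s grows from 0 to 1, which matches the statement that lower entropy indicates more similar nodes.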
12. A community discovery system, comprising:
a community division module, configured to perform community division on a plurality of nodes in a network based on modularity maximization;
a community adjustment module, configured to adjust, based on community attribute entropy minimization, the community boundary nodes obtained from the community division module; if the community division obtained after the adjustment satisfies a predetermined condition, this community division is taken as the final community division; otherwise, the communities obtained after the adjustment are taken as nodes, the community division module performs community division on these nodes again, and the community adjustment module readjusts the community boundary nodes.
CN201310201298.8A 2012-11-02 2013-05-27 Community discovery method and system Expired - Fee Related CN103325061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310201298.8A CN103325061B (en) 2012-11-02 2013-05-27 Community discovery method and system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2012104337208 2012-11-02
CN201210433720 2012-11-02
CN201210433720.8 2012-11-02
CN201310201298.8A CN103325061B (en) 2012-11-02 2013-05-27 Community discovery method and system

Publications (2)

Publication Number Publication Date
CN103325061A 2013-09-25
CN103325061B 2017-04-05

Family

ID=49193785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310201298.8A Expired - Fee Related CN103325061B (en) 2012-11-02 2013-05-27 Community discovery method and system

Country Status (1)

Country Link
CN (1) CN103325061B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942308A (en) * 2014-04-18 2014-07-23 中国科学院信息工程研究所 Method and device for detecting large-scale social network communities
CN104408149A (en) * 2014-12-04 2015-03-11 威海北洋电气集团股份有限公司 Criminal suspect mining association method and system based on social network analysis
CN104700311A (en) * 2015-01-30 2015-06-10 福州大学 Method for discovering neighborhood following community in social network
CN104715418A (en) * 2015-03-16 2015-06-17 北京航空航天大学 Novel social network sampling method
CN104820945A (en) * 2015-04-17 2015-08-05 南京大学 Online social network information transmision maximization method based on community structure mining algorithm
CN105095403A (en) * 2015-07-08 2015-11-25 福州大学 Parallel community discovery algorithm based on mixed neighbor message propagation
CN105704776A (en) * 2016-01-14 2016-06-22 河南科技大学 Node message forwarding method considering network node energy and caching
CN105701511A (en) * 2016-01-14 2016-06-22 河南科技大学 Adaptive spectral clustering method of extracting network node community attribute
CN106027296A (en) * 2016-05-16 2016-10-12 国网江苏省电力公司信息通信分公司 Method and device for decomposing information system models in electric power system
CN103793489B (en) * 2014-01-16 2017-01-18 西北工业大学 Method for discovering topics of communities in on-line social network
CN106570082A (en) * 2016-10-19 2017-04-19 浙江工业大学 Friend relationship mining method combining network topology characteristics and user behavior characteristics
CN107818474A (en) * 2016-09-13 2018-03-20 百度在线网络技术(北京)有限公司 A kind of method and apparatus for being used to dynamically adjust product price
CN108090132A (en) * 2017-11-24 2018-05-29 西北师范大学 Fusion tag, which averagely divides distance and the community of structural relation, can be overlapped division methods
CN109657016A (en) * 2018-12-30 2019-04-19 南京邮电大学盐城大数据研究院有限公司 The method for meeting the attribute of homogeney requirement is excavated in a kind of attribute graph model
CN110135853A (en) * 2019-04-25 2019-08-16 阿里巴巴集团控股有限公司 Clique's user identification method, device and equipment
CN111047453A (en) * 2019-12-04 2020-04-21 兰州交通大学 Detection method and device for decomposing large-scale social network community based on high-order tensor
CN112925989A (en) * 2021-01-29 2021-06-08 中国计量大学 Group discovery method and system of attribute network
CN113593713A (en) * 2020-12-30 2021-11-02 南方科技大学 Epidemic situation prevention and control method, device, equipment and medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN106411572B (en) * 2016-09-06 2019-05-07 山东大学 A kind of community discovery method of combination nodal information and network structure

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101383748A (en) * 2008-10-24 2009-03-11 北京航空航天大学 Community division method in complex network
CN101877711A (en) * 2009-04-28 2010-11-03 华为技术有限公司 Social network establishment method and device, and community discovery method and device
CN102148717A (en) * 2010-02-04 2011-08-10 明仲 Community detecting method and device in bipartite network
CN102194149A (en) * 2010-03-01 2011-09-21 中国人民解放军国防科学技术大学 Community discovery method

Cited By (29)

Publication number Priority date Publication date Assignee Title
CN103793489B (en) * 2014-01-16 2017-01-18 西北工业大学 Method for discovering topics of communities in on-line social network
CN103942308A (en) * 2014-04-18 2014-07-23 中国科学院信息工程研究所 Method and device for detecting large-scale social network communities
CN103942308B (en) * 2014-04-18 2017-04-05 中国科学院信息工程研究所 The detection method and device of extensive myspace
CN104408149A (en) * 2014-12-04 2015-03-11 威海北洋电气集团股份有限公司 Criminal suspect mining association method and system based on social network analysis
CN104408149B (en) * 2014-12-04 2017-12-12 威海北洋电气集团股份有限公司 Suspect based on social network analysis excavates correlating method and system
CN104700311A (en) * 2015-01-30 2015-06-10 福州大学 Method for discovering neighborhood following community in social network
CN104700311B (en) * 2015-01-30 2018-02-06 福州大学 A kind of neighborhood in community network follows community discovery method
CN104715418A (en) * 2015-03-16 2015-06-17 北京航空航天大学 Novel social network sampling method
CN104820945A (en) * 2015-04-17 2015-08-05 南京大学 Online social network information transmision maximization method based on community structure mining algorithm
CN104820945B (en) * 2015-04-17 2018-06-22 南京大学 Online community network information based on community structure mining algorithm propagates maximization approach
CN105095403A (en) * 2015-07-08 2015-11-25 福州大学 Parallel community discovery algorithm based on mixed neighbor message propagation
CN105701511A (en) * 2016-01-14 2016-06-22 河南科技大学 Adaptive spectral clustering method of extracting network node community attribute
CN105704776B (en) * 2016-01-14 2019-07-05 河南科技大学 A kind of node messages retransmission method for taking into account network node energy and caching
CN105704776A (en) * 2016-01-14 2016-06-22 河南科技大学 Node message forwarding method considering network node energy and caching
CN105701511B (en) * 2016-01-14 2019-04-02 河南科技大学 A kind of Adaptive spectra clustering method extracting network node community attributes
CN106027296A (en) * 2016-05-16 2016-10-12 国网江苏省电力公司信息通信分公司 Method and device for decomposing information system models in electric power system
CN106027296B (en) * 2016-05-16 2019-06-04 国网江苏省电力公司信息通信分公司 The decomposition method and device of information model in a kind of pair of electric system
CN107818474A (en) * 2016-09-13 2018-03-20 百度在线网络技术(北京)有限公司 A kind of method and apparatus for being used to dynamically adjust product price
CN107818474B (en) * 2016-09-13 2022-01-18 百度在线网络技术(北京)有限公司 Method and device for dynamically adjusting product price
CN106570082A (en) * 2016-10-19 2017-04-19 浙江工业大学 Friend relationship mining method combining network topology characteristics and user behavior characteristics
CN106570082B (en) * 2016-10-19 2019-11-05 浙江工业大学 A kind of friends method for digging of combination network topology characteristic and user behavior characteristics
CN108090132A (en) * 2017-11-24 2018-05-29 西北师范大学 Fusion tag, which averagely divides distance and the community of structural relation, can be overlapped division methods
CN108090132B (en) * 2017-11-24 2021-05-25 西北师范大学 Community overlapping division method integrating average division distance and structural relationship of labels
CN109657016A (en) * 2018-12-30 2019-04-19 南京邮电大学盐城大数据研究院有限公司 The method for meeting the attribute of homogeney requirement is excavated in a kind of attribute graph model
CN110135853A (en) * 2019-04-25 2019-08-16 阿里巴巴集团控股有限公司 Clique's user identification method, device and equipment
CN111047453A (en) * 2019-12-04 2020-04-21 兰州交通大学 Detection method and device for decomposing large-scale social network community based on high-order tensor
CN113593713A (en) * 2020-12-30 2021-11-02 南方科技大学 Epidemic situation prevention and control method, device, equipment and medium
CN112925989A (en) * 2021-01-29 2021-06-08 中国计量大学 Group discovery method and system of attribute network
CN112925989B (en) * 2021-01-29 2022-04-26 中国计量大学 Group discovery method and system of attribute network

Also Published As

Publication number Publication date
CN103325061B (en) 2017-04-05


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170405

Termination date: 20190527