CN111382318B - Dynamic community detection method based on information dynamics - Google Patents

Dynamic community detection method based on information dynamics

Info

Publication number
CN111382318B
Authority
CN
China
Prior art keywords
information
node
community
network
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010178455.8A
Other languages
Chinese (zh)
Other versions
CN111382318A (en)
Inventor
孙泽军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingdingshan University
Original Assignee
Pingdingshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingdingshan University
Priority to CN202010178455.8A
Publication of CN111382318A
Application granted
Publication of CN111382318B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a dynamic community detection method based on information dynamics, comprising the following steps. S1, initial community identification: S11, input an undirected network graph $G = (V, E)$; S12, initialize the information $I_v$ of each node $v$ in the network, and calculate the Jaccard similarity coefficient $S_{uv}$ and the connection strength $H_{uv}$ between nodes; S13, calculate the average similarity avg_S(v) and the average degree avg_D(v) of the neighbor nodes of node $v$; S14, simulate the information dynamics interaction between nodes with an information dynamics model until an equilibrium state is reached; S15, divide communities according to the amount of information, placing neighbor nodes with the same amount of information in the same community; S16, output the initial community $C$. S2, incremental community detection: S21, extract the changed subgraph $\Delta G_i$; S22, detect the community $\Delta C_i$ corresponding to the subgraph $\Delta G_i$; S23, calculate the unchanged community $C'_{i-1}$; S24, calculate the community $C_i$ of time slice $T_i$; S25, repeat steps S21 to S24 until all time slices $T_i$ have been processed.

Description

Dynamic community detection method based on information dynamics
Technical Field
The invention relates to the field of information transmission, in particular to a dynamic community detection method based on information dynamics.
Background
Community structure is the most prominent structural feature of a complex network: the vertices can be naturally divided into groups such that connections between vertices in the same group are relatively dense, while edges between vertices in different groups are relatively sparse. Each such group is a 'community'.
In complex networks, communities often correspond to functional units of the network. Examples include groups of Web pages on the same subject in the WWW; functional modules in protein interaction networks and metabolic pathways in metabolic networks; and groups of people with common characteristics in social networks, such as a research team of scientists working in the same direction in a scientific collaboration network, or a terrorist organization in a terrorist network. Detecting the community structure of a network makes it possible to explore, infer, and predict the functions of the network and its components from structural features, to identify performance bottlenecks and thereby improve the performance and service quality of the network, and to study its evolution mechanism and dynamic behavior. Research on community detection therefore has both theoretical significance and strong practical value.
Unlike a static network, a dynamic network has a structure that evolves over time. Nodes and edges may appear or disappear over time, which causes the community structure of the network to change continually. The main changes of community structure in a dynamic network are the birth, growth, contraction, merging, splitting, and death of communities.
Existing dynamic network community detection methods are structurally complex and, when detecting the community structure of a given time slice, can only process the time slices one by one in sequence. This leads to high computational complexity and cumbersome calculation steps, which seriously reduces the efficiency of dynamic community detection.
Disclosure of Invention
The invention aims to solve the problems and provide a dynamic community detection method based on information dynamics, which reduces complexity and improves detection efficiency.
In order to achieve the above object, the technical scheme of the present invention is as follows:
A dynamic community detection method based on information dynamics comprises the following steps:
S1, initial community identification:
S11, input an undirected network graph $G = (V, E)$;
S12, initialize the information $I_v$ of each node $v$ in the network, and calculate the Jaccard similarity coefficient $S_{uv}$ and the connection strength $H_{uv}$ between nodes;
S13, calculate the average similarity avg_S(v) and the average degree avg_D(v) of the neighbor nodes of node $v$;
S14, simulate the information dynamics interaction among the nodes in the network with an information dynamics model until an equilibrium state is reached;
S15, divide communities according to the amount of information of the nodes, wherein neighbor nodes with the same amount of information are placed in the same community and nodes with different amounts of information are placed in different communities;
S16, output the initial community $C$;
S2, incremental community detection:
S21, extract the changed subgraph $\Delta G_i$;
S22, detect the community $\Delta C_i$ corresponding to the subgraph $\Delta G_i$;
S23, calculate the unchanged community $C'_{i-1}$;
S24, calculate the community $C_i$ of time slice $T_i$;
S25, repeat steps S21 to S24 until all time slices $T_i$ have been processed.
Further, in step S12, the information $I_v$ of node $v$ in the network is initialized as:
$$I_v = \frac{D_v}{D_{max}}$$
where $D_v$ is the degree of node $v$ and $D_{max}$ is the maximum degree in the undirected network $G$.
Further, in step S12, the Jaccard similarity coefficient $S_{uv}$ between nodes is calculated as:
$$S_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$$
where $\Gamma(u)$ is the set of neighbor nodes of node $u$ including node $u$ itself, and $\Gamma(v)$ is the set of neighbor nodes of node $v$ including node $v$ itself.
Further, in step S12, the connection strength $H_{uv}$ between nodes is calculated by the formula:
where $T_u$ is the number of triangles owned by node $u$, $N(u)$ is the set of neighbor nodes of node $u$ excluding node $u$ itself, and $N(v)$ is the set of neighbor nodes of node $v$ excluding node $v$ itself.
Further, in the step S14, the information between the nodes is simulated by performing loop iteration using the information dynamics model, and when the information quantity propagated between any neighboring nodes is smaller than a threshold value, the information propagated between the nodes in the network is considered to reach an equilibrium state, and the loop is ended.
Further, the information dynamics model is constructed according to the propagation probability, the propagation quantity and the information loss of the information;
the propagation probability of the information is calculated as follows:
where $P_{u_i v}$ is the probability of information propagating between nodes $u_i$ and $v$; $N(v)$ is the set of neighbor nodes of node $v$, excluding node $v$ itself; $u_i \in N(v)$; and $S_{u_i v}$ is the Jaccard similarity coefficient of nodes $u_i$ and $v$;
in order to simulate a real propagation process, neighbor nodes are selected for propagation according to probability; let $RN(v)$ be the set of nodes selected for propagation, defined as follows:
$$RN(v) = \{ u_i \in N(v) \mid \mathrm{Random\_dec}() \in J_{u_i} \}$$
where $J_{u_i}$ denotes the probability interval in which each node is selected, and $J_{u_i}$ is defined as follows:
where $\omega$ is the number of neighbor nodes of node $v$, i.e., $|N(v)| = \omega$;
the propagation amount of the information is calculated as follows:
$$I_{u \to v} = f(I_u - I_v)\, S_{uv}\, H_{uv}$$
where $I_{u \to v}$ is the amount of information propagated from node $u$ to its neighbor node $v$, with $u \in RN(v)$; $S_{uv}$ is the Jaccard similarity coefficient of nodes $u$ and $v$; $H_{uv}$ is the connection strength of node $v$ to node $u$; and $f(\cdot)$ is a coupling function representing the amount of information that propagates between nodes $u$ and $v$; the coupling function $f(\cdot)$ is defined as follows:
the loss amount of the information is calculated as follows:
$$I_{(u \to v)\_cost} = \lambda f(I_u - I_v)\,(1 - S_{uv});$$
where $I_{(u \to v)\_cost}$ is the amount of information lost and $\lambda$ is the degree of information loss, defined as follows:
when the information dynamics model is iterated, the information dynamics equation of node $v$ over time is defined as follows:
$$I_{t+1} = I_t + I_{in}$$
where $I_{t+1}$ is the updated information at time step $t+1$, obtained by adding to the information $I_t$ of the previous time step the new information $I_{in}$ that the node obtains from its neighbor nodes at time step $t+1$; $I_{in}$ is defined as follows:
$$I_{in} = \sum_{u \in RN(v)} \left( I_{u \to v} - I_{(u \to v)\_cost} \right)$$
where $(I_{u \to v} - I_{(u \to v)\_cost}) \ge 0$.
Further, the changed subgraph $\Delta G_i$ in step S21 arises from added nodes, deleted nodes, added edges, and deleted edges;
the set of added nodes is the set difference $V_i \setminus V_{i-1}$, and the set of deleted nodes is the set difference $V_{i-1} \setminus V_i$, where $V_i$ is the node set of network $G_i$ and $V_{i-1}$ is the node set of network $G_{i-1}$;
the set of added edges is the set difference $E_i \setminus E_{i-1}$, and the set of deleted edges is the set difference $E_{i-1} \setminus E_i$, where $E_i$ is the edge set of network $G_i$ and $E_{i-1}$ is the edge set of network $G_{i-1}$.
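The four change sets described above are plain set differences between consecutive snapshots. The following is a minimal Python sketch, assuming each snapshot is supplied as a node set and an edge list; the function name `graph_delta` and the dictionary keys are illustrative and do not come from the patent.

```python
def graph_delta(v_prev, e_prev, v_cur, e_cur):
    """Return added/deleted nodes and edges between two snapshots."""
    # The graph is undirected, so each edge is stored as a frozenset:
    # (u, v) and (v, u) then compare equal.
    prev_edges = {frozenset(e) for e in e_prev}
    cur_edges = {frozenset(e) for e in e_cur}
    return {
        "added_nodes": set(v_cur) - set(v_prev),
        "deleted_nodes": set(v_prev) - set(v_cur),
        "added_edges": cur_edges - prev_edges,
        "deleted_edges": prev_edges - cur_edges,
    }


# Snapshot i-1 has nodes {1, 2, 3}; snapshot i drops node 1 and adds node 4.
delta = graph_delta({1, 2, 3}, [(1, 2), (2, 3)], {2, 3, 4}, [(3, 2), (3, 4)])
```

Storing edges as unordered pairs avoids reporting an edge as changed merely because its endpoints are listed in a different order.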
Further, in step S22, the community $\Delta C_i$ corresponding to the subgraph $\Delta G_i$ is detected by repeating steps S11 to S15.
Further, in step S24, the community $C_i$ is calculated as:
$$C_i = C'_{i-1} + \Delta C_i$$
where $C'_{i-1}$ is the unchanged community and $\Delta C_i$ is the community corresponding to the changed subgraph $\Delta G_i$.
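The combination in step S24 can be pictured as keeping the communities that the changed subgraph does not touch and appending the newly detected ones. The sketch below is illustrative only: it assumes that a previous community counts as unchanged when none of its nodes belongs to the changed subgraph, and the names `merge_communities` and `changed_nodes` are hypothetical.

```python
def merge_communities(prev_communities, changed_nodes, delta_communities):
    """Combine unchanged communities with newly detected ones."""
    # Assumption: a community of the previous time slice is "unchanged"
    # when it shares no node with the changed subgraph.
    unchanged = [c for c in prev_communities if not (c & changed_nodes)]
    return unchanged + list(delta_communities)


prev = [{1, 2}, {3, 4}, {5, 6}]
changed = {3, 4, 5, 6, 7}            # nodes of the changed subgraph
detected = [{3, 4, 5}, {6, 7}]       # communities found on that subgraph
current = merge_communities(prev, changed, detected)
```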
Compared with the prior art, the invention has the advantages and positive effects that:
The invention provides a dynamic community detection framework based on information dynamics, which first identifies the community structure of the initial time window and then incrementally detects the community structure of each subsequent time window. For each time slice, the framework processes in batch only the local subgraphs whose structure may have changed, which gives it a high processing speed. Because only a small number of local subgraphs change in a dynamic network, the information exchange within these subgraphs converges quickly. The time complexity of the method for each subsequent time slice is $O(|\Delta E_i| + L \cdot |\Delta V_i| \cdot k_i)$. Since the number of changed edges $|\Delta E_i|$, the number of changed nodes $|\Delta V_i|$, the average degree $k_i$, and the number of iterations $L$ are all small, detection is fast and the method is applicable to large-scale networks. The method was evaluated comprehensively on networks generated by the dynamic LFR model and on real networks. The experimental results show that the method identifies the community structure in dynamic networks well and outperforms the other detection methods on most networks.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1a is a diagram of community generation and death evolution;
FIG. 1b is a graph of community growth and contraction evolution;
FIG. 1c is a graph of community merging and splitting evolution;
FIG. 2 is a diagram of a dynamic community detection framework;
FIG. 3 is a graph of community detection based on information dynamics;
FIG. 4 is a graph of evolution of additional nodes within a community;
FIG. 5 is an evolution diagram of adding nodes between communities;
FIG. 6 is a graph of the evolution of delete nodes within a community;
FIG. 7 is a diagram of the evolution of deleted nodes between communities;
FIG. 8 is an evolution diagram of adding edges within a community;
FIG. 9 is an evolution diagram of adding edges between communities;
FIG. 10 is a diagram of intra-community delete-edge evolution;
FIG. 11 is a diagram of delete edge evolution among communities;
FIG. 12 is a graph of NMI performance metrics for several comparison algorithms over different transition probabilities;
FIG. 13 is a graph of ARI performance metrics for several comparison algorithms at different transition probabilities;
FIG. 14 is a graph showing NMI performance metrics for several comparison algorithms over different mixing parameters;
FIG. 15 is a graph showing ARI performance metrics for several comparison algorithms over different blend parameters;
FIG. 16 is a graph showing NMI performance metrics for several comparison algorithms at different averages;
FIG. 17 is a graph showing ARI performance metrics for several comparison algorithms at different averages;
FIG. 18 is a graph showing NMI performance metrics for various comparison algorithms over different numbers of newly added and lost communities;
FIG. 19 is a graph showing ARI performance metrics for various comparison algorithms over different numbers of newly added and lost communities;
FIG. 20 is a graph showing NMI performance metrics for various comparison algorithms over various numbers of expanding and contracting communities;
FIG. 21 is a graph showing ARI performance metrics for various comparison algorithms over various numbers of expanding and contracting communities;
FIG. 22 is a graph of NMI performance metrics for each comparison algorithm over community merge and split events;
FIG. 23 is a graph of ARI performance metrics for each comparison algorithm over community merge and split events;
FIG. 24 is a schematic diagram of NMI indicator performance of each comparison algorithm on a real network;
FIG. 25 is a graph showing ARI indicator performance of each comparison algorithm on a real network;
FIG. 26 is a diagram showing the run time of each comparison algorithm on different scale generation networks;
FIG. 27 is a frame flow chart of initial community identification;
FIG. 28 is a block diagram of incremental community detection.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, modifications, equivalents, improvements, etc., which are apparent to those skilled in the art without the benefit of this disclosure, are intended to be included within the scope of this invention.
1 Community Structure in dynamic network
The change state of the community structure in the dynamic network mainly comprises: new communities are generated, communities are increased, communities are contracted, communities are combined, communities are divided and communities are eliminated.
FIG. 1 illustrates several states of community evolution in a dynamic network. A formal definition of a dynamic network is given below. The present invention represents a dynamic network of $l$ time slices, or snapshots, as $DG = \{G_0, G_1, G_2, \ldots, G_l\}$, where $G_t = (V_t, E_t)$ denotes the network at time slice $t$, with $0 \le t \le l$. The changed subgraph of the network is denoted $\Delta G_t$. The purpose of dynamic community detection is to reveal the community structure implicit in the network at each time snapshot. Define $DC = \{C_1, C_2, \ldots, C_l\}$ as the community structure of the time-slice network $DG$, where $C_t$ denotes the community partition of the $t$-th time-slice network $G_t$. Dynamic community detection identifies the community structure $DC$ of each time slice of the dynamic network $DG$. In a dynamic network $DG$, the evolution between time slices is assumed to be smooth, with only a small, local part of the structure changing, so the community structures of adjacent time slices are closely related and most of the community structure may remain unchanged. An incremental approach is therefore well suited to dynamic community detection. Let $C'_{t-1}$ denote the community structure of time slice $t-1$ that remains unchanged, and $\Delta C_t$ the community structure that changes at time slice $t$; then
$$C_t = C'_{t-1} + \Delta C_t. \tag{1}$$
The goal of the dynamic community detection of the present invention is to identify the changed community structure $\Delta C_t$ incrementally from the changed network $\Delta G_t$, thereby obtaining the community structure of the network at time slice $t$.
2 Dynamic community discovery based on information dynamics
2.1 related definitions
Before the proposed algorithm is described in detail, some basic definitions used hereinafter are formalized. Table 1 lists and briefly describes the key symbols used herein.
Definition 1 (Jaccard similarity coefficient). Given an undirected network $G = (V, E)$, the Jaccard similarity coefficient [203] of nodes $u$ and $v$ is defined as follows:
$$S_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}, \tag{2}$$
where $\Gamma(u)$ is the set of neighbor nodes of node $u$, which includes node $u$ itself and its neighbors $N(u)$.
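As an illustration of Definition 1, the following Python sketch computes the coefficient on closed neighborhoods, that is, each node is counted as a member of its own neighbor set. The adjacency-dictionary representation and the function names are assumptions made for the example.

```python
def closed_neighborhood(adj, v):
    """Gamma(v): node v together with its neighbors N(v)."""
    return adj[v] | {v}


def jaccard(adj, u, v):
    """Jaccard similarity coefficient S_uv on closed neighborhoods."""
    gu = closed_neighborhood(adj, u)
    gv = closed_neighborhood(adj, v)
    return len(gu & gv) / len(gu | gv)


# Undirected toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```

Because every closed neighborhood contains at least the node itself, the union is never empty and the division is always defined.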
Table 1. Symbols used
Symbol      Definition
n           Number of nodes in network G (n = |V|)
m           Number of edges in network G (m = |E|)
D_v         Degree of node v
D_max       Maximum degree of the nodes in network G
Avg_D(v)    Average degree of the neighbor nodes of node v
N(v)        Set of neighbor nodes of node v
Avg_S(v)    Average similarity of the neighbor nodes of node v
P_uv        Probability of information propagating from node u to node v
T_v         Number of triangles owned by node v
H_uv        Connection strength of node v to node u
I_v         Information owned by node v
In the real world, a person's network often contains both strong and weak relationships, which play an important role in community formation and information dissemination. To characterize these relationships, the present invention uses the triangle structure to define the connection strength, because the triangle structure reflects how tightly nodes are connected.
Definition 2 (connection strength). Given an undirected network $G = (V, E)$, the connection strength of node $v$ to node $u$ is defined as follows:
where $T_u$ is the number of triangles owned by node $u$, and the size of the intersection of the sets $N(u)$ and $N(v)$ gives the number of triangles shared by nodes $u$ and $v$. It can be observed from formula (3) that $H_{uv}$ and $H_{vu}$ are not necessarily equal; that is, the connection strength of node $v$ to node $u$ is not necessarily equal to the connection strength of node $u$ to node $v$. This matches real life. For example, the connection strength between an established scholar who has worked in academia for many years and a newcomer who has just entered the field differs in the two directions: the former is connected to far more researchers than the latter, and therefore has a much greater influence on the latter than the latter has on the former.
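The asymmetry described in Definition 2 can be illustrated with a small Python sketch. Formula (3) is not reproduced in this text, so the sketch assumes the ratio $|N(u) \cap N(v)| / T_u$, which is one reading of the description above and not a confirmed form; returning 0 when node $u$ owns no triangle is a further assumption, and the function names are illustrative.

```python
from itertools import combinations


def triangles(adj, u):
    """T_u: number of triangles that contain node u."""
    return sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])


def connection_strength(adj, u, v):
    """Assumed form of H_uv: shared neighbors of u and v divided by T_u."""
    t_u = triangles(adj, u)
    if t_u == 0:          # assumption: no triangles means zero strength
        return 0.0
    return len(adj[u] & adj[v]) / t_u


# Edges: 1-2, 1-3, 1-4, 2-3, 3-4, giving triangles (1,2,3) and (1,3,4).
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
h_12 = connection_strength(adj, 1, 2)   # one shared neighbor, T_1 = 2
h_21 = connection_strength(adj, 2, 1)   # one shared neighbor, T_2 = 1
```

Under this assumed form the two directions differ only through the denominator, which is what makes the strength of a well-connected node toward a sparsely connected one smaller than the reverse.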
What is the information of a node in the network, and how should it be represented? In a real-world social network, the more friends a person has, the more information that person obtains. Inspired by information exchange in real social networks, the method adopts the degree of a node as its initial information.
Definition 3 (information). Given an undirected network $G = (V, E)$, the information of node $v$ is defined as follows:
$$I_v = \frac{\mathrm{Deg}(v)}{D_{max}}, \tag{4}$$
where $\mathrm{Deg}(v)$ is the degree of node $v$ and $D_{max}$ is the maximum degree in the network $G$. Let $I_{max}$ denote the information of the node with the maximum degree in the network $G$; from formula (4), $I_{max} = 1$, so the node with the greatest degree has the greatest amount of information.
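A minimal Python sketch of this initialization follows, assuming the network is held as an adjacency dictionary. Returning zero information for a network without edges is an assumption added to avoid division by zero; the source does not discuss that case.

```python
def initial_information(adj):
    """Initial information of each node: its degree divided by D_max."""
    d_max = max((len(nbrs) for nbrs in adj.values()), default=0)
    if d_max == 0:        # assumption: edgeless network carries no information
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / d_max for v, nbrs in adj.items()}


adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
info = initial_information(adj)   # nodes 1 and 3 have the maximum degree 3
```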
2.2 information dynamics model
According to the research on the information exchange mode between people in the real world, it can be observed that each node can acquire information from its neighbor nodes and propagate the information to its neighbor nodes with a certain probability. Propagation is highly dependent on its local topological properties and characteristics, such as the degree of nodes, the similarity between nodes, the strength of connections between nodes, etc. In addition, information loss should also be considered for information dissemination. Therefore, the invention constructs the information propagation model according to the information propagation probability, the propagation quantity and the information loss.
1) Propagation probability. The information owned by a node may be propagated with a certain probability to its directly connected neighbor nodes [204, 205]. The higher the similarity between two nodes, the higher the probability of propagation. Formally, let $N(v)$ denote the set of neighbor nodes of node $v$, and let $u_i \in N(v)$. Let $S_{u_i v}$ be the Jaccard similarity coefficient of nodes $u_i$ and $v$. The probability $P_{u_i v}$ of information propagating between nodes $u_i$ and $v$ is defined as follows:
In order to simulate a real propagation process, the invention selects some neighbor nodes for propagation according to probability; that is, the higher the similarity between the nodes, the higher the probability of being selected. Let $RN(v)$ be the set of nodes selected for propagation, defined as follows:
$$RN(v) = \{ u_i \in N(v) \mid \mathrm{Random\_dec}() \in J_{u_i} \}, \tag{6}$$
where $J_{u_i}$ denotes the probability interval in which each node is selected. $J_{u_i}$ is defined as follows:
where $\omega$ is the number of neighbor nodes of node $v$, i.e., $|N(v)| = \omega$. Random_dec() is a random function that generates a number of random decimals determined by the logarithm of the degree of node $v$, $\log_2 \deg(v)$.
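The selection of $RN(v)$ resembles roulette-wheel sampling: each neighbor owns an interval, and a random decimal selects the neighbor whose interval contains it. The sketch below is a hedged illustration only. Formulas (5) and (7) are not reproduced in this text, so it assumes that the propagation probability is the similarity normalized over the neighbors, that the intervals are consecutive slices of $[0, 1)$, and that the number of draws is $\lceil \log_2 \deg(v) \rceil$ with a minimum of one. The function name and its parameters are hypothetical.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def select_neighbors(neighbors, similarity, rng):
    """Pick a subset of neighbors, favoring those with higher similarity."""
    total = sum(similarity[u] for u in neighbors)
    if total == 0:                      # no neighbors or no similarity at all
        return set()
    # Assumed intervals: cumulative sums of the normalized similarities.
    bounds = list(accumulate(similarity[u] / total for u in neighbors))
    # Assumed number of random decimals: ceil(log2(degree)), at least one.
    draws = max(1, math.ceil(math.log2(len(neighbors))))
    chosen = set()
    for _ in range(draws):
        r = rng.random()                # random decimal in [0, 1)
        idx = min(bisect_right(bounds, r), len(neighbors) - 1)
        chosen.add(neighbors[idx])
    return chosen


rng = random.Random(0)
picked = select_neighbors(["a", "b", "c"], {"a": 1.0, "b": 0.0, "c": 1.0}, rng)
```

With these assumptions a neighbor of zero similarity owns an empty interval and is never selected, while neighbors of equal similarity are equally likely to be chosen.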
2) Information propagation amount. The amount of information propagated is determined by the information difference between the two nodes, the node similarity, and the connection strength between the nodes. Formally, let $I_{u \to v}$ denote the information propagated from node $u$ to its neighbor node $v$, where $u \in RN(v)$:
$$I_{u \to v} = f(I_u - I_v)\, S_{uv}\, H_{uv}, \tag{8}$$
where $S_{uv}$ is the Jaccard similarity coefficient of nodes $u$ and $v$, and $H_{uv}$ is the connection strength of node $v$ to node $u$. The function $f(\cdot)$ is a coupling function that represents the information propagated between nodes $u$ and $v$. The coupling function $f(\cdot)$ is defined as follows:
The coupling function shows that a node with a larger amount of information more readily propagates to, and influences, a node with a smaller amount of information.
3) Information loss. In order to reflect real-world information dissemination, the information dynamics model also considers the loss of information, which depends on the topological characteristics of the network and on the amount of information transmitted. Let $I_{(u \to v)\_cost}$ denote the information loss, defined as follows:
$$I_{(u \to v)\_cost} = \lambda f(I_u - I_v)\,(1 - S_{uv}), \tag{10}$$
where $\lambda$ is the degree of information loss, defined as follows:
The degree of information loss is determined by the average similarity and the average degree of the local nodes. It is set automatically according to the structural characteristics of the network and requires no manual tuning.
Finally, the information dissemination process is performed iteratively. In each step, each node updates its information based on the information of its neighbor nodes. Taking these interaction patterns together, the information dynamics equation of node $v$ over time is defined as follows:
$$I_{t+1} = I_t + I_{in}, \tag{12}$$
where $I_{t+1}$ is the updated information at time step $t+1$, obtained by adding to the information $I_t$ of the previous time step the new information $I_{in}$ that the node obtains from its neighbor nodes at time step $t+1$. $I_{in}$ consists of two parts, the information actually propagated and the information loss, and is expressed as follows:
$$I_{in} = \sum_{u \in RN(v)} \left( I_{u \to v} - I_{(u \to v)\_cost} \right), \tag{13}$$
where $(I_{u \to v} - I_{(u \to v)\_cost}) \ge 0$. That is, the information owned by a node does not decrease during propagation. Over time, $I_{in}$ tends to zero. Eventually the propagation of information in the network reaches a stable state and the information dynamics converge, so that the community structure of the network is revealed naturally and intuitively by the amount of information of each node.
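The iterative update can be sketched in Python as a synchronous loop that stops once the largest amount of information exchanged in a step falls below a threshold. The sketch is schematic: the coupling function and the loss coefficient are not reproduced in this text, so the gain and loss are passed in as caller-supplied functions, the toy lambdas in the example are placeholders and not the patent's formulas, and the set of propagating neighbors is held fixed instead of being resampled in every step. All names are illustrative.

```python
def step(info, sources, gain, loss):
    """One synchronous update: every node adds its net incoming information."""
    new_info = dict(info)
    largest = 0.0
    for v, nbrs in sources.items():
        for u in nbrs:
            net = gain(info[u], info[v]) - loss(info[u], info[v])
            if net > 0:                 # only non-negative net flow is counted
                new_info[v] += net
                largest = max(largest, net)
    return new_info, largest


def run_until_equilibrium(info, sources, gain, loss,
                          threshold=1e-6, max_iter=10_000):
    """Iterate until the largest single transfer drops below the threshold."""
    for _ in range(max_iter):
        info, largest = step(info, sources, gain, loss)
        if largest < threshold:
            break
    return info


# Placeholder gain/loss (assumptions): proportional to the information gap.
gain = lambda i_u, i_v: 0.5 * max(i_u - i_v, 0.0)
loss = lambda i_u, i_v: 0.1 * max(i_u - i_v, 0.0)
final = run_until_equilibrium({"a": 1.0, "b": 0.0},
                              {"a": ["b"], "b": ["a"]}, gain, loss)
```

In this toy run node `b` climbs toward the information of node `a`, the transfers shrink geometrically, and the two connected nodes end with practically equal information, which is the behavior the text describes for nodes of the same community.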
2.3 dynamic Community discovery framework
Based on the information dynamics model, this section constructs a dynamic community detection framework that comprises two parts: initial-time community detection and incremental community detection. FIG. 2 illustrates the basic principle of dynamic community detection based on information dynamics. First, at time $T_0$, the initial community structure $C_0$ is obtained using information dynamics. Dynamic community detection is then performed at each time $T_i$, where the key is to obtain the changed subgraph $\Delta G_i$ of the network and to detect the changed community structure $\Delta C_i$. The specific procedure of dynamic community detection based on information dynamics is described next.
(1) Initial time community detection
Initial community detection mainly comprises the simulation of information dynamics in the network and the identification of the community structure, as shown in FIG. 3.
1) Information dynamics simulation. The information of each node is calculated according to the information propagation model, in the following steps. At the start time ($t = 0$), each node holds its initial information and no information interaction has taken place. The information then propagates through the network and the nodes interact with one another continually. The amount of information propagated is determined by the information difference between nodes, the node similarity, and the connection strength. The loss incurred during propagation is also taken into account. In each iteration, each node updates its own information based on the information of its neighbor nodes. Finally, driven by the topology, the information in the network reaches a steady state over time and the information of all nodes no longer changes.
2) Community structure identification. After the information dynamics process in the network has converged, nodes in the same community have equal information values, and nodes in different communities have different information values. The community structure of the network is therefore revealed naturally by distinguishing the information values of the nodes.
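The identification step can be sketched as a graph traversal that groups adjacent nodes whose converged information values coincide. The Python sketch below is illustrative: comparing values within a small tolerance `tol` is an assumption introduced because floating-point values rarely match exactly, and the function name is hypothetical.

```python
def communities_from_information(adj, info, tol=1e-6):
    """Group neighboring nodes whose information values are (nearly) equal."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, community = [start], set()
        while stack:                      # depth-first traversal
            v = stack.pop()
            community.add(v)
            for u in adj[v]:
                # Follow an edge only if both ends hold the same information.
                if u not in seen and abs(info[u] - info[v]) <= tol:
                    seen.add(u)
                    stack.append(u)
        result.append(community)
    return result


# Path 1-2-3-4 where the converged values split the path in the middle.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
info = {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.5}
found = communities_from_information(adj, info)
```

Requiring adjacency as well as equal values keeps two distant groups that happen to converge to the same value from being merged into one community.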
(2) Incremental community detection
After the community structure at time $T_0$ has been detected with the information dynamics method, the network of each subsequent time slice (time $T_i$) is processed by incremental community discovery. The community structure at time $T_i$ consists of the community structure left unchanged from the previous time $T_{i-1}$ plus the community structure that changed at time $T_i$, as shown in formula (1). The incremental community discovery process comprises:
1) Extracting the changed subgraph $\Delta G_i$
To obtain the changed community structure $\Delta C_i$, the changed subgraph $\Delta G_i$, which is produced by specific network events, must first be extracted. The network events that may change the community structure are analyzed next: adding nodes, deleting nodes, adding edges, and deleting edges.
(1) Adding nodes
Adding nodes refers to nodes that are newly present in the current time slice network G_i compared with the previous time slice network G_{i-1}. The set of added nodes is the set difference V_i \ V_{i-1}, where V_i denotes the node set of network G_i and V_{i-1} denotes the node set of network G_{i-1}.
If a newly added node lies inside a community, that is, all of its neighbor nodes belong to the same community, the added node does not change the original community structure. As shown in fig. 4, adding a node inside a community increases the connection density inside that community, so the new node only needs to be added to the current community.
If a newly added node does not lie inside a single community, that is, its neighbor nodes belong to different communities, as shown in fig. 5, the new node may change the community structure. In this case the new node and the communities involved are recorded, and the node together with the communities it connects to is added to the set of possibly changed subgraphs ΔG_i.
(2) Deleting nodes
Deleting nodes refers to nodes that are present in the previous time slice network G_{i-1} but have disappeared from the current time slice network G_i. The set of deleted nodes is the set difference V_{i-1} \ V_i. Deleting a node inside a community may change the community structure: as shown in fig. 6, when node 16 is deleted, the original community splits into two communities.
Deleting a node between communities may also change the community structure: as shown in fig. 7, node 5 connects two communities, and deleting node 5 causes the community to split.
Therefore, deleting a node can change the structure whether the node lies inside a community or between communities, so the deleted nodes and the communities involved need to be added to the set of possibly changed subgraphs ΔG_i.
(3) Adding edges
As with adding nodes, adding edges refers to edges that are newly present in the current time slice network G_i compared with the previous time slice network G_{i-1}. The set of added edges is the set difference E_i \ E_{i-1}, where E_i denotes the edge set of network G_i and E_{i-1} denotes the edge set of network G_{i-1}.
Adding an edge inside a community does not change the community structure. As shown in fig. 8, a new edge between nodes 7 and 9 increases the modularity of the community and the clustering coefficient of the nodes. Such a community therefore needs no further processing.
Adding edges between communities may change the community structure. As shown in fig. 9, adding edges between nodes 2, 3 and 9 causes communities to merge, so the added edges and the communities involved need to be added to the set of possibly changed subgraphs ΔG_i.
(4) Deleting edges
Deleting edges refers to edges that are present in the previous time slice network G_{i-1} but have disappeared from the current time slice network G_i. The set of deleted edges is the set difference E_{i-1} \ E_i.
Deleting an edge inside a community may change the community structure. As shown in fig. 10, deleting an internal edge causes the community to split, so the deleted edge and the community involved need to be added to the set of possibly changed subgraphs ΔG_i.
Deleting an edge between communities does not change the communities. As shown in fig. 11, deleting an inter-community edge does not affect the original community structure.
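The four event sets described above reduce to set differences between two consecutive snapshots. A minimal Python sketch follows, assuming each snapshot is given as a collection of nodes plus a list of undirected edges; the function name `network_events` and the dictionary keys are illustrative and do not come from the patent.

```python
def network_events(prev_nodes, prev_edges, cur_nodes, cur_edges):
    """Return the four event sets between snapshots G_{i-1} and G_i."""
    # Store undirected edges as frozensets so (u, v) and (v, u) compare equal.
    prev_e = {frozenset(e) for e in prev_edges}
    cur_e = {frozenset(e) for e in cur_edges}
    prev_v, cur_v = set(prev_nodes), set(cur_nodes)
    return {
        "added_nodes": cur_v - prev_v,    # V_i \ V_{i-1}
        "deleted_nodes": prev_v - cur_v,  # V_{i-1} \ V_i
        "added_edges": cur_e - prev_e,    # E_i \ E_{i-1}
        "deleted_edges": prev_e - cur_e,  # E_{i-1} \ E_i
    }
```

Computing all four sets in one pass matches the batch style of the method: every event of a time slice is collected before any community is re-examined.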
2) Re-detecting the community structure ΔC_i corresponding to subgraph ΔG_i
Subgraph ΔG_i corresponds to the sub-network of time slice T_i that may have changed. The information dynamics method is applied again to ΔG_i to perform incremental community discovery and obtain the corresponding community structure ΔC_i.
3) Calculating the unchanged community structure C'_{i-1}
As described above, the community structure C_{i-1} of the time slice T_{i-1} network has already been obtained. From the possibly changed subgraph ΔG_i of the time slice T_i network found in step 1), the communities that may change can be calculated. Removing these possibly changed communities from the community structure C_{i-1} of the time slice T_{i-1} network yields the unchanged community structure, denoted here C'_{i-1}.
4) Calculating the community structure C_i of time slice T_i
The community structure C_i of time slice T_i is composed of the community structure left unchanged from the time slice T_{i-1} network plus the community structure that changed at time T_i, namely C_i = C'_{i-1} + ΔC_i.
5) Repeating steps 1) to 4) until all time slices have been processed.
Compared with typical real-time incremental algorithms, the incremental method based on information dynamics provided by the invention processes events in batches rather than handling each event as it arrives, which improves the efficiency of community detection. By contrast, when events are processed one at a time, different processing orders may lead to different detection results, and the detection efficiency also suffers.
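One batch update following steps 1) to 4) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: `incremental_step`, the layout of the `events` dictionary and the `detect` callback are hypothetical names, a new node whose neighbors include other new nodes is conservatively sent for re-detection, and `connected_components` is only a placeholder for the information dynamics detector.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Placeholder detector (NOT the CDID method): one community per component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comms = set(), []
    for s in nodes:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comms.append(comp)
    return comms

def incremental_step(prev_comms, cur_nodes, cur_edges, events, detect):
    """Return C_i from C_{i-1} (a list of node sets) and the batch of events."""
    member = {v: c for c, comm in enumerate(prev_comms) for v in comm}
    adj = defaultdict(set)
    for u, v in cur_edges:
        adj[u].add(v)
        adj[v].add(u)
    changed, loose, joined = set(), set(), defaultdict(set)
    for v in events["added_nodes"]:
        nbr = {member.get(w) for w in adj[v]}
        if len(nbr) == 1 and None not in nbr:
            joined[next(iter(nbr))].add(v)   # all neighbors in one community
        else:
            loose.add(v)                     # spans communities: re-detect
            changed |= nbr - {None}
    for v in events["deleted_nodes"]:
        changed.add(member[v])               # inside or between: may split
    for e in events["added_edges"]:
        comms = {member.get(w) for w in e}
        if None not in comms and len(comms) == 2:
            changed |= comms                 # edge between two communities
    for e in events["deleted_edges"]:
        comms = {member.get(w) for w in e}
        if None not in comms and len(comms) == 1:
            changed |= comms                 # edge inside one community
    # Nodes of the possibly changed subgraph (Delta G_i).
    sub_nodes = set(loose)
    for c in changed:
        sub_nodes |= prev_comms[c] | joined[c]
    sub_nodes &= set(cur_nodes)
    sub_edges = [(u, v) for u, v in cur_edges if u in sub_nodes and v in sub_nodes]
    delta = detect(sub_nodes, sub_edges)     # Delta C_i
    unchanged = [set(comm) | joined[c]       # C'_{i-1}
                 for c, comm in enumerate(prev_comms) if c not in changed]
    return unchanged + delta
```

Passing the detector as a callback keeps the batch bookkeeping independent of how the changed subgraph is re-partitioned, so any static method can be substituted for the placeholder.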
2.4 Dynamic community discovery algorithm based on information dynamics
This section describes in detail the dynamic community detection method DCDID based on information dynamics. The implementation mainly comprises four stages: identifying the community structure of the initial stage, computing the changed subgraph, incremental community identification, and community merging.
(1) Initial community identification
The initial community structure is the community division of the network in time window T_0. Since the initial time slice has no prior community structure information, community detection must be performed on the whole network. The invention uses the information-dynamics-based community discovery method CDID to identify the community structure of the initial network at T_0; the implementation of the CDID method is given in flow 1.
As shown in fig. 27, the specific process of flow 1 is:
1) Input graph G = (V, E);
2) Initializing the information I_v of each node v in the network according to formula (4), and calculating the Jaccard similarity coefficient S_uv and the connection strength H_uv between nodes according to formulas (2) and (3);
3) Calculating the average similarity avg_s(v) and the average degree avg_d(v) of the neighbor nodes of node v;
4) Simulating the information dynamics interaction between the nodes of the network with the information dynamics model until an equilibrium state is reached. Specifically, formulas (8) to (12) are iterated in a loop as the information dynamics model to simulate information propagation between nodes; when the amount of information propagated between every pair of neighbor nodes is smaller than a threshold, information propagation in the network is considered to have reached equilibrium and the loop ends;
5) Dividing communities according to the information amounts of the nodes: the nodes are traversed iteratively in depth-first order, neighbor nodes with the same information amount are placed in the same community, and nodes with different information are placed in different communities;
6) Outputting the community structure C.
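Step 5) of flow 1 can be sketched as a depth-first traversal that only crosses edges whose endpoints hold the same information value. The sketch assumes the converged values are already available in a dictionary; the function name `communities_from_information` and the tolerance `tol`, which absorbs floating-point noise, are assumptions and are not part of the patent.

```python
import math
from collections import defaultdict

def communities_from_information(info, edges, tol=1e-6):
    """Group neighboring nodes whose converged information values agree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    visited, comms = set(), []
    for start in info:
        if start in visited:
            continue
        visited.add(start)
        comm, stack = set(), [start]
        while stack:                      # iterative depth-first traversal
            u = stack.pop()
            comm.add(u)
            for w in adj[u]:
                same = math.isclose(info[u], info[w], rel_tol=0.0, abs_tol=tol)
                if w not in visited and same:
                    visited.add(w)
                    stack.append(w)
        comms.append(comm)
    return comms
```

Two nodes with equal values end up in the same community only if a path of equal-valued neighbors connects them, which matches the requirement that neighbor nodes with the same information amount are grouped together.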
(2) Changed subgraph
The invention uses an incremental detection method, so the changed subgraph must be obtained before incremental detection is carried out. The operations that may change the community structure are analyzed, and the events that change the network are divided into four types: adding nodes, deleting nodes, adding edges and deleting edges. Each type of event returns a subgraph that may change.
As shown in fig. 28, the specific process of flow 2 is:
1) Input the time series graph DG = {G_0, G_1, G_2, …, G_t};
2) Calculating the initial community structure C_0 according to flow 1 (CDID);
3) Calculating the community structure of each time slice in a loop, specifically:
(1) calculating the added nodes, deleted nodes, added edges and deleted edges of the current time slice network according to formulas (14) to (17);
(2) calculating the changed subgraph ΔG_i caused by the added and deleted nodes and the added and deleted edges;
4) Calculating the changed communities ΔC_i from subgraph ΔG_i using flow 1;
5) Calculating the unchanged communities C'_{i-1} from ΔG_i and C_{i-1};
6) Merging the changed communities ΔC_i with the unchanged communities C'_{i-1} to obtain the community structure C_i of the current time slice;
7) Repeating 3) to 6) to calculate the communities of the next time slice.
(3) Incremental community detection
Most current incremental dynamic community detection methods, after obtaining the initial communities, process single events at fine granularity: each event is handled as soon as it occurs. For example, when a node is added to the network, the change in community structure caused by that node is processed. The benefit of this design is real-time processing; the drawbacks are higher computational cost and the fact that different event-processing orders may affect the quality of community detection. The invention therefore uses a batch incremental community detection method. After the subgraph whose structure may have changed is obtained, the information-dynamics-based community detection method is applied again to the subgraph ΔG_i to obtain the changed community structure ΔC_i. See flow 2 for the specific implementation.
(4) Community merging
From the possibly changed subgraph and the known community structure of the previous time window, the unaffected community structure C'_{i-1} can be calculated. It is merged with the new community division ΔC_i of the changed subgraph to obtain the community structure C_i of the current time window. See flow 2 for the specific implementation.
3. Experiment
3.1 Comparison algorithms
QCA is a modularity optimization algorithm based on Louvain. QCA adaptively updates and discovers new community structures according to the changes in network structure (adding or removing edges or nodes) and the network information of previous time slices.
FacetNet is a method for analyzing community structure and its evolution in dynamic networks based on non-negative matrix factorization. The quality of the detected community structure is optimized by introducing a loss function. FacetNet requires parameters such as the number of communities to be set.
DYNMOGA is a multi-objective genetic algorithm based on evolutionary clustering that detects community structures in dynamic networks by optimizing modularity and NMI. DYNMOGA also requires parameter settings.
DyPerm is an optimization method based on stability and also belongs to the incremental dynamic community detection methods. DyPerm requires the real community structure to be specified at the beginning.
InBatch is an incremental batch method for community detection in dynamic networks. It is also based on the Louvain method and processes the changed structure in batches rather than handling each node or edge change event in real time.
LBTR is an incremental dynamic community detection method based on machine learning. It also uses the Louvain method to obtain the initial community structure and then applies machine learning for classification, prediction and correction.
3.2 Experimental data
To evaluate the proposed DCDID algorithm comprehensively, its dynamic community detection performance is evaluated on both generated networks and real network data sets. The two kinds of data sets are briefly introduced below.
(1) Generated networks
Greene et al. proposed an extended LFR generation model for generating dynamic networks. The model provides several parameters that control the dynamic events in the network, for example nodes switching between communities across time slices, the birth and death of communities, the growth and contraction of communities, and the merging and splitting of communities. The invention uses this model to generate the data sets for evaluating dynamic community detection quality; the parameters of the benchmark generation model are described in table 2.
Table 2. Parameter description of the dynamic LFR benchmark generation model
Symbol      Description
n           Number of nodes of the generated network (n = |V|)
s           Number of time slices of the generated network
μ           Mixing parameter controlling how clearly the community structure is defined
k           Average degree of each time slice network
maxk        Maximum node degree of each time slice network
C_min       Minimum community size in each time slice network
C_max       Maximum community size in each time slice network
p           Probability of nodes transitioning between communities across time slice networks
birth       Number of community birth events in each time slice
death       Number of community death events in each time slice
expand      Number of community growth events in each time slice
contract    Number of community contraction events in each time slice
r           Ratio of growth events to contraction events in each time slice
merge       Number of community merging events in each time slice
split       Number of community splitting events in each time slice
(2) Real network
Real network data sets allow a more realistic evaluation. The invention selects several representative real networks of different scales, all of which also contain real community division information. Table 3 lists the characteristic properties of these networks: the average number of nodes per time slice, the average number of edges, the average degree, the average clustering coefficient, and the number of time slices S of the dynamic network. All of these data sets can be downloaded from the relevant websites. Specifically, the HSD11, HSD12, PS and CW data sets are available from the SocioPatterns website (http://www.sociopatterns.org/datasets/), the CC and NCC data sets from the AMiner citation network website (https://www.aminer.cn/citation), and the CPC data set from the IEEE VAST 2008 Challenge website (http://www.cs.umd.edu/hcil/VASTchallenge08/). These real-world network data sets are briefly introduced next.
2011 high school dynamic contact network (HSD11): the data set contains a temporal network of contacts between students of three classes at a high school in Marseille, France, collected in December 2011. Nodes represent students, edges represent contacts between students, and the classes represent the real community division.
2012 high school dynamic contact network (HSD12): this data set is also a temporal network of contacts between students of the same French high school. It was collected in November 2012 and contains five classes; its other information is consistent with the description of HSD11.
Table 3. Statistical properties of the real networks
Primary school contact network (PS): the data set contains a temporal contact network between children and teachers. Each child or teacher has an ID that represents a node, and a contact between two IDs represents an edge. Each 20-second interval serves as a time window.
Contact network in a workplace (CW): the data set comprises a temporal network of contacts between individuals measured in an office building in France between June 24 and July 3, 2013 [250]. The network contains 5 departments as the real community information, and contacts between people are recorded at 20-second intervals.
Cumulative co-authorship network (CC): the data set is a paper collaboration network derived from the citation database compiled by Chakraborty et al. [251]. The data set used in the invention was organized and modified in reference [252]. Nodes represent authors who have published papers, and an edge connects two authors if they have published a paper together. The research field a paper belongs to serves as the real community information.
Non-cumulative co-authorship network (NCC): like CC, this data set is a paper collaboration network. CC accumulates the changes of each node and edge, that is, if the same two authors collaborate on several papers, several edges are generated between them. The NCC data set is the opposite: only one edge is kept no matter how many times the two authors collaborate.
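The difference between the cumulative and non-cumulative variants can be illustrated with a small sketch. It assumes the input is simply a list of author lists, one per paper; the function name `coauthor_edges` and this input layout are hypothetical and are not taken from the data sets themselves.

```python
from collections import Counter
from itertools import combinations

def coauthor_edges(papers, cumulative):
    """Map each co-author pair to its edge multiplicity."""
    edges = Counter()
    for authors in papers:
        # One edge per unordered pair of distinct authors of the paper.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    if not cumulative:
        # NCC-style: keep a single edge however often a pair collaborates.
        edges = Counter(dict.fromkeys(edges, 1))
    return edges
```

With `cumulative=True` a pair that wrote two papers together gets multiplicity 2 (CC-style); with `cumulative=False` the same pair gets multiplicity 1.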
3.3 Evaluation indexes
Dynamic community detection is currently evaluated by assessing the detection result of each time window one by one, using the same evaluation methods as for community detection on a single static network. The invention uses two widely used evaluation indexes, normalized mutual information (NMI) and the adjusted Rand index (ARI), on the generated network data sets and on the real network data sets with community information.
(1) Normalized mutual information (NMI) [129]
Normalized mutual information is a similarity measure derived from information theory. It is based on the idea that the more similar two partitions are, the less additional information is needed to infer one partition from the other. It is defined in terms of the mutual information between the two partitions and their entropies,
where I(X;Y) denotes the mutual information between X and Y and H(X) denotes the entropy of X. NMI ranges from 0 to 1: NMI = 0 when the predicted community division is completely independent of the real community division, and NMI = 1 when the predicted division matches the real division exactly.
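As an illustration, NMI can be computed from two label sequences as follows. The sketch assumes the arithmetic-mean normalization 2·I(X;Y) / (H(X) + H(Y)), which is one common choice; the normalization used in the cited reference may differ, and the function name `nmi` is illustrative.

```python
import math
from collections import Counter

def nmi(true_labels, pred_labels):
    """Normalized mutual information, assuming 2*I(X;Y) / (H(X) + H(Y))."""
    n = len(true_labels)
    cx, cy = Counter(true_labels), Counter(pred_labels)
    cxy = Counter(zip(true_labels, pred_labels))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    hx, hy = entropy(cx), entropy(cy)
    # Mutual information from the joint and marginal label counts.
    mi = sum(c / n * math.log(c * n / (cx[x] * cy[y]))
             for (x, y), c in cxy.items())
    if hx + hy == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mi / (hx + hy)
```

The value does not depend on how the communities are numbered, so a predicted division that only permutes the labels of the real division still scores 1.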
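(2) Adjusted Rand index (ARI). The ARI is another measure of similarity between two clusterings and is defined as follows:
$$\mathrm{ARI} = \frac{\mathrm{RI} - E[\mathrm{RI}]}{\max(\mathrm{RI}) - E[\mathrm{RI}]}$$
where RI is a similarity measure between two partitions that considers all pairs of samples. It evaluates the quality of community detection by counting the pairs of nodes that are assigned to the same community or to different communities in both the predicted and the real community divisions [210]. In concrete form,
$$\mathrm{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}$$
where n_ij, a_i and b_j are values in the contingency table: n_ij is the number of nodes shared by real community i and predicted community j, a_i and b_j are the corresponding row and column sums, and n is the total number of nodes.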
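The ARI can be computed directly from the contingency table of the two divisions. The sketch below uses the standard pair-counting form of the index and assumes both divisions are given as label sequences of equal length; the function name `ari` is illustrative.

```python
from collections import Counter
from math import comb

def ari(true_labels, pred_labels):
    """Adjusted Rand index computed from contingency-table pair counts."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no node pairs to compare
    # Pairs placed together in both divisions (cells n_ij).
    sum_ij = sum(comb(c, 2)
                 for c in Counter(zip(true_labels, pred_labels)).values())
    # Pairs placed together in each division (row sums a_i, column sums b_j).
    sum_a = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both divisions are one community
    return (sum_ij - expected) / (max_index - expected)
```

Unlike NMI, the ARI is corrected for chance agreement, which is why a method that splits communities too finely can keep a moderate NMI while its ARI falls towards 0.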
3.4 Analysis and discussion of experimental results
In the performance evaluation experiments of this section, any parameters of the comparison algorithms are set to the values recommended by their authors, and the community detection results of all algorithms are averaged over 10 independent runs. All comparison experiments were run on a desktop computer with an Intel Core i5 CPU at 3.3 GHz and 16 GB of memory.
(1) Evaluation on generated networks.
The changes of community structure in a dynamic network mainly include node transitions between communities, the birth and death of communities, the growth and contraction of communities, and the merging and splitting of communities. To evaluate the proposed dynamic community detection method, the invention uses the dynamic LFR benchmark model to generate several synthetic networks with different characteristics. To cover these kinds of community structure change, four aspects are evaluated: node transitions between communities, birth and death of communities, growth and contraction of communities, and merging and splitting of communities. The generation parameters for these scenarios are: number of time slices s = 20, number of nodes n = 1000, average degree k = [5-25], maximum degree maxk = [20-50]. See table 2 for the description of the dynamic LFR benchmark model parameters. DyPerm takes the real community information as the community information of the initial time slice, so its detection result on the initial time slice is excluded from the comparison.
1) Node transitions between communities.
Inter-community node transitions refer to nodes moving from one community to another across the time slices of a dynamic network. In the dynamic LFR model, the probability of a node transitioning between communities is given by the parameter p. The influence of the transition probability p, the average degree k and the mixing parameter μ on community detection is tested separately. First, k = 10, maxk = 20 and μ = 0.1 are fixed and p is varied from 0.1 to 0.8. Fig. 12 shows the NMI of the comparison algorithms for p = 0.1, 0.4 and 0.8. The proposed DCDID algorithm and DYNMOGA perform best, with NMI values of about 0.95 on every time slice, which shows that the community detection performance of DCDID remains stable as the network evolves. FacetNet also performs well, with NMI values up to 0.9; however, it requires the number of communities to be specified in advance, which is usually unknown for real-world networks. DyPerm also shows very stable NMI values, staying at around 0.8. QCA, InBatch and LBTR achieve very high NMI values on the initial time slice because they all use the Louvain method to detect the communities of the initial window. Their performance then gradually declines over time, which indicates that each detection relies on the previously detected community structure, so errors accumulate and grow. For p = 0.4 and p = 0.8, the NMI value obtained by InBatch approaches 0 after time slice 8.
Fig. 13 shows the ARI of the comparison algorithms. DCDID performs best and obtains the highest ARI values. DYNMOGA and FacetNet also perform well and achieve high ARI values. Although DyPerm obtains good NMI scores in fig. 12, its ARI scores in fig. 13 are very low; in particular, for p ≥ 0.4 its ARI is close to 0 after time slice 10. The reason is that DyPerm's community division becomes finer and finer as the time slices advance, until the number of communities it produces is tens of times the number of real communities. The ARI values of QCA, InBatch and LBTR also drop markedly as the time window advances; for p ≥ 0.4 their ARI scores are close to 0 after time slice 10.
Next, k = 10, maxk = 20 and p = 0.1 are fixed and the mixing parameter μ is varied from 0.1 to 0.8. Fig. 14 and 15 show the NMI and ARI of each comparison algorithm for μ = 0.2, 0.4 and 0.8. Here μ starts from 0.2 because the results for μ = 0.1 are already given in fig. 12(a) and fig. 13(a). It can be seen from fig. 14 and 15 that DCDID achieves the best NMI and ARI values for μ = 0.2 and μ = 0.4, while DyPerm achieves the best NMI value when μ reaches 0.8. Fig. 14 shows that the NMI of DyPerm stays stable at 0.8 as μ changes from 0.2 to 0.8, but fig. 15 shows that its ARI is close to 0 after time slice 8, because the algorithm divides communities too finely. DYNMOGA and FacetNet perform well and also achieve high NMI and ARI values. As the time slices advance, the NMI and ARI values of QCA, InBatch and LBTR gradually decrease, with the ARI values dropping especially sharply.
To evaluate the effect of the average degree of the network on the performance of each comparison algorithm, maxk = 20, p = 0.1 and μ = 0.1 are fixed and the average degree k is varied from 5 to 25. Fig. 16 and 17 show the NMI and ARI of each algorithm for average degrees of 5, 15 and 25. The two figures show that DCDID, DYNMOGA and FacetNet detect communities well on networks with different average degrees, obtaining the highest NMI and ARI values among the comparison algorithms. DyPerm obtains relatively stable NMI values, but its ARI values are very low. QCA also achieves good NMI and ARI values for time slices below 10. As the time window advances, the performance of QCA, InBatch and LBTR gradually declines.
2) Birth and death of communities.
To further test how well the algorithms recognize community birth and death events in dynamic networks, the parameters k = 10, maxk = 20, p = 0.1 and μ = 0.1 are fixed and the numbers of born and dead communities are varied from 2 to 16. Each time slice of the generated dynamic networks contains around 80 communities. Fig. 18 and 19 show the detection performance of each algorithm for different numbers of born and dead communities. The dynamic LFR model cannot generate a network when the numbers of born and dead communities both reach 16, so the maximum number of dead communities is set to 8. DCDID and DYNMOGA identify communities well for all tested numbers of born and dead communities, with NMI values essentially stable at about 0.93. FacetNet cannot run on these dynamic networks because the number of communities changes over time, and is therefore not shown in fig. 18 and 19. DyPerm also obtains good NMI values, always stable at about 0.8, but its ARI values remain very low; in particular, when the number of born and dead communities exceeds 8 and the time slice exceeds 12, its ARI is essentially 0. QCA and LBTR also achieve good NMI values: even when the number of born communities reaches 16, the two algorithms still reach NMI values of 0.5. InBatch performs poorly in this set of experiments, with the lowest NMI and ARI values.
3) Expansion and contraction of communities.
The invention next evaluates how well the algorithms detect the expansion and contraction of communities in dynamic networks. The parameters k = 10, maxk = 20, p = 0.1 and μ = 0.1 are fixed, and the number of community expansions and contractions in the dynamic network is varied from 5 to 40. Fig. 20 and 21 show the detection results when the number of community expansions and contractions is 5, 20 and 40. DCDID and DYNMOGA both achieve the highest community detection quality. The NMI of DCDID is stable at about 0.95 and its ARI is stable between 0.8 and 0.9. FacetNet also shows stable detection, with NMI values around 0.84. The NMI of DyPerm is stable at about 0.8, but its ARI is relatively low, which means that its community detection quality is not high. As time evolves, the detection quality of QCA, InBatch and LBTR gradually declines, which indicates that these algorithms suffer from high accumulated error.
4) Merging and splitting of communities.
To further evaluate the comparison algorithms under community merging and splitting events, the number of community merges and splits in the network is varied from 5 to 40 while the parameters k = 10, maxk = 20, p = 0.1 and μ = 0.1 are kept constant. Fig. 22 and 23 show the community recognition results of each comparison algorithm when the number of merges and splits is 5, 20 and 40. DCDID and DYNMOGA perform best on the generated networks among the comparison algorithms. When the number of merges and splits is below 20, their detection quality is high and stable, with the NMI of each time slice staying at about 0.96. Fig. 22(c) and 23(c) show that when the number of merges and splits reaches 40, that is, when essentially all communities in the network change, the detection quality of DCDID, DYNMOGA and FacetNet also becomes unstable. The NMI and ARI values are lower at time slice 7, but the detection quality of these three algorithms is still higher than that of the other comparison algorithms. DyPerm is also fairly stable when the number of merges and splits is below 20, with NMI at about 0.8, but becomes unstable when the number increases to 40. The community division quality of QCA, InBatch and LBTR gradually declines as the time slices advance. Interestingly, when the number of merges and splits is increased to 40, the NMI of InBatch is instead higher at time slice 13, as shown in fig. 22(c).
Analysis of the network at this time slice shows that its community structure is clear, with few edges between communities. This may also be why the NMI values of DCDID, DYNMOGA and FacetNet peak at this time slice.
(2) Evaluation on a real network.
The above experiments evaluate the performance of each algorithm in generating community detection on a dynamic network, and then the invention further evaluates the DCDID algorithm on a real network. The invention adopts six real data sets, and all the data sets contain real community division information, so the invention still adopts NMI and ARI indexes for evaluation. Figures 24 and 25 show NMI and ARI performance of the comparison algorithms on the 6 real networks. It can be seen that DCDID and DyPerm achieved the best quality of community detection on CC and NCC networks, with NMI values around 0.5. The ARI index value of each comparison algorithm is very low on both networks, but the DCDID method achieves a higher ARI value than the other algorithms. As can be seen from table 1, the average of the two networks is low, and the average value of each time slice is 4.3. This demonstrates that DCDID can still achieve better community division quality on low average networks, which further verifies the experimental results in the generated network. Because the two networks contain a large number of nodes, up to 10 ten thousand nodes, the two algorithms DYNMOGA and facenet cannot run on the two networks. The QCA algorithm also achieves more reasonable performance on the two real networks, which is superior to the InBatch and LBTR methods. The community detection effect of the InBatch and LBTR methods on CC and NCC dynamic networks is not ideal and may be related to the low average of the two networks.
The InBatch method performs well on CW, HSD11 and HSD12 dynamic networks, which achieves the best community detection quality among the several comparison algorithms. The DCDID algorithm also achieves a good community detection effect on the three networks, and is superior to the DYNMOGA, QCA, dyPerm algorithm. The best community detection quality is achieved on PS networks compared to other algorithms DCDID and LBTR. The QCA method also achieves better results on the network, with NMI values around 0.7. The DyPerm algorithm has an unsatisfactory community detection effect on CW, HSD11, HSD12 and PS networks, and the obtained NMI and ARI index values are lower than those of other comparison algorithms.
In general, the large number of comparison experiments on generated networks show that the DCDID method provided by the invention obtains good community detection quality under different network event states. The evaluation on real networks shows that DCDID obtains good community identification results not only on networks with a low average degree but also on the other networks. The DyPerm algorithm is stable overall on the generated networks, but it divides communities too finely, resulting in very low ARI values; in addition, it takes the real community information as its initial information, which is unknown in most real networks. The DYNMOGA and FaceNet methods perform well on the generated networks, but their space complexity is high and they cannot run on the real networks CC and NCC. In addition, FaceNet requires prior knowledge, namely the number of communities to be detected. QCA, InBatch and LBTR all use the Louvain method, a well-known and efficient static community detection method, to detect the community structure of the initial time slice, so their initial detection quality is very high. On the generated networks, however, their performance degrades markedly as the network evolves, and the accumulated errors of the three algorithms are pronounced, so their community detection results are not ideal. Nevertheless, QCA, InBatch and LBTR achieve good community detection performance on real networks; in particular, InBatch achieves the best detection quality on the three real networks CW, HSD11 and HSD12.
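By way of non-limiting illustration, the NMI and ARI indexes used in the above evaluation may be computed from two flat labelings of the same nodes as sketched below. The function names `ari` and `nmi` are hypothetical, and the sketch assumes that NMI is normalized by the arithmetic mean of the two entropies; the text does not state which normalization the experiments used.

```python
from collections import Counter
from math import comb, log


def ari(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same nodes."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # a single node admits only one partition
    joint = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)


def nmi(labels_true, labels_pred):
    """Normalized mutual information (assumed arithmetic-mean normalization)."""
    n = len(labels_true)
    count_a = Counter(labels_true)
    count_b = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information of the two labelings.
    mi = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    # Entropy of each labeling.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in one community
    return 2 * mi / (h_a + h_b)
```

Both indexes depend only on how the nodes are grouped, not on the label names, so a detected partition can be compared directly with the ground-truth division of each time slice.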
(3) Run-time analysis
In order to compare the community detection efficiency of DCDID and the other algorithms on dynamic networks of different sizes, a dynamic LFR benchmark model is used to generate several dynamic networks of different sizes. Specifically, the parameters are set to k = [10-20], maxk = [20-50], p = 0.1, μ = 0.1 and s = 5, and the number of nodes n is varied from 1,000 to 1,000,000. Fig. 26 shows the total running time of each algorithm on the generated networks with 5 time slices. DyPerm has the highest running time, taking more than 10 days when the number of nodes reaches 50,000. The inset in Fig. 26 shows the running time of the algorithms other than DyPerm when the number of nodes does not exceed 10,000. The running times of FaceNet and DYNMOGA are higher than that of DCDID, and when the number of nodes reaches 50,000, FaceNet and DYNMOGA report insufficient memory and cannot run. In general, DCDID is faster than DyPerm, FaceNet and DYNMOGA, and this advantage becomes more pronounced as the network size increases, because DCDID has a lower time complexity. DCDID has a higher time complexity and a longer running time than QCA, InBatch and LBTR. Although these three algorithms require less time than DCDID, their community detection quality is not as high as that of DCDID.
4. Summary
The invention provides a dynamic community detection framework based on information dynamics, which first identifies the community structure of the initial time window with a community detection method based on information dynamics, and then detects the community structure of each subsequent time window incrementally. The framework processes each time slice in batch form and only computes on the local subgraph whose structure may have changed, which gives it a fast processing speed. Based on this framework, the invention designs a dynamic network community identification algorithm, DCDID, which detects the community structure based on the idea of information dynamics. Since only a small number of local subgraphs change in a dynamic network, the information exchange within these subgraphs reaches a convergence state quickly. The time complexity of DCDID for a subsequent time slice is $O(|\Delta E_i| + L \cdot |\Delta V_i| \cdot k_i)$. Because the number of changed edges $|\Delta E_i|$, the number of changed nodes $|\Delta V_i|$, the average degree $k_i$ and the number of iterations $L$ are all small, the detection method is fast and can be used for large-scale networks. Finally, the invention carries out a comprehensive experimental evaluation on networks generated with the dynamic LFR model and on real networks. The experimental results show that DCDID identifies the community structure of dynamic networks well and outperforms the comparison algorithms on most networks.
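By way of non-limiting illustration, one incremental step of such a framework may be sketched as follows. All function names are hypothetical. The sketch makes two assumptions that the text does not confirm: the added and deleted node and edge sets are plain set differences between consecutive snapshots, and a community of the previous time slice counts as unchanged when none of its members is touched by a change. The detector applied to the changed subgraph is passed in as a callable; `connected_components` is only a stand-in for it and is not the information dynamics method.

```python
def edge_key(u, v):
    """Canonical form of an undirected edge."""
    return (u, v) if u <= v else (v, u)


def changed_sets(prev_nodes, prev_edges, cur_nodes, cur_edges):
    """Added and deleted nodes/edges, assumed to be plain set differences."""
    prev_e = {edge_key(*e) for e in prev_edges}
    cur_e = {edge_key(*e) for e in cur_edges}
    return {
        "added_nodes": set(cur_nodes) - set(prev_nodes),
        "deleted_nodes": set(prev_nodes) - set(cur_nodes),
        "added_edges": cur_e - prev_e,
        "deleted_edges": prev_e - cur_e,
    }


def incremental_step(prev_communities, delta, cur_nodes, cur_edges, detect):
    """Keep untouched communities and re-detect only on the remaining subgraph."""
    touched = set(delta["added_nodes"]) | set(delta["deleted_nodes"])
    for u, v in delta["added_edges"] | delta["deleted_edges"]:
        touched.update((u, v))
    # Assumption: a community is unchanged if no member was touched.
    unchanged = [c for c in prev_communities if not (c & touched)]
    kept = set().union(*unchanged)
    # The changed subgraph: every current node outside the unchanged communities.
    sub_nodes = set(cur_nodes) - kept
    sub_edges = [e for e in cur_edges if e[0] in sub_nodes and e[1] in sub_nodes]
    return unchanged + detect(sub_nodes, sub_edges)


def connected_components(nodes, edges):
    """Stand-in detector for the demo only (NOT the information dynamics model)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        components.append(comp)
    return components
```

Because only the nodes outside the unchanged communities are passed to the detector, the work per time slice scales with the size of the change rather than with the size of the whole network, which is the property the complexity expression above relies on.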

Claims (7)

1. A dynamic community detection method based on information dynamics, characterized in that the method comprises the following steps:
S1, initial community identification:
S11, inputting an undirected network graph $G = (V, E)$;
S12, initializing the information $I_v$ of each node $v$ in the network, and calculating the Jaccard similarity coefficient $S_{uv}$ and the connection strength $H_{uv}$ between nodes;
S13, calculating the average similarity avg_S(v) and the average degree avg_D(v) of the neighbor nodes of node $v$;
S14, simulating the information dynamics interaction process among the nodes in the network by using an information dynamics model until an equilibrium state is reached;
S15, performing community division according to the information quantity of the nodes, wherein neighbor nodes with the same information quantity are assigned to the same community, and nodes with different information quantities are assigned to different communities;
S16, outputting the initial community $C$;
S2, incremental community detection:
S21, extracting the changed subgraph $\Delta G_i$;
S22, detecting the community $\Delta C_i$ corresponding to the subgraph $\Delta G_i$;
S23, calculating the unchanged community $C_{i-1}$;
S24, calculating the community $C_i$ of time slice $T_i$;
S25, repeating steps S21 to S24 until all time slices $T_i$ have been processed;
in step S14, the information exchange between nodes is simulated by iterating the information dynamics model in a loop; when the quantity of information passed between any neighboring nodes is smaller than a threshold value, the information among the nodes in the network is considered to have reached an equilibrium state, and the loop ends;
The information dynamics model is constructed according to the information propagation probability, the information propagation quantity and the information loss;
the propagation probability of the information is calculated as follows:
wherein the formula gives the probability of information propagation between node $u_i$ and node $v$; $N(v)$ is the set of neighbor nodes of node $v$, not including node $v$ itself; $u_i \in N(v)$; and $S_{u_i v}$ is the Jaccard similarity coefficient between $u_i$ and $v$;
in order to simulate the real propagation process, neighboring nodes are selected for propagation according to this probability; let $RN(v)$ denote the set of nodes selected for propagation, which is defined as follows:
wherein the interval term in the formula denotes the probability interval of each selected node, and is defined as follows:
where $\omega$ is the number of neighbor nodes of node $v$, i.e., $|N(v)| = \omega$;
the propagation quantity of the information is calculated as follows:
$I_{u \to v} = f(I_u - I_v)\, S_{uv}\, H_{uv}$
wherein $I_{u \to v}$ is the information quantity obtained by node $u$ from its neighbor node $v$, with $u \in RN(v)$; $S_{uv}$ is the Jaccard similarity coefficient of nodes $u$ and $v$; $H_{uv}$ is the connection strength between node $v$ and node $u$; $f(\cdot)$ is a coupling function representing the amount of information that propagates between nodes $u$ and $v$; the coupling function $f(\cdot)$ is defined as follows:
the loss quantity of the information is calculated as follows:
$I_{(u \to v)\_cost} = \lambda\, f(I_u - I_v)\,(1 - S_{uv})$;
wherein $I_{(u \to v)\_cost}$ represents the quantity of information lost, and $\lambda$ is the degree of information loss, which is defined as follows:
when the information dynamics model is iterated in a loop, the information dynamics equation of node $v$ over time is defined as follows:
$I_{t+1} = I_t + I_{in}$
wherein $I_{t+1}$ denotes the updated information at time step $t+1$, obtained by adding to the information $I_t$ of the previous time step the new information $I_{in}$ obtained from the neighbor nodes at time step $t+1$; $I_{in}$ is defined as follows:
wherein $(I_{u \to v} - I_{(u \to v)\_cost}) \geq 0$;
wherein the undirected network graph is a time-series network of links among the students of school classes, in which the nodes represent students, an edge indicates that a link exists between two students, and the classes represent the real community division,
or the undirected network graph is a contact time-series network between children and teachers, in which each child or teacher corresponds to an ID that represents a node, and a contact between two IDs represents an edge,
or the undirected network graph is an article collaboration network, in which the nodes represent authors who have published articles, and an edge exists between two authors if they have published an article together,
or the undirected network graph is a paper collaboration network, in which the nodes represent authors who have published papers, and the same two authors share only one edge no matter how many times they have collaborated.
2. The dynamic community detection method based on information dynamics as claimed in claim 1, wherein: in step S12, the information $I_v$ of node $v$ in the network is initialized according to the following formula:
wherein $D_v$ is the degree of node $v$, and $D_{max}$ is the maximum degree of the undirected network $G$.
3. The dynamic community detection method based on information dynamics as claimed in claim 2, wherein: the Jaccard similarity coefficient $S_{uv}$ between nodes is calculated in step S12 according to the formula:
$S_{uv} = \dfrac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$
wherein $\Gamma(u)$ is the set of neighbor nodes of node $u$, including node $u$ itself, and $\Gamma(v)$ is the set of neighbor nodes of node $v$, including node $v$ itself.
4. The dynamic community detection method based on information dynamics as claimed in claim 3, wherein: the connection strength $H_{uv}$ between nodes is calculated in step S12 according to the formula:
wherein $T_u$ is the number of triangles owned by node $u$, $N(u)$ is the set of neighbor nodes of node $u$, not including node $u$ itself, and $N(v)$ is the set of neighbor nodes of node $v$, not including node $v$ itself.
5. The dynamic community detection method based on information dynamics as claimed in claim 1, wherein: the changed subgraph $\Delta G_i$ in step S21 comprises added nodes, deleted nodes, added edges and deleted edges; the set of added nodes is expressed as:
wherein $V_i$ denotes the node set of network $G_i$, and $V_{i-1}$ denotes the node set of network $G_{i-1}$;
the set of deleted nodes is expressed as:
wherein $V_i$ denotes the node set of network $G_i$, and $V_{i-1}$ denotes the node set of network $G_{i-1}$;
the set of added edges is expressed as:
wherein $E_i$ denotes the edge set of network $G_i$, and $E_{i-1}$ denotes the edge set of network $G_{i-1}$;
the set of deleted edges is expressed as:
wherein $E_i$ denotes the edge set of network $G_i$, and $E_{i-1}$ denotes the edge set of network $G_{i-1}$.
6. The dynamic community detection method based on information dynamics as claimed in claim 5, wherein: in step S22, the community $\Delta C_i$ corresponding to the subgraph $\Delta G_i$ is detected by repeating steps S11 to S15.
7. The dynamic community detection method based on information dynamics as claimed in claim 6, wherein: the community $C_i$ in step S24 is calculated according to the formula:
$C_i = C_{i-1} + \Delta C_i$
wherein $C_{i-1}$ is the unchanged community, and $\Delta C_i$ is the community corresponding to the changed subgraph $\Delta G_i$.
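By way of non-limiting illustration, the per-edge quantities named in claims 1 and 3 may be sketched as follows. The function names are hypothetical. The definitions of the coupling function f(·) and of the loss degree λ are not legible in this text, so the sketch takes both as caller-supplied parameters instead of guessing them. Clamping a negative net amount to zero is likewise an assumption about how the condition $(I_{u \to v} - I_{(u \to v)\_cost}) \geq 0$ is enforced.

```python
def closed_neighborhood(adj, u):
    """Gamma(u): the neighbors of u together with u itself."""
    return adj[u] | {u}


def jaccard(adj, u, v):
    """Jaccard similarity S_uv over closed neighborhoods."""
    gamma_u = closed_neighborhood(adj, u)
    gamma_v = closed_neighborhood(adj, v)
    return len(gamma_u & gamma_v) / len(gamma_u | gamma_v)


def propagated(i_u, i_v, s_uv, h_uv, f):
    """Propagation quantity f(I_u - I_v) * S_uv * H_uv; f is supplied by the caller."""
    return f(i_u - i_v) * s_uv * h_uv


def lost(i_u, i_v, s_uv, lam, f):
    """Loss quantity lam * f(I_u - I_v) * (1 - S_uv); lam is supplied by the caller."""
    return lam * f(i_u - i_v) * (1.0 - s_uv)


def net_transfer(i_u, i_v, s_uv, h_uv, lam, f):
    """Propagated minus lost information, clamped at zero (assumption)."""
    net = propagated(i_u, i_v, s_uv, h_uv, f) - lost(i_u, i_v, s_uv, lam, f)
    return net if net >= 0 else 0.0
```

For example, with the adjacency `{1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}`, nodes 1 and 2 have identical closed neighborhoods and therefore similarity 1, while nodes 3 and 4 share two of four nodes and have similarity 0.5. Any concrete f passed to these functions, such as the identity, is a placeholder and not the coupling function of the invention.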
CN202010178455.8A 2020-03-14 2020-03-14 Dynamic community detection method based on information dynamics Active CN111382318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010178455.8A CN111382318B (en) 2020-03-14 2020-03-14 Dynamic community detection method based on information dynamics

Publications (2)

Publication Number Publication Date
CN111382318A CN111382318A (en) 2020-07-07
CN111382318B true CN111382318B (en) 2024-02-02

Family

ID=71215360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010178455.8A Active CN111382318B (en) 2020-03-14 2020-03-14 Dynamic community detection method based on information dynamics

Country Status (1)

Country Link
CN (1) CN111382318B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383422B (en) * 2020-11-04 2021-11-02 浙江大学 Network topology optimization method for accelerating convergence speed of consistency distributed algorithm
CN114743688A (en) * 2022-04-01 2022-07-12 平顶山学院 Disease propagation network detection method based on dynamic community
CN115048436B (en) * 2022-06-01 2024-07-12 优米互动(北京)科技有限公司 Phase division method of high-dimensional financial time sequence based on visual view principle

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469315A (en) * 2015-08-04 2016-04-06 电子科技大学 Dynamic social network community structure evolution method based on incremental clustering
CN106482927A (en) * 2016-10-11 2017-03-08 天津大学 The polynary complex impedance detection information fusion method of two phase flow based on multilayer complex network
CN109063277A (en) * 2018-07-12 2018-12-21 佛山科学技术学院 A kind of dynamic pattern recognition method and device based on gap metric
CN110086670A (en) * 2019-04-29 2019-08-02 安徽大学 Large-scale complex network community discovery method and application based on local neighbor information
CN110334264A (en) * 2019-06-27 2019-10-15 北京邮电大学 A kind of community detection method and device for isomery dynamic information network
CN110660082A (en) * 2019-09-25 2020-01-07 西南交通大学 Target tracking method based on graph convolution and trajectory convolution network learning
CN110796077A (en) * 2019-10-29 2020-02-14 湖北民族大学 Attitude motion real-time detection and correction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7958120B2 (en) * 2005-05-10 2011-06-07 Netseer, Inc. Method and apparatus for distributed community finding
US20160342398A1 (en) * 2015-05-22 2016-11-24 Alan A. Yelsey Dynamic Semiotic Systemic Knowledge Compiler System and Methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Community detection based on information dynamics;ZeJun Sun et al.;《Neurocomputing》;341-352 *
Identifying Communities in Dynamic Networks Using Information Dynamics;Zejun Sun et al.;《Entropy》;1-25 *
Overlapping Community Detection Based on Information Dynamics;ZEJUN SUN et al.;《IEEE Access》;70919-70934 *
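Research on Group Behavior in Social Networks Based on Network Relationships;Liang Xia;《China Master's Theses Full-text Database, Information Science and Technology》;I139-241 *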

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant