CN110738418A - Detection method of weakly connected overlapping communities - Google Patents

Info

Publication number
CN110738418A
Authority
CN
China
Prior art keywords
edge
community
communities
active
detecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910981098.6A
Other languages
Chinese (zh)
Inventor
许小媛
刘芳
黄金国
李海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Open University of Jiangsu City Vocational College
Original Assignee
Jiangsu Open University of Jiangsu City Vocational College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Open University of Jiangsu City Vocational College
Priority to CN201910981098.6A
Publication of CN110738418A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting weakly connected overlapping communities. The method comprises: receiving a community detection graph model and dividing the graph communities according to the number of processors; calculating an influence propagation model for each partition, where the influence propagation model adjusts the edge weights of the approximate active edges of each partition according to the neighborhood edge density; and detecting the weakly connected overlapping communities using a time interaction bias algorithm.

Description

Detection method of weakly connected overlapping communities
Technical Field
The invention relates to the technical field of community detection, in particular to a detection method of weakly connected overlapping communities.
Background
In modern society, social networks such as Facebook, Twitter, and LinkedIn have become an important part of people's daily lives. About 68% of online users use social media to obtain news or to connect with friends, family, and other acquaintances, and many of these users form or join online communities.
In such applications, standard community detection methods, such as the Louvain method, the Infomap method, label propagation, and Newman's leading eigenvector method, are mainly concerned with the relationships between users and depend on follower/followee links. These techniques represent the network as a static structure with stable links, because these links exist for a long time. Communities are detected, for example, by the following methods:
(1) Based on the Louvain method, a detection method for both disjoint and overlapping communities has been proposed; it uses a graph partitioning process to handle social graphs at an acceptable time cost.
(2) By introducing a weighted community clustering metric, scalable community detection on large-scale graphs has been realized with the Infomap method; the metric depends on the triangle structures within a community.
(3) Based on a label propagation model and the PageRank method, a method has been proposed to solve the processor load-balancing problem in large-scale community detection.
(4) Based on Newman's leading eigenvector method, community detection has been treated as an NP-hard optimization problem, and a heuristic search algorithm has then been used to detect the community structure effectively.
The above algorithms all work well, but observations on a Twitter dataset show that most users interact independently of their follower links: about 70% of users do not interact with the users they are connected to by follower links.
Disclosure of Invention
The invention aims to provide a method for detecting weakly connected overlapping communities. By predicting the future activity trends of influential users, the method can identify users with high-frequency interaction, determine the influence of users on adjacent users, and ensure that weakly connected users still have the opportunity to be included in communities, which improves the accuracy of community detection. It also provides a Time Interaction Bias (TIB) community detection method, based on overlapping community detection and taking the community structure into account, to obtain better overlapping community detection performance. In addition, an objective function is introduced into the graph partitioning process, and the partitions are processed in parallel to calculate the influence propagation model, which greatly improves community detection performance.
To achieve the above object, with reference to fig. 1, the present invention provides a method for detecting weakly connected overlapping communities, the method including:
S1: receiving a community detection graph model, and dividing the graph communities according to the number of processors;
S2: calculating an influence propagation model for each partition, where the influence propagation model adjusts the edge weights of the approximate active edges of each partition according to the neighborhood edge density;
S3: detecting the weakly connected overlapping communities using a time interaction bias algorithm.
In a further embodiment, in step S1, the process of receiving the community detection graph model and dividing the graph communities according to the number of processors includes the following steps:
S11: searching for all cliques in the community detection graph model, obtaining the fully connected subgraphs, and generating a subgraph set PCs, where pc ∈ PCs;
S12: collecting the neighborhood nodes within h hops of each edge in each subgraph, where h > 0;
S13: setting a plurality of partitions according to the number of processors, and judging whether the number of subgraphs is larger than the number of partitions; if it is not, distributing each subgraph to one of the partitions and ending the process; otherwise, proceeding to step S14;
S14: allocating at least one subgraph to each partition;
S15: sequentially distributing the remaining subgraphs to the partitions, taking the minimum value of the objective function as the constraint condition, where the objective function is:
Figure BDA0002235204220000021
where P_e is the number of edges in processor P, P_v is the number of vertices in processor P, the edge term is the total number of edges in pc_i and in the additional data set of each processor (the additional data set holds all adjacent edges), and
Figure BDA0002235204220000023
is the total number of vertices in pc_i and in the additional data set of each processor. The calculation formulas of
Figure BDA0002235204220000024
and
Figure BDA0002235204220000025
are respectively as follows:
Figure BDA0002235204220000026
Figure BDA0002235204220000027
where pc_i is a connected community belonging to the set PCs, and
Figure BDA0002235204220000028
is the set of weighted edges within h hops of pc_i.
In a further embodiment, in step S2, the process of calculating the influence propagation model for each partition includes the following steps:
S21: setting the community detection graph model G as an undirected weighted graph G = (V, E, W), where V is the node set, E is the edge set, and W is the set of weight functions, ω ∈ W; for each edge
Figure BDA0002235204220000029
the weight w(e) ≥ 0 denotes the total frequency of interaction between nodes u and v in a randomly selected time interval t;
S22: normalizing the weight w(e) of the edge e based on the following criterion:
Figure BDA0002235204220000031
where m and n are activity parameters, selected empirically according to the social network data set; they are real numbers and satisfy m < n;
in the formula, 0 ≤ N(e) ≤ 1, and the non-zero values represent different degrees of activity;
S23: based on an active edge e_i, propagating the normalized weight to its neighbor e as follows:
U(e) = λ^h · N(e_i)
where λ is the attenuation factor, 0 < λ < 1, h is the number of hops between the current edge and its neighbor, and N(e_i) is the weight of the active neighbor of e;
repeatedly calculating U(e) for the next neighboring edge within h hops of the edge e, storing the results of U(e) in a hash table, and calculating the final weight f(e) of the edge e from the hash table.
In a further embodiment, in step S3, the process of detecting weakly connected overlapping communities using the time interaction bias algorithm includes:
S31: obtaining the set PL of all maximal cliques that cannot be extended beyond size k, where k is a preset number of nodes;
S32: calculating a density index ρ(pl_i) for each community,
Figure BDA0002235204220000033
and determining the final TIB community identification result through density comparison.
In a further embodiment, the selection criterion for a community is that its density index is greater than a preset density threshold θ.
In a further embodiment, in step S32, the density index of an active bias community is calculated according to the following formula:
ρ(C) = Σ_{e∈C} f(e) / |C|
where |C| is the size of the community and f(e) is the final weight of the edge e.
In a further embodiment, the detection method further includes:
S4: obtaining communities
Figure BDA0002235204220000034
which form a set of connected k-communities such that: 1)
Figure BDA0002235204220000035
where e is an active or semi-active edge; 2) the active bias density metric ρ(C) is maximal.
In a further embodiment, the detection method further includes:
evaluating the detection result using the mutual information evaluation index and the F1 evaluation index.
Compared with the prior art, the technical scheme of the invention has the following remarkable beneficial effects:
(1) An objective function is designed for the partitioning of the community detection graph model; it uses the community structure to optimize processor load balance and improves the efficiency of solving the model. (2) The concept of an active edge is redefined and an influence propagation model is provided, which identifies users with high-frequency interaction and recognizes weakly connected users well. (3) A Time Interaction Bias (TIB) community detection method based on overlapping community detection is provided, which achieves good overlapping community detection performance.
It should be understood that all combinations of the foregoing concepts as well as additional concepts described in greater detail below can be considered to be part of the presently disclosed subject matter unless such concepts are mutually inconsistent.
The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method of detecting weakly connected overlapping communities of the present invention.
FIG. 2 is an exemplary graph of weight assignments of the present invention.
FIG. 3 is an exemplary plot of community bias density values of the present invention.
Fig. 4 is a diagram illustrating an example process of partitioning PCs in accordance with the present invention.
Fig. 5 is a block diagram of the detection method in one example of the invention.
FIG. 6 is a schematic diagram of an experimental model object of the present invention.
FIG. 7 is a schematic diagram of the comparative experiment results (F1) of the present invention.
Fig. 8 is a graph showing the results of comparative experiments (NMI) according to the present invention.
FIG. 9 is a schematic diagram comparing the real community network detection results of the present invention.
Detailed Description
In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
With reference to fig. 1, the present invention provides a method for detecting weakly connected overlapping communities, the method including:
S1: receiving a community detection graph model, and dividing the graph communities according to the number of processors;
S2: calculating an influence propagation model for each partition, where the influence propagation model adjusts the edge weights of the approximate active edges of each partition according to the neighborhood edge density;
S3: detecting the weakly connected overlapping communities using a time interaction bias algorithm.
In the invention, the community detection graph model G is an undirected weighted graph G = (V, E, W), where V is the node set, E is the edge set, and W is the set of weight functions, ω ∈ W. Each edge
Figure BDA0002235204220000041
has a weight w(e) ≥ 0. For example, in Twitter, the weight is calculated as w(e) = Σ(@ + RTs), the sum of the interactions represented by mentions (@) and retweets (RTs) within a given time interval t. Table 1 gives all the mathematical symbol definitions used in the present invention.
TABLE 1 mathematical symbol definitions
Figure BDA0002235204220000051
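The Twitter weighting rule w(e) = Σ(@ + RTs) can be sketched in Python as follows. This is only an illustration: the record layout (user, user, kind, timestamp) and the name `edge_weights` are assumptions and do not come from the patent.

```python
from collections import Counter

def edge_weights(interactions, t_start, t_end):
    """Count mentions and retweets per undirected edge within [t_start, t_end)."""
    weights = Counter()
    for u, v, kind, ts in interactions:
        if kind in ("mention", "retweet") and t_start <= ts < t_end and u != v:
            # The graph is undirected, so (u, v) and (v, u) share one key.
            weights[frozenset((u, v))] += 1
    return weights

# Hypothetical interaction log: (source user, target user, kind, timestamp).
log = [
    ("a", "b", "mention", 5),
    ("b", "a", "retweet", 7),
    ("a", "c", "mention", 30),  # outside the time interval used below
    ("b", "c", "like", 8),      # neither a mention nor a retweet, ignored
]
w = edge_weights(log, 0, 10)
```

Keying the counter by a `frozenset` of the two endpoints is one simple way to treat the graph as undirected.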
The invention addresses the following two technical problems: (1) The time interaction bias problem, whose goal is to find communities
Figure BDA0002235204220000052
which form a set of connected k-communities such that: a) the edges e are active or semi-active edges; b) the active bias density metric ρ(C) is maximized. (2) The community-based graph partitioning problem, whose goal is to find a set of communities R such that: a) the nodes and the edges belong to a community structure; b) the data (node and edge) load is balanced across the processor group (P).
The following describes the detection method of the present invention in detail with reference to specific examples, aiming at the two aforementioned technical problems.
Problem 1, time interaction bias problem
To handle the connected k-community constraint, we first define a "clique": a clique is a fully connected subgraph, and a PC is a set of adjacent cliques, where adjacent means that they share k-1 nodes.
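The adjacency rule (cliques sharing k-1 nodes belong to the same PC) can be sketched with a small union-find. The sketch assumes the k-cliques have already been enumerated by some other means; the function name `group_adjacent_cliques` is illustrative.

```python
from itertools import combinations

def group_adjacent_cliques(cliques, k):
    """Merge k-cliques that share at least k-1 nodes; return one node set per PC."""
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(groups.values(), key=sorted)

# Two triangles sharing the edge (2, 3) form one PC; the third stays alone.
pcs = group_adjacent_cliques([{1, 2, 3}, {2, 3, 4}, {5, 6, 7}], k=3)
```

The pairwise comparison is quadratic in the number of cliques, which is acceptable for a sketch but would need an index on (k-1)-subsets for large graphs.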
In a Twitter network, an edge weighted at 40 means that it contains 40 interaction processes and is likely to be considered an active edge, but an inactive edge may also be important, particularly if most of its neighbors are highly active edges.
Although the impact is small in the current time interval, its impact may be amplified and become very important in subsequent community detection processes.
For each edge e of the graph, the model determines the weight of the edge through a two-step process. The first step is to normalize the weight w(e) of the edge e based on the following criterion:
Figure BDA0002235204220000061
where 0 ≤ N(e) ≤ 1 and the non-zero values represent different degrees of activity; when N(e) = 1, the edge is 100% active. The activity parameters m and n in the above equation are real numbers satisfying m < n, and serve to normalize the weights to the interval from 0 to 1.
Definition 1: (active edge e_i) When N(e) = 1, the edge e_i of the community is referred to as an active edge.
Definition 2: (approximate active edge) An adjacent edge within h hops of an active edge e_i.
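The normalization step can be illustrated with a short Python sketch. The exact form of equation (1) is not reproduced in this text, so the piecewise-linear rule below (0 at or below m, 1 at or above n, a linear ramp in between) is an assumption chosen only to satisfy the stated properties 0 ≤ N(e) ≤ 1 and m < n; the function names are likewise illustrative.

```python
def normalize_weight(w, m, n):
    """Map a raw interaction count w to N(e) in [0, 1].

    ASSUMPTION: equation (1) is not available in the text; this uses
    0 at or below m, 1 at or above n, and a linear ramp in between.
    """
    if not m < n:
        raise ValueError("activity parameters must satisfy m < n")
    if w >= n:
        return 1.0
    if w <= m:
        return 0.0
    return (w - m) / (n - m)

def is_active(w, m, n):
    """Definition 1: an edge is active when N(e) = 1."""
    return normalize_weight(w, m, n) == 1.0
```

With the hypothetical parameters m = 10 and n = 40, an edge carrying 40 interactions is active, while an edge carrying 25 interactions receives N(e) = 0.5 under this assumed ramp.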
The second step of the model, based on an active edge e_i, propagates the normalized weight to its neighbor e as follows:
U(e) = λ^h · N(e_i) (2)
where λ (0 < λ < 1) is an attenuation factor, h is the number of hops between the current edge and its neighbor, and N(e_i) is the weight of the active neighbor of e. The calculation of U(e) is repeated for the next neighboring edge within h hops of edge e. The results of U(e) are then stored in a hash table, which is used to calculate the final weight f(e) of edge e, in the form:
Figure BDA0002235204220000062
where f(e) multiplies the values in the hash table to determine the new weight of edge e. The model considers the neighbors of an edge, which helps redefine the edge weights based on the activity of those neighbors.
The model aims to weight edges by temporal interaction while also accounting for the weights of the edges adjacent to inactive edges. It therefore takes into account not only the current time interval but also the probability of influence from neighboring edges.
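The propagation step U(e) = λ^h · N(e_i) can be sketched as a breadth-first search over edges that fills one hash table per non-active edge. Several details are assumptions: two edges are treated as one hop apart when they share an endpoint, the search strategy is breadth-first, and the step that combines the table into f(e) is left out because equation (3) is not reproduced in this text.

```python
from collections import deque

def propagate(edges, norm, lam=0.5, max_hops=3):
    """Return {edge: {active_edge: U}} with U = lam**h * N(active_edge).

    edges: list of (u, v) tuples; norm: {edge: N(e)}.
    ASSUMPTION: edges sharing an endpoint are one hop apart.
    """
    by_node = {}
    for e in edges:
        for node in e:
            by_node.setdefault(node, []).append(e)

    tables = {}
    for e in edges:
        if norm[e] == 1.0:  # active edges are sources, not receivers
            continue
        table, seen, queue = {}, {e}, deque([(e, 0)])
        while queue:
            cur, h = queue.popleft()
            if h == max_hops:
                continue
            for node in cur:
                for nxt in by_node[node]:
                    if nxt in seen:
                        continue
                    seen.add(nxt)
                    if norm[nxt] == 1.0:
                        # Influence decays with the hop distance h + 1.
                        table[nxt] = lam ** (h + 1) * norm[nxt]
                    queue.append((nxt, h + 1))
        tables[e] = table
    return tables

# Path graph a-b-c-d with one active edge (a, b).
path = [("a", "b"), ("b", "c"), ("c", "d")]
n_values = {("a", "b"): 1.0, ("b", "c"): 0.5, ("c", "d"): 0.0}
tables = propagate(path, n_values, lam=0.5)
```

In this toy path the edge (b, c) is one hop from the active edge and receives 0.5, while (c, d) is two hops away and receives 0.25.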
Example 1: Consider the specific example shown in FIG. 2. FIG. 2a gives the original weighted graph. With N(e), the first step rescales the weights to the range from 0 to 1. Finally, after equation (3) is executed, the hash table is used to compute the final weight f(e) of each edge, as shown in FIG. 2c.
Definition 3: (active bias density) The density index of an active bias community is calculated as follows:
ρ(C) = Σ_{e∈C} f(e) / |C| (4)
that is, the sum of the bias weights within the community divided by the size of the community |C|. The quality of a community can be assessed against a threshold. Community discovery therefore looks for communities C for which ρ(C) is maximized.
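Equation (4) translates directly into code. In the sketch below the community is passed as a collection of edges, so |C| is taken to be the number of edges; this reading of "size of the community" is an assumption, as are the function names.

```python
def bias_density(community_edges, f):
    """rho(C) = sum of f(e) over e in C, divided by |C|.

    ASSUMPTION: C is given as a collection of edges, so |C| is the edge count.
    """
    edges = list(community_edges)
    if not edges:
        raise ValueError("community must contain at least one edge")
    return sum(f[e] for e in edges) / len(edges)

def passes_threshold(community_edges, f, theta):
    """Selection criterion: keep a community whose density exceeds theta."""
    return bias_density(community_edges, f) > theta

# Hypothetical final weights f(e) for a triangle.
f = {("a", "b"): 1.0, ("b", "c"): 0.5, ("a", "c"): 0.75}
rho = bias_density(f.keys(), f)
```

If |C| is instead meant to be the number of nodes, only the denominator changes.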
Example 2: Continuing the example, FIG. 3 shows the graph with its reconstructed weights. We find all cliques in the graph and calculate their bias density values using equation (5), as shown in FIG. 3. Then we calculate the bias density values of the PCs using equation (4). The cliques of a PC form a single community only when the bias density score of the PC is higher than the bias density score of each clique; otherwise, each clique is a community by itself. FIG. 2 gives two cases: the two PCs on the left have a smaller density value when joined, and on the right the opposite holds.
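The merge rule of Example 2 can be sketched as follows. The sketch assumes each clique is given as a set of edges, that density is the mean final weight per edge, and that a shared edge is counted once in the joined PC; the weights are made-up values, not those of FIG. 3.

```python
def density(edges, f):
    """Mean final weight per edge (assumed reading of equation (4))."""
    edges = list(edges)
    return sum(f[e] for e in edges) / len(edges)

def merge_or_split(cliques, f):
    """Join the cliques of one PC only if the joined density beats every clique."""
    union = set().union(*cliques)
    joined = density(union, f)
    if all(joined > density(c, f) for c in cliques):
        return [union]
    return [set(c) for c in cliques]

# Two triangles sharing the edge (2, 3).
t1 = {(1, 2), (1, 3), (2, 3)}
t2 = {(2, 3), (2, 4), (3, 4)}

# Case A: the shared edge is weak, so joining raises the density.
f_a = {e: 0.9 for e in t1 | t2}
f_a[(2, 3)] = 0.1
# Case B: the shared edge is strong, so joining lowers the density.
f_b = {e: 0.3 for e in t1 | t2}
f_b[(2, 3)] = 0.9

merged = merge_or_split([t1, t2], f_a)
split = merge_or_split([t1, t2], f_b)
```

Case A returns one joined community and Case B returns the two triangles unchanged, mirroring the two situations described in the example.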
Problem 2, community-based graph partitioning problem
This problem is NP-hard and can be solved using heuristic algorithms.
Definition 4: (objective function J) PCs can be evenly distributed to the available partitions if the following conditions are met: 1) the number of partitions (R) is greater than or equal to the number of processors (P), i.e., R ≥ P; 2) each partition
Figure BDA0002235204220000072
has almost the same number of PCs; 3) the objective function J has its smallest value.
To ensure that these conditions are met, we calculate the following for each unassigned pc ∈ PCs:
Figure BDA0002235204220000073
where P_e is the number of edges in processor P, P_v is the number of vertices in processor P, and the edge term is the total number of edges in pc_i and in the additional data set of each processor. The additional data set holds all adjacent edges.
Figure BDA0002235204220000075
can be calculated as:
Figure BDA0002235204220000076
where pc_i is a connected community belonging to the set PCs, and
Figure BDA0002235204220000077
is the set of weighted edges within h hops of pc_i. The parameter h is any number greater than 0 and can be changed as required.
Figure BDA0002235204220000078
is the total number of vertices in pc_i and in the additional data set of each processor:
Figure BDA0002235204220000081
Example 3: Consider the example shown in fig. 4a. To partition this graph, we first find the cliques and then find the PCs, which are shown in dashed lines in fig. 4b. Assume that the number of processors is P = 2; we then need two partitions, with one partition assigned to each processor. PC1 is assigned to partition 1, PC2 is assigned to partition 2, and the assignment of PC3 is decided according to the value of the objective function J.
When PC3 is added to PC1 in partition 1, the objective function J evaluates to 54. When PC3 is added to PC2 in partition 2, the objective function J evaluates to 44. Based on these results, PC3 takes the latter case and is assigned to processor 2.
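The greedy choice in Example 3 can be sketched as a loop that places each unassigned PC in the partition with the smallest J. Because the formula for J is not reproduced in this text, the objective is passed in as a function; here it is a lookup of the two values quoted in the example (54 and 44). The names `assign_remaining` and `example_j` are illustrative.

```python
def assign_remaining(pcs, partitions, objective):
    """Greedily place each unassigned PC in the partition with the smallest J.

    partitions: list of lists of PC names, already seeded with one PC each.
    objective(pc, index, partitions): J if pc were added to partitions[index].
    """
    for pc in pcs:
        best = min(range(len(partitions)),
                   key=lambda i: objective(pc, i, partitions))
        partitions[best].append(pc)
    return partitions

# Values quoted in Example 3: J = 54 for partition 1, J = 44 for partition 2.
example_j = {("PC3", 0): 54, ("PC3", 1): 44}
result = assign_remaining(
    ["PC3"],
    [["PC1"], ["PC2"]],
    lambda pc, i, parts: example_j[(pc, i)],
)
```

Keeping the objective pluggable means the same loop works once the real J, which depends on the edge and vertex counts per processor, is available.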
Fig. 5 is a block diagram of the detection method in one example, which is divided into three phases: 1) graph partitioning, which is based on PCs and the objective function J; here, nodes and edges that do not belong to a community are eliminated; 2) computing the influence propagation model for each partition; 3) detecting communities using the TIB community detection method.
The detailed calculation process of each stage of the community detection algorithm framework shown in FIG. 5 is as follows:
Process 1: graph partitioning
The computational steps of the process are shown in Table 2. The inputs to the algorithm are the undirected weighted graph, the neighborhood hop count, and the number of processors P. First, we find the cliques (line 1) and the PCs (line 2). Then we collect the neighborhood nodes within h hops of each edge in the PCs (lines 3-7), which will be used to compute the influence propagation model. After that, we assign PCs to partitions: if there are six partitions, 6 PCs are assigned first (line 9), and the remaining PCs are then assigned to partitions based on the computation of the objective function J (lines 10-13). When all PCs are evenly distributed across the partitions, the partitioning process ends (lines 14-15).
Figure BDA0002235204220000091
Process 2: influence propagation model computation
The computation of the influence propagation model is given in Algorithm 2 of Table 3. First, the weights are initialized and normalized by P(e) (lines 2-4). For each edge e where P(e) < 1 (line 5), the following are initialized (line 6): 1) h, the number of hops from the edge to the neighborhood node; 2) N, which holds the set of neighbors of e; 3) the hash table of e. The neighbors of e are then found by a graph search and stored in N. Then U(e) is calculated and stored in the hash table. This process is repeated while the hop count satisfies the constraint h < 4 (lines 7-11). Next, the set of U(e) weights in the hash table is taken as input (line 13), and the f(e) value of edge e is output (line 14). This process is repeated until all edges are processed, and the output is the set with bias weights
Figure BDA0002235204220000093
(line 16).
Figure BDA0002235204220000092
Process 3: TIB community detection
As shown in fig. 2a, the algorithm first obtains the set of all maximal cliques that cannot be extended beyond size k. In the prior art, all adjacent cliques that share k-1 nodes are considered, and the community selection criterion is that the density index I(G) is greater than the threshold θ.
In the present invention, the aforementioned ρ(C) index is used instead of the I(G) index in the algorithm design, and the proposed bias density index can find TIB communities that are not connected. The density index ρ(pl_i) of each community is calculated,
Figure BDA0002235204220000102
and the final TIB community identification results (lines 9-17) are determined by density comparison, as shown in FIG. 3b.
Figure BDA0002235204220000103
In a further embodiment, the detection method further includes:
evaluating the detection result using the mutual information evaluation index and the F1 evaluation index. The mutual information evaluation index (NMI) evaluates the similarity of the detected communities and reflects overlapping community detection; the F1 evaluation index evaluates the accuracy of community detection.
First, generation of experimental objects
The experimental objects are generated with the LFR and MMSB methods; the sparsity of the generated network can be controlled through the network parameter settings to obtain networks with different characteristics:
(1) MMSB object generation: this community generation algorithm is based on probability theory; the community link between p and q is obtained with the distribution Y(p, q):
Figure BDA0002235204220000111
where the parameter β is the interaction matrix used in the community detection process, and Z is the distribution arising in the community detection process, which is multinomial:
Figure BDA0002235204220000112
Figure BDA0002235204220000113
where the parameter α controls the degree of overlap of the generated model communities. The model structure is shown in fig. 6. According to the experimental results, to obtain a sparse overlapping community network model, the modularity index can be set to 0.5 or more; conversely, to obtain a dense overlapping community network model, the modularity index can be set below 0.5. Table 5 shows the attribute data of the network experimental objects generated with the MMSB method.
TABLE 5 network object Attribute generated Using MMSB method
Figure BDA0002235204220000114
(2) LFR object generation: this community network generation algorithm is similar to the MMSB object generation method; the sparsity of the community network model can be controlled through the modularity parameter setting. The generated LFR object is shown in figure 6b, and the parameter settings are shown in figure 6.
TABLE 6 network object attributes generated using the LFR method
The experimental model object shown in fig. 6 corresponds to the 1 st and 3 rd rows of the parameters set in table 5 and table 6, respectively.
Secondly, stability comparison experiment results
First, the F1 evaluation index is used to evaluate the quality of the community discovery process; the index has the following form:
Figure BDA0002235204220000116
where FP is the proportion of cases that are negative in the real network but positive in the community detection result; FN is the proportion of cases that are positive in the real network but negative in the community detection result; and TP is the proportion of cases that are positive in both the real network and the community detection result. P represents the precision of the community detection process, and R represents its recall.
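For reference, a minimal Python sketch of the F1 computation, assuming the formula image corresponds to the standard definitions P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). The function name and the zero-division handling are illustrative choices.

```python
def f1_score(tp, fp, fn):
    """Return (F1, precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall

# Hypothetical counts: 8 correct detections, 2 spurious, 2 missed.
f1, p, r = f1_score(tp=8, fp=2, fn=2)
```

Because F1 is the harmonic mean of precision and recall, it stays between 0 and 1 and is high only when both are high.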
To verify the effectiveness of the algorithm, the overlapping community detection algorithms from two documents are chosen for comparison: "Chen M, Nguyen T, Szymanski B. On measurement of the quality of a network community structure [J]. Social computing, 2013, 52(3): 122-[...]" and "[...] IEEE Transactions on Services Computing, 2015, 8(2): 284-298" (defined as comparative method 2). The F1 evaluation results are shown in FIG. 7. In general, the F1 value lies between 0 and 1, and the higher the value, the more stable the algorithm. The results in FIG. 7 show that the F1 results of the algorithm of the invention are better than those of the two comparison algorithms, which indicates that its community detection is more stable. At the same time, the F1 results of all three algorithms decrease gradually as the number of communities increases, which indicates that the stability of the three algorithms is correlated with the number of communities during community detection: the larger the number of communities, the lower the stability.
Thirdly, accuracy comparison experiment results
For the experimental analysis of the community detection of the algorithm, NMI is selected as the comparative evaluation index. For communities A and B, it is specifically defined as:
where N is the number of vertices in the community detection network, C is the confusion matrix formed by the community detection model, and C_ij is the number of vertices that belong to both community i and community j of the two detection results. C_A and C_B are the numbers of communities in the two divisions. The two aforementioned overlapping community detection algorithms are again chosen for comparison. The results of the experiment are shown in FIG. 8.
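The NMI formula itself is not reproduced in this text. The sketch below assumes the widely used confusion-matrix form, NMI = -2 Σ_ij C_ij log(C_ij N / (C_i. C_.j)) / (Σ_i C_i. log(C_i. / N) + Σ_j C_.j log(C_.j / N)), where C_i. and C_.j are row and column sums; this matches the parameters described above but is not confirmed by the source.

```python
from math import log

def nmi(confusion):
    """Normalized mutual information from a confusion matrix C (list of rows).

    ASSUMPTION: the standard confusion-matrix form of NMI is used.
    """
    n = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    num = 0.0
    for i, row in enumerate(confusion):
        for j, c in enumerate(row):
            if c:  # zero cells contribute nothing
                num += c * log(c * n / (row_sums[i] * col_sums[j]))
    den = (sum(r * log(r / n) for r in row_sums if r)
           + sum(c * log(c / n) for c in col_sums if c))
    if den == 0:
        return 1.0  # both divisions consist of a single community
    return -2.0 * num / den

identical = nmi([[5, 0], [0, 5]])    # the two divisions agree exactly
unrelated = nmi([[5, 5], [5, 5]])    # the two divisions are independent
```

Identical divisions give a value of 1 and statistically independent divisions give 0; this form applies to non-overlapping divisions, and overlapping variants of NMI use a different formula.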
Comparing the NMI results shown in FIG. 8, the community identification accuracy of all the algorithms shows a relatively consistent decreasing trend as the number of network vertices increases. For networks of the same scale, a tighter network means more severe overlap. The experimental results show that, although the NMI accuracy of the algorithm of the invention decreases as the number of vertices increases, the decrease is small. The community detection accuracy and stability of the algorithm under community overlap are therefore higher than those of the comparison algorithms.
Fourth, detection results on a real community network
A real data set is constructed using an API (application program interface) network data crawling tool. The data acquisition platform sets seed accounts in the network and acquires the data features of the network as follows: 1) 5 adjacent accounts are selected as the seed accounts; 2) adjacent vertices in the Sina microblog are captured according to a depth-first search strategy; 3) parameters of the capturing process are adjusted according to a reference network.
Through this data capturing process, the constructed Sina microblog network comprises 78 topic groups, 900 vertices, and 5 thousand items of microblog interaction information. The experimental program is written in Java, and the comparison algorithms are still the two overlapping community detection algorithms mentioned above. Precision, recall, and the harmonic index F_meas are selected as the evaluation indexes to compare and verify the performance of the algorithms. They can be defined as:
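The defining formulas are not reproduced in this text. The following is a minimal sketch under the assumption that the standard set-based definitions are intended: precision is the share of detected members that are correct, recall is the share of true members that are detected, and F_meas is their harmonic mean. The function name `precision_recall_f` and the single-community comparison are illustrative assumptions.

```python
def precision_recall_f(detected, truth):
    """Precision, recall and F-measure of one detected community.

    detected: iterable of vertex ids assigned to the community
    truth:    iterable of vertex ids in the reference community
    Assumes the standard definitions; the patent's own formula
    could not be checked against this sketch.
    """
    detected, truth = set(detected), set(truth)
    true_positives = len(detected & truth)

    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f_meas = 2 * precision * recall / (precision + recall)
    return precision, recall, f_meas


# Hypothetical example: 4 detected vertices, 3 reference vertices, 2 shared.
p, r, f = precision_recall_f({1, 2, 3, 4}, {3, 4, 5})
```

For overlapping communities, a score of this kind would typically be computed per community against its best-matching reference community and then averaged; how the patent aggregates the scores is not stated in this section.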
For the Sina microblog data obtained by the network data crawling tool, the number of edges depends heavily on the sparsity of the crawl. The experimental comparison of the two selected algorithms and the detection method of the invention on the crawled data is shown in FIG. 9.
According to the comparison results of the Sina microblog data experiment shown in FIG. 9, under low-density community detection the three algorithms differ little; in particular, the advantage of the detection method of the invention over comparative method 1 is not obvious. As community compactness increases, however, the degree of overlap and interaction among communities increases. Because the detection method of the invention takes into account influence propagation and the time interaction bias of weakly connected overlapping communities, its comprehensive performance improves quickly, whereas the two selected algorithms change only slightly. The experimental results verify the performance advantages of the proposed community discovery algorithm.
It should be understood that the various concepts and embodiments described above, as well as those described in greater detail below, may be implemented in any of numerous ways, since the disclosed concepts and embodiments are not limited to any one embodiment.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims (8)

1. A method for detecting weakly connected overlapping communities, comprising:
S1: receiving a community detection graph model, and dividing graph communities according to the number of processors;
S2: calculating an influence propagation model of each partition, wherein the influence propagation model is used for adjusting the edge weights of the approximate active edges of each partition in combination with the edge density of the neighborhood;
S3: detecting the weakly connected overlapping communities by using a time interaction bias algorithm.
2. The method for detecting weakly connected overlapping communities as claimed in claim 1, wherein in step S1, the process of receiving the community detection graph model and dividing graph communities according to the number of processors includes the following steps:
S11: searching all communities in the community detection graph model, acquiring fully connected subgraphs, and generating a subgraph set PCs, wherein pc ∈ PCs;
S12: collecting the neighborhood nodes within h hops of each edge in each subgraph, wherein h > 0;
S13: setting a plurality of partitions according to the number of processors, and judging whether the number of subgraphs is larger than the number of partitions; if so, correspondingly distributing the subgraphs to part of the partitions and ending the process; otherwise, entering step S14;
S14: allocating at least one subgraph to each partition;
S15: sequentially allocating the remaining subgraphs to the partitions, taking the minimum value of an objective function as the constraint condition, wherein the objective function is:
Figure FDA0002235204210000011
wherein P_e is the number of edges in processor P, P_v is the number of vertices in processor P, a quantity whose symbol is not reproduced here is the total number of edges in pc_i and in the additional data set of each processor, the additional data set holding all adjacent edges, and
Figure FDA0002235204210000013
is the total number of vertices in pc_i and in the additional data set of each processor;
Figure FDA0002235204210000014
and
Figure FDA0002235204210000015
are calculated respectively as follows:
Figure FDA0002235204210000016
Figure FDA0002235204210000017
wherein pc_i is a connected community belonging to the set PCs, and
Figure FDA0002235204210000018
is the set of weighted edges within h hops of pc_i.
3. The method for detecting weakly connected overlapping communities according to claim 1, wherein in step S2, the process of calculating the influence propagation model of each partition comprises the following steps:
S21: setting the community detection graph model G as an undirected weighted graph G = (V, E, W), wherein V represents the node set, E represents the edge set, and W is the set of weight functions, ω ∈ W; for each edge
Figure FDA0002235204210000019
the weight w(e) ≥ 0 denotes the total frequency of interaction between nodes u and v over a randomly selected time interval t;
S22: normalizing the weight w(e) of edge e based on the following criterion:
wherein m and n are activity parameters, selected empirically according to the social network data set, which are real numbers and satisfy m < n;
in the formula, 0 ≤ N(e) ≤ 1, and the non-zero values represent different degrees of activity;
S23: based on an active edge e_i, propagating the normalized weight to its neighbor e as follows:
U(e) = λ^h · N(e_i)
where λ is the attenuation factor, 0 < λ < 1, h is the number of hops between the current edge and its neighbor, and N(e_i) is the weight of the active neighbor of e;
repeatedly calculating U(e) for the neighboring edges within h hops of edge e, storing the results U(e) in a hash table, and calculating the final weight f(e) of edge e from the hash table:
Figure FDA0002235204210000022
4. The method for detecting weakly connected overlapping communities according to claim 1, wherein in step S3, the step of detecting the weakly connected overlapping communities by using the time interaction bias algorithm includes:
S31: obtaining a set PL of all maximal communities that cannot be further expanded beyond size k, wherein k is a preset number of nodes;
S32: calculating a density index ρ(pl_i) of each community,
Figure FDA0002235204210000023
and determining the final TIB community identification result through density comparison.
5. The method for detecting the weakly connected overlapping communities according to claim 4, wherein the selection criterion of the community is that the corresponding density index is greater than a preset density threshold θ.
6. The method for detecting weakly-connected overlapping communities according to claim 4, wherein in step S32, the density index of the active bias communities is calculated according to the following formula:
ρ(C) = Σ_{e∈C} f(e) / |C|
where | C | refers to the size of the community, and f (e) is the final weight of the edge e.
7. The method for detecting weakly connected overlapping communities as claimed in claim 6, further comprising:
S4: acquiring a community
Figure FDA0002235204210000024
which is a set of k connected communities, such that: 1)
Figure FDA0002235204210000025
where e is an active or semi-active edge; and 2) the activity bias density metric ρ(C) is maximal.
8. The method for detecting weakly connected overlapping communities as claimed in claim 1, further comprising:
evaluating the detection result by using the mutual information evaluation index and the F1 evaluation index.
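As an illustration of the two formulas that are written out in the claims, the following minimal sketch computes the propagated weight of step S23 and the density index of claim 6, then applies the threshold test of claim 5. Several points are assumptions of this sketch rather than statements of the claims: the propagation formula is read as λ raised to the power h; the community size |C| is taken as the number of distinct vertices touched by the community's edges; and the final weights f(e) are supplied by the caller, because the formula that combines the hash-table entries into f(e) is not reproduced in this text. All function names are hypothetical.

```python
def propagated_weight(lam, h, n_active):
    """U(e) = lam**h * N(e_i), read with h as an exponent (assumed).

    lam:      attenuation factor, 0 < lam < 1
    h:        number of hops between the active edge e_i and edge e
    n_active: normalized weight N(e_i) of the active edge
    """
    if not 0 < lam < 1:
        raise ValueError("attenuation factor must satisfy 0 < lam < 1")
    return (lam ** h) * n_active


def density(edges, final_weight):
    """rho(C) = sum of f(e) over e in C, divided by |C|.

    edges:        list of (u, v) tuples forming the community
    final_weight: dict mapping each edge tuple to its final weight f(e)
    Assumption: |C| is the number of distinct vertices in the community.
    """
    vertices = {v for edge in edges for v in edge}
    if not vertices:
        return 0.0
    return sum(final_weight[edge] for edge in edges) / len(vertices)


def select_communities(communities, final_weight, theta):
    """Keep the communities whose density index exceeds the threshold theta."""
    return [c for c in communities if density(c, final_weight) > theta]


# Hypothetical final weights and candidate communities.
f = {("a", "b"): 1.0, ("b", "c"): 0.5, ("c", "d"): 0.1}
dense = [("a", "b"), ("b", "c")]
sparse = [("c", "d")]
kept = select_communities([dense, sparse], f, theta=0.2)
```

With these hypothetical weights, the first community has density 1.5 / 3 = 0.5 and passes the threshold, while the second has density 0.1 / 2 = 0.05 and is discarded.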
CN201910981098.6A 2019-10-16 2019-10-16 Detection method of weakly connected overlapping communities Pending CN110738418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910981098.6A CN110738418A (en) 2019-10-16 2019-10-16 Detection method of weakly connected overlapping communities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910981098.6A CN110738418A (en) 2019-10-16 2019-10-16 Detection method of weakly connected overlapping communities

Publications (1)

Publication Number Publication Date
CN110738418A true CN110738418A (en) 2020-01-31

Family

ID=69268985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910981098.6A Pending CN110738418A (en) 2019-10-16 2019-10-16 Detection method of weakly connected overlapping communities

Country Status (1)

Country Link
CN (1) CN110738418A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311409A (en) * 2020-02-13 2020-06-19 腾讯云计算(北京)有限责任公司 Target object determination method and device, electronic equipment and storage medium
CN111311409B (en) * 2020-02-13 2023-04-07 腾讯云计算(北京)有限责任公司 Target object determination method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Harenberg et al. Community detection in large‐scale networks: a survey and empirical evaluation
CN111046429B (en) Method and device for establishing relationship network based on privacy protection
Zhong et al. A fast minimum spanning tree algorithm based on K-means
Gui et al. A community discovery algorithm based on boundary nodes and label propagation
Modani et al. Large maximal cliques enumeration in sparse graphs
CN109766710B (en) Differential privacy protection method of associated social network data
Gohari et al. A significance-based trust-aware recommendation approach
Rossi et al. Hone: Higher-order network embeddings
Dhumal et al. Survey on community detection in online social networks
Falahiazar et al. Determining the Parameters of DBSCAN Automatically Using the Multi-Objective Genetic Algorithm.
CN108647739B (en) Social network community discovery method based on improved density peak clustering
CN110738418A (en) Detection method of weakly connected overlapping communities
Young et al. Growing networks of overlapping communities with internal structure
CN111178678B (en) Network node importance evaluation method based on community influence
Zhang et al. Detecting local communities in networks with edge uncertainty
Lu et al. Extending CDFR for overlapping community detection
Guo et al. The FRCK clustering algorithm for determining cluster number and removing outliers automatically
Mythili et al. Research Analysis on Clustering Techniques in Wireless Sensor Networks
Cheng et al. Label propagation for graph label noise
El Ayeb et al. Evaluation metrics for overlapping community detection
CN111709846A (en) Local community discovery algorithm based on line graph
CN112579831B (en) Network community discovery method, device and storage medium based on SimRank global matrix smooth convergence
Chen et al. A modularity degree based heuristic community detection algorithm
Li Neighbor Propagation Clustering Algorithm for Intrusion Detection.
Srihitha et al. CLASSIFICATION OF GROCERY ITEMS USING APRIORI ALGORITHM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200131
