CN109255433B

CN109255433B - Community detection method based on similarity

Info

Publication number: CN109255433B
Application number: CN201810987366.0A
Authority: CN
Inventors: 杨旭华; 沈敏
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2021-10-29
Anticipated expiration: 2038-08-28
Also published as: CN109255433A

Abstract

A similarity-based community detection method comprises: establishing a network model; calculating and recording the common neighbor set and the joint neighbor set of every pair of nodes; calculating the local degree ratio and the PageRank value of every node in the network; calculating and recording the degree ratio clustering values of all nodes and their connecting edges, and finding, for each node, the node with the highest degree ratio similarity; calculating and recording the PageRank centrality clustering values of all nodes and their connecting edges, and finding, for each node, the node with the highest PageRank similarity; placing each node in the same community as its node of highest degree ratio similarity to obtain an initial community division; and merging the community of each node with the community of its node of highest PageRank similarity to obtain the final community structure. The invention makes effective use of the correlation information between nodes and achieves higher accuracy and lower time complexity.

Description

Community detection method based on similarity

Technical Field

The invention relates to the field of complex networks and community division, and in particular to a community detection method based on similarity.

Background

With the discovery of the small-world and scale-free properties of complex networks, network science has developed rapidly in the study of network structure, function and properties, drawing on the concepts and theories of nonlinear science and modern physics. However, single indices such as degree and betweenness, and abstract models such as topological, weighted and game models, cannot adequately describe the node characteristics of a real network. In addition, real-world networks contain large numbers of nodes whose relationships are complex and change constantly; the nodes evolve dynamically and act as distinct agents, and different agent behaviors reflect the different roles that nodes play in network function and behavior. That is, nodes influence one another through their connecting edges, so studying the effect of local influence on network function and behavior is indispensable. Many community division algorithms already exist, such as the Newman fast algorithm and the CNM algorithm based on modularity optimization, algorithms based on spectral analysis, algorithms based on label propagation, algorithms based on seed diffusion, and edge-based community division algorithms. These algorithms generally partition the network according to its topology, which is the most basic abstraction of the nodes and edges of a complex network, and they do not adequately account for the local influence between nodes during community formation.
By incorporating node importance into the network and using it to quantify the mutual influence between nodes in the formation of the community structure, the quality and credibility of community division can be improved, the physical meaning of the communities becomes clearer, and the changes of communities during network evolution can be characterized more accurately.

Community detection algorithms based on local similarity perform well on large-scale networks. Local similarity indices include the CN (common neighbors) index, the Jaccard index, the HPI (hub promoted index), and others. These approaches still have shortcomings; for example, nodes at the periphery of the network are easily overlooked. Further research on similarity-based community detection algorithms is therefore necessary.

Disclosure of Invention

To overcome the shortcomings of conventional community detection algorithms, namely the difficulty of obtaining global network information, low prediction accuracy and high time complexity, the invention provides a similarity-based community detection method with high accuracy and low time complexity.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a method for community detection based on similarity comprises the following steps:

step one: constructing an undirected network model G(V, E) with N nodes, wherein V is the set of nodes and E is the set of connecting edges;

step two: arbitrarily selecting two nodes i and j, and calculating the common neighbor set Int(i, j) = (Γ_i ∩ Γ_j) and the joint neighbor set Uni(i, j) = (Γ_i ∪ Γ_j) of the two nodes; traversing all node pairs in the network, and calculating and recording the common neighbor set and the joint neighbor set of each node pair;

step three: arbitrarily selecting a node i in the network, and calculating the local degree ratio of the node,

wherein D(i) represents the degree of node i, the degree of a node being the number of adjacent nodes connected to it, and Γ(i) represents the set of all adjacent nodes connected to node i together with node i itself; traversing the network, and calculating and recording the local degree ratio of every node in the network;

step four: computing a vector containing the PageRank value of each node in the network

x = D(D - αA)^{-1} 1,

wherein x = (PR_1, PR_2, PR_3, …, PR_N) is the calculated vector containing the PageRank value of each node of the network, and 1 denotes the all-ones vector; after multiple iterations of the above formula, x converges to a value very close to the true centrality value; D is the diagonal matrix corresponding to the network, with elements D_ii = max(k_i, 1), where k_i is the degree of node i; α is a positive adjustable parameter; A is the adjacency matrix of the network and represents the connecting-edge relations between the nodes of the network, A_ij = 1 indicating that a connecting edge exists between nodes i and j, and A_ij = 0 indicating that no connecting edge exists;

step five: arbitrarily selecting a node i in the network, and calculating the degree ratio clustering value sim_P(i, j) of each connecting edge (i, j),

wherein the value represents the degree ratio similarity of nodes i and j; traversing the network, and calculating and recording the degree ratio clustering values of all nodes and their connecting edges in the network;

step six: arbitrarily selecting a node i in the network, and finding the node w with the highest degree ratio similarity to it, wherein w is the node j at which the degree ratio clustering value of the connecting edge (i, j) attains its maximum max{sim_P(i, j)}, j ∈ Γ(i); traversing the network, and finding the node of highest degree ratio similarity for every node;

step seven: arbitrarily selecting a node i in the network, and calculating the PageRank centrality clustering value sim_PR(i, j) of each connecting edge (i, j),

wherein the value represents the PageRank similarity of nodes i and j; traversing the network, and calculating and recording the PageRank centrality clustering values of all nodes and their connecting edges in the network;

step eight: arbitrarily selecting a node i in the network, and finding the node m with the highest PageRank similarity to it, wherein m is the node j at which the PageRank centrality clustering value of the connecting edge (i, j) attains its maximum max{sim_PR(i, j)}, j ∈ Γ(i); traversing the network, and finding the node of highest PageRank similarity for every node;

step nine: arbitrarily selecting a node i in the network, and placing it in the same community as the node w with the highest degree ratio similarity to it; traversing the network, and placing every node in the same community as its node of highest degree ratio similarity, to obtain the initial community division;

step ten: arbitrarily selecting a node i in the network, and finding the node m with the highest PageRank similarity to it; if node i and node m are not in the same community, merging the initial communities in which the two nodes are located; traversing the network, and merging the community of each node with the community of its node of highest PageRank similarity, to obtain the final community structure.
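
As a minimal sketch of step two, the following Python fragment computes Int(i, j) and Uni(i, j) for every node pair. It assumes that Γ_i is the closed neighborhood described in step three (the neighbors of node i together with node i itself). The function names and the five-node edge list are hypothetical illustrations and do not reproduce the network of FIG. 1.

```python
from itertools import combinations


def closed_neighborhoods(edges):
    """Map each node to the set of its neighbors plus the node itself."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, {u}).add(v)
        gamma.setdefault(v, {v}).add(u)
    return gamma


def neighbor_sets(edges):
    """Return (common, joint): Int(i, j) and Uni(i, j) for every node pair."""
    gamma = closed_neighborhoods(edges)
    common, joint = {}, {}
    for i, j in combinations(sorted(gamma), 2):
        common[(i, j)] = gamma[i] & gamma[j]  # Int(i, j): intersection
        joint[(i, j)] = gamma[i] | gamma[j]   # Uni(i, j): union
    return common, joint


# Hypothetical undirected edge list, used only for illustration.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
common, joint = neighbor_sets(edges)
```

For this example graph, common[(1, 4)] is {3} and joint[(1, 4)] is {1, 2, 3, 4, 5}. Enumerating all pairs costs on the order of N² set operations; restricting the loop to pairs joined by a connecting edge would be a cheaper variant if only edge similarities are needed.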

The technical concept of the invention is as follows: initial communities are formed according to the degree ratio similarity between nodes joined by a connecting edge, and the initial communities are then merged according to the PageRank similarity between nodes joined by a connecting edge, so as to obtain the final community structure of the network.
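
The division and merging described above can be sketched with a union-find structure. This sketch takes the degree ratio and PageRank clustering values as given inputs: the numeric values below are hypothetical placeholders, not values computed by the method. It also assumes that each similarity is symmetric in i and j and that ties are broken by neighbor order, neither of which the text specifies. All function and variable names are illustrative.

```python
def find(parent, i):
    """Return the representative of i's community, compressing the path."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_with_best_neighbor(parent, adj, sim):
    """Merge every node with its most similar neighbor under `sim`."""
    for i, neighbors in adj.items():
        if not neighbors:
            continue  # an isolated node keeps its own community
        best = max(neighbors, key=lambda j: sim[frozenset((i, j))])
        parent[find(parent, best)] = find(parent, i)


def communities(parent):
    """Group nodes by representative and return the groups as sets."""
    groups = {}
    for i in parent:
        groups.setdefault(find(parent, i), set()).add(i)
    return sorted(groups.values(), key=min)


# Hypothetical six-node path graph and hypothetical similarity values.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
pairs = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
sim_p = dict(zip(map(frozenset, pairs), [0.9, 0.2, 0.8, 0.1, 0.7]))
sim_pr = dict(zip(map(frozenset, pairs), [0.6, 0.7, 0.5, 0.2, 0.9]))

parent = {i: i for i in adj}
merge_with_best_neighbor(parent, adj, sim_p)   # step nine: initial division
initial = communities(parent)
merge_with_best_neighbor(parent, adj, sim_pr)  # step ten: merging
final = communities(parent)
```

With these inputs, initial is [{1, 2}, {3, 4}, {5, 6}] and final is [{1, 2, 3, 4}, {5, 6}]: node 2 is most PageRank-similar to node 3, so their two initial communities are merged, while the third community is left unchanged.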

The beneficial effects of the invention are as follows: the similarity between nodes is applied to the division of the community structure, giving high accuracy and low time complexity.

Drawings

FIG. 1 is a schematic diagram of a network having 8 nodes and 10 edges.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1, a method for community detection based on similarity includes the following steps:

step two: arbitrarily selecting two nodes i and j, and calculating the common neighbor set Int(i, j) = (Γ_i ∩ Γ_j) of the two nodes, for example the common neighbor of node 2 and node 5 in FIG. 1 is node 4, and the joint neighbor set Uni(i, j) = (Γ_i ∪ Γ_j), for example the joint neighbors of node 2 and node 5 in FIG. 1 are the nodes {1, 2, 4, 5, 6, 7, 8}; traversing all node pairs in the network, and calculating and recording the common neighbor set and the joint neighbor set of each node pair;

wherein D(i) represents the degree of node i, the degree of a node being the number of adjacent nodes connected to it, for example the degree of node 2 in FIG. 1 is 2, and Γ(i) represents the set of all adjacent nodes connected to node i together with node i itself; traversing the network, and calculating and recording the local degree ratio of every node in the network;

x = D(D - αA)^{-1} 1, where 1 denotes the all-ones vector,
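
One way to evaluate this PageRank expression without inverting a matrix is the fixed-point iteration x ← αAD^{-1}x + 1, whose limit equals D(D - αA)^{-1} applied to the all-ones vector. The sketch below assumes that reading of the formula and assumes 0 < α < 1 so that the iteration converges. The function name, the tolerance, and the three-node example graph are hypothetical.

```python
def pagerank_centrality(adj, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Iterate x_i <- 1 + alpha * sum_j A_ij * x_j / D_jj until stable.

    adj maps each node to a list of its neighbors (undirected graph);
    D_jj = max(k_j, 1), where k_j is the degree of node j.
    """
    x = {i: 1.0 for i in adj}
    for _ in range(max_iter):
        new_x = {
            i: 1.0 + alpha * sum(x[j] / max(len(adj[j]), 1) for j in adj[i])
            for i in adj
        }
        # Stop once the largest change in any entry falls below tol.
        if max(abs(new_x[i] - x[i]) for i in adj) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical three-node path 1 - 2 - 3, used only for illustration.
adj = {1: [2], 2: [1, 3], 3: [2]}
pr = pagerank_centrality(adj, alpha=0.5)
```

For this path with α = 0.5, the iteration settles at 5/3 for the two end nodes and 8/3 for the middle node, so the better-connected node receives the larger value.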

The specific implementation steps described above make the present invention clearer. Any modification or variation of the present invention made within its spirit and the scope of the claims falls within the scope of protection of the present invention.

Claims

1. A community detection method based on similarity, characterized in that the method comprises the following steps:

x = D(D - αA)^{-1} 1,

wherein x = (PR_1, PR_2, PR_3, …, PR_N) is the calculated vector containing the PageRank value of each node of the network, and 1 denotes the all-ones vector; after multiple iterations of the above formula, x converges to a value very close to the true centrality value; D is the diagonal matrix corresponding to the network, with elements D_ii = max(k_i, 1), where k_i is the degree of node i; α is a positive adjustable parameter; A is the adjacency matrix of the network and represents the connecting-edge relations between the nodes of the network, A_ij = 1 indicating that a connecting edge exists between nodes i and j, and A_ij = 0 indicating that no connecting edge exists;