CN109255433B - Community detection method based on similarity - Google Patents

Community detection method based on similarity

Info

Publication number
CN109255433B
Authority
CN
China
Prior art keywords
node
network
nodes
similarity
pagerank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810987366.0A
Other languages
Chinese (zh)
Other versions
CN109255433A (en)
Inventor
杨旭华 (Yang Xuhua)
沈敏 (Shen Min)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-08-28
Filing date
2018-08-28
Publication date
2021-10-29
2018-08-28 Application filed by Zhejiang University of Technology ZJUT
2018-08-28 Priority to CN201810987366.0A
2019-01-22 Publication of CN109255433A
Application granted
2021-10-29 Publication of CN109255433B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Biomedical Technology (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A community detection method based on similarity comprises the following steps: establishing a network model; calculating and recording the common neighbor set and the joint neighbor set of every pair of nodes; calculating the local degree ratio and the PageRank value of every node in the network; calculating and recording the degree ratio clustering value of every connecting edge in the network and finding, for each node, the node with the highest degree ratio similarity; calculating and recording the PageRank centrality clustering value of every connecting edge and finding, for each node, the node with the highest PageRank similarity; placing each node in the same community as its node of highest degree ratio similarity to obtain an initial community division; and merging the community of each node with the community containing its node of highest PageRank similarity to obtain the final community structure. The invention makes effective use of the correlation information between nodes and achieves higher accuracy with lower time complexity.

Description

Community detection method based on similarity
Technical Field
The invention relates to the field of complex networks and community detection, and in particular to a community detection method based on similarity.
Background
With the discovery of the small-world and scale-free properties of complex networks, network science, combining the concepts and theories of nonlinear science and modern physics, has developed rapidly in the study of network structure, function, and properties. However, single indicators such as degree and betweenness, and abstract models such as topological, weighted, and game models, cannot adequately describe the node characteristics of real networks. Moreover, real-world networks contain large numbers of nodes whose relationships are complex and constantly changing; nodes in real networks evolve dynamically and act as distinct agents, and their different behaviors reflect the different roles they play in network function and behavior. Because nodes influence one another through their connecting edges, studying this local influence on network function and behavior is indispensable. Many community detection algorithms exist, such as Newman's fast algorithm and the CNM algorithm based on modularity optimization, algorithms based on spectral analysis, label propagation, or seed expansion, and edge-based algorithms. These algorithms generally partition a network according to its topology, the most fundamental abstraction of the nodes and edges of a complex network, and they do not reasonably account for the local influence of nodes during community formation.
Incorporating node importance into the network, and using it to quantify how nodes influence one another as community structure forms, can improve the quality and credibility of community division, give communities a clearer physical meaning, and characterize community changes during network evolution more accurately.
Community detection algorithms based on local similarity perform well on large-scale networks. Local similarity indices include the common neighbors (CN) index, the Jaccard index, and the hub promoted index (HPI). These approaches still face challenges; for example, nodes on the periphery of the network are easily overlooked. Research on similarity-based community detection is therefore necessary.
Disclosure of Invention
To overcome the shortcomings of conventional community detection algorithms, namely the difficulty of obtaining global network information, low prediction accuracy, and high time complexity, the invention provides a community detection method based on similarity that has high accuracy and low time complexity.
The technical solution adopted by the invention to solve the above technical problems is as follows:
A community detection method based on similarity comprises the following steps:
Step one: construct an undirected network model G(V, E) with N nodes, where V is the set of nodes and E is the set of connecting edges;
Step two: for any two nodes i and j, compute their common neighbor set $\mathrm{Int}(i, j) = \Gamma(i) \cap \Gamma(j)$ and their joint neighbor set $\mathrm{Uni}(i, j) = \Gamma(i) \cup \Gamma(j)$; traverse all node pairs in the network and record the common neighbor set and joint neighbor set of each pair;
Step three: for any node i in the network, compute its local degree ratio
[Formula image: local degree ratio of node i]
where D(i) is the degree of node i, i.e. the number of neighbors connected to node i, and $\Gamma(i)$ is the set of all neighbors of node i together with node i itself; traverse the network and record the local degree ratio of every node;
Step four: compute the vector of PageRank values of all nodes in the network
$x = D(D - \alpha A)^{-1}$
where $x = (PR_1, PR_2, PR_3, \ldots, PR_N)$ is the vector of PageRank values of the nodes; when the formula is evaluated iteratively, x converges after several iterations to values very close to the true centrality values. D is the diagonal matrix of the network with elements $D_{ii} = \max(k_i, 1)$, where $k_i$ is the degree of node i; $\alpha$ is a positive adjustable parameter; and A is the adjacency matrix of the network, which records the connecting edges between nodes: $A_{ij} = 1$ if a connecting edge exists between nodes i and j, and $A_{ij} = 0$ otherwise;
Step five: for any node i in the network, compute the degree ratio clustering value $\mathrm{sim}_P(i, j)$ of each connecting edge (i, j)
[Formula image: degree ratio clustering value $\mathrm{sim}_P(i, j)$]
This value represents the degree ratio similarity of nodes i and j; traverse the network and record the degree ratio clustering values of all connecting edges;
Step six: for any node i in the network, find the node w with the highest degree ratio similarity to i, namely the neighbor j that maximizes the degree ratio clustering value of edge (i, j): $w = \arg\max_{j \in \Gamma(i)} \mathrm{sim}_P(i, j)$; traverse the network and find this node for every node;
Step seven: for any node i in the network, compute the PageRank centrality clustering value $\mathrm{sim}_{PR}(i, j)$ of each connecting edge (i, j)
[Formula image: PageRank centrality clustering value $\mathrm{sim}_{PR}(i, j)$]
This value represents the PageRank similarity of nodes i and j; traverse the network and record the PageRank centrality clustering values of all connecting edges;
Step eight: for any node i in the network, find the node m with the highest PageRank similarity to i, namely the neighbor j that maximizes the PageRank centrality clustering value of edge (i, j): $m = \arg\max_{j \in \Gamma(i)} \mathrm{sim}_{PR}(i, j)$; traverse the network and find this node for every node;
Step nine: for any node i in the network, place node i and its node w of highest degree ratio similarity in the same community; traverse the network so that every node shares a community with its node of highest degree ratio similarity, which yields the initial community division;
Step ten: for any node i in the network, take its node m of highest PageRank similarity; if nodes i and m are not in the same community, merge their initial communities; traverse the network, merging the community of every node with the community containing its node of highest PageRank similarity, to obtain the final community structure.
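Steps two and three rely on the closed neighborhood $\Gamma(i)$ (the neighbors of i plus i itself) and on the common and joint neighbor sets of every node pair. The following is a minimal Python sketch, assuming the network is supplied as an edge list; the function names and the small graph are hypothetical. The graph is not FIG. 1 (whose edges are not listed in this text), but it was chosen so that nodes 2 and 5 reproduce the set values quoted for FIG. 1 in the detailed description.

```python
from itertools import combinations


def closed_neighborhoods(edges):
    """Return Gamma(i) for every node i: all neighbors of i plus i itself."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, {u}).add(v)
        gamma.setdefault(v, {v}).add(u)
    return gamma


def pair_neighbor_sets(gamma):
    """Map every unordered node pair (i, j), i < j, to (Int(i, j), Uni(i, j))."""
    return {
        (i, j): (gamma[i] & gamma[j], gamma[i] | gamma[j])
        for i, j in combinations(sorted(gamma), 2)
    }


# Hypothetical edge list (NOT the patent's FIG. 1).
edges = [(1, 2), (2, 4), (4, 5), (5, 6), (5, 7), (5, 8)]
gamma = closed_neighborhoods(edges)
pairs = pair_neighbor_sets(gamma)
common, joint = pairs[(2, 5)]
degree = {i: len(g) - 1 for i, g in gamma.items()}  # D(i) does not count i itself
print(sorted(common), sorted(joint), degree[2])  # [4] [1, 2, 4, 5, 6, 7, 8] 2
```

Recording the sets for all N(N-1)/2 pairs takes quadratic memory. Because steps five and seven only evaluate connecting edges (i, j), a practical variant might compute Int and Uni lazily for adjacent pairs only; this is a design suggestion, not part of the described method.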
The technical concept of the invention is as follows: initial communities are formed according to the degree ratio similarity between connected nodes, and the initial communities are then merged according to the PageRank similarity between connected nodes, yielding the final community structure of the network.
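The two-stage concept above (initial communities from degree ratio similarity, then merging by PageRank similarity, i.e. steps six and eight to ten) can be sketched in Python with a union-find structure. This is a minimal illustration under stated assumptions, not the patent's implementation: it takes the edge similarities $\mathrm{sim}_P$ and $\mathrm{sim}_{PR}$ as given inputs (their formulas appear only as images in this text), the function names and example similarity values are hypothetical, ties between equally similar neighbors go to the smaller node label, and step ten is read as transitive merging.

```python
def best_neighbors(gamma, sim):
    """Steps six/eight: map each node i to the neighbor j maximizing sim[(i, j)].

    gamma: closed neighborhoods (each set also contains its own node).
    sim: similarity per directed edge, e.g. sim_P or sim_PR values.
    Ties go to the smaller node label (an assumption; the text is silent).
    """
    best = {}
    for i, nbrs in gamma.items():
        candidates = sorted(nbrs - {i})  # (i, i) is not a connecting edge
        best[i] = max(candidates, key=lambda j: sim[(i, j)]) if candidates else i
    return best


def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def detect_communities(gamma, sim_p, sim_pr):
    """Step nine: join each node with w(i); step ten: merge with m(i)'s community."""
    w = best_neighbors(gamma, sim_p)
    m = best_neighbors(gamma, sim_pr)
    parent = {v: v for v in gamma}
    for i in gamma:  # step nine: initial community division
        parent[find(parent, i)] = find(parent, w[i])
    initial = {v: find(parent, v) for v in gamma}
    for i in gamma:  # step ten: merge communities across (i, m(i))
        a, b = find(parent, i), find(parent, m[i])
        if a != b:  # i and m(i) are in different communities
            parent[a] = b
    groups = {}
    for v in gamma:
        groups.setdefault(find(parent, v), []).append(v)
    return initial, sorted(sorted(g) for g in groups.values())


def symmetric(values):
    """Store each undirected edge value under both (i, j) and (j, i)."""
    out = {}
    for (i, j), s in values.items():
        out[(i, j)] = out[(j, i)] = s
    return out


# Hypothetical path network with made-up similarity values (not from the patent).
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
gamma = {}
for u, v in edges:
    gamma.setdefault(u, {u}).add(v)
    gamma.setdefault(v, {v}).add(u)
sim_p = symmetric(dict(zip(edges, [0.9, 0.2, 0.8, 0.3, 0.7])))
sim_pr = symmetric(dict(zip(edges, [0.5, 0.6, 0.4, 0.1, 0.9])))
initial, final = detect_communities(gamma, sim_p, sim_pr)
print(final)  # [[1, 2, 3, 4], [5, 6]]
```

Here step nine yields the initial communities {1, 2}, {3, 4} and {5, 6}; node 2's highest PageRank-similarity neighbor is node 3, so step ten merges the first two. With union-find, the grouping stage itself stays close to linear in the number of edges.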
The beneficial effects of the invention are that applying the similarity between nodes to community structure division yields high accuracy with low time complexity.
Drawings
FIG. 1 is a schematic diagram of a network having 8 nodes and 10 edges.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1, a community detection method based on similarity comprises the following steps:
Step one: construct an undirected network model G(V, E) with N nodes, where V is the set of nodes and E is the set of connecting edges;
Step two: for any two nodes i and j, compute their common neighbor set $\mathrm{Int}(i, j) = \Gamma(i) \cap \Gamma(j)$ (for example, the common neighbor of node 2 and node 5 in FIG. 1 is node 4) and their joint neighbor set $\mathrm{Uni}(i, j) = \Gamma(i) \cup \Gamma(j)$ (for example, the joint neighbor set of node 2 and node 5 in FIG. 1 is {1, 2, 4, 5, 6, 7, 8}); traverse all node pairs in the network and record the common neighbor set and joint neighbor set of each pair;
Step three: for any node i in the network, compute its local degree ratio
[Formula image: local degree ratio of node i]
where D(i) is the degree of node i, i.e. the number of neighbors connected to node i (for example, node 2 in FIG. 1 has degree 2), and $\Gamma(i)$ is the set of all neighbors of node i together with node i itself; traverse the network and record the local degree ratio of every node;
Step four: compute the vector of PageRank values of all nodes in the network
$x = D(D - \alpha A)^{-1}$
where $x = (PR_1, PR_2, PR_3, \ldots, PR_N)$ is the vector of PageRank values of the nodes; when the formula is evaluated iteratively, x converges after several iterations to values very close to the true centrality values. D is the diagonal matrix of the network with elements $D_{ii} = \max(k_i, 1)$, where $k_i$ is the degree of node i; $\alpha$ is a positive adjustable parameter; and A is the adjacency matrix of the network, which records the connecting edges between nodes: $A_{ij} = 1$ if a connecting edge exists between nodes i and j, and $A_{ij} = 0$ otherwise;
Step five: for any node i in the network, compute the degree ratio clustering value $\mathrm{sim}_P(i, j)$ of each connecting edge (i, j)
[Formula image: degree ratio clustering value $\mathrm{sim}_P(i, j)$]
This value represents the degree ratio similarity of nodes i and j; traverse the network and record the degree ratio clustering values of all connecting edges;
Step six: for any node i in the network, find the node w with the highest degree ratio similarity to i, namely the neighbor j that maximizes the degree ratio clustering value of edge (i, j): $w = \arg\max_{j \in \Gamma(i)} \mathrm{sim}_P(i, j)$; traverse the network and find this node for every node;
Step seven: for any node i in the network, compute the PageRank centrality clustering value $\mathrm{sim}_{PR}(i, j)$ of each connecting edge (i, j)
[Formula image: PageRank centrality clustering value $\mathrm{sim}_{PR}(i, j)$]
This value represents the PageRank similarity of nodes i and j; traverse the network and record the PageRank centrality clustering values of all connecting edges;
Step eight: for any node i in the network, find the node m with the highest PageRank similarity to i, namely the neighbor j that maximizes the PageRank centrality clustering value of edge (i, j): $m = \arg\max_{j \in \Gamma(i)} \mathrm{sim}_{PR}(i, j)$; traverse the network and find this node for every node;
Step nine: for any node i in the network, place node i and its node w of highest degree ratio similarity in the same community; traverse the network so that every node shares a community with its node of highest degree ratio similarity, which yields the initial community division;
Step ten: for any node i in the network, take its node m of highest PageRank similarity; if nodes i and m are not in the same community, merge their initial communities; traverse the network, merging the community of every node with the community containing its node of highest PageRank similarity, to obtain the final community structure.
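The step-four formula appears to follow the PageRank form $x = D(D - \alpha A)^{-1}\mathbf{1}$ from M. E. J. Newman's textbook *Networks*, where $\mathbf{1}$ is the all-ones vector; the patent text omits that vector, so including it here is an assumption. Because $D(D - \alpha A)^{-1} = (I - \alpha A D^{-1})^{-1}$, the fixed point can be reached by iterating $x \leftarrow \alpha A D^{-1} x + \mathbf{1}$, which converges for $0 < \alpha < 1$. A minimal pure-Python sketch follows; the function name, the default $\alpha = 0.85$, the tolerance, and the example network are illustrative choices, not values from the patent.

```python
def pagerank_vector(adj, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Iterate x <- alpha * A D^{-1} x + 1 until successive iterates agree.

    The fixed point is x = (I - alpha A D^{-1})^{-1} 1 = D (D - alpha A)^{-1} 1.
    adj: symmetric 0/1 adjacency matrix (list of lists) of an undirected network.
    alpha: the adjustable parameter; 0 < alpha < 1 guarantees convergence.
    """
    n = len(adj)
    d = [max(sum(row), 1) for row in adj]  # D_ii = max(k_i, 1)
    x = [1.0] * n
    for _ in range(max_iter):
        new_x = [
            alpha * sum(adj[i][j] * x[j] / d[j] for j in range(n)) + 1.0
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical star network: node 0 is the hub, nodes 1-3 are leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
pr = pagerank_vector(star)
print([round(v, 3) for v in pr])  # the hub scores highest
```

Solving the two fixed-point equations by hand for the star gives a hub value of $(1 + 3\alpha)/(1 - \alpha^2)$, which the iteration reproduces; the $\max(k_i, 1)$ guard keeps isolated nodes from causing a division by zero.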
The specific implementation steps described above make the invention clearer. Any modification or variation of the invention within its spirit and within the scope of the claims falls within the scope of protection of the invention.

Claims (1)

1. A community detection method based on similarity, characterized in that the method comprises the following steps:
Step one: construct an undirected network model G(V, E) with N nodes, where V is the set of nodes and E is the set of connecting edges;
Step two: for any two nodes i and j, compute their common neighbor set $\mathrm{Int}(i, j) = \Gamma(i) \cap \Gamma(j)$ and their joint neighbor set $\mathrm{Uni}(i, j) = \Gamma(i) \cup \Gamma(j)$; traverse all node pairs in the network and record the common neighbor set and joint neighbor set of each pair;
Step three: for any node i in the network, compute its local degree ratio
[Formula image: local degree ratio of node i]
where D(i) is the degree of node i, i.e. the number of neighbors connected to node i, and $\Gamma(i)$ is the set of all neighbors of node i together with node i itself; traverse the network and record the local degree ratio of every node;
Step four: compute the vector of PageRank values of all nodes in the network
$x = D(D - \alpha A)^{-1}$
where $x = (PR_1, PR_2, PR_3, \ldots, PR_N)$ is the vector of PageRank values of the nodes; when the formula is evaluated iteratively, x converges after several iterations to values very close to the true centrality values. D is the diagonal matrix of the network with elements $D_{ii} = \max(k_i, 1)$, where $k_i$ is the degree of node i; $\alpha$ is a positive adjustable parameter; and A is the adjacency matrix of the network, which records the connecting edges between nodes: $A_{ij} = 1$ if a connecting edge exists between nodes i and j, and $A_{ij} = 0$ otherwise;
Step five: for any node i in the network, compute the degree ratio clustering value $\mathrm{sim}_P(i, j)$ of each connecting edge (i, j)
[Formula image: degree ratio clustering value $\mathrm{sim}_P(i, j)$]
This value represents the degree ratio similarity of nodes i and j; traverse the network and record the degree ratio clustering values of all connecting edges;
Step six: for any node i in the network, find the node w with the highest degree ratio similarity to i, namely the neighbor j that maximizes the degree ratio clustering value of edge (i, j): $w = \arg\max_{j \in \Gamma(i)} \mathrm{sim}_P(i, j)$; traverse the network and find this node for every node;
Step seven: for any node i in the network, compute the PageRank centrality clustering value $\mathrm{sim}_{PR}(i, j)$ of each connecting edge (i, j)
[Formula image: PageRank centrality clustering value $\mathrm{sim}_{PR}(i, j)$]
This value represents the PageRank similarity of nodes i and j; traverse the network and record the PageRank centrality clustering values of all connecting edges;
Step eight: for any node i in the network, find the node m with the highest PageRank similarity to i, namely the neighbor j that maximizes the PageRank centrality clustering value of edge (i, j): $m = \arg\max_{j \in \Gamma(i)} \mathrm{sim}_{PR}(i, j)$; traverse the network and find this node for every node;
Step nine: for any node i in the network, place node i and its node w of highest degree ratio similarity in the same community; traverse the network so that every node shares a community with its node of highest degree ratio similarity, which yields the initial community division;
Step ten: for any node i in the network, take its node m of highest PageRank similarity; if nodes i and m are not in the same community, merge their initial communities; traverse the network, merging the community of every node with the community containing its node of highest PageRank similarity, to obtain the final community structure.
CN201810987366.0A 2018-08-28 2018-08-28 Community detection method based on similarity Active CN109255433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810987366.0A CN109255433B (en) 2018-08-28 2018-08-28 Community detection method based on similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810987366.0A CN109255433B (en) 2018-08-28 2018-08-28 Community detection method based on similarity

Publications (2)

Publication Number Publication Date
CN109255433A CN109255433A (en) 2019-01-22
CN109255433B true CN109255433B (en) 2021-10-29

Family

ID=65050450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810987366.0A Active CN109255433B (en) 2018-08-28 2018-08-28 Community detection method based on similarity

Country Status (1)

Country Link
CN (1) CN109255433B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434437B (en) * 2020-12-02 2023-08-25 大连大学 (Dalian University) Method for constructing equipment support super-network dynamic evolution model by considering node recombination
CN112699108A (en) * 2020-12-25 2021-04-23 中科恒运股份有限公司 (Zhongke Hengyun Co., Ltd.) Data reconstruction method and device for marital registration system and terminal equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106301888A (en) * 2016-07-27 2017-01-04 西安电子科技大学 (Xidian University) Based on core node and the network community division method of community's convergence strategy
CN106934722A (en) * 2017-02-24 2017-07-07 西安电子科技大学 (Xidian University) Multi-objective community detection method based on k node updates Yu similarity matrix
CN108073944A (en) * 2017-10-18 2018-05-25 南京邮电大学 (Nanjing University of Posts and Telecommunications) A kind of label based on local influence power propagates community discovery method
CN108229546A (en) * 2017-12-25 2018-06-29 浙江工业大学 (Zhejiang University of Technology) A kind of overlapping corporations detection method of feature based vector center peak value cluster

Also Published As

Publication number Publication date
CN109255433A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
Chakraborty et al. On the categorization of scientific citation profiles in computer science
CN106326637A (en) Link predicting method based on local effective path degree
CN103020163A (en) Node-similarity-based network community division method in network
CN113422695B (en) Optimization method for improving robustness of topological structure of Internet of things
Zhang et al. A combinatorial model and algorithm for globally searching community structure in complex networks
CN109255433B (en) Community detection method based on similarity
CN112182306A (en) Uncertain graph-based community discovery method
Priya et al. Community Detection in Networks: A Comparative study
Jabbour et al. Triangle-driven community detection in large graphs using propositional satisfiability
Pan et al. Overlapping community detection via leader-based local expansion in social networks
CN112949748A (en) Dynamic network anomaly detection algorithm model based on graph neural network
Wang et al. A novel measure for influence nodes across complex networks based on node attraction
CN111711530A (en) Link prediction algorithm based on community topological structure information
Chai et al. A node-priority based large-scale overlapping community detection using evolutionary multi-objective optimization
Qiao et al. Improving stochastic block models by incorporating power-law degree characteristic
CN115456093A (en) High-performance graph clustering method based on attention-graph neural network
Yuan et al. A Multi‐Granularity Backbone Network Extraction Method Based on the Topology Potential
Al-Mukhtar et al. Greedy modularity graph clustering for community detection of large co-authorship network
Berton et al. The Impact of Network Sampling on Relational Classification.
CN108241669A (en) A kind of construction method and system of adaptive text feature cluster
CN105337759A (en) Internal and external ratio measurement method based on community structure, and community discovery method
Chen et al. Detecting overlapping community in complex network based on node similarity
El-Daghar et al. EGBTER: Capturing degree distribution, clustering coefficients, and community structure in a single random graph model
Zhao et al. Effects of link perturbation on network modularity for community detections in complex network systems
Wang et al. Hierarchical community detection in social networks based on micro-community and minimum spanning tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant