CN112165401A - Edge community discovery algorithm based on network pruning and local community expansion - Google Patents
- Publication number
- CN112165401A
- Authority
- CN
- China
- Prior art keywords
- graph
- community
- nodes
- node
- edge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An edge community discovery algorithm based on graph pruning and local community expansion is an overlapping community discovery algorithm. First, the attraction of every edge in the graph is calculated, and edges whose attraction is lower than a threshold are deleted to obtain the pruned graph. The pruned graph is then converted into a line graph, a score matrix of the line-graph nodes is calculated using the PageRank algorithm, a seed node is selected, and its community is expanded; this process is repeated until no candidate seed node remains in the network. Finally, redundant node communities on the line graph are merged and converted back into the overlapping community structure of the nodes of the original graph. Compared with the prior art, the invention has the following main advantages: (1) it introduces the concept of edge attraction and uses it to prune the graph, which reduces the number of nodes in the line graph; (2) it applies a local community discovery algorithm to the nodes of the line graph, which can achieve better evaluation metric values.
Description
Technical Field
The invention belongs to the field of complex networks, and particularly relates to an edge community discovery algorithm based on network pruning and local community expansion.
Background
Overlapping community discovery is an important class of algorithms in the field of complex networks and is of great help in understanding such networks. Over the past decade, scholars have proposed various overlapping community discovery methods, such as edge community discovery methods and local community discovery methods.
A typical edge community discovery framework measures the similarity between edges and then groups similar edges into communities with an unsupervised learning method. The similarity is either calculated directly on the edges, or the edges of the graph are first converted into nodes of a line graph and the relationships between the line-graph nodes are calculated. However, because a graph usually has more edges than nodes, calculating edge similarity is expensive, and in some cases redundant edges interfere with the similarity calculation.
A typical local community discovery framework finds seed nodes in the graph, expands local communities starting from those seed nodes, and finally merges redundant communities. Local community discovery finds community structures efficiently, but it has rarely been applied to edge community discovery.
Disclosure of Invention
In view of the two problems above, the invention builds on edge community discovery and combines it with the advantages of local community discovery. It provides a graph pruning strategy that reduces the number of edges in the graph, and it uses a local community discovery algorithm to reduce the time complexity of node community discovery on the line graph.
The invention provides an edge community discovery algorithm based on network pruning and local community expansion, which comprises the following steps:
Step 1: Prune the graph using a graph pruning strategy and convert the pruned graph into a line graph.
Step 2: On the line graph, rank the nodes using the PageRank algorithm to obtain a score matrix of the nodes.
Step 3: On the line graph, search for candidate seed nodes and expand local communities from the seed nodes; repeat this process until no candidate seed nodes remain in the line graph.
Step 4: Merge redundant line-graph node communities and convert them into overlapping node communities of the original graph.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph of NMI values for different thresholds on the karate club network;
FIG. 3 and FIG. 4 are diagrams of the adjacency matrix and the distance matrix, respectively, on the karate club network;
FIG. 5 is a graph of NMI values of three algorithms on the karate club network.
Detailed Description
The following examples are intended to illustrate the invention but not to limit its scope. The invention is now described in further detail with reference to the accompanying drawings and embodiments. The embodiments assume that a complex network data set has already been obtained. FIG. 1 is a schematic flow chart of the edge community discovery method combining network pruning and local community expansion. As shown in FIG. 1, the present embodiment mainly includes the following steps:
Step 1: Prune the graph using a graph pruning strategy and convert the pruned graph into a line graph.
The attraction values of all edges in the graph are calculated. The attraction formula is defined as follows:
[Formula not reproduced in the source text.]
where k(i) and k(j) are the degrees of nodes i and j, i and j are nodes of the graph G, and n(i) and n(j) are the neighbor nodes of i and j. Redundant edges in the network are deleted according to the attraction of each edge to obtain the pruned graph.
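Step 1 can be sketched as follows. Because the attraction formula is not reproduced in the text, the `attraction` function below is a stand-in chosen only for illustration (common neighbors divided by the smaller degree); it is an assumption, not the patent's formula. The function names `build_adjacency`, `prune`, and `line_graph` are likewise hypothetical. Following the abstract, edges whose score is below the threshold are deleted.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map every node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def attraction(adj, i, j):
    """Stand-in edge score (an assumption, NOT the patent's formula):
    number of common neighbors divided by the smaller endpoint degree."""
    return len(adj[i] & adj[j]) / min(len(adj[i]), len(adj[j]))


def prune(edges, threshold, score=attraction):
    """Keep only the edges whose score is not below the threshold."""
    adj = build_adjacency(edges)
    return [(u, v) for u, v in edges if score(adj, u, v) >= threshold]


def line_graph(edges):
    """Each edge becomes a line-graph node; two line-graph nodes are
    adjacent when the original edges share an endpoint."""
    lg = {e: set() for e in edges}
    incident = {}
    for e in edges:
        for endpoint in e:
            incident.setdefault(endpoint, []).append(e)
    for group in incident.values():
        for a, b in combinations(group, 2):
            lg[a].add(b)
            lg[b].add(a)
    return lg


# A triangle with one pendant edge: the pendant edge (3, 4) has no
# common neighbors, so the stand-in score removes it.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
pruned = prune(edges, threshold=0.5)
lg = line_graph(pruned)
```

Passing the score function as a parameter keeps the pruning step independent of the exact attraction definition, so the real formula can be substituted without touching the rest of the sketch.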
Step 2: On the line graph, rank the nodes using the PageRank algorithm to obtain a score matrix of the nodes.
After the line graph is obtained, the node adjacency matrix of the line graph is constructed, and the score matrix of the line-graph nodes is calculated by combining the Euclidean distance formula with the PageRank algorithm. The Euclidean distance formula is as follows:
[Formula not reproduced in the source text.]
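The two ingredients named in Step 2 can be sketched separately. The text does not state how the Euclidean distance and PageRank are combined, so the sketch below makes no attempt to combine them: `pagerank` is a plain power iteration on an undirected adjacency dictionary, and `euclidean_distance` compares two rows of the 0/1 adjacency matrix. The function names, the damping factor 0.85, and the iteration count are assumptions made for illustration.

```python
import math


def pagerank(adj, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on an undirected graph
    given as {node: set_of_neighbors}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without neighbors is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def euclidean_distance(adj, a, b):
    """Euclidean distance between rows a and b of the 0/1 adjacency
    matrix; the rows differ exactly on the symmetric difference of
    the two neighbor sets."""
    return math.sqrt(len(adj[a] ^ adj[b]))


# A star: the center "c" is adjacent to three leaves.
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
scores = pagerank(star)
```

In this example the center receives the highest score, and the two leaves `x` and `y` have identical adjacency rows, so their distance is zero.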
Step 3: On the line graph, search for candidate seed nodes and expand local communities from the seed nodes; repeat this process until no candidate seed nodes remain in the line graph.
The line-graph nodes are sorted according to the score matrix, and the top-ranked node is taken as the seed node. Nodes that share more than two common neighbors with the seed node are identified, and the selected non-adjacent nodes and neighbor nodes are added to the community. This process is repeated until the community no longer grows. The nodes that have joined the community are deleted from the list. The above operations are repeated until there are no more new seed nodes.
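One possible reading of the expansion loop is sketched below. The text leaves some details open, so the sketch makes the following assumptions: a node joins the community when it shares more than `min_common` neighbors with any current member (the seed in the first pass, newly added members afterwards), and candidates are drawn from all line-graph nodes so that communities may overlap before the merge step. The function name `expand_communities` and its parameters are hypothetical.

```python
def expand_communities(adj, scores, min_common=2):
    """Grow one community per seed, seeds taken in descending score order.

    adj    -- {node: set_of_neighbors} for the line graph
    scores -- {node: score}, e.g. PageRank values
    """
    remaining = sorted(adj, key=lambda v: scores[v], reverse=True)
    communities = []
    while remaining:
        seed = remaining[0]
        community = {seed}
        grew = True
        while grew:  # repeat until the community stops growing
            grew = False
            for v in adj:
                if v in community:
                    continue
                # Join when v shares more than min_common neighbors
                # with at least one current member.
                if any(len(adj[v] & adj[m]) > min_common for m in community):
                    community.add(v)
                    grew = True
        communities.append(community)
        # Nodes already placed in a community are no longer seed candidates.
        remaining = [v for v in remaining if v not in community]
    return communities


# A complete graph on five nodes plus a separate pair x-y.
clique = "abcde"
adj = {v: set(clique) - {v} for v in clique}
adj["x"], adj["y"] = {"y"}, {"x"}
scores = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1, "x": 0.5, "y": 0.1}
found = expand_communities(adj, scores)
```

Any two clique nodes share three common neighbors, so the first seed absorbs the whole clique, while `x` and `y` share none and each ends up alone.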
Step 4: Merge redundant line-graph node communities and convert them into overlapping node communities of the original graph.
The similarity between every two communities is calculated by the following formula:
[Formula not reproduced in the source text.]
where the two operands each denote a community of line-graph nodes. If the similarity is larger than 0.5, the two communities are merged, and the process is repeated after merging until the similarity between the communities is lower than 0.5 or all communities have been merged into one community.
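The merge and conversion can be sketched as follows. The similarity formula is not reproduced in the text, so the Jaccard index is used here purely as a stand-in assumption; the 0.5 threshold comes from the description. The function names are hypothetical, and line-graph nodes are assumed to be stored as `(u, v)` edge tuples of the original graph.

```python
def jaccard(a, b):
    """Stand-in similarity (an assumption, NOT the patent's formula)."""
    return len(a & b) / len(a | b)


def merge_communities(communities, threshold=0.5, similarity=jaccard):
    """Merge any pair whose similarity exceeds the threshold, and
    restart the scan after every merge until no pair qualifies."""
    merged = [set(c) for c in communities]
    changed = True
    while changed and len(merged) > 1:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if similarity(merged[i], merged[j]) > threshold:
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break
    return merged


def to_node_communities(edge_communities):
    """Each line-graph node is an edge (u, v); a node of the original
    graph belongs to every community that contains one of its edges."""
    return [{endpoint for edge in c for endpoint in edge}
            for c in edge_communities]


edge_communities = [
    {(1, 2), (2, 3), (1, 3)},
    {(1, 2), (2, 3), (1, 3), (3, 4)},
    {(4, 5), (5, 6)},
]
merged = merge_communities(edge_communities)
node_communities = to_node_communities(merged)
```

The first two communities have similarity 0.75 and are merged; after conversion, node 4 belongs to both resulting communities, which is how an edge partition yields overlapping node communities.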
The above embodiments are only for illustrating the invention and not for limiting the same, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the invention, so that all equivalent technical solutions also belong to the scope of the invention, and the scope of the invention should be defined by the claims.
Example 1: The invention was tested on the karate club network.
The network used is the karate club network. The data set comprises 34 nodes and 78 edges and is used for research tasks in the field of complex networks.
The method was applied to this data set for test verification, and the evaluation index used is Normalized Mutual Information (NMI).
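For reference, the standard NMI between two label assignments can be computed as below. The text does not say which NMI variant was used, and overlapping community results are often scored with a variant designed for overlapping covers; the sketch shows only the common form for non-overlapping labels, NMI = 2 I(A;B) / (H(A) + H(B)). The function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information of two labelings of the same items,
    normalized by the arithmetic mean of the two entropies."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both labelings put every item in a single group
    return 2 * mutual / (entropy_a + entropy_b)


same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed labels
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # groupings carry no shared information
```

The score depends only on how items are grouped, not on the label names, so relabeling a partition leaves its NMI unchanged.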
On the karate club network, the threshold of the pruning algorithm was tuned. As shown in FIG. 2, different thresholds correspond to different NMI values. The threshold corresponding to the maximum NMI value, 1.5, was chosen.
FIG. 3 and FIG. 4 show the adjacency matrix and the distance matrix of the edge community discovery algorithm based on network pruning and local community expansion. FIG. 5 shows that the NMI of the edge community discovery algorithm based on network pruning and local community expansion is higher than that of the other algorithms, which indicates that the communities the algorithm finds in the karate club network are closer to the standard community partition.
Claims (5)
1. An edge community discovery algorithm based on network pruning and local community expansion is characterized by comprising the following steps:
step S1: pruning the graph using a graph pruning strategy and converting the pruned graph into a line graph, wherein the attraction values of all edges in the graph are calculated, and redundant edges in the network are deleted according to the attraction of each edge to obtain the pruned graph;
step S2: on the line graph, ranking the nodes using the PageRank algorithm to obtain a score matrix of the nodes, wherein after the line graph is obtained, a node adjacency matrix of the line graph is constructed, and the score matrix of the line-graph nodes is calculated by combining the Euclidean distance formula with the PageRank algorithm;
step S3: on the line graph, searching for candidate seed nodes and expanding local communities from the seed nodes, and repeating this process until no candidate seed nodes remain in the line graph, wherein the line-graph nodes are sorted according to the score matrix, the top-ranked node is taken as the seed node, nodes that share more than two common neighbors with the seed node are identified, and the selected non-adjacent nodes and neighbor nodes are added to the community; the process is repeated until the community no longer grows; the nodes that have joined the community are deleted from the list, and the operation is repeated until there are no new seed nodes;
step S4: merging redundant line-graph node communities and converting them into overlapping node communities of the graph, wherein two communities are merged if their similarity is greater than 0.5, and the process is repeated after merging until the similarity between the communities is lower than 0.5 or all communities have been merged into one community.
2. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein: the attractive force calculation formula of the edge in step S1 is:
4. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein the local community expansion process in step S3 is as follows:
the line-graph nodes are sorted according to the score matrix, the top-ranked node is taken as the seed node, nodes that share more than two common neighbors with the seed node are identified, and the selected non-adjacent nodes and neighbor nodes are added to the community; the process is repeated until the community no longer grows; the nodes that have joined the community are deleted from the list, and the operation is repeated until there are no new seed nodes.
5. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein the similarity between every two communities in step S4 is calculated by the following formula:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011040915.7A CN112165401A (en) | 2020-09-28 | 2020-09-28 | Edge community discovery algorithm based on network pruning and local community expansion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112165401A (en) | 2021-01-01 |
Family ID: 73861961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011040915.7A Pending CN112165401A (en) | 2020-09-28 | 2020-09-28 | Edge community discovery algorithm based on network pruning and local community expansion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112165401A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113554308A (en) * | 2021-07-23 | 2021-10-26 | 中信银行股份有限公司 | User community division and risk user identification method and device and electronic equipment |
CN113554308B (en) * | 2021-07-23 | 2024-05-28 | 中信银行股份有限公司 | User community division and risk user identification method and device and electronic equipment |
CN113762506A (en) * | 2021-08-13 | 2021-12-07 | 中国电子科技集团公司第三十八研究所 | Deep learning model pruning method and system |
CN113762506B (en) * | 2021-08-13 | 2023-11-24 | 中国电子科技集团公司第三十八研究所 | Pruning method and system for computer vision deep learning model |
CN114741468A (en) * | 2022-03-22 | 2022-07-12 | 平安科技(深圳)有限公司 | Text duplicate removal method, device, equipment and storage medium |
CN114741468B (en) * | 2022-03-22 | 2024-03-29 | 平安科技(深圳)有限公司 | Text deduplication method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112165401A (en) | Edge community discovery algorithm based on network pruning and local community expansion | |
CN109740541B (en) | Pedestrian re-identification system and method | |
CN111539181B (en) | Multi-strategy optimization X structure minimum tree construction method based on discrete differential evolution | |
CN102194149B (en) | Community discovery method | |
CN110719106B (en) | Social network graph compression method and system based on node classification and sorting | |
CN108399268B (en) | Incremental heterogeneous graph clustering method based on game theory | |
CN105978711B (en) | A kind of best exchange side lookup method based on minimum spanning tree | |
US20220005546A1 (en) | Non-redundant gene set clustering method and system, and electronic device | |
CN110909173A (en) | Non-overlapping community discovery method based on label propagation | |
CN110442618A (en) | Merge convolutional neural networks evaluation expert's recommended method of expert info incidence relation | |
CN111667373B (en) | Evolution community discovery method based on dynamic increment of neighbor subgraph social network | |
CN112949748A (en) | Dynamic network anomaly detection algorithm model based on graph neural network | |
CN108470251B (en) | Community division quality evaluation method and system based on average mutual information | |
CN106844533B (en) | Data packet aggregation method and device | |
CN109800231B (en) | Real-time co-movement motion mode detection method of track based on Flink | |
CN111861772A (en) | Local structure-based density maximization overlapping community discovery method and system | |
CN109033746B (en) | Protein compound identification method based on node vector | |
CN110807061A (en) | Method for searching frequent subgraphs of uncertain graphs based on layering | |
CN105975532A (en) | Query method based on iceberg vertex set in attribute graph | |
CN111369052B (en) | Simplified road network KSP optimization algorithm | |
CN108897820A (en) | A kind of parallel method of DENCLUE algorithm | |
CN113065073A (en) | Method for searching effective path set of city | |
CN111709846A (en) | Local community discovery algorithm based on line graph | |
CN114817653A (en) | Unsupervised community discovery method based on central node graph convolutional network | |
CN112579831A (en) | Network community discovery method and device based on SimRank global matrix smooth convergence and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20210101 |