CN112165401A - Edge community discovery algorithm based on network pruning and local community expansion - Google Patents

Info

Publication number
CN112165401A
Authority
CN
China
Prior art keywords
graph
community
nodes
node
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011040915.7A
Other languages
Chinese (zh)
Inventor
王贵参
王红梅
郭真俊
党源源
张丽杰
刘致华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Technology
Original Assignee
Changchun University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-09-28
Filing date
2020-09-28
Publication date
2021-01-01
Application filed by Changchun University of Technology
Priority to CN202011040915.7A
Publication of CN112165401A
Pending (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An edge community discovery algorithm based on graph pruning and local community expansion is an overlapping community discovery algorithm. First, the attraction of every edge in the graph is calculated, and edges whose attraction is below a threshold are deleted to obtain a pruned graph. The pruned graph is then converted into a line graph, a score matrix of the line-graph nodes is calculated with the PageRank algorithm, a seed node is selected and its community is expanded, and this process is repeated until no candidate seed node remains in the network. Finally, redundant line-graph node communities are merged and converted back into an overlapping community structure over the nodes of the original graph. Compared with the prior art, the invention provides an edge community discovery algorithm based on graph pruning and local community expansion for overlapping community discovery, with the following main advantages: (1) it introduces the concept of edge attraction and uses it to prune the graph, reducing the size of the graph; (2) it applies a local community discovery algorithm to the nodes of the line graph, achieving better values of the evaluation metric.

Description

Edge community discovery algorithm based on network pruning and local community expansion
Technical Field
The invention belongs to the field of complex networks and relates in particular to an edge community discovery algorithm based on network pruning and local community expansion.
Background
Overlapping community discovery is an important class of algorithms in the field of complex networks and is a valuable aid to understanding them. Over the past decade, researchers have proposed various overlapping community discovery methods, such as edge community discovery methods and local community discovery methods.
A typical edge community discovery framework computes the similarity between edges and then groups similar edges into communities with an unsupervised learning method. The edge similarity is either computed directly between edges, or the graph is converted with a line graph model, in which the edges of the graph become nodes, and the relationships between the line-graph nodes are computed. However, because a graph usually has more edges than nodes, computing edge similarity is expensive, and in some cases redundant edges interfere with the similarity computation.
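The line graph model mentioned above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patent's code: the function name `line_graph` and the representation of each edge as a sorted tuple are hypothetical choices. Every edge of the input graph becomes one line-graph node, and two line-graph nodes are joined when their edges share an endpoint.

```python
from collections import defaultdict
from itertools import combinations


def line_graph(edges):
    """Return the line graph of an undirected simple graph as an adjacency dict.

    Each edge (u, v) is normalized to a sorted tuple and becomes one line-graph
    node; two line-graph nodes are adjacent when their edges share an endpoint.
    """
    nodes = {tuple(sorted(e)) for e in edges}
    # Group the edges by each endpoint they touch.
    incident = defaultdict(list)
    for e in nodes:
        for endpoint in e:
            incident[endpoint].append(e)
    adj = {e: set() for e in nodes}
    # Every pair of edges meeting at a common endpoint is joined in the line graph.
    for group in incident.values():
        for e1, e2 in combinations(group, 2):
            adj[e1].add(e2)
            adj[e2].add(e1)
    return adj


# The path 0-1-2-3 has three edges; its line graph is the path (0,1)-(1,2)-(2,3).
lg = line_graph([(0, 1), (2, 1), (2, 3)])
print(sorted(lg[(1, 2)]))  # [(0, 1), (2, 3)]
```

Because the line graph has one node per edge, its size grows with the edge count, which is the overhead described in the paragraph above.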
A typical local community discovery framework finds seed nodes in the graph, expands local communities starting from those seeds, and finally merges redundant communities. Local community discovery finds community structures efficiently, but it has rarely been applied to edge community discovery.
Disclosure of Invention
To address these two problems, the invention builds on edge community discovery and draws on the strengths of local community discovery: it provides a graph pruning strategy that reduces the number of edges in the graph, and it uses a local community discovery algorithm to reduce the time complexity of discovering node communities on the line graph.
The invention provides an edge community discovery algorithm based on network pruning and local community expansion, comprising the following steps:
Step 1: Prune the graph using a graph pruning strategy and convert it into a line graph.
Step 2: On the line graph, rank the nodes with the PageRank algorithm to obtain a score matrix of the nodes.
Step 3: On the line graph, search for candidate seed nodes and expand local communities from them, repeating until no candidate seed node remains in the line graph.
Step 4: Merge redundant line-graph node communities and convert them into overlapping node communities of the original graph.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph of NMI values for different thresholds on the karate club network;
FIG. 3 and FIG. 4 show, respectively, the adjacency matrix and the distance matrix on the karate club network;
FIG. 5 is a graph of the NMI values of three algorithms on the karate club network.
Detailed Description
The following embodiment illustrates the invention but does not limit its scope. The invention is described below in further detail with reference to the drawings and the embodiment, which presupposes that a complex network data set has been obtained. FIG. 1 is a schematic flow chart of the link community discovery method combining network pruning and local community expansion; as shown in FIG. 1, the embodiment mainly comprises the following steps:
Step 1: Prune the graph using a graph pruning strategy and convert it into a line graph
Calculate the attraction values of all edges in the graph. The attraction is defined as follows:
[Formula image: edge attraction formula (not reproduced in this text)]
where k(i) and k(j) are the degrees of nodes i and j, which are nodes of graph G, and n(i) and n(j) are the neighbor sets of nodes i and j. Redundant edges, i.e. those whose attraction is below a threshold, are then deleted from the network according to each edge's attraction to obtain the pruned graph, which is converted into a line graph.
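The pruning step can be sketched as follows. The patent's attraction formula survives only as an image, so the `attraction` function below is a HYPOTHETICAL stand-in built from the same quantities (k(i), k(j), n(i), n(j)); it is not the invention's formula, and the threshold of 1.5 reported in Example 1 applies to the original formula, not to this placeholder. Edges whose attraction falls below the threshold are dropped, as the abstract describes; the surviving edge list would then be converted into a line graph.

```python
import math


def build_adjacency(edges):
    """Neighbor sets n(i) of an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def attraction(adj, i, j):
    # HYPOTHETICAL stand-in for the patent's attraction formula (image only in
    # the source). It uses the same ingredients: common neighbors of i and j,
    # plus one for the edge itself, divided by sqrt(k(i) * k(j)).
    common = len(adj[i] & adj[j])
    return (common + 1) / math.sqrt(len(adj[i]) * len(adj[j]))


def prune(edges, threshold):
    """Drop every edge whose attraction is below `threshold`."""
    adj = build_adjacency(edges)  # attraction is measured on the unpruned graph
    return [(u, v) for u, v in edges if attraction(adj, u, v) >= threshold]


# Triangle 0-1-2 plus the pendant edge 2-3, which has the weakest attraction.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
kept = prune(edges, threshold=0.7)
print(kept)  # [(0, 1), (0, 2), (1, 2)]
```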
Step 2: Rank the line-graph nodes with the PageRank algorithm and obtain the node score matrix
After the line graph is obtained, its node adjacency matrix is constructed, and the score matrix of the line-graph nodes is calculated by combining a Euclidean distance formula with the PageRank algorithm. The Euclidean distance formula is as follows:
[Formula image: Euclidean distance formula (not reproduced in this text)]
where the matrix-element symbol in the formula denotes the element in row a and column b of the adjacency matrix.
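The two ingredients of Step 2 can be sketched in Python. The text does not say how the distance formula is written or how the Euclidean distances and PageRank are combined into the score matrix, so this sketch makes two labeled assumptions: the distance between line-graph nodes a and b is taken as the Euclidean distance between rows a and b of the adjacency matrix, and nodes are ranked by a standard power-iteration PageRank with damping factor 0.85. The function names are hypothetical.

```python
import math


def euclidean_distance_matrix(A):
    # ASSUMPTION: d(a, b) is the Euclidean distance between rows a and b of A.
    n = len(A)
    return [[math.sqrt(sum((A[a][c] - A[b][c]) ** 2 for c in range(n)))
             for b in range(n)]
            for a in range(n)]


def pagerank(A, d=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on an adjacency matrix A (row i = out-links of i)."""
    n = len(A)
    out = [sum(row) for row in A]
    r = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(r[i] for i in range(n) if out[i] == 0)
        new = [(1 - d) / n + d * dangling / n] * n
        for i in range(n):
            if out[i]:
                for j in range(n):
                    if A[i][j]:
                        new[j] += d * r[i] * A[i][j] / out[i]
        converged = sum(abs(x - y) for x, y in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # adjacency matrix of the path 0-1-2
D = euclidean_distance_matrix(A)
scores = pagerank(A)
ranking = sorted(range(len(A)), key=lambda v: -scores[v])  # seed order for Step 3
print(D[0][2], ranking)  # 0.0 [1, 0, 2]
```

For the path 0-1-2, rows 0 and 2 of the adjacency matrix coincide, so their distance is 0, and PageRank ranks the middle node first.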
Step 3: On the line graph, search for candidate seed nodes and expand local communities from them, repeating until no candidate seed node remains in the line graph
Sort the line-graph nodes according to the score matrix and take the top-ranked node as the seed node. Find the nodes that share more than two common neighbors with the seed node, and add these screened non-adjacent nodes, together with the seed's neighbor nodes, to the community. Repeat this until the community size no longer increases, then delete the nodes that have joined the community from the ranked list. Repeat these operations until no new seed node remains.
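Step 3 leaves some details open, so the following Python sketch fixes one plausible reading and labels it as an assumption: the community starts as the seed plus its neighbors, and an outside node joins when it shares more than two neighbors with the current community (the text says "relative to the seed node"; measuring against the growing community is what lets the size keep increasing across rounds). Seeds are taken in score order from the nodes not yet assigned, while expansion may still reach assigned nodes, which is how overlapping communities can arise. The function name and parameters are hypothetical.

```python
def expand_communities(adj, scores, more_than=2):
    """Seed-and-expand local community detection (one possible reading of Step 3).

    adj: dict mapping each node (e.g. a line-graph node) to its neighbor set.
    scores: dict mapping each node to its ranking score (e.g. PageRank).
    """
    # Candidate seeds, highest score first; ties broken by node label.
    ranked = sorted(adj, key=lambda v: (-scores[v], v))
    remaining = set(ranked)
    communities = []
    for seed in ranked:
        if seed not in remaining:
            continue  # already absorbed by an earlier community
        community = {seed} | adj[seed]  # the seed and its neighbors
        while True:
            # ASSUMPTION: an outside node joins when it shares more than
            # `more_than` neighbors with the current community.
            joiners = {v for v in adj
                       if v not in community and len(adj[v] & community) > more_than}
            if not joiners:
                break  # community size no longer increases
            community |= joiners
        communities.append(community)
        remaining -= community  # members are removed from the seed list
    return communities


# Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the bridge edge 3-4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
degree = {v: len(nbrs) for v, nbrs in adj.items()}  # degree used as a toy score
comms = expand_communities(adj, degree)
print([sorted(c) for c in comms])  # [[0, 1, 2, 3, 4], [4, 5, 6, 7]]
```

In this toy run the bridge node 4 ends up in both communities, illustrating how the seed-removal rule still permits overlap.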
Step 4: Merge redundant line-graph node communities and convert them into overlapping node communities of the graph
Calculate the similarity between every pair of communities using the following formula:
[Formula image: community similarity formula (not reproduced in this text)]
where the two symbols in the formula denote node communities of the graph. If the similarity of two communities is greater than 0.5, they are merged; after each merge the process is repeated until the similarity between all communities is below 0.5 or all communities have been merged into one.
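Step 4 can be sketched as below. The similarity formula is only an image in the source, so Jaccard similarity is swapped in here as a HYPOTHETICAL stand-in; the 0.5 threshold and the stopping rule follow the text. Because each line-graph node is an edge of the original graph, a community is mapped back to the original graph by taking the endpoints of its edges, so a node whose edges fall into different communities belongs to several of them. Function names are illustrative.

```python
def jaccard(c1, c2):
    # HYPOTHETICAL similarity: the patent's own formula is only an image.
    return len(c1 & c2) / len(c1 | c2)


def merge_communities(communities, threshold=0.5, sim=jaccard):
    """Merge the most similar pair while its similarity exceeds `threshold`."""
    comms = [set(c) for c in communities]
    while len(comms) > 1:
        best, pair = threshold, None
        for a in range(len(comms)):
            for b in range(a + 1, len(comms)):
                s = sim(comms[a], comms[b])
                if s > best:
                    best, pair = s, (a, b)
        if pair is None:
            break  # no pair is more similar than the threshold
        a, b = pair
        union = comms[a] | comms[b]
        del comms[b]  # b > a, so index a is still valid
        comms[a] = union
    return comms


def to_node_communities(line_communities):
    """Map each community of line-graph nodes (edges) to the set of its endpoints."""
    return [set().union(*c) for c in line_communities]


line_comms = [{(0, 1), (0, 2), (1, 2)},
              {(0, 1), (0, 2), (1, 2), (2, 3)},
              {(4, 5), (4, 6), (5, 6)}]
merged_comms = merge_communities(line_comms)
node_comms = to_node_communities(merged_comms)
print([sorted(c) for c in node_comms])  # [[0, 1, 2, 3], [4, 5, 6]]
```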
The above embodiments only illustrate the invention and do not limit it. Those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, which is defined by the claims.
Example 1: Testing the invention on the karate club network
The network used is the karate club network (Zachary's karate club), a data set of 34 nodes and 78 edges that is widely used in complex network research.
The method was applied to this data set for testing and verification, using Normalized Mutual Information (NMI) as the evaluation metric.
On the karate club network, the threshold of the pruning algorithm was tuned; as shown in FIG. 2, different thresholds yield different NMI values. The threshold giving the maximum NMI value was selected, namely 1.5.
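The threshold selection described above amounts to a sweep that keeps the threshold with the highest NMI. The sketch below is an illustration under assumptions: it uses the standard NMI for hard (non-overlapping) partitions, normalized by the arithmetic mean of the two entropies, whereas the patent does not state which NMI variant it used for its overlapping communities; `run` is a hypothetical callable that executes the algorithm for a given threshold and returns one label per node.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """NMI of two hard partitions, normalized by the mean of the two entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t, count_p = Counter(labels_true), Counter(labels_pred)

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_t, h_p = entropy(count_t), entropy(count_p)
    if h_t + h_p == 0:
        return 1.0  # both partitions consist of a single block
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    return 2 * mi / (h_t + h_p)


def best_threshold(thresholds, labels_true, run):
    """Return the threshold whose predicted labels maximize NMI.

    `run` is a HYPOTHETICAL callable: threshold -> one predicted label per node.
    """
    return max(thresholds, key=lambda t: nmi(labels_true, run(t)))


print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 6))  # 1.0 (same partition, relabeled)
print(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 6))  # 0.0 (independent partitions)
```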
FIG. 3 and FIG. 4 show the adjacency matrix and the distance matrix of the edge community discovery algorithm based on network pruning and local community expansion. FIG. 5 shows that this algorithm achieves a higher NMI than the other two algorithms compared, indicating that the communities it finds in the karate club network are closer to the ground-truth community partition.

Claims (5)

1. An edge community discovery algorithm based on network pruning and local community expansion, characterized in that it comprises the following steps:
step S1: prune the graph using a graph pruning strategy and convert it into a line graph: calculate the attraction values of all edges in the graph, and delete redundant edges in the network according to the attraction of each edge to obtain a pruned graph;
step S2: on the line graph, rank the nodes with the PageRank algorithm to obtain a score matrix of the nodes: after the line graph is obtained, construct the node adjacency matrix of the line graph, and calculate the score matrix of the line-graph nodes by combining a Euclidean distance formula with the PageRank algorithm;
step S3: on the line graph, search for candidate seed nodes, expand a local community from each seed node, and repeat until no candidate seed node remains in the line graph: sort the line-graph nodes according to the score matrix, take the top-ranked node as the seed node, find the nodes that share more than two common neighbors with it, and add the screened non-adjacent nodes and the neighbor nodes to the community; repeat until the community size no longer increases; delete the nodes that have joined the community from the list, and repeat these operations until no new seed node exists;
step S4: merge redundant line-graph node communities and convert them into overlapping node communities of the graph: if the similarity between two communities is greater than 0.5, merge them, and repeat the process after each merge until the similarity between communities is below 0.5 or all communities have been merged into one.
2. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein the attraction of an edge in step S1 is calculated as:
[Formula image: edge attraction formula (not reproduced in this text)]
where k(i) and k(j) are the degrees of nodes i and j, nodes i and j are nodes of graph G, and n(i) and n(j) are the neighbor sets of nodes i and j.
3. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein the Euclidean distance formula in step S2 is:
[Formula image: Euclidean distance formula (not reproduced in this text)]
where the matrix-element symbol in the formula denotes the element in row a and column b of the adjacency matrix.
4. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein the local community expansion process in step S3 is as follows:
the line-graph nodes are sorted according to the score matrix, the top-ranked node is taken as the seed node, the nodes that share more than two common neighbors with it are found, and the screened non-adjacent nodes and the neighbor nodes are added to the community; this is repeated until the community size no longer increases; the nodes that have joined the community are deleted from the list, and these operations are repeated until no new seed node exists.
5. The edge community discovery algorithm based on network pruning and local community expansion according to claim 1, wherein in step S4 the similarity between every pair of communities is calculated with the following formula:
[Formula image: community similarity formula (not reproduced in this text)]
where the two symbols in the formula denote node communities of the graph.
CN202011040915.7A 2020-09-28 2020-09-28 Edge community discovery algorithm based on network pruning and local community expansion Pending CN112165401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011040915.7A CN112165401A (en) 2020-09-28 2020-09-28 Edge community discovery algorithm based on network pruning and local community expansion

Publications (1)

Publication Number Publication Date
CN112165401A true CN112165401A (en) 2021-01-01

Family

ID=73861961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011040915.7A Pending CN112165401A (en) 2020-09-28 2020-09-28 Edge community discovery algorithm based on network pruning and local community expansion

Country Status (1)

Country Link
CN (1) CN112165401A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554308A (en) * 2021-07-23 2021-10-26 中信银行股份有限公司 User community division and risk user identification method and device and electronic equipment
CN113554308B (en) * 2021-07-23 2024-05-28 中信银行股份有限公司 User community division and risk user identification method and device and electronic equipment
CN113762506A (en) * 2021-08-13 2021-12-07 中国电子科技集团公司第三十八研究所 Deep learning model pruning method and system
CN113762506B (en) * 2021-08-13 2023-11-24 中国电子科技集团公司第三十八研究所 Pruning method and system for computer vision deep learning model
CN114741468A (en) * 2022-03-22 2022-07-12 平安科技(深圳)有限公司 Text duplicate removal method, device, equipment and storage medium
CN114741468B (en) * 2022-03-22 2024-03-29 平安科技(深圳)有限公司 Text deduplication method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112165401A (en) Edge community discovery algorithm based on network pruning and local community expansion
CN109740541B (en) Pedestrian re-identification system and method
CN111539181B (en) Multi-strategy optimization X structure minimum tree construction method based on discrete differential evolution
CN102194149B (en) Community discovery method
CN110719106B (en) Social network graph compression method and system based on node classification and sorting
CN108399268B (en) Incremental heterogeneous graph clustering method based on game theory
CN105978711B (en) A kind of best exchange side lookup method based on minimum spanning tree
US20220005546A1 (en) Non-redundant gene set clustering method and system, and electronic device
CN110909173A (en) Non-overlapping community discovery method based on label propagation
CN110442618A (en) Merge convolutional neural networks evaluation expert's recommended method of expert info incidence relation
CN111667373B (en) Evolution community discovery method based on dynamic increment of neighbor subgraph social network
CN112949748A (en) Dynamic network anomaly detection algorithm model based on graph neural network
CN108470251B (en) Community division quality evaluation method and system based on average mutual information
CN106844533B (en) Data packet aggregation method and device
CN109800231B (en) Real-time co-movement motion mode detection method of track based on Flink
CN111861772A (en) Local structure-based density maximization overlapping community discovery method and system
CN109033746B (en) Protein compound identification method based on node vector
CN110807061A (en) Method for searching frequent subgraphs of uncertain graphs based on layering
CN105975532A (en) Query method based on iceberg vertex set in attribute graph
CN111369052B (en) Simplified road network KSP optimization algorithm
CN108897820A (en) A kind of parallel method of DENCLUE algorithm
CN113065073A (en) Method for searching effective path set of city
CN111709846A (en) Local community discovery algorithm based on line graph
CN114817653A (en) Unsupervised community discovery method based on central node graph convolutional network
CN112579831A (en) Network community discovery method and device based on SimRank global matrix smooth convergence and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210101