CN115878861B - Selection method for integrated key node group aiming at graph data completion - Google Patents


Publication number
CN115878861B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310074880.6A
Other languages
Chinese (zh)
Other versions
CN115878861A (en)
Inventor
付嘉乐
姜小丫
康明与
陈都鑫
虞文武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202310074880.6A
Publication of CN115878861A
Application granted
Publication of CN115878861B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for selecting an integrated key node group for graph data completion, which comprises four modules: a data preparation module, which extracts topology information from real network data and obtains an adjacency matrix; a multi-angle key node identification module, which identifies groups of key nodes using several methods; a graph convolutional network test module, which tests the groups obtained in the previous stage with a graph convolutional network to obtain their graph data completion effect; and a judging and output module, which compares the completion effects and outputs the optimal group of key nodes. The beneficial effects of the invention are as follows: for the graph data completion problem, a group of key nodes can be identified, and because neighborhood-based, path-based and iteration-based methods are integrated, the method performs well on networks of different categories.

Description

Selection method for integrated key node group aiming at graph data completion
Technical Field
The invention relates to a method for selecting an integrated key node group for graph data completion, and belongs to the technical field of network key node identification and deep learning.
Background
Graph data completion refers to recovering all of the information of a graph from partial information about it, combined with historical data and the topological structure of the graph. With this technique, the properties of a network can be analyzed and studied without knowing all of its information, which can significantly reduce the cost of analyzing large complex networks. The technique already plays an important role in fields such as electric power, transportation, biology, chemistry, economics and sociology. The choice of the nodes used for completion directly affects the completion effect.
Taking a transportation network as an example, the road network is usually modeled as a graph and analyzed on the basis of the real-time traffic flow at intersections and the road network topology. However, the number of important intersections in a city often ranges from thousands to tens of thousands, and the state of the same intersection can vary completely, so detecting the traffic flow of every intersection in real time incurs a large, continuing cost. One effective solution is to monitor only a subset of key nodes and to complete the data of the remaining nodes by combining complex network methods with deep learning. How to select the key nodes used for graph data completion then becomes the problem to be solved.
However, existing methods for identifying key nodes in complex networks have the following disadvantages with respect to this problem:
1. existing methods mainly address network propagation, network control and similar perspectives, for example finding which nodes in a power grid would cause the most serious loss if damaged, deciding which people in a social network should receive a promotion so as to maximize an advertiser's profit, or determining which nodes should be influenced so that the network reaches a given state most quickly; a method for selecting key nodes for the graph data completion problem is lacking;
2. most existing methods target static networks and are usually based on the structural characteristics of the network, without considering its dynamic properties and evolution rules;
3. most existing methods focus on ranking nodes by importance, that is, on the selection of single nodes, and lack a way of selecting several nodes simultaneously;
4. existing key node identification methods lack effective integration. Current methods start from different problems and angles, and their effects on different types of graphs often differ. In addition, because of the structural complexity and the diversity of types of complex networks, it is almost impossible to find a single evaluation index applicable to all kinds of graphs.
Therefore, the invention designs a method for selecting an integrated key node group for graph data completion. It integrates existing key node identification methods for the graph data completion problem and provides a method for selecting a group of key nodes that captures the dynamic characteristics of the network and performs well on different types of graphs.
Disclosure of Invention
The invention provides a method for selecting an integrated key node group for graph data completion. It integrates several key node identification methods that work from different angles and can therefore achieve good results on different types of graphs. It also takes into account the dynamic characteristics of the network and the influence of already-selected nodes on subsequent selection, and so performs better than methods that consider only the structural characteristics of a static network or that simply take the top-ranked nodes of a node ordering.
In order to achieve the above object, the technical scheme of the invention is as follows: a method for selecting an integrated key node group for graph data completion, the method comprising the following steps:
step 1: acquiring topology information and acquiring the data on each node in the network;
step 2: inputting the topology information obtained in step 1 into a data preparation module to obtain an adjacency matrix;
step 3: inputting the adjacency matrix obtained in step 2 into a multi-angle key node identification module;
step 4: inputting the obtained key nodes into a graph convolutional network test module, and outputting the test result of each group of key nodes;
step 5: after the completion effects of the different groups of key nodes have been calculated, inputting the obtained key nodes and their mean square errors into a judging and output module, and selecting the group of key nodes with the best completion effect as the final result and outputting it.
As an improvement of the invention, step 1 is specifically as follows: the actual traffic network is first abstracted into a network in which each traffic intersection is regarded as a node, each road section connecting two intersections is regarded as an edge between the corresponding nodes, and the attribute value of each node represents the traffic flow of the corresponding intersection over time.
As an improvement of the invention, step 2 is specifically as follows: the topology information obtained in step 1 is input into the data preparation module to obtain its adjacency matrix $A$. The element $a_{ij}$ of $A$ is defined as follows:
$$a_{ij} = \begin{cases} 1, & \text{if node } i \text{ and node } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$
where $a_{ij}$ denotes an element of the adjacency matrix $A$, and $i$ and $j$ denote nodes in the network.
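The construction of the adjacency matrix in step 2 can be illustrated with a minimal sketch. It assumes an undirected road network given as a list of node pairs; the function name `build_adjacency` and the four-intersection example are hypothetical and are not taken from the patent.

```python
def build_adjacency(num_nodes, edges):
    """Return the 0/1 adjacency matrix of an undirected graph as nested lists."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: road sections are treated as undirected
    return A


# Hypothetical network of 4 intersections with road sections 0-1, 1-2, 2-3, 1-3
A = build_adjacency(4, [(0, 1), (1, 2), (2, 3), (1, 3)])
for row in A:
    print(row)
```

Nested lists keep the sketch free of dependencies; a real implementation would more likely use a sparse matrix for a city-scale network.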
As an improvement of the invention, step 3 is specifically as follows: the adjacency matrix obtained in step 2 is input into the multi-angle key node identification module, which is divided into three sub-modules: a neighborhood judging module, a path judging module and an iteration judging module. Each sub-module is based on 1 to 2 different evaluation indexes. The judging flow for each evaluation index is shown in FIG. 2 and is as follows:
step 3.1: inputting the adjacency matrix of the traffic network and the number of key nodes to be identified;
step 3.2: selecting the most important node in the network from the input adjacency matrix, using the corresponding key node identification method (for example, degree centrality);
step 3.3: deleting the selected node and its connected edges from the network;
step 3.4: inputting the new adjacency matrix, and subtracting 1 from the number of key nodes still to be selected;
step 3.5: repeating steps 3.2 to 3.4 until the number of key nodes still to be selected is 0;
step 3.6: outputting the numbers of all selected key nodes.
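Steps 3.1 to 3.6 amount to a greedy loop that re-scores the network after every deletion. The following is a minimal sketch under stated assumptions: degree is used as the evaluation index, ties are broken by the lowest node number, and deletion is modeled by zeroing the selected node's row and column. The names `degree_scores` and `select_key_nodes` are illustrative only.

```python
def degree_scores(A, active):
    """Degree of every still-active node, counting only edges to active nodes."""
    return {i: sum(A[i][j] for j in active) for i in active}


def select_key_nodes(A, num_key, score_fn=degree_scores):
    """Greedy selection following steps 3.1 to 3.6."""
    A = [row[:] for row in A]           # copy so the caller's matrix is untouched
    active = set(range(len(A)))
    selected = []
    while num_key > 0 and active:       # step 3.5: loop until the count reaches 0
        scores = score_fn(A, active)
        # step 3.2: most important node; ties go to the lowest node number
        best = max(sorted(active), key=lambda i: scores[i])
        selected.append(best)
        for j in range(len(A)):         # step 3.3: delete the node's edges
            A[best][j] = 0
            A[j][best] = 0
        active.remove(best)
        num_key -= 1                    # step 3.4
    return selected                     # step 3.6


# Hypothetical 6-node network with edges 0-1, 0-2, 0-3, 3-4, 4-5
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

print(select_key_nodes(A, 2))  # [0, 4]
```

In this example nodes 3 and 4 are tied on their initial degree, so a one-shot ranking cannot separate them; after node 0 is deleted, node 3 drops to degree 1 and the loop picks node 4. This is the effect of already-selected nodes on later selections that the flow is meant to capture.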
As an improvement of the invention, the graph convolutional network test module in step 4 uses a graph convolutional network to run a graph data completion test on the key nodes identified in the previous stage and outputs the test result. The graph data completion method comprises the following steps:
step 4.1: inputting the real graph data and the numbers of the key nodes identified in the previous stage;
step 4.2: according to the input information, deleting the data of all nodes other than the key nodes;
step 4.3: first passing the data through a linear layer to perform graph data pre-completion;
the formula for graph data pre-completion is as follows:
$$\tilde{X} = W X + b$$
where $X$ is the graph data after part of the nodes have been deleted, an $M$-dimensional column vector; $N$ is the number of nodes; $M$ is the number of key nodes; $W$ is an $N \times M$ weight matrix; $b$ is the bias term, an $N$-dimensional column vector; and $\tilde{X}$ is the data after pre-completion;
step 4.4: inputting the result of the pre-completion into a graph convolution layer and performing the graph convolution operation;
the formula of the graph convolution operation is as follows:
Figure SMS_18
where $H^{(l)}$ denotes the graph data of layer $l$, an $N$-dimensional column vector; $W^{(l)}$ is an $N \times N$ weight matrix; $A$ is the adjacency matrix; and $b^{(l)}$ is a bias term;
step 4.5: inputting the result of the graph convolution layer into a fully connected layer and outputting the graph data completion result;
the formula of the fully connected layer is as follows:
Figure SMS_26
where $H^{(l)}$ denotes the graph data of layer $l$, an $N$-dimensional column vector; $W^{(l)}$ is an $N \times N$ weight matrix; and $b^{(l)}$ is a bias term;
step 4.6: comparing the completion result with the original real data to obtain the graph data completion effect;
here the mean square error $MSE$ is used, calculated as follows:
Figure SMS_34
where $\hat{y}$ is the completed value of the graph data, $y$ is the true value of the graph data, $N$ is the number of nodes and $M$ is the number of key nodes. The smaller the mean square error, the better the graph data completion effect.
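The data flow of steps 4.3 to 4.6 can be sketched as a single forward pass. The formula images for the graph convolution layer, the fully connected layer and the mean square error are not recoverable from this text, so the layer forms below are assumptions: the convolution is taken as ReLU(A(W1 h) + b1), the fully connected layer as W2 h + b2, and the error is averaged over all N nodes. The weights are fixed toy values; training is not shown.

```python
def matvec(W, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(x_key, A, params):
    """Pre-completion, one graph convolution layer, one fully connected layer."""
    W0, b0, W1, b1, W2, b2 = params
    h0 = add(matvec(W0, x_key), b0)                 # step 4.3: linear pre-completion
    h1 = relu(add(matvec(A, matvec(W1, h0)), b1))   # step 4.4: assumed layer form
    return add(matvec(W2, h1), b2)                  # step 4.5: assumed layer form


def mse(pred, true):
    """Mean square error; averaging over all nodes is an assumption."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


# Toy setting: N = 3 nodes on a path 0-1-2, M = 1 key node with observed value 2.0
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
zeros = [0.0, 0.0, 0.0]
params = ([[1], [1], [1]], zeros, identity, zeros, identity, zeros)

y_hat = forward([2.0], A, params)
print(y_hat)                          # [2.0, 4.0, 2.0]
print(mse(y_hat, [2.0, 3.0, 2.0]))
```

In practice the weights would be learned from historical data, and one mean square error would be computed for each candidate group of key nodes.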
As an improvement of the invention, step 5 is specifically as follows: after the completion effects of the five different groups of key nodes have been calculated, the five groups of key nodes and their mean square errors are input into the judging and output module, the completion effects of the groups are compared, and the group of key nodes with the best completion effect is selected as the final result and output;
let the 5 groups of key nodes obtained by the multi-angle key node identification module be $S_1, S_2, \dots, S_5$, and let the 5 results obtained by the graph convolutional network test module be $MSE_1, MSE_2, \dots, MSE_5$; the final group of key nodes $S^*$ is then:
$$S^* = S_k, \quad k = \arg\min_{i \in \{1, \dots, 5\}} MSE_i$$
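The judging and output step reduces to picking the group with the smallest mean square error. A minimal sketch follows; the five groups and their error values are made-up illustration data, not results from the patent.

```python
def best_group(groups, errors):
    """Return the key node group with the smallest mean square error."""
    k = min(range(len(groups)), key=lambda i: errors[i])
    return groups[k], errors[k]


# Hypothetical output of the five evaluation indexes and their test errors
groups = [[0, 4], [0, 3], [1, 2], [2, 5], [3, 4]]
errors = [0.12, 0.30, 0.08, 0.25, 0.19]

print(best_group(groups, errors))  # ([1, 2], 0.08)
```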
As an improvement of the invention, the key node identification methods used in step 3.2 comprise 5 different centrality metrics:
(1) Degree centrality: degree centrality describes the number of neighbors of a node, and is calculated as follows:
Figure SMS_44
where $DC_i$ denotes the degree centrality of node $i$, $N$ is the total number of nodes, and $a_{ij}$ is an element of the adjacency matrix. The greater the degree centrality, the more important the node is considered;
(2) k-shell decomposition: k-shell decomposition can be used to describe the position of a node in the network, and is computed as follows:
first, all nodes of degree 1 and their edges are deleted from the network; then all nodes that have degree 1 in the resulting network, together with their edges, are deleted again; this is repeated until no node of degree 1 remains in the network;
the set of deleted nodes is called the 1-shell, and the remaining nodes are called the 1-core;
the procedure continues in the same way until all nodes in the network have been deleted, yielding the k-shells and k-cores. Each node belongs to exactly one k-shell, and the larger k is, the more important the node is considered;
(3) Closeness centrality: closeness centrality describes the average distance from a node to all other nodes in the network, and is calculated as follows:
Figure SMS_51
where $CC_i$ is the closeness centrality of node $i$, $N$ is the total number of nodes in the network, and $d_{ij}$ is the distance between node $i$ and node $j$, defined as the length of the shortest path from node $i$ to node $j$. If node $i$ and node $j$ are not connected, then $d_{ij} = \infty$ is assumed, in which case:
Figure SMS_63
The greater the closeness centrality, the more important the node is considered;
(4) Betweenness centrality: betweenness centrality describes how many shortest paths between pairs of nodes pass through a given node, and is calculated as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $BC_i$ is the betweenness centrality of node $i$, $g_{st}$ is the number of shortest paths between node $s$ and node $t$, and $n_{st}^{i}$ is the number of shortest paths between node $s$ and node $t$ that pass through node $i$. The greater the betweenness centrality of a node, the more important the node is considered;
(5) Eigenvector centrality: eigenvector centrality assumes that the influence of a node is determined not only by the number of its neighbors but also by the influence of each neighbor, so that the centrality of a node is proportional to the sum of the centralities of the nodes connected to it. This gives:
$$x = c A x$$
where $x$ is the vector of eigenvector centralities of the nodes, $A$ is the adjacency matrix and $c$ is a constant. If the above equation holds, $x$ is the eigenvector of the matrix $A$ corresponding to the eigenvalue $c^{-1}$. The eigenvector centrality is computed by choosing an initial value $x(0)$ and then applying the following iteration:
$$x(t) = c A x(t-1),$$
where $t = 1, 2, \dots$. The greater the eigenvector centrality of a node, the more important the node is considered.
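Two of the five metrics above are procedural and can be sketched briefly: the peeling procedure of k-shell decomposition and the iteration for eigenvector centrality. Both functions are illustrative sketches and not the patent's implementation. In `k_shell`, isolated nodes are assigned shell 0, a case the text does not cover. In `eigenvector_centrality`, the vector is rescaled by its largest entry in each round (playing the role of the constant c), and the matrix A + I is iterated instead of A, an added assumption that keeps the same eigenvectors while avoiding oscillation on bipartite graphs.

```python
def k_shell(adj):
    """Map each node to its shell index. adj: dict node -> set of neighbors."""
    adj = {u: set(vs) for u, vs in adj.items()}   # local copy that gets peeled
    shell, k = {}, 0
    while adj:
        # Move k up to the smallest degree still present in the network.
        k = max(k, min(len(vs) for vs in adj.values()))
        while True:
            peel = [u for u, vs in adj.items() if len(vs) <= k]
            if not peel:
                break
            for u in peel:
                shell[u] = k
                for v in adj.pop(u):
                    if v in adj:
                        adj[v].discard(u)
    return shell


def eigenvector_centrality(A, iters=200):
    """Power iteration on A + I, rescaled so the largest entry is 1."""
    n = len(A)
    x = [1.0] * n                                  # initial value x(0)
    for _ in range(iters):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(y)
        x = [v / top for v in y]
    return x


# Hypothetical network: triangle 0-1-2 with a tail 0-3-4
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4)]
adj = {u: set() for u in range(5)}
A = [[0] * 5 for _ in range(5)]
for i, j in edges:
    adj[i].add(j)
    adj[j].add(i)
    A[i][j] = A[j][i] = 1

shells = k_shell(adj)
print(shells)   # nodes 3 and 4 are in the 1-shell, nodes 0, 1 and 2 in the 2-shell
x = eigenvector_centrality(A)
print(max(range(5), key=lambda i: x[i]))  # 0
```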
As an improvement of the invention, four modules are included: a data preparation module, which extracts topology information from real network data and obtains an adjacency matrix; a multi-angle key node identification module, which identifies groups of key nodes using several methods; a graph convolutional network test module, which tests the groups obtained in the previous stage with a graph convolutional network to obtain their graph data completion effect; and a judging and output module, which compares the completion effects and outputs the optimal group of key nodes.
Step 3.3 is specifically as follows: after a key node has been selected from the original network, a new adjacency matrix $A'$ is obtained by:
Figure SMS_85
where $a_{ij}$ is the element in row $i$ and column $j$ of the matrix $A$, and $k$ is the number of the selected key node in the original network. The data preparation module uses not only the static topology information of the network but also real network data, and thus takes the dynamic characteristics and evolution rules of the network into account.
The shortest path length $d_{ij}$ between node $i$ and node $j$ is calculated as follows:
first, find the set of first-order neighbors of node $i$; if it includes node $j$, then $d_{ij} = 1$. Otherwise, find the set of first-order neighbors of all first-order neighbors of node $i$ (excluding previously selected nodes), called the second-order neighbors of node $i$; if it includes node $j$, then $d_{ij} = 2$. Otherwise, find the first-order neighbors of the second-order neighbors of node $i$, and so on until node $j$ is found, at which point $d_{ij}$ is determined.
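The neighbor-expansion procedure for the shortest path length is a breadth-first search. The sketch below assumes an unweighted graph stored as a dictionary of neighbor lists and returns infinity for unreachable pairs, in line with the convention used for closeness centrality; the function name `shortest_path_length` is illustrative.

```python
def shortest_path_length(adj, i, j):
    """Breadth-first search from i; returns the number of edges on a shortest path."""
    if i == j:
        return 0
    visited = {i}
    frontier = [i]
    d = 0
    while frontier:
        d += 1                          # frontier now expands to the d-th order neighbors
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v in visited:        # exclude previously selected nodes
                    continue
                if v == j:
                    return d
                visited.add(v)
                next_frontier.append(v)
        frontier = next_frontier
    return float("inf")                 # i and j are not connected


# Hypothetical network: path 0-1-2-3 plus an isolated node 4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
print(shortest_path_length(adj, 0, 3))  # 3
print(shortest_path_length(adj, 0, 4))  # inf
```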
Compared with the prior art, the invention has the following advantages:
1. existing key node selection methods mainly address network propagation, network control and similar perspectives, and no selection method targets graph data completion; the invention starts from graph data completion and uses a graph convolutional network to compare candidate groups, yielding a key node identification method aimed at graph data completion;
2. most existing key node identification methods target static networks and are usually based on the structural characteristics of the network, without considering its dynamic properties and evolution rules;
3. most existing methods focus on ranking nodes by importance, that is, on the selection of single nodes. In reality, however, the top $M$ nodes of a node ranking are often not the best group of $M$ nodes; the invention provides an efficient scheme for selecting a group of important nodes in the network;
4. existing key node identification methods each have their own starting point and emphasis, and a method that works well on one class of graphs does not necessarily work well on others. Taking degree centrality as an example, it is an easy-to-calculate and effective index for evaluating node importance in a scale-free network, but it may not be a good index for other classes of networks, such as random networks. Existing methods therefore lack effective integration. The invention integrates several representative existing key node identification methods and provides a method for selecting a group of key nodes.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of a set of key node selection processes according to the present invention;
FIG. 3 is a schematic diagram of k-shell decomposition;
FIG. 4 is a flowchart of graph data completion with the graph convolutional network and of testing its effect according to the present invention.
In the figure: 1 is a 1-shell, 2 is a 2-shell, and 3 is a 3-shell.
Detailed Description
In order to facilitate understanding of the invention, the present embodiment is described in detail below with reference to the accompanying drawings.
Example 1: referring to FIGS. 1 to 4, a method for selecting an integrated key node group for graph data completion, the method comprising the following steps:
step 1: acquiring topology information;
step 2: inputting the topology information obtained in step 1 into a data preparation module to obtain an adjacency matrix;
step 3: inputting the adjacency matrix obtained in step 2 into a multi-angle key node identification module;
step 4: inputting the obtained key nodes into a graph convolutional network test module, and outputting the test result of each group of key nodes;
step 5: after the completion effects of the different groups of key nodes have been calculated, inputting the obtained key nodes and their mean square errors into a judging and output module, and selecting the group of key nodes with the best completion effect as the final result and outputting it.
Step 1 is specifically as follows: the actual traffic network is first abstracted into a network in which each traffic intersection is regarded as a node, each road section connecting two intersections is regarded as an edge between the corresponding nodes, and the attribute value of each node represents the traffic flow of the corresponding intersection over time.
Step 2 is specifically as follows: the topology information obtained in step 1 is input into the data preparation module to obtain its adjacency matrix $A$. The element $a_{ij}$ of $A$ is defined as follows:
$$a_{ij} = \begin{cases} 1, & \text{if node } i \text{ and node } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$
where $a_{ij}$ denotes an element of the adjacency matrix $A$, and $i$ and $j$ denote nodes in the network.
Step 3: the adjacency matrix obtained in step 2 is input into the multi-angle key node identification module, which is divided into three sub-modules: a neighborhood judging module, a path judging module and an iteration judging module. Each sub-module is based on 1 to 2 different evaluation indexes. The judging flow for each evaluation index is shown in FIG. 2 and is as follows:
step 3.1: inputting the adjacency matrix of the traffic network and the number of key nodes to be identified;
step 3.2: selecting the most important node in the network from the input adjacency matrix, using the corresponding key node identification method (for example, degree centrality);
the key node identification methods used in step 3.2 comprise 5 different centrality metrics:
(1) Degree centrality: degree centrality describes the number of neighbors of a node, and is calculated as follows:
Figure SMS_114
where $DC_i$ denotes the degree centrality of node $i$, $N$ is the total number of nodes, and $a_{ij}$ is an element of the adjacency matrix. The greater the degree centrality, the more important the node is considered;
(2) k-shell decomposition: k-shell decomposition can be used to describe the position of a node in the network, and is computed as follows:
first, all nodes of degree 1 and their edges are deleted from the network; then all nodes that have degree 1 in the resulting network, together with their edges, are deleted again; this is repeated until no node of degree 1 remains in the network;
the set of deleted nodes is called the 1-shell, and the remaining nodes are called the 1-core;
the procedure continues in the same way until all nodes in the network have been deleted, yielding the k-shells and k-cores. Each node belongs to exactly one k-shell, and the larger k is, the more important the node is considered;
(3) Closeness centrality: closeness centrality describes the average distance from a node to all other nodes in the network, and is calculated as follows:
Figure SMS_121
where $CC_i$ is the closeness centrality of node $i$, $N$ is the total number of nodes in the network, and $d_{ij}$ is the distance between node $i$ and node $j$, defined as the length of the shortest path from node $i$ to node $j$. If node $i$ and node $j$ are not connected, then $d_{ij} = \infty$ is assumed, in which case:
Figure SMS_133
The greater the closeness centrality, the more important the node is considered;
(4) Betweenness centrality: betweenness centrality describes how many shortest paths between pairs of nodes pass through a given node, and is calculated as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $BC_i$ is the betweenness centrality of node $i$, $g_{st}$ is the number of shortest paths between node $s$ and node $t$, and $n_{st}^{i}$ is the number of shortest paths between node $s$ and node $t$ that pass through node $i$. The greater the betweenness centrality of a node, the more important the node is considered;
(5) Eigenvector centrality: eigenvector centrality assumes that the influence of a node is determined not only by the number of its neighbors but also by the influence of each neighbor, so that the centrality of a node is proportional to the sum of the centralities of the nodes connected to it. This gives:
$$x = c A x$$
where $x$ is the vector of eigenvector centralities of the nodes, $A$ is the adjacency matrix and $c$ is a constant. If the above equation holds, $x$ is the eigenvector of the matrix $A$ corresponding to the eigenvalue $c^{-1}$. The eigenvector centrality is computed by choosing an initial value $x(0)$ and then applying the following iteration:
$$x(t) = c A x(t-1),$$
where $t = 1, 2, \dots$. The greater the eigenvector centrality of a node, the more important the node is considered.
step 3.3: deleting the selected node and its connected edges from the network;
step 3.4: inputting the new adjacency matrix, and subtracting 1 from the number of key nodes still to be selected;
step 3.5: repeating steps 3.2 to 3.4 until the number of key nodes still to be selected is 0;
step 3.6: outputting the numbers of all selected key nodes.
Step 4: as shown in FIG. 4, the five groups of key nodes obtained are input into the graph convolutional network test module, and the test result of each group of key nodes is output. The specific steps are as follows:
step 4.1: inputting the real graph data and the numbers of the key nodes identified in the previous stage;
step 4.2: according to the input information, deleting the data of all nodes other than the key nodes;
step 4.3: first passing the data through a linear layer to perform graph data pre-completion;
the formula for graph data pre-completion is as follows:
$$\tilde{X} = W X + b$$
where $X$ is the graph data after part of the nodes have been deleted, an $M$-dimensional column vector; $N$ is the number of nodes; $M$ is the number of key nodes; $W$ is an $N \times M$ weight matrix; $b$ is the bias term, an $N$-dimensional column vector; and $\tilde{X}$ is the data after pre-completion;
step 4.4: inputting the result of the pre-completion into a graph convolution layer and performing the graph convolution operation;
the formula of the graph convolution operation is as follows:
Figure SMS_164
where $H^{(l)}$ denotes the graph data of layer $l$, an $N$-dimensional column vector; $W^{(l)}$ is an $N \times N$ weight matrix; $A$ is the adjacency matrix; and $b^{(l)}$ is a bias term;
step 4.5: inputting the result of the graph convolution layer into a fully connected layer and outputting the graph data completion result;
the formula of the fully connected layer is as follows:
Figure SMS_172
where $H^{(l)}$ denotes the graph data of layer $l$, an $N$-dimensional column vector; $W^{(l)}$ is an $N \times N$ weight matrix; and $b^{(l)}$ is a bias term;
step 4.6: comparing the completion result with the original real data to obtain the graph data completion effect;
here the mean square error $MSE$ is used, calculated as follows:
Figure SMS_180
where $\hat{y}$ is the completed value of the graph data, $y$ is the true value of the graph data, $N$ is the number of nodes and $M$ is the number of key nodes. The smaller the mean square error, the better the graph data completion effect.
Step 5 is specifically as follows: after the completion effects of the five different groups of key nodes have been calculated, the five groups of key nodes and their mean square errors are input into the judging and output module, the completion effects of the groups are compared, and the group of key nodes with the best completion effect is selected as the final result and output;
let the 5 groups of key nodes obtained by the multi-angle key node identification module be $S_1, S_2, \dots, S_5$, and let the 5 results obtained by the graph convolutional network test module be $MSE_1, MSE_2, \dots, MSE_5$; the final group of key nodes $S^*$ is then:
$$S^* = S_k, \quad k = \arg\min_{i \in \{1, \dots, 5\}} MSE_i$$
It should be noted that the above embodiment is not intended to limit the scope of protection of the invention; equivalent changes or substitutions made on the basis of the above technical solution fall within the scope of protection defined by the claims.

Claims (5)

1. A method for selecting an integrated key node group for graph data completion, the method comprising the steps of:
step 1: acquiring topology information, acquiring data on each node in the network,
step 2: inputting the topology information obtained in step 1 into a data preparation module to obtain its adjacency matrix $A$, whose element $a_{ij}$ is defined as follows:
$$a_{ij} = \begin{cases} 1, & \text{if node } i \text{ and node } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$
wherein $a_{ij}$ represents an element of the adjacency matrix $A$, and $i$ and $j$ respectively represent nodes in the network;
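As an illustration of step 2, the following Python sketch builds a 0/1 adjacency matrix from a list of edges. The function name `adjacency_matrix` and the edge-list input format are assumptions made for this example; the sketch also assumes an undirected network and ignores self-loops.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix as a list of lists.

    num_nodes: number of nodes, numbered 0 .. num_nodes - 1.
    edges: iterable of (i, j) pairs, one per edge between two nodes.
    """
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        if i == j:
            continue  # self-loops are ignored in this sketch
        A[i][j] = 1
        A[j][i] = 1  # undirected network: the matrix is symmetric
    return A
```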
step 3: inputting the adjacency matrix obtained in step 2 into a multi-angle key node identification module, which is divided into three sub-modules: a neighborhood judging module, a path judging module and an iteration judging module, each sub-module being based on 1-2 different judging indexes; the judging flow for each evaluation index is as follows:
step 3.1: inputting the adjacency matrix of the traffic network and the number of key nodes to be identified;
step 3.2: selecting the most important node in the network from the input adjacency matrix, using the corresponding key node identification method;
step 3.3: deleting the selected node and its connected edges from the original network;
step 3.4: inputting the new adjacency matrix, and subtracting 1 from the number of key nodes still to be selected;
step 3.5: repeating steps 3.2 to 3.4 until the number of key nodes to be selected is 0;
step 3.6: outputting the numbers of all selected key nodes,
the key node identification method used in step 3.2 is one of 5 different centrality metrics:
(1) Degree centrality: degree centrality describes the number of neighbors of a node, and is calculated as follows:
$$DC_i = \sum_{j=1}^{N} a_{ij}$$
wherein $DC_i$ represents the degree centrality of node $i$, $N$ is the total number of nodes, and $a_{ij}$ are the elements of the adjacency matrix; the greater the degree centrality, the more important the node is considered;
(2) $k$-shell decomposition: $k$-shell decomposition may be used to describe the location of a node in the network, and is calculated as follows:
first, all nodes with degree 1 and their corresponding edges are deleted from the network; then all nodes with degree 1 and their corresponding edges in the new network are deleted again, and this is repeated until no node with degree 1 remains in the network;
the set of deleted nodes is referred to as the 1-shell, and the remaining nodes are referred to as the 1-core;
and so on for nodes with degree 2, 3, and higher, until all nodes in the network are deleted, obtaining the $k$-shells and $k$-cores, wherein each node belongs to exactly one $k$-shell, and the larger $k$ is, the more important the node is considered;
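The peeling procedure of the $k$-shell decomposition can be sketched in Python as follows. The function name `k_shell` and the neighbour-set input format are hypothetical. The sketch also makes one assumption beyond the text: at level $k$ it removes nodes whose current degree is at most $k$, starting from $k = 0$, so that isolated nodes are assigned to a 0-shell.

```python
def k_shell(adj):
    """Assign each node of an undirected graph to its k-shell.

    adj: dict mapping node -> set of neighbouring nodes.
    Returns a dict mapping node -> k.
    """
    remaining = {u: set(vs) for u, vs in adj.items()}  # working copy
    shell = {}
    k = 0
    while remaining:
        # Keep peeling nodes of degree <= k until none is left at this level.
        peel = [u for u, vs in remaining.items() if len(vs) <= k]
        while peel:
            for u in peel:
                for v in remaining.pop(u):
                    if v in remaining:
                        remaining[v].discard(u)  # delete the edge (u, v)
                shell[u] = k
            peel = [u for u, vs in remaining.items() if len(vs) <= k]
        k += 1
    return shell
```

For a triangle with one extra node attached to a corner, the attached node falls in the 1-shell and the three triangle nodes in the 2-shell.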
(3) Closeness centrality: closeness centrality is used to describe the average distance of a node from all other nodes in the network, and is calculated as follows:
$$CC_i = \frac{1}{N-1} \sum_{j \neq i} \frac{1}{d_{ij}}$$
wherein $CC_i$ is the closeness centrality of node $i$, $N$ is the total number of nodes in the network, and $d_{ij}$ represents the distance between node $i$ and node $j$, defined as the length of the shortest path from node $i$ to node $j$; if node $i$ and node $j$ are not connected, then $d_{ij} = \infty$, in which case $\frac{1}{d_{ij}} = 0$;
the greater the closeness centrality, the more important the node is considered;
(4) Betweenness centrality: betweenness centrality is used to describe how many shortest paths between pairs of other nodes pass through a node, and is calculated as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
wherein $BC_i$ represents the betweenness centrality of node $i$, $g_{st}$ represents the number of shortest paths between node $s$ and node $t$, and $n_{st}^{i}$ represents the number of shortest paths between node $s$ and node $t$ that pass through node $i$; the greater the betweenness centrality, the more important the node is considered,
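A straightforward way to compute betweenness centrality on a small unweighted network is sketched below in Python. The function names `bfs_counts` and `betweenness` are hypothetical. The sketch counts each unordered pair of end nodes once, which is an assumption (summing over ordered pairs would double every value), and it favours clarity over speed.

```python
from collections import deque


def bfs_counts(adj, s):
    """Breadth-first search from s: distances and numbers of shortest paths."""
    dist = {s: 0}
    sigma = {s: 1}  # sigma[v] = number of shortest paths from s to v
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Betweenness of every node; adj maps node -> iterable of neighbours."""
    info = {s: bfs_counts(adj, s) for s in adj}
    nodes = list(adj)
    bc = {i: 0.0 for i in nodes}
    for a, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[a + 1:]:  # each unordered pair (s, t) once
            if t not in dist_s:
                continue  # s and t are not connected
            for i in nodes:
                if i == s or i == t or i not in dist_s:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest s-t path iff the distances add up.
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    bc[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return bc
```

On a four-node ring, each pair of opposite nodes has two shortest paths, so every node carries half of one pair's paths.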
(5) Eigenvector centrality: eigenvector centrality assumes that the influence of a node is determined not only by the number of its neighbors but also by the influence of each neighbor, so that the centrality of a node is proportional to the sum of the centralities of the nodes to which it is connected; then:
$$x_i = c \sum_{j=1}^{N} a_{ij} x_j$$
wherein $x$ is the vector of the eigenvector centralities of the nodes, $A$ is the adjacency matrix, and $c$ is a constant; if $x = cAx$ holds, then $x$ is the eigenvector of the matrix $A$ corresponding to the eigenvalue $c^{-1}$; the eigenvector centrality is calculated by giving an initial value $x(0)$ and then using the following iterative algorithm:
$$x(t) = c A x(t-1),$$
wherein $t = 1, 2, \dots$;
the greater the eigenvector centrality of a node, the more important the node is considered;
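The iterative calculation of eigenvector centrality can be sketched as a power iteration in Python. The function name `eigenvector_centrality`, the uniform starting vector, and the stopping tolerance are all assumptions of this sketch. Instead of a fixed constant, the vector is rescaled to unit length after every step, which is a common substitute and not necessarily what the original formula does.

```python
import math


def eigenvector_centrality(A, iters=1000, tol=1e-10):
    """Power iteration on a square adjacency matrix given as a list of lists.

    Returns a unit-length vector of centralities. Convergence is not
    guaranteed on every graph (for instance, it can oscillate on bipartite
    graphs); in that case the last iterate is returned.
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n  # uniform initial value
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return x  # graph without edges: nothing to propagate
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x
```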
step 3.3 is specifically as follows: after a key node is selected from the original network, a new adjacency matrix $A'$ is obtained by:
$$a'_{ij} = \begin{cases} 0, & i = k \text{ or } j = k \\ a_{ij}, & \text{otherwise} \end{cases}$$
wherein $a'_{ij}$ is the element in row $i$ and column $j$ of the matrix $A'$, and $k$ represents the number of the selected key node in the original network;
the shortest path length $d_{ij}$ between node $i$ and node $j$ is calculated as follows:
first, find the neighbors of node $i$, called the first-order neighbors of node $i$; if they include node $j$, then $d_{ij} = 1$; otherwise, find the first-order neighbors of the first-order neighbors of node $i$, called the second-order neighbors of node $i$; if they include node $j$, then $d_{ij} = 2$; otherwise, find the first-order neighbors of the second-order neighbors of node $i$, and so on, until node $j$ is found, at which point $d_{ij}$ can be determined,
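The neighbor-by-neighbor search described above is a breadth-first search. A minimal Python sketch follows; the function name `shortest_path_length` is hypothetical, and returning infinity for unconnected nodes and 0 when both nodes coincide are assumptions of the sketch.

```python
def shortest_path_length(A, i, j):
    """Number of edges on a shortest path from node i to node j.

    A: adjacency matrix as a list of lists of 0/1 entries.
    Returns float('inf') when j cannot be reached from i.
    """
    if i == j:
        return 0
    visited = {i}
    frontier = [i]  # nodes found in the previous round
    order = 0       # order of the neighbours being explored
    while frontier:
        order += 1
        next_frontier = []
        for u in frontier:
            for v in range(len(A)):
                if A[u][v] and v not in visited:
                    if v == j:
                        return order  # j is an order-th neighbour of i
                    visited.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return float("inf")
```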
step 4: inputting the obtained key nodes into a graph convolutional network test module, and outputting the test result of each group of key nodes,
step 5: after the completion effect of each group of key nodes is calculated, inputting the obtained groups of key nodes and their mean square errors into a judging and outputting module, and selecting the group of key nodes with the best completion effect as the final result for output.
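The select-and-delete loop of steps 3.1 to 3.6 can be sketched in Python as below, using degree centrality as the example scoring method. The names `degree_centrality` and `select_key_nodes` are hypothetical. The sketch assumes that a deleted node keeps its number and simply has its row and column set to zero, and that ties go to the lowest-numbered node.

```python
def degree_centrality(A):
    """Degree of every node: the row sums of the adjacency matrix."""
    return [sum(row) for row in A]


def select_key_nodes(A, num_key, score=degree_centrality):
    """Pick num_key nodes one at a time, deleting each pick before the next.

    A: adjacency matrix (list of lists); it is copied, not modified.
    score: function mapping an adjacency matrix to a list of node scores.
    """
    n = len(A)
    if not 0 <= num_key <= n:
        raise ValueError("num_key must be between 0 and the number of nodes")
    A = [row[:] for row in A]
    removed = set()
    selected = []
    remaining = num_key
    while remaining > 0:                      # step 3.5
        scores = score(A)                     # step 3.2
        k = max((i for i in range(n) if i not in removed),
                key=lambda i: scores[i])
        selected.append(k)
        removed.add(k)
        for t in range(n):                    # step 3.3: delete node k's edges
            A[k][t] = 0
            A[t][k] = 0
        remaining -= 1                        # step 3.4
    return selected                           # step 3.6
```

Passing a different `score` function, for example one based on betweenness or eigenvector centrality, gives the key node group for that metric under the same loop.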
2. The method for selecting an integrated key node group for graph data completion according to claim 1, wherein in step 1, to obtain the topology information, the actual traffic network is first abstracted into a network in which each traffic intersection is regarded as a node, each road section connecting two traffic intersections is regarded as an edge between the corresponding nodes, and the attribute value of each node represents the traffic flow information of the corresponding traffic intersection over time.
3. The method for selecting an integrated key node group for graph data completion according to claim 2, wherein in step 4, the graph convolutional network test module uses a graph convolutional network to perform a graph data completion test on the key nodes identified in the previous stage and outputs the test result, the graph data completion method specifically comprising the following steps:
step 4.1: inputting the real graph data information and the numbers of the key nodes identified in the previous stage;
step 4.2: according to the input information, deleting the information of all nodes other than the key nodes;
step 4.3: first passing the data through a linear layer to perform graph data pre-completion;
the formula for graph data pre-completion is as follows:
$$\tilde{X} = W X + b$$
wherein $X$ is the graph data information after deleting part of the nodes, an $n$-dimensional column vector, where $N$ is the number of nodes and $n$ is the number of key nodes; $W$ is an $N \times n$ weight matrix; $b$ is a bias term, an $N$-dimensional column vector; and $\tilde{X}$ is the data after pre-completion;
step 4.4: inputting the result obtained by the pre-complement into a graph convolution layer, and performing graph convolution operation;
the formula of the graph convolution operation is as follows:
$$H^{(l+1)} = W^{(l)} A H^{(l)} + b^{(l)}$$
wherein $H^{(l)}$ represents the graph data information of layer $l$, an $N$-dimensional column vector; $W^{(l)}$ is an $N \times N$ weight matrix; $A$ is the adjacency matrix; and $b^{(l)}$ is a bias term;
step 4.5: inputting the result obtained by the graph convolution layer into a fully connected layer, and outputting the graph data completion result;
the formula of the fully connected layer is as follows:
$$H^{(l+1)} = W^{(l)} H^{(l)} + b^{(l)}$$
wherein $H^{(l)}$ represents the graph data information of layer $l$, an $N$-dimensional column vector; $W^{(l)}$ is an $N \times N$ weight matrix; and $b^{(l)}$ is a bias term;
step 4.6: comparing the result obtained by the completion with the original real information to obtain the graph data completion effect,
here the mean square error $MSE$ is used, calculated as follows:
$$MSE = \frac{1}{N-n} \sum_{i=1}^{N-n} \left(\hat{y}_i - y_i\right)^2,$$
wherein $\hat{y}_i$ is the completed value of the graph data, $y_i$ is the true value of the graph data, $N$ is the number of nodes, and $n$ is the number of key nodes. The smaller the mean square error, the better the graph data completion effect.
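The forward pass of steps 4.3 to 4.5 (linear pre-completion, one graph convolution layer, one fully connected layer) can be sketched in plain Python. Everything here is illustrative: the function names are hypothetical, the weights are passed in rather than trained, and the sketch assumes each layer is an affine map with no activation function, with the graph convolution taking the form $W A H + b$.

```python
def matvec(M, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def affine(W, x, b):
    """Affine map W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]


def complete(x_key, A, W_pre, b_pre, W_gc, b_gc, W_fc, b_fc):
    """Complete the data of all N nodes from the n key-node values.

    x_key: values at the n key nodes.
    A: N x N adjacency matrix.
    W_pre (N x n), W_gc (N x N), W_fc (N x N) and the length-N biases are
    assumed to have been learned elsewhere.
    """
    h = affine(W_pre, x_key, b_pre)        # step 4.3: pre-completion, n -> N
    h = affine(W_gc, matvec(A, h), b_gc)   # step 4.4: graph convolution
    return affine(W_fc, h, b_fc)           # step 4.5: fully connected layer
```

In practice the weights would be fitted by minimising the mean square error of step 4.6 on training data; the sketch only shows how the three layers compose.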
4. The method for selecting an integrated key node group for graph data completion according to claim 3, wherein step 5 is specifically as follows: after the completion effect of each of the five groups of key nodes has been calculated, the five groups of key nodes and their mean square errors are input into the judging and outputting module, which compares the completion effect of each group, selects the group of key nodes with the best completion effect as the final result, and outputs it,
let the 5 groups of key nodes obtained by the multi-angle key node identification module be $S_1, S_2, \dots, S_5$, and let the 5 results obtained by the graph convolution test module be $MSE_1, MSE_2, \dots, MSE_5$; the final set of key nodes $S^*$ is then:
$$S^* = S_{k^*}, \quad k^* = \arg\min_{k \in \{1, \dots, 5\}} MSE_k$$
5. The method for selecting an integrated key node group for graph data completion according to claim 4, wherein the method is implemented by the following four modules:
the data preparation module extracts topology information from the real network data and obtains an adjacency matrix; the multi-angle key node identification module identifies groups of key nodes using a variety of methods; the graph convolutional network test module tests the results obtained in the previous stage using a graph convolutional network to obtain the graph data completion effect; and the judging and outputting module compares the completion effects and outputs the best group of key nodes.
CN202310074880.6A 2023-02-07 2023-02-07 Selection method for integrated key node group aiming at graph data completion Active CN115878861B (en)


Publications (2)

Publication Number Publication Date
CN115878861A CN115878861A (en) 2023-03-31
CN115878861B true CN115878861B (en) 2023-05-26

Family

ID=85760788





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant