CN115878861B

CN115878861B - Selection method for integrated key node group aiming at graph data completion

Info

Publication number: CN115878861B
Application number: CN202310074880.6A
Authority: CN
Inventors: 付嘉乐; 姜小丫; 康明与; 陈都鑫; 虞文武
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2023-02-07
Filing date: 2023-02-07
Publication date: 2023-05-26
Anticipated expiration: 2043-02-07
Also published as: CN115878861A

Abstract

The invention relates to a method for selecting an integrated key node group for graph data completion, which comprises four modules: a data preparation module, which extracts the topology information of real network data and obtains an adjacency matrix; a multi-angle key node identification module, which identifies groups of key nodes using several methods; a graph convolutional network test module, which tests the results of the previous stage with a graph convolutional network to obtain the graph data completion effect; and a judging and output module, which compares the completion effects and outputs the optimal group of key nodes. The beneficial effects of the invention are as follows: for the graph data completion problem, a group of key nodes can be identified, and neighborhood-based, path-based and iteration-based methods are integrated, so that the method performs well on networks of different categories.

Description

Selection method for integrated key node group aiming at graph data completion

Technical Field

The invention relates to a method for selecting an integrated key node group for graph data completion, and belongs to the technical field of network key node identification and deep learning.

Background

Graph data completion refers to recovering the full information of a graph from partial information about it, combined with historical data and the topology of the graph. With this technique, the properties of a network can be analyzed and studied without knowing all of its information, which can significantly reduce the cost of analyzing large complex networks. The technique already plays an important role in fields including electric power, transportation, biology, chemistry, economics and society. The choice of the nodes used for completion directly affects the result of graph data completion.

Taking a transportation network as an example, a road network is usually treated as a network and analyzed on the basis of the real-time traffic flow at its intersections and the topology of the road network. However, the number of important intersections in a city often ranges from thousands to tens of thousands, and the state of the same intersection can vary completely; detecting the traffic flow of every intersection in real time would incur a large and continuing cost. One effective solution is therefore to monitor only a subset of key nodes and to complete the data of the remaining nodes by combining complex network and deep learning methods. How to select the key nodes for graph data completion then becomes the problem to be solved.

However, existing methods for identifying key nodes in complex networks have the following disadvantages with respect to this problem:

1. existing methods mainly address network propagation, network control and similar perspectives, for example finding which nodes in a power grid cause the most serious loss when damaged, deciding which people in a social network to target so as to maximize an advertiser's benefit, or determining which nodes to influence so that the network reaches a given state fastest; a method for selecting key nodes for the graph data completion problem is lacking;

2. most existing methods target static networks and are usually based on the structural characteristics of the network, without attention to its dynamic properties and evolution rules;

3. most existing methods focus on ranking nodes by importance, that is, on the selection of a single node, and a method for selecting several nodes simultaneously is lacking;

4. existing key node identification methods lack effective integration. Current methods start from different problems and perspectives, and their results on different types of graphs often differ. In addition, because of the structural complexity and the diversity of complex networks, it is almost impossible to find one evaluation index applicable to all kinds of graphs.

Therefore, the invention designs a method for selecting an integrated key node group for graph data completion. It integrates existing key node identification methods for the graph data completion problem and provides a way of selecting a group of key nodes that captures the dynamic characteristics of the network and performs well on different types of graphs.

Disclosure of Invention

The invention provides a method for selecting an integrated key node group for graph data completion. It integrates several key node identification methods from different perspectives and can therefore achieve good results on different types of graphs. It takes into account both the dynamic characteristics of the network and the influence of already-selected nodes on subsequent selections, and therefore performs better than methods that rely only on the structural characteristics of a static network or that simply take the top-ranked nodes from a node ordering.

In order to achieve the above object, the technical scheme of the present invention is as follows, a method for selecting an integrated key node group for graph data completion, the method comprising the following steps:

step 1: acquiring topology information and the data on each node in the network;

step 2: inputting the topology information obtained in step 1 into a data preparation module to obtain an adjacency matrix;

step 3: inputting the adjacency matrix obtained in step 2 into a multi-angle key node identification module to obtain groups of key nodes;

step 4: inputting the obtained key nodes into a graph convolutional network test module, and outputting the test result of each group of key nodes;

step 5: after the completion effects of the different groups of key nodes have been calculated, inputting the groups of key nodes and their mean square errors into a judging and output module, and selecting and outputting the group of key nodes with the best completion effect as the final result.

As an improvement of the present invention, step 1 is specifically as follows: to acquire the topology information, the actual traffic network is first abstracted into a network, in which each traffic intersection is regarded as a node, each road section connecting two traffic intersections is regarded as an edge between the corresponding nodes, and the attribute value of each node represents the traffic flow of the corresponding intersection over time.

As an improvement of the invention, step 2 is specifically as follows: the topology information obtained in step 1 is input into the data preparation module to obtain its adjacency matrix $A = (a_{ij})_{N \times N}$, whose element $a_{ij}$ is defined as follows:

$$a_{ij} = \begin{cases} 1, & \text{if node } i \text{ and node } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$

where $a_{ij}$ denotes an element of the adjacency matrix $A$, and $i$ and $j$ denote nodes in the network.
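The data preparation step can be illustrated with a minimal Python sketch, assuming an undirected, unweighted road network given as a list of node pairs. The function name `build_adjacency` and the four-intersection example are hypothetical and not taken from the source.

```python
def build_adjacency(num_nodes, edges):
    """Return the N x N 0/1 adjacency matrix of an undirected graph."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        if i == j:
            continue  # assumption: self-loops are ignored
        adj[i][j] = 1
        adj[j][i] = 1  # a road section connects both intersections
    return adj


# Hypothetical road network: 4 intersections joined by 4 road sections.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
A = build_adjacency(4, edges)
for row in A:
    print(row)
```

The matrix is symmetric because each road section is treated as an undirected edge; a directed network would drop the second assignment.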

As an improvement of the invention, step 3 is specifically as follows: the adjacency matrix obtained in step 2 is input into the multi-angle key node identification module, which is divided into three sub-modules: a neighborhood judging module, a path judging module and an iteration judging module. Each sub-module is based on one or two different evaluation indexes. The judging flow for each evaluation index is shown in FIG. 2 and proceeds as follows:

step 3.1: inputting the adjacency matrix of the traffic network and the number of key nodes to be identified;

step 3.2: selecting the most important node in the network from the input adjacency matrix, based on the corresponding key node identification method (for example, degree centrality);

step 3.3: deleting the selected node and the edges connected to it from the original network;

step 3.4: inputting the new adjacency matrix, and reducing the number of key nodes still to be selected by 1;

step 3.5: repeating steps 3.2 to 3.4 until the number of key nodes still to be selected is 0;

step 3.6: outputting the numbers of all the selected key nodes.
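Steps 3.1 to 3.6 amount to a greedy loop that re-scores the network after every removal. The following is a minimal Python sketch, assuming degree centrality as the scoring method and ties broken by the lowest node index. The names `degree_centrality`, `remove_node` and `select_key_nodes` are illustrative only.

```python
def degree_centrality(adj):
    """Score each node by its number of neighbours (row sum)."""
    return [sum(row) for row in adj]


def remove_node(adj, v):
    """Step 3.3: return a copy of adj with row v and column v set to zero."""
    n = len(adj)
    return [[0 if i == v or j == v else adj[i][j] for j in range(n)]
            for i in range(n)]


def select_key_nodes(adj, k, score=degree_centrality):
    """Steps 3.1-3.6: pick k nodes one at a time, re-scoring after each pick."""
    selected = []
    current = [row[:] for row in adj]
    while k > 0:
        scores = score(current)                       # step 3.2
        candidates = [i for i in range(len(current)) if i not in selected]
        if not candidates:
            break                                     # fewer nodes than requested
        best = max(candidates, key=lambda i: scores[i])
        selected.append(best)
        current = remove_node(current, best)          # step 3.3
        k -= 1                                        # step 3.4
    return selected                                   # step 3.6


# Hypothetical 6-node network.
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

key_nodes = select_key_nodes(adj, 2)
print(key_nodes)  # -> [0, 4]
```

In this example a static ranking by degree would take nodes 0 and 3 (node 3 ties with node 4 and has the lower index), whereas re-scoring after removing node 0 picks node 4, which illustrates how an already-selected node influences the next selection.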

As an improvement of the invention, the graph convolutional network test module in step 4 uses a graph convolutional network to run a graph data completion test on the key nodes identified in the previous stage and outputs the test result. The graph data completion method specifically comprises the following steps:

step 4.1: inputting real graph data information and key node numbers judged in the previous stage;

step 4.2: according to the input information, deleting the information of other nodes except the key node;

step 4.3: firstly, making the data pass through a linear layer, and carrying out graph data pre-complement;

the formula for graph data pre-completion is as follows:

$$H^{(0)} = W_0 X + b_0,$$

where $X$ is the graph data information after part of the nodes have been deleted and is an $M$-dimensional column vector, $N$ is the number of nodes, $M$ is the number of key nodes, $W_0$ is an $N \times M$ weight matrix, the bias term $b_0$ is an $N$-dimensional column vector, and $H^{(0)}$ is the data after pre-completion;

step 4.4: inputting the result obtained by the pre-complement into a graph convolution layer, and performing graph convolution operation;

the formula of the graph convolution operation is as follows:

$$H^{(l+1)} = W^{(l)} A H^{(l)} + b^{(l)},$$

where $H^{(l)}$ denotes the graph data information of layer $l$ and is an $N$-dimensional column vector, $W^{(l)}$ is an $N \times N$ weight matrix, $A$ is the adjacency matrix, and $b^{(l)}$ is a bias term;

step 4.5: inputting the result of the graph convolution layer into a fully connected layer, and outputting the graph data completion result;

the formula of the fully connected layer is as follows:

$$H^{(l+1)} = W^{(l)} H^{(l)} + b^{(l)},$$

where $H^{(l)}$ denotes the graph data information of layer $l$ and is an $N$-dimensional column vector, $W^{(l)}$ is an $N \times N$ weight matrix, and $b^{(l)}$ is a bias term;

step 4.6: comparing the completed result with the original real information to obtain the graph data completion effect.

Here the mean square error $\mathrm{MSE}$ is used, calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2,$$

where $\hat{y}_i$ is the completed value of the graph data, $y_i$ is the true value of the graph data, $N$ is the number of nodes, and $i$ is the node number. The smaller the mean square error, the better the graph data completion effect.
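Steps 4.1 to 4.6 can be sketched as a forward pass followed by an error computation. The Python below is only an illustration under several assumptions: each layer is a plain affine map with no activation function, the graph convolution multiplies by the adjacency matrix before the weight matrix, and the parameters are hand-set rather than trained. The names `matvec`, `affine`, `complete` and `mse` are hypothetical.

```python
def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def affine(weight, vec, bias):
    """Compute weight @ vec + bias."""
    return [y + b for y, b in zip(matvec(weight, vec), bias)]


def complete(adj, x_key, params):
    """Pre-completion, one graph convolution, then a fully connected layer."""
    h0 = affine(params["W0"], x_key, params["b0"])            # step 4.3: M -> N
    h1 = affine(params["W1"], matvec(adj, h0), params["b1"])  # step 4.4
    return affine(params["W2"], h1, params["b2"])             # step 4.5


def mse(pred, truth):
    """Step 4.6: mean square error over all nodes."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


# Hypothetical 3-node path network; nodes 0 and 2 are the key nodes.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
truth = [2.0, 3.0, 4.0]
x_key = [truth[0], truth[2]]          # step 4.2: keep only key-node data
params = {                             # hand-set values, not trained weights
    "W0": [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], "b0": [0.0, 0.0, 0.0],
    "W1": [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]], "b1": [0.0, 0.0, 0.0],
    "W2": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], "b2": [0.0, 0.0, 0.0],
}
y_hat = complete(adj, x_key, params)
print(y_hat, mse(y_hat, truth))
```

In practice the weights and biases would be fitted on historical data before the mean square error of a candidate group is evaluated; the sketch only shows how the quantities in steps 4.3 to 4.6 fit together.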

As an improvement of the present invention, step 5 is specifically as follows: after the completion effects of the five different groups of key nodes have been calculated, the five groups of key nodes and their mean square errors are input into the judging and output module, the completion effects of the groups are compared, and the group of key nodes with the best completion effect is selected and output as the final result.

Let the five groups of key nodes obtained by the multi-angle key node identification module be $S_1, S_2, \ldots, S_5$, and let the $\mathrm{MSE}$ results obtained by the graph convolution test module be $\mathrm{MSE}_1, \mathrm{MSE}_2, \ldots, \mathrm{MSE}_5$. The final group of key nodes $S^*$ is then:

$$S^* = S_{k^*}, \quad k^* = \arg\min_{k \in \{1, \ldots, 5\}} \mathrm{MSE}_k.$$
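The judging and output step reduces to taking the group with the smallest mean square error. A minimal Python sketch follows; the function name `best_group` and all numbers in the example are hypothetical and are not results from the source.

```python
def best_group(groups, mses):
    """Return the candidate group with the smallest MSE, and that MSE."""
    if len(groups) != len(mses) or not groups:
        raise ValueError("need one MSE per candidate group")
    k = min(range(len(mses)), key=lambda i: mses[i])
    return groups[k], mses[k]


# Five hypothetical candidate groups (one per centrality measure)
# with made-up completion errors.
groups = [[0, 4], [0, 3], [1, 4], [2, 5], [0, 5]]
mses = [0.12, 0.30, 0.25, 0.41, 0.18]
final_group, final_mse = best_group(groups, mses)
print(final_group, final_mse)  # -> [0, 4] 0.12
```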

As an improvement of the invention, the key node identification methods used in step 3.2 comprise five different centrality measures:

(1) Degree centrality: degree centrality describes the number of neighbors of a node, and is calculated as follows:

$$DC_i = \sum_{j=1}^{N} a_{ij},$$

where $DC_i$ denotes the degree centrality of node $i$, $N$ is the total number of nodes, and $a_{ij}$ is an element of the adjacency matrix. The greater the degree centrality, the more important the node is considered;

(2) $k$-shell decomposition: $k$-shell decomposition can be used to describe the position of a node in the network, and is calculated as follows:

first, all nodes of degree 1 and their corresponding edges are deleted from the network; then all nodes of degree 1 and their corresponding edges in the resulting network are deleted again, and this is repeated until no node of degree 1 remains in the network;

the set of deleted nodes is called the 1-shell, and the remaining nodes are called the 1-core;

proceeding in the same way until all nodes in the network have been deleted yields the $k$-shells and $k$-cores, where each node belongs to exactly one $k$-shell, and the larger $k$ is, the more important the node is considered;

(3) Closeness centrality: closeness centrality describes the average distance from a node to all other nodes in the network, and is calculated as follows:

$$CC_i = \frac{1}{N-1}\sum_{j \neq i}\frac{1}{d_{ij}},$$

where $CC_i$ is the closeness centrality of node $i$, $N$ is the total number of nodes in the network, and $d_{ij}$ denotes the distance between node $i$ and node $j$, defined as the length of the shortest path from node $i$ to node $j$. If node $i$ and node $j$ are not connected, then $d_{ij} = \infty$ is taken, in which case:

$$\frac{1}{d_{ij}} = 0;$$

the greater the closeness centrality, the more important the node is considered;

(4) Betweenness centrality: betweenness centrality describes how many shortest paths between pairs of nodes pass through a given node, and is calculated as follows:

$$BC_i = \sum_{s \neq i \neq t}\frac{n_{st}^{i}}{g_{st}},$$

where $BC_i$ denotes the betweenness centrality of node $i$, $g_{st}$ denotes the number of shortest paths between node $s$ and node $t$, and $n_{st}^{i}$ denotes the number of shortest paths between node $s$ and node $t$ that pass through node $i$. The greater the betweenness centrality of a node, the more important the node is considered;

(5) Eigenvector centrality: eigenvector centrality assumes that the influence of a node is determined not only by the number of its neighbors but also by the influence of each neighbor, so that the centrality of a node is proportional to the sum of the centralities of the nodes connected to it. Then:

$$x_i = c\sum_{j=1}^{N} a_{ij} x_j,$$

where $x = [x_1, x_2, \ldots, x_N]^{T}$ denotes the eigenvector centrality of each node, $A$ is the adjacency matrix, and $c$ is a constant. If the above formula holds, that is $x = cAx$, then $x$ is the eigenvector of the matrix $A$ corresponding to the eigenvalue $c^{-1}$. The eigenvector centrality is calculated by giving an initial value $x(0)$ and then applying the following iterative algorithm:

$$x(t) = A\,x(t-1),$$

where $t = 1, 2, \ldots$. The greater the eigenvector centrality of a node, the more important the node is considered.
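The iteration-based measure can be sketched with plain power iteration. The Python below is a minimal illustration, assuming an all-ones initial vector and a rescaling by the largest entry after every step so that the values stay bounded; the rescaling, the stopping tolerance and the function name `eigenvector_centrality` are assumptions not stated in the source. On bipartite networks this simple iteration may oscillate instead of converging.

```python
def eigenvector_centrality(adj, iterations=100, tol=1e-10):
    """Approximate the leading eigenvector of adj by power iteration."""
    n = len(adj)
    x = [1.0] * n                                   # initial value x(0)
    for _ in range(iterations):
        nxt = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(nxt)
        if scale == 0:
            return nxt                              # network has no edges
        nxt = [v / scale for v in nxt]              # assumption: rescale each step
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical network: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
x = eigenvector_centrality(adj)
most_important = max(range(len(x)), key=lambda i: x[i])
print(most_important)  # -> 2
```

Because only the ranking of nodes matters for step 3.2, the rescaling does not change which node is selected.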

As an improvement of the present invention, the method comprises four modules: a data preparation module, which extracts the topology information of real network data and obtains an adjacency matrix; a multi-angle key node identification module, which identifies groups of key nodes using several methods; a graph convolutional network test module, which tests the results of the previous stage with a graph convolutional network to obtain the graph data completion effect; and a judging and output module, which compares the completion effects and outputs the optimal group of key nodes.

Step 3.3 is specifically as follows: after a key node has been selected from the original network, the new adjacency matrix $A'$ is obtained by:

$$a'_{ij} = \begin{cases} 0, & i = v \text{ or } j = v \\ a_{ij}, & \text{otherwise} \end{cases}$$

where $a'_{ij}$ is the element in row $i$ and column $j$ of the matrix $A'$, and $v$ denotes the number of the selected key node in the original network. The data preparation module uses not only the static topology information of the network but also the real network data, and thus pays attention to the dynamic characteristics and evolution rules of the network.

The shortest path length $d_{ij}$ between node $i$ and node $j$ is calculated as follows:

first, find the set of first-order neighbors of node $i$, called the first-order neighbor set of node $i$; if it includes node $j$, then $d_{ij} = 1$. Otherwise, find the set of first-order neighbors of all first-order neighbors of node $i$ (excluding previously selected nodes), called the second-order neighbor set of node $i$; if it includes node $j$, then $d_{ij} = 2$. Otherwise, find the first-order neighbors of the second-order neighbors of node $i$, and so on, until node $j$ is found, at which point $d_{ij}$ can be determined.
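The neighbor-expansion procedure above is a breadth-first search. A minimal Python sketch follows, assuming the network is given as a 0/1 adjacency matrix and that an unreachable node is reported as infinite distance, in line with the convention used for closeness centrality. The function name `shortest_distance` is illustrative.

```python
import math


def shortest_distance(adj, source, target):
    """Length of the shortest path from source to target, or inf if none."""
    if source == target:
        return 0
    visited = {source}
    frontier = [source]          # nodes at the current distance from source
    dist = 0
    while frontier:
        dist += 1
        next_frontier = []
        for u in frontier:
            for v, connected in enumerate(adj[u]):
                if connected and v not in visited:   # skip already-found nodes
                    if v == target:
                        return dist
                    visited.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return math.inf              # target is not connected to source


# Hypothetical network: path 0-1-2-3 plus an isolated node 4.
adj = [[0, 1, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
print(shortest_distance(adj, 0, 3))  # -> 3
print(shortest_distance(adj, 0, 4))  # -> inf
```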

Compared with the prior art, the invention has the following advantages:

1. existing key node selection methods mainly address network propagation, network control and similar perspectives, and a key node selection method for graph data completion is lacking; starting from graph data completion, the invention uses a graph convolutional network to compare candidate groups and thereby obtains a key node identification method aimed at graph data completion;

2. most existing key node identification methods target static networks and are usually based on the structural characteristics of the network, without attention to its dynamic properties and evolution rules;

3. most existing methods focus on ranking nodes by importance, that is, on the selection of individual nodes. In practice, however, if only the top-ranked nodes of a node ordering are selected, they are often not the best choice of that many nodes; the invention uses an efficient procedure for selecting a group of important nodes in the network;

4. existing key node identification methods each have their own starting point and emphasis, and a method that works well on one class of graphs does not necessarily work well on other classes. Taking degree centrality as an example, it is an easy-to-compute and effective index for evaluating node importance in scale-free networks, but may not be a good index for other classes of networks, such as random networks. Existing methods therefore lack effective integration. The invention integrates several representative existing key node identification methods and provides a method for selecting a group of key nodes.

Drawings

FIG. 1 is a schematic flow chart of the present invention;

FIG. 2 is a schematic diagram of a set of key node selection processes according to the present invention;

FIG. 3 is a schematic diagram of k-shell decomposition;

FIG. 4 is a flowchart of graph data completion and effect testing with the graph convolutional network according to the present invention.

In the figure: 1 is a 1-shell, 2 is a 2-shell, and 3 is a 3-shell.
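The shell assignment depicted in FIG. 3 can be reproduced with a short peeling routine. The Python below is a minimal sketch, assuming a 0/1 adjacency matrix as input; isolated nodes, which the description does not discuss, are assigned shell 0 here. The function name `k_shell` and the example network are hypothetical.

```python
def k_shell(adj):
    """Return {node: shell index} by repeatedly peeling low-degree nodes."""
    n = len(adj)
    neighbors = {i: {j for j in range(n) if adj[i][j]} for i in range(n)}
    remaining = set(range(n))
    shell = {}
    k = 0
    while remaining:
        while True:
            # Nodes whose current degree has dropped to k or below.
            peel = [v for v in remaining if len(neighbors[v]) <= k]
            if not peel:
                break
            for v in peel:
                shell[v] = k
                remaining.discard(v)
                for u in neighbors[v]:
                    neighbors[u].discard(v)   # delete the edges of v
                neighbors[v] = set()
        k += 1
    return shell


# Hypothetical network: triangle 0-1-2 with a chain 2-3-4 attached.
adj = [[0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [0, 0, 0, 1, 0]]
shells = k_shell(adj)
print(shells)
```

Node 3 starts with degree 2 but falls into the 1-shell once node 4 has been removed, which is why the degree-1 deletion has to be repeated until no such node remains.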

Detailed Description

In order to enhance the understanding of the present invention, the present embodiment will be described in detail with reference to the accompanying drawings.

Example 1: referring to fig. 1-4, a method for selecting an integrated key node group for graph data completion, the method comprising the steps of:

step 1: acquiring topology information. The actual traffic network is first abstracted into a network, in which each traffic intersection is regarded as a node, each road section connecting two traffic intersections is regarded as an edge between the corresponding nodes, and the attribute value of each node represents the traffic flow of the corresponding intersection over time.

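Step 2 is specifically as follows: the topology information obtained in step 1 is input into the data preparation module to obtain its adjacency matrix $A = (a_{ij})_{N \times N}$, whose element $a_{ij}$ is defined as follows:

$$a_{ij} = \begin{cases} 1, & \text{if node } i \text{ and node } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$

where $a_{ij}$ denotes an element of the adjacency matrix $A$, and $i$ and $j$ denote nodes in the network.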

Step 3: the adjacency matrix obtained in step 2 is input into the multi-angle key node identification module, which is divided into three sub-modules: a neighborhood judging module, a path judging module and an iteration judging module. Each sub-module is based on one or two different evaluation indexes. The judging flow for each evaluation index is shown in FIG. 2 and proceeds as follows:

step 3.2: the corresponding key node identification methods comprise five different centrality measures:

(1) Degree centrality:

$$DC_i = \sum_{j=1}^{N} a_{ij},$$

where $DC_i$ denotes the degree centrality of node $i$, $N$ is the total number of nodes, and $a_{ij}$ is an element of the adjacency matrix;

(2) $k$-shell decomposition;

(3) Closeness centrality:

$$CC_i = \frac{1}{N-1}\sum_{j \neq i}\frac{1}{d_{ij}},$$

where $CC_i$ is the closeness centrality of node $i$, $N$ is the total number of nodes in the network, and $d_{ij}$ denotes the distance between node $i$ and node $j$, defined as the length of the shortest path from node $i$ to node $j$. If node $i$ and node $j$ are not connected, then $d_{ij} = \infty$ is taken, in which case:

$$\frac{1}{d_{ij}} = 0;$$

(4) Betweenness centrality:

$$BC_i = \sum_{s \neq i \neq t}\frac{n_{st}^{i}}{g_{st}},$$

where $BC_i$ denotes the betweenness centrality of node $i$, $g_{st}$ denotes the number of shortest paths between node $s$ and node $t$, and $n_{st}^{i}$ denotes the number of shortest paths between node $s$ and node $t$ that pass through node $i$;

(5) Eigenvector centrality:

$$x_i = c\sum_{j=1}^{N} a_{ij} x_j,$$

where $x = [x_1, x_2, \ldots, x_N]^{T}$ denotes the eigenvector centrality of each node, $A$ is the adjacency matrix, and $c$ is a constant. If the above formula holds, that is $x = cAx$, then $x$ is the eigenvector of the matrix $A$ corresponding to the eigenvalue $c^{-1}$. The eigenvector centrality is calculated by giving an initial value $x(0)$ and then applying the following iterative algorithm:

$$x(t) = A\,x(t-1),$$

where $t = 1, 2, \ldots$. The greater the eigenvector centrality of a node, the more important the node is considered.

step 3.6: outputting the numbers of all the selected key nodes.

Step 4: as shown in FIG. 4, the five groups of key nodes obtained are input into the graph convolutional network test module, and the test result of each group of key nodes is output. The specific steps are as follows:

the formula for graph data pre-completion is as follows:

$$H^{(0)} = W_0 X + b_0,$$

where $X$ is the graph data information after part of the nodes have been deleted and is an $M$-dimensional column vector, $N$ is the number of nodes, $M$ is the number of key nodes, $W_0$ is an $N \times M$ weight matrix, the bias term $b_0$ is an $N$-dimensional column vector, and $H^{(0)}$ is the data after pre-completion;

the formula of the graph convolution operation is as follows:

$$H^{(l+1)} = W^{(l)} A H^{(l)} + b^{(l)},$$

where $H^{(l)}$ denotes the graph data information of layer $l$ and is an $N$-dimensional column vector, $W^{(l)}$ is an $N \times N$ weight matrix, $A$ is the adjacency matrix, and $b^{(l)}$ is a bias term;

the formula of the fully connected layer is as follows:

$$H^{(l+1)} = W^{(l)} H^{(l)} + b^{(l)},$$

where $H^{(l)}$ denotes the graph data information of layer $l$ and is an $N$-dimensional column vector, $W^{(l)}$ is an $N \times N$ weight matrix, and $b^{(l)}$ is a bias term;

here the mean square error $\mathrm{MSE}$ is used, calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2,$$

where $\hat{y}_i$ is the completed value of the graph data, $y_i$ is the true value of the graph data, $N$ is the number of nodes, and $i$ is the node number.

Step 5 is specifically as follows: after the completion effects of the five different groups of key nodes have been calculated, the five groups of key nodes and their mean square errors are input into the judging and output module, the completion effects of the groups are compared, and the group of key nodes with the best completion effect is selected and output as the final result.

Let the five groups of key nodes obtained by the multi-angle key node identification module be $S_1, S_2, \ldots, S_5$, and let the $\mathrm{MSE}$ results obtained by the graph convolution test module be $\mathrm{MSE}_1, \mathrm{MSE}_2, \ldots, \mathrm{MSE}_5$. The final group of key nodes $S^*$ is then:

$$S^* = S_{k^*}, \quad k^* = \arg\min_{k \in \{1, \ldots, 5\}} \mathrm{MSE}_k.$$

it should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and equivalent changes or substitutions made on the basis of the above-mentioned technical solutions fall within the scope of the present invention as defined in the claims.

Claims

1. A method for selecting an integrated key node group for graph data completion, the method comprising the steps of:

step 2: inputting the topology information obtained in step 1 into a data preparation module to obtain its adjacency matrix $A = (a_{ij})_{N \times N}$, whose element $a_{ij}$ is defined as follows:

$$a_{ij} = \begin{cases} 1, & \text{if node } i \text{ and node } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$

where $a_{ij}$ denotes an element of the adjacency matrix $A$, and $i$ and $j$ denote nodes in the network;

step 3: inputting the adjacency matrix obtained in step 2 into a multi-angle key node identification module, which is divided into three sub-modules: a neighborhood judging module, a path judging module and an iteration judging module, wherein each sub-module is based on one or two different evaluation indexes, and the judging flow for each evaluation index is as follows:

step 3.2, selecting the most important node in the network based on a corresponding key node identification method according to the input adjacency matrix;

step 3.6. Outputting all the selected key node numbers,

the corresponding key node identification methods in step 3.2 comprise five different centrality measures:

(1) Degree centrality:

$$DC_i = \sum_{j=1}^{N} a_{ij},$$

where $DC_i$ denotes the degree centrality of node $i$, $N$ is the total number of nodes, and $a_{ij}$ is an element of the adjacency matrix;

(2) $k$-shell decomposition;

(3) Closeness centrality:

$$CC_i = \frac{1}{N-1}\sum_{j \neq i}\frac{1}{d_{ij}},$$

where $CC_i$ is the closeness centrality of node $i$, $N$ is the total number of nodes in the network, and $d_{ij}$ denotes the distance between node $i$ and node $j$, defined as the length of the shortest path from node $i$ to node $j$; if node $i$ and node $j$ are not connected, then $d_{ij} = \infty$ is taken, in which case:

$$\frac{1}{d_{ij}} = 0;$$

(4) Betweenness centrality:

$$BC_i = \sum_{s \neq i \neq t}\frac{n_{st}^{i}}{g_{st}},$$

where $BC_i$ denotes the betweenness centrality of node $i$, $g_{st}$ denotes the number of shortest paths between node $s$ and node $t$, and $n_{st}^{i}$ denotes the number of shortest paths between node $s$ and node $t$ that pass through node $i$; the greater the betweenness centrality of a node, the more important the node is considered;

(5) Eigenvector centrality:

$$x_i = c\sum_{j=1}^{N} a_{ij} x_j,$$

where $x = [x_1, x_2, \ldots, x_N]^{T}$ denotes the eigenvector centrality of each node, $A$ is the adjacency matrix, and $c$ is a constant; if $x = cAx$ holds, then $x$ is the eigenvector of the matrix $A$ corresponding to the eigenvalue $c^{-1}$; an initial value $x(0)$ is given and the following iterative algorithm is then applied:

$$x(t) = A\,x(t-1),$$

where $t = 1, 2, \ldots$; the greater the eigenvector centrality of a node, the more important the node is considered;

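after a key node has been selected from the original network, the new adjacency matrix $A'$ is obtained by:

$$a'_{ij} = \begin{cases} 0, & i = v \text{ or } j = v \\ a_{ij}, & \text{otherwise} \end{cases}$$

where $a'_{ij}$ is the element in row $i$ and column $j$ of the matrix $A'$, and $v$ denotes the number of the selected key node in the original network;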

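the shortest path length $d_{ij}$ between node $i$ and node $j$ is calculated as follows:

first, find the set of first-order neighbors of node $i$, called the first-order neighbor set of node $i$; if it includes node $j$, then $d_{ij} = 1$; otherwise, find the set of first-order neighbors of all first-order neighbors of node $i$, called the second-order neighbor set of node $i$; if it includes node $j$, then $d_{ij} = 2$; otherwise, find the first-order neighbors of the second-order neighbors of node $i$, and so on, until node $j$ is found, at which point $d_{ij}$ can be determined;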

2. The method for selecting an integrated key node group for graph data completion according to claim 1, wherein step 1 is specifically: to acquire the topology information, the actual traffic network is first abstracted into a network, in which each traffic intersection is regarded as a node, each road section connecting two traffic intersections is regarded as an edge between the corresponding nodes, and the attribute value of each node represents the traffic flow of the corresponding intersection over time.

3. The method for selecting an integrated key node group for graph data completion according to claim 2, wherein in step 4 the graph convolutional network test module uses a graph convolutional network to run a graph data completion test on the key nodes identified in the previous stage and outputs the test result, and the graph data completion method specifically comprises the following steps:

the formula for graph data pre-completion is as follows:

$$H^{(0)} = W_0 X + b_0,$$

where $X$ is the graph data information after part of the nodes have been deleted and is an $M$-dimensional column vector, $N$ is the number of nodes, $M$ is the number of key nodes, $W_0$ is an $N \times M$ weight matrix, the bias term $b_0$ is an $N$-dimensional column vector, and $H^{(0)}$ is the data after pre-completion;

the formula of the graph convolution operation is as follows:

$$H^{(l+1)} = W^{(l)} A H^{(l)} + b^{(l)},$$

where $H^{(l)}$ denotes the graph data information of layer $l$ and is an $N$-dimensional column vector, $W^{(l)}$ is an $N \times N$ weight matrix, $A$ is the adjacency matrix, and $b^{(l)}$ is a bias term;

the formula of the fully connected layer is as follows:

$$H^{(l+1)} = W^{(l)} H^{(l)} + b^{(l)},$$

where $H^{(l)}$ denotes the graph data information of layer $l$ and is an $N$-dimensional column vector, $W^{(l)}$ is an $N \times N$ weight matrix, and $b^{(l)}$ is a bias term;

here the mean square error $\mathrm{MSE}$ is used, calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2,$$

where $\hat{y}_i$ is the completed value of the graph data, $y_i$ is the true value of the graph data, $N$ is the number of nodes, and $i$ is the node number.

4. The method for selecting an integrated key node group for graph data completion according to claim 3, wherein step 5 is specifically: after the completion effects of the five different groups of key nodes have been calculated, the five groups of key nodes and their mean square errors are input into a judging and output module, the completion effects of the groups are compared, and the group of key nodes with the best completion effect is selected and output as the final result;

let the five groups of key nodes obtained by the multi-angle key node identification module be $S_1, S_2, \ldots, S_5$, and let the $\mathrm{MSE}$ results obtained by the graph convolution test module be $\mathrm{MSE}_1, \mathrm{MSE}_2, \ldots, \mathrm{MSE}_5$; the final group of key nodes $S^*$ is then:

$$S^* = S_{k^*}, \quad k^* = \arg\min_{k \in \{1, \ldots, 5\}} \mathrm{MSE}_k.$$

5. The method for selecting an integrated key node group for graph data completion according to claim 4, wherein the method is implemented by the following four modules:

a data preparation module, which extracts the topology information of real network data and obtains an adjacency matrix; a multi-angle key node identification module, which identifies groups of key nodes using several methods; a graph convolutional network test module, which tests the results of the previous stage with a graph convolutional network to obtain the graph data completion effect; and a judging and output module, which compares the completion effects and outputs the optimal group of key nodes.