CN115828987A

CN115828987A - Dynamic graph neural network prediction method and product under spatio-temporal distribution migration

Info

Publication number: CN115828987A
Application number: CN202211462517.3A
Authority: CN
Inventors: 朱文武; 王鑫; 张泽阳
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-11-21
Filing date: 2022-11-21
Publication date: 2023-03-21

Abstract

The embodiments of the application relate to the field of graph neural networks in deep learning and disclose a dynamic graph neural network prediction method and product under spatio-temporal distribution migration (that is, distribution shift). The method comprises the following steps: inputting a dynamic graph into a decoupled spatio-temporal attention network to obtain high-order invariant spatio-temporal representations of the nodes and high-order variant spatio-temporal representations of the nodes; sampling and randomly replacing the high-order variant spatio-temporal representations of the nodes according to the neighborhoods and timestamps of the nodes to obtain multiple intervention distributions; and optimizing the decoupled spatio-temporal attention network according to the high-order invariant spatio-temporal representations, the high-order variant spatio-temporal representations and the multiple intervention distributions. By capturing the variant and invariant modes in the dynamic graph, the method eliminates the influence of the variant mode, so that the dynamic graph attention network relies on the features of the invariant mode, retains good prediction capability on dynamic graphs with spatio-temporal distribution migration, and effectively improves its out-of-distribution generalization capability.

Description

Dynamic graph neural network prediction method and product under spatio-temporal distribution migration

Technical Field

The embodiments of the application relate to the field of graph neural networks in deep learning, and in particular to a dynamic graph neural network prediction method and product under spatio-temporal distribution migration.

Background

A dynamic graph (or dynamic network) carries both complex structure and temporal information, and is an effective abstraction of real-world data such as citation networks, social networks, transaction networks and traffic networks. In recent years, with the continuous development of deep learning, dynamic graph neural networks have shown strong prediction capability by exploiting the structural and temporal dynamics of dynamic graphs, benefiting from end-to-end learning and powerful inference.

However, the patterns that a conventional dynamic graph neural network relies on may themselves change when the distribution migrates (that is, under distribution shift), so such a network cannot handle the distribution changes that naturally exist in dynamic graphs. Existing distribution generalization methods handle distribution migration in time series data by means of difference measurement, normalization, distributionally robust optimization and the like, but they do not account for the more complex spatio-temporal distribution migration in dynamic graphs, in which the distributions of both the graph structure and the representations change over time. As a result, the performance of the optimized dynamic graph neural network drops markedly under spatio-temporal distribution migration. How to improve the prediction performance of dynamic graph neural networks under spatio-temporal distribution migration is therefore a problem that urgently needs to be solved.

Disclosure of Invention

The embodiments of the application aim to provide a dynamic graph neural network prediction method and product under spatio-temporal distribution migration, so as to solve the problem that the prediction performance of a dynamic graph neural network degrades under spatio-temporal distribution migration.

The embodiment of the present application provides, in a first aspect, a method for predicting a dynamic graph neural network under spatio-temporal distribution migration, including:

inputting the dynamic graph into a decoupled space-time attention network to obtain high-order invariant space-time characteristics of the nodes and high-order variant space-time characteristics of the nodes;

sampling and randomly replacing the high-order change space-time characterization of the nodes according to the neighborhood of the nodes and the time stamps of the nodes to obtain various intervention distributions;

and optimizing the decoupled spatiotemporal attention network according to the high-order invariant spatiotemporal characterization of the nodes, the high-order variant spatiotemporal characterization of the nodes and the multiple intervention distributions to obtain a dynamic graph attention network, wherein the dynamic graph attention network is used for dynamic graph prediction under spatiotemporal distribution migration.

Optionally, according to the neighborhood of the node and the timestamp of the node, sampling and randomly replacing the high-order change space-time characterization of the node to obtain a plurality of intervention distributions, including:

extracting a high-order change space-time representation of the first node at a first time stamp from the high-order change space-time representations of the nodes;

and replacing the high-order change space-time representation of the first node at the first time stamp with the high-order change space-time representation of the second node at the second time stamp to obtain the multiple intervention distributions.

Optionally, replacing the high-order change spatiotemporal representation of the first node at the first timestamp with the high-order change spatiotemporal representation of the second node at the second timestamp, resulting in the plurality of intervention distributions, comprising:

under the condition that the second node is the same as the first node and the second timestamp is different from the first timestamp, replacing the high-order change spatiotemporal representation of the first node at the first timestamp with the high-order change spatiotemporal representation of the second node at the second timestamp to obtain a plurality of first types of intervention distributions;

under the condition that the second node is different from the first node and the second timestamp is the same as the first timestamp, replacing the high-order change spatiotemporal characterization of the first node at the first timestamp with the high-order change spatiotemporal characterization of the second node at the second timestamp to obtain a plurality of second types of intervention distributions;

under the condition that the second node is different from the first node and the second timestamp is different from the first timestamp, replacing the high-order change spatiotemporal representation of the first node at the first timestamp with the high-order change spatiotemporal representation of the second node at the second timestamp to obtain a plurality of intervention distributions of a third type;

determining the plurality of first-type intervention distributions, the plurality of second-type intervention distributions, and the plurality of third-type intervention distributions as the plurality of intervention distributions.
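The three replacement cases above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the array layout `z_v[node, timestamp]`, the helper names `intervention_type` and `sample_interventions`, and the uniform random choice of the second node and second timestamp are all assumptions made for the example (the method itself samples according to the node's neighborhood and timestamps).

```python
import numpy as np

def intervention_type(u1, t1, u2, t2):
    """Classify the replacement (u1, t1) <- (u2, t2) into the three cases."""
    if u2 == u1 and t2 != t1:
        return 1  # same node, different timestamp
    if u2 != u1 and t2 == t1:
        return 2  # different node, same timestamp
    if u2 != u1 and t2 != t1:
        return 3  # different node, different timestamp
    return 0      # identical source and target: not an intervention

def sample_interventions(z_v, u1, t1, num_samples, rng):
    """Return copies of z_v with z_v[u1, t1] replaced by z_v[u2, t2].

    z_v: array of shape (num_nodes, num_timestamps, dim) holding the
         high-order variant representations (assumed layout).
    Assumes more than one (node, timestamp) pair exists.
    """
    num_nodes, num_times, _ = z_v.shape
    results = []
    while len(results) < num_samples:
        # Uniform sampling is an assumption made for this sketch.
        u2 = int(rng.integers(num_nodes))
        t2 = int(rng.integers(num_times))
        kind = intervention_type(u1, t1, u2, t2)
        if kind == 0:
            continue  # skip the trivial self-replacement
        intervened = z_v.copy()
        intervened[u1, t1] = z_v[u2, t2]
        results.append((kind, (u2, t2), intervened))
    return results

rng = np.random.default_rng(0)
z_v = rng.normal(size=(4, 3, 2))
samples = sample_interventions(z_v, u1=0, t1=0, num_samples=5, rng=rng)
```

Each returned tuple records which of the three cases produced the sample, so the first-type, second-type and third-type intervention distributions can be grouped afterwards.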

Optionally, inputting the dynamic graph into a decoupled spatiotemporal attention network to obtain a high-order invariant spatiotemporal characterization of the node and a high-order variant spatiotemporal characterization of the node, including:

inputting the dynamic graph into a first time-space attention layer of the decoupled time-space attention network, and defining nodes of the dynamic graph according to the first time-space attention layer of the decoupled time-space attention network to obtain a first time-space neighbor representation of the nodes and a first information representation of the nodes;

calculating a first invariant time-space representation of the node and a first variant time-space representation of the node according to the first time-space neighbor representation of the node and the first information representation of the node;

combining the first invariant spatio-temporal representation of the node and the first variant spatio-temporal representation of the node to obtain a first spatio-temporal representation of the node;

and inputting the first space-time representation of the node into a subsequent space-time attention layer of the decoupled space-time attention network to obtain a high-order invariant space-time representation of the node and a high-order variant space-time representation of the node.

Optionally, after obtaining the first space-time characterization of the node, the method includes:

defining the (m-1)th spatio-temporal representation of the node according to the mth spatio-temporal attention layer of the decoupled spatio-temporal attention network to obtain the mth spatio-temporal neighbor representation of the node and the mth information representation of the node, wherein m is greater than or equal to 2;

calculating the mth invariant time-space representation of the node and the mth variation time-space representation of the node according to the mth time-space neighbor representation of the node and the mth information representation of the node;

combining the mth invariant spatiotemporal representation of the node and the mth variant spatiotemporal representation of the node to obtain an mth spatiotemporal representation of the node under the condition that the mth spatiotemporal attention layer is not the last spatiotemporal attention layer;

in the case where the mth spatiotemporal attention layer is the last spatiotemporal attention layer, taking the mth invariant spatiotemporal characterization of the node as a high order invariant spatiotemporal characterization of the node, and taking the mth variant spatiotemporal characterization of the node as a high order variant spatiotemporal characterization of the node.

Optionally, calculating an mth invariant spatio-temporal representation of a node and an mth variant spatio-temporal representation of a node according to the mth spatio-temporal neighbor characterization of the node and the mth information characterization of the node, comprises:

processing the mth space-time neighbor representation of the node according to the normalized exponential function to obtain an mth invariant structure mode mask of the node and an mth variant structure mode mask of the node;

processing the mth invariant structure mode mask of the node according to a neighbor aggregation function to obtain the mth invariant space-time representation of the node, and processing the mth variant structure mode mask of the node according to the neighbor aggregation function to obtain the mth variant space-time representation of the node.

Optionally, optimizing the decoupled spatiotemporal attention network according to the high-order invariant spatiotemporal characterization of the nodes, the high-order variant spatiotemporal characterization of the nodes, and the multiple intervention distributions to obtain a dynamic graph attention network, including:

calculating the loss of the high-order invariant space-time representation of the nodes to the label of the dynamic graph as a target task loss, and calculating the loss of the high-order invariant space-time representation of the nodes and the loss of the high-order variant space-time representation of the nodes to the label of the dynamic graph as a mixed loss;

calculating the variance of the mixed loss under the plurality of intervention distributions, taking the variance as an invariance loss regularization term, and combining the target task loss and the invariance loss regularization term to obtain a final loss;

and optimizing the parameters of the decoupled space-time attention network according to the final loss to obtain the dynamic graph attention network.
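The loss construction described above can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: a binary cross-entropy task loss, a hypothetical linear predictor with weights `w`, a mixed prediction formed from the sum of the invariant and variant representations, and a hypothetical weight `lam` used to combine the two terms.

```python
import numpy as np

def bce(logits, labels):
    """Mean binary cross-entropy computed from raw logits."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))

def final_loss(z_i, z_v_interventions, labels, w, lam=1.0):
    """Target task loss plus the variance-based invariance regularizer.

    z_i: (n, d) high-order invariant representations.
    z_v_interventions: list of (n, d) variant representations, one per
        intervention distribution.
    w: (d,) weights of a hypothetical linear predictor.
    lam: hypothetical weight on the invariance regularization term.
    """
    # Target task loss: prediction from the invariant representation only.
    task_loss = bce(z_i @ w, labels)
    # Mixed loss: prediction from invariant plus (intervened) variant parts.
    mixed = [bce((z_i + z_v) @ w, labels) for z_v in z_v_interventions]
    # Invariance regularizer: variance of the mixed loss across interventions.
    invariance_reg = float(np.var(mixed))
    return task_loss + lam * invariance_reg, task_loss, invariance_reg

rng = np.random.default_rng(0)
z_i = rng.normal(size=(6, 4))
labels = rng.integers(0, 2, size=6).astype(float)
w = rng.normal(size=4)
interventions = [rng.normal(size=(6, 4)) for _ in range(3)]
total, task, reg = final_loss(z_i, interventions, labels, w, lam=0.5)
```

If the mixed loss is identical under every intervention, the regularizer vanishes, which is the situation in which the prediction no longer depends on the variant representation.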

A second aspect of the embodiments of the present application provides a dynamic graph neural network prediction apparatus under spatio-temporal distribution migration, including:

the decoupling module is used for inputting the dynamic graph into a decoupled space-time attention network to obtain high-order invariant space-time characteristics of the nodes and high-order variant space-time characteristics of the nodes;

the intervention module is used for sampling and randomly replacing the high-order change space-time representation of the node according to the neighborhood of the node and the timestamp of the node to obtain various intervention distributions;

and the optimization module is used for optimizing the decoupled space-time attention network according to the high-order invariant space-time representation of the nodes, the high-order variant space-time representation of the nodes and the multiple intervention distributions to obtain a dynamic graph attention network, wherein the dynamic graph attention network is used for dynamic graph prediction under space-time distribution migration.

Wherein the intervention module comprises:

the extraction submodule is used for extracting the high-order change space-time representation of the first node at the first time stamp from the high-order change space-time representation of the node;

and the replacing submodule is used for replacing the high-order change space-time characterization of the first node at the first time stamp with the high-order change space-time characterization of the second node at the second time stamp to obtain the multiple intervention distributions.

Wherein, the replacement submodule further comprises:

the first type replacement subunit is used for replacing the high-order change spatiotemporal representation of the first node at the first timestamp with the high-order change spatiotemporal representation of the second node at the second timestamp to obtain a plurality of first type intervention distributions under the condition that the second node is the same as the first node and the second timestamp is different from the first timestamp;

the second type replacing subunit is used for replacing the high-order change spatiotemporal characterization of the first node at the first timestamp by the high-order change spatiotemporal characterization of the second node at the second timestamp to obtain a plurality of second types of intervention distributions under the condition that the second node is different from the first node and the second timestamp is the same as the first timestamp;

a third type replacing subunit, configured to, in a case that the second node is different from the first node and the second timestamp is different from the first timestamp, replace the high-order change spatiotemporal representation of the first node at the first timestamp with the high-order change spatiotemporal representation of the second node at the second timestamp, so as to obtain a plurality of intervention distributions of a third type;

a combining subunit configured to take the plurality of intervention distributions of the first type, the intervention distributions of the second type, and the intervention distributions of the third type as the plurality of intervention distributions.

The decoupling module comprises:

the first definition sub-module is used for inputting the dynamic graph into a first space-time attention layer of the decoupled space-time attention network, defining nodes of the dynamic graph according to the first space-time attention layer of the decoupled space-time attention network, and obtaining a first space-time neighbor representation of the nodes and a first information representation of the nodes;

the first decoupling submodule is used for calculating a first invariant time-space characteristic of the node and a first variation time-space characteristic of the node according to the first time-space neighbor characteristic of the node and the first information characteristic of the node;

the first combination submodule is used for combining the first invariant space-time representation of the node and the first variant space-time representation of the node to obtain a first space-time representation of the node;

and the high-order decoupling submodule is used for inputting the first space-time representation of the node into a subsequent space-time attention layer of the decoupled space-time attention network to obtain the high-order invariant space-time representation of the node and the high-order variant space-time representation of the node.

After obtaining the first space-time characterization of the node, the high-order decoupling submodule further includes:

the second definition submodule is used for defining the (m-1)th spatio-temporal representation of the node according to the mth spatio-temporal attention layer of the decoupled spatio-temporal attention network to obtain the mth spatio-temporal neighbor representation of the node and the mth information representation of the node, wherein m is greater than or equal to 2;

the second decoupling submodule is used for calculating the mth invariant time-space representation of the node and the mth variant time-space representation of the node according to the mth time-space neighbor representation of the node and the mth information representation of the node;

a second combination submodule, configured to combine the mth invariant spatiotemporal representation of the node and the mth variant spatiotemporal representation of the node to obtain an mth spatiotemporal representation of the node when the mth spatiotemporal attention layer is not the last spatiotemporal attention layer;

an output submodule configured to take the mth invariant spatiotemporal representation of the node as a higher order invariant spatiotemporal representation of the node and take the mth variant spatiotemporal representation of the node as a higher order variant spatiotemporal representation of the node if the mth spatiotemporal attention layer is the last spatiotemporal attention layer.

The second decoupling submodule further includes:

the normalization subunit is used for processing the mth spatio-temporal neighbor representation of the node according to the normalized exponential function to obtain an mth invariant structure mode mask of the node and an mth variant structure mode mask of the node;

and the neighbor aggregation subunit is used for processing the mth invariant structure mode mask of the node according to a neighbor aggregation function to obtain the mth invariant space-time representation of the node, and processing the mth variant structure mode mask of the node according to the neighbor aggregation function to obtain the mth variant space-time representation of the node.

Wherein, the optimization module further comprises:

the mixed loss calculation submodule is used for calculating the loss of the high-order invariant space-time characteristics of the nodes to the labels of the dynamic graph as target task loss, and calculating the loss of the high-order invariant space-time characteristics of the nodes and the loss of the high-order variant space-time characteristics of the nodes to the labels of the dynamic graph as mixed loss;

a final loss calculation submodule, configured to calculate the variance of the mixed loss under the plurality of intervention distributions, use the variance as an invariance loss regularization term, and combine the target task loss and the invariance loss regularization term to obtain a final loss;

and the optimization submodule is used for optimizing the parameters of the decoupled space-time attention network according to the final loss to obtain the dynamic graph attention network.

A third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the dynamic graph neural network prediction method under spatio-temporal distribution migration according to any one of the first aspect.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program/instructions are stored, where the computer program/instructions, when executed by a processor, implement the steps of the dynamic graph neural network prediction method under spatio-temporal distribution migration according to any one of the first aspect.

Beneficial effects:

the application provides a dynamic graph neural network prediction method under space-time distribution migration and a product, and the method comprises the following steps: inputting the dynamic graph into a decoupled space-time attention network to obtain high-order invariant space-time characteristics of the nodes and high-order variant space-time characteristics of the nodes; sampling and randomly replacing the high-order change space-time characterization of the nodes according to the neighborhood of the nodes and the time stamps of the nodes to obtain various intervention distributions; and optimizing the decoupled space-time attention network according to the high-order invariant space-time representation of the nodes, the high-order variant space-time representation of the nodes and the multiple intervention distributions to obtain a dynamic graph attention network, wherein the dynamic graph attention network is used for dynamic graph prediction under space-time distribution migration. The method provided by the application has the following advantages:

(1) The method constructs a decoupled spatio-temporal attention network to capture the variant and invariant modes in the dynamic graph, so that the dynamic graph attention network retains good prediction capability when processing dynamic graphs with spatio-temporal distribution migration, and its out-of-distribution generalization capability is effectively improved.

(2) The method designs a space-time intervention mechanism, solves the problem of high entanglement of a change mode among nodes of the dynamic graph, eliminates the false influence of the change mode, reduces the complexity of the dynamic graph attention network under the condition of processing space-time distribution migration, and improves the performance of the dynamic graph attention network.

(3) The method simultaneously considers time distribution migration and structure distribution migration in the dynamic graph, and improves the prediction performance of the dynamic graph under the time-space distribution migration.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of a dynamic graph neural network prediction method under spatio-temporal distribution migration according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a dynamic graph neural network prediction method under spatio-temporal distribution migration according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a dynamic graph neural network prediction apparatus under spatio-temporal distribution migration according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the related art, one class of approaches is the classical dynamic graph neural network. Such a network either uses a graph neural network to aggregate the graph structure information of each time slice and then processes the temporal information with a sequence model such as a recurrent neural network or a temporal self-attention mechanism, or introduces a time encoding technique to express each time slice as a function of time and then processes the structural information with a spatial module such as a graph neural network or with a memory module. The other class is distribution generalization methods, which handle distribution migration in time series data by means of difference measurement, normalization, distributionally robust optimization and the like.

However, none of the above prior art accounts for the more complex spatio-temporal distribution migration in dynamic graphs, in which the distributions of both the graph structure and the representations change over time, so the performance of the optimized dynamic graph neural network drops markedly under spatio-temporal distribution migration.

In view of this, an embodiment of the present application provides a dynamic graph neural network prediction method under spatio-temporal distribution migration. FIG. 1 shows a flowchart of the method. As shown in FIG. 1, the method includes the following steps:

S101, inputting the dynamic graph into a decoupled spatio-temporal attention network to obtain high-order invariant spatio-temporal representations of the nodes and high-order variant spatio-temporal representations of the nodes.

S102, sampling and randomly replacing the high-order change space-time characterization of the nodes according to the neighborhood of the nodes and the time stamps of the nodes to obtain various intervention distributions.

S103, optimizing the decoupled spatio-temporal attention network according to the high-order invariant spatio-temporal representations of the nodes, the high-order variant spatio-temporal representations of the nodes and the multiple intervention distributions to obtain a dynamic graph attention network, wherein the dynamic graph attention network is used for dynamic graph prediction under spatio-temporal distribution migration.

In a specific implementation of step S101, the embodiment of the present application provides a decoupled spatio-temporal attention network to capture the variant mode and the invariant mode in the dynamic graph. The nodes of the input dynamic graph are defined based on this network, so that each node can attend to all of its historical neighbors through a dispersed attention message-passing mechanism. The decoupled spatio-temporal attention network comprises a plurality of decoupled spatio-temporal attention layers. Stacking these layers repeatedly summarizes the spatio-temporal representations of the variant mode and the invariant mode in the dynamic graph, and the network finally outputs a high-order invariant spatio-temporal representation and a high-order variant spatio-temporal representation, completing the decoupling of the variant and invariant modes of the dynamic graph.

Specifically, the dynamic graph is input into the first spatio-temporal attention layer of the decoupled spatio-temporal attention network. The first layer preprocesses the input dynamic graph to obtain the information of each node, which comprises the information of that node under all timestamps and the information of the nodes it interacts with. The nodes of the dynamic graph are then defined according to the first spatio-temporal attention layer to obtain an attention query, a key and a value vector for each node. The attention query and the key represent the interaction between each node and its spatio-temporal neighbors under a timestamp and serve as the first spatio-temporal neighbor representation of the node; the value vector represents the node information of each node and serves as the first information representation of the node.

Taking a node u in the dynamic graph as an example, after preprocessing, the spatio-temporal neighbors of node u at time t are obtained as $\mathcal{N}^t(u) = \{v : (u, v) \in E_t\}$, where v is a neighbor node of node u and $E_t$ denotes the edges of the dynamic graph at time t. Subsequently, the nodes of the dynamic graph are defined according to the first spatio-temporal attention layer of the decoupled spatio-temporal attention network. The specific definition formula (not reproduced in this text) computes q, k and v from the representation of node u at time t, where q is the attention query, k is the key, v is the value vector, ae(t) is the time encoding of time t, and W denotes the transformation weights.

q and k serve as the first spatio-temporal neighbor representation of the node, and v serves as the first information representation of the node.
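A minimal sketch of this definition step is given below. The neighbor set follows the set notation in the text, with edges treated as ordered pairs (u, v). Everything else is an assumption made for illustration: the node representation is concatenated with a time encoding before projection, the hypothetical `time_encoding` helper uses cosines at fixed frequencies, and separate weight matrices `W_q`, `W_k` and `W_v` stand in for the transformation weights W.

```python
import numpy as np

def neighbors_at(edges_t, u):
    """N^t(u) = {v : (u, v) in E_t} for the edge list of one timestamp."""
    return sorted({v for (a, v) in edges_t if a == u})

def time_encoding(t, dim):
    """Hypothetical encoding of timestamp t (an assumption of this sketch)."""
    freqs = 1.0 / (10.0 ** np.arange(dim))
    return np.cos(t * freqs)

def qkv(h, t, W_q, W_k, W_v):
    """Project [h || time_encoding(t)] into query, key and value vectors."""
    time_dim = W_q.shape[1] - h.shape[0]
    x = np.concatenate([h, time_encoding(t, time_dim)])
    return W_q @ x, W_k @ x, W_v @ x

rng = np.random.default_rng(0)
d, d_t = 4, 2  # representation size and time-encoding size
W_q, W_k, W_v = (rng.normal(size=(d, d + d_t)) for _ in range(3))
h_u = rng.normal(size=d)  # representation of node u at time t
q, k, v = qkv(h_u, t=3, W_q=W_q, W_k=W_k, W_v=W_v)

edges_t = [(0, 1), (0, 2), (1, 2)]
nbrs = neighbors_at(edges_t, 0)
```

In this sketch `q` and `k` play the role of the spatio-temporal neighbor representation and `v` the role of the information representation.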

After the first time-space neighbor representation of the node and the first information representation of the node are obtained, a first invariant time-space representation of the node and a first variant time-space representation of the node are calculated according to the first time-space neighbor representation of the node and the first information representation of the node.

Specifically, the spatio-temporal neighbor representation of the node is decoupled first: the first spatio-temporal neighbor representations q and k of the node are processed with the Softmax normalized exponential function to obtain the first invariant structure mode mask of the node and the first variant structure mode mask of the node. In the calculation formula (not reproduced in this text), $m_I$ is the invariant structure mode mask of the node, $m_V$ is the variant structure mode mask of the node, d is the feature dimension, and Softmax(·) is the normalized exponential function.
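One way to obtain two such masks from q and k is sketched below. The exact formula is an assumption: the invariant mask is taken as the Softmax of the scaled dot-product scores and the variant mask as the Softmax of the negated scores, so that the two masks emphasize complementary neighbors. The function name `structural_masks` and the layout of `K` (one neighbor key per row) are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable normalized exponential function over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def structural_masks(q, K):
    """Return (m_I, m_V) over the neighbors whose keys are the rows of K.

    Assumption: m_I = Softmax(scores) and m_V = Softmax(-scores), where
    scores are dot products scaled by sqrt(d).
    """
    d = q.shape[0]
    scores = K @ q / np.sqrt(d)
    return softmax(scores), softmax(-scores)

q = np.array([1.0, 0.0])
K = np.array([[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
m_I, m_V = structural_masks(q, K)
```

With these inputs the first neighbor receives the largest invariant weight and the third neighbor the largest variant weight, and each mask sums to one.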

And after the first invariable structure mode mask of the node and the first variable structure mode mask of the node are obtained, completing the initial decoupling of the dynamic graph node, wherein the invariable structure mode mask and the variable structure mode mask obtained by the initial decoupling only decouple the space-time neighbor representation of the node, and then continuously decoupling the information representation of the node. Specifically, based on a first information representation v of the node, the first invariant structure mode mask of the node is processed according to an Agg neighbor aggregation function to obtain a first invariant space-time representation of the node, and based on the first information representation v of the node, the first variant structure mode mask of the node is processed according to the Agg neighbor aggregation function to obtain a first variant space-time representation of the node, and a calculation formula is as follows:

$$m_f = \mathrm{Softmax}(w_f)$$

where Agg(·) is the neighbor aggregation function, $z_I$ is the invariant spatio-temporal representation of the node, $z_V$ is the variant spatio-temporal representation of the node, v is the information representation of the node, $m_f$ is the feature mask, and $w_f$ is a learnable parameter.
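Only the feature-mask equation survives in this text; the aggregation equations for the two representations are missing. The sketch below shows one possible arrangement. It assumes that Agg is a mask-weighted sum over neighbors and that the feature mask m_f is applied element-wise to v on the invariant branch only. Both choices, and the names `aggregate` and `decouple_messages`, are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(mask, values):
    """Mask-weighted sum of neighbor value vectors (a simple stand-in for Agg)."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(mask, values)) for i in range(dim)]


def decouple_messages(m_I, m_V, values, w_f):
    """Return (z_I, z_V) from the structure masks and the value vectors."""
    m_f = softmax(w_f)  # feature mask, m_f = Softmax(w_f) as stated in the text
    # Assumption: the feature mask gates the invariant branch only.
    gated = [[x * g for x, g in zip(v, m_f)] for v in values]
    z_I = aggregate(m_I, gated)
    z_V = aggregate(m_V, values)
    return z_I, z_V


m_I = [0.7, 0.2, 0.1]
m_V = [0.1, 0.2, 0.7]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_f = [0.0, 0.0]  # equal logits give a uniform feature mask [0.5, 0.5]
z_I, z_V = decouple_messages(m_I, m_V, values, w_f)
```

In this toy case z_I is approximately [0.4, 0.15] and z_V is approximately [0.8, 0.9], which shows how the same value vectors yield different summaries under the two masks.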

This completes the capture of the variant and invariant patterns of the dynamic graph in the first spatio-temporal attention layer and yields the first invariant spatio-temporal representation and the first variant spatio-temporal representation of the node. Because the representations produced by the first spatio-temporal attention layer are still low-order spatio-temporal features, several spatio-temporal attention layers with the same structure as the first are stacked in the decoupled spatio-temporal attention network. To stack the second layer, the first invariant and first variant spatio-temporal representations of the node are combined into the first spatio-temporal representation of the node, which serves as the input of the second spatio-temporal attention layer. Specifically, the spatio-temporal representation of the node is obtained according to the following formula:

where the result is the spatio-temporal representation of the node, $z_I$ is the invariant spatio-temporal representation of the node, and $z_V$ is the variant spatio-temporal representation of the node.
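The combination formula itself is not reproduced in this text. One simple reading, assumed here and not confirmed by the source, is an element-wise sum of the two representations. The symbol $h$ for the combined spatio-temporal representation is introduced only for this illustration.

```latex
h = z_I + z_V
```

Under this assumption the combined representation has the same dimension as its two parts, so it can be fed to the next spatio-temporal attention layer without any change of shape.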

When the spatio-temporal attention layer is not the first one, denote it as the m-th spatio-temporal attention layer (m ≥ 2). The (m-1)-th spatio-temporal representation of the node is input to the m-th spatio-temporal attention layer of the decoupled spatio-temporal attention network to obtain the m-th spatio-temporal neighbor representation and the m-th information representation of the node; the m-th invariant spatio-temporal representation and the m-th variant spatio-temporal representation of the node are then calculated from them. The formulas used in the m-th spatio-temporal attention layer are the same as those used in the first; refer to the procedure of the first spatio-temporal attention layer for details, which are not repeated here.

Note that in the m-th spatio-temporal attention layer, the spatio-temporal neighbor representation and the information representation of the node are processed as follows: the m-th spatio-temporal neighbor representation of the node is processed with the Softmax normalized exponential function to obtain the m-th invariant structure pattern mask and the m-th variant structure pattern mask of the node; the m-th invariant structure pattern mask is then processed with the Agg neighbor aggregation function to obtain the m-th invariant spatio-temporal representation of the node, and the m-th variant structure pattern mask is processed with the neighbor aggregation function to obtain the m-th variant spatio-temporal representation of the node.

After the m-th invariant and m-th variant spatio-temporal representations of the node are obtained, if the m-th spatio-temporal attention layer is not the preset last layer, the stacking in the decoupled spatio-temporal attention network is not yet finished, and these representations are not yet high-order. Stacking therefore continues: the m-th invariant and m-th variant spatio-temporal representations of the node are combined into the m-th spatio-temporal representation of the node, which is passed as input to the next spatio-temporal attention layer.

If the m-th spatio-temporal attention layer is the last layer, the stacking in the decoupled spatio-temporal attention network is finished, and the m-th invariant and m-th variant spatio-temporal representations of the node are already high-order. They are no longer combined; instead, the m-th invariant spatio-temporal representation is output directly as the high-order invariant spatio-temporal representation of the node, and the m-th variant spatio-temporal representation is output as the high-order variant spatio-temporal representation of the node.

Note that, in the embodiments of the present application, stacking 2 to 3 spatio-temporal attention layers in the decoupled spatio-temporal attention network is sufficient to obtain the high-order variant and high-order invariant spatio-temporal representations. The specific number of layers may be set according to actual conditions, available computing power, and similar factors, and is not limited here.
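The stacking rule described above (combine and pass on at every layer except the last, then output the two representations separately) can be sketched as a short loop. The layer function `toy_layer` is a hypothetical stand-in for a real spatio-temporal attention layer, and the element-wise sum used to combine the two representations is an assumption.

```python
def run_stacked_layers(h0, layers):
    """Run decoupled layers in sequence.

    h0     : initial node representation (list of floats)
    layers : list of callables, each mapping h -> (z_I, z_V)
    Intermediate layers feed the combined representation forward; the last
    layer returns z_I and z_V separately as the high-order representations.
    Assumption: "combining" is an element-wise sum.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    h = h0
    for index, layer in enumerate(layers):
        z_I, z_V = layer(h)
        if index < len(layers) - 1:
            # Not the last layer: combine and pass on as the next input.
            h = [a + b for a, b in zip(z_I, z_V)]
    return z_I, z_V


def toy_layer(h):
    """Hypothetical stand-in for one spatio-temporal attention layer."""
    return [0.5 * x for x in h], [0.25 * x for x in h]


z_I_high, z_V_high = run_stacked_layers([4.0, 8.0], [toy_layer, toy_layer])
```

With two toy layers, the first layer's outputs [2.0, 4.0] and [1.0, 2.0] are summed to [3.0, 6.0] before entering the second layer, whose outputs are returned without being combined.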

Through the above process, stacking multiple spatio-temporal attention layers in the decoupled spatio-temporal attention network captures both the variant and the invariant spatio-temporal patterns of the input dynamic graph. The variant spatio-temporal patterns are highly entangled across nodes, so they open backdoor paths in the causal relations between nodes and exert a spurious influence on the prediction performance of the whole dynamic graph attention network. In addition, directly generating and mixing subsets of high-order spatio-temporal features for intervention is computationally very expensive. To reduce this cost, step S102 creates the multiple intervention distributions by approximating the intervention: the high-order variant spatio-temporal representations of the nodes are sampled and randomly replaced according to the neighborhoods and timestamps of the nodes. The original structure and features therefore need not be changed directly, which lowers the computational cost and avoids the difficulty and high complexity of generating complex distributions.

In a specific implementation of step S102, the high-order variant spatio-temporal representations of the nodes, obtained through the decoupled spatio-temporal attention network in S101, cover all nodes at all timestamps. The high-order variant spatio-temporal representation of a first node at a first timestamp is extracted from them and then replaced with the high-order variant spatio-temporal representation of a second node at a second timestamp, which yields the multiple intervention distributions.

In an optional embodiment, where the second node is the same as the first node and the second timestamp differs from the first timestamp, the high-order variant spatio-temporal representation of the first node at the first timestamp is replaced with that of the second node at the second timestamp, yielding multiple intervention distributions of the first type. Specifically, the high-order variant spatio-temporal representation of a random node at a random timestamp is first extracted from the high-order variant spatio-temporal representations and taken as the representation of the first node at the first timestamp. It is then replaced with the high-order variant spatio-temporal representation of the same first node at a second timestamp different from the first timestamp, which gives one intervention distribution. In this manner, every node in the dynamic graph is taken in turn as the first node, and its representation at the first timestamp is replaced with its representation at each possible second timestamp (every timestamp other than the first). Each distinct replacement generates one intervention distribution; after all nodes are traversed, the resulting intervention distributions form the first-type intervention distributions, which are output as the multiple intervention distributions.

In another optional embodiment, where the second node differs from the first node and the second timestamp is the same as the first timestamp, the high-order variant spatio-temporal representation of the first node at the first timestamp is replaced with that of the second node at the second timestamp, yielding multiple intervention distributions of the second type. Specifically, the high-order variant spatio-temporal representation of a random node at a random timestamp is first extracted from the high-order variant spatio-temporal representations and taken as the representation of the first node at the first timestamp. It is then replaced with the high-order variant spatio-temporal representation of a second node, different from the first node, at the same first timestamp, which gives one intervention distribution. In this manner, every node in the dynamic graph is taken in turn as the first node, and its representation at the first timestamp is replaced with the representation of each possible second node (every node other than the first) at the first timestamp. Each distinct replacement generates one intervention distribution; after all nodes are traversed, the resulting intervention distributions form the second-type intervention distributions, which are output as the multiple intervention distributions.

In another optional embodiment, where the second node differs from the first node and the second timestamp differs from the first timestamp, the high-order variant spatio-temporal representation of the first node at the first timestamp is replaced with that of the second node at the second timestamp, yielding multiple intervention distributions of the third type. Specifically, the high-order variant spatio-temporal representation of a random node at a random timestamp is first extracted from the high-order variant spatio-temporal representations and taken as the representation of the first node at the first timestamp. It is then replaced with the high-order variant spatio-temporal representation of a second node at a second timestamp, where the second node differs from the first node and the second timestamp differs from the first timestamp, which gives one intervention distribution. In this manner, every node in the dynamic graph is taken in turn as the first node, and its representation at the first timestamp is replaced with the representation of each possible second node (every node other than the first) at each possible second timestamp (every timestamp other than the first). Each distinct replacement generates one intervention distribution; after all nodes are traversed, the resulting intervention distributions form the third-type intervention distributions, which are output as the multiple intervention distributions.

For example, from the high-order variant spatio-temporal representations obtained in S101, the representation of a randomly selected node u at timestamp t₁ is taken as the high-order variant spatio-temporal representation of the first node at the first timestamp. A node v different from the first node u is selected as the second node, together with a second timestamp t₂ different from the first timestamp t₁. The high-order variant spatio-temporal representation of node v at t₂ then replaces that of node u at t₁, while the high-order invariant spatio-temporal representation of node u is not replaced. This yields one intervention distribution, which is expressed by the following formula:

where the first pair of terms denotes the high-order invariant spatio-temporal representation and the high-order variant spatio-temporal representation of the first node u at the first timestamp t₁, and the remaining term denotes the high-order variant spatio-temporal representation of the second node v at the second timestamp t₂.
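The replacement step for all three intervention types can be sketched as follows. This is an illustrative sketch only: it assumes the variant representations are stored in a dictionary keyed by (node, timestamp) with an entry for every pair, and the function name `sample_intervention` and the `kind` argument are hypothetical.

```python
import random


def sample_intervention(z_V, u, t1, kind, rng):
    """Return a copy of z_V in which z_V[(u, t1)] is replaced by z_V[(v, t2)].

    z_V  : dict mapping (node, timestamp) -> variant representation;
           assumed to hold an entry for every (node, timestamp) pair
    kind : 1 -> same node, different timestamp
           2 -> different node, same timestamp
           3 -> different node, different timestamp
    The invariant representations are stored elsewhere and never touched.
    """
    nodes = sorted({n for n, _ in z_V})
    times = sorted({t for _, t in z_V})
    other_nodes = [n for n in nodes if n != u]
    other_times = [t for t in times if t != t1]
    if kind == 1:
        v, t2 = u, rng.choice(other_times)
    elif kind == 2:
        v, t2 = rng.choice(other_nodes), t1
    elif kind == 3:
        v, t2 = rng.choice(other_nodes), rng.choice(other_times)
    else:
        raise ValueError("kind must be 1, 2, or 3")
    intervened = dict(z_V)  # shallow copy; the original stays unchanged
    intervened[(u, t1)] = z_V[(v, t2)]
    return intervened, (v, t2)


rng = random.Random(0)
z_V = {(n, t): [float(10 * n + t)] for n in range(3) for t in range(3)}
new_z_V, (v, t2) = sample_intervention(z_V, u=0, t1=0, kind=3, rng=rng)
```

Because only one entry of the copied dictionary is overwritten, the original graph structure and features are left as they were, which matches the stated motivation of approximating the intervention cheaply.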

The first-type, second-type, and third-type intervention distributions are together determined as the multiple intervention distributions. At this point, multiple intervention distributions have been obtained from the high-order variant spatio-temporal representations of the dynamic graph. Because step S102 approximates the intervention by sampling and randomly replacing the high-order variant spatio-temporal representations of the nodes, the original structure and features need not be changed directly, which lowers the computational cost and avoids the difficulty and high complexity of generating complex distributions. Moreover, since only the high-order variant spatio-temporal representations are replaced while the high-order invariant ones are left untouched, the prediction results of a dynamic graph attention network trained on these intervention distributions should in theory not change under intervention; in practice, however, they are still affected during training. The subsequent step S103 therefore uses the created intervention distributions to compute the total loss and optimize the parameters and weights of the dynamic graph network, so that the optimized dynamic graph attention network relies more on the invariant-pattern spatio-temporal representations when performing prediction tasks, which improves its prediction performance.

In a specific implementation of step S103, because the performance of the dynamic graph network depends mainly on the invariant-pattern features of the dynamic graph, the loss is first computed from the high-order invariant spatio-temporal representations of the nodes alone. Specifically, the loss of the high-order invariant spatio-temporal representation of the node with respect to the label of the dynamic graph is calculated and taken as the target task loss, as follows:

$$L = l(f(z_I), y)$$

where L is the target task loss, l is the cross-entropy loss function, f is the downstream task predictor, $z_I$ is the high-order invariant spatio-temporal representation of the node, and y is the label of the dynamic graph.
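As a worked illustration of the target task loss, the sketch below evaluates the cross-entropy of a predictor's output against a label. The linear predictor `predictor_f` and its weight matrix are hypothetical; the source does not specify the form of f.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss l for a single sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def predictor_f(z_I, weights):
    """Hypothetical linear predictor f: one weight vector per class."""
    return [sum(w * x for w, x in zip(row, z_I)) for row in weights]


def task_loss(z_I, y, weights):
    """Target task loss L = l(f(z_I), y), using only the invariant representation."""
    return cross_entropy(predictor_f(z_I, weights), y)


weights = [[1.0, 0.0], [0.0, 1.0]]
loss_correct = task_loss([3.0, 0.0], 0, weights)
loss_wrong = task_loss([3.0, 0.0], 1, weights)
```

The loss is small when the invariant representation points toward the labeled class and larger otherwise; in this two-class toy case the two losses differ by exactly the logit gap of 3.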

Although, in theory, the predictions of the optimized dynamic graph attention network should not be affected by the variant-pattern features, they are still affected during training. Training therefore also has to involve the variant-pattern features, so that the trained and optimized dynamic graph attention network relies more on the invariant-pattern features when handling prediction tasks. Specifically, the loss of the high-order invariant and high-order variant spatio-temporal representations of the node, taken together, with respect to the label of the dynamic graph is calculated as the mixed loss, as follows:

$$L_m = l(g(z_V, z_I), y)$$

where $L_m$ is the mixed loss, l is the cross-entropy loss function, g is the downstream task predictor, $z_V$ is the high-order variant spatio-temporal representation of the node, $z_I$ is the high-order invariant spatio-temporal representation of the node, and y is the label of the dynamic graph.

An invariance regularization term computed from the mixed loss is used to capture and exploit the invariant-pattern features that have stable predictive ability. Specifically, the variance of the mixed loss across the multiple intervention distributions is calculated as the invariance regularization term for optimizing the dynamic graph network, as follows:

where $L_{do}$ is the invariance loss regularization term, Var(·) is the variance function, $L_m$ is the mixed loss, do is the intervention operator used to compute the invariance loss regularization term, P is the probability of the predicted label, and $s_i$ is a sample in the sample set S.
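The formula for the regularization term is not reproduced in this text. A minimal sketch of the idea, under stated assumptions, is to evaluate the mixed loss once per sampled intervention $s_i$ and take the variance of those losses. The additive predictor `predictor_g`, the use of the population variance, and all function names are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss for a single sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def predictor_g(z_V, z_I):
    """Hypothetical mixed predictor g: logits are the element-wise sum z_V + z_I."""
    return [a + b for a, b in zip(z_V, z_I)]


def invariance_regularizer(z_I, y, variant_samples):
    """Population variance of the mixed loss across interventions s_i in S."""
    losses = [cross_entropy(predictor_g(s, z_I), y) for s in variant_samples]
    mean = sum(losses) / len(losses)
    return sum((x - mean) ** 2 for x in losses) / len(losses)


z_I = [2.0, 0.0]
# Interventions that shift every logit equally leave the prediction unchanged.
stable = invariance_regularizer(z_I, 0, [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
# Interventions that change the relative logits make the loss fluctuate.
unstable = invariance_regularizer(z_I, 0, [[0.0, 0.0], [0.0, 3.0], [1.0, -1.0]])
```

A regularizer near zero indicates that swapping the variant representation does not change the loss, which is the behavior the optimization is meant to encourage.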

Finally, the target task loss and the invariance loss regularization term are combined into the final loss, and the parameters and weights of the decoupled spatio-temporal attention network are optimized according to it. They are combined as follows:

$$\min_{\theta} \; L + \lambda L_{do}$$

where θ denotes the parameters and weights of the decoupled spatio-temporal attention network, L is the target task loss, $L_{do}$ is the invariance loss regularization term, and λ is the coefficient of the invariance loss regularization term.

Note that the coefficient λ scales down the invariance loss regularization term, whose magnitude is comparatively large. It is typically set to 0.01 or 0.001; the specific value of λ is chosen according to the actual situation and is not limited here.
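The weighting described above amounts to a single line of arithmetic. The sketch below combines a target task loss with a λ-weighted invariance term; the function name `final_loss` and the numeric inputs are illustrative only.

```python
def final_loss(task_loss, invariance_term, lam=0.01):
    """Final loss: target task loss plus lambda-weighted invariance regularizer."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return task_loss + lam * invariance_term


# Illustrative numbers: a regularizer of larger magnitude is scaled down by lam.
total_small_lam = final_loss(0.7, 25.0, lam=0.001)
total_large_lam = final_loss(0.7, 25.0, lam=0.01)
```

With a regularizer of 25.0, a coefficient of 0.01 contributes 0.25 to the total while 0.001 contributes 0.025, which illustrates how λ keeps the regularizer from dominating the task loss.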

The final loss thus combines the invariance loss regularization term and the target task loss. The invariance loss regularization term suppresses the influence of variant-pattern features in the dynamic graph network, so that the dynamic spatio-temporal attention network relies on invariant-pattern features for prediction; the target task loss optimizes the prediction performance that the dynamic graph network achieves from invariant-pattern features. Combining the two effects, the dynamic graph attention network optimized with the final loss performs better on prediction tasks.

Finally, steps S101 to S103 are repeated, and the parameters and weights of the decoupled dynamic attention network are optimized according to the final loss in each training round, to obtain the dynamic graph attention network.

FIG. 2 is a schematic diagram of the dynamic graph neural network prediction method under spatio-temporal distribution migration. As shown in FIG. 2, for a given dynamic graph with multiple timestamps, the decoupled dynamic graph attention network (lower left of FIG. 2) first obtains summaries of the high-order invariant and variant patterns through decoupled spatio-temporal information passing, that is, the high-order variant and high-order invariant spatio-temporal representations (the original distribution in FIG. 2). A spatio-temporal intervention mechanism then samples and recombines the variant patterns of each node across space and time, replacing the high-order variant spatio-temporal representations (the parts indicated by bidirectional arrows in FIG. 2) to create multiple intervention distributions. As shown in the invariance loss regularization term part of FIG. 2 (lower right), the invariance loss regularization term is computed from samples of the intervention distributions to optimize the model so that it focuses on invariant-pattern features for prediction. Finally, the final loss is used to optimize the parameters and weights of the decoupled dynamic attention network; steps S101 to S103 are repeated, cyclically optimizing these parameters and weights according to the final loss, to obtain the dynamic graph attention network.

The method provided by the embodiments of the present application can effectively improve the out-of-distribution generalization ability of the dynamic graph neural network. The resulting dynamic graph attention network achieves better prediction performance on downstream tasks that involve dynamic graphs with spatio-temporal distribution migration in different scenarios (such as academic collaboration prediction, commodity recommendation prediction, and second-hand trading market recommendation).

For example, in a second-hand trading market recommendation scenario, the dynamic graph of the second-hand trading market is input into the dynamic graph attention network. The decoupled spatio-temporal attention network within it separates the high-order invariant spatio-temporal representations of that dynamic graph (such as the transaction features of seller nodes and buyer nodes involved in periodic purchases) from the high-order variant spatio-temporal representations (such as the transaction features of seller nodes and buyer nodes involved in sudden purchases), and a prediction result is obtained from the high-order invariant spatio-temporal representations.

The embodiment of the application provides a dynamic graph neural network prediction method under space-time distribution migration, which comprises the following steps: inputting the dynamic graph into a decoupled space-time attention network to obtain high-order invariant space-time characteristics of the nodes and high-order variant space-time characteristics of the nodes; sampling and randomly replacing the high-order change space-time characterization of the nodes according to the neighborhood of the nodes and the time stamps of the nodes to obtain various intervention distributions; and optimizing the decoupled space-time attention network according to the high-order invariant space-time representation of the nodes, the high-order variant space-time representation of the nodes and the multiple intervention distributions to obtain a dynamic graph attention network, wherein the dynamic graph attention network is used for dynamic graph prediction under space-time distribution migration. The method provided by the application has the following advantages:

Based on the same inventive concept, an embodiment of the present application discloses a dynamic graph neural network prediction device under spatio-temporal distribution migration. FIG. 3 is a schematic diagram of the device, which, as shown in FIG. 3, includes:

and the optimization module is used for optimizing the decoupled space-time attention network according to the high-order invariant space-time characterization of the nodes, the high-order variable space-time characterization of the nodes and the multiple intervention distributions to obtain a dynamic graph attention network, wherein the dynamic graph attention network is used for dynamic graph prediction under space-time distribution migration.

Wherein the intervention module comprises:

the extraction submodule is used for extracting the high-order change space-time characterization of the first node at the first time stamp from the high-order change space-time characterization of the nodes;

Wherein, the replacement submodule further comprises:

the second type replacing subunit is used for replacing the high-order change spatiotemporal representation of the first node at the first timestamp with the high-order change spatiotemporal representation of the second node at the second timestamp to obtain a plurality of second type intervention distributions under the condition that the second node is different from the first node and the second timestamp is the same as the first timestamp;

The decoupling module comprises:

and the high-order decoupling submodule is used for inputting the first space-time representation of the node into a subsequent space-time attention layer of the decoupled space-time attention network to obtain the high-order invariant space-time representation of the node and the high-order variable space-time representation of the node.

the second decoupling submodule is used for calculating the mth invariable time-space characteristic of the node and the mth variable time-space characteristic of the node according to the mth time-space neighbor characteristic of the node and the mth information characteristic of the node;

an output submodule configured to take the mth invariant spatiotemporal characterization of the node as a high-order invariant spatiotemporal characterization of the node and the mth variant spatiotemporal characterization of the node as a high-order variant spatiotemporal characterization of the node if the mth spatiotemporal attention layer is the last spatiotemporal attention layer.

The second decoupling submodule further includes:

Wherein, the optimization module further comprises:

Based on the same inventive concept, an embodiment of the present application discloses an electronic device. FIG. 4 is a schematic diagram of the electronic device. As shown in FIG. 4, the electronic device 100 includes a memory 110 and a processor 120 that are communicatively connected through a bus. A computer program is stored in the memory 110 and can be run on the processor 120 to implement the steps of the dynamic graph neural network prediction method under spatio-temporal distribution migration disclosed in the embodiments of the present application.

Based on the same inventive concept, an embodiment of the present application discloses a computer-readable storage medium on which a computer program/instruction is stored. When executed by a processor, the computer program/instruction implements the steps of the dynamic graph neural network prediction method under spatio-temporal distribution migration disclosed in the embodiments of the present application.

Based on the same inventive concept, the embodiment of the present application discloses a computer program product, which includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the computer program/instruction implements the steps in the method for predicting the neural network of the dynamic graph under spatio-temporal distribution migration disclosed in the embodiment of the present application.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present invention.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or terminal that comprises the element.

The dynamic graph neural network prediction method under spatio-temporal distribution migration and the product provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and the core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A dynamic graph neural network prediction method under spatio-temporal distribution migration, characterized by comprising the following steps:

2. The method for predicting the neural network of the dynamic graph under the spatio-temporal distribution migration according to claim 1, wherein sampling and randomly replacing the high-order variant spatio-temporal characterization of the nodes according to the neighborhood of the nodes and the time stamps of the nodes to obtain a plurality of intervention distributions comprises:

3. The method for predicting the neural network of the dynamic graph under the spatio-temporal distribution migration according to claim 2, wherein replacing the high-order variant spatio-temporal characterization of the first node at the first time stamp with the high-order variant spatio-temporal characterization of the second node at the second time stamp to obtain the plurality of intervention distributions comprises:

under the condition that the second node is different from the first node and the second time stamp is the same as the first time stamp, replacing the high-order variant spatio-temporal characterization of the first node at the first time stamp with the high-order variant spatio-temporal characterization of the second node at the second time stamp to obtain a plurality of second type intervention distributions;

determining the plurality of first type intervention distributions, the second type intervention distributions, and the third type intervention distributions as the plurality of intervention distributions.
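
As an illustration only, the second type of replacement recited in claim 3 (a different node at the same time stamp) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `second_type_interventions`, the dictionary layout keyed by `(node, time_stamp)`, and the `num_samples` parameter are hypothetical, and the first and third types of intervention distribution are not covered.

```python
import random


def second_type_interventions(variant, first_node, time_stamp, num_samples, rng):
    """Build intervened copies of the variant characterizations.

    variant: dict mapping (node, time_stamp) -> list of floats (hypothetical layout).
    Each returned dict replaces the entry of (first_node, time_stamp) with the
    variant characterization of a different node at the same time stamp.
    """
    # Donor candidates: any other node that has an entry at the same time stamp.
    donors = sorted(n for (n, ts) in variant if ts == time_stamp and n != first_node)
    # Sample donors without repetition so each intervention is distinct.
    chosen = rng.sample(donors, min(num_samples, len(donors)))
    interventions = []
    for donor in chosen:
        intervened = dict(variant)  # shallow copy; the input is left unchanged
        intervened[(first_node, time_stamp)] = list(variant[(donor, time_stamp)])
        interventions.append(intervened)
    return interventions
```

Each returned dictionary stands for one sample of an intervention; only the variant characterization is replaced, so the invariant characterization of the first node is unaffected.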

4. The method for predicting the neural network of the dynamic graph under the spatio-temporal distribution migration according to claim 1, wherein the dynamic graph is input into a decoupled spatio-temporal attention network to obtain the high-order invariant spatio-temporal characterization of the nodes and the high-order variant spatio-temporal characterization of the nodes, and the method comprises the following steps:

calculating a first invariant spatio-temporal characterization of the node and a first variant spatio-temporal characterization of the node according to the first spatio-temporal neighbor characterization of the node and the first information characterization of the node;

combining the first invariant spatiotemporal characterization of the node and the first variant spatiotemporal characterization of the node to obtain a first spatiotemporal characterization of the node;

inputting the first spatiotemporal characterization of the node into a subsequent spatiotemporal attention layer of the decoupled spatiotemporal attention network to obtain a high-order invariant spatiotemporal characterization of the node and a high-order variant spatiotemporal characterization of the node.
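
By way of example, the calculating and combining steps of claim 4 may be pictured with the following sketch. It rests on assumptions the claim does not state: attention scores are scaled dot products, the invariant weights are a softmax over the scores, the variant weights are a softmax over the negated scores, and the two characterizations are combined by element-wise addition. The name `decoupled_attention` and its parameters are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decoupled_attention(node_info, neighbor_chars):
    """Return (invariant, variant, combined) characterizations of one node.

    node_info: information characterization of the node (list of floats).
    neighbor_chars: spatio-temporal neighbor characterizations (list of lists).
    """
    dim = len(node_info)
    # Scaled dot-product score between the node and each neighbor.
    scores = [
        sum(q * k for q, k in zip(node_info, nb)) / math.sqrt(dim)
        for nb in neighbor_chars
    ]
    inv_weights = softmax(scores)                 # assumed invariant attention
    var_weights = softmax([-s for s in scores])   # assumed complementary attention
    invariant = [
        sum(w * nb[i] for w, nb in zip(inv_weights, neighbor_chars))
        for i in range(dim)
    ]
    variant = [
        sum(w * nb[i] for w, nb in zip(var_weights, neighbor_chars))
        for i in range(dim)
    ]
    # Assumed combination: element-wise sum of the two characterizations.
    combined = [a + b for a, b in zip(invariant, variant)]
    return invariant, variant, combined
```

Under these assumptions the two sets of weights emphasize opposite neighbors, which is one simple way to keep the invariant and variant characterizations separated.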

5. The method for predicting the neural network of the dynamic graph under the spatio-temporal distribution migration according to claim 4, wherein after obtaining the first spatio-temporal characterization of the node, the method comprises:

determining, from the (m-1)th spatio-temporal characterization of the node and according to the mth spatio-temporal attention layer of the decoupled spatio-temporal attention network, the mth spatio-temporal neighbor characterization of the node and the mth information characterization of the node, wherein m is greater than or equal to 2;

in the case that the mth spatiotemporal attention layer is the last spatiotemporal attention layer, taking the mth invariant spatiotemporal characterization of the node as the high-order invariant spatiotemporal characterization of the node, and taking the mth variant spatiotemporal characterization of the node as the high-order variant spatiotemporal characterization of the node.
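
The layer-by-layer procedure of claim 5 can be sketched as a simple loop. The sketch is illustrative only and makes two assumptions: each subsequent layer is modeled as a callable that returns a pair (invariant, variant), and the pair is combined by element-wise addition before being passed to the next layer. The name `high_order_characterizations` is hypothetical.

```python
def high_order_characterizations(first_characterization, subsequent_layers):
    """Run layers m = 2, 3, ... and return the outputs of the last layer.

    first_characterization: first spatio-temporal characterization (list of floats).
    subsequent_layers: non-empty list of callables, each mapping a
        characterization to an (invariant, variant) pair of equal-length lists.
    """
    if not subsequent_layers:
        raise ValueError("at least one subsequent layer (m >= 2) is required")
    current = list(first_characterization)
    for layer in subsequent_layers:
        invariant, variant = layer(current)
        # Assumed combination feeding the next layer: element-wise sum.
        current = [a + b for a, b in zip(invariant, variant)]
    # The last layer's outputs serve as the high-order characterizations.
    return invariant, variant
```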

6. The method for predicting the neural network of the dynamic graph under the spatio-temporal distribution migration according to claim 5, wherein the step of calculating the mth invariant spatio-temporal characterization of the node and the mth variant spatio-temporal characterization of the node according to the mth spatio-temporal neighbor characterization of the node and the mth information characterization of the node comprises:

7. The method for predicting the neural network of the dynamic graph under the spatio-temporal distribution migration according to claim 1, wherein optimizing the decoupled spatio-temporal attention network according to the high-order invariant spatio-temporal characterization of the nodes, the high-order variant spatio-temporal characterization of the nodes and the plurality of intervention distributions to obtain the dynamic graph attention network comprises:

8. A dynamic graph neural network prediction device under spatio-temporal distribution migration is characterized by comprising the following components:

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the steps of the dynamic graph neural network prediction method under spatio-temporal distribution migration according to any one of claims 1-7.

10. A computer-readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, implement the steps of the dynamic graph neural network prediction method under spatio-temporal distribution migration according to any one of claims 1-7.