WO2023060563A1 - Adaptive diffusion in graph neural networks - Google Patents
Adaptive diffusion in graph neural networks
- Publication number: WO2023060563A1 (PCT/CN2021/124130)
- Authority: WIPO (PCT)
- Prior art keywords: gnn, neighborhood radius, parameters, feature, loss function
Classifications
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks (under G—Physics; G06—Computing; G06N—Computing arrangements based on specific computational models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/0464—Convolutional networks [CNN, ConvNet] (under G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning (under G06N3/08—Learning methods)
Definitions
- the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set.
- updating the trainable parameters of the GNN based at least partially on the reduction of the first loss function of the training set further comprises updating the trainable parameters of the GNN by a first gradient on the training set to reduce the first loss function.
- the method further comprises inputting data of validation set into the GNN; and wherein updating the one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set further comprises updating the one or more neighborhood radius related parameters by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN.
- the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set.
- the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch.
- the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly.
- the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively.
- the feature propagation of the message passing is performed before feature transformation of the message passing with the updated neighborhood radius for each layer and each feature dimension of the GNN respectively.
- the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as θ_k = e^(-t) t^k / k! , wherein k represents a step number away from a central node, and t is a diffusion time.
- the step number away from the central node is truncated to a constant instead of infinity.
- the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as θ_k = α (1-α)^k , wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
- node classification may be image classification, speech recognition or anomaly detection, etc.
- Fig. 1 illustrates an exemplary schematic diagram of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure.
- Fig. 2 illustrates another exemplary schematic diagram of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure.
- Fig. 3 illustrates an exemplary flow chart of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure.
- Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
- the adaptive diffusion convolution (ADC) strategy is proposed to automatically learn the optimal neighborhood size from the data. Furthermore, the conventional assumption that all GNN layers and feature channels (or dimensions) should use the same neighborhood size for propagation can be broken in the present disclosure. ADC is designed to learn a dedicated propagation neighborhood for each GNN layer and each feature channel, making the GNN architecture fully coupled with graph structures, which is the unique property that distinguishes GNNs from traditional neural networks.
- Given the input feature matrix X and a subset of node labels Y, the task is to predict the labels of the remaining nodes.
- the task of node classification may be image classification, in which each node represents an image, an edge may exist if two images are determined to be in the same class, and features may be derived from the pixels as a probability distribution.
- the task of node classification may be speech recognition, in which each node represents a waveform of a sound record, an edge may exist if two sound records are determined to be in the same class, and features may be derived from the waveform as a probability distribution over the discrete states of a Hidden Markov Model.
- the task of node classification may be anomaly detection, including but not limited to, fraud recognition, dataset preprocessing, detection of online review spam, fake users and rumors in social media, fake news, etc.
- the tasks referred to in the present disclosure are merely used as examples, and the present disclosure can be applied to any scenario in which inputs can be learned as node representations and connections between the nodes can be learned as edges.
- the convolution operation on graphs can be described as the process of neighborhood feature aggregation or message passing.
- the message passing graph convolutional networks can be simply defined as below:
- H^(l+1) = σ ( T H^(l) W^(l) ) , with T = D^(-1/2) (A + I_N) D^(-1/2) ,
- wherein D is the diagonal degree matrix with D_ii = Σ_j (A + I_N)_ij , and H^(l) W^(l) denotes the hidden feature after transformation. T is used herein to denote the symmetrically normalized transition matrix D^(-1/2) (A + I_N) D^(-1/2) .
- the feature transformation function describes how features transform inside each node, and the feature propagation function describes how features propagate between nodes. Essentially, how well a GNN model can utilize graph structures heavily depends on the design of the feature propagation function.
- f (T) is a matrix that can be generated by T, so f (T) can be represented as f (T) = Σ_{k=0}^∞ θ_k T^k . To quantify how far each node could aggregate features from, the neighborhood radius r of a node is defined as below:
- r = ( Σ_{k=0}^∞ θ_k · k ) / ( Σ_{k=0}^∞ θ_k ) ,
- wherein θ_k denotes the influence from k-step-away nodes. With a larger r, the model puts more emphasis on long distance nodes, i.e., global information; with a smaller r, the model amplifies local information.
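As an illustrative sketch (not part of the claimed method), the diffusion matrix f(T) and the neighborhood radius r above can be computed numerically. The toy path graph, the truncation, and the helper names below are assumptions for illustration only:

```python
import numpy as np

# Hypothetical 4-node path graph; A is its adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)                     # add self-loops
d = A_tilde.sum(axis=1)
T = A_tilde / np.sqrt(np.outer(d, d))       # T = D^(-1/2) (A + I) D^(-1/2)

def diffusion_matrix(theta, T):
    """f(T) = sum_k theta_k T^k, truncated at K = len(theta) - 1."""
    f = np.zeros_like(T)
    Tk = np.eye(T.shape[0])
    for th in theta:
        f += th * Tk
        Tk = Tk @ T
    return f

def neighborhood_radius(theta):
    """r = (sum_k theta_k * k) / (sum_k theta_k)."""
    k = np.arange(len(theta))
    return (theta * k).sum() / theta.sum()

theta = np.array([0.5, 0.3, 0.2])           # example influence weights
fT = diffusion_matrix(theta, T)
print(neighborhood_radius(theta))           # ~0.7 = 0.3*1 + 0.2*2
```

Larger weight on higher powers of T raises r, matching the intuition that the model then emphasizes more distant (global) information.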
- GDC graph diffusion convolution
- the weight coefficients should satisfy Σ_{k=0}^∞ θ_k = 1 such that the signal strength is neither amplified nor reduced through the propagation.
- the set of weight coefficients can be generated from the heat kernel as θ_k = e^(-t) t^k / k! , wherein k represents a step number away from a central node, and t is a diffusion time.
- Heat kernel incorporates prior knowledge into the GNN model, which means the feature propagation between nodes follows Newton’s law of cooling, i.e., the feature propagation speed between two nodes is proportional to the difference between their features.
- this prior knowledge can be described as below:
- dx_i (t) / dt = Σ_{j ∈ N (i)} ( x_j (t) - x_i (t) ) ,
- wherein N (i) denotes the neighborhood of node i, and x_i (t) represents the feature of node i after diffusion time t.
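The cooling-law prior can be checked with a small numerical integration: writing the dynamics as dX/dt = -L X with graph Laplacian L = D - A, forward-Euler steps drive connected nodes toward a common value. The toy graph and step sizes below are illustrative assumptions:

```python
import numpy as np

# Toy star graph on 3 nodes; L = D - A is the graph Laplacian.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.0, 0.0])      # initial scalar feature per node
dt, steps = 0.01, 500
for _ in range(steps):
    x = x - dt * (L @ x)           # forward-Euler step of dx/dt = -L x

# Diffusion smooths the features toward the common mean while the
# total feature mass stays conserved (row/column sums of L are zero).
print(x)
```

After diffusion time t = dt * steps = 5, the features are essentially equalized at the mean 1/3, illustrating that larger t corresponds to aggregating from a wider neighborhood.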
- the heat kernel version of the GDC has r as below:
- r = Σ_{k=0}^∞ θ_k · k = Σ_{k=0}^∞ e^(-t) ( t^k / k! ) · k = t .
- t is the neighborhood radius r for the heat kernel based GDC; that is, t becomes a perfect continuous substitute for the hop number in multi-hop models.
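The identity r = t follows because the heat-kernel weights form a Poisson distribution with mean t; a short sketch (truncation K is an assumption for computability, as noted elsewhere in the disclosure):

```python
import math

def heat_kernel_weights(t, K):
    """theta_k = e^(-t) t^k / k!, truncated at K steps."""
    return [math.exp(-t) * t**k / math.factorial(k) for k in range(K + 1)]

t = 2.5
theta = heat_kernel_weights(t, K=60)    # K chosen so the tail is negligible
total = sum(theta)                      # ~1: signal strength preserved
radius = sum(k * th for k, th in enumerate(theta)) / total
print(total, radius)                    # radius recovers the diffusion time t
```

This is why learning the continuous scalar t is equivalent to learning a (continuous) neighborhood radius.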
- ADC adaptive diffusion convolution
- the training process of learning t can be the same as learning other weight and bias parameters in the model.
- the training process of learning t and other weight and bias parameters of the GNN is performed jointly and directly on the training set by considering t as one of the trainable parameters, i.e., by minimizing the loss function of the training set using the gradient of t along with the gradients of the other weight and bias parameters of the GNN on the training set.
- all the other trainable parameters w including at least all the weight and bias parameters of the GNN, are firstly learned on the training set.
- w * is obtained when the loss function of the training set is minimized after certain training epochs.
- t is learned on the validation set with the learned w *
- t * is obtained when the loss function of the validation set is minimized after certain training epochs.
- since the learned t * would change the optimal value of w * every time t is updated, w needs to converge to the optimal value again after each update of t, making this bi-level procedure too expensive to train.
- e denotes the number of training epochs
- ⁇ 1 and ⁇ 2 denote the learning rate on the training and validation sets, respectively.
- all the other trainable parameters w including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. For each epoch, w (e) is updated to w (e+1) by using the gradient of w on the training set. Then t is learned on the validation set with the updated w (e+1) during the same epoch, t (e) is updated to t (e+1) by using the gradient of t on the validation set. After all the training epochs, the optimal t * and w * may be obtained. This method could help avoid overfitting and thus offers better generalization.
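The per-epoch alternating scheme can be sketched with toy quadratic losses standing in for the GNN's training and validation losses; the loss functions, constants, and learning rates below are illustrative assumptions, not the disclosed losses:

```python
# Alternating bi-level sketch: each epoch, update w by the training-set
# gradient, then update t by the validation-set gradient evaluated with
# the just-updated w. Toy losses: L_train = (w - t)^2, L_val = (t - 0.5w - 1)^2.
def train_loss_grad_w(w, t):
    return 2 * (w - t)                 # dL_train/dw

def val_loss_grad_t(w, t):
    return 2 * (t - 0.5 * w - 1.0)     # dL_val/dt

w, t = 0.0, 0.0
eta1, eta2 = 0.1, 0.1                  # learning rates on training / validation sets
for epoch in range(500):
    w = w - eta1 * train_loss_grad_w(w, t)   # w^(e) -> w^(e+1) on the training set
    t = t - eta2 * val_loss_grad_t(w, t)     # t^(e) -> t^(e+1) on the validation set

print(w, t)   # converges to the joint fixed point w = t and t = 0.5 w + 1
```

Because t is fitted on held-out validation data rather than the training set, this alternation is the mechanism the disclosure credits with avoiding overfitting of the neighborhood radius.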
- GDC uses the predetermined neighborhood radius for feature propagation.
- GDC proposes to use different neighborhood radius t for different datasets by hand-tuning the values.
- the disclosed methods described above advance GDC's direction by automatically learning the radius from the given graph. This implies one t for one dataset, that is, the same t for all GNN layers and all feature channels (dimensions).
- Fig. 1 illustrates an exemplary schematic diagram of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure.
- the aforementioned strategy for updating t during the training of the model empowers the method to adaptively learn specific t for all layers and all feature channels.
- Fig. 2 illustrates another exemplary schematic diagram of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure.
- a separate feature propagation function with a unique neighborhood radius is trained.
- the contributions from close neighbors (e.g., in 1-hop) are much more significant than from distant neighbors (e.g., in 3-hops) (shown as darker color concentrated around the center).
- the method of ADC is described based on the heat kernel as an example herein.
- the method of ADC can be a generalized ADC (GADC), in other words, not limiting the weight coefficients θ_k to the heat kernel or other specific examples.
- the feature propagation of GADC can be described with Eq. (3), and further, to learn the weight coefficients for each layer and feature channel (or dimension), the feature propagation of GADC is disclosed as below: f (T)^(l, c) = Σ_{k=0}^∞ θ_k^(l, c) T^k , wherein l indexes the GNN layer and c indexes the feature channel.
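Channel-wise diffusion can be sketched as follows: each feature channel c gets its own diffusion time t_c and hence its own f(T). The toy two-node graph, the truncation K, and all helper names are illustrative assumptions:

```python
import numpy as np
from math import exp, factorial

def heat_weights(t, K):
    """theta_k = e^(-t) t^k / k!, truncated at K steps."""
    return np.array([exp(-t) * t**k / factorial(k) for k in range(K + 1)])

def diffuse_per_channel(T, X, ts, K=20):
    """Propagate column c of X with its own diffusion matrix f(T) at time ts[c]."""
    out = np.zeros_like(X)
    for c, t_c in enumerate(ts):
        theta = heat_weights(t_c, K)
        fT = sum(th * np.linalg.matrix_power(T, k) for k, th in enumerate(theta))
        out[:, c] = fT @ X[:, c]
    return out

A = np.array([[0, 1], [1, 0]], dtype=float)
T = (A + np.eye(2)) / 2.0          # normalized transition matrix for this toy graph
X = np.array([[1.0, 1.0], [0.0, 0.0]])
Z = diffuse_per_channel(T, X, ts=[0.1, 5.0])   # small vs. large radius per channel
print(Z)
```

The channel with t = 0.1 stays close to its input (local emphasis), while the channel with t = 5.0 is smoothed almost to the neighborhood average (global emphasis), showing how each feature signal can be modeled with its own radius.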
- ADC and GADC in the present disclosure are flexible components that can be directly plugged into existing GNN models, enabling them to adaptively learn the neighborhood radius for feature aggregation.
- Fig. 3 illustrates an exemplary flow chart of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure.
- the method is for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing, and may begin at block 301, with inputting data of a training set into the GNN.
- the method proceeds to block 302, with updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN.
- the trainable parameters of the GNN are updated by a first gradient on the training set to reduce the first loss function of the training set with one or more epochs.
- the trainable parameters of the GNN can be updated based on Eq. (9) or Eq. (10) .
- the method proceeds to block 303, with updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away.
- the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1.
- the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set.
- the one or more neighborhood radius related parameters can be learned on the training set as other trainable parameters besides weight and bias parameters, based on the reduction of the first loss function of the training set by using the gradient descent.
- the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated as a bi-level optimization.
- the one or more neighborhood radius related parameters can be updated based on Eq. (8) or Eq. (11) .
- the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set, refer to Eq. (8) and (9) .
- All the trainable parameters, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set.
- the updated trainable parameters are obtained when the loss function of the training set is minimized after certain training epochs.
- the one or more neighborhood radius related parameters are learned on the validation set with the updated trainable parameters, and the updated one or more neighborhood radius related parameters are obtained when the loss function of the validation set is minimized after certain training epochs.
- the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch, refer to Eq. (10) and (11) .
- All the trainable parameters, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set.
- the trainable parameters are updated by using the gradient on the training set.
- the one or more neighborhood radius related parameters are learned on the validation set with the updated trainable parameters during the same epoch.
- the one or more neighborhood radius related parameters are updated by using the gradient on the validation set.
- the updated trainable parameters and one or more neighborhood radius related parameters may be obtained.
- the influence weights of all the neighbor nodes with different steps away can be generated in different ways; this would not limit the scope of the disclosure.
- the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as θ_k = e^(-t) t^k / k! , wherein k represents a step number away from a central node, and t is a diffusion time.
- the step number away from the central node can be truncated to a constant instead of infinity for feasible computability when the heat-kernel is used.
- the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as θ_k = α (1-α)^k , wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
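The PageRank-style weights form a geometric series, so their sum approaches 1 and the implied radius is (1 - α) / α; a short sketch (the value of α and the truncation K are illustrative assumptions):

```python
# PageRank-style influence weights: theta_k = alpha * (1 - alpha)^k.
alpha = 0.15                              # illustrative restart/stay probability
K = 200                                   # truncation of the infinite series
theta = [alpha * (1 - alpha) ** k for k in range(K + 1)]

total = sum(theta)                        # geometric series -> approaches 1
radius = sum(k * th for k, th in enumerate(theta)) / total
print(total, radius)                      # mean step count is (1 - alpha) / alpha
```

A smaller α spreads weight over more hops (larger radius), while a larger α concentrates influence near the central node.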
- the method proceeds to block 304, with calculating the neighborhood radius based on the updated one or more neighborhood radius related parameters.
- the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly, refer to Eq. (3) or (7) . That is, all the GNN layers and feature dimensions should use the same neighborhood radius for feature propagation.
- the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively. That is, the GNN layers and feature dimensions can use respective learned neighborhood radius for feature propagation, refer to Eq. (12) or (13) .
- since propagating on the input dimensions can generate better results than propagating on the output dimensions, the feature propagation of the message passing is performed before the feature transformation of the message passing.
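A small sketch of the ordering question: with a single shared f(T), propagating before or after the linear transformation gives the same result by associativity, so the ordering only becomes meaningful once each input dimension carries its own diffusion matrix. The shapes and the random stand-in matrices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 8, 3
fT = rng.random((n, n))
fT /= fT.sum(axis=1, keepdims=True)   # stand-in row-stochastic diffusion matrix
X = rng.random((n, d_in))             # node features
W = rng.random((d_in, d_out))         # feature transformation

H_pre = (fT @ X) @ W        # propagate on input dimensions, then transform
H_post = fT @ (X @ W)       # transform, then propagate on output dimensions

# With one shared f(T) the two orders coincide; per-dimension diffusion
# (d_in matrices before W vs. d_out matrices after W) breaks this tie.
print(np.allclose(H_pre, H_post))   # True
```

Propagating first lets the model learn d_in per-dimension radii on the raw input signals, which is the ordering the disclosure reports as producing better results.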
- ADC is able to enhance any graph-based model, particularly GNNs.
- neighborhood radius can be learned automatically for datasets. Specifically, learning unique neighborhood radius for each feature channel in each GNN layer can further improve the performance for downstream graph mining tasks.
- Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
- the computing system may comprise at least one processor 410.
- the computing system may further comprise at least one storage device 420. It should be appreciated that the storage device 420 may store computer-executable instructions that, when executed, cause the processor 410 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
- the embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium.
- the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform a method for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing to perform node classification.
- the method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
- the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
- the embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
- modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
Abstract
A method for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing to perform node classification is disclosed. The method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters. Numerous other aspects are provided.
Description
Aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method and an apparatus for adaptive diffusion in graph neural networks.
Graph neural networks (GNNs) are a type of neural networks that can be directly coupled with graph-structured data. Specifically, graph convolution networks (GCNs) generalize the convolution operation to local graph structures, offering attractive performance for various graph mining tasks. The graph convolution operation is designed to aggregate information from immediate neighboring nodes into the central node, which is also referred to as message passing. To propagate information between nodes that are further away, multiple neural layers can be stacked to go beyond the immediate hop of neighbors. To directly collect high-order information, spectral based GNNs leverage graph spectral properties to collect signals from global neighbors.
Though generating promising results, both strategies are limited to a pre-determined and fixed neighborhood for passing and receiving messages. Essentially, these methods have an implicit assumption that all graph datasets share the same size of receptive field during the message passing process. To break this, graph diffusion convolution (GDC) was recently proposed to extend the discrete message passing process in GCN to a diffusion process, enabling it to aggregate information from a larger neighborhood. However, for each input graph, GDC hand-tunes the best neighborhood size for feature aggregation by grid-searching the parameters on the validation set, making its practical application limited and sensitive.
SUMMARY
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The success of graph neural networks (GNNs) largely relies on the process of aggregating information from neighbors defined by the input graph structures. Notably, message passing based GNNs, e.g., graph convolutional networks, leverage the immediate neighbors of each node during the aggregation process, and recently, graph diffusion convolution (GDC) is proposed to expand the propagation neighborhood by leveraging generalized graph diffusion. However, the neighborhood size in GDC is manually tuned for each graph by conducting grid search over the validation set, making its generalization practically limited.
To eliminate the manual search for the optimal propagation neighborhood in GDC, the present disclosure discloses the adaptive diffusion convolution (ADC) strategy, which supports learning the optimal neighborhood from the data automatically. ADC achieves this by formalizing the task as a bi-level optimization problem, enabling the customized learning of one optimal propagation neighborhood size for each dataset. In other words, all GNN layers and feature channels (dimensions) share the same neighborhood size during message passing on each graph.

It is further disclosed in the present disclosure that ADC can automatically learn a customized neighborhood size for each GNN layer and each feature dimension from data. By learning a unique propagation neighborhood for each layer, ADC can empower GNNs to capture neighbors' information from diverse graph structures, fully dependent on the data and the downstream learning objective. Similarly, by learning a distinct neighborhood size for each feature channel, GNNs become capable of selectively modeling each neighbor's multiple feature signals. Altogether, ADC makes GNNs fully coupled with the graph structures and all feature channels.

The ADC disclosed in the present disclosure is a general plugin that can be directly applied to existing GNN models. By plugging it into GNNs, the upgraded GNNs can offer significant performance advances over their vanilla versions across a wide range of datasets. Furthermore, by learning the propagation neighborhood size automatically, ADC can consistently outperform GDC, which customizes this size for each dataset by grid search. Finally, it is demonstrated that GNNs' model capacity can benefit from the better coupling between their architecture, graph structures, and feature channels, that is, by learning a dedicated neighborhood size for each GNN layer and feature dimension.
According to an aspect, a method for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing to perform node classification is disclosed. The method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
According to a further aspect, wherein the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set.
According to a further aspect, updating the trainable parameters of the GNN based at least partially on the reduction of the first loss function of the training set further comprises updating the trainable parameters of the GNN by a first gradient on the training set to reduce the first loss function.
According to a further aspect, the method further comprises inputting data of validation set into the GNN; and wherein updating the one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set further comprises updating the one or more neighborhood radius related parameters by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN.
According to a further aspect, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set.
According to a further aspect, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch.
According to a further aspect, wherein the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly.
According to a further aspect, wherein the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively.
According to a further aspect, wherein the feature propagation of the message passing is performed before feature transformation of the message passing with the updated neighborhood radius for each layer and each feature dimension of the GNN respectively.
According to a further aspect, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as θ_k = e^{-t} t^k / k!, wherein k represents a step number away from a central node, and t is a diffusion time.
According to a further aspect, wherein the step number away from the central node is truncated to a constant instead of infinity.
According to a further aspect, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as θ_k = α(1-α)^k, wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
The models to which the plugin in the present disclosure is applied can focus on the problem of semi-supervised node classification, the input of which may include an undirected network containing multiple nodes and edges therebetween. Given the input feature and a set of labelled nodes, the task is to predict the labels of the remaining nodes. As examples and not limitations, node classification may be image classification, speech recognition, anomaly detection, etc.
The disclosed aspects will be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
Fig. 1 illustrates an exemplary schematic diagram of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure.
Fig. 2 illustrates another exemplary schematic diagram of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure.
Fig. 3 illustrates an exemplary flow chart of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure.
Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
The success of GNNs largely relies on the process of aggregating information from neighbors defined by the input graph structures, which is generally named message passing. Notably, message passing based GNNs, e.g., GCNs leverage the immediate neighbors of each node during the aggregation process, and recently, GDC is proposed to expand the propagation neighborhood by leveraging generalized graph diffusion. However, the neighborhood size in GDC is manually tuned for each graph by conducting grid search over the validation set, making its generalization practically limited.
To address this issue, the adaptive diffusion convolution (ADC) strategy is proposed to automatically learn the optimal neighborhood size from the data. Furthermore, the conventional assumption that all GNN layers and feature channels (or dimensions) should use the same neighborhood size for propagation can be broken in the present disclosure. ADC is designed to learn a dedicated propagation neighborhood for each GNN layer and each feature channel, making the GNN architecture fully coupled with graph structures, which is the unique property that distinguishes GNNs from traditional neural networks. By directly plugging the ADC of the present disclosure into existing GNNs, consistent and significant outperformance over both GDC and their vanilla versions across various datasets may be obtained, realizing improved model capacity brought by automatically learning a unique neighborhood size per layer and per channel in GNNs.
In the context of GNNs, the present disclosure focuses on the problem of semi-supervised node classification. The input may include an undirected network G = (V, E), where the node set V contains n nodes {v_1, …, v_n}, E is the edge set, and A ∈ R^{n×n} is the symmetric adjacency matrix of graph G. Given the input feature matrix X and a subset of node labels Y, the task is to predict the labels of the remaining nodes.
In an aspect, the task of node classification may be image classification, in which each node represents an image, an edge may exist if two images are determined to be in a same class, and features may be derived from the pixels as a probability distribution. In another aspect, the task of node classification may be speech recognition, in which each node represents a waveform of a sound record, an edge may exist if two sound records are determined to be in a same class, and features may map the waveform of the sound to a probability distribution over the discrete states of a Hidden Markov Model. In yet another aspect, the task of node classification may be anomaly detection, including but not limited to fraud recognition, dataset preprocessing, detection of online review spams, fake users and rumors in social media, fake news, etc. The tasks referred to herein are merely examples, and the present disclosure can be applied to any scenario in which inputs can be learned as node representations and connections between the nodes can be learned as edges.
Back to the GNNs, the convolution operation on graphs can be described as the process of neighborhood feature aggregation or message passing. The message passing graph convolutional networks can be simply defined as below:

H^{(l+1)} = γ( f_w( H^{(l)} ) )    (1)

where H^{(l)} denotes the hidden feature of layer l with H^{(0)} = X and X as the input feature matrix, f_w(·) denotes feature transformation, and γ(·) denotes feature propagation. Taking GCN as an example, the feature transformation and feature propagation functions correspond to f_w(H^{(l)}) = σ(H^{(l)} W^{(l)}) and γ(Ĥ^{(l)}) = D^{-1/2} A D^{-1/2} Ĥ^{(l)}, respectively, in which D is the diagonal degree matrix with D_ii = Σ_j A_ij, and Ĥ^{(l)} denotes the hidden feature after transformation. Note that GCN uses the adjacency matrix A with self-loop, so it actually uses Ã = A + I and the corresponding degree matrix D̃. To simplify the notations, T is used herein to denote D̃^{-1/2} Ã D̃^{-1/2}.

Straightforwardly, the feature transformation function f_w(·) describes how features transform inside each node, and the feature propagation function γ(·) describes how features propagate between nodes. Essentially, how well a GNN model can utilize graph structures heavily depends on the design of the feature propagation function.
Most graph-based models can be represented as H^{(l+1)} = σ( f(T) H^{(l)} W^{(l)} ), where f(T) is a matrix that can be generated by T. So f(T) can be represented as f(T) = Σ_{k=0}^{∞} θ_k T^k. To quantify how far each node could aggregate features from, the neighborhood radius r of a node is defined as below:

r = ( Σ_{k=0}^{∞} θ_k · k ) / ( Σ_{k=0}^{∞} θ_k )    (2)

Here, θ_k denotes the influence from k-step-away nodes. A large r means that the model puts more emphasis on long-distance nodes, i.e., global information. A small r means that the model amplifies local information.
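As an illustrative check of the radius definition in Eq. (2), consider the short sketch below; the personalized-PageRank coefficients θ_k = α(1-α)^k used here admit the closed form r = (1-α)/α, and the truncation length of 500 terms is an assumption chosen only to make the infinite sums computable:

```python
def neighborhood_radius(theta):
    """Neighborhood radius r = sum_k(k * theta_k) / sum_k(theta_k), per Eq. (2)."""
    return sum(k * w for k, w in enumerate(theta)) / sum(theta)

# Personalized-PageRank coefficients theta_k = alpha * (1 - alpha)^k give
# r = (1 - alpha) / alpha in closed form; a long truncation approximates it.
alpha = 0.2
theta_ppr = [alpha * (1 - alpha) ** k for k in range(500)]
print(round(neighborhood_radius(theta_ppr), 4))  # ≈ (1 - 0.2) / 0.2 = 4.0
```

For heat kernel coefficients θ_k = e^{-t} t^k / k!, the same function recovers r = t, consistent with the derivation given later for the heat kernel based GDC.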
For GCN, the neighborhood radius is fixed to r=1, which is just the range of nodes directly connected to it. To collect information beyond direct connections, it is required to stack multiple GCN layers to reach high-order neighborhoods.
There are attempts to improve GCN's feature propagation function from the first-hop neighborhood to multi-hop neighborhoods, such as MixHop, JKNet, and SGC. For example, SGC uses the feature propagation function γ(Ĥ^{(l)}) = T^K Ĥ^{(l)}, where K is a predefined hop number. In other words, the neighborhood radius r = K for SGC, which is the range of neighborhoods from which each GNN layer collects information. However, for all multi-hop models, the discrete nature of hop numbers makes r non-differentiable, which is unfavourable for subsequent optimization.
A line of work has been focused on generalizing feature propagation from discrete hops to continuous graph diffusion. Notably, graph diffusion convolution (GDC) addresses this with the propagation setup below:

γ(Ĥ^{(l)}) = Σ_{k=0}^{∞} θ_k T^k Ĥ^{(l)}    (3)

where k is summed from 0 to infinity, making each node aggregate information from the whole graph. In Eq. (3), the weight coefficients should satisfy Σ_{k=0}^{∞} θ_k = 1 such that the signal strength is neither amplified nor reduced through the propagation. In an aspect, the set of weight coefficients can be generated from personalized PageRank as θ_k = α(1-α)^k, wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page. In another aspect, the set of weight coefficients can be generated from the heat kernel as θ_k = e^{-t} t^k / k!, wherein k represents a step number away from a central node, and t is a diffusion time.

The representations of the set of weight coefficients referred to herein are merely examples. The heat kernel is taken as an example hereinafter and does not limit the scope of the present disclosure.
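The two coefficient families above can be generated in a few lines. This is a minimal sketch; the truncation lengths are illustrative assumptions chosen so that the truncated weights still sum to approximately 1:

```python
import math

def heat_kernel_weights(t, K):
    """theta_k = e^{-t} t^k / k!, truncated at K steps."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(K + 1)]

def ppr_weights(alpha, K):
    """theta_k = alpha * (1 - alpha)^k, truncated at K steps."""
    return [alpha * (1.0 - alpha) ** k for k in range(K + 1)]

# Both families satisfy sum_k theta_k = 1 in the limit K -> infinity,
# so the propagated signal is neither amplified nor attenuated.
hk = heat_kernel_weights(t=1.0, K=30)
pr = ppr_weights(alpha=0.15, K=200)
print(abs(sum(hk) - 1.0) < 1e-9, abs(sum(pr) - 1.0) < 1e-9)  # True True
```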
The heat kernel incorporates prior knowledge into the GNN model: the feature propagation between nodes follows Newton's law of cooling, i.e., the feature propagation speed between two nodes is proportional to the difference between their features. Formally, this prior knowledge can be described as below:

dx_i(t)/dt = Σ_{j∈N(i)} T_ij ( x_j(t) - x_i(t) )    (4)

where N(i) denotes the neighborhood of node i, and x_i(t) represents the feature of node i after diffusion time t. This differential equation can be solved as below:

X(t) = H_t X(0)    (5)

where X(t) denotes the feature matrix after diffusion time t, and H_t = e^{-(I-T)t} is the heat kernel.
The heat kernel version of the GDC has r as below:

r = ( Σ_{k=0}^{∞} θ_k · k ) / ( Σ_{k=0}^{∞} θ_k ) = Σ_{k=0}^{∞} k · e^{-t} t^k / k! = t    (6)

This suggests that t is the neighborhood radius r for the heat kernel based GDC, that is, t becomes a perfect continuous substitute for the hop number in multi-hop models.
Recall that the heat kernel version of graph diffusion convolution (GDC) has the following feature propagation function:

γ(Ĥ^{(l)}) = e^{-Lt} Ĥ^{(l)}    (7)

where the Laplacian matrix L = I - T. For each graph dataset, GDC requires a manual grid search step to determine the neighborhood radius related parameter t. Moreover, t is fixed for all feature channels and propagation layers in each dataset.
In the present disclosure, a method called adaptive diffusion convolution (ADC) is disclosed for adaptively learning the neighborhood radius from data for each graph, and it is further disclosed how to generalize it to different feature channels and GNN layers.

GDC makes it possible to replace GNNs' discrete feature propagation function with the continuous heat kernel. Building on this, instead of hand-tuning t, an optimal neighborhood radius r can be obtained by computing the gradient of t and updating t until convergence.

In an aspect, the training process of learning t can be the same as learning the other weight and bias parameters in the model. Specifically, t is considered one of the trainable parameters and is learned jointly and directly on the training set with the other weight and bias parameters of the GNN, by minimizing the loss function of the training set using the gradient of t along with the gradients of the other parameters.
However, learning t directly on the training set may cause overfitting in certain cases. To address the issue, it is further disclosed to train t on the validation set instead of the training set, by using the gradient of t on the validation set. The goal for the model is to find the t* that minimizes the loss function of the validation set, L_val(w*, t), wherein w denotes all the other trainable parameters in the feature transformation function and w* denotes the set of parameters that minimizes the loss function of the training set, L_train(w, t). This strategy can be formalized as a bi-level optimization problem as below:

t* = argmin_t L_val(w*, t)    (8)

s.t.  w* = argmin_w L_train(w, t)    (9)
With Eq. (8) and (9), all the other trainable parameters w, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. w* is obtained when the loss function of the training set L_train(w, t) is minimized after certain training epochs. Then t is learned on the validation set with the learned w*, and t* is obtained when the loss function of the validation set L_val(w*, t) is minimized after certain training epochs. However, as each update of t changes the value of w*, w would need to re-converge to its optimal value every time t is updated, making this strategy too expensive to train.
For the purpose of decreasing the training cost, an approximation method is further disclosed that updates t every time w is updated, as below:

w^{(e+1)} = w^{(e)} - α_1 · ∂L_train(w^{(e)}, t^{(e)}) / ∂w    (10)

t^{(e+1)} = t^{(e)} - α_2 · ∂L_val(w^{(e+1)}, t^{(e)}) / ∂t    (11)

where e denotes the index of the training epoch, and α_1 and α_2 denote the learning rates on the training and validation sets, respectively.
With Eq. (10) and (11), all the other trainable parameters w, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. For each epoch, w^{(e)} is updated to w^{(e+1)} by using the gradient of w on the training set. Then t is learned on the validation set with the updated w^{(e+1)} during the same epoch, and t^{(e)} is updated to t^{(e+1)} by using the gradient of t on the validation set. After all the training epochs, the optimal t* and w* may be obtained. This method could help avoid overfitting and thus offers better generalization.
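The alternating updates of Eq. (10) and (11) can be sketched on a toy problem as below; the quadratic stand-ins for L_train and L_val, the learning rates, and the epoch count are illustrative assumptions, not the disclosed GNN losses:

```python
# Toy alternation per Eq. (10)-(11): w follows the training gradient,
# then t follows the validation gradient computed with the new w.
def grad_w_train(w, t):   # d/dw of an assumed L_train(w, t) = (w - t)^2
    return 2.0 * (w - t)

def grad_t_val(w, t):     # d/dt of an assumed L_val(w, t) = (t - 0.5*w - 1)^2
    return 2.0 * (t - 0.5 * w - 1.0)

w, t = 0.0, 0.0
a1, a2 = 0.1, 0.1         # learning rates on training / validation sets
for epoch in range(2000):
    w = w - a1 * grad_w_train(w, t)   # Eq. (10): update w on the training set
    t = t - a2 * grad_t_val(w, t)     # Eq. (11): update t on the validation set

# Fixed point of the alternation: w = t and t = 0.5*w + 1, i.e., w = t = 2.
print(round(w, 3), round(t, 3))  # 2.0 2.0
```

At the fixed point, w tracks the training optimum for the current t while t settles at the validation optimum for that w, mirroring the per-epoch alternation described above.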
Conventional GNNs use a predetermined neighborhood radius for feature propagation. As described above, GDC proposes to use a different neighborhood radius t for different datasets by hand-tuning the values. The methods disclosed above extend GDC's direction by automatically learning the radius from the given graph. This still implies one t for one dataset, that is, the same t for all GNN layers and all feature channels (dimensions).
Fig. 1 illustrates an exemplary schematic diagram of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure. As can be seen in Fig. 1, a same learned t (shown as 2 for example) is applied to all layers of the GNN and all feature channels (dimensions). When t is large, the contributions from close and distant neighbors would have little difference, and when t is small, the contributions from close neighbors would be much more significant than those from distant neighbors, shown as the greyscale of the circles.
It is expected that for each layer and feature dimension, a unique r may be learned and used, making them adaptive to the final learning objective. The obstacle that prevents prior art from achieving this lies in the infeasible challenge of hand-tuning or grid-searching the propagation function separately for each feature channel and GNN layer, given that the time complexity increases exponentially as the number of parameters increases.

The aforementioned strategy for updating t during the training of the model empowers the method to adaptively learn a specific t for each layer and each feature channel.
It is disclosed that to learn a unique r for each layer and each feature channel, the heat kernel method described above can be evolved by extending the feature propagation function in Eq. (7) for each layer and feature channel, that is, from t to t^{(l)}_i, as below:

γ^{(l)}_i( Ĥ^{(l)}_{:,i} ) = e^{-L · t^{(l)}_i} Ĥ^{(l)}_{:,i}    (12)

where t^{(l)}_i denotes the neighborhood radius t for the l-th layer and i-th channel, Ĥ^{(l)}_{:,i} represents the i-th column of the hidden feature Ĥ^{(l)}, i.e., the feature on channel or dimension i, and γ^{(l)}_i(·) denotes the feature propagation function on the l-th layer and i-th channel. This feature propagation function enables the GNN to train a separate t for each feature channel and layer.
Fig. 2 illustrates another exemplary schematic diagram of Adaptive Diffusion Convolution (ADC) , in accordance with various aspects of the present disclosure.
As can be seen in Fig. 2, for the hidden feature Ĥ^{(l)}_{:,i} of feature channel i in layer l, a separate feature propagation function γ^{(l)}_i(·) with a unique neighborhood radius t^{(l)}_i is trained. When t is large (e.g., t=3), the contributions from close (e.g., 1-hop) and distant neighbors (e.g., 3-hop) have little difference (shown as the relatively similar color shading across different hops). When t is small (e.g., t=1), the contributions from close neighbors are much more significant than those from distant neighbors (shown as darker color concentrated around the center).
The method of ADC is disclosed herein based on the heat kernel as an example. Without loss of generality, the method of ADC can be a generalized ADC (GADC), in other words, not limiting the weight coefficients θ_k to the heat kernel or other specific examples. The feature propagation of GADC can be described with Eq. (3), and further, to learn θ^{(l)}_{k,i} for each layer and feature channel or dimension, the feature propagation of GADC is disclosed as below:

γ^{(l)}_i( Ĥ^{(l)}_{:,i} ) = Σ_{k=0}^{∞} θ^{(l)}_{k,i} T^k Ĥ^{(l)}_{:,i}    (13)

where θ^{(l)}_{k,i} denotes the weight coefficient for k-hop neighbors on the l-th layer and i-th channel/dimension. Σ_{k=0}^{∞} θ^{(l)}_{k,i} = 1 is enforced during training, that is, the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1.
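One common way to keep the coefficients of Eq. (13) normalized throughout training is to parameterize them as a softmax over unconstrained logits, so that the simplex constraint holds by construction; this parameterization is an assumption of the sketch below, not mandated by the disclosure:

```python
import math

def simplex_coefficients(logits):
    """Map free logits to theta with sum_k theta_k = 1 (numerically stable
    softmax), so the constraint of Eq. (13) holds at every gradient step."""
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# One independent logit vector per (layer l, channel i), truncated at K hops.
logits = [0.5, 0.1, -0.3, -1.0]          # free parameters updated by gradient
theta = simplex_coefficients(logits)
print(abs(sum(theta) - 1.0) < 1e-9, all(x > 0 for x in theta))  # True True
```

The logits themselves can then be trained by the same gradient scheme as the other neighborhood radius related parameters.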
As it operates differently on each channel, whether to propagate before or after the feature transformation function actually matters. Empirically, it is found that propagating on the input channels generates better results than propagating on the output channels. Therefore, the feature propagation and transformation steps in the original message passing networks from Eq. (1) are swapped as below:

H^{(l+1)} = f_w( γ( H^{(l)} ) )    (14)
Additionally, calculating e^{-Lt} directly is infeasible for large graphs. Practically, a top-K truncation is needed to approximate the heat kernel, making ADC (in Eq. (12)) and GADC (in Eq. (13)) respectively updated as below:

γ^{(l)}_i( Ĥ^{(l)}_{:,i} ) = Σ_{k=0}^{K} ( e^{-t^{(l)}_i} (t^{(l)}_i)^k / k! ) T^k Ĥ^{(l)}_{:,i}    (15)

γ^{(l)}_i( Ĥ^{(l)}_{:,i} ) = Σ_{k=0}^{K} θ^{(l)}_{k,i} T^k Ĥ^{(l)}_{:,i}    (16)
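The truncated sums of Eq. (15) and (16) reduce to K repeated matrix-vector products, avoiding any materialization of e^{-Lt}. The sketch below uses heat kernel coefficients; the toy two-node transition matrix is an illustrative assumption:

```python
import math

def matvec(T, x):
    """Dense mat-vec; in practice T is sparse and this is a sparse product."""
    return [sum(Tij * xj for Tij, xj in zip(row, x)) for row in T]

def truncated_diffusion(T, x, t, K):
    """sum_{k=0}^{K} theta_k * T^k x with heat kernel theta_k = e^{-t} t^k/k!,
    accumulated one hop at a time per Eq. (15)."""
    theta = [math.exp(-t) * t ** k / math.factorial(k) for k in range(K + 1)]
    out = [theta[0] * xi for xi in x]
    v = x
    for k in range(1, K + 1):
        v = matvec(T, v)                 # v = T^k x, one extra hop per step
        out = [o + theta[k] * vi for o, vi in zip(out, v)]
    return out

# Toy 2-node transition matrix with rows summing to 1, so a constant
# signal is preserved up to the (tiny) truncation error.
T = [[0.5, 0.5], [0.5, 0.5]]
x = [1.0, 1.0]
y = truncated_diffusion(T, x, t=1.0, K=20)
print([round(v, 6) for v in y])  # [1.0, 1.0]
```

Because the rows of the toy T sum to 1, a constant signal passes through unchanged, confirming that the truncated weights preserve signal strength.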
ADC and GADC in the present disclosure are flexible components that can be directly plugged into existing GNN models, enabling them to adaptively learn the neighborhood radius for feature aggregation.
Fig. 3 illustrates an exemplary flow chart of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.
Generally, the approach of ADC is discussed in the context of the task of classification, including but not limited to: image classification, in which each node represents an image, an edge may exist if two images are determined to be in a same class, and features may be derived from the pixels as a probability distribution; speech recognition, in which each node represents a waveform of a sound record, an edge may exist if two sound records are determined to be in a same class, and features may map the waveform of the sound to a probability distribution over the discrete states of a Hidden Markov Model; and anomaly detection, such as fraud recognition, dataset preprocessing, detection of online review spams, fake users and rumors in social media, fake news, etc. The tasks referred to herein are merely examples, and the present disclosure can be applied to any scenario in which inputs can be learned as node representations and connections between the nodes can be learned as edges.
The method is for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing, and may begin at block 301, with inputting data of a training set into the GNN.
The method proceeds to block 302, with updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN. In an aspect, the trainable parameters of the GNN are updated by a first gradient on the training set to reduce the first loss function of the training set with one or more epochs. As an example, the trainable parameters of the GNN can be updated based on Eq. (9) or Eq. (10) .
The method proceeds to block 303, with updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away. The sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1.
In an aspect, the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set. As an example, the one or more neighborhood radius related parameters can be learned on the training set as other trainable parameters besides weight and bias parameters, based on the reduction of the first loss function of the training set by using the gradient descent.
In another aspect, the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated as a bi-level optimization. As an example, data of a validation set is input into the GNN, and the one or more neighborhood radius related parameters are updated by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN. For example, the one or more neighborhood radius related parameters can be updated based on Eq. (8) or Eq. (11).
In an aspect, the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set, refer to Eq. (8) and (9) . All the trainable parameters, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. The updated trainable parameters are obtained when the loss function of the training set is minimized after certain training epochs. Then the one or more neighborhood radius related parameters are learned on the validation set with the updated trainable parameters, and the updated one or more neighborhood radius related parameters are obtained when the loss function of the validation set is minimized after certain training epochs.
In another aspect, the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch, refer to Eq. (10) and (11) . All the trainable parameters, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. For each epoch, the trainable parameters are updated by using the gradient on the training set. Then the one or more neighborhood radius related parameters are learned on the validation set with the updated trainable parameters during the same epoch. The one or more neighborhood radius related parameters are updated by using the gradient on the validation set. After all the training epochs, the updated trainable parameters and one or more neighborhood radius related parameters may be obtained.
The influence weights of all the neighbor nodes with different steps away can be generated in different ways; this does not limit the scope of the disclosure. As an example, the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as θ_k = e^{-t} t^k / k!, wherein k represents a step number away from a central node, and t is a diffusion time. Besides, the step number away from the central node can be truncated to a constant instead of infinity for feasible computability when the heat kernel is used. As another example, the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as θ_k = α(1-α)^k, wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
The method proceeds to block 304, with calculating the neighborhood radius based on the updated one or more neighborhood radius related parameters. In an aspect, the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly, refer to Eq. (3) or (7) . That is, all the GNN layers and feature dimensions should use the same neighborhood radius for feature propagation. In another aspect, the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively. That is, the GNN layers and feature dimensions can use respective learned neighborhood radius for feature propagation, refer to Eq. (12) or (13) .
In an aspect, as the propagation operates differently on each dimension, propagating on the input dimensions can generate better results than propagating on the output dimensions; accordingly, the feature propagation of the message passing is performed before the feature transformation of the message passing.
As discussed above with Fig. 1-3, ADC is able to enhance any graph-based model, particularly GNNs. By directly plugging ADC into existing GNNs, neighborhood radius can be learned automatically for datasets. Specifically, learning unique neighborhood radius for each feature channel in each GNN layer can further improve the performance for downstream graph mining tasks.
Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure. The computing system may comprise at least one processor 410. The computing system may further comprise at least one storage device 420. It should be appreciated that the storage device 420 may store computer-executable instructions that, when executed, cause the processor 410 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
The embodiments of the present disclosure may be embodied in a computer-readable medium such as a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform a method for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing to perform node classification. The method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.
Claims (15)
- A method for training a graph neural network (GNN) to learn a neighborhood radius for feature propagation of message passing to perform node classification, comprising: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes that are different numbers of steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
- The method of claim 1, wherein the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set.
- The method of claim 1, wherein updating the trainable parameters of the GNN based at least partially on the reduction of the first loss function of the training set further comprises: updating the trainable parameters of the GNN by a first gradient on the training set to reduce the first loss function.
- The method of claim 3, further comprising: inputting data of a validation set into the GNN; and wherein updating the one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set further comprises: updating the one or more neighborhood radius related parameters by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN.
- The method of claim 4, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set.
- The method of claim 4, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch.
- The method of claim 4, wherein the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly.
- The method of claim 4, wherein the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively.
- The method of claim 8, wherein the feature propagation of the message passing is performed before feature transformation of the message passing with the updated neighborhood radius for each layer and each feature dimension of the GNN respectively.
- The method of claim 10, wherein the step number away from the central node is truncated to a constant instead of infinity.
- The method of claim 1, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as α(1-α)^k, wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
- A computer system, comprising:one or more processors; andone or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method of one of claims 1-12.
- One or more computer-readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.
- A computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.
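As a further non-limiting illustration of the PageRank-based influence weights recited in the claims above, the weights θ_k = α(1-α)^k may be truncated at a constant maximum step K and renormalized so they sum to 1. The sketch below is this editor's illustrative reading, not the claimed implementation; the function name and the weighted-mean radius definition are assumptions.

```python
import numpy as np

def pagerank_influence_weights(alpha, K):
    """Influence weights theta_k = alpha * (1 - alpha)**k for k = 0..K,
    truncated at step K (a constant, rather than infinity) and
    renormalized so the weights sum to 1."""
    k = np.arange(K + 1)
    theta = alpha * (1.0 - alpha) ** k
    return theta / theta.sum()

# alpha = 0.5, K = 2: raw weights [0.5, 0.25, 0.125] renormalize to [4/7, 2/7, 1/7]
theta = pagerank_influence_weights(alpha=0.5, K=2)
radius = float(np.arange(3) @ theta)     # influence-weighted mean hop = 4/7
```

A larger α concentrates influence on nearby hops and shrinks the radius; a smaller α spreads influence to more distant neighbors and enlarges it.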
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/124130 WO2023060563A1 (en) | 2021-10-15 | 2021-10-15 | Adaptive diffusion in graph neural networks |
CN202180103443.1A CN118140231A (en) | 2021-10-15 | 2021-10-15 | Adaptive diffusion in a graph neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/124130 WO2023060563A1 (en) | 2021-10-15 | 2021-10-15 | Adaptive diffusion in graph neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023060563A1 true WO2023060563A1 (en) | 2023-04-20 |
Family
ID=85987960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/124130 WO2023060563A1 (en) | 2021-10-15 | 2021-10-15 | Adaptive diffusion in graph neural networks |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118140231A (en) |
WO (1) | WO2023060563A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200151288A1 (en) * | 2018-11-09 | 2020-05-14 | Nvidia Corp. | Deep Learning Testability Analysis with Graph Convolutional Networks |
US20200285944A1 (en) * | 2019-03-08 | 2020-09-10 | Adobe Inc. | Graph convolutional networks with motif-based attention |
- 2021-10-15 WO PCT/CN2021/124130 patent/WO2023060563A1/en unknown
- 2021-10-15 CN CN202180103443.1A patent/CN118140231A/en active Pending
Non-Patent Citations (3)
Title |
---|
SPINELLI, INDRO; SCARDAPANE, SIMONE; UNCINI, AURELIO: "Adaptive Propagation Graph Convolutional Network", arXiv.org, Cornell University Library, 28 September 2020 (2020-09-28), XP081771656, DOI: 10.1109/TNNLS.2020.3025110 * |
YANG, ZEYU: "Research on GCN with Neighborhood Selection Strategy", Master's Thesis, Nanjing University of Posts and Telecommunications, CN, 16 December 2020, pages 1-50, XP009545019, DOI: 10.27251/d.cnki.gnjdc.2020.000754 * |
ZHAO, JIALIN; DING, MING; KHARLAMOV, EVGENY: "Adaptive Diffusion in Graph Neural Networks", 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 20 April 2023 (2023-04-20), pages 1-3, XP093058146 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116541794A (en) * | 2023-07-06 | 2023-08-04 | 中国科学技术大学 | Sensor data anomaly detection method based on self-adaptive graph annotation network |
CN116541794B (en) * | 2023-07-06 | 2023-10-20 | 中国科学技术大学 | Sensor data anomaly detection method based on self-adaptive graph annotation network |
CN117633635A (en) * | 2024-01-23 | 2024-03-01 | 南京信息工程大学 | Dynamic rumor detection method based on space-time propagation diagram |
CN117633635B (en) * | 2024-01-23 | 2024-04-16 | 南京信息工程大学 | Dynamic rumor detection method based on space-time propagation diagram |
Also Published As
Publication number | Publication date |
---|---|
CN118140231A (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023060563A1 (en) | Adaptive diffusion in graph neural networks | |
US20230169140A1 (en) | Graph convolutional networks with motif-based attention | |
Lee et al. | Complex-valued neural networks: A comprehensive survey | |
Stutz et al. | Learning optimal conformal classifiers | |
US20240112090A1 (en) | Concurrent optimization of machine learning model performance | |
US11042802B2 (en) | System and method for hierarchically building predictive analytic models on a dataset | |
WO2022166115A1 (en) | Recommendation system with adaptive thresholds for neighborhood selection | |
US11475543B2 (en) | Image enhancement using normalizing flows | |
US20230306723A1 (en) | Systems, methods, and apparatuses for implementing self-supervised domain-adaptive pre-training via a transformer for use with medical image classification | |
JP6851634B2 (en) | Feature conversion module, pattern identification device, pattern identification method, and program | |
CN114610897A (en) | Medical knowledge map relation prediction method based on graph attention machine mechanism | |
Luo et al. | Multinomial Bayesian extreme learning machine for sparse and accurate classification model | |
Wang et al. | Fully hyperbolic graph convolution network for recommendation | |
WO2023000165A1 (en) | Method and apparatus for classifying nodes of a graph | |
CN114756694A (en) | Knowledge graph-based recommendation system, recommendation method and related equipment | |
Wu et al. | JPEG steganalysis based on denoising network and attention module | |
US20220253688A1 (en) | Recommendation system with adaptive weighted baysian personalized ranking loss | |
CN113326884A (en) | Efficient learning method and device for large-scale abnormal graph node representation | |
Liu et al. | GDST: Global Distillation Self-Training for Semi-Supervised Federated Learning | |
Wang et al. | Variance of the gradient also matters: Privacy leakage from gradients | |
Yang et al. | Confidence-based and sample-reweighted test-time adaptation | |
Dyer et al. | Gradient-assisted calibration for financial agent-based models | |
Zhong et al. | Lightweight Federated Graph Learning for Accelerating Classification Inference in UAV-assisted MEC Systems | |
Mavromatis et al. | SemPool: Simple, robust, and interpretable KG pooling for enhancing language models | |
Sierra et al. | Global and local neural network ensembles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21960294 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |