WO2023060563A1 - Adaptive diffusion in graph neural networks - Google Patents

Adaptive diffusion in graph neural networks

Info

Publication number
WO2023060563A1
Authority
WO
WIPO (PCT)
Prior art keywords
gnn
neighborhood radius
parameters
feature
loss function
Prior art date
Application number
PCT/CN2021/124130
Other languages
French (fr)
Inventor
Jialin Zhao
Ming Ding
Jie Tang
Evgeny Kharlamov
Original Assignee
Robert Bosch Gmbh
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Robert Bosch Gmbh, Tsinghua University filed Critical Robert Bosch Gmbh
Priority to PCT/CN2021/124130 priority Critical patent/WO2023060563A1/en
Priority to CN202180103443.1A priority patent/CN118140231A/en
Publication of WO2023060563A1 publication Critical patent/WO2023060563A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • the method proceeds to block 304, with calculating the neighborhood radius based on the updated one or more neighborhood radius related parameters.
  • the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly, refer to Eq. (3) or (7) . That is, all the GNN layers and feature dimensions should use the same neighborhood radius for feature propagation.
  • the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively. That is, the GNN layers and feature dimensions can use respective learned neighborhood radius for feature propagation, refer to Eq. (12) or (13) .
  • since propagating on the input dimensions can generate better results than propagating on the output dimensions, the feature propagation of the message passing is performed before the feature transformation of the message passing.
  • ADC is able to enhance any graph-based model, particularly GNNs.
  • neighborhood radius can be learned automatically for datasets. Specifically, learning unique neighborhood radius for each feature channel in each GNN layer can further improve the performance for downstream graph mining tasks.
  • Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
  • the computing system may comprise at least one processor 410.
  • the computing system may further comprise at least one storage device 420. It should be appreciated that the storage device 420 may store computer-executable instructions that, when executed, cause the processor 410 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
  • the embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform a method for training a graph neural network (GNN) to learn a neighborhood radius for feature propagation of message passing to perform node classification.
  • GNN graph neural Network
  • the method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein a sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
  • the embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.


Abstract

A method for training a graph neural network (GNN) to learn a neighborhood radius for feature propagation of message passing to perform node classification is disclosed. The method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein a sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters. Numerous other aspects are provided.

Description

ADAPTIVE DIFFUSION IN GRAPH NEURAL NETWORKS
FIELD
Aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method and an apparatus for adaptive diffusion in graph neural networks.
BACKGROUND
Graph neural networks (GNNs) are a type of neural network that can be directly coupled with graph-structured data. Specifically, graph convolution networks (GCNs) generalize the convolution operation to local graph structures, offering attractive performance for various graph mining tasks. The graph convolution operation is designed to aggregate information from immediate neighboring nodes into the central node, which is also referred to as message passing. To propagate information between nodes that are further away, multiple neural layers can be stacked to go beyond the immediate hop of neighbors. To directly collect high-order information, spectral based GNNs leverage graph spectral properties to collect signals from global neighbors.
Though generating promising results, both strategies are limited to a pre-determined and fixed neighborhood for passing and receiving messages. Essentially, these methods have an implicit assumption that all graph datasets share the same size of receptive field during the message passing process. To break this, graph diffusion convolution (GDC) was recently proposed to extend the discrete message passing process in GCN to a diffusion process, enabling it to aggregate information from a larger neighborhood. However, for each input graph, GDC hand-tunes the best neighborhood size for feature aggregation by grid-searching the parameters on the validation set, making its practical application limited and sensitive.
SUMMARY
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The success of graph neural networks (GNNs) largely relies on the process of aggregating information from neighbors defined by the input graph structures. Notably, message passing based GNNs, e.g., graph convolutional networks, leverage the immediate neighbors of each node during the aggregation process, and recently,  graph diffusion convolution (GDC) is proposed to expand the propagation neighborhood by leveraging generalized graph diffusion. However, the neighborhood size in GDC is manually tuned for each graph by conducting grid search over the validation set, making its generalization practically limited.
To eliminate the manual search process of the optimal propagation neighborhood in GDC, it is disclosed in the present disclosure the adaptive diffusion convolution (ADC) strategy that supports learning the optimal neighborhood from the data automatically. ADC achieves this by formalizing the task as a bi-level optimization problem, enabling the customized learning of one optimal propagation neighborhood size for each dataset. In other words, all GNN layers and feature channels (dimensions) share the same neighborhood size during message passing on each graph.
It is further disclosed in the present disclosure that ADC is allowed to automatically learn a customized neighborhood size for each GNN layer and each feature dimension from data. By learning a unique propagation neighborhood for each layer, ADC can empower GNNs to capture neighbors’ information from diverse graph structures, which is fully dependent on the data and downstream learning objective. Similarly, by learning a distinct neighborhood size for each feature channel, GNNs are then capable of selectively modeling each neighbor’s multiple feature signals. Altogether, ADC makes GNNs fully coupled with the graph structures and all feature channels.
The ADC disclosed in the present disclosure is a general plugin that can be directly applied to existing GNN models. By plugging it into GNNs, the upgraded GNNs can offer significant performance advances over their vanilla versions across a wide range of datasets. Furthermore, by learning the propagation neighborhood size automatically, ADC can consistently outperform GDC, which customizes this for each dataset by grid search. Finally, it is demonstrated that GNNs’ model capacity can benefit from the better coupling between its architecture, graph structures, and feature channels, that is, by learning a dedicated neighborhood size for each GNN layer and feature dimension.
According to an aspect, a method for training a graph neural network (GNN) to learn a neighborhood radius for feature propagation of message passing to perform node classification is disclosed. The method comprises inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein a sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
According to a further aspect, wherein the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set.
According to a further aspect, updating the trainable parameters of the GNN based at least partially on the reduction of the first loss function of the training set further comprises updating the trainable parameters of the GNN by a first gradient on the training set to reduce the first loss function.
According to a further aspect, the method further comprises inputting data of validation set into the GNN; and wherein updating the one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set further comprises updating the one or more neighborhood radius related parameters by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN.
According to a further aspect, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set.
According to a further aspect, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch.
According to a further aspect, wherein the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly.
According to a further aspect, wherein the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively.
According to a further aspect, wherein the feature propagation of the message passing is performed before feature transformation of the message passing with the updated neighborhood radius for each layer and each feature dimension of the GNN respectively.
According to a further aspect, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as $\theta_k = e^{-t}\frac{t^k}{k!}$, wherein k represents a step number away from a central node, and t is a diffusion time.
According to a further aspect, wherein the step number away from the central node is truncated to a constant instead of infinity.
According to a further aspect, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as $\alpha(1-\alpha)^k$, wherein k represents a step number away from a central node, and $\alpha$ is a probability of a user staying in a current page.
The models to which the plugin in the present disclosure is applied can focus on the problem of semi-supervised node classification, the input of which may include an undirected network containing multiple nodes and edges therebetween. Given the input feature and a set of labelled nodes, the task is to predict the labels of the remaining nodes. As examples but not limiting, node classification may be image classification, speech recognition or anomaly detection, etc.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed aspects will be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
Fig. 1 illustrates an exemplary schematic diagram of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure.
Fig. 2 illustrates another exemplary schematic diagram of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure.
Fig. 3 illustrates an exemplary flow chart of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure.
Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
The success of GNNs largely relies on the process of aggregating information from neighbors defined by the input graph structures, which is generally named message passing. Notably, message passing based GNNs, e.g., GCNs leverage the immediate neighbors of each node during the aggregation process, and recently, GDC is proposed to expand the propagation neighborhood by leveraging generalized graph diffusion. However, the neighborhood size in GDC is manually tuned for each graph by conducting grid search over the validation set, making its generalization practically limited.
To address this issue, the adaptive diffusion convolution (ADC) strategy is proposed to automatically learn the optimal neighborhood size from the data. Furthermore, the conventional assumption that all GNN layers and feature channels (or dimensions) should use the same neighborhood size for propagation can be broken in the present disclosure. It is designed to enable ADC to learn a dedicated propagation neighborhood for each GNN layer and each feature channel, making the GNN architecture fully coupled with graph structures, which is the unique property that differentiates GNNs from traditional neural networks. By directly plugging ADC in the present disclosure into existing GNNs, consistent and significant outperformance over both GDC and their vanilla versions across various datasets may be obtained, realizing improved model capacity brought by automatically learning a unique neighborhood size per layer and per channel in GNNs.
In the context of GNNs, the present disclosure focuses on the problem of semi-supervised node classification. The input may include an undirected network G = (V, E), where the node set V contains n nodes $\{v_1, \ldots, v_n\}$, E is the edge set, and $A \in \mathbb{R}^{n \times n}$ is the symmetric adjacency matrix of graph G. Given the input feature matrix X and a subset of node labels Y, the task is to predict the labels of the remaining nodes.
In an aspect, the task of node classification may be image classification, in which each node represents an image, an edge may exist if two images are determined to be in a same class, and features may be derived from the pixels as a probability distribution. In another aspect, the task of node classification may be speech recognition, in which each node represents a waveform of a sound record, an edge may exist if two sound records are determined to be in a same class, and features may be derived from the waveform of the sound as a probability distribution over the discrete states of a Hidden Markov Model. In yet another aspect, the task of node classification may be anomaly detection, including but not limited to, fraud recognition, dataset preprocessing, detection of online review spams, fake users and rumors in social media, fake news, etc. The tasks referred to in the present disclosure are merely used as examples, and the present disclosure can be applied to any scenario in which inputs can be learned as node representations and connections between the nodes can be learned as edges.
Back to the GNNs, the convolution operation on graphs can be described as the process of neighborhood feature aggregation or message passing. The message passing graph convolutional networks can be simply defined as below:
$$H^{(l+1)} = \gamma\big(f(H^{(l)})\big) \qquad (1)$$
where $H^{(l)}$ denotes the hidden feature of layer l with $H^{(0)} = X$ and X as the input feature matrix, $f(\cdot)$ denotes feature transformation and $\gamma(\cdot)$ denotes feature propagation. Taking GCN as an example, the feature transformation and feature propagation functions correspond to $f(H^{(l)}) = \sigma(H^{(l)} W^{(l)})$ and $\gamma(Z^{(l)}) = T Z^{(l)}$, respectively, in which D is the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$ and $Z^{(l)}$ denotes the hidden feature after transformation. Note that GCN uses the adjacency matrix A with self-loop, so it actually uses $\tilde{A} = A + I_n$ with the corresponding degree matrix $\tilde{D}$. To simplify the notations, T is used herein to denote $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$. Straightforwardly, the feature transformation function $f(\cdot)$ describes how features transform inside each node and the feature propagation function $\gamma(\cdot)$ describes how features propagate between nodes. Essentially, how good a GNN model can utilize graph structures heavily depends on the design of the feature propagation function.
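For concreteness, the following is a minimal numpy sketch of Eq. (1) under the reconstruction above, i.e., feature transformation $f(H) = \sigma(HW)$ followed by propagation $\gamma(Z) = TZ$ with $T = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$; the function names and the toy graph are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of one message-passing layer: transform inside each node, then
# propagate along the normalized adjacency with self-loops. Illustrative only.
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """T = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def message_passing_layer(H: np.ndarray, W: np.ndarray, T: np.ndarray) -> np.ndarray:
    """H^{(l+1)} = gamma(f(H^{(l)})) with f(H) = relu(H W) and gamma(Z) = T Z."""
    Z = np.maximum(H @ W, 0.0)   # feature transformation inside each node
    return T @ Z                 # feature propagation between nodes

# toy usage: 4-node path graph, 3-dimensional input features, 2 hidden units
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.random.rand(4, 3)
W = np.random.rand(3, 2)
H1 = message_passing_layer(X, W, normalized_adjacency(A))
```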
Most graph-based models can be represented as $\gamma(H^{(l)}) = f(T)\, H^{(l)}$, where f(T) is a matrix that can be generated by T. So f(T) can be represented as $f(T) = \sum_{k=0}^{\infty} \theta_k T^k$. To quantify how far each node could aggregate features from, the neighborhood radius of a node as r is defined as below:
$$r = \frac{\sum_{k=0}^{\infty} \theta_k \cdot k}{\sum_{k=0}^{\infty} \theta_k} \qquad (2)$$
Here, $\theta_k$ denotes the influence from k-step-away nodes. For a large r, it means that the model puts more emphasis on long distance nodes, i.e., global information. For a small r, it means that the model amplifies local information.
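As a small illustrative check (not part of the disclosure), the Python snippet below evaluates the radius of Eq. (2) for a few hand-picked weight sequences; the weight values are arbitrary examples.

```python
# r = sum_k theta_k * k / sum_k theta_k for a given influence-weight sequence.
def neighborhood_radius(theta):
    return sum(k * w for k, w in enumerate(theta)) / sum(theta)

print(neighborhood_radius([0.0, 1.0]))                # 1.0: all weight on 1-hop neighbors
print(neighborhood_radius([0.0, 0.0, 1.0]))           # 2.0: all weight on 2-hop neighbors
print(neighborhood_radius([0.25, 0.25, 0.25, 0.25]))  # 1.5: weight spread toward distant nodes
```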
For GCN, the neighborhood radius is fixed to r=1, which is just the range of nodes directly connected to it. To collect information beyond direct connections, it is required to stack multiple GCN layers to reach high-order neighborhoods.
There are attempts to improve GCN's feature propagation function from first-hop neighborhood to multi-hop neighborhood, such as MixHop, JKNet, and SGC. For example, SGC uses the feature propagation function $\gamma(H^{(l)}) = T^K H^{(l)}$, where $f(T) = T^K$, i.e., $\theta_k = 1$ for $k = K$ and $\theta_k = 0$ otherwise. In other words, the neighborhood radius r = K for SGC, which is the range of neighborhoods to collect information from each GNN layer. However, for all multi-hop models, the discrete nature of hop numbers makes r non-differentiable, which is unfavourable for subsequent calculation.
A line of work has been focused on generalizing feature propagation from discrete hops to continuous graph diffusion. Notably, graph diffusion convolution (GDC) addresses this by the propagation setup as below:
$$\gamma(H^{(l)}) = \sum_{k=0}^{\infty} \theta_k T^k H^{(l)} \qquad (3)$$
where k is summed from 0 to infinity, making each node aggregate information from the whole graph. In Eq. (3), the weight coefficients should satisfy $\sum_{k=0}^{\infty} \theta_k = 1$ such that the signal strength is neither amplified nor reduced through the propagation. In an aspect, the set of weight coefficients can be generated from personalized PageRank as $\theta_k = \alpha(1-\alpha)^k$, wherein k represents a step number away from a central node, and $\alpha$ is a probability of a user staying in a current page. In another aspect, the set of weight coefficients can be generated from the heat kernel as $\theta_k = e^{-t}\frac{t^k}{k!}$, wherein k represents a step number away from a central node, and t is a diffusion time.
The representations of the set of weight coefficients referred to in the present disclosure are merely used as examples. The heat kernel is taken as an example hereinafter and would not limit the scope of the present disclosure.
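As a hedged illustration of the two example weighting schemes above, the following snippet generates truncated heat-kernel and personalized PageRank coefficients; the parameter values (t = 2.0, alpha = 0.15, truncation at 40 steps) are arbitrary choices for demonstration, not values prescribed by the disclosure.

```python
# Generate truncated influence-weight sequences theta_k for the two examples above.
import math

def heat_kernel_weights(t: float, K: int):
    """theta_k = e^{-t} t^k / k!, truncated at K steps."""
    return [math.exp(-t) * t**k / math.factorial(k) for k in range(K + 1)]

def ppr_weights(alpha: float, K: int):
    """theta_k = alpha * (1 - alpha)^k, truncated at K steps."""
    return [alpha * (1.0 - alpha)**k for k in range(K + 1)]

theta_heat = heat_kernel_weights(t=2.0, K=40)
theta_ppr = ppr_weights(alpha=0.15, K=40)
# both sums are close to 1, so the signal strength is preserved during propagation
print(sum(theta_heat), sum(theta_ppr))
```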
Heat kernel incorporates prior knowledge into the GNN model, which means the feature propagation between nodes follows Newton’s law of cooling, i.e., the feature propagation speed between two nodes is proportional to the difference between their features. Formally, this prior knowledge can be described as below:
$$\frac{d x_i(t)}{dt} = \sum_{j \in N(i)} T_{ij} \big(x_j(t) - x_i(t)\big) \qquad (4)$$
where N(i) denotes the neighborhood of node i and $x_i(t)$ represents the feature of node i after diffusion time t. This differential equation can be solved as below:
$$X(t) = H_t X(0) \qquad (5)$$
where X(t) denotes the feature matrix after diffusion time t, and $H_t = e^{-(I-T)t}$ is the heat kernel.
The heat kernel version of the GDC has r as below:
$$r = \sum_{k=0}^{\infty} e^{-t}\frac{t^k}{k!}\, k = t \qquad (6)$$
This suggests that t is the neighborhood radius r for the heat kernel based GDC, that is, t becomes a perfect continuous substitute for the hop number in multi-hop models.
Recall that the heat kernel version of graph diffusion convolution (GDC) has the following feature propagation function as below:
$$\gamma(H^{(l)}) = e^{-tL} H^{(l)} = \sum_{k=0}^{\infty} e^{-t}\frac{t^k}{k!}\, T^k H^{(l)} \qquad (7)$$
where the Laplacian matrix L = I - T. For each graph dataset, GDC requires a manual grid search step to determine the neighborhood radius related parameter t. Moreover, t is fixed for all feature channels and propagation layers in each dataset.
In the present disclosure, a method called adaptive diffusion convolution (ADC) is disclosed for adaptively learning the neighborhood radius from data for each graph, and it is further disclosed how to generalize it for different feature channels and GNN layers.
GDC enables replacing GNNs’ discrete feature propagation function with the continuous heat kernel. Moving forward from this, instead of hand-tuning t, an optimal neighborhood radius r can be obtained by calculating the gradient with respect to t and updating t until convergence.
In an aspect, the training process of learning t can be the same as learning other weight and bias parameters in the model. Specifically, the training process of learning t and other weight and bias parameters of the GNN is performed jointly and directly on the training set by considering t as one of the trainable parameters, via minimizing the loss function of the training set using the gradient of t along with other weight and bias parameters of the GNN on the training set.
However, learning t directly on the training set may cause overfitting in certain cases. To address the issue, it is further disclosed to train t on the validation set instead of the training set, by using the gradient of t on the validation set. The goal for the model is to find $t^*$ that minimizes the loss function of the validation set $\mathcal{L}_{val}$, wherein w denotes all the other trainable parameters in the feature transformation function and $w^*$ denotes the set of parameters that minimize the loss function of the training set $\mathcal{L}_{train}$. This strategy can be formalized as a bi-level optimization problem as below:
$$t^* = \arg\min_{t} \mathcal{L}_{val}\big(w^*(t), t\big) \qquad (8)$$
$$\text{s.t.}\quad w^*(t) = \arg\min_{w} \mathcal{L}_{train}(w, t) \qquad (9)$$
With Eq. (8) and (9), all the other trainable parameters w, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. $w^*$ is obtained when the loss function of the training set $\mathcal{L}_{train}$ is minimized after certain training epochs. Then t is learned on the validation set with the learned $w^*$, and $t^*$ is obtained when the loss function of the validation set $\mathcal{L}_{val}$ is minimized after certain training epochs. As the learned $t^*$ would change the value of $w^*$ every time t is updated, w needs to be made to converge to the optimal value again, making it too expensive to train.
For the purpose of decreasing the training cost, it is further disclosed an approximation method to update t every time w is updated, which can be as below:
$$w^{(e+1)} = w^{(e)} - \alpha_1 \nabla_{w} \mathcal{L}_{train}\big(w^{(e)}, t^{(e)}\big) \qquad (10)$$
$$t^{(e+1)} = t^{(e)} - \alpha_2 \nabla_{t} \mathcal{L}_{val}\big(w^{(e+1)}, t^{(e)}\big) \qquad (11)$$
where e denotes the number of training epochs, and $\alpha_1$ and $\alpha_2$ denote the learning rate on the training and validation sets, respectively.
With Eq. (10) and (11), all the other trainable parameters w, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. For each epoch, $w^{(e)}$ is updated to $w^{(e+1)}$ by using the gradient of w on the training set. Then t is learned on the validation set with the updated $w^{(e+1)}$ during the same epoch, and $t^{(e)}$ is updated to $t^{(e+1)}$ by using the gradient of t on the validation set. After all the training epochs, the optimal $t^*$ and $w^*$ may be obtained. This method could help avoid overfitting and thus offers better generalization.
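The following PyTorch sketch illustrates one possible reading of the per-epoch alternating updates of Eq. (10) and (11): the weights w take a gradient step on the training loss, then the diffusion time t takes a gradient step on the validation loss computed with the just-updated weights. The tiny linear model, random data, truncation depth, and learning rates are placeholders introduced for illustration, not the architecture of the disclosure.

```python
# Alternating per-epoch updates: w on the training set, t on the validation set.
import torch
import torch.nn.functional as F

n, d, c = 8, 5, 3
A = torch.rand(n, n)
T = A / A.sum(dim=1, keepdim=True)               # crude stand-in for a normalized adjacency
X = torch.rand(n, d)
y_train = torch.randint(0, c, (n,))
y_val = torch.randint(0, c, (n,))

W = torch.nn.Parameter(torch.randn(d, c) * 0.1)  # "other trainable parameters" w
t = torch.nn.Parameter(torch.tensor(1.0))        # neighborhood radius related parameter

def forward(X):
    # gamma(X) = sum_{k=0..K} e^{-t} t^k / k! * T^k X, differentiable w.r.t. t
    K = 5
    theta = torch.exp(-t)                        # theta_0
    Tk_X = X                                     # T^0 X
    out = theta * Tk_X
    for k in range(1, K + 1):
        Tk_X = T @ Tk_X
        theta = theta * t / k                    # theta_k = e^{-t} t^k / k!
        out = out + theta * Tk_X
    return out @ W

opt_w = torch.optim.SGD([W], lr=0.1)
opt_t = torch.optim.SGD([t], lr=0.01)
for epoch in range(50):
    opt_w.zero_grad()
    F.cross_entropy(forward(X), y_train).backward()  # Eq. (10): update w on the training set
    opt_w.step()
    opt_t.zero_grad()
    F.cross_entropy(forward(X), y_val).backward()    # Eq. (11): update t on the validation set
    opt_t.step()
```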
Conventional GNNs use a predetermined neighborhood radius for feature propagation. As described above, GDC proposes to use a different neighborhood radius t for different datasets by hand-tuning the values. The disclosed methods described above advance GDC's direction by automatically learning the radius from the given graph. This still implies one t for one dataset, that is, the same t for all GNN layers and all feature channels (dimensions).
Fig. 1 illustrates an exemplary schematic diagram of adaptive diffusion convolution (ADC), in accordance with various aspects of the present disclosure. As can be seen in Fig. 1, a same learned t (shown as 2 for example) is applied to all layers of the GNN and all feature channels (dimensions). When t is large, the contributions from close and distant neighbors would have little difference, and when t is small, the contributions from close neighbors would be much more significant than those from distant neighbors, shown as the greyscale of the circles.
It is expected that for each layer and feature dimension, a unique r may be learned and used, making them adaptive for the final learning objective. The obstacle that prevents prior art from achieving this lies in the infeasible challenge of hand-tuning or grid-searching the propagation function separately for each feature channel and GNN layer, given that as the number of parameters increases, the time complexity increases exponentially.
The aforementioned strategy for updating t during the training of the model empowers the method to adaptively learn specific t for all layers and all feature channels.
It is disclosed that to learn a unique r for each layer and each feature channel, the method described above based on the heat kernel can be evolved as below by extending the feature propagation function in Eq. (7) for each layer and feature channel, that is, from t to $t^{(l)}_i$:
$$\gamma^{(l)}_i\big(H^{(l)}_{:,i}\big) = e^{-t^{(l)}_i L} H^{(l)}_{:,i} = \sum_{k=0}^{\infty} e^{-t^{(l)}_i}\, \frac{\big(t^{(l)}_i\big)^k}{k!}\, T^k H^{(l)}_{:,i} \qquad (12)$$
where $t^{(l)}_i$ denotes the neighborhood radius t for the l-th layer and i-th channel, $H^{(l)}_{:,i}$ represents the i-th column of the hidden feature $H^{(l)}$, i.e., the feature on channel or dimension i, and $\gamma^{(l)}_i(\cdot)$ denotes the feature propagation function on the l-th layer and i-th channel. This feature propagation function enables the GNN to train a separate t for each feature channel and layer.
Fig. 2 illustrates another exemplary schematic diagram of Adaptive Diffusion Convolution (ADC) , in accordance with various aspects of the present disclosure.
As can be seen in Fig. 2, for the hidden feature $H^{(l)}_{:,i}$ of feature channel i in layer l, a separate feature propagation function $\gamma^{(l)}_i(\cdot)$ with a unique neighborhood radius $t^{(l)}_i$ is trained. When t is large (e.g., t = 3), the contributions from close (e.g., 1-hop) and distant (e.g., 3-hop) neighbors have little difference (shown as the relatively similar color shading across different hops). When t is small (e.g., t = 1), the contributions from close neighbors are much more significant than those from distant neighbors (shown as darker color concentrated around the center).
The method of ADC is disclosed herein based on the heat kernel as an example. Without loss of generality, the method of ADC can be a generalized ADC (GADC), in other words, not limiting the weight coefficients $\theta_k$ to the heat kernel or other specific examples. The feature propagation of GADC can be described with Eq. (3), and further, to learn $\theta^{(l)}_{k,i}$ for each layer and feature channel or dimension, the feature propagation of GADC is disclosed as below:
$$\gamma^{(l)}_i\big(H^{(l)}_{:,i}\big) = \sum_{k=0}^{\infty} \theta^{(l)}_{k,i}\, T^k H^{(l)}_{:,i} \qquad (13)$$
where $\theta^{(l)}_{k,i}$ denotes the weight coefficient for k-hop neighbors on the l-th layer and i-th channel/dimension. The constraint $\sum_{k=0}^{\infty} \theta^{(l)}_{k,i} = 1$ is enforced during training, that is, the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1.
As it operates differently on each channel, whether to propagate before or after the feature transformation function actually matters. Empirically, it is found that propagating on the input channels generates better results than propagating on the output channels. Therefore, the feature propagation and transformation steps in the original message passing networks from Eq. (1) are swapped as below:
$$H^{(l+1)} = f\big(\gamma(H^{(l)})\big) \qquad (14)$$
Additionally, calculating $e^{-Lt}$ directly is infeasible for large graphs. Practically, it is needed to use the top-K truncation to approximate the heat kernel, making ADC (in Eq. (12)) and GADC (in Eq. (13)) respectively updated as below:
$$\gamma^{(l)}_i\big(H^{(l)}_{:,i}\big) = \sum_{k=0}^{K} e^{-t^{(l)}_i}\, \frac{\big(t^{(l)}_i\big)^k}{k!}\, T^k H^{(l)}_{:,i} \qquad (15)$$
$$\gamma^{(l)}_i\big(H^{(l)}_{:,i}\big) = \sum_{k=0}^{K} \theta^{(l)}_{k,i}\, T^k H^{(l)}_{:,i} \qquad (16)$$
ADC and GADC in the present disclosure are flexible components that can be directly plugged into existing GNN models, enabling them to adaptively learn the neighborhood radius for feature aggregation.
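As an illustration of how such a plugin might look in code, the sketch below implements a truncated per-channel heat-kernel propagation in the spirit of Eq. (14) and (15), with one learnable diffusion time per input channel and propagation applied before the linear transformation; the class name, initialization, and hyperparameters are assumptions made for this example, not the patent's reference implementation.

```python
# Per-channel truncated heat-kernel propagation followed by feature transformation.
import torch

class AdaptiveDiffusionLayer(torch.nn.Module):
    def __init__(self, in_dim: int, out_dim: int, K: int = 5):
        super().__init__()
        self.K = K
        self.t = torch.nn.Parameter(torch.ones(in_dim))  # one diffusion time per input channel
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def propagate(self, H: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
        # sum_{k=0..K} e^{-t_i} t_i^k / k! * (T^k H)_{:, i} for every channel i
        theta = torch.exp(-self.t)       # theta_0, shape (in_dim,)
        Tk_H = H
        out = theta * Tk_H               # broadcast over channels
        for k in range(1, self.K + 1):
            Tk_H = T @ Tk_H
            theta = theta * self.t / k   # per-channel theta_k
            out = out + theta * Tk_H
        return out

    def forward(self, H: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.lin(self.propagate(H, T)))  # propagate first, then transform

# usage with random stand-ins for the normalized adjacency T and node features X
n, d = 6, 4
A = torch.rand(n, n)
T = A / A.sum(dim=1, keepdim=True)
X = torch.rand(n, d)
layer = AdaptiveDiffusionLayer(in_dim=d, out_dim=3)
out = layer(X, T)   # shape (6, 3)
```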
Fig. 3 illustrates an exemplary flow chart of adaptive diffusion convolution (ADC) , in accordance with various aspects of the present disclosure. As described below, some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for  implementation of all embodiments. In some examples, the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.
Generally, the approach of ADC is discussed in the context of the task of classification, including but not limited to, image classification, in which each node represents an image, an edge may exist if two images are determined to be in a same class, and features may be derived from the pixels as a probability distribution; speech recognition, in which each node represents a waveform of a sound record, an edge may exist if two sound records are determined to be in a same class, and features may be derived from the waveform of the sound as a probability distribution over the discrete states of a Hidden Markov Model; and anomaly detection, such as fraud recognition, dataset preprocessing, detection of online review spams, fake users and rumors in social media, fake news, etc. The tasks referred to in the present disclosure are merely used as examples, and the present disclosure can be applied to any scenario in which inputs can be learned as node representations and connections between the nodes can be learned as edges.
The method is for training a graph neural network (GNN) to learn a neighborhood radius for feature propagation of message passing, and may begin at block 301, with inputting data of a training set into the GNN.
The method proceeds to block 302, with updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN. In an aspect, the trainable parameters of the GNN are updated by a first gradient on the training set to reduce the first loss function of the training set with one or more epochs. As an example, the trainable parameters of the GNN can be updated based on Eq. (9) or Eq. (10) .
The method proceeds to block 303, with updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away. The sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1.
In an aspect, the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set. As an example, the one or more neighborhood radius related parameters can be learned on the training set, as other trainable parameters besides the weight and bias parameters, based on the reduction of the first loss function of the training set by using gradient descent, as in the sketch below.
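By way of illustration only, a minimal sketch of this joint (one-level) update in PyTorch is shown below; it assumes the neighborhood radius related parameters (for example, per-channel diffusion times) are registered as ordinary trainable parameters of the model, and the function name, masks, and hyperparameters are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def train_jointly(model, X, T, y, train_mask, epochs=200, lr=0.01):
    # One optimizer over all parameters: GNN weights/biases and the
    # neighborhood-radius related parameters are updated together by
    # gradient descent on the training loss.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(X, T)[train_mask], y[train_mask])
        loss.backward()
        opt.step()
    return model
```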
In another aspect, the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated as a bi-level optimization. As an example, data of a validation set is input into the GNN, and the one or more neighborhood radius related parameters are updated by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN. For example, the one or more neighborhood radius related parameters can be updated based on Eq. (8) or Eq. (11).
In an aspect, the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set, refer to Eq. (8) and (9) . All the trainable parameters, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. The updated trainable parameters are obtained when the loss function of the training set is minimized after certain training epochs. Then the one or more neighborhood radius related parameters are learned on the validation set with the updated trainable parameters, and the updated one or more neighborhood radius related parameters are obtained when the loss function of the validation set is minimized after certain training epochs.
In another aspect, the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch, refer to Eq. (10) and (11) . All the trainable parameters, including at least all the weight and bias parameters of the GNN, are firstly learned on the training set. For each epoch, the trainable parameters are updated by using the gradient on the training set. Then the one or more neighborhood radius related parameters are learned on the validation set with the updated trainable parameters during the same epoch. The one or more neighborhood radius related parameters are updated by using the gradient on the validation set. After all the training epochs, the updated trainable parameters and one or more neighborhood radius related parameters may be obtained.
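By way of illustration only, the following sketch shows the epoch-wise alternating variant in PyTorch: in each epoch the weight and bias parameters take a gradient step on the training loss, and the neighborhood radius related parameters then take a gradient step on the validation loss computed with the just-updated weights. The rule used to separate the two parameter groups (leaf parameters named "t") is purely an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def train_bilevel(model, X, T, y, train_mask, val_mask, epochs=200, lr=0.01):
    # Assumption: radius-related parameters are the leaves named "t" in the model.
    radius_params = [p for n, p in model.named_parameters() if n.split(".")[-1] == "t"]
    weight_params = [p for n, p in model.named_parameters() if n.split(".")[-1] != "t"]
    opt_w = torch.optim.Adam(weight_params, lr=lr)
    opt_r = torch.optim.Adam(radius_params, lr=lr)
    for _ in range(epochs):
        # Step 1: update the weight and bias parameters on the training set.
        opt_w.zero_grad()
        F.cross_entropy(model(X, T)[train_mask], y[train_mask]).backward()
        opt_w.step()
        # Step 2: update the radius-related parameters on the validation set,
        # using the trainable parameters just updated in this epoch.
        opt_r.zero_grad()
        F.cross_entropy(model(X, T)[val_mask], y[val_mask]).backward()
        opt_r.step()
    return model
```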
The influence weights of all the neighbor nodes with different steps away can be generated in different ways; this does not limit the scope of the disclosure. As an example, the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as
$$e^{-t}\,\frac{t^{k}}{k!}$$
wherein k represents a step number away from a central node, and t is a diffusion time. Besides, the step number away from the central node can be truncated to a constant instead of infinity to keep the computation feasible when the heat kernel is used. As another example, the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as α(1-α)^k, wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
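By way of illustration only, a minimal Python sketch of these two ways of generating the influence weights is shown below, with the infinite sum truncated at a constant K; the helper names and the example values of t and α are assumptions.

```python
import math

def heat_kernel_weights(t, K):
    # e^{-t} * t^k / k! for k = 0..K (truncated at K instead of infinity).
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(K + 1)]

def pagerank_weights(alpha, K):
    # alpha * (1 - alpha)^k for k = 0..K.
    return [alpha * (1 - alpha) ** k for k in range(K + 1)]

print(heat_kernel_weights(3.0, 5))   # illustrative diffusion time t = 3
print(pagerank_weights(0.15, 5))     # illustrative alpha = 0.15
```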
The method proceeds to block 304, with calculating the neighborhood radius based on the updated one or more neighborhood radius related parameters. In an aspect, the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly, refer to Eq. (3) or (7). That is, all the GNN layers and feature dimensions use the same neighborhood radius for feature propagation. In another aspect, the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively. That is, the GNN layers and feature dimensions can each use a respectively learned neighborhood radius for feature propagation, refer to Eq. (12) or (13).
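By way of illustration only, and assuming the neighborhood radius is the weighted average hop count implied by the influence weights (the exact definition is given by Eq. (3) of the disclosure), a minimal sketch of this calculation is:

```python
import math

def neighborhood_radius(theta):
    # Weighted average hop count implied by non-negative influence weights theta_k.
    return sum(k * w for k, w in enumerate(theta)) / sum(theta)

# Illustrative: heat-kernel weights e^{-t} t^k / k! with t = 3, truncated at K = 20.
theta = [math.exp(-3.0) * 3.0 ** k / math.factorial(k) for k in range(21)]
print(neighborhood_radius(theta))    # approximately 3.0, i.e. close to t
```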
In an aspect, because the diffusion operates differently on each dimension, propagating on the input dimensions can generate better results than propagating on the output dimensions; therefore, the feature propagation of the message passing is performed before the feature transformation of the message passing.
As discussed above with Fig. 1-3, ADC is able to enhance any graph-based model, particularly GNNs. By directly plugging ADC into existing GNNs, the neighborhood radius can be learned automatically for each dataset. Specifically, learning a unique neighborhood radius for each feature channel in each GNN layer can further improve the performance of downstream graph mining tasks.
Fig. 4 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure. The computing system may comprise at least one processor 410. The computing system may further comprise at least one storage device 420. It should be appreciated that the storage device 420 may store computer-executable instructions that, when executed, cause the processor 410 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
The embodiments of the present disclosure may be embodied in a computer-readable medium such as a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform a method for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing to perform node classification. The method comprises: inputting data of a training set into the GNN; updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN; updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-3.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.

Claims (15)

  1. A method for training a graph neural network (GNN) to learn neighborhood radius for feature propagation of message passing to perform node classification, comprising:
    inputting data of training set into the GNN;
    updating trainable parameters of the GNN based at least partially on a reduction of a first loss function of the training set, wherein the trainable parameters comprise at least weight and bias parameters of the GNN;
    updating one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set, wherein the one or more neighborhood radius related parameters comprise at least influence weights of all the neighbor nodes with different steps away, and wherein the sum of the influence weights of all the neighbor nodes on each layer of the GNN equals 1; and
    calculating the neighborhood radius to be used in the feature propagation of message passing based on the updated one or more neighborhood radius related parameters.
  2. The method of claim 1, wherein the trainable parameters of the GNN and the one or more neighborhood radius related parameters are updated jointly based at least partially on the reduction of the first loss function of the training set.
  3. The method of claim 1, wherein updating the trainable parameters of the GNN based at least partially on the reduction of the first loss function of the training set further comprises:
    updating the trainable parameters of the GNN by a first gradient on the training set to reduce the first loss function.
  4. The method of claim 3, further comprising:
    inputting data of validation set into the GNN; and
    wherein updating the one or more neighborhood radius related parameters based at least partially on the reduction of the first loss function of the training set further comprises:
    updating the one or more neighborhood radius related parameters by a second gradient on the validation set to reduce a second loss function of the validation set, wherein the second loss function of the validation set is calculated with the updated trainable parameters of the GNN.
  5. The method of claim 4, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN that minimize the first loss function of the training set.
  6. The method of claim 4, wherein the one or more neighborhood radius related parameters are updated based on the updated trainable parameters of the GNN each epoch.
  7. The method of claim 4, wherein the neighborhood radius is calculated for all layers and feature dimensions of the GNN uniformly.
  8. The method of claim 4, wherein the neighborhood radius is calculated for each layer and each feature dimension of the GNN respectively.
  9. The method of claim 8, wherein the feature propagation of the message passing is performed before feature transformation of the message passing with the updated neighborhood radius for each layer and each feature dimension of the GNN respectively.
  10. The method of claim 1, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the heat kernel as
    $e^{-t}\,\frac{t^{k}}{k!}$
    wherein k represents a step number away from a central node, and t is a diffusion time.
  11. The method of claim 10, wherein the step number away from the central node is truncated to a constant instead of infinity.
  12. The method of claim 1, wherein the influence weights of all the neighbor nodes with different steps away are generated based on the PageRank as α(1-α)^k, wherein k represents a step number away from a central node, and α is a probability of a user staying in a current page.
  13. A computer system, comprising:
    one or more processors; and
    one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method of one of claims 1-12.
  14. One or more computer readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.
  15. A computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.
PCT/CN2021/124130 2021-10-15 2021-10-15 Adaptive diffusion in graph neural networks WO2023060563A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/124130 WO2023060563A1 (en) 2021-10-15 2021-10-15 Adaptive diffusion in graph neural networks
CN202180103443.1A CN118140231A (en) 2021-10-15 2021-10-15 Adaptive diffusion in a graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/124130 WO2023060563A1 (en) 2021-10-15 2021-10-15 Adaptive diffusion in graph neural networks

Publications (1)

Publication Number Publication Date
WO2023060563A1 true WO2023060563A1 (en) 2023-04-20

Family

ID=85987960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124130 WO2023060563A1 (en) 2021-10-15 2021-10-15 Adaptive diffusion in graph neural networks

Country Status (2)

Country Link
CN (1) CN118140231A (en)
WO (1) WO2023060563A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541794A (en) * 2023-07-06 2023-08-04 中国科学技术大学 Sensor data anomaly detection method based on self-adaptive graph annotation network
CN117633635A (en) * 2024-01-23 2024-03-01 南京信息工程大学 Dynamic rumor detection method based on space-time propagation diagram

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151288A1 (en) * 2018-11-09 2020-05-14 Nvidia Corp. Deep Learning Testability Analysis with Graph Convolutional Networks
US20200285944A1 (en) * 2019-03-08 2020-09-10 Adobe Inc. Graph convolutional networks with motif-based attention

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151288A1 (en) * 2018-11-09 2020-05-14 Nvidia Corp. Deep Learning Testability Analysis with Graph Convolutional Networks
US20200285944A1 (en) * 2019-03-08 2020-09-10 Adobe Inc. Graph convolutional networks with motif-based attention

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
INDRO SPINELLI; SIMONE SCARDAPANE; AURELIO UNCINI: "Adaptive Propagation Graph Convolutional Network", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 28 September 2020 (2020-09-28), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081771656, DOI: 10.1109/TNNLS.2020.3025110 *
YANG ZEYU: "Master's Thesis", 16 December 2020, NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS, CN, article YANG, ZEYU: "Research on GCN with Neighborhood Selection Strategy", pages: 1 - 50, XP009545019, DOI: 10.27251/d.cnki.gnjdc.2020.000754 *
ZHAO JIALIN, DING MING, KHARLAMOV EVGENY: "Adaptive Diffusion in Graph Neural Networks", 35TH CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NEURIPS 2021), 20 April 2023 (2023-04-20), pages 1 - 3, XP093058146 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541794A (en) * 2023-07-06 2023-08-04 中国科学技术大学 Sensor data anomaly detection method based on self-adaptive graph annotation network
CN116541794B (en) * 2023-07-06 2023-10-20 中国科学技术大学 Sensor data anomaly detection method based on self-adaptive graph annotation network
CN117633635A (en) * 2024-01-23 2024-03-01 南京信息工程大学 Dynamic rumor detection method based on space-time propagation diagram
CN117633635B (en) * 2024-01-23 2024-04-16 南京信息工程大学 Dynamic rumor detection method based on space-time propagation diagram

Also Published As

Publication number Publication date
CN118140231A (en) 2024-06-04

Similar Documents

Publication Publication Date Title
WO2023060563A1 (en) Adaptive diffusion in graph neural networks
US20230169140A1 (en) Graph convolutional networks with motif-based attention
Lee et al. Complex-valued neural networks: A comprehensive survey
Stutz et al. Learning optimal conformal classifiers
US20240112090A1 (en) Concurrent optimization of machine learning model performance
US11042802B2 (en) System and method for hierarchically building predictive analytic models on a dataset
WO2022166115A1 (en) Recommendation system with adaptive thresholds for neighborhood selection
US11475543B2 (en) Image enhancement using normalizing flows
US20230306723A1 (en) Systems, methods, and apparatuses for implementing self-supervised domain-adaptive pre-training via a transformer for use with medical image classification
JP6851634B2 (en) Feature conversion module, pattern identification device, pattern identification method, and program
CN114610897A (en) Medical knowledge map relation prediction method based on graph attention machine mechanism
Luo et al. Multinomial Bayesian extreme learning machine for sparse and accurate classification model
Wang et al. Fully hyperbolic graph convolution network for recommendation
WO2023000165A1 (en) Method and apparatus for classifying nodes of a graph
CN114756694A (en) Knowledge graph-based recommendation system, recommendation method and related equipment
Wu et al. JPEG steganalysis based on denoising network and attention module
US20220253688A1 (en) Recommendation system with adaptive weighted baysian personalized ranking loss
CN113326884A (en) Efficient learning method and device for large-scale abnormal graph node representation
Liu et al. GDST: Global Distillation Self-Training for Semi-Supervised Federated Learning
Wang et al. Variance of the gradient also matters: Privacy leakage from gradients
Yang et al. Confidence-based and sample-reweighted test-time adaptation
Dyer et al. Gradient-assisted calibration for financial agent-based models
Zhong et al. Lightweight Federated Graph Learning for Accelerating Classification Inference in UAV-assisted MEC Systems
Mavromatis et al. SemPool: Simple, robust, and interpretable KG pooling for enhancing language models
Sierra et al. Global and local neural network ensembles

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21960294

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE