WO2024011475A1 - Method and apparatus for graph neural architecture search under distribution shift - Google Patents
Method and apparatus for graph neural architecture search under distribution shift
- Publication number: WO2024011475A1
- Application number: PCT/CN2022/105600
- Authority
- WO
- WIPO (PCT)
- Prior art keywords: graph, GNN, architecture, searched, graphs
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Definitions
- the present disclosure relates generally to artificial intelligence technical field, and more particularly, to graph neural architecture search technology.
- Graph-structured data has attracted lots of attention in recent years for its flexible representation ability in various domains.
- Graph neural network (GNN) models have been proposed and have achieved great success in many graph tasks.
- GNAS: graph neural architecture search
- These automatically designed architectures have achieved competitive or better performance compared with manually designed GNNs on datasets with the same distribution under the independently and identically distributed (I.I.D.) assumption, i.e., the training and test graphs are independently sampled from the identical distribution.
- a method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs comprises: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
- an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs comprises: a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space; an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
- an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs may comprise a memory and at least one processor coupled to the memory.
- the at least one processor may be configured to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
- a computer readable medium storing computer code for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs.
- the computer code when executed by a processor, may cause the processor to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
- a computer program product for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs.
- the computer program product may comprise processor executable computer code for: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
- FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
- FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
- FIG. 3 illustrates a flow chart of a method for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
- FIG. 4 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
- FIG. 5 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
- FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
- a graph is a non-linear data structure consisting of nodes and edges.
- the nodes may also be referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph.
- a simple graph 100 consists of nodes n1-n7 and edges e1-e6; edge e1 connects nodes n1 and n3, edge e2 connects nodes n2 and n4, and so on.
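As an illustrative sketch (not part of the claimed method), a graph such as graph 100 can be represented in code as an adjacency list. Only edges e1 and e2 are specified in the text above, so only those two are used here; `build_adjacency` is a hypothetical helper name.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency list from (node, node) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)  # edges are undirected, so record both directions
        adj[v].add(u)
    return adj

# e1 connects n1 and n3; e2 connects n2 and n4 (as stated for FIG. 1).
edges = [("n1", "n3"), ("n2", "n4")]
adj = build_adjacency(edges)
```

The same structure extends naturally to the assembly-line and circuit-layout examples below, where nodes carry domain labels such as workshops or circuit modules.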
- Graph-structured data may be used in various domains, including social networks, information networks, biological networks, infrastructure networks, etc., which cannot be structured in Euclidean space.
- graph 100 may be an assembly line layout graph; each node in the graph may represent a workshop, and each edge may represent the association between two workshops.
- graph 100 may be a circuit design layout graph; each node in the graph may represent an electronic device or circuit module, and each edge may represent the association between two electronic devices or circuit modules.
- Graph neural networks may learn node representations by a recursive message passing scheme where nodes aggregate information from their neighbors iteratively. Then, taking the graph classification task as an example, GNNs may use pooling methods to derive graph-level representations. Different GNN architectures mainly differ in their message passing mechanism, i.e., how to exchange information, to adapt to the demands of different graph scenarios.
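The message-passing and pooling scheme described above can be sketched minimally in plain Python. This is an illustrative mean-aggregation step without learnable weights, not the claimed GNN layers; `mean_aggregate` and `mean_pool` are hypothetical helper names.

```python
def mean_aggregate(adj, feats):
    """One message-passing step: each node's new feature vector is the
    mean of its own and its neighbors' feature vectors (learnable
    weights and nonlinearities omitted for clarity)."""
    out = {}
    for v, f in feats.items():
        neigh = [feats[u] for u in adj.get(v, [])] + [f]
        dim = len(f)
        out[v] = [sum(x[d] for x in neigh) / len(neigh) for d in range(dim)]
    return out

def mean_pool(feats):
    """Global mean pooling: average all node features into one
    graph-level representation, as used for graph classification."""
    dim = len(next(iter(feats.values())))
    n = len(feats)
    return [sum(f[d] for f in feats.values()) / n for d in range(dim)]

# Toy path graph a - b - c with 1-dimensional node features.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
feats = {"a": [1.0], "b": [3.0], "c": [5.0]}
h1 = mean_aggregate(adj, feats)  # a: (3+1)/2 = 2.0, b: 3.0, c: 4.0
g = mean_pool(h1)                # graph-level vector [3.0]
```

Different GNN architectures (GCN, GAT, GIN, SAGE) replace the simple mean above with their own aggregation and update rules.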
- Graph neural architecture search may be utilized for automatically designing GNN architectures for various graph tasks.
- GNAS: graph neural architecture search
- an improved graph neural architecture search approach under distribution shifts is provided.
- Such a graph neural architecture search approach may be able to capture key information on graphs with widely varying distributions under the out-of-distribution settings through tailoring a unique GNN architecture for each graph instance.
- a self-supervised disentangled graph encoder is designed, which may project graphs into a disentangled latent space, where each disentangled factor in the space is trained by the supervised task and corresponding self-supervised learning task simultaneously.
- This design is able to capture the key information hidden in graphs in a more controllable manner via the self-supervised disentangled graph representation, thus improving the ability of the representations to generalize under distribution shifts.
- architecture customization with prototype is adopted to tailor specialized GNN architectures for graphs based on the similarities of their representations with prototype vectors in the latent space, where each prototype vector corresponds to a different operation.
- a graph space is denoted as G, a label space is denoted as Y, a training graph dataset is denoted as G_tr, the corresponding training label set is denoted as Y_tr, a test graph dataset is denoted as G_te, and the corresponding test label set is denoted as Y_te.
- the goal of GNAS under distribution shifts is to design a model using G_tr and Y_tr which works well on G_te and Y_te under the assumption that P(G_tr, Y_tr) ≠ P(G_te, Y_te), where P(G_tr, Y_tr) denotes the probability distribution of the training graph dataset, and P(G_te, Y_te) denotes the probability distribution of the test graph dataset, i.e., the objective formalized in Eq. (1).
- GNNs for graph machine learning.
- a typical GNN consists of two parts: an architecture α and learnable weights w, where A and W denote the architecture space and the weight space, respectively (α ∈ A, w ∈ W). Therefore, a GNN may be denoted as the following mapping function: f_{α, w}: G → Y.
- GCN: graph convolutional network
- GAT: graph attention network
- GIN: graph isomorphism network
- SAGE: graph sample and aggregate
- MLP: multi-layer perceptron
- a pooling layer may also be fixed at the end of the GNN architecture as the standard global mean pooling.
- a GNN architecture may be customized for each graph.
- the disclosed GNAS method is more flexible and can better handle test graphs under distribution shift, since it is known that different GNN architectures suit different graphs. Therefore, an architecture mapping function α(g) and a weight mapping function w(g, α) need to be learned, so that these functions can automatically generate the optimal GNN for different graphs, including the architecture and its weights. Since the architecture only depends on the graph, the weight mapping function can be further simplified as w(α(g)). Therefore, Eq. (1) may be transformed into the following objective function:
- FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
- the GNAS model 200 comprises three cascaded modules, i.e., a self-supervised disentangled graph encoder module 210, an architecture customization module 220 with prototype strategy, and a customized super-network module 230, to tailor a unique GNN architecture for each graph instance, thus enabling the model 200 to generalize under distribution shifts in non-I.I.D. settings.
- the graph encoder module 210 may capture diverse graph structures by a self-supervised and a supervised loss.
- the architecture customization module 220 may tailor the most suitable GNN architecture based on the learned graph representation.
- the customized super-network module 230 may enable efficient training by weight sharing.
- Each of these modules will be described in detail below.
- graphs g_1, g_2, and g_3 with different structures are input into the graph encoder module 210. It can be understood that many more graphs may be input into the graph encoder module 210 during the training stage. These input graphs may have diverse graph structures from different distributions. To capture such diverse graph structures, the graph encoder module 210 may learn low-dimensional representations of graphs. In an embodiment, K GNNs may be adopted to learn K-chunk graph representations:
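The K-chunk idea can be sketched as follows: K independent encoders each produce one chunk of the representation, and the chunks are concatenated into the full vector h. This is an illustrative sketch, with simple callables standing in for the K GNN encoders; `encode_disentangled` is a hypothetical name.

```python
def encode_disentangled(graph_feats, encoders):
    """Sketch of a K-chunk disentangled encoder: each of the K encoder
    functions maps the graph to one chunk, and the chunks are
    concatenated into the full graph representation h."""
    chunks = [enc(graph_feats) for enc in encoders]
    h = [x for chunk in chunks for x in chunk]  # concatenate chunks
    return h, chunks

# Hypothetical stand-ins for the K GNN encoders (here K = 2), each
# capturing a different factor of the input.
encoders = [
    lambda g: [sum(g) / len(g)],   # chunk 1: mean node feature
    lambda g: [max(g) - min(g)],   # chunk 2: feature range
]
h, chunks = encode_disentangled([1.0, 2.0, 3.0], encoders)
```

In the disclosed design, each chunk is additionally trained with its own supervised and self-supervised signals so that it captures a distinct factor of the graph structure.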
- both graph supervised learning task and self-supervised learning task may be used simultaneously.
- the downstream target graph task naturally provides supervision signals for learning the graph encoder module 210. Therefore, a classification layer may be placed after the obtained graph representation to get the prediction for the graph classification task.
- the graph representation for g i may be denoted as h i .
- the supervised learning loss is as follows:
- Graph self-supervised learning aims to learn informative graph representation through pretext tasks, which has shown several advantages including reducing label reliance, enhancing robustness, and model generalization ability. Therefore, graph SSL may also be used to complement the supervised learning task.
- an SSL auxiliary task may be set by generating pseudo labels from graphs structures, and the pseudo labels may be used as extra supervision signals.
- different pseudo labels may be adopted for different chunks of the disentangled GNN, so that the disentangled graph encoder 210 can capture different factors of the graph structure.
- the graph encoder 210 may focus on the degree distribution of graphs as a representative and explainable structural feature, while it is straightforward to generalize to other graph structures.
- the pseudo labels may be generated by calculating the ratio of nodes that have degree exactly k. Then, the SSL objective function may be formulated as:
- the pseudo-label may be obtained by adopting a regression function, such as, a linear layer followed by an activation function, on the k-th chunk of the graph representation h i .
- the last chunk may be left without SSL tasks to allow more flexibility in learning the disentangled graph representations.
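The degree-ratio pseudo-labels described above can be computed directly from the graph structure. The sketch below is illustrative; `degree_ratio_pseudo_labels` is a hypothetical helper name.

```python
def degree_ratio_pseudo_labels(adj, max_k):
    """For each k in 1..max_k, compute the fraction of nodes whose
    degree is exactly k -- the structural pseudo-labels used as extra
    supervision signals for the SSL auxiliary task."""
    n = len(adj)
    degrees = [len(neigh) for neigh in adj.values()]
    return [sum(1 for d in degrees if d == k) / n
            for k in range(1, max_k + 1)]

# Toy path graph a - b - c: nodes a and c have degree 1, node b degree 2.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
labels = degree_ratio_pseudo_labels(adj, max_k=2)  # [2/3, 1/3]
```

Each pseudo-label is then regressed from its corresponding chunk of the disentangled representation, tying one structural factor to one chunk.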
- the architecture customization module 220 maps the representations into different tailored GNN architectures with prototype strategy.
- the probability of choosing an operation o in the i-th layer of a searched architecture may be denoted as p_i(o), where i ∈ {1, 2, ..., N}, N is the number of layers, and o belongs to the candidate operation set. The probability may be calculated as:
- a prototype vector may be learned for each candidate operation, and operations may be selected based on the preferences of the graph, i.e., if the graph representation has a large projection on a prototype vector, its corresponding operation is more likely to be selected. Besides, by using the exponential function, the length of h can decide the shape of the probability distribution, i.e., the larger the norm of h, the sharper the distribution.
- the following regularizer may be adopted based on cosine distances between vectors to keep the diversity of operations:
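Since Eq. (7) and Eq. (8) are not reproduced in this text, the sketch below illustrates one plausible reading: operation probabilities as a softmax over dot products between the graph representation and the prototype vectors, and a diversity regularizer as the mean pairwise cosine similarity between prototypes (minimizing it keeps prototypes spread apart). Function names and exact forms are assumptions, not the claimed equations.

```python
import math

def op_probabilities(h, prototypes):
    """Softmax over <h, v_o> for each operation prototype v_o: graphs
    whose representation projects strongly onto a prototype prefer that
    operation, and a longer h sharpens the distribution."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    scores = [math.exp(dot(h, v)) for v in prototypes]
    z = sum(scores)
    return [s / z for s in scores]

def cosine_diversity_regularizer(prototypes):
    """Mean pairwise cosine similarity between prototype vectors;
    driving it down keeps the operation prototypes diverse."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    norm = lambda a: math.sqrt(dot(a, a))
    sims, m = [], len(prototypes)
    for i in range(m):
        for j in range(i + 1, m):
            sims.append(dot(prototypes[i], prototypes[j]) /
                        (norm(prototypes[i]) * norm(prototypes[j])))
    return sum(sims) / len(sims)

probs = op_probabilities([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
reg = cosine_diversity_regularizer([[1.0, 0.0], [0.0, 1.0]])
```

Here h aligns with the first prototype, so the first operation gets the larger probability, and the orthogonal prototypes give a zero regularizer.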
- the architecture customization with prototype in module 220 can tailor the most suitable GNN architectures for different input graphs based on the graph representations. Besides GNN architectures, the weights of the architectures also need to be learned.
- a super-network module 230 may be adopted to obtain the weights of architectures. Specifically, in the super-network, all possible operations are jointly considered by mixing different operations into a continuous space as follows:
- the architecture is discretized at the end of the search phase by choosing the operation with the largest probability for all the layers. Then, the weights of the selected architecture are retrained. However, retraining is infeasible in the disclosed novel GNAS model, since test graphs can be tailored with different architectures from those for training graphs. Therefore, the weights of the super-network may be directly used as the weights in the searched architecture. Besides, the continuous architecture is kept without the discretization step, enhancing flexibility in architecture customization and simplifying the optimization strategy. Moreover, the customized super-network may serve as a strong ensemble model with the operation probabilities being the ensemble weights, which may also benefit out-of-distribution generalization.
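The continuous mixing in the super-network can be sketched as a probability-weighted sum of every candidate operation's output. This is an illustrative toy with scalar inputs and hypothetical candidate operations, not the claimed Eq. (9).

```python
def mixed_operation(x, ops, probs):
    """Continuous relaxation of one super-network layer: the output is
    the probability-weighted sum of all candidate operations' outputs,
    so every operation's weights receive gradient during training."""
    outputs = [op(x) for op in ops]
    return sum(p * o for p, o in zip(probs, outputs))

# Hypothetical candidate operations standing in for GNN layer types.
ops = [lambda x: 2 * x, lambda x: x + 1]
y = mixed_operation(3.0, ops, [0.5, 0.5])  # 0.5*6.0 + 0.5*4.0 = 5.0
```

Because the mixture is kept at test time (no discretization), the same weighted sum also acts as an ensemble over candidate operations.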
- the GNAS model 200 may be optimized by using gradient descent methods based on the following loss function:
- there are two groups of loss functions: the classification loss and the regularizer.
- at the beginning of training, the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is not yet informative, leading to unstable architecture customization. Therefore, a larger weight may be set for the regularizer initially, i.e., a smaller initial λ in Eq. (2), to force the self-supervised disentangled graph encoder to learn through its supervised learning and SSL tasks.
- λ_0 is the initial weight
- λ_t is the hyper-parameter value at the t-th epoch
- ε is a small constant
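The schedule for the loss-balancing weight can be sketched as a simple linear ramp, matching the example settings given below (0.07 increased to 0.5 linearly). The symbol and function name are illustrative reconstructions, since the original symbols are garbled in this text.

```python
def linear_anneal(t, total_epochs, lam_init=0.07, lam_final=0.5):
    """Linearly increase the loss-balancing weight from lam_init to
    lam_final over training, so the regularizer dominates early and the
    main loss dominates later."""
    frac = min(t / max(total_epochs - 1, 1), 1.0)
    return lam_init + frac * (lam_final - lam_init)

weights = [linear_anneal(t, 5) for t in range(5)]  # 0.07 ... 0.5
```

Starting small forces the encoder to learn through its supervised and SSL tasks first; the main loss takes over once the representations are informative.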
- the overall training procedure is shown as below.
- the most suitable GNN architecture with its parameters may be directly generated for the test graphs without retraining.
- different learning rates may be used for the three modules 210-230.
- the learning rate of the self-supervised disentangled encoder module may be 1.5e-4.
- the learning rate of the architecture customization module may be 1e-4.
- the training procedures of these two modules are scheduled with cosine annealing.
- the learning rate of the customized super-network module may be 2e-3.
- λ may be initialized as 0.07 and increased to 0.5 linearly.
- λ_1 may be set as 0.05.
- λ_2 may be set as 0.002.
- the number of layers may be set as 2 or 3.
- the disclosed GNAS approach is not limited to such settings.
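The cosine-annealing schedule mentioned for the encoder and customization modules can be sketched as below; the function name and the closed form (annealing to zero) are assumptions for illustration, using the example learning rate 1.5e-4 from the settings above.

```python
import math

def cosine_annealed_lr(t, total_epochs, lr_init):
    """Cosine annealing of a learning rate from lr_init down to 0 over
    training: lr(t) = 0.5 * lr_init * (1 + cos(pi * t / total))."""
    return 0.5 * lr_init * (1.0 + math.cos(math.pi * t / total_epochs))

lrs = [cosine_annealed_lr(t, 10, 1.5e-4) for t in range(11)]
```

The smooth decay lets the encoder and customization modules settle while the super-network, trained at a fixed higher rate, keeps adapting its shared weights.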
- FIG. 3 illustrates a flow chart of a method 300 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
- the method 300 may be used to design a graph neural network for a graph classification task by graph neural architecture search under distribution shifts between training graphs and test graphs.
- Method 300 may also be used for other graph machine learning tasks.
- the method 300 may be a computer-implemented method.
- the method 300 may obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space.
- the input graph may be an assembly line layout graph.
- the input graph may also be a circuit design layout graph, such as printed circuit board design or chip design.
- GNNs designed by method 300 may be used to make classification on the input assembly line layout graphs or circuit design layout graphs under distribution shifts. For example, the GNNs may classify an input assembly line layout graph or circuit design layout graph as whether the assembly line layout is reasonable or whether the circuit is efficient, etc.
- the graph encoder module may be a self-supervised disentangled graph encoder, which may characterize invariant factors hidden in diverse graph structures.
- the graph encoder module may calculate the graph representation of the input graph using Eq. (3) and Eq. (4) .
- the graph encoder module may be trained by a supervised learning task and a self-supervised learning task.
- the graph encoder module may calculate and using Eq. (5) and Eq. (6)
- the method 300 may obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture.
- the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation.
- the probability of the operation may be calculated using Eq. (7) for each layer of the searched GNN architecture.
- the candidate operation set may comprise at least one of a graph convolutional network (GCN), a graph attention network (GAT), a graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
- the candidate operation set may comprise other GNN layers, such as GraphConv.
- a pooling layer may be fixed at the end of the searched GNN architecture as a global mean pooling, or GIN may be fixed at the first layer for the Spurious-Motif dataset.
- the method 300 may obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space, such as, using Eq. (9) .
- the weights of the super-network may be directly used as the weights in the searched GNN architecture.
- the weights of the super-network may be shared among different GNN architectures, and thus the training will be much more efficient compared to training weights for different architectures separately.
- the trained weights may be directly obtained by the super-network for the searched GNN architecture.
- the method 300 may comprise optimizing the GNN based on a main loss function and a regularizer, which may comprise repeating blocks 310-330 until convergence.
- the main loss function may be a supervision loss of the searched GNN architecture such as in Eq. (2)
- the regularizer may be based on a supervised learning loss function such as in Eq. (5) and a self-supervised learning objective function such as in Eq. (6) for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations such as in Eq. (8) .
- since the self-supervised disentangled graph encoder may not have been properly trained early in training and the learned graph representation is not yet informative, a smaller initial weight for the main loss function may be set, and the weight may be gradually increased through the training procedure.
- FIG. 4 illustrates a block diagram of an apparatus 400 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
- the apparatus 400 may comprise a graph encoder module 410, an architecture customization module 420, and a super-network module 430.
- the graph encoder module 410 may be used for obtaining a graph representation of an input graph in a disentangled latent space.
- the architecture customization module 420 may be used for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation, such as in Eq. (7) .
- the super-network module 430 may be used for obtaining weights for the searched GNN architecture. Different operations in the candidate operation set are mixed into a continuous space in the super-network module 430. In the training stage, the operations performed by the graph encoder module 410, the architecture customization module 420, and the super-network module 430 may be repeated to optimize the GNN based on a main loss function and a regularizer using Eq. (2) .
- the apparatus 400 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs.
- the graph encoder module 410 may obtain a graph representation of an input graph with the parameters learned for a specific graph task.
- the architecture customization module 420 may obtain different GNN architectures for different input graphs based on a probability or weight of an operation in each layer.
- the super-network module 430 may obtain weights learned in the training stage and share these weights for different GNN architectures.
- FIG. 5 illustrates a block diagram of an apparatus 500 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
- the apparatus 500 may comprise a memory 510 and at least one processor 520.
- the processor 520 may be coupled to the memory 510 and configured to perform the method 300 described above with reference to FIG. 3.
- the processor 520 may be a general-purpose processor, or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- the memory 510 may store the input data, output data, data generated by processor 520, and/or instructions executed by processor 520.
- the apparatus 500 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs in accordance with the present disclosure.
- a computer program product for graph processing may comprise processor executable computer code for performing the method 300 described above with reference to FIG. 3.
- a computer readable medium may store computer code for graph processing, the computer code when executed by a processor may cause the processor to perform the method 300 described above with reference to FIG. 3.
- Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. Any connection may be properly termed as a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.
Abstract
A method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The method comprises obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space (310); obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation (320); and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space (330).
Description
The present disclosure relates generally to artificial intelligence technical field, and more particularly, to graph neural architecture search technology.
Graph-structured data has attracted lots of attention in recent years for its flexible representation ability in various domains. Graph neural network (GNN) models have been proposed and have achieved great success in many graph tasks. To save human effort on designing GNN architectures for different tasks and to automatically design more powerful GNNs, graph neural architecture search (GNAS) has been utilized to search for an optimal GNN architecture. These automatically designed architectures have achieved competitive or better performance compared with manually designed GNNs on datasets with the same distribution under the independently and identically distributed (I.I.D.) assumption, i.e., the training and test graphs are independently sampled from the identical distribution.
However, distribution shifts are ubiquitous and inevitable in real-world graph applications where there exist a large number of unforeseen and uncontrollable hidden factors. The existing GNAS approaches under the I. I. D. assumption only search a single fixed GNN architecture based on the training set before directly applying the selected architecture on the test set, failing to deal with varying distribution shifts under the out-of-distribution setting. Because the single GNN architecture discovered by existing methods may overfit the distributions of the training graph data, it may fail to make accurate predictions on test graph data with various distributions different from the training graph data.
Therefore, there exists a need for an improved method and apparatus for graph neural architecture search under distribution shifts between training graph data and test graph data.
SUMMARY
The following presents a simplified summary of one or more aspects according to the present disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect of the disclosure, a method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The method comprises: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The apparatus comprises: a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space; an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
In another aspect of the disclosure, an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The apparatus may comprise a memory and at least one processor coupled to the memory. The at least one processor may be configured to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, a computer readable medium storing computer code for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The computer code, when executed by a processor, may cause the processor to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, a computer program product for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The computer program product may comprise processor executable computer code for: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
Other aspects or variations of the disclosure will become apparent by consideration of the following detailed description and accompanying drawings.
The following figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the methods and structures disclosed herein may be implemented without departing from the spirit and principles of the disclosure described herein.
FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 3 illustrates a flow chart of a method for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 4 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 5 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
Before any embodiments of the present disclosure are explained in detail, it is to be understood that the disclosure is not limited in its application to the details of construction and the arrangement of features set forth in the following description. The disclosure is capable of other embodiments and of being practiced or of being carried out in various ways.
FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure. A graph is a non-linear data structure consisting of nodes and edges. The nodes may also be referred to as vertices, and the edges are lines or arcs that connect any two nodes in the graph. For example, as shown in FIG. 1, a simple graph 100 consists of nodes n1-n7 and edges e1-e6: edge e1 connects nodes n1 and n3, edge e2 connects nodes n2 and n4, and so on. Graph-structured data may be used in various domains, including social networks, information networks, biological networks, infrastructure networks, etc., which cannot be structured in Euclidean space. In an example, graph 100 may be an assembly line layout graph, where each node in the graph may represent a workshop, and each edge may represent the association between two workshops. In another example, graph 100 may be a circuit design layout graph, where each node in the graph may represent an electronic device or circuit module, and each edge may represent the association between two electronic devices or circuit modules.
Graph neural networks (GNNs) may learn node representations by a recursive message passing scheme in which nodes iteratively aggregate information from their neighbors. Then, taking the graph classification task as an example, GNNs may use pooling methods to derive graph-level representations. Different GNN architectures mainly differ in their message passing mechanisms, i.e., how information is exchanged, to adapt to the demands of different graph scenarios. Graph neural architecture search (GNAS) may be utilized for automatically designing GNN architectures for various graph tasks. However, when there is a distribution shift between training and test graphs, the existing approaches fail to adapt to unknown test graph structures, since they only search for a fixed architecture for all graphs. Taking drug discovery as an example, only a limited amount of training data can be obtained from experiments, and the interaction mechanism varies greatly for different molecules due to their complex chemical properties. Therefore, GNN models designed for drug discovery frequently have to be tested on data with distribution shifts.
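The message passing and global mean pooling described above can be sketched in a few lines of numpy. The mean-aggregation rule, the tiny path graph, and all function names below are illustrative assumptions for exposition, not part of the disclosed method:

```python
import numpy as np

def message_passing_layer(H, A, W):
    """One simplified GNN layer: each node averages the features of its
    neighbors (and itself) and applies a linear transform plus ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)    # node degrees (with self-loop)
    H_agg = (A_hat @ H) / deg                 # mean aggregation over neighbors
    return np.maximum(H_agg @ W, 0.0)         # linear + ReLU

def global_mean_pool(H):
    """Graph-level representation: average all node embeddings."""
    return H.mean(axis=0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # 3-node path graph
H = rng.normal(size=(3, 4))                   # node features
W = rng.normal(size=(4, 8))                   # layer weights
h_graph = global_mean_pool(message_passing_layer(H, A, W))
print(h_graph.shape)                          # (8,)
```

Real GNN layers such as GCN or GAT differ mainly in how the aggregation step above is weighted, which is exactly the design choice the search space covers.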
In this disclosure, an improved graph neural architecture search approach under distribution shifts is provided. Such a graph neural architecture search approach may be able to capture key information on graphs with widely varying distributions under the out-of-distribution settings through tailoring a unique GNN architecture for each graph instance.
Specifically, a self-supervised disentangled graph encoder is designed, which may project graphs into a disentangled latent space, where each disentangled factor in the space is trained by the supervised task and a corresponding self-supervised learning task simultaneously. This design is able to capture the key information hidden in graphs in a more controllable manner via the self-supervised disentangled graph representation, thus improving the ability of the representations to generalize under distribution shifts. Then, architecture customization with prototypes is adopted to tailor specialized GNN architectures for graphs based on the similarities of their representations with prototype vectors in the latent space, where each prototype vector corresponds to a different operation. Next, a customized super-network with differentiable weights on the mixture of different operations is designed, which has great flexibility to ensemble different combinations of operations and enables the disclosed GNAS model to be easily optimized in an end-to-end fashion through gradient-based methods. The designs of disentangled graph representations and learnable prototype-operation mapping are able to enhance the generalization ability of the disclosed GNAS model under distribution shifts. Extensive experiments on both synthetic and real-world graph datasets also show the superiority of the disclosed GNAS model over existing GNAS baselines.
For the purpose of ease of description, a graph space is denoted as 𝒢, a label space is denoted as 𝒴, a training graph dataset is denoted as G_tr, the corresponding training label set is denoted as Y_tr, a test graph dataset is denoted as G_te, and the corresponding test label set is denoted as Y_te. The goal of GNAS under distribution shifts is to design a model F: 𝒢 → 𝒴 using G_tr and Y_tr which works well on G_te and Y_te under the assumption that P (G_tr, Y_tr) ≠ P (G_te, Y_te) , where P (G_tr, Y_tr) denotes the probability distribution of the training graph dataset, and P (G_te, Y_te) denotes the probability distribution of the test graph dataset, i.e.,

F* = argmin_F E_{(g, y) ~ P (G_te, Y_te)} [ℓ (F (g) , y) ] , (1)

where ℓ is a loss function. In a common yet challenging setting, neither Y_te nor unlabeled G_te is available in the training phase. F may be a GNN for graph machine learning. A typical GNN consists of two parts: an architecture α ∈ 𝒜 and learnable weights w ∈ 𝒲, where 𝒜 and 𝒲 denote the architecture space and the weight space, respectively. Therefore, GNNs may be denoted as the following mapping function: F (·) = f (α, w) (·) .
This disclosure mostly focuses on different GNN layers, i.e., message-passing functions, for searching the GNN architecture. Therefore, a search space of layer-by-layer architectures without sophisticated connections such as residual or jumping connections is adopted, though the disclosed method can be easily generalized. In an embodiment, five widely used GNN layers may be used as an operation candidate set 𝒪, including graph convolutional network (GCN) , graph attention network (GAT) , graph isomorphism network (GIN) , graph sample and aggregate (SAGE) , and GraphConv. Besides, a multi-layer perceptron (MLP) , which does not consider graph structures, may also be adopted. A pooling layer may also be fixed at the end of the GNN architecture as the standard global mean pooling.
In this disclosure, instead of using a fixed GNN architecture for all graphs as in the existing GNAS methods, a GNN architecture may be customized for each graph. In this way, the disclosed GNAS method is more flexible and can better handle test graphs under distribution shifts, since it is known that different GNN architectures suit different graphs. Therefore, it is necessary to learn an architecture mapping function Φ_A: 𝒢 → 𝒜 and a weight mapping function Φ_w, so that these functions can automatically generate the optimal GNN for different graphs, including the architecture and its weights. Since the architecture only depends on the graph, the weight mapping function can be further simplified as Φ_w: 𝒜 → 𝒲. Therefore, Eq. (1) may be transformed into the following objective function:

min_{Φ_A, Φ_w} γ Σ_i ℓ (f (Φ_A (g_i) , Φ_w (Φ_A (g_i) ) ) (g_i) , y_i) + L_reg, (2)

where L_reg is the regularizer and γ is a hyper-parameter representing the weight of the main loss function. Specific embodiments for properly designing Φ_A, Φ_w, and L_reg will be described in detail in connection with FIG. 2 below, so that the disclosed GNAS method can generalize under distribution shifts.
FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure. As shown in FIG. 2, the GNAS model 200 comprises three cascaded modules, i.e., a self-supervised disentangled graph encoder module 210, an architecture customization module 220 with a prototype strategy, and a customized super-network module 230, to tailor a unique GNN architecture for each graph instance, thus enabling the model 200 to generalize under distribution shifts in non-I. I. D. settings. The graph encoder module 210 may capture diverse graph structures by a self-supervised loss and a supervised loss. Then, the architecture customization module 220 may tailor the most suitable GNN architecture based on the learned graph representation. Finally, the customized super-network module 230 may enable efficient training by weight sharing. Each of these modules will be described in detail below.
As shown in FIG. 2, graphs g_1, g_2, and g_3 with different structures are input into the graph encoder module 210. It can be understood that many more graphs may be input into the graph encoder module 210 during the training stage. These input graphs may have diverse graph structures from different distributions. To capture such diverse graph structures, the graph encoder module 210 may learn low-dimensional representations of graphs. In an embodiment, K GNNs may be adopted to learn K-chunk graph representations:

H_k^(l) = GNN_k^(l) (H_k^(l-1), A) , H^(l) = H_1^(l) || H_2^(l) || … || H_K^(l), (3)

where H_k^(l) is the k-th chunk of the node representation at the l-th layer, A is the adjacency matrix of the graph, and || represents concatenation. Different latent factors of the input graphs may be captured by using these disentangled GNN layers. Then, a readout layer may be adopted to aggregate node-level representations into a graph-level representation:

h = Readout (H^(L)) . (4)
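The K-chunk encoding and readout above can be sketched in numpy. The mean-aggregation GNN chunks and all names below are illustrative assumptions, not the disclosed encoder itself:

```python
import numpy as np

def disentangled_layer(H_chunks, A, Ws):
    """One disentangled layer in the spirit of Eq. (3): each of the K GNN
    chunks only transforms its own slice of the representation."""
    A_hat = A + np.eye(A.shape[0])            # self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return [np.maximum((A_hat @ Hk) / deg @ Wk, 0.0)
            for Hk, Wk in zip(H_chunks, Ws)]  # K independent chunks

def readout(H_chunks):
    """Eq. (4) sketch: mean-pool nodes per chunk, then concatenate, so the
    graph-level vector h keeps the chunk structure."""
    return np.concatenate([Hk.mean(axis=0) for Hk in H_chunks])

rng = np.random.default_rng(1)
K, n, d = 3, 5, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                # symmetric, no self-loops
chunks = [rng.normal(size=(n, d)) for _ in range(K)]
Ws = [rng.normal(size=(d, d)) for _ in range(K)]
h = readout(disentangled_layer(chunks, A, Ws))
print(h.shape)                                # (12,) = K * d
```

Because each chunk has its own parameters and its own slice of h, each can later be supervised by a different pseudo-label, which is what makes the representation disentangled.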
To learn the parameters of the self-supervised disentangled graph encoder module 210, a graph supervised learning task and a self-supervised learning task may be used simultaneously.

The downstream target graph task naturally provides supervision signals for learning the graph encoder module 210. Therefore, a classification layer may be placed after the obtained graph representation to get the prediction for the graph classification task. The graph representation for g_i may be denoted as h_i. The supervised learning loss is as follows:

L_sup = Σ_i ℓ (Classifier (h_i) , y_i) . (5)

Graph self-supervised learning (SSL) aims to learn informative graph representations through pretext tasks, which has shown several advantages including reducing label reliance, enhancing robustness, and improving model generalization ability. Therefore, graph SSL may also be used to complement the supervised learning task. Specifically, an SSL auxiliary task may be set by generating pseudo labels from graph structures, and the pseudo labels may be used as extra supervision signals. Besides, different pseudo labels may be adopted for different chunks of the disentangled GNN, so that the disentangled graph encoder 210 can capture different factors of the graph structure. In an embodiment, the graph encoder 210 may focus on the degree distribution of graphs as a representative and explainable structural feature, while it is straightforward to generalize to other graph structures. Specifically, for the k-th GNN chunk, the pseudo labels may be generated by calculating the ratio of nodes that exactly have degree k. Then, the SSL objective function may be formulated as:

L_ssl = Σ_k Σ_i ℓ_r (ŝ_i,k, s_i,k) , (6)

where s_i,k is the pseudo-label, ℓ_r is a regression loss, and ŝ_i,k may be obtained by adopting a regression function, such as a linear layer followed by an activation function, on the k-th chunk of the graph representation h_i. In an embodiment, the last chunk may be left without SSL tasks to allow more flexibility in learning the disentangled graph representations.
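The degree-ratio pseudo-labels for the SSL task can be computed directly from the adjacency matrix; the function name below is an illustrative assumption:

```python
import numpy as np

def degree_ratio_pseudo_labels(A, K):
    """Pseudo-label for the k-th chunk (k = 1..K): the fraction of nodes
    whose degree is exactly k, as described for the SSL auxiliary task."""
    deg = A.sum(axis=1).astype(int)   # node degrees from the adjacency matrix
    n = A.shape[0]
    return np.array([(deg == k).sum() / n for k in range(1, K + 1)])

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # degrees: 2, 2, 3, 1
labels = degree_ratio_pseudo_labels(A, K=3)
print(labels)                                  # [0.25 0.5  0.25]
```

Each of the first K-1 chunks then regresses its own entry of this vector, giving every chunk a distinct structural factor to capture.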
As shown in FIG. 2, after the graph representations h_1, h_2 and h_3 are obtained, they may be input into the architecture customization module 220, which maps the representations into different tailored GNN architectures with a prototype strategy. Specifically, the probability of choosing an operation o in the i-th layer of a searched architecture may be denoted as p_o^(i), where i ∈ {1, 2, …, N} , N is the number of layers, and Σ_{o∈𝒪} p_o^(i) = 1. The probability may be calculated as:

p_o^(i) = exp (h^T q_o^(i)) / Σ_{o'∈𝒪} exp (h^T q_{o'}^(i)) , (7)

where q_o^(i) is a learnable prototype vector representation of the operation o. An l_2 normalization on q may be adopted to ensure numerical stability and fair competition among different operations. In the architecture customization module 220, a prototype vector may be learned for each candidate operation, and operations may be selected based on the preferences of the graph, i.e., if the graph representation has a large projection on a prototype vector, its corresponding operation is more likely to be selected. Besides, by using the exponential function, the length of h can decide the shape of p^(i), i.e., the larger ||h||_2 is, the more likely that p^(i) is dominated by a few values, indicating that the graph requires specific operations.
In one embodiment, to avoid the mode collapse problem, i.e., that the vectors of different operations become similar and therefore indistinguishable, the following regularizer may be adopted based on cosine distances between vectors to keep the diversity of operations:

L_cos = Σ_i Σ_{o ≠ o'} cos (q_o^(i), q_{o'}^(i)) , (8)

where cos (·, ·) denotes the cosine similarity between two prototype vectors.
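The diversity regularizer of Eq. (8) can be sketched as the mean pairwise cosine similarity between prototypes (the exact aggregation over pairs is an assumption); minimizing it pushes the prototype vectors apart:

```python
import numpy as np

def cosine_diversity_loss(prototypes):
    """Eq. (8) sketch: average cosine similarity over all off-diagonal
    prototype pairs; low values mean diverse, well-separated prototypes."""
    Q = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    S = Q @ Q.T                              # pairwise cosine similarities
    n = S.shape[0]
    return (S.sum() - n) / (n * (n - 1))     # drop the diagonal, average pairs

collapsed = np.ones((4, 8))                  # identical prototypes (collapse)
spread = np.eye(4, 8)                        # mutually orthogonal prototypes
print(cosine_diversity_loss(collapsed))      # 1.0 (worst case)
print(cosine_diversity_loss(spread))         # 0.0
```

The two extreme inputs show why minimizing this term counteracts mode collapse: identical prototypes score 1.0, orthogonal ones 0.0.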
The architecture customization with prototype in module 220 can tailor the most suitable GNN architectures for different input graphs based on the graph representations. Besides GNN architectures, the weights of the architectures also need to be learned.
As shown in FIG. 2, a super-network module 230 may be adopted to obtain the weights of architectures. Specifically, in the super-network, all possible operations are jointly considered by mixing different operations into a continuous space as follows:

f_i (x) = Σ_{o∈𝒪} p_o^(i) · o (x) , (9)

where x is the input of the i-th layer and f_i (x) is the output. Then, all the weights may be optimized using gradient descent methods. Besides, since the weights of different architectures are shared, the training will be much more efficient compared to training weights for different architectures separately.
It should be noted that in most NAS models, the architecture is discretized at the end of the search phase by choosing the operation with the largest p_o^(i) for all the layers. Then, the weights of the selected architecture are retrained. However, retraining is infeasible in the disclosed novel GNAS model, since test graphs can be tailored with architectures different from those for the training graphs. Therefore, the weights of the super-network may be directly used as the weights in the searched architecture. Besides, the continuous architecture is kept without the discretization step, enhancing flexibility in architecture customization and simplifying the optimization strategy. Moreover, the customized super-network may serve as a strong ensemble model with p_o^(i) being the ensemble weights, which may also benefit out-of-distribution generalization.
As shown in FIG. 2, the GNAS model 200 may be optimized by using gradient descent methods based on the following loss function:

L = γ · L_f + L_sup + β_1 · L_ssl + β_2 · L_cos, (10)

where L_f is the supervision loss of the tailored architectures in Eq. (2) , i.e., the supervision loss of the final prediction given by the super-network module 230, L_sup and L_ssl are the supervision loss and the self-supervision loss of the self-supervised disentangled graph encoder module 210, L_cos is the cosine distance loss of the architecture customization module 220, and β_1 and β_2 are hyper-parameters. L_sup, L_ssl and L_cos are the three additional loss functions introduced as the regularizer in Eq. (2) , and γ is the hyper-parameter that controls the relative contribution of the regularizer.
For the overall optimization, there are two groups of loss functions: the classification loss and the regularizer. At an early stage of the training procedure, the self-supervised disentangled graph encoder may not yet have been properly trained and the learned graph representation may not be informative, leading to unstable architecture customization. Therefore, a larger relative weight may be set for the regularizer initially, i.e., a smaller initial γ in Eq. (2) , to force the self-supervised disentangled graph encoder to learn through its supervised learning and SSL tasks. As the training procedure continues, the model can gradually focus more on training the architecture customization module and the super-network module by increasing γ as:

γ_t = γ_0 + t · Δγ, (11)

where γ_t is the hyper-parameter value at the t-th epoch and Δγ is a small constant.
The overall training procedure is shown below. The most suitable GNN architecture with its parameters may be directly generated for the test graphs without retraining.

Input: Training dataset G_tr and Y_tr; hyper-parameters γ_0, Δγ, β_1, β_2
Initialize all learnable parameters and set γ = γ_0
while not converged do
Calculate graph representations h using Eq. (3) and Eq. (4)
Get the parameters from the super-network based on Eq. (9)
Calculate the overall loss in Eq. (2)
Update parameters using gradient descent
Update γ = γ + Δγ
end while
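The γ schedule of Eq. (11) and its role in the overall loss can be sketched as follows, using the γ_0 = 0.07 and target 0.5 reported in this disclosure; the scalar loss values are placeholders, not real training losses:

```python
import numpy as np

def gamma_schedule(gamma0, delta, t):
    """Eq. (11): linearly increase the main-loss weight over epochs."""
    return gamma0 + t * delta

def overall_loss(gamma, main_loss, reg_loss):
    """Sketch of the weighting in Eq. (2)/(10): gamma scales the main
    supervision loss relative to the regularizer."""
    return gamma * main_loss + reg_loss

# Illustrative schedule: gamma grows linearly from 0.07 toward 0.5.
gamma0, target, epochs = 0.07, 0.5, 100
delta = (target - gamma0) / epochs
losses = [overall_loss(gamma_schedule(gamma0, delta, t), 1.0, 0.3)
          for t in range(epochs + 1)]
print(round(gamma_schedule(gamma0, delta, epochs), 2))  # 0.5
```

Early in training the regularizer term dominates (small γ), which matches the stated intent of first stabilizing the graph encoder before emphasizing architecture customization.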
In an embodiment, different learning rates may be used for the three modules 210-230. For example, the learning rate of the self-supervised disentangled encoder module may be 1.5e-4, and the learning rate of the architecture customization module may be 1e-4, with the training of these two modules following a cosine annealing schedule. The learning rate of the customized super-network module may be 2e-3. γ may be initialized as 0.07 and increased linearly to 0.5. In addition, β_1 may be set as 0.05, and β_2 may be set as 0.002. The number of layers may be set as 2 or 3. The disclosed GNAS approach is not limited to such settings.
FIG. 3 illustrates a flow chart of a method 300 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. In an embodiment, the method 300 may be used to design a graph neural network for a graph classification task by graph neural architecture search under distribution shifts between training graphs and test graphs. Method 300 may also be used for other graph machine learning tasks. In an embodiment, the method 300 may be a computer-implemented method.
In block 310, the method 300 may obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space. The input graph may be an assembly line layout graph. The input graph may also be a circuit design layout graph, such as a printed circuit board design or a chip design. GNNs designed by the method 300 may be used to classify the input assembly line layout graphs or circuit design layout graphs under distribution shifts. For example, the GNNs may classify whether an assembly line layout is reasonable or whether a circuit is efficient, etc. The graph encoder module may be a self-supervised disentangled graph encoder, which may characterize invariant factors hidden in diverse graph structures. The graph encoder module may calculate the graph representation of the input graph using Eq. (3) and Eq. (4) . The graph encoder module may be trained by a supervised learning task and a self-supervised learning task, and may calculate L_sup and L_ssl using Eq. (5) and Eq. (6) .
In block 320, the method 300 may obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of the similarity between the obtained graph representation and a learnable prototype vector representation of the operation. In an embodiment, the probability of the operation may be calculated using Eq. (7) for each layer of the searched GNN architecture. The candidate operation set may comprise at least one of a graph convolutional network (GCN) , a graph attention network (GAT) , a graph isomorphism network (GIN) , and graph sample and aggregate (SAGE) . The candidate operation set may comprise other GNN layers, such as GraphConv. Depending on the graph task, base graph shapes, and/or datasets, an MLP which does not consider graph structures may be adopted, a pooling layer may be fixed at the end of the searched GNN architecture as a global mean pooling, or GIN may be fixed as the first layer for the Spurious-Motif dataset.
In block 330, the method 300 may obtain weights for the searched GNN architecture by a super-network module in which different operations in the candidate operation set are mixed into a continuous space, such as by using Eq. (9) . In an embodiment, the weights of the super-network may be directly used as the weights in the searched GNN architecture. The weights of the super-network may be shared among different GNN architectures, and thus the training will be much more efficient compared to training weights for different architectures separately. Then, in the test stage, the trained weights may be directly obtained from the super-network for the searched GNN architecture.
Although not shown in FIG. 3, in the training stage, the method 300 may comprise optimizing the GNN based on a main loss function and a regularizer, which may comprise repeating blocks 310-330 until convergence. The main loss function may be a supervision loss of the searched GNN architecture, such as in Eq. (2) , and the regularizer may be based on a supervised learning loss function, such as in Eq. (5) , and a self-supervised learning objective function, such as in Eq. (6) , for the graph encoder, and a cosine distance loss function between learnable prototype vector representations of the different operations, such as in Eq. (8) . In addition, since at the early stage of the training procedure the self-supervised disentangled graph encoder may not yet have been properly trained and the learned graph representation may not be informative, a smaller initial weight for the main loss function may be set, and the weight may be gradually increased throughout the training procedure.
FIG. 4 illustrates a block diagram of an apparatus 400 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. As shown in FIG. 4, the apparatus 400 may comprise a graph encoder module 410, an architecture customization module 420, and a super-network module 430.
The graph encoder module 410 may be used for obtaining a graph representation of an input graph in a disentangled latent space. The architecture customization module 420 may be used for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation, such as in Eq. (7) . The super-network module 430 may be used for obtaining weights for the searched GNN architecture. Different operations in the candidate operation set are mixed into a continuous space in the super-network module 430. In the training stage, the operations performed by the graph encoder module 410, the architecture customization module 420, and the super-network module 430 may be repeated to optimize the GNN based on a main loss function and a regularizer using Eq. (2) .
The apparatus 400 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs. For example, in the test stage, the graph encoder module 410 may obtain a graph representation of an input graph with the parameters learned for a specific graph task. The architecture customization module 420 may obtain different GNN architectures for different input graphs based on a probability or weight of an operation in each layer. The super-network module 430 may obtain weights learned in the training stage and share these weights for different GNN architectures.
FIG. 5 illustrates a block diagram of an apparatus 500 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. The apparatus 500 may comprise a memory 510 and at least one processor 520. The processor 520 may be coupled to the memory 510 and configured to perform the method 300 described above with reference to FIG. 3. The processor 520 may be a general-purpose processor, or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The memory 510 may store the input data, output data, data generated by processor 520, and/or instructions executed by processor 520. The apparatus 500 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs in accordance with the present disclosure.
The various operations, modules, and networks described in connection with the disclosure herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. According to an embodiment of the disclosure, a computer program product for designing GNNs may comprise processor executable computer code for performing the method 300 described above with reference to FIG. 3. According to another embodiment of the disclosure, a computer readable medium may store computer code for designing GNNs, the computer code when executed by a processor may cause the processor to perform the method 300 described above with reference to FIG. 3. Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. Any connection may be properly termed as a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the various embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the various embodiments. Thus, the claims are not intended to be limited to the embodiments shown herein but are to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.
Claims (18)
- A method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space;
obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and
obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
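The probability mapping and the continuous relaxation recited in claim 1 can be sketched in plain Python as follows. This is a minimal illustration, not the claimed implementation: the claim only requires that the probability be *a function of* similarity, so the choice of cosine similarity followed by a softmax, and the linear mixing of operation outputs in `mixed_operation`, are assumptions for the sake of the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def operation_probabilities(graph_repr, prototypes):
    """Per-layer probability of each candidate operation: a softmax over
    the similarity between the graph representation and each operation's
    learnable prototype vector (softmax is an illustrative choice)."""
    sims = [cosine_similarity(graph_repr, p) for p in prototypes]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_operation(graph_repr, prototypes, op_outputs):
    """Super-network relaxation: mix the outputs of the different
    candidate operations into a continuous space, weighting each
    operation's output by its probability."""
    probs = operation_probabilities(graph_repr, prototypes)
    dim = len(op_outputs[0])
    return [sum(p * out[i] for p, out in zip(probs, op_outputs))
            for i in range(dim)]
```

With the candidate set of claim 6 (GCN, GAT, GIN, SAGE), `prototypes` would hold four learnable vectors per layer and `op_outputs` the four operations' outputs on the same input, so that a graph whose representation lies close to one prototype is steered toward that operation.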
- The method of claim 1, wherein the input graph is an assembly line layout graph or a circuit design layout graph, and the designed GNN is used to make classification on the assembly line layout graph or the circuit design layout graph under distribution shifts.
- The method of claim 1, wherein the graph encoder module is trained by a supervised learning task and a self-supervised learning task.
- The method of claim 3, wherein the GNN is optimized based on a main loss function and a regularizer, and wherein the main loss function is a supervision loss of the searched GNN architecture, and the regularizer is based on a supervised learning loss function and a self-supervised learning objective function for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations.
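The objective of claim 4 can be sketched in plain Python as below. The claims do not fix the relative weighting of the regularizer terms or the exact form of the cosine-distance term, so the equal weighting and the mean-pairwise-similarity formulation here are assumptions made only to keep the sketch concrete.

```python
import math

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def prototype_cosine_loss(prototypes):
    """Mean pairwise cosine similarity between operation prototypes;
    minimizing it pushes the prototypes apart so that different
    operations remain distinguishable in the latent space."""
    pairs, total = 0, 0.0
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            total += _cosine(prototypes[i], prototypes[j])
            pairs += 1
    return total / pairs

def total_loss(main_loss, sup_loss, ssl_loss, prototypes, main_weight):
    """Main supervision loss of the searched architecture, plus the
    regularizer: the encoder's supervised loss, the encoder's
    self-supervised objective, and the prototype cosine term."""
    regularizer = sup_loss + ssl_loss + prototype_cosine_loss(prototypes)
    return main_weight * main_loss + regularizer
```

Here `main_weight` corresponds to the schedule of claim 5, which ramps up the influence of the searched architecture's own supervision as training proceeds.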
- The method of claim 4, wherein a weight for the main loss function is gradually increased through a training procedure.
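The gradually increasing weight of claim 5 could, for example, follow a simple linear warm-up. The linear shape and the cap at a maximum weight are illustrative assumptions; the claim only requires that the weight grow over training.

```python
def main_loss_weight(step, warmup_steps, max_weight=1.0):
    """Linearly ramp the main-loss weight from 0 to max_weight over the
    first warmup_steps of training, then hold it constant, so that the
    graph encoder and prototypes are shaped first and the searched
    architecture's own supervision dominates later."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)
```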
- The method of claim 1, wherein the candidate operation set comprises at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
- The method of claim 1, wherein a pooling layer is fixed at the end of the searched GNN architecture as a global mean pooling.
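The fixed readout of claim 7 is standard global mean pooling: all node embeddings of a graph are averaged into a single graph-level vector at the end of the searched architecture. A plain-Python sketch:

```python
def global_mean_pool(node_embeddings):
    """Average all node embeddings of one graph into a single
    graph-level representation (the pooling layer fixed at the end of
    the searched GNN architecture)."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(emb[i] for emb in node_embeddings) / n
            for i in range(dim)]
```

Fixing the readout removes the pooling choice from the search space, so the search concentrates on the per-layer message-passing operations.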
- An apparatus for a graph neural network (GNN) designed by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space;
an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and
a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
- The apparatus of claim 8, wherein the input graph is an assembly line layout graph or a circuit design layout graph, and the GNN is used to make classification on the assembly line layout graph or the circuit design layout graph under distribution shifts.
- The apparatus of claim 8, wherein the graph encoder module is trained by a supervised learning task and a self-supervised learning task.
- The apparatus of claim 10, wherein the GNN is optimized based on a main loss function and a regularizer, and wherein the main loss function is a supervision loss of the searched GNN architecture, and the regularizer is based on a supervised learning loss function and a self-supervised learning objective function for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations.
- The apparatus of claim 11, wherein a weight for the main loss function is gradually increased through a training procedure.
- The apparatus of claim 8, wherein the candidate operation set comprises at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
- The apparatus of claim 8, wherein a pooling layer is fixed at the end of the searched GNN architecture as a global mean pooling.
- An apparatus for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
a memory; and
at least one processor coupled to the memory and configured to perform the method of one of claims 1-7.
- A computer readable medium, storing computer code for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, the computer code, when executed by a processor, causing the processor to perform the method of one of claims 1-7.
- A computer program product for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising: processor executable computer code for performing the method of one of claims 1-7.
- A computer-implemented method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising steps of the method of one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/105600 WO2024011475A1 (en) | 2022-07-14 | 2022-07-14 | Method and apparatus for graph neural architecture search under distribution shift |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024011475A1 true WO2024011475A1 (en) | 2024-01-18 |
Family
ID=89535094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/105600 WO2024011475A1 (en) | 2022-07-14 | 2022-07-14 | Method and apparatus for graph neural architecture search under distribution shift |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024011475A1 (en) |
Non-Patent Citations (5)
- CHEN ZHENGYU; XIAO TENG; KUANG KUN: "BA-GNN: On Learning Bias-Aware Graph Neural Network", 2022 IEEE 38th International Conference on Data Engineering (ICDE), IEEE, 9 May 2022, pages 3012-3024, XP034159984, DOI: 10.1109/ICDE53745.2022.00271
- DING MUCONG; KONG KEZHI; CHEN JIUHAI; KIRCHENBAUER JOHN; GOLDBLUM MICAH; WIPF DAVID; HUANG FURONG; GOLDSTEIN TOM: "A Closer Look at Distribution Shifts and Out-of-Distribution Generalization on Graphs", Workshop on Distribution Shifts, 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 29 September 2021, pages 1-15, XP093127209
- QITIAN WU; HENGRUI ZHANG; JUNCHI YAN; DAVID WIPF: "Handling Distribution Shifts on Graphs: An Invariance Perspective", arXiv.org, 7 May 2022, XP091217381
- TIANXIANG ZHAO; DONGSHENG LUO; XIANG ZHANG; SUHANG WANG: "On Consistency in Graph Neural Network Interpretation", arXiv.org, 27 May 2022, XP091233479
- ZHU QI; PONOMAREVA NATALIA; HAN JIAWEI; PEROZZI BRYAN: "Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data", 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 22 May 2021, pages 1-13, XP093127219, ISSN: 2331-8422
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117688504A (en) * | 2024-02-04 | 2024-03-12 | 西华大学 | Internet of things abnormality detection method and device based on graph structure learning |
CN117688504B (en) * | 2024-02-04 | 2024-04-16 | 西华大学 | Internet of things abnormality detection method and device based on graph structure learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22950601; Country of ref document: EP; Kind code of ref document: A1 |