WO2024011475A1 - Method and apparatus for graph neural architecture search under distribution shift - Google Patents

Method and apparatus for graph neural architecture search under distribution shift

Info

Publication number
WO2024011475A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
gnn
architecture
searched
graphs
Application number
PCT/CN2022/105600
Other languages
French (fr)
Inventor
Xin Wang
Wenwu Zhu
Hong Chen
Ze CHENG
Original Assignee
Robert Bosch Gmbh
Tsinghua University
Application filed by Robert Bosch Gmbh, Tsinghua University filed Critical Robert Bosch Gmbh
Priority to PCT/CN2022/105600 priority Critical patent/WO2024011475A1/en
Publication of WO2024011475A1 publication Critical patent/WO2024011475A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/09 Supervised learning
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • the present disclosure relates generally to artificial intelligence technical field, and more particularly, to graph neural architecture search technology.
  • Graph-structured data has attracted lots of attention in recent years for its flexible representation ability in various domains.
  • Graph neural networks (GNNs) models have been proposed and achieved great successes in many graph tasks.
  • GNAS: graph neural architecture search
  • These automatically designed architectures have achieved competitive or better performances compared with manually designed GNNs on datasets with the same distributions under the independently and identically distributed (I. I. D. ) assumption, i.e., the training and test graphs are independently sampled from the identical distribution.
  • a method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs comprises: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs comprises: a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space; an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
  • an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs may comprise a memory and at least one processor coupled to the memory.
  • the at least one processor may be configured to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • a computer readable medium storing computer code for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • the computer code when executed by a processor, may cause the processor to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • a computer program product for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • the computer program product may comprise processor executable computer code for: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
  • FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 3 illustrates a flow chart of a method for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 4 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 5 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
  • a graph is a non-linear data structure consisting of nodes and edges.
  • the nodes may also be referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph.
  • a simple graph 100 consists of nodes n1-n7 and edges e1-e6, edge e1 connects node n1 and n3, edge e2 connects node n2 and n4, and so on.
  • Graph-structured data may be used in various domains, including social networks, information networks, biological networks, infrastructure networks, etc., which cannot be structured in Euclidean space.
  • graph 100 may be an assembly line layout graph, each node in the graph may represent a workshop, and each edge may represent the association between two workshops.
  • graph 100 may be a circuit design layout graph, each node in the graph may represent an electron device or circuit module, and each edge may represent the association between two electron devices or circuit modules.
  • Graph neural networks may learn node representations by a recursive message passing scheme where nodes aggregate information from their neighbors iteratively. Then, taking the graph classification task as an example, GNNs may use pooling methods to derive graph-level representations. Different GNN architectures mainly differ in their message passing mechanism, i.e., how to exchange information, to adapt to the demands of different graph scenarios.
  • Graph neural architecture search may be utilized for automatically designing GNN architectures for various graph tasks.
  • GNAS: graph neural architecture search
  • an improved graph neural architecture search approach under distribution shifts is provided.
  • Such a graph neural architecture search approach may be able to capture key information on graphs with widely varying distributions under the out-of-distribution settings through tailoring a unique GNN architecture for each graph instance.
  • a self-supervised disentangled graph encoder is designed, which may project graphs into a disentangled latent space, where each disentangled factor in the space is trained by the supervised task and corresponding self-supervised learning task simultaneously.
  • This design is able to capture the key information hidden in graphs in a more controllable manner via the self-supervised disentangled graph representation, thus improving the ability of the representations to generalize under distribution shifts.
  • architecture customization with prototype is adopted to tailor specialized GNN architectures for graphs based on the similarities of their representations with prototype vectors in the latent space, where each prototype vector corresponds to one different operation.
  • a graph space is denoted as $\mathcal{G}$, a label space is denoted as $\mathcal{Y}$, a training graph dataset is denoted as $G_{tr} \subset \mathcal{G}$, the corresponding training label set is denoted as $Y_{tr} \subset \mathcal{Y}$, a test graph dataset is denoted as $G_{te} \subset \mathcal{G}$, and the corresponding test label set is denoted as $Y_{te} \subset \mathcal{Y}$
  • the goal of GNAS under distribution shifts is to design a model $F: \mathcal{G} \rightarrow \mathcal{Y}$ using $G_{tr}$ and $Y_{tr}$ which works well on $G_{te}$ and $Y_{te}$ under the assumption that $P(G_{tr}, Y_{tr}) \neq P(G_{te}, Y_{te})$, where $P(G_{tr}, Y_{tr})$ denotes the probability distribution of the training graph dataset, and $P(G_{te}, Y_{te})$ denotes the probability distribution of the test graph dataset, i.e., as formalized in Eq. (1)
  • F may be GNNs for graph machine learning.
  • a typical GNN consists of two parts: an architecture $\alpha \in \mathcal{A}$ and learnable weights $w \in \mathcal{W}$, where $\mathcal{A}$ and $\mathcal{W}$ denote the architecture space and the weight space, respectively. Therefore, GNNs may be denoted as the mapping function $F = f_{\alpha, w}: \mathcal{G} \rightarrow \mathcal{Y}$
  • GCN: graph convolutional network
  • GAT: graph attention network
  • GIN: graph isomorphism network
  • SAGE: graph sample and aggregate
  • MLP: multi-layer perceptron
  • a pooling layer may also be fixed at the end of the GNN architecture as the standard global mean pooling.
  • a GNN architecture may be customized for each graph.
  • the disclosed GNAS method is more flexible and can better handle test graphs under distribution shift since it is known that different GNN architectures suit different graphs. Therefore, it is needed to learn an architecture mapping function $\Phi_A: \mathcal{G} \rightarrow \mathcal{A}$ and a weight mapping function $\Phi_W: \mathcal{G} \times \mathcal{A} \rightarrow \mathcal{W}$ so that these functions can automatically generate the optimal GNN for different graphs, including the architecture and its weights. Since the architecture only depends on the graph, the weight mapping function can be further simplified as $\Phi_W: \mathcal{A} \rightarrow \mathcal{W}$. Therefore, Eq. (1) may be transformed into the objective function of Eq. (2)
  • FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
  • the GNAS model 200 comprises three cascaded modules, i.e., a self-supervised disentangled graph encoder module 210, an architecture customization module 220 with prototype strategy, and a customized super-network module 230, to tailor a unique GNN architecture for each graph instance, thus enabling the model 200 to deal with generalization under distribution shifts in non-I. I. D. settings.
  • the graph encoder module 210 may capture diverse graph structures by a self-supervised and a supervised loss.
  • the architecture customization module 220 may tailor the most suitable GNN architecture based on the learned graph representation.
  • the customized super-network module 230 may enable efficient training by weight sharing.
  • Each of these modules will be described in detail below.
  • graphs $g_1$, $g_2$, and $g_3$ with different structures are input into the graph encoder module 210. It can be understood that many more input graphs may be input into the graph encoder module 210 during the training stage. These input graphs may have diverse graph structures from different distributions. To capture such diverse graph structures, the graph encoder module 210 may learn low-dimensional representations of graphs. In an embodiment, K GNNs may be adopted to learn K-chunk graph representations, as formulated in Eq. (3)
  • both graph supervised learning task and self-supervised learning task may be used simultaneously.
  • the downstream target graph task naturally provides supervision signals for learning the graph encoder module 210. Therefore, a classification layer may be placed after the obtained graph representation to get the prediction for the graph classification task.
  • the graph representation for $g_i$ may be denoted as $h_i$.
  • the supervised learning loss is given in Eq. (5)
  • Graph self-supervised learning aims to learn informative graph representation through pretext tasks, which has shown several advantages including reducing label reliance, enhancing robustness, and model generalization ability. Therefore, graph SSL may also be used to complement the supervised learning task.
  • an SSL auxiliary task may be set by generating pseudo labels from graph structures, and the pseudo labels may be used as extra supervision signals.
  • different pseudo labels may be adopted for different chunks of the disentangled GNN, so that the disentangled graph encoder 210 can capture different factors of the graph structure.
  • the graph encoder 210 may focus on the degree distribution of graphs as a representative and explainable structural feature, while it is straightforward to generalize to other graph structures.
  • the pseudo labels may be generated by calculating the ratio of nodes that exactly have degree k. Then, the SSL objective function may be formulated as in Eq. (6)
  • the prediction of the pseudo-label may be obtained by adopting a regression function, such as a linear layer followed by an activation function, on the k-th chunk of the graph representation $h_i$.
  • the last chunk may be left without SSL tasks to allow more flexibility in learning the disentangled graph representations.
  • the architecture customization module 220 maps the representations into different tailored GNN architectures with prototype strategy.
  • the probability of choosing an operation $o$ in the i-th layer of a searched architecture may be denoted as $p^{(i)}_o$, where $i \in \{1, 2, \dots, N\}$ and $N$ is the number of layers. The probability may be calculated as in Eq. (7)
  • a prototype vector may be learned for each candidate operation, and operations may be selected based on the preferences of the graph, i.e., if the graph representation has a large projection on a prototype vector, its corresponding operation is more likely to be selected. Besides, by using the exponential function, the length of $h$ can decide the shape of the probability distribution, i.e., the larger $\Vert h \Vert_2$, the more likely that the probabilities are dominated by a few values, indicating that the graph requires specific operations
  • a regularizer based on cosine distances between prototype vectors may be adopted to keep the diversity of operations, as given in Eq. (8)
  • the architecture customization with prototype in module 220 can tailor the most suitable GNN architectures for different input graphs based on the graph representations. Besides GNN architectures, the weights of the architectures also need to be learned.
  • a super-network module 230 may be adopted to obtain the weights of architectures. Specifically, in the super-network, all possible operations are jointly considered by mixing different operations into a continuous space, as formulated in Eq. (9)
  • the architecture is discretized at the end of the search phase by choosing the operation with the largest probability $p^{(i)}_o$ for all the layers. Then, the weights of the selected architecture are retrained. However, retraining is infeasible in the disclosed novel GNAS model, since test graphs can be tailored with different architectures from those for training graphs. Therefore, the weights of the super-network may be directly used as the weights in the searched architecture. Besides, the continuous architecture is kept without the discretization step, enhancing flexibility in architecture customization and simplifying the optimization strategy. Moreover, the customized super-network may serve as a strong ensemble model with the operation probabilities $p^{(i)}_o$ being the ensemble weights, which may also benefit out-of-distribution generalization.
  • the GNAS model 200 may be optimized by using gradient descent methods based on the loss function of Eq. (10)
  • for the overall optimization, there are two groups of loss functions: the classification loss and the regularizer.
  • the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is also not informative, leading to unstable architecture customization. Therefore, a larger weight may be set for the regularizer initially, i.e., a smaller initial $\gamma$ in Eq. (2) , to force the self-supervised disentangled graph encoder to learn through its supervised learning and SSL tasks.
  • the initial weight $\gamma$ may be increased according to Eq. (11), i.e., $\gamma_t = \gamma_0 + t \Delta\gamma$, where $\gamma_t$ is the hyper-parameter value at the t-th epoch and $\Delta\gamma$ is a small constant
  • the overall training procedure is shown as below.
  • the most suitable GNN architecture with its parameters may be directly generated for the test graphs without retraining.
  • different learning rates may be used for the three modules 210-230.
  • the learning rate of the self-supervised disentangled encoder module may be 1.5e-4.
  • the learning rate of the architecture customization module may be 1e-4.
  • the training procedures of these two modules are cosine-annealing scheduled.
  • the learning rate of the customized super-network module may be 2e-3.
  • $\gamma$ may be initialized as 0.07 and increased to 0.5 linearly.
  • $\beta_1$ may be set as 0.05
  • $\beta_2$ may be set as 0.002.
  • the number of layers may be set as 2 or 3.
  • the disclosed GNAS approach is not limited to such settings.
  • FIG. 3 illustrates a flow chart of a method 300 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • the method 300 may be used to design a graph neural network for a graph classification task by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • Method 300 may also be used for other graph machine learning tasks.
  • the method 300 may be a computer-implemented method.
  • the method 300 may obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space.
  • the input graph may be an assembly line layout graph.
  • the input graph may also be a circuit design layout graph, such as printed circuit board design or chip design.
  • GNNs designed by method 300 may be used to classify the input assembly line layout graphs or circuit design layout graphs under distribution shifts. For example, the GNNs may classify an input assembly line layout graph or circuit design layout graph according to whether the assembly line layout is reasonable or whether the circuit is efficient, etc.
  • the graph encoder module may be a self-supervised disentangled graph encoder, which may characterize invariant factors hidden in diverse graph structures.
  • the graph encoder module may calculate the graph representation of the input graph using Eq. (3) and Eq. (4) .
  • the graph encoder module may be trained by a supervised learning task and a self-supervised learning task.
  • the graph encoder module may calculate $\mathcal{L}_{sup}$ and $\mathcal{L}_{ssl}$ using Eq. (5) and Eq. (6)
  • the method 300 may obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture.
  • the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation.
  • the probability of the operation may be calculated using Eq. (7) for each layer of the searched GNN architecture.
  • the candidate operation set may comprise at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
  • the candidate operation set may comprise other GNN layers, such as GraphConv.
  • a pooling layer may be fixed at the end of the searched GNN architecture as a global mean pooling, or GIN may be fixed at the first layer for the Spurious-Motif dataset.
  • the method 300 may obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space, such as, using Eq. (9) .
  • the weights of the super-network may be directly used as the weights in the searched GNN architecture.
  • the weights of the super-network may be shared among different GNN architectures, and thus the training will be much more efficient compared to training weights for different architectures separately.
  • the trained weights may be directly obtained by the super-network for the searched GNN architecture.
  • the method 300 may comprise optimizing the GNN based on a main loss function and a regularizer, which may comprise repeating blocks 310-330 until convergence.
  • the main loss function may be a supervision loss of the searched GNN architecture such as in Eq. (2)
  • the regularizer may be based on a supervised learning loss function such as in Eq. (5) and a self-supervised learning objective function such as in Eq. (6) for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations such as in Eq. (8) .
  • the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is also not informative, so a smaller initial weight for the main loss function may be set, and the weight may be gradually increased through the training procedure.
  • FIG. 4 illustrates a block diagram of an apparatus 400 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • the apparatus 400 may comprise a graph encoder module 410, an architecture customization module 420, and a super-network module 430.
  • the graph encoder module 410 may be used for obtaining a graph representation of an input graph in a disentangled latent space.
  • the architecture customization module 420 may be used for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation, such as in Eq. (7) .
  • the super-network module 430 may be used for obtaining weights for the searched GNN architecture. Different operations in the candidate operation set are mixed into a continuous space in the super-network module 430. In the training stage, the operations performed by the graph encoder module 410, the architecture customization module 420, and the super-network module 430 may be repeated to optimize the GNN based on a main loss function and a regularizer using Eq. (2) .
  • the apparatus 400 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • the graph encoder module 410 may obtain a graph representation of an input graph with the parameters learned for a specific graph task.
  • the architecture customization module 420 may obtain different GNN architectures for different input graphs based on a probability or weight of an operation in each layer.
  • the super-network module 430 may obtain weights learned in the training stage and share these weights for different GNN architectures.
  • FIG. 5 illustrates a block diagram of an apparatus 500 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • the apparatus 500 may comprise a memory 510 and at least one processor 520.
  • the processor 520 may be coupled to the memory 510 and configured to perform the method 300 described above with reference to FIG. 3.
  • the processor 520 may be a general-purpose processor, or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • the memory 510 may store the input data, output data, data generated by processor 520, and/or instructions executed by processor 520.
  • the apparatus 500 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs in accordance with the present disclosure.
  • a computer program product for designing a GNN by graph neural architecture search may comprise processor executable computer code for performing the method 300 described above with reference to FIG. 3.
  • a computer readable medium may store computer code for designing a GNN by graph neural architecture search, the computer code when executed by a processor may cause the processor to perform the method 300 described above with reference to FIG. 3.
  • Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. Any connection may be properly termed as a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.

Abstract

A method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The method comprises obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space (310); obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation (320); and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space (330).

Description

METHOD AND APPARATUS FOR GRAPH NEURAL ARCHITECTURE SEARCH UNDER DISTRIBUTION SHIFT

FIELD
The present disclosure relates generally to artificial intelligence technical field, and more particularly, to graph neural architecture search technology.
BACKGROUND
Graph-structured data has attracted lots of attention in recent years for its flexible representation ability in various domains. Graph neural networks (GNNs) models have been proposed and achieved great successes in many graph tasks. To save human efforts on designing GNN architectures for different tasks and automatically design more powerful GNNs, graph neural architecture search (GNAS) has been utilized to search for an optimal GNN architecture. These automatically designed architectures have achieved competitive or better performances compared with manually designed GNNs on datasets with the same distributions under the independently and identically distributed (I. I. D. ) assumption, i.e., the training and test graphs are independently sampled from the identical distribution.
However, distribution shifts are ubiquitous and inevitable in real-world graph applications where there exist a large number of unforeseen and uncontrollable hidden factors. The existing GNAS approaches under the I. I. D. assumption only search a single fixed GNN architecture based on the training set before directly applying the selected architecture on the test set, failing to deal with varying distribution shifts under the out-of-distribution setting. Because the single GNN architecture discovered by existing methods may overfit the distributions of the training graph data, it may fail to make accurate predictions on test graph data with various distributions different from the training graph data.
Therefore, there exists a need for an improved method and apparatus for graph neural architecture search under distribution shifts between training graph data and test graph data.
SUMMARY
The following presents a simplified summary of one or more aspects according  to the present disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect of the disclosure, a method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The method comprises: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The apparatus comprises: a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space; an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
In another aspect of the disclosure, an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The apparatus may comprise a memory and at least one processor coupled to the memory. The at least one processor may be configured to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of  the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, a computer readable medium storing computer code for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The computer code, when executed by a processor, may cause the processor to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, a computer program product for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The computer program product may comprise processor executable computer code for: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
Other aspects or variations of the disclosure will become apparent by consideration of the following detailed description and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The following figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the methods and structures  disclosed herein may be implemented without departing from the spirit and principles of the disclosure described herein.
FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 3 illustrates a flow chart of a method for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 4 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 5 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
DETAILED DESCRIPTION
Before any embodiments of the present disclosure are explained in detail, it is to be understood that the disclosure is not limited in its application to the details of construction and the arrangement of features set forth in the following description. The disclosure is capable of other embodiments and of being practiced or of being carried out in various ways.
FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure. A graph is a non-linear data structure consisting of nodes and edges. The nodes may also be referred to as vertices, and the edges are lines or arcs that connect any two nodes in the graph. For example, as shown in FIG. 1, a simple graph 100 consists of nodes n1-n7 and edges e1-e6, edge e1 connects nodes n1 and n3, edge e2 connects nodes n2 and n4, and so on. Graph-structured data may be used in various domains, including social networks, information networks, biological networks, infrastructure networks, etc., which cannot be structured in Euclidean space. In an example, graph 100 may be an assembly line layout graph, where each node in the graph may represent a workshop, and each edge may represent the association between two workshops. In another example, graph 100 may be a circuit design layout graph, where each node in the graph may represent an electron device or circuit module, and each edge may represent the association between two electron devices or circuit modules.
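As a minimal illustration (not part of the patent text), a graph such as graph 100 could be encoded for GNN processing as a node-feature matrix plus an edge list, for example with PyTorch Geometric. Only edges e1 = (n1, n3) and e2 = (n2, n4) are specified above, so the remaining edges are omitted from this sketch, and the 8-dimensional node features are an arbitrary assumption:

import torch
from torch_geometric.data import Data

num_nodes = 7                                   # nodes n1..n7, indexed here as 0..6
edge_index = torch.tensor([[0, 2, 1, 3],        # e1: n1-n3 and e2: n2-n4, stored in
                           [2, 0, 3, 1]],       # both directions (undirected graph)
                          dtype=torch.long)
x = torch.ones(num_nodes, 8)                    # placeholder node features
graph = Data(x=x, edge_index=edge_index)
print(graph)                                    # Data(x=[7, 8], edge_index=[2, 4])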
Graph neural networks (GNNs) may learn node representations by a recursive message passing scheme where nodes aggregate information from their neighbors iteratively. Then, taking the graph classification task as an example, GNNs may use  pooling methods to derive graph-level representations. Different GNN architectures mainly differ in their message passing mechanism, i.e., how to exchange information, to adapt to the demands of different graph scenarios. Graph neural architecture search (GNAS) may be utilized for automatically designing GNN architectures for various graph tasks. However, when there is a distribution shift between training and test graphs, the existing approaches fail to deal with the problem of adapting to unknown test graph structures since they only search for a fixed architecture for all graphs. Taking drug discovery as an example, there exists only a limited amount of training data that can be obtained for experiments, and the interaction mechanism varies greatly for different molecules due to their complex chemical properties. Therefore, the GNN models designed for drug discovery frequently have to be tested on data with distribution shifts.
In this disclosure, an improved graph neural architecture search approach under distribution shifts is provided. Such a graph neural architecture search approach may be able to capture key information on graphs with widely varying distributions under the out-of-distribution settings through tailoring a unique GNN architecture for each graph instance.
Specifically, a self-supervised disentangled graph encoder is designed, which may project graphs into a disentangled latent space, where each disentangled factor in the space is trained by the supervised task and corresponding self-supervised learning task simultaneously. This design is able to capture the key information hidden in graphs in a more controllable manner via the self-supervised disentangled graph representation, thus improving the ability of the representations to generalize under distribution shifts. Then, architecture customization with prototype is adopted to tailor specialized GNN architectures for graphs based on the similarities of their representations with prototype vectors in the latent space, where each prototype vector corresponds to one different operation. Next, a customized super-network with differentiable weights on the mixture of different operations is designed, which has great flexibility to ensemble different combinations of operations and enables the disclosed GNAS model to be easily optimized in an end-to-end fashion through gradient-based methods. The designs of disentangled graph representations and learnable prototype-operation mapping are able to enhance the generalization ability of the disclosed GNAS model under distribution shifts. Extensive experiments on both synthetic and real-world graph datasets also show the superiority of the disclosed GNAS model over existing GNAS baselines.
For the purpose of ease of description, a graph space is denoted as $\mathcal{G}$, a label space is denoted as $\mathcal{Y}$, a training graph dataset is denoted as $G_{tr} \subset \mathcal{G}$, the corresponding training label set is denoted as $Y_{tr} \subset \mathcal{Y}$, a test graph dataset is denoted as $G_{te} \subset \mathcal{G}$, and the corresponding test label set is denoted as $Y_{te} \subset \mathcal{Y}$. The goal of GNAS under distribution shifts is to design a model $F: \mathcal{G} \rightarrow \mathcal{Y}$ using $G_{tr}$ and $Y_{tr}$ which works well on $G_{te}$ and $Y_{te}$ under the assumption that $P(G_{tr}, Y_{tr}) \neq P(G_{te}, Y_{te})$, where $P(G_{tr}, Y_{tr})$ denotes the probability distribution of the training graph dataset, and $P(G_{te}, Y_{te})$ denotes the probability distribution of the test graph dataset, i.e.,
$\min_{F} \; \mathbb{E}_{(g, y) \in (G_{te}, Y_{te})} \left[ l\big(F(g), y\big) \right]$ ,              (1)
where $l(\cdot, \cdot)$ is a loss function. In a common yet challenging setting, neither $Y_{te}$ nor unlabeled $G_{te}$ is available in the training phase. $F$ may be GNNs for graph machine learning. A typical GNN consists of two parts: an architecture $\alpha \in \mathcal{A}$ and learnable weights $w \in \mathcal{W}$, where $\mathcal{A}$ and $\mathcal{W}$ denote the architecture space and the weight space, respectively. Therefore, GNNs may be denoted as the following mapping function $F = f_{\alpha, w}: \mathcal{G} \rightarrow \mathcal{Y}$. This disclosure mostly focuses on different GNN layers, i.e., message-passing functions, for searching the GNN architecture. Therefore, a search space of layer-by-layer architectures without sophisticated connections such as residual or jumping connections is adopted, though the disclosed method can be easily generalized. In an embodiment, five widely used GNN layers may be used as an operation candidate set $\mathcal{O}$, including graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), graph sample and aggregate (SAGE), and GraphConv. Besides, a multi-layer perceptron (MLP), which does not consider graph structures, may also be adopted. A pooling layer may also be fixed at the end of the GNN architecture as the standard global mean pooling.
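As a hedged sketch of how such an operation candidate set could be instantiated in code (the layer classes are standard PyTorch Geometric layers; the hidden dimensions and the internal MLP of the GIN layer are illustrative assumptions, not part of the patent):

import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv, GINConv, SAGEConv, GraphConv

def make_candidate_ops(in_dim, out_dim):
    # one instance of every candidate message-passing operation for a single searchable layer;
    # an MLP-only operation that ignores edge_index could be added analogously
    return nn.ModuleDict({
        "gcn": GCNConv(in_dim, out_dim),
        "gat": GATConv(in_dim, out_dim),
        "gin": GINConv(nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                     nn.Linear(out_dim, out_dim))),
        "sage": SAGEConv(in_dim, out_dim),
        "graphconv": GraphConv(in_dim, out_dim),
    })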
In this disclosure, instead of using a fixed GNN architecture for all graphs as in the existing GNAS methods, a GNN architecture may be customized for each graph. In this way, the disclosed GNAS method is more flexible and can better handle test graphs under distribution shift, since it is known that different GNN architectures suit different graphs. Therefore, it is needed to learn an architecture mapping function $\Phi_A: \mathcal{G} \rightarrow \mathcal{A}$ and a weight mapping function $\Phi_W: \mathcal{G} \times \mathcal{A} \rightarrow \mathcal{W}$ so that these functions can automatically generate the optimal GNN for different graphs, including the architecture and its weights. Since the architecture only depends on the graph, the weight mapping function can be further simplified as $\Phi_W: \mathcal{A} \rightarrow \mathcal{W}$. Therefore, Eq. (1) may be transformed into the following objective function:
$\min_{\Phi_A, \Phi_W} \; \gamma \sum_{(g_i, y_i) \in (G_{tr}, Y_{tr})} l\big(f_{\Phi_A(g_i), \Phi_W(\Phi_A(g_i))}(g_i), y_i\big) + \mathcal{L}_{reg}$ ,              (2)
where $\mathcal{L}_{reg}$ is the regularizer and $\gamma$ is a hyper-parameter representing the weight of the main loss function. Specific embodiments for properly designing $\Phi_A$, $\Phi_W$, and $\mathcal{L}_{reg}$ will be described in detail in connection with FIG. 2 below, so that the disclosed GNAS method can generalize under distribution shifts.
FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure. As shown in FIG. 2, the GNAS model 200 comprises three cascaded modules, i.e., a self-supervised disentangled graph encoder module 210, an architecture customization module 220 with prototype strategy, and a customized super-network module 230, to tailor a unique GNN architecture for each graph instance, thus enabling the model 200 to deal with generalization under distribution shifts in non-I. I. D. settings. The graph encoder module 210 may capture diverse graph structures by a self-supervised and a supervised loss. Then, the architecture customization module 220 may tailor the most suitable GNN architecture based on the learned graph representation. Finally, the customized super-network module 230 may enable efficient training by weight sharing. Each of these modules will be described in detail below.
As shown in FIG. 2, graphs $g_1$, $g_2$, and $g_3$ with different structures are input into the graph encoder module 210. It can be understood that many more input graphs may be input into the graph encoder module 210 during the training stage. These input graphs may have diverse graph structures from different distributions. To capture such diverse graph structures, the graph encoder module 210 may learn low-dimensional representations of graphs. In an embodiment, K GNNs may be adopted to learn K-chunk graph representations:
$H^{(l)} = \big\Vert_{k=1}^{K} H^{(l)}_k, \quad H^{(l)}_k = \mathrm{GNN}^{(l)}_k\big(H^{(l-1)}, A\big)$ ,              (3)
where $H^{(l)}_k$ is the k-th chunk of the node representation at the l-th layer, $A$ is the adjacency matrix of the graph, and $\Vert$ represents concatenation. Different latent factors of the input graphs may be captured by using these disentangled GNN layers. Then, a readout layer may be adopted to aggregate node-level representations into a graph-level representation:
$h = \mathrm{Readout}\big(H^{(L)}\big)$ .              (4)
To learn the parameters of the self-supervised disentangled graph encoder module 210, both the graph supervised learning task and a self-supervised learning task may be used simultaneously.
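A minimal sketch of one possible implementation of the K-chunk disentangled encoder of Eq. (3) and Eq. (4) is given below; using GCNConv for every chunk GNN, ReLU activations, and a global mean pooling readout are assumptions made only for illustration:

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class DisentangledGraphEncoder(nn.Module):
    def __init__(self, in_dim, chunk_dim, num_chunks, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            d_in = in_dim if l == 0 else chunk_dim * num_chunks
            # K separate GNNs per layer, one per chunk (Eq. (3))
            self.layers.append(nn.ModuleList([GCNConv(d_in, chunk_dim)
                                              for _ in range(num_chunks)]))

    def forward(self, x, edge_index, batch):
        h = x
        for chunk_gnns in self.layers:
            # concatenate the chunk outputs along the feature dimension
            h = torch.cat([gnn(h, edge_index).relu() for gnn in chunk_gnns], dim=-1)
        # readout of Eq. (4): aggregate node-level chunks into a graph-level representation
        return global_mean_pool(h, batch)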
The downstream target graph task naturally provides supervision signals for learning the graph encoder module 210. Therefore, a classification layer may be placed after the obtained graph representation to get the prediction for the graph classification task. The graph representation for $g_i$ may be denoted as $h_i$. The supervised learning loss is as follows:
$\mathcal{L}_{sup} = \frac{1}{|G_{tr}|} \sum_{i=1}^{|G_{tr}|} l\big(f_{cls}(h_i), y_i\big)$ ,              (5)
where $f_{cls}(\cdot)$ is the classification layer.
Graph self-supervised learning (SSL) aims to learn informative graph representations through pretext tasks, which has shown several advantages including reducing label reliance, enhancing robustness, and improving model generalization ability. Therefore, graph SSL may also be used to complement the supervised learning task. Specifically, an SSL auxiliary task may be set by generating pseudo labels from graph structures, and the pseudo labels may be used as extra supervision signals. Besides, different pseudo labels may be adopted for different chunks of the disentangled GNN, so that the disentangled graph encoder 210 can capture different factors of the graph structure. In an embodiment, the graph encoder 210 may focus on the degree distribution of graphs as a representative and explainable structural feature, while it is straightforward to generalize to other graph structures. Specifically, for the k-th GNN chunk, the pseudo labels may be generated by calculating the ratio of nodes that exactly have degree k. Then, the SSL objective function may be formulated as:
$\mathcal{L}_{ssl} = \frac{1}{|G_{tr}|} \sum_{i=1}^{|G_{tr}|} \sum_{k=1}^{K-1} \big(\hat{d}_{i,k} - d_{i,k}\big)^2$ ,              (6)
where $d_{i,k}$ is the pseudo-label and $\hat{d}_{i,k}$ may be obtained by adopting a regression function, such as a linear layer followed by an activation function, on the k-th chunk of the graph representation $h_i$. In an embodiment, the last chunk may be left without SSL tasks to allow more flexibility in learning the disentangled graph representations.
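A hedged sketch of the degree-based pseudo-labels and a per-chunk regression head as described above; the sigmoid activation and the mean-squared-error loss are assumptions consistent with, but not dictated by, the text:

import torch
import torch.nn as nn
from torch_geometric.utils import degree

def degree_ratio_pseudo_labels(edge_index, num_nodes, num_chunks):
    # pseudo-label for chunk k (k = 1..K-1): the ratio of nodes with degree exactly k
    deg = degree(edge_index[0], num_nodes=num_nodes)
    return torch.stack([(deg == k).float().mean() for k in range(1, num_chunks)])

class ChunkRegressor(nn.Module):
    # predicts the pseudo-label of chunk k from the k-th chunk of the graph representation
    def __init__(self, chunk_dim):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(chunk_dim, 1), nn.Sigmoid())

    def forward(self, h_chunk):                 # h_chunk: [batch_size, chunk_dim]
        return self.head(h_chunk).squeeze(-1)

ssl_criterion = nn.MSELoss()                    # squared error between prediction and pseudo-label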
As shown in FIG. 2, after obtaining the graph representations $h_1$, $h_2$ and $h_3$, they may be input into the architecture customization module 220, which maps the representations into different tailored GNN architectures with a prototype strategy. Specifically, the probability of choosing an operation $o$ in the i-th layer of a searched architecture may be denoted as $p^{(i)}_o$, where $i \in \{1, 2, \dots, N\}$, $N$ is the number of layers, and $\sum_{o \in \mathcal{O}} p^{(i)}_o = 1$. The probability may be calculated as:
$p^{(i)}_o = \frac{\exp\big(\langle h, q^{(i)}_o \rangle\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\langle h, q^{(i)}_{o'} \rangle\big)}$ ,              (7)
where $q^{(i)}_o$ is a learnable prototype vector representation of the operation $o$. An $\ell_2$ normalization on $q$ may be adopted to ensure numerical stability and fair competition among different operations. In the architecture customization module 220, a prototype vector may be learned for each candidate operation, and operations may be selected based on the preferences of the graph, i.e., if the graph representation has a large projection on a prototype vector, its corresponding operation is more likely to be selected. Besides, by using the exponential function, the length of $h$ can decide the shape of the distribution $p^{(i)}$, i.e., the larger $\Vert h \Vert_2$, the more likely that the probabilities $p^{(i)}_o$ are dominated by a few values, indicating that the graph requires specific operations.
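A minimal sketch of the prototype-based customization of Eq. (7), assuming the reconstruction above: each (layer, operation) pair owns a learnable prototype vector, the prototypes are l2-normalized, and a softmax over inner products with the graph representation h yields the operation probabilities:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeCustomizer(nn.Module):
    def __init__(self, rep_dim, num_layers, num_ops):
        super().__init__()
        # one learnable prototype vector q_o^(i) per layer i and candidate operation o
        self.prototypes = nn.Parameter(torch.randn(num_layers, num_ops, rep_dim))

    def forward(self, h):                       # h: [batch_size, rep_dim]
        q = F.normalize(self.prototypes, dim=-1)            # l2 normalization on q
        logits = torch.einsum("bd,lod->blo", h, q)          # inner products <h, q_o^(i)>
        return logits.softmax(dim=-1)                       # p_o^(i): [batch_size, N, |O|]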
In one embodiment, to avoid the mode collapse problem, i.e., vectors of different operations are similar and therefore become indistinguishable, the following regularizer may be adopted based on cosine distances between vectors to keep the diversity of operations:
$\mathcal{L}_{cos} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{|\mathcal{O}|(|\mathcal{O}|-1)} \sum_{o \neq o'} \frac{\langle q^{(i)}_o, q^{(i)}_{o'} \rangle}{\Vert q^{(i)}_o \Vert_2 \Vert q^{(i)}_{o'} \Vert_2}$ .              (8)
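A hedged sketch of the diversity regularizer of Eq. (8) as reconstructed above, computed as the average pairwise cosine similarity between the prototype vectors within each layer (lower values mean more diverse operation prototypes):

import torch
import torch.nn.functional as F

def prototype_cosine_loss(prototypes):
    # prototypes: [num_layers, num_ops, rep_dim]
    q = F.normalize(prototypes, dim=-1)
    sim = torch.einsum("lod,lpd->lop", q, q)          # per-layer pairwise cosine similarities
    num_layers, num_ops = q.size(0), q.size(1)
    sim = sim - torch.eye(num_ops, device=q.device)   # remove the self-similarity terms
    return sim.sum() / (num_layers * num_ops * (num_ops - 1))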
The architecture customization with prototype in module 220 can tailor the most suitable GNN architectures for different input graphs based on the graph representations. Besides GNN architectures, the weights of the architectures also need to be learned.
As shown in FIG. 2, a super-network module 230 may be adopted to obtain the weights of architectures. Specifically, in the super-network, all possible operations are jointly considered by mixing different operations into a continuous space as follows:
$f_i(x) = \sum_{o \in \mathcal{O}} p^{(i)}_o \, o(x)$ ,              (9)
where $x$ is the input of the i-th layer and $f_i(x)$ is the output. Then, all the weights may be optimized using gradient descent methods. Besides, since weights of different architectures are shared, the training will be much more efficient compared to training weights for different architectures separately.
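A minimal sketch of one mixed layer of the customized super-network implementing Eq. (9); for readability the sketch weights the operations of a single graph at a time, and batching several graphs with different probabilities would require broadcasting the per-graph probabilities to their nodes:

import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv, SAGEConv, GraphConv

class MixedOperationLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # the operation weights below are shared by all graphs (weight sharing)
        self.ops = nn.ModuleList([GCNConv(in_dim, out_dim), GATConv(in_dim, out_dim),
                                  SAGEConv(in_dim, out_dim), GraphConv(in_dim, out_dim)])

    def forward(self, x, edge_index, p):        # p: [num_ops] probabilities p_o^(i) for this graph
        # f_i(x) = sum_o p_o^(i) * o(x)
        return sum(p[j] * op(x, edge_index) for j, op in enumerate(self.ops))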
It should be noted that in most NAS models, the architecture is discretized at the end of the search phase by choosing the operation with the largest $p^{(i)}_o$ for all the layers. Then, the weights of the selected architecture are retrained. However, retraining is infeasible in the disclosed novel GNAS model, since test graphs can be tailored with different architectures from those for training graphs. Therefore, the weights of the super-network may be directly used as the weights in the searched architecture. Besides, the continuous architecture is kept without the discretization step, enhancing flexibility in architecture customization and simplifying the optimization strategy. Moreover, the customized super-network may serve as a strong ensemble model with $p^{(i)}_o$ being the ensemble weights, which may also benefit out-of-distribution generalization.
As shown in FIG. 2, the GNAS model 200 may be optimized by using gradient descent methods based on the following loss function:
$\mathcal{L} = \gamma \mathcal{L}_{main} + \mathcal{L}_{sup} + \beta_1 \mathcal{L}_{ssl} + \beta_2 \mathcal{L}_{cos}$ ,              (10)
where $\mathcal{L}_{main}$ is the supervision loss of the tailored architectures in Eq. (2), i.e., the supervision loss of the final prediction given by the super-network module 230, $\mathcal{L}_{sup}$ and $\mathcal{L}_{ssl}$ are the supervision loss and self-supervision loss of the self-supervised disentangled graph encoder module 210, $\mathcal{L}_{cos}$ is the cosine distance loss of the architecture customization module 220, and $\beta_1$ and $\beta_2$ are hyper-parameters. $\mathcal{L}_{sup}$, $\mathcal{L}_{ssl}$, and $\mathcal{L}_{cos}$ are three additional loss functions introduced as the regularizer in Eq. (2), and $\gamma$ is the hyper-parameter in Eq. (2) that controls the relative contribution of the main loss and the regularizer.
For the overall optimization, there are two groups of loss functions: the classification loss and the regularizer. At an early stage of the training procedure, the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is not yet informative, leading to unstable architecture customization. Therefore, a larger weight may be set for the regularizer initially, i.e., a smaller initial $\gamma$ in Eq. (2), to force the self-supervised disentangled graph encoder to learn through its supervised learning and SSL tasks. As the training procedure continues, the optimization can gradually focus more on training the architecture customization module and the super-network module by increasing $\gamma$ as:
$\gamma_t = \gamma_0 + t \Delta\gamma$ ,              (11)
where $\gamma_t$ is the hyper-parameter value at the t-th epoch and $\Delta\gamma$ is a small constant.
The overall training procedure is shown as below. The most suitable GNN architecture with its parameters may be directly generated for the test graphs without retraining.
Input: training dataset $G_{tr}$ and $Y_{tr}$, hyper-parameters $\gamma_0$, $\Delta\gamma$, $\beta_1$, $\beta_2$
Initialize all learnable parameters and set $\gamma = \gamma_0$
while not converged do
    Calculate graph representations $h$ using Eq. (3) and Eq. (4)
    Calculate $\mathcal{L}_{sup}$ and $\mathcal{L}_{ssl}$ using Eq. (5) and Eq. (6)
    Calculate architecture probabilities $p^{(i)}_o$ using Eq. (7)
    Calculate $\mathcal{L}_{cos}$ using Eq. (8)
    Get the parameters from the super-network based on Eq. (9)
    Calculate the overall loss $\mathcal{L}$ in Eq. (10)
    Update parameters using gradient descent
    Update $\gamma = \gamma + \Delta\gamma$
end while
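The same procedure could be written schematically in Python as below; the module objects and helper functions (encoder, customizer, supernet, encoder_losses, prototype_cosine_loss) are assumed to follow the earlier sketches rather than being defined by the patent, and the default value of delta_gamma is an arbitrary assumption:

import torch.nn.functional as F

def train_gnas(loader, encoder, customizer, supernet, optimizers, num_epochs,
               gamma_0=0.07, delta_gamma=0.001, beta_1=0.05, beta_2=0.002):
    gamma = gamma_0                                               # set gamma = gamma_0
    for epoch in range(num_epochs):                               # stands in for "while not converged"
        for batch in loader:                                      # PyTorch Geometric mini-batches
            h = encoder(batch.x, batch.edge_index, batch.batch)   # Eq. (3) and Eq. (4)
            loss_sup, loss_ssl = encoder_losses(h, batch)         # Eq. (5) and Eq. (6), assumed helper
            p = customizer(h)                                     # Eq. (7)
            loss_cos = prototype_cosine_loss(customizer.prototypes)  # Eq. (8)
            logits = supernet(batch, p)                           # Eq. (9), assumed interface
            loss_main = F.cross_entropy(logits, batch.y)
            loss = gamma * loss_main + loss_sup + beta_1 * loss_ssl + beta_2 * loss_cos  # Eq. (10)
            for opt in optimizers:
                opt.zero_grad()
            loss.backward()
            for opt in optimizers:
                opt.step()
        gamma = gamma + delta_gamma                               # Eq. (11)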
In an embodiment, different learning rates may be used for the three modules 210-230. For example, the learning rate of the self-supervised disentangled encoder module may be 1.5e-4. The learning rate of the architecture customization module may be 1e-4. The training procedures of these two modules are cosine-annealing scheduled. The learning rate of the customized super-network module may be 2e-3. $\gamma$ may be initialized as 0.07 and increased to 0.5 linearly. In addition, $\beta_1$ may be set as 0.05, and $\beta_2$ may be set as 0.002. The number of layers may be set as 2 or 3. The disclosed GNAS approach is not limited to such settings.
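A hedged sketch of an optimizer set-up matching these example values; the choice of Adam and the argument names are assumptions, while the learning rates and the cosine annealing schedule follow the text above:

import torch

def build_optimizers(encoder, customizer, supernet, num_epochs):
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=1.5e-4)       # disentangled encoder module
    opt_cus = torch.optim.Adam(customizer.parameters(), lr=1e-4)      # architecture customization module
    opt_sup = torch.optim.Adam(supernet.parameters(), lr=2e-3)        # customized super-network module
    # cosine annealing for the first two modules, as described above
    schedulers = [torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=num_epochs)
                  for opt in (opt_enc, opt_cus)]
    return [opt_enc, opt_cus, opt_sup], schedulers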
FIG. 3 illustrates a flow chart of a method 300 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. In an embodiment, the method 300 may be used to design a graph neural network for a graph classification task by graph neural architecture search under distribution shifts between training graphs and test graphs. Method 300 may also be used for other graph machine learning tasks. In an embodiment, the method 300 may be a computer-implemented method.
In block 310, the method 300 may obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space. The input graph may be an assembly line layout graph. The input graph may also be a circuit design layout graph, such as printed circuit board design or chip design. GNNs designed by method 300 may be used to make classification on the input assembly line layout graphs or circuit design layout graphs under distribution shifts. For example, the GNNs may classify an input assembly line layout graph or circuit design layout graph as whether the assembly line layout is reasonable or whether the circuit is efficient, etc. The graph encoder module may be a self-supervised disentangled graph encoder, which may characterize invariant factors hidden in diverse graph structures. The graph encoder module may calculate the graph representation of the input graph using Eq. (3) and Eq. (4) . The graph encoder module may be trained by a supervised learning task and a self-supervised learning task. The graph encoder module may calculate
the supervised learning loss and the self-supervised learning objective using Eq. (5) and Eq. (6), respectively.
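As a purely illustrative sketch of a disentangled graph-level representation (the real module implements Eq. (3) and Eq. (4) with GNN message passing and self-supervised training; the separate linear channels and the mean pooling used here are assumptions):

import torch
from torch import nn


class DisentangledGraphEncoderSketch(nn.Module):
    """Toy stand-in for the self-supervised disentangled graph encoder.

    Each of the K channels is meant to capture one latent factor of the input
    graph; plain linear layers replace the GNN message passing of Eq. (3) and
    mean pooling replaces the graph-level readout of Eq. (4).
    """

    def __init__(self, in_dim: int, factor_dim: int, num_factors: int = 4):
        super().__init__()
        self.channels = nn.ModuleList(
            nn.Linear(in_dim, factor_dim) for _ in range(num_factors)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: node features of one input graph, shape (num_nodes, in_dim)
        factors = [channel(x).mean(dim=0) for channel in self.channels]
        return torch.cat(factors, dim=-1)  # graph representation h in the disentangled space


h = DisentangledGraphEncoderSketch(in_dim=16, factor_dim=16)(torch.randn(10, 16))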
In block 320, the method 300 may obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation. In an embodiment, the probability of the operation may be calculated using Eq. (7) for each layer of the searched GNN architecture. The candidate operation set may comprise at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE). The candidate operation set may comprise other GNN layers, such as GraphConv. Depending on different graph tasks, base graph shapes, and/or datasets, an MLP which does not consider graph structures may be adopted, a pooling layer may be fixed at the end of the searched GNN architecture as a global mean pooling, or GIN may be fixed at the first layer for the Spurious-Motif dataset.
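The prototype-based customization of block 320 may be pictured with the following sketch, in which each candidate operation owns a learnable prototype vector and the per-layer probabilities come from a softmax over similarities with the graph representation; the use of cosine similarity and a temperature is an assumption, and the exact form of Eq. (7) may differ.

import torch
import torch.nn.functional as F


def operation_probabilities(h: torch.Tensor, prototypes: torch.Tensor,
                            tau: float = 1.0) -> torch.Tensor:
    # h:          graph representation of one input graph, shape (d,)
    # prototypes: learnable prototype vectors, one per candidate operation, shape (num_ops, d)
    sims = F.cosine_similarity(h.unsqueeze(0), prototypes, dim=-1)  # one similarity per operation
    return F.softmax(sims / tau, dim=-1)                            # probabilities for this layer


# Example: 4 candidate operations (e.g. GCN, GAT, GIN, SAGE) for one layer.
prototypes = torch.nn.Parameter(torch.randn(4, 64))
p = operation_probabilities(torch.randn(64), prototypes)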
In block 330, the method 300 may obtain weights for the searched GNN architecture by a super-network module in which different operations in the candidate operation set are mixed into a continuous space, such as using Eq. (9). In an embodiment, the weights of the super-network may be directly used as the weights in the searched GNN architecture. The weights of the super-network may be shared among different GNN architectures, and thus the training will be much more efficient compared to training weights for different architectures separately. Then, in the test stage, the trained weights may be directly obtained from the super-network for the searched GNN architecture.
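The continuous mixing of block 330 may be pictured with the following sketch of one super-network layer; the linear stand-ins and the interface are illustrative, whereas the disclosure mixes shared-weight GNN operations such as GCN, GAT, GIN and SAGE.

import torch
from torch import nn


class MixedOpLayer(nn.Module):
    """One super-network layer: candidate operations mixed by per-graph probabilities."""

    def __init__(self, in_dim: int, out_dim: int, num_ops: int = 4):
        super().__init__()
        # Linear stand-ins; in the disclosure these would be GNN layers (GCN, GAT,
        # GIN, SAGE, ...) whose weights are shared across the searched architectures.
        self.ops = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(num_ops))

    def forward(self, x: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
        # p: operation probabilities for this graph and this layer, shape (num_ops,)
        return sum(prob * op(x) for prob, op in zip(p, self.ops))


layer = MixedOpLayer(64, 64)
x = torch.randn(10, 64)                    # toy node features of one input graph
p = torch.softmax(torch.randn(4), dim=0)   # per-graph probabilities, e.g. from Eq. (7)
out = layer(x, p)                          # shared weights are reused; no retraining per architecture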
Although not shown in FIG. 3, in the training stage, the method 300 may comprise optimizing the GNN based on a main loss function and a regularizer, which may comprise repeating blocks 310-330 until convergence. The main loss function may be a supervision loss of the searched GNN architecture such as in Eq. (2), and the regularizer may be based on a supervised learning loss function such as in Eq. (5) and a self-supervised learning objective function such as in Eq. (6) for the graph encoder, and a cosine distance loss function between learnable prototype vector representations of the different operations such as in Eq. (8). In addition, since at the early stage of the training procedure the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is not yet informative, a smaller initial weight may be set for the main loss function, and this weight may be gradually increased through the training procedure.
FIG. 4 illustrates a block diagram of an apparatus 400 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. As shown in FIG. 4, the apparatus 400 may comprise a graph encoder module 410, an  architecture customization module 420, and a super-network module 430.
The graph encoder module 410 may be used for obtaining a graph representation of an input graph in a disentangled latent space. The architecture customization module 420 may be used for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation, such as in Eq. (7) . The super-network module 430 may be used for obtaining weights for the searched GNN architecture. Different operations in the candidate operation set are mixed into a continuous space in the super-network module 430. In the training stage, the operations performed by the graph encoder module 410, the architecture customization module 420, and the super-network module 430 may be repeated to optimize the GNN based on a main loss function and a regularizer using Eq. (2) .
The apparatus 400 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs. For example, in the test stage, the graph encoder module 410 may obtain a graph representation of an input graph with the parameters learned for a specific graph task. The architecture customization module 420 may obtain different GNN architectures for different input graphs based on a probability or weight of an operation in each layer. The super-network module 430 may obtain weights learned in the training stage and share these weights for different GNN architectures.
FIG. 5 illustrates a block diagram of an apparatus 500 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. The apparatus 500 may comprise a memory 510 and at least one processor 520. The processor 520 may be coupled to the memory 510 and configured to perform the method 300 described above with reference to FIG. 3. The processor 520 may be a general-purpose processor, or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The memory 510 may store the input data, output data, data generated by processor 520, and/or instructions executed by processor 520. The apparatus 500 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs in accordance with the present disclosure.
The various operations, modules, and networks described in connection with the disclosure herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. According to an embodiment of the disclosure, a computer program product for graph neural architecture search may comprise processor-executable computer code for performing the method 300 described above with reference to FIG. 3. According to another embodiment of the disclosure, a computer-readable medium may store computer code for graph neural architecture search, the computer code, when executed by a processor, causing the processor to perform the method 300 described above with reference to FIG. 3. Computer-readable media include both non-transitory computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. Any connection may be properly termed a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the various embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the various embodiments. Thus, the claims are not intended to be limited to the embodiments shown herein but are to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.

Claims (18)

  1. A method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
    obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space;
    obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and
    obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  2. The method of claim 1, wherein the input graph is an assembly line layout graph or a circuit design layout graph, and the designed GNN is used to make classification on the assembly line layout graph or the circuit design layout graph under distribution shifts.
  3. The method of claim 1, wherein the graph encoder module is trained by a supervised learning task and a self-supervised learning task.
  4. The method of claim 3, wherein the GNN is optimized based on a main loss function and a regularizer, and wherein the main loss function is a supervision loss of the searched GNN architecture, and the regularizer is based on a supervised learning loss function and a self-supervised learning objective function for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations.
  5. The method of claim 4, wherein a weight for the main loss function is gradually increased through a training procedure.
  6. The method of claim 1, wherein the candidate operation set comprises at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
  7. The method of claim 1, wherein a pooling layer is fixed at the end of the searched GNN architecture as a global mean pooling.
  8. An apparatus for a graph neural network (GNN) designed by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
    a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space;
    an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and
    a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
  9. The apparatus of claim 8, wherein the input graph is an assembly line layout graph or a circuit design layout graph, and the GNN is used to make classification on the assembly line layout graph or the circuit design layout graph under distribution shifts.
  10. The apparatus of claim 8, wherein the graph encoder module is trained by a supervised learning task and a self-supervised learning task.
  11. The apparatus of claim 10, wherein the GNN is optimized based on a main loss function and a regularizer, and wherein the main loss function is a supervision loss of the searched GNN architecture, and the regularizer is based on a supervised learning loss function and a self-supervised learning objective function for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations.
  12. The apparatus of claim 11, wherein a weight for the main loss function is gradually increased through a training procedure.
  13. The apparatus of claim 8, wherein the candidate operation set comprises at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
  14. The apparatus of claim 8, wherein a pooling layer is fixed at the end of the searched GNN architecture as a global mean pooling.
  15. An apparatus for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
    a memory; and
    at least one processor coupled to the memory and configured to perform the method of one of claims 1-7.
  16. A computer readable medium, storing computer code for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, the computer code when executed by a processor, causing the processor to perform the method of one of claims 1-7.
  17. A computer program product for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising: processor executable computer code for performing the  method of one of claims 1-7.
  18. A computer-implemented method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising steps of the method of one of claims 1-7.
PCT/CN2022/105600 2022-07-14 2022-07-14 Method and apparatus for graph neural architecture search under distribution shift WO2024011475A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/105600 WO2024011475A1 (en) 2022-07-14 2022-07-14 Method and apparatus for graph neural architecture search under distribution shift


Publications (1)

Publication Number Publication Date
WO2024011475A1 true WO2024011475A1 (en) 2024-01-18



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688504A (en) * 2024-02-04 2024-03-12 西华大学 Internet of things abnormality detection method and device based on graph structure learning
CN117688504B (en) * 2024-02-04 2024-04-16 西华大学 Internet of things abnormality detection method and device based on graph structure learning


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950601

Country of ref document: EP

Kind code of ref document: A1