WO2024011475A1 - Method and apparatus for graph neural architecture search under distribution shift - Google Patents

Method and apparatus for graph neural architecture search under distribution shift

Info

Publication number
WO2024011475A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
gnn
architecture
searched
graphs
Application number
PCT/CN2022/105600
Other languages
French (fr)
Inventor
Xin Wang
Wenwu Zhu
Hong Chen
Ze CHENG
Original Assignee
Robert Bosch Gmbh
Tsinghua University
Application filed by Robert Bosch Gmbh, Tsinghua University filed Critical Robert Bosch Gmbh
Priority to PCT/CN2022/105600 priority Critical patent/WO2024011475A1/en
Publication of WO2024011475A1 publication Critical patent/WO2024011475A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/09 Supervised learning
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • the present disclosure relates generally to artificial intelligence technical field, and more particularly, to graph neural architecture search technology.
  • Graph-structured data has attracted lots of attention in recent years for its flexible representation ability in various domains.
  • Graph neural networks (GNNs) models have been proposed and achieved great successes in many graph tasks.
  • GNAS: graph neural architecture search
  • These automatically designed architectures have achieved competitive or better performances compared with manually designed GNNs on datasets with the same distributions under the independently and identically distributed (I. I. D. ) assumption, i.e., the training and test graphs are independently sampled from the identical distribution.
  • a method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs comprises: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs comprises: a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space; an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
  • an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs may comprise a memory and at least one processor coupled to the memory.
  • the at least one processor may be configured to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • a computer readable medium storing computer code for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • the computer code when executed by a processor, may cause the processor to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • a computer program product for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • the computer program product may comprise processor executable computer code for: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  • FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
  • FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 3 illustrates a flow chart of a method for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 4 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 5 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
  • a graph is a non-linear data structure consisting of nodes and edges.
  • the nodes may also be referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph.
  • a simple graph 100 consists of nodes n1-n7 and edges e1-e6, edge e1 connects node n1 and n3, edge e2 connects node n2 and n4, and so on.
  • Graph-structured data may be used in various domains, including social networks, information networks, biological networks, infrastructure networks, etc., which cannot be structured in Euclidean space.
  • graph 100 may be an assembly line layout graph, each node in the graph may represent a workshop, and each edge may represent the association between two workshops.
  • graph 100 may be a circuit design layout graph, each node in the graph may represent an electron device or circuit module, and each edge may represent the association between two electron devices or circuit modules.
  • Graph neural networks may learn node representations by a recursive message passing scheme where nodes aggregate information from their neighbors iteratively. Then, taking the graph classification task as an example, GNNs may use pooling methods to derive graph-level representations. Different GNN architectures mainly differ in their message passing mechanism, i.e., how to exchange information, to adapt to the demands of different graph scenarios.
  • Graph neural architecture search may be utilized for automatically designing GNN architectures for various graph tasks.
  • GNAS: graph neural architecture search
  • an improved graph neural architecture search approach under distribution shifts is provided.
  • Such a graph neural architecture search approach may be able to capture key information on graphs with widely varying distributions under the out-of-distribution settings through tailoring a unique GNN architecture for each graph instance.
  • a self-supervised disentangled graph encoder is designed, which may project graphs into a disentangled latent space, where each disentangled factor in the space is trained by the supervised task and corresponding self-supervised learning task simultaneously.
  • This design is able to capture the key information hidden in graphs in a more controllable manner via the self-supervised disentangled graph representation, thus improving the ability of the representations to generalize under distribution shifts.
  • architecture customization with prototype is adopted to tailor specialized GNN architectures for graphs based on the similarities of their representations with prototype vectors in the latent space, where each prototype vector corresponds to one different operation.
  • a graph space is denoted as $\mathcal{G}$, a label space is denoted as $\mathcal{Y}$, a training graph dataset is denoted as $G_{tr} \subset \mathcal{G}$, the corresponding training label set is denoted as $Y_{tr} \subset \mathcal{Y}$, a test graph dataset is denoted as $G_{te} \subset \mathcal{G}$, and the corresponding test label set is denoted as $Y_{te} \subset \mathcal{Y}$
  • the goal of GNAS under distribution shifts is to design a model $F: \mathcal{G} \rightarrow \mathcal{Y}$ using $G_{tr}$ and $Y_{tr}$ which works well on $G_{te}$ and $Y_{te}$ under the assumption that $P(G_{tr}, Y_{tr}) \neq P(G_{te}, Y_{te})$, where $P(G_{tr}, Y_{tr})$ denotes the probability distribution of the training graph dataset, and $P(G_{te}, Y_{te})$ denotes the probability distribution of the test graph dataset, i.e., as formalized in Eq. (1)
  • F may be GNNs for graph machine learning.
  • a typical GNN consists of two parts: an architecture $\alpha \in \mathcal{A}$ and learnable weights $w \in \mathcal{W}$, where $\mathcal{A}$ and $\mathcal{W}$ denote the architecture space and the weight space, respectively. Therefore, GNNs may be denoted as the mapping function $F = f_{\alpha, w}: \mathcal{G} \rightarrow \mathcal{Y}$
  • GCN: graph convolutional network
  • GAT: graph attention network
  • GIN: graph isomorphism network
  • SAGE: graph sample and aggregate
  • MLP: multi-layer perceptron
  • a pooling layer may also be fixed at the end of the GNN architecture as the standard global mean pooling.
  • a GNN architecture may be customized for each graph.
  • the disclosed GNAS method is more flexible and can better handle test graphs under distribution shift since it is known that different GNN architectures suit different graphs. Therefore, it is needed to learn an architecture mapping function $\Phi_A: \mathcal{G} \rightarrow \mathcal{A}$ and a weight mapping function $\Phi_W: \mathcal{G} \times \mathcal{A} \rightarrow \mathcal{W}$ so that these functions can automatically generate the optimal GNN for different graphs, including the architecture and its weights. Since the architecture only depends on the graph, the weight mapping function can be further simplified as $\Phi_W: \mathcal{A} \rightarrow \mathcal{W}$. Therefore, Eq. (1) may be transformed into the objective function of Eq. (2)
  • FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
  • the GNAS model 200 comprises three cascaded modules, i.e., a self-supervised disentangled graph encoder module 210, an architecture customization module 220 with prototype strategy, and a customized super-network module 230, to tailor a unique GNN architecture for each graph instance, thus enabling the model 200 to deal with generalization under distribution shifts in non-I. I. D. settings.
  • the graph encoder module 210 may capture diverse graph structures by a self-supervised and a supervised loss.
  • the architecture customization module 220 may tailor the most suitable GNN architecture based on the learned graph representation.
  • the customized super-network module 230 may enable efficient training by weight sharing.
  • Each of these modules will be described in detail below.
  • graphs $g_1$, $g_2$, and $g_3$ with different structures are input into the graph encoder module 210. It can be understood that many more input graphs may be input into the graph encoder module 210 during the training stage. These input graphs may have diverse graph structures from different distributions. To capture such diverse graph structures, the graph encoder module 210 may learn low-dimensional representations of graphs. In an embodiment, K GNNs may be adopted to learn K-chunk graph representations, as formulated in Eq. (3)
  • both graph supervised learning task and self-supervised learning task may be used simultaneously.
  • the downstream target graph task naturally provides supervision signals for learning the graph encoder module 210. Therefore, a classification layer may be placed after the obtained graph representation to get the prediction for the graph classification task.
  • the graph representation for $g_i$ may be denoted as $h_i$.
  • the supervised learning loss is given in Eq. (5)
  • Graph self-supervised learning aims to learn informative graph representation through pretext tasks, which has shown several advantages including reducing label reliance, enhancing robustness, and model generalization ability. Therefore, graph SSL may also be used to complement the supervised learning task.
  • an SSL auxiliary task may be set by generating pseudo labels from graph structures, and the pseudo labels may be used as extra supervision signals.
  • different pseudo labels may be adopted for different chunks of the disentangled GNN, so that the disentangled graph encoder 210 can capture different factors of the graph structure.
  • the graph encoder 210 may focus on the degree distribution of graphs as a representative and explainable structural feature, while it is straightforward to generalize to other graph structures.
  • the pseudo labels may be generated by calculating the ratio of nodes that exactly have degree k. Then, the SSL objective function may be formulated as in Eq. (6)
  • the prediction of the pseudo-label may be obtained by adopting a regression function, such as a linear layer followed by an activation function, on the k-th chunk of the graph representation $h_i$.
  • the last chunk may be left without SSL tasks to allow more flexibility in learning the disentangled graph representations.
  • the architecture customization module 220 maps the representations into different tailored GNN architectures with prototype strategy.
  • the probability of choosing an operation $o$ in the i-th layer of a searched architecture may be denoted as $p^{(i)}_o$, where $i \in \{1, 2, \dots, N\}$ and $N$ is the number of layers. The probability may be calculated as in Eq. (7)
  • a prototype vector may be learned for each candidate operation, and operations may be selected based on the preferences of the graph, i.e., if the graph representation has a large projection on a prototype vector, its corresponding operation is more likely to be selected. Besides, by using the exponential function, the length of $h$ can decide the shape of the probability distribution, i.e., the larger $\Vert h \Vert_2$, the more likely that the probabilities are dominated by a few values, indicating that the graph requires specific operations
  • a regularizer based on cosine distances between prototype vectors may be adopted to keep the diversity of operations, as given in Eq. (8)
  • the architecture customization with prototype in module 220 can tailor the most suitable GNN architectures for different input graphs based on the graph representations. Besides GNN architectures, the weights of the architectures also need to be learned.
  • a super-network module 230 may be adopted to obtain the weights of architectures. Specifically, in the super-network, all possible operations are jointly considered by mixing different operations into a continuous space, as formulated in Eq. (9)
  • the architecture is discretized at the end of the search phase by choosing the operation with the largest probability $p^{(i)}_o$ for all the layers. Then, the weights of the selected architecture are retrained. However, retraining is infeasible in the disclosed novel GNAS model, since test graphs can be tailored with different architectures from those for training graphs. Therefore, the weights of the super-network may be directly used as the weights in the searched architecture. Besides, the continuous architecture is kept without the discretization step, enhancing flexibility in architecture customization and simplifying the optimization strategy. Moreover, the customized super-network may serve as a strong ensemble model with the operation probabilities $p^{(i)}_o$ being the ensemble weights, which may also benefit out-of-distribution generalization.
  • the GNAS model 200 may be optimized by using gradient descent methods based on the loss function of Eq. (10)
  • for the overall optimization, there are two groups of loss functions: the classification loss and the regularizer.
  • the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is also not informative, leading to unstable architecture customization. Therefore, a larger weight may be set for the regularizer initially, i.e., a smaller initial $\gamma$ in Eq. (2) , to force the self-supervised disentangled graph encoder to learn through its supervised learning and SSL tasks.
  • the initial weight $\gamma$ may be increased according to Eq. (11), i.e., $\gamma_t = \gamma_0 + t \Delta\gamma$, where $\gamma_t$ is the hyper-parameter value at the t-th epoch and $\Delta\gamma$ is a small constant
  • the overall training procedure is shown as below.
  • the most suitable GNN architecture with its parameters may be directly generated for the test graphs without retraining.
  • different learning rates may be used for the three modules 210-230.
  • the learning rate of the self-supervised disentangled encoder module may be 1.5e-4.
  • the learning rate of the architecture customization module may be 1e-4.
  • the training procedures of these two modules are cosine-annealing scheduled.
  • the learning rate of the customized super-network module may be 2e-3.
  • $\gamma$ may be initialized as 0.07 and increased to 0.5 linearly.
  • $\beta_1$ may be set as 0.05
  • $\beta_2$ may be set as 0.002.
  • the number of layers may be set as 2 or 3.
  • the disclosed GNAS approach is not limited to such settings.
  • FIG. 3 illustrates a flow chart of a method 300 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • the method 300 may be used to design a graph neural network for a graph classification task by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • Method 300 may also be used for other graph machine learning tasks.
  • the method 300 may be a computer-implemented method.
  • the method 300 may obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space.
  • the input graph may be an assembly line layout graph.
  • the input graph may also be a circuit design layout graph, such as printed circuit board design or chip design.
  • GNNs designed by method 300 may be used to classify the input assembly line layout graphs or circuit design layout graphs under distribution shifts. For example, the GNNs may classify an input assembly line layout graph or circuit design layout graph according to whether the assembly line layout is reasonable or whether the circuit is efficient, etc.
  • the graph encoder module may be a self-supervised disentangled graph encoder, which may characterize invariant factors hidden in diverse graph structures.
  • the graph encoder module may calculate the graph representation of the input graph using Eq. (3) and Eq. (4) .
  • the graph encoder module may be trained by a supervised learning task and a self-supervised learning task.
  • the graph encoder module may calculate $\mathcal{L}_{sup}$ and $\mathcal{L}_{ssl}$ using Eq. (5) and Eq. (6)
  • the method 300 may obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture.
  • the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation.
  • the probability of the operation may be calculated using Eq. (7) for each layer of the searched GNN architecture.
  • the candidate operation set may comprise at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
  • the candidate operation set may comprise other GNN layers, such as GraphConv.
  • a pooling layer may be fixed at the end of the searched GNN architecture as a global mean pooling, or GIN may be fixed at the first layer for the Spurious-Motif dataset.
  • the method 300 may obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space, such as, using Eq. (9) .
  • the weights of the super-network may be directly used as the weights in the searched GNN architecture.
  • the weights of the super-network may be shared among different GNN architectures, and thus the training will be much more efficient compared to training weights for different architectures separately.
  • the trained weights may be directly obtained by the super-network for the searched GNN architecture.
  • the method 300 may comprise optimizing the GNN based on a main loss function and a regularizer, which may comprise repeating blocks 310-330 until convergence.
  • the main loss function may be a supervision loss of the searched GNN architecture such as in Eq. (2)
  • the regularizer may be based on a supervised learning loss function such as in Eq. (5) and a self-supervised learning objective function such as in Eq. (6) for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations such as in Eq. (8) .
  • the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is also not informative, so a smaller initial weight for the main loss function may be set, and the weight may be gradually increased through the training procedure.
  • FIG. 4 illustrates a block diagram of an apparatus 400 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • the apparatus 400 may comprise a graph encoder module 410, an architecture customization module 420, and a super-network module 430.
  • the graph encoder module 410 may be used for obtaining a graph representation of an input graph in a disentangled latent space.
  • the architecture customization module 420 may be used for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation, such as in Eq. (7) .
  • the super-network module 430 may be used for obtaining weights for the searched GNN architecture. Different operations in the candidate operation set are mixed into a continuous space in the super-network module 430. In the training stage, the operations performed by the graph encoder module 410, the architecture customization module 420, and the super-network module 430 may be repeated to optimize the GNN based on a main loss function and a regularizer using Eq. (2) .
  • the apparatus 400 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs.
  • the graph encoder module 410 may obtain a graph representation of an input graph with the parameters learned for a specific graph task.
  • the architecture customization module 420 may obtain different GNN architectures for different input graphs based on a probability or weight of an operation in each layer.
  • the super-network module 430 may obtain weights learned in the training stage and share these weights for different GNN architectures.
  • FIG. 5 illustrates a block diagram of an apparatus 500 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
  • the apparatus 500 may comprise a memory 510 and at least one processor 520.
  • the processor 520 may be coupled to the memory 510 and configured to perform the method 300 described above with reference to FIG. 3.
  • the processor 520 may be a general-purpose processor, or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • the memory 510 may store the input data, output data, data generated by processor 520, and/or instructions executed by processor 520.
  • the apparatus 500 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs in accordance with the present disclosure.
  • a computer program product for designing a GNN by graph neural architecture search may comprise processor executable computer code for performing the method 300 described above with reference to FIG. 3.
  • a computer readable medium may store computer code for designing a GNN by graph neural architecture search, the computer code when executed by a processor may cause the processor to perform the method 300 described above with reference to FIG. 3.
  • Computer-readable media includes both non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. Any connection may be properly termed as a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.

Abstract

A method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The method comprises obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space (310); obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation (320); and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space (330).

Description

METHOD AND APPARATUS FOR GRAPH NEURAL ARCHITECTURE SEARCH UNDER DISTRIBUTION SHIFT

FIELD
The present disclosure relates generally to artificial intelligence technical field, and more particularly, to graph neural architecture search technology.
BACKGROUND
Graph-structured data has attracted lots of attention in recent years for its flexible representation ability in various domains. Graph neural networks (GNNs) models have been proposed and achieved great successes in many graph tasks. To save human efforts on designing GNN architectures for different tasks and automatically design more powerful GNNs, graph neural architecture search (GNAS) has been utilized to search for an optimal GNN architecture. These automatically designed architectures have achieved competitive or better performances compared with manually designed GNNs on datasets with the same distributions under the independently and identically distributed (I. I. D. ) assumption, i.e., the training and test graphs are independently sampled from the identical distribution.
However, distribution shifts are ubiquitous and inevitable in real-world graph applications where there exist a large number of unforeseen and uncontrollable hidden factors. The existing GNAS approaches under the I. I. D. assumption only search a single fixed GNN architecture based on the training set before directly applying the selected architecture on the test set, failing to deal with varying distribution shifts under the out-of-distribution setting. Because the single GNN architecture discovered by existing methods may overfit the distributions of the training graph data, it may fail to make accurate predictions on test graph data with various distributions different from the training graph data.
Therefore, there exists a need for an improved method and apparatus for graph neural architecture search under distribution shifts between training graph data and test graph data.
SUMMARY
The following presents a simplified summary of one or more aspects according  to the present disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect of the disclosure, a method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The method comprises: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The apparatus comprises: a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space; an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
In another aspect of the disclosure, an apparatus for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The apparatus may comprise a memory and at least one processor coupled to the memory. The at least one processor may be configured to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of  the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, a computer readable medium storing computer code for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The computer code, when executed by a processor, may cause the processor to: obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtain weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
In another aspect of the disclosure, a computer program product for designing a GNN by graph neural architecture search under distribution shifts between training graphs and test graphs is disclosed. The computer program product may comprise processor executable computer code for: obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space; obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
Other aspects or variations of the disclosure will become apparent by consideration of the following detailed description and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The following figures depict various embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the methods and structures  disclosed herein may be implemented without departing from the spirit and principles of the disclosure described herein.
FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure.
FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 3 illustrates a flow chart of a method for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 4 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
FIG. 5 illustrates a block diagram of an apparatus for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure.
DETAILED DESCRIPTION
Before any embodiments of the present disclosure are explained in detail, it is to be understood that the disclosure is not limited in its application to the details of construction and the arrangement of features set forth in the following description. The disclosure is capable of other embodiments and of being practiced or of being carried out in various ways.
FIG. 1 illustrates a simplified example of a graph in accordance with one aspect of the present disclosure. A graph is a non-linear data structure consisting of nodes and edges. The nodes may also be referred to as vertices, and the edges are lines or arcs that connect any two nodes in the graph. For example, as shown in FIG. 1, a simple graph 100 consists of nodes n1-n7 and edges e1-e6, edge e1 connects nodes n1 and n3, edge e2 connects nodes n2 and n4, and so on. Graph-structured data may be used in various domains, including social networks, information networks, biological networks, infrastructure networks, etc., which cannot be structured in Euclidean space. In an example, graph 100 may be an assembly line layout graph, where each node in the graph may represent a workshop, and each edge may represent the association between two workshops. In another example, graph 100 may be a circuit design layout graph, where each node in the graph may represent an electron device or circuit module, and each edge may represent the association between two electron devices or circuit modules.
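As a minimal illustration (not part of the patent text), a graph such as graph 100 could be encoded for GNN processing as a node-feature matrix plus an edge list, for example with PyTorch Geometric. Only edges e1 = (n1, n3) and e2 = (n2, n4) are specified above, so the remaining edges are omitted from this sketch, and the 8-dimensional node features are an arbitrary assumption:

import torch
from torch_geometric.data import Data

num_nodes = 7                                   # nodes n1..n7, indexed here as 0..6
edge_index = torch.tensor([[0, 2, 1, 3],        # e1: n1-n3 and e2: n2-n4, stored in
                           [2, 0, 3, 1]],       # both directions (undirected graph)
                          dtype=torch.long)
x = torch.ones(num_nodes, 8)                    # placeholder node features
graph = Data(x=x, edge_index=edge_index)
print(graph)                                    # Data(x=[7, 8], edge_index=[2, 4])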
Graph neural networks (GNNs) may learn node representations by a recursive message passing scheme where nodes aggregate information from their neighbors iteratively. Then, taking the graph classification task as an example, GNNs may use  pooling methods to derive graph-level representations. Different GNN architectures mainly differ in their message passing mechanism, i.e., how to exchange information, to adapt to the demands of different graph scenarios. Graph neural architecture search (GNAS) may be utilized for automatically designing GNN architectures for various graph tasks. However, when there is a distribution shift between training and test graphs, the existing approaches fail to deal with the problem of adapting to unknown test graph structures since they only search for a fixed architecture for all graphs. Taking drug discovery as an example, there exists only a limited amount of training data that can be obtained for experiments, and the interaction mechanism varies greatly for different molecules due to their complex chemical properties. Therefore, the GNN models designed for drug discovery frequently have to be tested on data with distribution shifts.
In this disclosure, an improved graph neural architecture search approach under distribution shifts is provided. Such a graph neural architecture search approach may be able to capture key information on graphs with widely varying distributions under the out-of-distribution settings through tailoring a unique GNN architecture for each graph instance.
Specifically, a self-supervised disentangled graph encoder is designed, which may project graphs into a disentangled latent space, where each disentangled factor in the space is trained by the supervised task and corresponding self-supervised learning task simultaneously. This design is able to capture the key information hidden in graphs in a more controllable manner via the self-supervised disentangled graph representation, thus improving the ability of the representations to generalize under distribution shifts. Then, architecture customization with prototype is adopted to tailor specialized GNN architectures for graphs based on the similarities of their representations with prototype vectors in the latent space, where each prototype vector corresponds to one different operation. Next, a customized super-network with differentiable weights on the mixture of different operations is designed, which has great flexibility to ensemble different combinations of operations and enables the disclosed GNAS model to be easily optimized in an end-to-end fashion through gradient-based methods. The designs of disentangled graph representations and learnable prototype-operation mapping are able to enhance the generalization ability of the disclosed GNAS model under distribution shifts. Extensive experiments on both synthetic and real-world graph datasets also show the superiority of the disclosed GNAS model over existing GNAS baselines.
For the purpose of ease of description, a graph space is denoted as $\mathcal{G}$, a label space is denoted as $\mathcal{Y}$, a training graph dataset is denoted as $G_{tr} \subset \mathcal{G}$, the corresponding training label set is denoted as $Y_{tr} \subset \mathcal{Y}$, a test graph dataset is denoted as $G_{te} \subset \mathcal{G}$, and the corresponding test label set is denoted as $Y_{te} \subset \mathcal{Y}$. The goal of GNAS under distribution shifts is to design a model $F: \mathcal{G} \rightarrow \mathcal{Y}$ using $G_{tr}$ and $Y_{tr}$ which works well on $G_{te}$ and $Y_{te}$ under the assumption that $P(G_{tr}, Y_{tr}) \neq P(G_{te}, Y_{te})$, where $P(G_{tr}, Y_{tr})$ denotes the probability distribution of the training graph dataset, and $P(G_{te}, Y_{te})$ denotes the probability distribution of the test graph dataset, i.e.,
$\min_{F} \; \mathbb{E}_{(g, y) \in (G_{te}, Y_{te})} \left[ l\big(F(g), y\big) \right]$ ,              (1)
where $l(\cdot, \cdot)$ is a loss function. In a common yet challenging setting, neither $Y_{te}$ nor unlabeled $G_{te}$ is available in the training phase. $F$ may be GNNs for graph machine learning. A typical GNN consists of two parts: an architecture $\alpha \in \mathcal{A}$ and learnable weights $w \in \mathcal{W}$, where $\mathcal{A}$ and $\mathcal{W}$ denote the architecture space and the weight space, respectively. Therefore, GNNs may be denoted as the following mapping function $F = f_{\alpha, w}: \mathcal{G} \rightarrow \mathcal{Y}$. This disclosure mostly focuses on different GNN layers, i.e., message-passing functions, for searching the GNN architecture. Therefore, a search space of layer-by-layer architectures without sophisticated connections such as residual or jumping connections is adopted, though the disclosed method can be easily generalized. In an embodiment, five widely used GNN layers may be used as an operation candidate set $\mathcal{O}$, including graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), graph sample and aggregate (SAGE), and GraphConv. Besides, a multi-layer perceptron (MLP), which does not consider graph structures, may also be adopted. A pooling layer may also be fixed at the end of the GNN architecture as the standard global mean pooling.
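As a hedged sketch of how such an operation candidate set could be instantiated in code (the layer classes are standard PyTorch Geometric layers; the hidden dimensions and the internal MLP of the GIN layer are illustrative assumptions, not part of the patent):

import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv, GINConv, SAGEConv, GraphConv

def make_candidate_ops(in_dim, out_dim):
    # one instance of every candidate message-passing operation for a single searchable layer;
    # an MLP-only operation that ignores edge_index could be added analogously
    return nn.ModuleDict({
        "gcn": GCNConv(in_dim, out_dim),
        "gat": GATConv(in_dim, out_dim),
        "gin": GINConv(nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                     nn.Linear(out_dim, out_dim))),
        "sage": SAGEConv(in_dim, out_dim),
        "graphconv": GraphConv(in_dim, out_dim),
    })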
In this disclosure, instead of using a fixed GNN architecture for all graphs as in the existing GNAS methods, a GNN architecture may be customized for each graph. In this way, the disclosed GNAS method is more flexible and can better handle test graphs under distribution shift, since it is known that different GNN architectures suit different graphs. Therefore, it is needed to learn an architecture mapping function $\Phi_A: \mathcal{G} \rightarrow \mathcal{A}$ and a weight mapping function $\Phi_W: \mathcal{G} \times \mathcal{A} \rightarrow \mathcal{W}$ so that these functions can automatically generate the optimal GNN for different graphs, including the architecture and its weights. Since the architecture only depends on the graph, the weight mapping function can be further simplified as $\Phi_W: \mathcal{A} \rightarrow \mathcal{W}$. Therefore, Eq. (1) may be transformed into the following objective function:
$\min_{\Phi_A, \Phi_W} \; \gamma \sum_{(g_i, y_i) \in (G_{tr}, Y_{tr})} l\big(f_{\Phi_A(g_i), \Phi_W(\Phi_A(g_i))}(g_i), y_i\big) + \mathcal{L}_{reg}$ ,              (2)
where $\mathcal{L}_{reg}$ is the regularizer and $\gamma$ is a hyper-parameter representing the weight of the main loss function. Specific embodiments for properly designing $\Phi_A$, $\Phi_W$, and $\mathcal{L}_{reg}$ will be described in detail in connection with FIG. 2 below, so that the disclosed GNAS method can generalize under distribution shifts.
FIG. 2 illustrates a schematic model of graph neural architecture search under distribution shifts in accordance with one aspect of the present disclosure. As shown in FIG. 2, the GNAS model 200 comprises three cascaded modules, i.e., a self-supervised disentangled graph encoder module 210, an architecture customization module 220 with prototype strategy, and a customized super-network module 230, to tailor a unique GNN architecture for each graph instance, thus enabling the model 200 to deal with generalization under distribution shifts in non-I. I. D. settings. The graph encoder module 210 may capture diverse graph structures by a self-supervised and a supervised loss. Then, the architecture customization module 220 may tailor the most suitable GNN architecture based on the learned graph representation. Finally, the customized super-network module 230 may enable efficient training by weight sharing. Each of these modules will be described in detail below.
As shown in FIG. 2, graphs $g_1$, $g_2$, and $g_3$ with different structures are input into the graph encoder module 210. It can be understood that many more input graphs may be input into the graph encoder module 210 during the training stage. These input graphs may have diverse graph structures from different distributions. To capture such diverse graph structures, the graph encoder module 210 may learn low-dimensional representations of graphs. In an embodiment, K GNNs may be adopted to learn K-chunk graph representations:
$H^{(l)} = \big\Vert_{k=1}^{K} H^{(l)}_k, \quad H^{(l)}_k = \mathrm{GNN}^{(l)}_k\big(H^{(l-1)}, A\big)$ ,              (3)
where $H^{(l)}_k$ is the k-th chunk of the node representation at the l-th layer, $A$ is the adjacency matrix of the graph, and $\Vert$ represents concatenation. Different latent factors of the input graphs may be captured by using these disentangled GNN layers. Then, a readout layer may be adopted to aggregate node-level representations into a graph-level representation:
$h = \mathrm{Readout}\big(H^{(L)}\big)$ .              (4)
To learn the parameters of the self-supervised disentangled graph encoder module 210, both the graph supervised learning task and a self-supervised learning task may be used simultaneously.
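A minimal sketch of one possible implementation of the K-chunk disentangled encoder of Eq. (3) and Eq. (4) is given below; using GCNConv for every chunk GNN, ReLU activations, and a global mean pooling readout are assumptions made only for illustration:

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class DisentangledGraphEncoder(nn.Module):
    def __init__(self, in_dim, chunk_dim, num_chunks, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            d_in = in_dim if l == 0 else chunk_dim * num_chunks
            # K separate GNNs per layer, one per chunk (Eq. (3))
            self.layers.append(nn.ModuleList([GCNConv(d_in, chunk_dim)
                                              for _ in range(num_chunks)]))

    def forward(self, x, edge_index, batch):
        h = x
        for chunk_gnns in self.layers:
            # concatenate the chunk outputs along the feature dimension
            h = torch.cat([gnn(h, edge_index).relu() for gnn in chunk_gnns], dim=-1)
        # readout of Eq. (4): aggregate node-level chunks into a graph-level representation
        return global_mean_pool(h, batch)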
The downstream target graph task naturally provides supervision signals for learning the graph encoder module 210. Therefore, a classification layer may be placed after the obtained graph representation to get the prediction for the graph classification task. The graph representation for $g_i$ may be denoted as $h_i$. The supervised learning loss is as follows:
$\mathcal{L}_{sup} = \frac{1}{|G_{tr}|} \sum_{i=1}^{|G_{tr}|} l\big(f_{cls}(h_i), y_i\big)$ ,              (5)
where $f_{cls}(\cdot)$ is the classification layer.
Graph self-supervised learning (SSL) aims to learn informative graph representations through pretext tasks, which has shown several advantages including reducing label reliance, enhancing robustness, and improving model generalization ability. Therefore, graph SSL may also be used to complement the supervised learning task. Specifically, an SSL auxiliary task may be set by generating pseudo labels from graph structures, and the pseudo labels may be used as extra supervision signals. Besides, different pseudo labels may be adopted for different chunks of the disentangled GNN, so that the disentangled graph encoder 210 can capture different factors of the graph structure. In an embodiment, the graph encoder 210 may focus on the degree distribution of graphs as a representative and explainable structural feature, while it is straightforward to generalize to other graph structures. Specifically, for the k-th GNN chunk, the pseudo labels may be generated by calculating the ratio of nodes that exactly have degree k. Then, the SSL objective function may be formulated as:
$\mathcal{L}_{ssl} = \frac{1}{|G_{tr}|} \sum_{i=1}^{|G_{tr}|} \sum_{k=1}^{K-1} \big(\hat{d}_{i,k} - d_{i,k}\big)^2$ ,              (6)
where $d_{i,k}$ is the pseudo-label and $\hat{d}_{i,k}$ may be obtained by adopting a regression function, such as a linear layer followed by an activation function, on the k-th chunk of the graph representation $h_i$. In an embodiment, the last chunk may be left without SSL tasks to allow more flexibility in learning the disentangled graph representations.
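A hedged sketch of the degree-based pseudo-labels and a per-chunk regression head as described above; the sigmoid activation and the mean-squared-error loss are assumptions consistent with, but not dictated by, the text:

import torch
import torch.nn as nn
from torch_geometric.utils import degree

def degree_ratio_pseudo_labels(edge_index, num_nodes, num_chunks):
    # pseudo-label for chunk k (k = 1..K-1): the ratio of nodes with degree exactly k
    deg = degree(edge_index[0], num_nodes=num_nodes)
    return torch.stack([(deg == k).float().mean() for k in range(1, num_chunks)])

class ChunkRegressor(nn.Module):
    # predicts the pseudo-label of chunk k from the k-th chunk of the graph representation
    def __init__(self, chunk_dim):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(chunk_dim, 1), nn.Sigmoid())

    def forward(self, h_chunk):                 # h_chunk: [batch_size, chunk_dim]
        return self.head(h_chunk).squeeze(-1)

ssl_criterion = nn.MSELoss()                    # squared error between prediction and pseudo-label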
As shown in FIG. 2, after obtaining the graph representations $h_1$, $h_2$ and $h_3$, they may be input into the architecture customization module 220, which maps the representations into different tailored GNN architectures with a prototype strategy. Specifically, the probability of choosing an operation $o$ in the i-th layer of a searched architecture may be denoted as $p^{(i)}_o$, where $i \in \{1, 2, \dots, N\}$, $N$ is the number of layers, and $\sum_{o \in \mathcal{O}} p^{(i)}_o = 1$. The probability may be calculated as:
$p^{(i)}_o = \frac{\exp\big(\langle h, q^{(i)}_o \rangle\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\langle h, q^{(i)}_{o'} \rangle\big)}$ ,              (7)
where $q^{(i)}_o$ is a learnable prototype vector representation of the operation $o$. An $\ell_2$ normalization on $q$ may be adopted to ensure numerical stability and fair competition among different operations. In the architecture customization module 220, a prototype vector may be learned for each candidate operation, and operations may be selected based on the preferences of the graph, i.e., if the graph representation has a large projection on a prototype vector, its corresponding operation is more likely to be selected. Besides, by using the exponential function, the length of $h$ can decide the shape of the distribution $p^{(i)}$, i.e., the larger $\Vert h \Vert_2$, the more likely that the probabilities $p^{(i)}_o$ are dominated by a few values, indicating that the graph requires specific operations.
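A minimal sketch of the prototype-based customization of Eq. (7), assuming the reconstruction above: each (layer, operation) pair owns a learnable prototype vector, the prototypes are l2-normalized, and a softmax over inner products with the graph representation h yields the operation probabilities:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeCustomizer(nn.Module):
    def __init__(self, rep_dim, num_layers, num_ops):
        super().__init__()
        # one learnable prototype vector q_o^(i) per layer i and candidate operation o
        self.prototypes = nn.Parameter(torch.randn(num_layers, num_ops, rep_dim))

    def forward(self, h):                       # h: [batch_size, rep_dim]
        q = F.normalize(self.prototypes, dim=-1)            # l2 normalization on q
        logits = torch.einsum("bd,lod->blo", h, q)          # inner products <h, q_o^(i)>
        return logits.softmax(dim=-1)                       # p_o^(i): [batch_size, N, |O|]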
In one embodiment, to avoid the mode collapse problem, i.e., vectors of different operations are similar and therefore become indistinguishable, the following regularizer may be adopted based on cosine distances between vectors to keep the diversity of operations:
$\mathcal{L}_{cos} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{|\mathcal{O}|(|\mathcal{O}|-1)} \sum_{o \neq o'} \frac{\langle q^{(i)}_o, q^{(i)}_{o'} \rangle}{\Vert q^{(i)}_o \Vert_2 \Vert q^{(i)}_{o'} \Vert_2}$ .              (8)
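A hedged sketch of the diversity regularizer of Eq. (8) as reconstructed above, computed as the average pairwise cosine similarity between the prototype vectors within each layer (lower values mean more diverse operation prototypes):

import torch
import torch.nn.functional as F

def prototype_cosine_loss(prototypes):
    # prototypes: [num_layers, num_ops, rep_dim]
    q = F.normalize(prototypes, dim=-1)
    sim = torch.einsum("lod,lpd->lop", q, q)          # per-layer pairwise cosine similarities
    num_layers, num_ops = q.size(0), q.size(1)
    sim = sim - torch.eye(num_ops, device=q.device)   # remove the self-similarity terms
    return sim.sum() / (num_layers * num_ops * (num_ops - 1))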
The architecture customization with prototype in module 220 can tailor the most suitable GNN architectures for different input graphs based on the graph representations. Besides GNN architectures, the weights of the architectures also need to be learned.
As shown in FIG. 2, a super-network module 230 may be adopted to obtain the weights of architectures. Specifically, in the super-network, all possible operations are jointly considered by mixing different operations into a continuous space as follows:
$f_i(x) = \sum_{o \in \mathcal{O}} p^{(i)}_o \, o(x)$ ,              (9)
where $x$ is the input of the i-th layer and $f_i(x)$ is the output. Then, all the weights may be optimized using gradient descent methods. Besides, since weights of different architectures are shared, the training will be much more efficient compared to training weights for different architectures separately.
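A minimal sketch of one mixed layer of the customized super-network implementing Eq. (9); for readability the sketch weights the operations of a single graph at a time, and batching several graphs with different probabilities would require broadcasting the per-graph probabilities to their nodes:

import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv, SAGEConv, GraphConv

class MixedOperationLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # the operation weights below are shared by all graphs (weight sharing)
        self.ops = nn.ModuleList([GCNConv(in_dim, out_dim), GATConv(in_dim, out_dim),
                                  SAGEConv(in_dim, out_dim), GraphConv(in_dim, out_dim)])

    def forward(self, x, edge_index, p):        # p: [num_ops] probabilities p_o^(i) for this graph
        # f_i(x) = sum_o p_o^(i) * o(x)
        return sum(p[j] * op(x, edge_index) for j, op in enumerate(self.ops))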
It should be noted that in most NAS models, the architecture is discretized at the end of the search phase by choosing the operation with the largest $p^{(i)}_o$ for all the layers. Then, the weights of the selected architecture are retrained. However, retraining is infeasible in the disclosed novel GNAS model, since test graphs can be tailored with different architectures from those for training graphs. Therefore, the weights of the super-network may be directly used as the weights in the searched architecture. Besides, the continuous architecture is kept without the discretization step, enhancing flexibility in architecture customization and simplifying the optimization strategy. Moreover, the customized super-network may serve as a strong ensemble model with $p^{(i)}_o$ being the ensemble weights, which may also benefit out-of-distribution generalization.
As shown in FIG. 2, the GNAS model 200 may be optimized by using gradient descent methods based on the following loss function:
$\mathcal{L} = \gamma \mathcal{L}_{main} + \mathcal{L}_{sup} + \beta_1 \mathcal{L}_{ssl} + \beta_2 \mathcal{L}_{cos}$ ,              (10)
where $\mathcal{L}_{main}$ is the supervision loss of the tailored architectures in Eq. (2), i.e., the supervision loss of the final prediction given by the super-network module 230, $\mathcal{L}_{sup}$ and $\mathcal{L}_{ssl}$ are the supervision loss and self-supervision loss of the self-supervised disentangled graph encoder module 210, $\mathcal{L}_{cos}$ is the cosine distance loss of the architecture customization module 220, and $\beta_1$ and $\beta_2$ are hyper-parameters. $\mathcal{L}_{sup}$, $\mathcal{L}_{ssl}$, and $\mathcal{L}_{cos}$ are three additional loss functions introduced as the regularizer in Eq. (2), and $\gamma$ is the hyper-parameter in Eq. (2) that controls the relative contribution of the main loss and the regularizer.
For the overall optimization, there are two groups of loss functions: the classification loss and the regularizer. At an early stage of the training procedure, the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is not yet informative, leading to unstable architecture customization. Therefore, a larger weight may be set for the regularizer initially, i.e., a smaller initial $\gamma$ in Eq. (2), to force the self-supervised disentangled graph encoder to learn through its supervised learning and SSL tasks. As the training procedure continues, the optimization can gradually focus more on training the architecture customization module and the super-network module by increasing $\gamma$ as:
$\gamma_t = \gamma_0 + t \Delta\gamma$ ,              (11)
where $\gamma_t$ is the hyper-parameter value at the t-th epoch and $\Delta\gamma$ is a small constant.
The overall training procedure is shown as below. The most suitable GNN architecture with its parameters may be directly generated for the test graphs without retraining.
Input: training dataset $G_{tr}$ and $Y_{tr}$, hyper-parameters $\gamma_0$, $\Delta\gamma$, $\beta_1$, $\beta_2$
Initialize all learnable parameters and set $\gamma = \gamma_0$
while not converged do
    Calculate graph representations $h$ using Eq. (3) and Eq. (4)
    Calculate $\mathcal{L}_{sup}$ and $\mathcal{L}_{ssl}$ using Eq. (5) and Eq. (6)
    Calculate architecture probabilities $p^{(i)}_o$ using Eq. (7)
    Calculate $\mathcal{L}_{cos}$ using Eq. (8)
    Get the parameters from the super-network based on Eq. (9)
    Calculate the overall loss $\mathcal{L}$ in Eq. (10)
    Update parameters using gradient descent
    Update $\gamma = \gamma + \Delta\gamma$
end while
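The same procedure could be written schematically in Python as below; the module objects and helper functions (encoder, customizer, supernet, encoder_losses, prototype_cosine_loss) are assumed to follow the earlier sketches rather than being defined by the patent, and the default value of delta_gamma is an arbitrary assumption:

import torch.nn.functional as F

def train_gnas(loader, encoder, customizer, supernet, optimizers, num_epochs,
               gamma_0=0.07, delta_gamma=0.001, beta_1=0.05, beta_2=0.002):
    gamma = gamma_0                                               # set gamma = gamma_0
    for epoch in range(num_epochs):                               # stands in for "while not converged"
        for batch in loader:                                      # PyTorch Geometric mini-batches
            h = encoder(batch.x, batch.edge_index, batch.batch)   # Eq. (3) and Eq. (4)
            loss_sup, loss_ssl = encoder_losses(h, batch)         # Eq. (5) and Eq. (6), assumed helper
            p = customizer(h)                                     # Eq. (7)
            loss_cos = prototype_cosine_loss(customizer.prototypes)  # Eq. (8)
            logits = supernet(batch, p)                           # Eq. (9), assumed interface
            loss_main = F.cross_entropy(logits, batch.y)
            loss = gamma * loss_main + loss_sup + beta_1 * loss_ssl + beta_2 * loss_cos  # Eq. (10)
            for opt in optimizers:
                opt.zero_grad()
            loss.backward()
            for opt in optimizers:
                opt.step()
        gamma = gamma + delta_gamma                               # Eq. (11)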
In an embodiment, different learning rates may be used for the three modules 210-230. For example, the learning rate of the self-supervised disentangled encoder module may be 1.5e-4. The learning rate of the architecture customization module may be 1e-4. The training procedures of these two modules are cosine-annealing scheduled. The learning rate of the customized super-network module may be 2e-3. $\gamma$ may be initialized as 0.07 and increased to 0.5 linearly. In addition, $\beta_1$ may be set as 0.05, and $\beta_2$ may be set as 0.002. The number of layers may be set as 2 or 3. The disclosed GNAS approach is not limited to such settings.
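A hedged sketch of an optimizer set-up matching these example values; the choice of Adam and the argument names are assumptions, while the learning rates and the cosine annealing schedule follow the text above:

import torch

def build_optimizers(encoder, customizer, supernet, num_epochs):
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=1.5e-4)       # disentangled encoder module
    opt_cus = torch.optim.Adam(customizer.parameters(), lr=1e-4)      # architecture customization module
    opt_sup = torch.optim.Adam(supernet.parameters(), lr=2e-3)        # customized super-network module
    # cosine annealing for the first two modules, as described above
    schedulers = [torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=num_epochs)
                  for opt in (opt_enc, opt_cus)]
    return [opt_enc, opt_cus, opt_sup], schedulers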
FIG. 3 illustrates a flow chart of a method 300 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. In an embodiment, the method 300 may be used to design a graph neural network for a graph classification task by graph neural architecture search under distribution shifts between training graphs and test graphs. Method 300 may also be used for other graph machine learning tasks. In an embodiment, the method 300 may be a computer-implemented method.
In block 310, the method 300 may obtain, by a graph encoder module, a graph representation of an input graph in a disentangled latent space. The input graph may be an assembly line layout graph. The input graph may also be a circuit design layout graph, such as printed circuit board design or chip design. GNNs designed by method 300 may be used to make classification on the input assembly line layout graphs or circuit design layout graphs under distribution shifts. For example, the GNNs may classify an input assembly line layout graph or circuit design layout graph as whether the assembly line layout is reasonable or whether the circuit is efficient, etc. The graph encoder module may be a self-supervised disentangled graph encoder, which may characterize invariant factors hidden in diverse graph structures. The graph encoder module may calculate the graph representation of the input graph using Eq. (3) and Eq. (4) . The graph encoder module may be trained by a supervised learning task and a self-supervised learning task. The graph encoder module may calculate
the supervised learning loss and the self-supervised learning objective using Eq. (5) and Eq. (6), respectively.
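As a purely illustrative sketch of a disentangled graph-level representation (the real module implements Eq. (3) and Eq. (4) with GNN message passing and self-supervised training; the separate linear channels and the mean pooling used here are assumptions):

import torch
from torch import nn


class DisentangledGraphEncoderSketch(nn.Module):
    """Toy stand-in for the self-supervised disentangled graph encoder.

    Each of the K channels is meant to capture one latent factor of the input
    graph; plain linear layers replace the GNN message passing of Eq. (3) and
    mean pooling replaces the graph-level readout of Eq. (4).
    """

    def __init__(self, in_dim: int, factor_dim: int, num_factors: int = 4):
        super().__init__()
        self.channels = nn.ModuleList(
            nn.Linear(in_dim, factor_dim) for _ in range(num_factors)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: node features of one input graph, shape (num_nodes, in_dim)
        factors = [channel(x).mean(dim=0) for channel in self.channels]
        return torch.cat(factors, dim=-1)  # graph representation h in the disentangled space


h = DisentangledGraphEncoderSketch(in_dim=16, factor_dim=16)(torch.randn(10, 16))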
In block 320, the method 300 may obtain, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation. In an embodiment, the probability of the operation may be calculated using Eq. (7) for each layer of the searched GNN architecture. The candidate operation set may comprise at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE). The candidate operation set may comprise other GNN layers, such as GraphConv. Depending on different graph tasks, base graph shapes, and/or datasets, an MLP which does not consider graph structures may be adopted, a pooling layer may be fixed at the end of the searched GNN architecture as a global mean pooling, or GIN may be fixed at the first layer for the Spurious-Motif dataset.
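The prototype-based customization of block 320 may be pictured with the following sketch, in which each candidate operation owns a learnable prototype vector and the per-layer probabilities come from a softmax over similarities with the graph representation; the use of cosine similarity and a temperature is an assumption, and the exact form of Eq. (7) may differ.

import torch
import torch.nn.functional as F


def operation_probabilities(h: torch.Tensor, prototypes: torch.Tensor,
                            tau: float = 1.0) -> torch.Tensor:
    # h:          graph representation of one input graph, shape (d,)
    # prototypes: learnable prototype vectors, one per candidate operation, shape (num_ops, d)
    sims = F.cosine_similarity(h.unsqueeze(0), prototypes, dim=-1)  # one similarity per operation
    return F.softmax(sims / tau, dim=-1)                            # probabilities for this layer


# Example: 4 candidate operations (e.g. GCN, GAT, GIN, SAGE) for one layer.
prototypes = torch.nn.Parameter(torch.randn(4, 64))
p = operation_probabilities(torch.randn(64), prototypes)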
In block 330, the method 300 may obtain weights for the searched GNN architecture by a super-network module in which different operations in the candidate operation set are mixed into a continuous space, such as using Eq. (9). In an embodiment, the weights of the super-network may be directly used as the weights in the searched GNN architecture. The weights of the super-network may be shared among different GNN architectures, and thus the training will be much more efficient compared to training weights for different architectures separately. Then, in the test stage, the trained weights may be directly obtained from the super-network for the searched GNN architecture.
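The continuous mixing of block 330 may be pictured with the following sketch of one super-network layer; the linear stand-ins and the interface are illustrative, whereas the disclosure mixes shared-weight GNN operations such as GCN, GAT, GIN and SAGE.

import torch
from torch import nn


class MixedOpLayer(nn.Module):
    """One super-network layer: candidate operations mixed by per-graph probabilities."""

    def __init__(self, in_dim: int, out_dim: int, num_ops: int = 4):
        super().__init__()
        # Linear stand-ins; in the disclosure these would be GNN layers (GCN, GAT,
        # GIN, SAGE, ...) whose weights are shared across the searched architectures.
        self.ops = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(num_ops))

    def forward(self, x: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
        # p: operation probabilities for this graph and this layer, shape (num_ops,)
        return sum(prob * op(x) for prob, op in zip(p, self.ops))


layer = MixedOpLayer(64, 64)
x = torch.randn(10, 64)                    # toy node features of one input graph
p = torch.softmax(torch.randn(4), dim=0)   # per-graph probabilities, e.g. from Eq. (7)
out = layer(x, p)                          # shared weights are reused; no retraining per architecture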
Although not shown in FIG. 3, in the training stage, the method 300 may comprise optimizing the GNN based on a main loss function and a regularizer, which may comprise repeating blocks 310-330 until convergence. The main loss function may be a supervision loss of the searched GNN architecture such as in Eq. (2), and the regularizer may be based on a supervised learning loss function such as in Eq. (5) and a self-supervised learning objective function such as in Eq. (6) for the graph encoder, and a cosine distance loss function between learnable prototype vector representations of the different operations such as in Eq. (8). In addition, since at the early stage of the training procedure the self-supervised disentangled graph encoder may not have been properly trained and the learned graph representation is not yet informative, a smaller initial weight may be set for the main loss function, and this weight may be gradually increased through the training procedure.
FIG. 4 illustrates a block diagram of an apparatus 400 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. As shown in FIG. 4, the apparatus 400 may comprise a graph encoder module 410, an  architecture customization module 420, and a super-network module 430.
The graph encoder module 410 may be used for obtaining a graph representation of an input graph in a disentangled latent space. The architecture customization module 420 may be used for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture. The probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation, such as in Eq. (7) . The super-network module 430 may be used for obtaining weights for the searched GNN architecture. Different operations in the candidate operation set are mixed into a continuous space in the super-network module 430. In the training stage, the operations performed by the graph encoder module 410, the architecture customization module 420, and the super-network module 430 may be repeated to optimize the GNN based on a main loss function and a regularizer using Eq. (2) .
The apparatus 400 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs. For example, in the test stage, the graph encoder module 410 may obtain a graph representation of an input graph with the parameters learned for a specific graph task. The architecture customization module 420 may obtain different GNN architectures for different input graphs based on a probability or weight of an operation in each layer. The super-network module 430 may obtain weights learned in the training stage and share these weights for different GNN architectures.
FIG. 5 illustrates a block diagram of an apparatus 500 for designing GNNs by GNAS under distribution shifts in accordance with one aspect of the present disclosure. The apparatus 500 may comprise a memory 510 and at least one processor 520. The processor 520 may be coupled to the memory 510 and configured to perform the method 300 described above with reference to FIG. 3. The processor 520 may be a general-purpose processor, or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The memory 510 may store the input data, output data, data generated by processor 520, and/or instructions executed by processor 520. The apparatus 500 may also be used for a GNN designed by graph neural architecture search under distribution shifts between training graphs and test graphs in accordance with the present disclosure.
The various operations, modules, and networks described in connection with the disclosure herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. According to an embodiment of the disclosure, a computer program product for graph neural architecture search may comprise processor-executable computer code for performing the method 300 described above with reference to FIG. 3. According to another embodiment of the disclosure, a computer-readable medium may store computer code for graph neural architecture search, the computer code, when executed by a processor, causing the processor to perform the method 300 described above with reference to FIG. 3. Computer-readable media include both non-transitory computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. Any connection may be properly termed a computer-readable medium. Other embodiments and implementations are within the scope of the disclosure.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the various embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the various embodiments. Thus, the claims are not intended to be limited to the embodiments shown herein but are to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.

Claims (18)

  1. A method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
    obtaining, by a graph encoder module, a graph representation of an input graph in a disentangled latent space;
    obtaining, by an architecture customization module, a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and
    obtaining weights for the searched GNN architecture, by a super-network module in which different operations in the candidate operation set are mixed into a continuous space.
  2. The method of claim 1, wherein the input graph is an assembly line layout graph or a circuit design layout graph, and the designed GNN is used to make classification on the assembly line layout graph or the circuit design layout graph under distribution shifts.
  3. The method of claim 1, wherein the graph encoder module is trained by a supervised learning task and a self-supervised learning task.
  4. The method of claim 3, wherein the GNN is optimized based on a main loss function and a regularizer, and wherein the main loss function is a supervision loss of the searched GNN architecture, and the regularizer is based on a supervised learning loss function and a self-supervised learning objective function for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations.
  5. The method of claim 4, wherein a weight for the main loss function is gradually increased through a training procedure.
  6. The method of claim 1, wherein the candidate operation set comprises at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
  7. The method of claim 1, wherein a pooling layer is fixed at the end of the searched GNN architecture as a global mean pooling.
  8. An apparatus for a graph neural network (GNN) designed by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
    a graph encoder module for obtaining a graph representation of an input graph in a disentangled latent space;
    an architecture customization module for obtaining a searched GNN architecture for the input graph based on a probability of an operation chosen from a candidate operation set in a layer of the searched GNN architecture, wherein the probability of the operation is a function of similarity between the obtained graph representation and a learnable prototype vector representation of the operation; and
    a super-network module for obtaining weights for the searched GNN architecture, wherein different operations in the candidate operation set are mixed into a continuous space in the super-network module.
  9. The apparatus of claim 8, wherein the input graph is an assembly line layout graph or a circuit design layout graph, and the GNN is used to make classification on the assembly line layout graph or the circuit design layout graph under distribution shifts.
  10. The apparatus of claim 8, wherein the graph encoder module is trained by a supervised learning task and a self-supervised learning task.
  11. The apparatus of claim 10, wherein the GNN is optimized based on a main loss function and a regularizer, and wherein the main loss function is a supervision loss of the searched GNN architecture, and the regularizer is based on a supervised learning loss function and a self-supervised learning objective function for the graph encoder and a cosine distance loss function between learnable prototype vector representations of the different operations.
  12. The apparatus of claim 11, wherein a weight for the main loss function is gradually increased through a training procedure.
  13. The apparatus of claim 8, wherein the candidate operation set comprises at least one of graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), and graph sample and aggregate (SAGE).
  14. The apparatus of claim 8, wherein a pooling layer is fixed at the end of the searched GNN architecture as a global mean pooling.
  15. An apparatus for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising:
    a memory; and
    at least one processor coupled to the memory and configured to perform the method of one of claims 1-7.
  16. A computer readable medium, storing computer code for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, the computer code when executed by a processor, causing the processor to perform the method of one of claims 1-7.
  17. A computer program product for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising: processor executable computer code for performing the  method of one of claims 1-7.
  18. A computer-implemented method for designing a graph neural network (GNN) by graph neural architecture search under distribution shifts between training graphs and test graphs, comprising steps of the method of one of claims 1-7.
PCT/CN2022/105600 2022-07-14 2022-07-14 Method and apparatus for graph neural architecture search under distribution shift WO2024011475A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/105600 WO2024011475A1 (en) 2022-07-14 2022-07-14 Method and apparatus for graph neural architecture search under distribution shift


Publications (1)

Publication Number Publication Date
WO2024011475A1 true WO2024011475A1 (en) 2024-01-18



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688504A (en) * 2024-02-04 2024-03-12 西华大学 Internet of things abnormality detection method and device based on graph structure learning
CN117688504B (en) * 2024-02-04 2024-04-16 西华大学 Internet of things abnormality detection method and device based on graph structure learning


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950601

Country of ref document: EP

Kind code of ref document: A1