CN113095592A - Method and system for performing predictions based on GNN and training method and system - Google Patents


Info

Publication number
CN113095592A
CN113095592A (application CN202110483604.6A)
Authority
CN
China
Prior art keywords
graph
target node
samples
training
table data
Prior art date
Legal status
Pending
Application number
CN202110483604.6A
Other languages
Chinese (zh)
Inventor
赵欢 (Zhao Huan)
郭夏玮 (Guo Xiawei)
全雨晗 (Quan Yuhan)
姚权铭 (Yao Quanming)
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202110483604.6A
Publication of CN113095592A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Technology Law (AREA)
  • Operations Research (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and system for performing prediction based on a graph neural network (GNN), and a corresponding training method and system, are provided. The prediction method comprises: acquiring table data containing a sample to be predicted; creating a graph based on the table data, wherein each node in the graph represents one sample in the table data and each edge represents a correlation between the two samples it connects; learning, with the GNN and on the basis of the created graph, a feature representation of the target node corresponding to the sample to be predicted; and performing prediction based on the learned feature representation of the target node to obtain a prediction result for the target node.

Description

Method and system for performing predictions based on GNN and training method and system
Technical Field
The present application relates generally to the field of artificial intelligence and, more particularly, to a method and system for performing prediction based on a graph neural network (GNN), and to a method and system for training a GNN-based machine learning model.
Background
Table data is widely used in industry: commodity sales, website browsing, transaction flow, financial loan, and similar data can all be stored in tabular form. Learning from table data to predict unknown values has broad real-world applications, such as sales forecasting, commodity recommendation, fraudulent-transaction detection, and loan risk prediction. Existing supervised learning models for table data typically construct new, effective cross features from the existing features to improve model performance; however, considering cross features alone still yields limited improvement in model performance and prediction quality. A technique is therefore needed that can improve model performance and prediction quality when learning from table data.
Disclosure of Invention
The present application provides a method and system for performing prediction based on a graph neural network (GNN), a method and system for training a GNN-based machine learning model, a computer-readable storage medium storing instructions, and a system comprising at least one computing device and at least one storage device storing instructions, to address at least the above-mentioned problems in the related art. The technical scheme of the application is as follows:
According to a first aspect of the present application, there is provided a method of performing prediction based on a graph neural network (GNN), the method comprising: acquiring table data containing a sample to be predicted; creating a graph based on the table data, wherein each node in the graph represents one sample in the table data and each edge represents a correlation between the two samples it connects; learning, with the GNN and on the basis of the created graph, a feature representation of the target node corresponding to the sample to be predicted; and performing prediction based on the learned feature representation of the target node to obtain a prediction result for the target node.
Optionally, the graph is a multiplex graph capable of reflecting correlations between samples in the table data in multiple aspects, and learning the feature representation of the target node corresponding to the sample to be predicted comprises: extracting, from the multiplex graph, the neighbor nodes and correlations related to the target node to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between samples in one aspect; learning a sub-feature representation of the target node on the basis of each subgraph, using the GNN corresponding to that subgraph; and aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
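As a rough illustration of the flow just described, the sketch below (not from the application; the function names are ours, and element-wise mean pooling stands in both for the per-subgraph GNNs and for the aggregation step) extracts one subgraph per relation around a target node, learns a sub-representation from each, and aggregates them:

```python
# Hypothetical minimal sketch of the multiplex-graph flow: one subgraph per
# relation ("aspect"), one sub-representation per subgraph, then aggregation.
# Mean pooling is an illustrative stand-in for a trained GNN / attention net.

def sub_representation(features, adjacency, target):
    """Sub-feature representation from one relation's subgraph: pool the
    target node's features with those of its neighbors in that relation."""
    nodes = [target] + sorted(adjacency.get(target, ()))
    dim = len(features[target])
    return [sum(features[n][d] for n in nodes) / len(nodes) for d in range(dim)]

def multiplex_representation(features, relations, target):
    """Aggregate the per-relation sub-representations (element-wise mean
    here, standing in for the attention network of the embodiment)."""
    subs = [sub_representation(features, adj, target) for adj in relations.values()]
    dim = len(subs[0])
    return [sum(s[d] for s in subs) / len(subs) for d in range(dim)]

features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
relations = {                      # two "aspects", e.g. same city / same age bin
    "city": {0: {1}, 1: {0}},
    "age":  {0: {2}, 2: {0}},
}
rep = multiplex_representation(features, relations, target=0)
```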
Optionally, performing prediction based on the learned feature representation of the target node comprises: performing prediction using only the feature representation of the target node learned by the GNN to obtain a prediction result for the target node; or concatenating the feature representation of the target node learned by the GNN with the original feature representation of the target node, or with a feature representation of the target node learned by a means other than the GNN, and performing prediction based on the concatenated feature representation to obtain a prediction result for the target node.
Optionally, creating the graph based on the table data comprises: selecting, from the features of the samples in the table data, features capable of constructing correlations between samples; and selecting associated samples based on the selected features to create the graph.
Optionally, selecting features capable of constructing correlations between samples comprises: directly selecting such features from among the features of the samples; or combining at least two discrete features of the samples to obtain a new discrete feature serving as a feature capable of constructing correlations between samples.
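A minimal sketch of the feature-combination option, assuming a simple string concatenation of the two discrete values (the application does not specify the combination operator, so this is illustrative only):

```python
# Illustrative sketch: combine two discrete features into a new discrete
# feature whose values match only when both originals match.
def combine(feature_a, feature_b):
    return [f"{a}|{b}" for a, b in zip(feature_a, feature_b)]

combined = combine(["Beijing", "Beijing"], ["PhD", "MSc"])
```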
Optionally, selecting the associated samples based on the selected features to create the graph comprises: in the case where a selected feature is a discrete feature, determining samples having the same value of the selected feature as associated samples and constructing edges between the associated samples; and in the case where a selected feature is a continuous feature, converting the continuous feature into a discrete feature, determining samples having the same value of the converted discrete feature as associated samples, and constructing edges between the associated samples.
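The edge-construction rule can be sketched as follows. The equal-width binning used to discretize the continuous feature is an illustrative assumption; the application does not fix a discretization method:

```python
# Hedged sketch of the edge rule: samples sharing a value of the selected
# discrete feature are linked; a continuous feature is first discretized.
from collections import defaultdict
from itertools import combinations

def edges_from_discrete(values):
    """Connect every pair of sample indices sharing a feature value."""
    groups = defaultdict(list)
    for idx, v in enumerate(values):
        groups[v].append(idx)
    return sorted(e for g in groups.values() for e in combinations(g, 2))

def edges_from_continuous(values, bin_width):
    """Discretize by equal-width binning, then reuse the discrete rule."""
    return edges_from_discrete([int(v // bin_width) for v in values])

city_edges = edges_from_discrete(["Beijing", "Shanghai", "Beijing", "Shanghai"])
age_edges = edges_from_continuous([23, 27, 41, 44], bin_width=10)
```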
Optionally, learning the sub-feature representation of the target node on the basis of each subgraph comprises: encoding the features of the samples represented by the nodes in each subgraph; and, on the basis of the encoded features, learning the sub-feature representation of the target node using the GNN corresponding to each subgraph.
Optionally, aggregating the sub-feature representations of the target node comprises: aggregating the sub-feature representations of the target node using an attention network to obtain the feature representation of the target node.
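A hedged sketch of attention-based aggregation over the sub-feature representations. In the embodiment the attention network is trained; here the query vector is fixed for illustration:

```python
# Softmax-attention pooling over sub-representations: score each one against
# a query vector, normalize the scores, and take the weighted sum.
import math

def attention_aggregate(sub_reps, query):
    """Return (attention weights, aggregated representation)."""
    scores = [sum(q * x for q, x in zip(query, r)) for r in sub_reps]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sub_reps[0])
    agg = [sum(w * r[d] for w, r in zip(weights, sub_reps)) for d in range(dim)]
    return weights, agg

weights, rep = attention_aggregate([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 1.0])
```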
Optionally, performing prediction using only the learned feature representation of the target node comprises: performing prediction with a multilayer perceptron, based on the learned feature representation of the target node, to obtain a prediction result for the target node.
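The multilayer-perceptron prediction step might look like the following sketch, with hand-picked (not learned) weights; a binary label such as "overdue" motivates the sigmoid output:

```python
# Illustrative two-layer perceptron: ReLU hidden layer, sigmoid output.
import math

def mlp_predict(rep, w1, b1, w2, b2):
    """Score a node representation; weights here are fixed for illustration."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, rep)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))     # probability-like score in (0, 1)

score = mlp_predict([0.75, 0.5],
                    w1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.0],
                    w2=[1.0, 1.0], b2=-0.5)
```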
Optionally, performing prediction based on the concatenated feature representation comprises: performing prediction, based on the concatenated feature representation, using any one of a logistic regression model, a gradient boosting machine (GBM), and a DeepFM model, or a variant thereof, to obtain a prediction result for the target node.
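For the concatenation branch, a logistic-regression-style scorer over the concatenated vector can be sketched as below. The weights are illustrative, not learned, and a real GBM or DeepFM would replace the linear model:

```python
# Concatenate the GNN representation with the original features, then score
# with a logistic-regression-style linear model (illustrative weights).
import math

def lr_predict(gnn_rep, original_features, weights, bias):
    x = list(gnn_rep) + list(original_features)     # concatenation step
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

p = lr_predict([0.75, 0.5], [1.0], weights=[0.4, 0.4, -0.5], bias=0.0)
```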
Optionally, the table data relates to one of financial loan data, commodity sales data, website browsing data, and transaction flow data.
According to a second aspect of the present application, there is provided a method of training a machine learning model based on a graph neural network (GNN), the method comprising: acquiring training table data; creating a graph based on the training table data, wherein each node in the graph represents one training sample in the training table data and each edge represents a correlation between the two training samples it connects; learning a feature representation of a target node in the graph with the GNN, on the basis of the created graph; performing prediction based on the learned feature representation of the target node to obtain a prediction result for the target node; and adjusting parameters of the machine learning model by comparing the prediction result for the target node with the true label of the target node.
Optionally, the graph is a multiplex graph capable of reflecting correlations between training samples in the training table data in multiple aspects, and learning the feature representation of the target node in the graph with the GNN comprises: extracting, from the multiplex graph, the neighbor nodes and correlations related to the target node to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between training samples in one aspect; learning a sub-feature representation of the target node on the basis of each subgraph, using the GNN corresponding to that subgraph; and aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
Optionally, performing prediction based on the learned feature representation of the target node comprises: performing prediction using only the learned feature representation of the target node to obtain a prediction result for the target node; or concatenating the learned feature representation of the target node with the original feature representation of the target node, or with a feature representation of the target node learned by a means other than the GNN, and performing prediction based on the concatenated feature representation to obtain a prediction result for the target node.
Optionally, creating the graph based on the training table data comprises: selecting, from the features of the training samples in the training table data, features capable of constructing correlations between training samples; and selecting associated training samples based on the selected features to create the graph.
Optionally, selecting features capable of constructing correlations between training samples comprises: directly selecting such features from among the features of the training samples; or combining at least two discrete features of the training samples to obtain a new discrete feature serving as a feature capable of constructing correlations between training samples.
Optionally, selecting the associated training samples based on the selected features to create the graph comprises: in the case where a selected feature is a discrete feature, determining training samples having the same value of the selected feature as associated training samples and constructing edges between them; and in the case where a selected feature is a continuous feature, converting the continuous feature into a discrete feature, determining training samples having the same value of the converted discrete feature as associated training samples, and constructing edges between them.
Optionally, learning the sub-feature representation of the target node on the basis of each subgraph comprises: encoding the features of the training samples represented by the nodes in each subgraph; and, on the basis of the encoded features, learning the sub-feature representation of the target node using the GNN corresponding to each subgraph.
Optionally, aggregating the sub-feature representations of the target node comprises: aggregating the sub-feature representations of the target node using an attention network included in the machine learning model to obtain the feature representation of the target node.
Optionally, performing prediction using only the learned feature representation of the target node comprises: performing prediction with a multilayer perceptron included in the machine learning model, based on the learned feature representation of the target node, to obtain a prediction result for the target node.
Optionally, performing prediction based on the concatenated feature representation comprises: performing prediction, based on the concatenated feature representation, using one of a logistic regression model, a gradient boosting machine (GBM), and a DeepFM model included in the machine learning model, or a variant thereof, to obtain a prediction result for the target node.
Optionally, the training table data relates to one of financial loan data, commodity sales data, website browsing data, and transaction flow data.
According to a third aspect of the present application, there is provided a system for performing prediction based on a graph neural network (GNN), the system comprising: a data acquisition device configured to acquire table data containing a sample to be predicted; a graph creation device configured to create a graph based on the table data, wherein each node in the graph represents one sample in the table data and each edge represents a correlation between the two samples it connects; and a prediction device configured to: learn, with the GNN and on the basis of the created graph, a feature representation of the target node corresponding to the sample to be predicted; and perform prediction based on the learned feature representation of the target node to obtain a prediction result for the target node.
Optionally, the graph is a multiplex graph capable of reflecting correlations between samples in the table data in multiple aspects, and learning the feature representation of the target node corresponding to the sample to be predicted comprises: extracting, from the multiplex graph, the neighbor nodes and correlations related to the target node to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between samples in one aspect; learning a sub-feature representation of the target node on the basis of each subgraph, using the GNN corresponding to that subgraph; and aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
Optionally, performing prediction based on the learned feature representation of the target node comprises: performing prediction using only the feature representation of the target node learned by the GNN to obtain a prediction result for the target node; or concatenating the feature representation of the target node learned by the GNN with the original feature representation of the target node, or with a feature representation of the target node learned by a means other than the GNN, and performing prediction based on the concatenated feature representation to obtain a prediction result for the target node.
Optionally, creating the graph based on the table data comprises: selecting, from the features of the samples in the table data, features capable of constructing correlations between samples; and selecting associated samples based on the selected features to create the graph.
Optionally, selecting features capable of constructing correlations between samples comprises: directly selecting such features from among the features of the samples; or combining at least two discrete features of the samples to obtain a new discrete feature serving as a feature capable of constructing correlations between samples.
Optionally, selecting the associated samples based on the selected features to create the graph comprises: in the case where a selected feature is a discrete feature, determining samples having the same value of the selected feature as associated samples and constructing edges between the associated samples; and in the case where a selected feature is a continuous feature, converting the continuous feature into a discrete feature, determining samples having the same value of the converted discrete feature as associated samples, and constructing edges between the associated samples.
Optionally, learning the sub-feature representation of the target node on the basis of each subgraph comprises: encoding the features of the samples represented by the nodes in each subgraph; and, on the basis of the encoded features, learning the sub-feature representation of the target node using the GNN corresponding to each subgraph.
Optionally, aggregating the sub-feature representations of the target node comprises: aggregating the sub-feature representations of the target node using an attention network to obtain the feature representation of the target node.
Optionally, performing prediction using only the learned feature representation of the target node comprises: performing prediction with a multilayer perceptron, based on the learned feature representation of the target node, to obtain a prediction result for the target node.
Optionally, performing prediction based on the concatenated feature representation comprises: performing prediction, based on the concatenated feature representation, using any one of a logistic regression model, a gradient boosting machine (GBM), and a DeepFM model, or a variant thereof, to obtain a prediction result for the target node.
Optionally, the table data relates to one of financial loan data, commodity sales data, website browsing data, and transaction flow data.
According to a fourth aspect of the present application, there is provided a system for training a machine learning model based on a graph neural network (GNN), the system comprising: a data acquisition device configured to acquire training table data; a graph creation device configured to create a graph based on the training table data, wherein each node in the graph represents one training sample in the training table data and each edge represents a correlation between the two training samples it connects; and a training device configured to: learn a feature representation of a target node in the graph with the GNN, on the basis of the created graph; perform prediction based on the learned feature representation of the target node to obtain a prediction result for the target node; and adjust parameters of the machine learning model by comparing the prediction result for the target node with the true label of the target node.
Optionally, the graph is a multiplex graph capable of reflecting correlations between training samples in the training table data in multiple aspects, and learning the feature representation of the target node in the graph with the GNN comprises: extracting, from the multiplex graph, the neighbor nodes and correlations related to the target node to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between training samples in one aspect; learning a sub-feature representation of the target node on the basis of each subgraph, using the GNN corresponding to that subgraph; and aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
Optionally, performing prediction based on the learned feature representation of the target node comprises: performing prediction using only the learned feature representation of the target node to obtain a prediction result for the target node; or concatenating the learned feature representation of the target node with the original feature representation of the target node, or with a feature representation of the target node learned by a means other than the GNN, and performing prediction based on the concatenated feature representation to obtain a prediction result for the target node.
Optionally, creating the graph based on the training table data comprises: selecting, from the features of the training samples in the training table data, features capable of constructing correlations between training samples; and selecting associated training samples based on the selected features to create the graph.
Optionally, selecting features capable of constructing correlations between training samples comprises: directly selecting such features from among the features of the training samples; or combining at least two discrete features of the training samples to obtain a new discrete feature serving as a feature capable of constructing correlations between training samples.
Optionally, selecting the associated training samples based on the selected features to create the graph comprises: in the case where a selected feature is a discrete feature, determining training samples having the same value of the selected feature as associated training samples and constructing edges between them; and in the case where a selected feature is a continuous feature, converting the continuous feature into a discrete feature, determining training samples having the same value of the converted discrete feature as associated training samples, and constructing edges between them.
Optionally, learning the sub-feature representation of the target node on the basis of each subgraph comprises: encoding the features of the training samples represented by the nodes in each subgraph; and, on the basis of the encoded features, learning the sub-feature representation of the target node using the GNN corresponding to each subgraph.
Optionally, aggregating the sub-feature representations of the target node comprises: aggregating the sub-feature representations of the target node using an attention network included in the machine learning model to obtain the feature representation of the target node.
Optionally, performing prediction using only the learned feature representation of the target node comprises: performing prediction with a multilayer perceptron included in the machine learning model, based on the learned feature representation of the target node, to obtain a prediction result for the target node.
Optionally, performing prediction based on the concatenated feature representation comprises: performing prediction, based on the concatenated feature representation, using one of a logistic regression (LR) model, a gradient boosting machine (GBM), and a DeepFM model included in the machine learning model, or a variant thereof, to obtain a prediction result for the target node.
Optionally, the training table data relates to one of financial loan data, commodity sales data, website browsing data, and transaction flow data.
According to a fifth aspect of the present application, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method as described above.
According to a sixth aspect of the present application, there is provided a system comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the method as described above.
Because correlations between samples are considered during training, the training system and training method according to the exemplary embodiments of the present application can train a machine learning model with better performance, which can be used to perform predictions on table data. Because correlations between samples are exploited during prediction, the prediction system and prediction method according to the exemplary embodiments of the present application can effectively improve the accuracy of the prediction result.
Drawings
These and/or other aspects and advantages of the present application will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings of which:
fig. 1 is a block diagram illustrating a system for performing predictions based on GNNs according to an exemplary embodiment of the present application;
FIG. 2 is a diagram showing an example of table data;
FIG. 3 is a schematic diagram illustrating an example of performing predictions based on GNN in accordance with the present application;
fig. 4 is a flowchart illustrating a method of performing prediction based on GNNs according to an exemplary embodiment of the present application;
FIG. 5 is a block diagram illustrating a system for training a machine learning model based on a graph neural network GNN according to an exemplary embodiment of the present application;
fig. 6 is a flowchart illustrating a method of training a machine learning model based on a graph neural network GNN according to an exemplary embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, exemplary embodiments of the present application will be described in further detail below with reference to the accompanying drawings and detailed description.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
As noted in the background of the present application, existing supervised learning models for table data generally construct new, effective cross features from the existing features to improve model performance; however, considering cross features alone still yields limited improvement in performance and prediction quality. The inventors' research found that existing methods ignore the correlations between samples during prediction, which is why model performance and prediction quality remain poor. In fact, in machine learning the correlation between samples carries useful information and can be used to aggregate similar samples, thereby improving performance on the final task. However, no existing technique takes correlations between samples into account in supervised learning over table data. In view of the above, the present application proposes a table-data learning technique based on a graph neural network (GNN), which considers the correlations between samples when learning from table data so as to improve model performance and prediction quality.
A graph (Graph) is a data model that is very common in the real world and in scientific fields; traffic networks, social networks, and the like can all be modeled as graphs. Designing different neural networks based on graphs to solve different tasks is the problem studied in GNN research. Most existing GNN models adopt a neighbor-aggregation framework, learning the representation of a node on the graph by aggregating the features of its neighbors. The method of the present application selects, from the table data, at least one feature capable of describing sample similarity to construct a graph, learns on the graph with a GNN, aggregates similar samples to obtain a representation of each sample, and obtains a prediction result based on the obtained representations.
Hereinafter, the concept of the present application will be described in detail with reference to fig. 1 to 6.
Fig. 1 is a block diagram illustrating a system for performing prediction based on GNN (hereinafter, simply referred to as "prediction system" for convenience of description) according to an exemplary embodiment of the present application.
As shown in FIG. 1, the prediction system 100 may include a data acquisition device 110, a graph creation device 120, and a prediction device 130.
In particular, the data acquisition device 110 may acquire table data that includes samples to be predicted. According to an exemplary embodiment, each row of table data may represent a sample, each column may represent a one-dimensional feature or a one-dimensional attribute, or a column of table data may further include a label for the sample. By way of example, the tabular data may relate to one of, but is not limited to, financial loan data, merchandise sales data, website browsing data, and transaction flow data.
Fig. 2 is a diagram showing an example of table data. The example of table data shown in fig. 2 relates to financial loan data; the table data has 7 samples, each of which may include 9 attributes or features: index, user ID, education, age, city, application time, application amount, repayment amount, and identification time, and may further include a sample label "overdue". As shown in fig. 2, the first 5 samples carry existing labels, where 1 indicates overdue and 0 indicates not overdue; the remaining two samples are to-be-predicted samples whose labels are unknown, and it is required to predict, based on the entire table data, whether the users corresponding to these two samples will be overdue.
It should be noted that fig. 2 is only one example showing table data, the table data may refer to different kinds of data according to different application scenarios, and the number of samples included in the table data, the number and the kind of features or attributes included in each sample may be different according to different application scenarios, which is not limited in this application.
The graph creation means 120 may create a graph based on the table data. Here, each node in the graph may represent one sample in the table data, and an edge in the graph may represent a correlation between two samples. As an example, in the present application, the created graph may be a simple graph, which is a graph capable of reflecting the correlation between the samples in the table data in one aspect, that is, two nodes in the graph are connected by only one type of edge at most, that is, there is only one correlation between two samples. Alternatively, in the present application, the created graph may be a multi-graph that can reflect the dependencies between samples in the table data in multiple aspects, i.e., two nodes in the graph may be connected by more than one type of edge. The type of edge reflects the type of correlation, i.e., by which correlation the two samples to which the edge connects are related.
According to an exemplary embodiment, the graph creation means 120 may select, from the features of the samples in the table data, a feature capable of constructing correlations between samples, and select associated samples based on the selected feature to create a graph. For example, in fig. 2, when we want to predict whether user 35360 (row 6 of the table data) will repay the loan on time, users with the same education level (e.g., user 12841 in row 1, user 28877 in row 3, and user 40633 in row 5) may provide information useful for the prediction, because people with the same education level generally have similar loan repayment abilities. However, existing feature-crossing methods ignore these correlations between samples. In the present application, the correlation between samples in the table data is converted into structural information on a graph by converting the table data into a graph (a simple graph or a multigraph) capable of representing the correlation between samples. For example, in the table data shown in fig. 2, the feature "education" may be a feature capable of establishing correlations between samples, but is not limited thereto.
Specifically, when the features capable of constructing the correlation between samples are selected from the features of the samples of the table data, the features capable of constructing the correlation between samples may be directly selected from among the features of the samples. For example, ID class features (e.g., user ID, commodity ID, etc. in a click-through rate estimation scenario) may be directly selected as features capable of constructing correlation between samples, or features with importance exceeding a threshold may be selected from all features of samples of table data by using a pre-trained classifier as features capable of constructing correlation between samples. Optionally, when the feature capable of constructing the correlation between the samples is selected according to the features of the samples of the table data, at least two discrete features of the samples may be combined to obtain a new discrete feature as the feature capable of constructing the correlation between the samples. For example, for already existing discrete features, feature combining and crossing may be performed, e.g., calculating a Cartesian product, to obtain new discrete features.
After selecting features that enable the construction of correlations between samples, the associated samples may be selected to create a graph based on the selected features. Specifically, in the case that the selected feature is a discrete feature, samples having the same value of the selected feature may be determined as associated samples, and an edge may be constructed between the associated samples. Under the condition that the selected features are continuous features, the continuous features can be converted into discrete features, samples with the same values of the converted discrete features are determined as associated samples, and then edges are constructed among the associated samples. For example, a continuous feature (e.g., age) of a numerical class may be first discretized, binned, and the like, and converted into a discrete feature, and then samples having the same value of the transformed discrete feature are determined as associated samples, and edges are constructed between the associated samples.
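The discretize-then-connect step above can be sketched as follows. This is a rough illustration with hypothetical toy data (not taken from fig. 2); the decade-wide binning and the choice to connect all pairs within a group are assumptions for clarity:

```python
from itertools import combinations

def bin_continuous(values, bin_width):
    """Discretize a continuous feature (e.g. age) into integer bins."""
    return [int(v // bin_width) for v in values]

def edges_from_feature(feature_values):
    """Connect every pair of samples sharing the same discrete value."""
    groups = {}
    for idx, val in enumerate(feature_values):
        groups.setdefault(val, []).append(idx)
    edges = []
    for members in groups.values():
        edges.extend(combinations(members, 2))
    return edges

# Hypothetical toy column: ages of 5 samples, binned by decade.
ages = [23, 27, 41, 45, 62]
age_bins = bin_continuous(ages, 10)       # -> [2, 2, 4, 4, 6]
age_edges = edges_from_feature(age_bins)  # samples 0-1 and 2-3 become associated
```

Running the same helper over each selected (or binned) feature yields one edge set per correlation aspect, which together form the multigraph described below.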
Fig. 3 is a schematic diagram illustrating an example of performing prediction based on GNN according to the present application. As shown in fig. 3, the graph constructed based on the table data shown in fig. 2 may be, for example, the multigraph shown below the table data. In this multigraph, for example, correlations between samples may be established based on the feature "age" and the feature "education", and a multigraph may be created therefrom, which can reflect the correlation between samples in both the age and education aspects. Further, the edges in the created multigraph may be directed or undirected. For example, if the samples are sequential samples having a chronological order, the edges in the graph may be given directions according to features having temporal attributes in the samples (e.g., the application time or identification time of the table data in fig. 2), such that earlier samples point to later samples.
It should be noted that, although the example of creating the graph is given above, the manner of creating the graph based on the table data of the present application is not limited to the above example, and an appropriate manner may be selected according to actual situations to define the correlation between the samples, and the graph may be created accordingly.
Assuming that the samples of the table data are correlated in $R$ aspects, a multigraph may be created based on the table data, which may be represented as $G = (\mathcal{V}, \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_R)$, where $\mathcal{V}$ is the set of nodes in the graph and $\mathcal{E}_r$ ($r = 1, 2, \ldots, R$) is the set of edges representing the correlation between samples in the $r$-th aspect.
After creating the graph, the prediction apparatus 130 may learn, based on the created graph, a feature representation of a target node in the graph corresponding to the sample to be predicted using GNN, and perform prediction based on the learned feature representation of the target node to obtain a prediction result regarding the target node.
For example, as described above, the created graph may be a multi-graph capable of reflecting the correlation between the samples in the table data in multiple aspects, in which case, learning the feature representation of the target node corresponding to the sample to be predicted in the graph by using GNN based on the created graph may include: firstly, extracting neighbor nodes and correlation related to a target node in a multiple graph respectively to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between samples on one hand; secondly, respectively learning sub-feature representation of the target node by using GNN corresponding to each subgraph on the basis of each subgraph; and finally, aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
For example, the table data shown in fig. 2 includes two samples to be predicted, whose user IDs are "35360" and "47533", respectively; in fig. 3 and the following description, for convenience, the target nodes corresponding to these two samples will be referred to by their user IDs. For the target node "35360", a first subgraph related to it can be obtained by extracting the neighboring nodes "28877" and "26851" related to the target node "35360" in the multigraph, together with the correlation established according to "age" (i.e., the edges constructed according to "age"). In addition, for the target node "35360", a second subgraph related to it can be obtained by extracting the neighboring nodes "28877", "12841", and "40633" related to the target node "35360" in the multigraph, together with the correlation established according to "education" (i.e., the edges constructed according to "education"). Here, the first subgraph reflects the correlation between samples in terms of "age", and the second subgraph reflects the correlation between samples in terms of "education".
It should be noted that, for convenience of illustration, only two subgraphs related to the target node "35360" are shown in fig. 3, however, in fact, the subgraph related to the target node "35360" that can be obtained according to the multiple graph shown in fig. 3 is not limited to the two subgraphs shown in fig. 3, but other subgraphs can also be obtained in the manner of obtaining the subgraphs, which is not described herein again. Further, for the target node "47533" corresponding to another sample to be predicted in the table data shown in fig. 2, a plurality of subgraphs related to the target node "47533" may be obtained in the same manner as the plurality of subgraphs related to the target node "35360" obtained as described above, and will not be described again here.
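The per-relation subgraph extraction described above can be sketched as follows. The edge-list encoding of the multigraph and the helper function are illustrative assumptions; the user IDs are borrowed from fig. 2:

```python
def extract_subgraph(multigraph, target, relation):
    """Return the neighbors of `target` connected by edges of one relation type."""
    neighbors = set()
    for u, v in multigraph[relation]:
        if u == target:
            neighbors.add(v)
        elif v == target:
            neighbors.add(u)
    return neighbors

# Hypothetical multigraph over the user IDs of fig. 2: one edge list per relation.
multigraph = {
    "age":       [("35360", "28877"), ("35360", "26851")],
    "education": [("35360", "28877"), ("35360", "12841"), ("35360", "40633")],
}

age_subgraph = extract_subgraph(multigraph, "35360", "age")
edu_subgraph = extract_subgraph(multigraph, "35360", "education")
```

Each returned neighbor set, together with the target node, corresponds to one subgraph reflecting a single correlation aspect.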
As shown in FIG. 3, after multiple subgraphs related to the target node are obtained, the sub-feature representations of the target node may be learned separately based on each subgraph using the GNN corresponding to each subgraph. Specifically, for example, features of samples represented by nodes in each subgraph may be first encoded, and then sub-feature representations of target nodes may be learned separately based on the encoded features using the GNNs corresponding to each subgraph. For example, the features of all nodes in a subgraph may be encoded to obtain embedding (i.e., an embedded representation vector) of the samples in each subgraph, which are then input into each GNN to learn the sub-feature representation of the target node.
Specifically, for a sample $x$ in the table data, its features are usually of various types; for example, the user's age is a numeric feature while the user's gender is a categorical feature, so the feature vector of sample $x$ cannot be directly input to the GNN. It needs to be converted into a unified feature space through a feature encoder, which can be represented as $h_x = \mathrm{ENC}(x)$, where $\mathrm{ENC}(\cdot)$ denotes the encoding operation of the encoder.
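A minimal sketch of such a feature encoder is shown below. The embedding-table-plus-linear-projection design, the dimensions, and the random initialization are all assumptions for illustration, not the encoder actually specified in the application:

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureEncoder:
    """Map mixed-type raw features into one d-dimensional space (ENC in the text)."""
    def __init__(self, n_categories, n_numeric, dim):
        # One embedding row per categorical value; a linear map for numeric features.
        self.cat_table = rng.normal(size=(n_categories, dim))
        self.num_proj = rng.normal(size=(n_numeric, dim))

    def encode(self, cat_ids, numeric):
        # h_x = sum of the category embeddings + the projected numeric features.
        return self.cat_table[cat_ids].sum(axis=0) + np.asarray(numeric) @ self.num_proj

enc = FeatureEncoder(n_categories=10, n_numeric=2, dim=8)
h = enc.encode(cat_ids=[1, 4], numeric=[0.3, 25.0])  # 8-dimensional embedding h_x
```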
After the embedding of the samples in each subgraph is obtained by encoding, the operation of the GNN can be expressed as the following equation (1):

$$h_x^{(r)} = \sigma\left(W^{(r)} \cdot \mathrm{AGG}\left(\left\{h_u : u \in N_r(x)\right\}\right)\right) \quad (1)$$

where $x$ is a sample in the table data (i.e., a node in the graph), $h_x^{(r)}$ is the feature representation of the sample (node) learned from the correlation of the $r$-th aspect (i.e., the feature representation of the node learned through the subgraph reflecting the correlation of the $r$-th aspect), $W^{(r)}$ is a learnable weight matrix of the GNN shared by all samples, $\sigma$ is a nonlinear activation function (e.g., sigmoid or ReLU), $\mathrm{AGG}$ is the GNN aggregator used to aggregate similar samples in the same subgraph, and $N_r(x)$ is the set of neighbor nodes of node $x$ connected according to the correlation of the $r$-th aspect (also referred to as edge type $r$).
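A toy version of this per-subgraph aggregation step might look as follows. The choice of a mean aggregator that includes the node itself is an assumption for illustration (the text only specifies a generic AGG operator), and all data is synthetic:

```python
import numpy as np

def gnn_layer(h, neighbors_r, W_r, activation=np.tanh):
    """One aggregation step for edge type r:
    h_x^(r) = sigma(W_r @ AGG over {x} union N_r(x)), using a mean aggregator."""
    out = {}
    for x in h:
        nbrs = neighbors_r.get(x, [])
        stacked = np.stack([h[u] for u in [x] + nbrs])  # self plus neighbors
        out[x] = activation(W_r @ stacked.mean(axis=0))
    return out

rng = np.random.default_rng(0)
d = 4
h = {x: rng.normal(size=d) for x in ["a", "b", "c"]}   # encoded embeddings h_x
neighbors_age = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}  # N_r(x) for one subgraph
h_age = gnn_layer(h, neighbors_age, W_r=rng.normal(size=(d, d)))
```

Running this layer once per subgraph yields the sub-feature representations that the attention network then aggregates.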
After learning the sub-feature representations of the target nodes separately based on each subgraph using the GNNs corresponding to each subgraph, the sub-feature representations of the target nodes may be aggregated to obtain the feature representation of the target node. As an example, the target node's feature representation may be obtained by aggregating sub-feature representations of the target node using an attention network.
For example, the sub-feature representations of the target node obtained from the respective subgraphs may be aggregated by an attention network (e.g., a one-layer neural network) according to the following equations (2) to (4):

$$w_r = q^{\top} \tanh\left(W h_x^{(r)} + b\right) \quad (2)$$

$$\beta_r = \frac{\exp(w_r)}{\sum_{r'=1}^{R} \exp(w_{r'})} \quad (3)$$

$$z_x = \sum_{r=1}^{R} \beta_r h_x^{(r)} \quad (4)$$

where $q$, $W$, and $b$ are parameters of the attention network, $\{\beta_r \mid r = 1, 2, \ldots, R\}$ are the resulting attention weights, and $z_x$ is the feature representation of the sample (node) $x$ obtained through aggregation.
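A NumPy sketch of this attention-based aggregation, under the assumption that the scores are normalized with a softmax and with synthetic data throughout, might be:

```python
import numpy as np

def attention_aggregate(sub_reps, q, W, b):
    """Combine per-relation representations h_x^(r) with a one-layer attention net:
    score each relation, softmax the scores, and take the weighted sum."""
    H = np.stack(sub_reps)             # (R, d): one row per relation
    scores = np.tanh(H @ W.T + b) @ q  # w_r = q . tanh(W h + b), shape (R,)
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()                 # softmax attention weights beta_r
    return beta @ H                    # z_x, shape (d,)

rng = np.random.default_rng(0)
d, R = 4, 2
sub_reps = [rng.normal(size=d) for _ in range(R)]  # hypothetical h_x^(r)
z = attention_aggregate(sub_reps, q=rng.normal(size=d),
                        W=rng.normal(size=(d, d)), b=rng.normal(size=d))
```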
After obtaining the feature representation of the target node, prediction may be performed based on the learned feature representation of the target node to obtain a prediction result for the target node. According to an exemplary embodiment, the prediction result regarding the target node may be obtained by performing prediction based only on the feature representation of the target node learned using the GNN. For example, as shown in fig. 3, the prediction result regarding the target node may be obtained by performing prediction using a multi-layer perceptron (MLP) based on the learned feature representation of the target node. It should be noted that the prediction result regarding the target node may be obtained by performing prediction using any classifier model based on the learned feature representation of the target node, and is not limited to prediction using an MLP.
Alternatively, according to another exemplary embodiment, the feature representation of the target node learned by the GNN may be concatenated with the original feature representation of the target node or with a feature representation of the target node learned in another manner different from the GNN, and prediction may be performed based on the concatenated feature representation to obtain a prediction result regarding the target node. For example, as shown in fig. 3, the prediction result regarding the target node may be obtained by performing prediction using any one of a logistic regression model, a gradient boosting machine (GBM), and a DeepFM model, or a variant thereof, based on the concatenated feature representation. It should be noted that, although prediction using one of the logistic regression model, gradient boosting machine (GBM), and DeepFM model or a variant thereof is mentioned above, the present disclosure is not limited thereto; in fact, prediction may be performed based on the concatenated features using any other currently known model to obtain the prediction result regarding the target node.
In the above exemplary embodiment, performing prediction based on the concatenated feature representation to obtain the prediction result regarding the target node may achieve a better prediction effect than performing prediction directly based only on the feature representation of the target node learned using the GNN.
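As a hedged illustration of this concatenation-based prediction, the sketch below concatenates a hypothetical learned representation with the sample's raw features and scores the result with a simple logistic-regression head, one of the model choices mentioned above; all shapes and values are assumptions:

```python
import numpy as np

def predict_overdue(z_gnn, x_raw, w, b):
    """Concatenate the GNN representation with the original features and
    score with a logistic-regression head (one choice among MLP/GBM/DeepFM)."""
    feats = np.concatenate([z_gnn, x_raw])          # concatenated representation
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # probability of label 1

rng = np.random.default_rng(0)
z_gnn = rng.normal(size=8)   # hypothetical GNN-learned representation z_x
x_raw = rng.normal(size=5)   # hypothetical original (encoded) features
p = predict_overdue(z_gnn, x_raw, w=rng.normal(size=13), b=0.0)
```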
In the above, the prediction system according to the exemplary embodiment of the present application has been described with reference to fig. 1 to 3, and can provide a more accurate prediction result for a sample to be predicted in table data since the correlation between samples in table data is fully utilized at the time of prediction.
Fig. 4 is a flowchart illustrating a method of performing prediction based on GNNs (hereinafter, simply referred to as a "prediction method" for convenience of description) according to an exemplary embodiment of the present application.
Here, as an example, the prediction method shown in fig. 4 may be performed by the prediction system 100 shown in fig. 1, may also be implemented entirely in software by a computer program or instructions, and may also be performed by a specifically configured computing system or computing device, for example, by a system including at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the prediction method described above. For convenience of description, it is assumed that the prediction method shown in fig. 4 is performed by the prediction system 100 shown in fig. 1, and that the prediction system 100 may have the configuration shown in fig. 1.
Referring to fig. 4, in step S410, the data acquisition device 110 may acquire table data including a sample to be predicted. Here, the table data may relate to one of financial loan data, commodity sales data, website browsing data, and transaction flow data, but is not limited thereto. Next, at step S420, the graph creation apparatus 120 may create a graph based on the table data, where each node in the graph may represent one sample in the table data, and an edge in the graph may represent a correlation between the two samples. Subsequently, in step S430, the prediction apparatus 130 may learn, based on the created graph, the feature representation of the target node corresponding to the sample to be predicted in the graph by using GNN. Finally, in step S440, the prediction apparatus 130 may perform prediction based on the learned feature representation of the target node to obtain a prediction result about the target node.
Since the prediction system, the table data, the example of performing prediction based on GNN, etc. have been described above with reference to fig. 1 to 3, specific operations and details related to steps S410 to S440 are not described herein again, and related contents can be referred to the related description above with reference to fig. 1 to 3. In fact, since the prediction method shown in fig. 4 is performed by the prediction system 100 shown in fig. 1, what is mentioned above with reference to fig. 1 in describing each device included in the prediction system is applicable here, so as to refer to the corresponding description of fig. 1 to 3 for the relevant details involved in the above steps, which are not described again here.
Hereinafter, training of the GNN-based machine learning model used in performing the prediction based on GNN in fig. 1 and 4 above will be described with reference to fig. 5 and 6.
Fig. 5 is a block diagram illustrating a system (hereinafter, simply referred to as "training system" for convenience of description) for training a machine learning model based on a graph neural network GNN according to an exemplary embodiment of the present application.
Referring to FIG. 5, training system 500 may include a data acquisition device 510, a graph creation device 520, and a training device 530.
Specifically, the data acquisition device 510 may acquire training table data. By way of example, the training table data may relate to one of, but is not limited to, financial loan data, merchandise sales data, website browsing data, and transaction flow data. The table data has been described above with reference to fig. 2, and will not be described again, except that the labels of all samples in the training table data are known, and do not include the sample to be predicted.
The graph creation means 520 may create a graph based on the training table data. Here, each node in the graph may represent one training sample in the training table data, and an edge in the graph may represent a correlation between two training samples. As an example, the created graph may be a multi-graph capable of reflecting correlations between training samples in the training table data in multiple aspects. Specifically, the graph creating means 520 may select features capable of constructing correlations between training samples from the features of the training samples in the training table data, and select associated training samples based on the selected features to create the graph. For example, the graph creating means 520 may directly select features capable of constructing correlations between training samples from among the features of the training samples; alternatively, the graph creation means 520 may combine at least two discrete features of the training samples to obtain a new discrete feature as a feature capable of constructing a correlation between the training samples. Under the condition that the selected features are discrete features, the training samples with the same value of the selected features can be determined as associated training samples, and edges are constructed among the associated training samples; however, when the selected feature is a continuous feature, the continuous feature is converted into a discrete feature, the training samples with the same value of the converted discrete feature are determined as associated training samples, and an edge is constructed between the associated training samples. The creation of the map based on the table data has been described above in the description of fig. 1 to 3, and thus will not be described here again. 
The difference is that in the prediction process described above, the table data includes both samples with known labels and to-be-predicted samples with unknown labels, whereas in training only samples whose labels are known are used in the process of creating the graph.
The training device 530 may learn feature representations of target nodes in the graph using GNNs based on the created graph, perform prediction based on the learned feature representations of the target nodes to obtain prediction results for the target nodes, and adjust parameters of the machine learning model by comparing the prediction results for the target nodes with the true labels for the target nodes. The feature representation of the target node in the GNN learning graph based on the created graph, and the prediction performed based on the feature representation of the learned target node to obtain the prediction result about the target node have been described above in the description of fig. 1 to 3, and are not described again here. In contrast, the target node in the above-described process of performing prediction is only a node corresponding to a sample to be predicted, and the target node mentioned in the above training process is each of all nodes in the created graph. That is, the training means 530 learns the feature representation of each node in the graph with GNN based on the created graph, performs prediction based on the learned feature representation of each node to obtain a prediction result for each node, and adjusts the parameters of the machine learning model by comparing the prediction result for each node with the true label for each node.
As shown in fig. 3, a GNN-based machine learning model according to an exemplary embodiment of the present application may include, for example, GNNs for performing feature representation learning on the samples, an attention network for performing feature representation aggregation, and a model for performing prediction based on the aggregated feature representations (e.g., one of an MLP, a logistic regression model, a gradient boosting machine (GBM), and a DeepFM model, or a variant thereof).
Specifically, in the case that the created graph is a multiple graph, the training device 530 may obtain multiple subgraphs related to the target node by respectively extracting neighboring nodes and correlations related to the target node in the multiple graph, where each subgraph reflects a correlation between samples in one aspect; subsequently, sub-feature representations of the target nodes are learned respectively by using GNN corresponding to each subgraph based on each subgraph; and finally, aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
For example, the training device 530 may encode the features of the training samples represented by the nodes in each subgraph, and then learn the sub-feature representations of the target node using the GNN corresponding to each subgraph based on the encoded features. After learning the sub-feature representations of the target node, the training device 530 may, for example, aggregate them using the attention network included in the machine learning model to obtain the feature representation of the target node. It should be noted that, although a GNN- and attention-network-based scheme is presented for model training, other graph neural networks, such as R-GCN or GraphSAGE, may also be used to learn on the created graph.
After obtaining the feature representation of the target node, the training device 530 may perform prediction based only on the learned feature representation of the target node to obtain a prediction result for the target node. For example, based on the learned feature representation of the target node, prediction is performed using the multi-layer perceptron included in the machine learning model to obtain a prediction result regarding the target node. Alternatively, the training device 530 may concatenate the learned feature representation of the target node with the original feature representation of the target node or with a feature representation of the target node learned in another manner different from the GNN, and perform prediction based on the concatenated feature representation to obtain the prediction result regarding the target node. For example, based on the concatenated feature representation, prediction is performed using one of a logistic regression model, a gradient boosting machine (GBM), and a DeepFM model included in the machine learning model, or a variant thereof, to obtain the prediction result regarding the target node.
After obtaining the prediction result for the target node, the training device 530 adjusts the parameters of the machine learning model by comparing the prediction result for the target node with the true label for the target node. For example, the training device 530 may calculate the cross entropy loss by the following equation (5) and adjust the parameters of the GNN-based machine learning model according to the calculated loss, however, the manner of calculating the predicted loss is not limited thereto, and other loss calculation manners may be adopted.
$$\mathcal{L} = \sum_{x \in \mathcal{V}_L} \ell\left(y_x, \sigma\left(W^{\top} z_x + b_O\right)\right) \quad (5)$$

where $\ell(\cdot)$ is the cross-entropy loss, $W$ and $b_O$ are parameters of the model (e.g., the MLP) included in the machine learning model that performs prediction based on the learned feature representation of the target node, $z_x$ is the feature representation of the sample (node) $x$ obtained through aggregation, $y_x$ is the true label of the sample (node) $x$, $\sigma$ is a nonlinear activation function, and $\mathcal{V}_L$ is the set of labeled nodes.
Above, the training system according to the exemplary embodiment of the present application has been described with reference to fig. 5, and since the training system makes full use of the correlation between samples in the process of performing the GNN-based machine learning model, the performance of the trained model can be effectively improved, thereby facilitating the provision of more accurate prediction results using the trained model.
Fig. 6 is a flowchart illustrating a method of training a machine learning model based on a graph neural network GNN (hereinafter, simply referred to as "training method" for convenience of description) according to an exemplary embodiment of the present application.
Here, as an example, the training method shown in fig. 6 may be performed by the training system 500 shown in fig. 5, may also be implemented entirely in software by a computer program or instructions, and may also be performed by a specifically configured computing system or computing device, for example, by a system including at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the training method described above. For convenience of description, it is assumed that the training method shown in fig. 6 is performed by the training system 500 shown in fig. 5, and that the training system 500 may have the configuration shown in fig. 5.
Referring to fig. 6, in step S610, the data acquisition device 510 may acquire training table data. In step S620, the graph creation device 520 may create a graph based on the training table data. Here, each node in the graph represents one training sample in the training table data, and an edge in the graph represents a correlation between two training samples. Next, in step S630, the training device 530 may learn feature representations of the target nodes in the graph using GNNs based on the created graph. Subsequently, in step S640, the training device 530 may perform prediction based on the learned feature representation of the target node to obtain a prediction result about the target node. Finally, in step S650, the training device 530 may adjust the parameters of the machine learning model by comparing the predicted result regarding the target node and the true label regarding the target node.
Since the operations performed by the training system 500 have been described above with reference to fig. 5, the specific operations and details of steps S610 to S650 are not repeated here; for the related contents, reference may be made to the description of fig. 5. In fact, because the training method shown in fig. 6 is performed by the training system 500 shown in fig. 5, everything mentioned above in describing the devices included in the training system 500 applies here as well, so the relevant details of the above steps can be found in the corresponding description of fig. 5.
The prediction system and method and the training system and method according to the exemplary embodiments of the present application have been described above with reference to fig. 1 to 6. However, it should be understood that the systems and devices shown in fig. 1 and fig. 5, respectively, may be configured as software, hardware, firmware, or any combination thereof to perform particular functions. For example, these systems or devices may correspond to application-specific integrated circuits, to pure software code, or to modules combining software and hardware. Further, one or more functions implemented by these systems or devices may also be performed collectively by components in a physical entity device (e.g., a processor, a client, or a server).
Further, the above methods may be implemented by instructions recorded on a computer-readable storage medium. For example, according to an exemplary embodiment of the present application, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: acquiring table data containing a sample to be predicted; creating a graph based on the table data, wherein each node in the graph represents one sample in the table data, and an edge in the graph represents a correlation between two samples; learning, with the GNN and based on the created graph, a feature representation of the target node corresponding to the sample to be predicted in the graph; and performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node.
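As a hedged illustration of those prediction steps (not the claimed implementation), the sketch below treats the sample to be predicted as a target node attached to correlated existing rows and scores its aggregated representation. All names here (`predict_new_sample`, `key_col`, `w_out`) are assumptions for the example.

```python
import numpy as np

def predict_new_sample(rows, new_row, key_col, w_out):
    """Hypothetical sketch: the sample to be predicted becomes a target
    node whose neighbors are the rows sharing a value in `key_col`; its
    feature representation averages its own features with theirs, and a
    sigmoid over a linear score yields the prediction."""
    neigh = [np.array(r["x"]) for r in rows if r[key_col] == new_row[key_col]]
    own = np.array(new_row["x"])
    rep = own if not neigh else (own + np.mean(neigh, axis=0)) / 2.0
    return 1.0 / (1.0 + np.exp(-float(rep @ w_out)))
```

Note that a sample with no correlated rows simply falls back to its own features, so the same code path covers both connected and isolated target nodes.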
Further, according to another exemplary embodiment of the present application, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: acquiring training table data; creating a graph based on the training table data, wherein each node in the graph represents one training sample in the training table data, and an edge in the graph represents a correlation between two training samples; learning feature representations of the target nodes in the graph with the GNN based on the created graph; performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node; and adjusting parameters of the machine learning model by comparing the prediction result for the target node with the true label for the target node.
The instructions stored in the computer-readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, or a server. It should be noted that the instructions can also perform more specific processing when the above steps are performed; the content of this further processing has been mentioned in the description of fig. 1 to 6, so it is not repeated here to avoid redundancy.
It should be noted that the prediction system and the training system according to the exemplary embodiments of the present application may rely entirely on the execution of computer programs or instructions to implement their respective functions, i.e., the respective devices correspond to respective steps in the functional architecture of the computer programs, so that the entire system can be invoked through a dedicated software package (e.g., a lib library) to implement the respective functions.
On the other hand, when the systems and apparatuses shown in fig. 1 and 5 are implemented in software, firmware, middleware or microcode, program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that at least one processor or at least one computing device may perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, according to an exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: acquiring table data containing a sample to be predicted; creating a graph based on the table data, wherein each node in the graph represents one sample in the table data, and an edge in the graph represents a correlation between two samples; learning, with the GNN and based on the created graph, a feature representation of the target node corresponding to the sample to be predicted in the graph; and performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node.
For example, according to another exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: acquiring training table data; creating a graph based on the training table data, wherein each node in the graph represents one training sample in the training table data, and an edge in the graph represents a correlation between two training samples; learning feature representations of the target nodes in the graph with the GNN based on the created graph; performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node; and adjusting parameters of the machine learning model by comparing the prediction result for the target node with the true label for the target node.
In particular, the above-described system may be deployed in a server or client or on a node in a distributed network environment. Further, the system may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions. In addition, the system may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). In addition, all components of the system may be connected to each other via a bus and/or a network.
The system here need not be a single system; it can be any collection of devices or circuits capable of executing the above instructions (or instruction sets) individually or jointly. The system may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote systems (e.g., via wireless transmission).
In the system, the at least one computing device may comprise a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the at least one computing device may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The computing device may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integrated with the computing device, for example, by arranging RAM or flash memory within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage device and the computing device may be operatively coupled, or may communicate with each other, for example through I/O ports or network connections, so that the computing device can read instructions stored in the storage device.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.

Claims (10)

1. A method of performing a prediction based on a graph neural network, GNN, comprising:
acquiring table data containing a sample to be predicted;
creating a graph based on the table data, wherein each node in the graph represents one sample in the table data, and an edge in the graph represents a correlation between two samples;
learning, using the GNN and based on the created graph, a feature representation of a target node corresponding to the sample to be predicted in the graph;
performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node.
2. The method of claim 1, wherein the graph is a multigraph capable of reflecting correlations between samples in the table data in multiple aspects,
wherein the learning, using the GNN and based on the created graph, of the feature representation of the target node corresponding to the sample to be predicted in the graph comprises the steps of:
extracting, from the multigraph, the neighbor nodes and correlations related to the target node, respectively, to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between samples in one aspect;
learning, based on each subgraph and using the GNN corresponding to that subgraph, a sub-feature representation of the target node;
and aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
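The extract-learn-aggregate procedure of this claim can be sketched as follows; this is an assumed minimal rendering (a neighbor mean standing in for each per-subgraph GNN, and a mean over the sub-representations as the final aggregation), not the claimed GNNs themselves, and `multigraph_representation` is a hypothetical name.

```python
import numpy as np

def multigraph_representation(feats, relations, target):
    """Assumed sketch of claim 2: each relation (one 'aspect' of
    correlation, e.g. 'same user' vs. 'same merchant') gives a subgraph
    around the target node; a simple neighbor-mean stands in for the
    per-subgraph GNN, and the resulting sub-feature representations are
    aggregated by a mean into the final feature representation."""
    sub_reps = []
    for adj in relations:                      # one adjacency matrix per aspect
        neigh = [i for i, linked in enumerate(adj[target]) if linked]
        agg = (np.mean([feats[i] for i in neigh], axis=0)
               if neigh else np.zeros_like(np.asarray(feats[target], dtype=float)))
        sub_reps.append((np.asarray(feats[target], dtype=float) + agg) / 2.0)
    return np.mean(sub_reps, axis=0)           # aggregate the sub-representations
```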
3. The method of claim 1 or 2, wherein the performing a prediction based on the learned feature representation of the target node to obtain a prediction result for the target node comprises:
performing prediction using only the feature representation of the target node learned by the GNN to obtain a prediction result regarding the target node; or
concatenating the feature representation of the target node learned by the GNN with the original feature representation of the target node, or with a feature representation of the target node learned by another means different from the GNN, and performing prediction based on the concatenated feature representation to obtain a prediction result regarding the target node.
4. The method of claim 1, wherein the creating a graph based on table data comprises:
selecting, from among the features of the samples in the table data, features capable of establishing correlations between samples;
connecting correlated samples based on the selected features to create the graph.
5. A method of training a machine learning model based on a graph neural network, GNN, comprising:
acquiring training table data;
creating a graph based on the training table data, wherein each node in the graph represents one training sample in the training table data, and an edge in the graph represents a correlation between two training samples;
learning feature representations of the target nodes in the graph with the GNN based on the created graph;
performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node;
adjusting parameters of the machine learning model by comparing the prediction result for the target node with the true label for the target node.
6. The method of claim 5, wherein the graph is a multigraph capable of reflecting correlations between training samples in the training table data in multiple aspects,
wherein the learning, using the GNN and based on the created graph, of the feature representations of the target nodes in the graph comprises:
extracting, from the multigraph, the neighbor nodes and correlations related to the target node, respectively, to obtain a plurality of subgraphs related to the target node, wherein each subgraph reflects the correlation between samples in one aspect;
learning, based on each subgraph and using the GNN corresponding to that subgraph, a sub-feature representation of the target node;
and aggregating the sub-feature representations of the target node to obtain the feature representation of the target node.
7. A system for performing predictions based on a graph neural network, GNN, comprising:
a data acquisition device configured to acquire table data containing a sample to be predicted;
a graph creation device configured to create a graph based on the table data, wherein each node in the graph represents one sample in the table data, and an edge in the graph represents a correlation between two samples;
a prediction device configured to: learn, using the GNN and based on the created graph, a feature representation of the target node corresponding to the sample to be predicted in the graph; and perform prediction based on the learned feature representation of the target node to obtain a prediction result about the target node.
8. A system for training a machine learning model based on a graph neural network, GNN, comprising:
a data acquisition device configured to acquire training table data;
a graph creation device configured to create a graph based on the training table data, wherein each node in the graph represents one training sample in the training table data, and an edge in the graph represents a correlation between two training samples;
a training apparatus configured to: learning feature representations of the target nodes in the graph with the GNN based on the created graph; performing prediction based on the learned feature representation of the target node to obtain a prediction result about the target node; and adjusting parameters of the machine learning model by comparing the predicted result for the target node with the true label for the target node.
9. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 6.
10. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 6.
CN202110483604.6A 2021-04-30 2021-04-30 Method and system for performing predictions based on GNN and training method and system Pending CN113095592A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110483604.6A CN113095592A (en) 2021-04-30 2021-04-30 Method and system for performing predictions based on GNN and training method and system


Publications (1)

Publication Number Publication Date
CN113095592A true CN113095592A (en) 2021-07-09

Family

ID=76681207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110483604.6A Pending CN113095592A (en) 2021-04-30 2021-04-30 Method and system for performing predictions based on GNN and training method and system

Country Status (1)

Country Link
CN (1) CN113095592A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325340A (en) * 2020-02-17 2020-06-23 南方科技大学 Information network relation prediction method and system
CN111400560A (en) * 2020-03-10 2020-07-10 支付宝(杭州)信息技术有限公司 Method and system for predicting based on heterogeneous graph neural network model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination