WO2021179838A1

WO2021179838A1 - Prediction method and system based on heterogeneous graph neural network model

Info

Publication number: WO2021179838A1
Application number: PCT/CN2021/074479
Authority: WO
Inventors: 胡斌斌; 张志强; 周俊; 杨双红
Original assignee: 支付宝(杭州)信息技术有限公司
Priority date: 2020-03-10
Filing date: 2021-01-29
Publication date: 2021-09-16
Also published as: CN111400560A

Abstract

A prediction method and a system based on a heterogeneous graph neural network model. The method comprises: acquiring heterogeneous graph data related to prediction content, wherein the heterogeneous graph data comprises a node to be subjected to prediction, neighbor nodes of the node to be subjected to prediction, and paths connected between the node to be subjected to prediction and the neighbor nodes, wherein the paths comprise at least one type (302); grouping the neighbor nodes on the basis of the type of the paths, such that the types of paths of the neighbor nodes of the same group are the same (304); and inputting the node to be subjected to prediction, the grouped neighbor nodes, and the paths between the nodes into a trained heterogeneous graph neural network model, so as to obtain a representation vector of the node to be subjected to prediction, and then inputting the representation vector into a trained prediction model for prediction (306).

Description

Method and system for prediction based on heterogeneous graph neural network model

Technical field

One or more embodiments of this specification relate to the field of data processing, and in particular to a method and system for prediction based on a heterogeneous graph neural network model.

Background technique

Heterogeneous graph, also known as heterogeneous information network, is a special graph structure that usually contains different types of nodes and different types of paths, and can be used to represent some complex social network relationship data. Nodes in a heterogeneous graph can be used to represent entity objects such as individual users. Usually, a heterogeneous graph containing entity object data can be input into a graph neural network to predict nodes in a heterogeneous graph, for example, users can be predicted to make judgments. Its category, risk level or preference habits. Heterogeneous graphs exhibit high complexity due to their different node types and path types. Therefore, how to predict the entity object data in the heterogeneous graphs is very important.

Based on this, this specification proposes a method and system for prediction based on a heterogeneous graph neural network model.

Summary of the invention

One of the embodiments of this specification provides a method for judging the type of an entity object through entity object data. The method for judging the category of an entity object through entity object data includes: obtaining heterogeneous graph data related to predicted content, the heterogeneous graph data including a node to be predicted, neighbor nodes of the node to be predicted, and connecting to the node to be predicted. Predict a path between a node and the neighbor node, the path includes at least one type; group the neighbor nodes based on the type of the path, so that the types of paths of the neighbor nodes in the same group are the same Input the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into the trained heterogeneous graph neural network model, to obtain the representation vector of the node to be predicted, and then input the trained prediction model for prediction.

One of the embodiments of this specification provides a prediction system based on a heterogeneous graph neural network model. The system for judging the category of an entity object based on entity object data includes: a heterogeneous graph acquisition module for acquiring heterogeneous graph data related to predicted content, and the heterogeneous graph data includes the node to be predicted and the information of the node to be predicted. A neighbor node and a path connecting the node to be predicted and the neighbor node, the path includes at least one type; a grouping module is configured to group the neighbor nodes based on the type of the path to Make the path types of the neighbor nodes in the same group the same; a node prediction module for inputting the nodes to be predicted, the neighbor nodes after grouping, and the paths between the nodes into the trained heterogeneous graph neural network model , Get the node to be predicted.

One of the embodiments of this specification provides an apparatus for predicting based on a heterogeneous graph neural network model, which includes a processor configured to execute a method for judging the category of an entity object through entity object data.

One of the embodiments of this specification provides a computer-readable storage medium that stores computer instructions. After the computer reads the computer instructions in the storage medium, the computer executes a prediction method based on a heterogeneous graph neural network model.

Description of the drawings

This specification will be further described in the form of exemplary embodiments, and these exemplary embodiments will be described in detail with the accompanying drawings. These embodiments are not restrictive. In these embodiments, the same number represents the same structure, in which:

Fig. 1 is a schematic diagram of a scenario application of prediction based on a heterogeneous graph neural network model according to some embodiments of this specification;

Fig. 2 is a block diagram of a prediction system based on a heterogeneous graph neural network model according to some embodiments of this specification;

Fig. 3 is an exemplary flowchart of a method for prediction based on a heterogeneous graph neural network model according to some embodiments of the present specification;

Fig. 4 is an exemplary sub-flow chart of a prediction method based on a heterogeneous graph neural network model according to some embodiments of the present specification;

Fig. 5 is an exemplary sub-flow chart of a prediction method based on a heterogeneous graph neural network model according to some embodiments of this specification; and

Fig. 6 is an exemplary sub-flow chart of a prediction method based on a heterogeneous graph neural network model according to other embodiments of this specification.

Detailed ways

In order to more clearly describe the technical solutions of the embodiments of the present specification, the following will briefly introduce the accompanying drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some examples or embodiments of this specification. For those of ordinary skill in the art, without creative work, this specification can also be applied to these drawings. Other similar scenarios. Unless it is obvious from the language environment or otherwise stated, the same reference numerals in the figures represent the same structure or operation.

It should be understood that the “system”, “device”, “unit” and/or “module” used herein is a method for distinguishing different components, elements, parts, parts, or assemblies of different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.

As shown in this specification and claims, unless the context clearly indicates exceptions, the words "a", "an", "an" and/or "the" do not specifically refer to the singular, but may also include the plural. Generally speaking, the terms "include" and "include" only suggest that the clearly identified steps and elements are included, and these steps and elements do not constitute an exclusive list, and the method or device may also include other steps or elements.

In this specification, a flowchart is used to illustrate the operations performed by the system according to the embodiment of this specification. It should be understood that the preceding or following operations are not necessarily performed exactly in order. Instead, the steps can be processed in reverse order or at the same time. At the same time, other operations can be added to these processes, or a certain step or several operations can be removed from these processes.

Fig. 1 is a schematic diagram of a scenario application of prediction based on a heterogeneous graph neural network model according to some embodiments of this specification.

Heterogeneous graphs include relational networks describing different types of paths for N nodes. Among them, nodes refer to physical objects, such as users. The path represents the connection relationship between nodes. For more detailed descriptions of entity objects and paths, please refer to Figure 3, which will not be repeated here.

For ease of description, only a part of the heterogeneous graph is shown in FIG. 1. In some embodiments, different types of paths may be processed separately to obtain the group aggregation vector of the entity objects in each group of paths; then, according to the importance (weight) corresponding to the node to be predicted and each group of paths, These group aggregation vectors are merged to obtain a representation vector, and finally the representation result is output based on the representation vector. On the one hand, due to the use of multiple different types of paths, the characteristics of the entity object can be more comprehensively described; on the other hand, the network of each group of different paths is processed separately to obtain each group's aggregation vector, which can avoid tedious manual feature extraction Further, it is possible to automatically determine the importance (weight) of different types of path relationship networks under the current prediction content, to achieve information fusion in each group of paths, so as to make the evaluation results of the predicted nodes more accurate.

Use the user financial relationship data graph to predict the user's financial risk level as an application scenario for description. In this application scenario, the entity object is the user AH, and the path type can include the interaction between users (indicated by a two-way arrow), loan method (indicated by a solid line), and income level (indicated by a dotted line). The predicted content is that the user is in The level of risk in financial lending (such as the probability of default). Exemplarily, to predict the financial risk level of user A, the characteristic information of user C and user F in the network can be processed according to the network of the interactive relationship path of user A to obtain the group aggregation vector for user A. In the same way, the characteristic information of user G and users B and H can be processed separately according to the network of the loan mode path and the income level path to obtain two other group aggregation vectors for user A. Then, the importance of each group of users in different paths relative to user A is determined according to the aggregation vector representation of each group. Then, the group aggregation vectors corresponding to different paths are merged based on different degrees of importance to obtain a representation vector for user A. According to the representation vector, the user's financial loan risk level can be output. According to the risk level, follow-up services can be carried out, such as limiting the user's loan amount, prohibiting the user from lending business, etc.

Taking the user viewing data relationship diagram as another embodiment, an application scenario of heterogeneous diagrams containing different types of entity objects is described. Entity objects can include users A, C, D, F, movies B, H, and director G; path types can include interaction paths between users (indicated by two-way arrows), user satisfaction paths with movies (indicated by dashed lines) ), the path of the user's satisfaction with the director (indicated by a solid line). The prediction content can be whether the user prefers a certain movie, or it can be the rating of the movie or the director. Exemplarily, to predict whether user A prefers a certain movie, the characteristic information of movie B and movie H in the network can be processed according to the network of user A's movie satisfaction path to obtain a group aggregation vector for user A. In the same way, the characteristic information of user G and directors B and H can be processed according to the network of interaction path and director satisfaction path, and two other group aggregation vectors for user A can be obtained. Then, the importance of each group of users in different paths relative to user A is determined according to the aggregation vector representation of each group. Then, the group aggregation vectors corresponding to different paths are merged based on different degrees of importance to obtain a representation vector for user A. According to the representation vector and the feature information of a certain movie, the user A's preference for a certain movie is predicted. As another example, predicting the score of director G may be based on the network of users' satisfaction with the movie, and based on users A, D, and F in the network.

For the technical solution of prediction based on the heterogeneous graph neural network model, see below.

Fig. 2 is a block diagram of a system for predicting based on a heterogeneous graph neural network model according to some embodiments of this specification.

As shown in FIG. 2, the system 200 for judging the entity object category based on entity object data may include a heterogeneous graph acquisition module 210, a grouping module 220, and a node prediction module 230.

The heterogeneous graph acquisition module 210 can acquire heterogeneous graph data related to the predicted content, the heterogeneous graph data including the node to be predicted, the neighbor nodes of the node to be predicted, and the connection between the node to be predicted and the neighbor node The path between, the path includes at least one type. For a detailed description of the heterogeneous graph acquisition module 210, refer to step 302 in FIG. 3, which will not be repeated here.

The grouping module 220 may be configured to group the neighbor nodes based on the types of the paths, so that the types of paths of the neighbor nodes in the same group are the same. For a detailed description of the grouping module 220, refer to step 304 in FIG. 3, which will not be repeated here.

The node prediction module 230 may be used to input the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into the trained heterogeneous graph neural network model, obtain the representation vector of the node to be predicted, and then input the trained Predictive models make predictions. In some embodiments, the node prediction module is further used to obtain the representation vector of the node to be predicted after fusing the node attention weight and/or the path attention weight. In some embodiments, the node prediction module 230 may obtain the group aggregation vector corresponding to the group one by one according to the characteristic information of the neighbor nodes of the same group; for each group aggregation vector, the characteristic information of the node to be predicted Fusion, get the node information to be predicted that is grouped and fused. In some embodiments, the node prediction module is further configured to determine the attention weight of the path based on the importance of the path; determine the node to be predicted based on the information of the node to be predicted fused by the group and the attention weight of the path The representation vector. In some embodiments, the node prediction module 230 may determine the attention weight vector of the node based on the importance of neighbor nodes, and determine the attention weight vector of the path based on the importance of the path. In some embodiments, the node prediction module 230 may determine the representation vector of the node to be predicted based on the grouped fusion of node information to be predicted, the attention weight vector of the node, and the attention weight vector of the path. For a detailed description of the node prediction module 230, refer to step 306 in FIG. 3, which is not repeated here.

In some embodiments, the system 100 further includes: a training module for end-to-end training of the heterogeneous graph network neural model and prediction model of the system, specifically the iterative update of the prediction model based on the loss function of the prediction model and the different Compose the model parameters of the neural network model until the iteration cut-off condition is met. In some embodiments, several heterogeneous graph data can be used as training data, and the correct result of the node corresponding to the heterogeneous graph data can be used as the label data of the training data. The parameters of the prediction model and the heterogeneous graph neural network The parameters of the model are updated through training iterations using the training data and the label data. Specifically, you can participate in the training process of the heterogeneous graph network neural model and prediction model described elsewhere in this article, and I will not repeat it here.

It should be understood that the system and its modules shown in FIG. 2 can be implemented in various ways. For example, in some embodiments, the system and its modules may be implemented by hardware, software, or a combination of software and hardware. Among them, the hardware part can be implemented using dedicated logic; the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or dedicated design hardware. Those skilled in the art can understand that the above-mentioned methods and systems can be implemented using computer-executable instructions and/or included in processor control codes, for example on a carrier medium such as a disk, CD or DVD-ROM, such as a read-only memory (firmware Such codes are provided on a programmable memory or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification can not only be implemented by hardware circuits such as very large-scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc. It may also be implemented by software executed by various types of processors, or may be implemented by a combination of the foregoing hardware circuit and software (for example, firmware).

It should be noted that the above description of the candidate item display, determination system and its modules is only for convenience of description, and does not limit this specification to the scope of the examples mentioned. It can be understood that for those skilled in the art, after understanding the principle of the system, it is possible to arbitrarily combine various modules, or form a subsystem to connect with other modules without departing from this principle. For example, in some embodiments, for example, the heterogeneous graph acquisition module 210, the grouping module 220, the node prediction module 230, and the training module disclosed in FIG. The function of two or more modules. For example, the heterogeneous graph acquisition module 210 and the grouping module 220 may be two modules, or one module may have both heterogeneous graph acquisition and grouping functions. For example, each module may share a storage module, and each module may also have its own storage module. Such deformations are all within the protection scope of this specification.

Fig. 3 is an exemplary flowchart of a method for prediction based on a heterogeneous graph neural network model according to some embodiments of the present specification. As shown in FIG. 3, the method 300 for judging the category of an entity object based on entity object data includes:

Step 302: Obtain heterogeneous graph data related to the predicted content. The heterogeneous graph data includes a node to be predicted, neighbor nodes of the node to be predicted, and a path connecting the node to be predicted and the neighbor node, and the path includes at least one type.

Specifically, step 302 may be performed by the heterogeneous graph acquisition module 210.

Heterogeneous graph data is a data relationship graph including different types of nodes and paths between different types of nodes. For example, a graph of user viewing data. Another example is a graph of user financial relationship data.

Nodes refer to physical objects. In some embodiments, the types of nodes may be the same. For example, nodes can all be users. In some embodiments, the types of nodes can be different. For example, nodes can include users, directors, actors, and so on.

The path represents the connection relationship between nodes. In some embodiments, different types of paths may represent different connection relationships between nodes.

In some embodiments, the connection relationship may be a connection relationship between entity objects of the same type. In some embodiments, the connection relationship may be an interaction relationship between entity objects of the same type (such as attention, likes, and comments between users). Exemplarily, the path can be expressed as: User A—Like—User B. In some embodiments, the connection relationship may also be a common preference between entity objects of the same type. For example, the viewing methods that users commonly prefer (such as online viewing on Youku, viewing in movie theaters). Exemplarily, the path may be expressed as: user A—youku online viewing—user B.

In some embodiments, the connection relationship may also be a relationship between different types of entity objects. For example, the satisfaction of the entity object user A with the entity object director K (such as like, insensitivity, dislike). Exemplarily, the path can be expressed as: user A-likes-director K.

In some embodiments, the heterogeneous graph data may be obtained through data stored in a social network, a review website, a credit information website, etc. authorized by the user, and may also be obtained by calling related interfaces or other methods. This embodiment does not limit the acquisition method. For example, the entity object-user can be obtained through the registration information authorized by the social network, and the path can be obtained through the interactive behavior between the users of the social network.

The content to be predicted may be unknown data of the physical object. In some embodiments, in some embodiments, the prediction content of the prediction model may include the category, risk level, or preference habits of the entity object corresponding to the node to be predicted.

Exemplarily, the category of the entity object may include the user's income level category (poor, well-off, and rich), director evaluation category (poor, fair, good, very good), and so on.

Exemplarily, the risk level of the entity object may include the user's financial lending risk level (for example, the probability of default is low, the probability of default is medium, the probability of default is high), the health risk level of the applicant (for example, the probability of compensation is low, The possibility of compensation is medium, the probability of compensation is high), etc.

Exemplarily, the preference habits of the entity object may include the user's preferred movie type (eg, comedy, action, science fiction film), and the user's preferred loan repayment method (eg, equal principal and interest, equal principal).

The node to be predicted is a node that outputs the content to be predicted. It is understandable that the node to be predicted can be selected according to the content to be predicted. Take the user’s preferred movie type as an example. The entity object includes a heterogeneous graph of users, directors, and actors. If the content to be predicted is the type of movie that each user likes, the node to be predicted may be the user; the content to be predicted is the rating of each director , The node to be predicted can be the director. For further explanation, the following is an example of the prediction content of the user's loan risk.

The neighbor nodes are directly connected to the node to be predicted through the path. In some embodiments, a prediction node may be connected to multiple neighbor nodes through multiple paths. In some embodiments, the types of multiple paths may be the same or different. For example, if user A of the node to be predicted "likes" 1 user and "comments" 2 users, the number of neighbor nodes of the interactive path is 3; user A of the node to be predicted and the other 2 users both use "Youku" To watch movies online, and to watch movies in cinemas with 3 other users at the same time, the number of neighbor nodes of the path of the movie viewing mode is 5.

In order to facilitate further understanding, an application scenario of user loan risk is taken as another embodiment for description. For example, in a heterogeneous graph where the entity object is a user, the corresponding node to be predicted is the user, and the content to be predicted is the loan risk level of each user. Path types can include interactions between users, user’s lending methods, income levels, etc. The corresponding paths can be exemplarily corresponding to: user A-comment-user D, user A-microfinance company-user B, user D -Middle class-User C.

Step 304: Group the neighboring nodes based on the types of the paths, so that the types of paths of the neighboring nodes in the same group are the same.

Specifically, step 304 may be performed by the grouping module 220.

In some embodiments, neighbor nodes are classified according to the type of path connecting the node to be predicted and its neighbor nodes. Following the above example, the neighbor nodes connected by the interactive path are grouped into a group, and the neighbor nodes connected by the path of the viewing mode are grouped into a group.

In some embodiments, each group of neighbor nodes corresponds to a graph neural network based on the path connection structure of the node to be predicted and the neighbor nodes therein.

Step 306: Input the node to be predicted, the grouped neighbor nodes, and the path between the nodes into the trained heterogeneous graph neural network model to obtain the representation vector of the node to be predicted and then input the trained prediction model to make predictions .

Specifically, step 306 may be performed by the node prediction module 230.

The characteristic information of the node to be predicted, the grouped neighbor nodes, and the path between the nodes are input into the trained heterogeneous graph neural network model, and the output is the representation vector of the node to be predicted. The characteristic information of a node is data that can characterize the characteristics of an entity object. Correspondingly, the characteristic information of the path is data that can characterize the characteristic of the path. The detailed description of the heterogeneous graph neural network model can be seen in Fig. 4, which will not be repeated here.

Input the representation vector of the node to be predicted into the trained prediction model, and output the prediction result.

In some embodiments, the measurement result is the value of the content to be predicted. As mentioned above, the prediction content of the prediction model may include the category, risk level, or preference habits of the entity object corresponding to the node to be predicted.

In some embodiments, the prediction model may be a machine learning model with classification capabilities, such as a binary classification model, a logistic regression model, or a neural network, which is not specifically limited in this specification.

In a possible implementation, after the information representation learning vector is input to the prediction model, multiple output values may be obtained, and each output value is used to characterize the confidence of the category corresponding to the output value, that is, the model's Forecast probability. For example, there are three output values, 0.6, 0.2, 0.2, respectively, corresponding to the probability that the user is a comedy fan, an action fan, and a science fiction fan. Since the probability of a comedy fan is 0.6, it can be considered that Users belong to comedy fans.

Fig. 4 is an exemplary sub-flow chart of a method for prediction based on a heterogeneous graph neural network model according to some embodiments of the present specification.

In some embodiments, the heterogeneous graph neural network model may include a group aggregation vector layer, a node information layer to be predicted, and a representation vector layer.

As shown in FIG. 4, the method 400 for inputting the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into a trained heterogeneous graph neural network model to obtain a representation vector of the node to be predicted includes step 402 ~Step 404.

Step 402: According to the feature information of the neighbor nodes in the same group, obtain the group aggregation vector corresponding to the group one by one.

In some embodiments, the group aggregation vector layer may be used to aggregate the input feature information of neighbor nodes of the same group to generate a group aggregation vector.

In some embodiments, the group aggregation vector layer can first vectorize the feature information of the neighbor nodes to obtain the feature vector X; the feature vectors of the same group of neighbor nodes are then aggregated through the graph neural network to obtain the feature aggregation vector h; The group aggregation vector H is obtained based on the feature aggregation vector.

As mentioned earlier, the characteristic information of a node is data that can characterize the characteristics of an entity object. For example, the entity object is a user, and the characteristic information of the entity object may include the user's identity information data (such as age, occupation), and may also include the user's historical preference data (such as favorite movies, TV series, etc.). In some embodiments, the feature information of the entity object can be obtained through the authorization data entered by the user during registration, or through the actual operation of the user (such as likes, comments, etc.), and it can also be obtained by reading stored data and calling related Obtained via interface or other methods. This embodiment does not limit the acquisition method.

In some embodiments, the feature information may be vectorized through the vectorized representation model in advance to obtain the corresponding feature vector. In some embodiments, the vectorized representation model may be a word embedding model, and the word embedding model may include but is not limited to: Word2vec model, term frequency-inverse document frequency model (Term Frequency-Inverse Document Frequency, TF-IDF) or SSWE-C (skip-gram based combined-sentiment word embedding) model, etc.

In some embodiments, the feature aggregation vector h can be obtained by performing feature aggregation on the feature vector X of each neighbor node. It can be understood that a group of neighbor nodes corresponds to a feature aggregation vector h.

In some embodiments, neighbor nodes within a predetermined order (such as order 2) of the node to be predicted can be used as feature aggregation nodes, or neighbors within a predetermined order can be sampled, and the neighbor nodes obtained by sampling can be used for feature aggregation. .

In some embodiments, the feature aggregation may aggregate the feature vector X through a preset aggregation method; it may also aggregate the feature vector X using the attention mechanism based on the graph neural network corresponding to each group of neighbor nodes; The feature vector X of each neighbor node can be convolved through the graph neural network. These three methods will be introduced separately below.

In some embodiments, the pre-set aggregation method may be to use a pre-trained parameter matrix to add, average, take the maximum value, calculate the weighted sum, etc., on the feature vector X of the neighbor nodes, etc., which will not be used here. limited.

_{For example, after obtaining the feature vector X u of the} node to be predicted and the feature vector of the neighbor node

_{Then, if the dimension of the feature vector X u} of the node to be predicted is 1*8 and the dimension of the first parameter matrix is 8*5, the feature vector X _{u of the} node to be predicted is multiplied by the first parameter matrix to obtain a dimension of 1*5 To be predicted feature aggregation vector h _u , the first parameter matrix can be regarded as _{a weighted sum operation for each element of the feature vector X u} of the node to be predicted, and then the feature information of the node to be predicted is aggregated, so that the node to be predicted The elements of the feature aggregation vector h _u can represent richer information. For the neighbor node feature vector, the neighbor node feature vector shown in Figure 5

For example, if its dimension is 1*10, add the feature vectors of multiple neighbor nodes and divide by 3 to obtain an average vector with a dimension of 1*10, and then multiply the average vector by a dimension of 10*5 The second parameter matrix of, obtains the feature aggregation vector of neighbor nodes with dimension 1*5

In the same way, the feature aggregation vector of neighbor nodes can be obtained

Of course, for neighboring nodes corresponding to different types of paths, the second parameter matrix may be different. In order to obtain the first parameter matrix and the second parameter matrix, the heterogeneous graph neural network model can be trained in advance. The training process can be referred to the description elsewhere in this article.

As mentioned above, in some embodiments, it is also possible to aggregate the feature vector X using the attention mechanism based on the graph neural network corresponding to each group of neighbor nodes.

The graph neural network can have one or more layers. In some embodiments, there may also be multiple sets of model parameters corresponding to each layer of graph neural network.

Attention Mechanism (Attention Mechanism) is a deep learning technology that enables graph neural networks to have the ability to focus on a subset of their inputs (or features). In some embodiments, the neighbor attention weight (ie, neighbor weight vector) of the neural network can be obtained through the attention mechanism; for multi-layer graph neural networks, the feature attention weight of the hidden layer (ie feature Weight vector), which in turn makes the more important feature vectors in neighbor nodes to be used to a higher degree, and reduces the interference of noise information.

In some embodiments, each group of model parameters is a weight matrix, and a layer of graph neural network may correspond to multiple weight matrices. For the trained graph neural network, the model parameters can be determined through parameter adjustment during the training process.

In some embodiments, the weight matrix may be obtained based on the importance of each neighbor node. The importance of each neighbor node can be determined by the feature vector of the node to be predicted and the neighbor node.

In some embodiments, the neighbor weight vector can be calculated by formula (1). For example, there are N nodes in the j-th group of neighbor nodes of the node u to be predicted. For the node u to be predicted, the neighbor weight of the neighbor node k can be:

α(u,k)=softmax(V·tanh(W ₁ [X _u ||X _k ])+b ₁ )

Among them, the matrices V (for example, called the first weight matrix) and W ₁ (for example, called the second weight matrix) are the model parameters determined during the training process of the graph neural network, and b ₁ is the constant parameter determined during the training process of the graph neural network , X _u and X _k are the current feature expression vector corresponding to node u and node k respectively. When the first layer of graph neural network is aggregated, the current feature expression vector of each node is determined by the feature vector of the corresponding node, [X _u | |X _k ] represents the stitching vector of two vectors. It is understandable that the activation functions softmax and tanh can also be replaced by other activation functions (such as Relu, etc.), which are not limited here.

In this way, the corresponding neighbor weight can be determined for each neighbor node. When the current feature vector of each neighbor node is different, the neighbor weight for the corresponding neighbor node is also different. It is worth noting that the node u to be predicted can also be regarded as its own neighbor node, for example, it is called a zero-order neighbor node.

According to the neighbor weight, the feature aggregation of each neighbor node can be performed by methods such as weighted sum. For example, by N _u represents the j th group of neighbor nodes of node u set to be predicted, to be predicted neighbor nodes after node u neighbor neural network of FIG polymerization results are:

It can be understood that for each node, after a layer of graph neural network, an aggregation result of the current layer can be obtained, for example, the aggregation result of node k (also called a feature expression vector) is

In some embodiments, the above neighbor aggregation result may be further aggregated with the feature expression vector of the node to be predicted to obtain the aggregation result of the node to be predicted in the current layer of the graph neural network. For example, suppose that the graph neural network is a multi-layer network, node 1, node 2, node 3...Node k... are neighbor nodes of the node u to be predicted, and aggregate their corresponding features in the i-1th layer (i≥2). Respectively denoted as

The feature aggregation result corresponding to the node u to be predicted in the i-1th layer is recorded as

Then in the i-th layer, the current feature expression vector of the corresponding node is the feature aggregation result of the i-1th layer (that is, the feature expression vector output by the i-1th layer).

In some embodiments, the neighbor nodes of node u are aggregated to obtain the neighbor aggregation result

Then, the feature aggregation results of the upper network node u

As feature vector

with

Perform aggregation to get the feature expression vector of the node u to be predicted in the i-th layer

Therefore, in the graph neural network of the j-th group of neighbor nodes, through the iterative processing of the pre-trained graph neural network, a corresponding feature aggregation vector can finally be obtained.

Here, will

with

The process of aggregation can be, for example, summation, averaging, or weighted summation. However, in the feature expression vector, the contribution of each feature to the expression vector of the node may also be different. Therefore, in a further optional implementation manner, each feature expression vector may also have a feature importance (feature weight).

In some embodiments, the hidden layer of the graph neural network can determine the feature weight vector formed by the feature weights corresponding to each feature expression vector in the following manner:

Among them, W ₂ (for example, called the third weight matrix) and W ₃ (for example, called the fourth weight matrix) are the weight matrices of the i-th layer in the graph neural network, and b ₂ and b ₃ are constant parameters. These models The parameters can be adjusted and determined according to the loss function during the training process of the graph neural network. In a hidden layer of the neural network, W ₂ , W ₃ , b ₂ , and b ₃ can be used as general parameters.

Represents the splicing of two vectors, where,

Represents the feature vector of the node u to be predicted in the i-th layer,

Represents the neighbor aggregation result of the neighbor node of the node u to be predicted in the i-th layer,

The final feature aggregation result in the upper layer of the neural network of the neighbor node u to be predicted can be used

Sure. The excitation function Relu can also be replaced by other suitable excitation functions, which will not be repeated here.

Each element in the feature weight vector β _i corresponds to the feature weight of each feature. By multiplying the corresponding feature weights with the corresponding elements in the neighbor aggregation result in a one-to-one correspondence, the feature aggregation result of the current node u in the current layer can be obtained. The way to determine the final aggregation result according to the feature weight can be expressed as:

Among them, ⊙ means multiplying the corresponding elements of two matrices (such as Hadamard product). For a vector, the k-th element in _{β i is the same as}

As a result of aggregation

The kth element in. For example, the result of the vector (A, B, C) ⊙ (a, b, c) is (Aa, Bb, Cc).

In this way, the node contribution degree and feature contribution degree can be considered at the same time, and a more accurate feature aggregation result of neighbor nodes can be obtained. When the feature aggregation model is a graph neural network, the aggregation result obtained in the last layer is the feature aggregation vector used by the j-th group of neighbor node pairs.

In some embodiments, the feature aggregation vector may also be obtained by convolution based on the graph neural network. Specifically, one or more layers of convolution can be performed on the feature vector of each neighbor node, and then the output convolution results of each neighbor node can be summed, averaged, maximum value, weighted sum, etc. , It is not limited here.

For example, in the graph neural network of the j-th group of neighbor nodes (in the graph convolutional neural network), the intermediate vector of the neighbor node v in the l-th layer of the graph neural network can be:

in:

Is the intermediate vector of the l-th layer in the graph neural network where node v is in the j-th group of neighbor nodes; N(v) is the neighbor node of node v; d _k and d _v are normalization factors, such as the degree of the corresponding node , That is, the number of connected edges connected to the corresponding node, or the number of first-order neighbor nodes;

Is the intermediate vector of node v in the lth layer of the graph convolutional neural network;

Is the intermediate vector of node k in the lth layer of the graph convolutional neural network; W ^l is the model parameter of the lth layer of the corresponding node graph convolutional neural network. When there are multiple neighbor nodes, W ^l can be a model parameter in the form of a matrix, which can be called a weight matrix. The formula can also consider the feature aggregation of higher-order neighbor nodes of the current node, which is represented by an ellipsis here. The principle is similar to the feature aggregation of first-order neighbor nodes, and will not be repeated here. Among them, different neighbor nodes have different normalization factors and different feature expression vectors, so the product multiplied by the weight matrix is also different, so they have different neighbor weights.

It is understandable that the intermediate vector output by the last layer of the graph convolutional neural network is the convolution result of node v. E.g,

Since the feature information of neighbor nodes corresponding to different types of paths are quite different, the feature information of neighbor nodes corresponding to the same type of path is relatively close. The neighbor nodes are classified based on the type of path and the information is aggregated to make the final representation learning vector The information contained is richer.

It is worth noting that feature aggregation is not limited to the above three methods, and other methods can also be used.

In some embodiments, the feature aggregation vector can be directly used as the group aggregation vector for further processing. In some embodiments, the feature aggregation vector can also be reduced in dimension. Dimensionality reduction can further perform feature extraction on the feature aggregation vector, and at the same time make the subsequent steps more efficient.

For example, a feature aggregation vector with a dimension of 1×3N

Reduce dimensionality to 1×5 dimensional group aggregation vector

Correspondingly, m groups of neighbor nodes can get m group aggregation vectors

In some embodiments, the above-mentioned dimensionality reduction methods may include, but are not limited to: principal component analysis (PCA), linear discriminant analysis (LDA), multidimensional scaling (MDS), partial Locally Linear Embedding (LLE), Adjacency Map (ISOMAP, Isometric Feature Mapping), and Kernel Principle Component Analysis (KPCA), etc.

Step 404: For each group of aggregation vectors, the feature information of the node to be predicted is fused to obtain grouped and fused information of the node to be predicted.

In some embodiments, the to-be-predicted node information layer may be used to fuse the input feature information of the to-be-predicted node and the group aggregation vector, and output the grouped and fused information of the to-be-predicted node.

The grouped and fused information of the node to be predicted is a vector that combines the feature information of the node to be predicted and the feature information of the neighbor node (ie, the adjacent node fusion vector).

In some embodiments, the information layer of the node to be predicted may vectorize the feature information of the node to be predicted. In some embodiments, the dimensionality of the vectorized feature information of the node to be predicted may be reduced to obtain the feature vector of the node to be predicted. For example, H _u .

In some embodiments, the vectorization and dimensionality reduction methods of the feature information of the node to be predicted may refer to step 402, which will not be repeated here.

In some embodiments, the feature vector of the node to be predicted may be spliced with each group aggregation vector obtained in step 402 to obtain the corresponding group aggregation splicing vector. For example, H _u and

Perform splicing to obtain m group aggregation splicing vectors. In some embodiments, the splicing method may be direct splicing or polymerization in step 402, which is not limited in this embodiment.

In some embodiments, the splicing vector may be aggregated based on each group to obtain the corresponding node information to be predicted (ie, the adjacent node fusion vector). E.g,

For the specific method, refer to step 402, which will not be repeated here.

In some embodiments, the grouped and fused node information to be predicted is input to the next layer of processing of the heterogeneous graph neural network model. In some embodiments, the next layer of the heterogeneous graph neural network model represents the vector layer. In the representation vector layer, the information of the node to be predicted can be fused with at least one of the node attention weight or the path attention weight to obtain the final representation learning vector. For example, in the embodiment shown in FIG. 5, the node attention weight and The path attention weights are all fused, and for example, in Figure 6, only the path attention weights are fused. In other embodiments, in the representation vector layer, the node attention weight and the path attention weight may not be merged, and the final representation vector is obtained directly based on the information of the node to be predicted.

Fig. 5 is an exemplary sub-flow chart of a prediction method based on a heterogeneous graph neural network model according to some embodiments of the application.

In some embodiments, the representation vector layer may output the representation vector of the node to be predicted based on the input node information to be predicted, the node attention weight, and the path attention weight.

In some embodiments, the presentation vector layer can first calculate the attention weight vector of the node through the group aggregation vector of the node to be predicted and the adjacent node fusion vector; then obtain the corresponding attention weight vector based on the node's attention weight vector and the adjacent node fusion vector Information weighted fusion vector; then the attention weight vector of the path is calculated by the information weighted fusion vector and the feature vector of the path; finally, the representation vector of the node to be predicted is obtained from the information weighted fusion vector and the attention weight vector of the path.

Step 502: Determine the attention weight vector of the node based on the importance of neighbor nodes.

The attention weight vector of neighbor nodes is a vector whose elements are the importance of each group of neighbor nodes. The importance of neighbor nodes represents the degree to which the feature information of each group of neighbor nodes is used in the process of calculating the representation vector of the node to be predicted for different nodes to be predicted.

For example, in the prediction of user loan risk, the path importance of the user interaction relationship type is low, and the path importance of the user income level type is high.

In some embodiments, the attention weight vector of each group of neighbor nodes can be determined by the feature vector of the node to be predicted and the neighbor node fusion vector of each group of neighbor nodes, including: The adjacent node fusion vector is spliced with the feature vector of the node to be predicted, and the vector obtained based on the splicing is multiplied by the preset third weight matrix to obtain the attention weight vector of the node corresponding to the adjacent node fusion vector .

In some embodiments, the attention weight vector of the node can refer to the way of obtaining the neighbor weight in step 402, which will not be repeated here.

Step 504: Obtain a corresponding information weighted fusion vector based on the attention weight vector of the node and the adjacent node fusion vector.

In some embodiments, the adjacent node fusion vector and the node attention weight vector have the same dimension, and each element of the node attention weight vector is used to represent the same position of the adjacent node fusion vector. The importance of the element. For each of the adjacent node fusion vectors, the information weight corresponding to the adjacent node fusion vector is obtained based on the node attention weight vector corresponding to the adjacent node fusion vector and the adjacent node fusion vector The fusion vector includes: for each adjacent node fusion vector, multiplying each element of the adjacent node fusion vector by the element at the same position of the node attention weight vector corresponding to the adjacent node fusion vector, And use the obtained product as the element at the same position of the information weighted fusion vector to obtain the information weighted fusion vector.

For example, the fusion vector of adjacent nodes with dimension 1*6

Is [A1, B1, C1, D1, E1, F1], and its corresponding dimension is 1*6 node attention weight vector

It is [A2, B2, C2, D2, E2, F2], in a possible embodiment, A2 is used to characterize the importance of A1, B2 is used to characterize the importance of B1, and the rest of the elements are analogously. Then based on the multiplication between the above elements, the fusion vector corresponding to the adjacent node is obtained

Information weighted fusion vector

It is [A1*A2, B1*B2, C1*C2, D1*D2, E1*E2, F1*F2]. Similarly, the information weighted fusion vector

It can be obtained based on the same method.

Step 506: Determine the attention weight vector of the path based on the importance of the path.

In some embodiments, the attention weight vector of the path is obtained based on the information weighted fusion vector.

In some embodiments, the method for obtaining the attention weight vector of the path may refer to the obtaining of the feature weight vector in step 402.

For example, based on the information weighted fusion vector with a dimension of 1*6

And a fourth weight matrix with a dimension of 6*1 to obtain the path attention weight β _j corresponding to the information weighted fusion vector. In the same way, for other m information weighted fusion vectors, other path attention weights are obtained based on the same method as described above: β ₁ , β ₂ ,...β _m . Combining all the path attentions can obtain the path attention weight vector β _u as shown in Fig. 5. The parameter vectors of different information weighted fusion vectors can be the same, and the parameter vectors can be obtained by training the model corresponding to the method for judging the entity object category through the entity object data.

Step 508: Determine the representation vector of the node to be predicted based on the grouped fusion of the node information to be predicted, the attention weight vector of the node, and the attention weight vector of the path.

In some embodiments, obtaining the information representation learning vector of the node to be predicted based on the information weighted fusion vector and the path attention weight corresponding to the information weighted fusion vector includes: weighting each information fusion vector

Multiply it with the path attention weight β _j in the information-weighted fusion vector β _u , and add all the vectors obtained by the multiplication to obtain the information representation learning vector e _u , such as:

For example, after the calculation of the above steps, a total of 2 information weighted fusion vectors are obtained

The attention weight vector of the corresponding path is β ₁ , β ₂ , and the obtained information indicates that the learning vector e _u is

The information can be directly represented by the learning vector e _u for related calculations, for example, directly input into the prediction model for calculation.

Because of the integration of path attention weight, node attention weight, and characteristic information of neighbor nodes and nodes to be predicted, this information indicates that the learning vector can express richer information and make subsequent prediction results more accurate.

As mentioned above, in some embodiments, the attention weight of the node may not be merged, and the corresponding representation learning vector e _u is obtained based on the path attention weight and the feature information of the node to be predicted, ie, see Fig. 6 and Fig. 6 It is an exemplary sub-flow chart of a prediction method based on a heterogeneous graph neural network model shown in other embodiments of this specification.

Step 602 and step 604 obtain the node information to be predicted based on the grouped neighbor nodes. For detailed description, please refer to

steps

402 and 404 respectively, which will not be repeated here.

Steps

606 and 608 respectively determine the attention weight of the path based on the importance of the path, and determine the representation learning vector e _{u of the} node to be predicted based on the above-mentioned node information to be predicted and the attention weight of the path. The specific method can be seen in Figure 5

Steps

506 and 508 are not repeated here.

In some embodiments, end-to-end training may be performed on the heterogeneous graph network model and the prediction model based on a large number of training samples with identifications. The end-to-end training refers to multiple models in the training process, according to the model's data processing steps, input data from the input end of the first model, and obtain the result from the output end of the last model, based on the result and the true value The error of all models are adjusted iteratively until the model meets the cut-off condition. The end-to-end training can save the training data required for training each independent model, and at the same time, the training results between the models will not affect each other.

Specifically, in some embodiments, the training samples with the logo may be input into the heterogeneous graph network model, and then the representation vector output by the heterogeneous network model is input into the prediction model, and the loss function is constructed based on the output value of the prediction model. Iteratively update the parameters of the heterogeneous graph network model and the prediction model.

In some embodiments, the training sample may be several pieces of heterogeneous graph data related to the target object (for example, the user's credit rating; or the user's movie preference type), and each piece of heterogeneous graph data includes related node data, Adjacent node data and path information between each node. The heterogeneous graph data can be constructed based on acquired historical data information, for example, based on acquired user personal data, user preference data for movie types, preference data for movie actors, preference data for movie directors, and the like.

Then, each piece of heterogeneous graph data in the training sample is labeled, and the labeled data includes the evaluation result of the target content in each piece of heterogeneous graph data (for example, the user’s credit rating is good; or the user likes The movie genre is comedy movie). Wherein, the evaluation result may also be determined by obtaining historical evaluation information of the target object (for example, the user).

The following will be based on the above labeled heterogeneous graph data input into the heterogeneous graph network model and the prediction model for simultaneous training. Specifically, the heterogeneous graph data with sample identification will be input into the heterogeneous graph network model, and the heterogeneous graph data will be input into the heterogeneous graph network model. The representation vector output by the graph network model is used as the input data of the prediction model, the labeled data corresponding to the heterogeneous graph data is used as the output data of the prediction model, and the input data and output data are input to the prediction model for training. In the training process, a loss function can be constructed based on the actual output value of the prediction model, and the parameters of the heterogeneous graph network model hybrid prediction model can be iteratively updated through the loss function.

In some embodiments, the parameters of the heterogeneous network model may include a first parameter matrix, a second parameter matrix, a first weight matrix, a second weight matrix, etc., and the parameters of a prediction model may include a binary classification model, a logistic regression model, and a neural network. The weights, thresholds, etc. of the network model.

In some embodiments, when the trained model meets the iteration cutoff condition, the training ends. Among them, the iterative cut-off condition can be that the result of the loss function converges or is less than a threshold.

It should be noted that the above description of the related process is only for example and description, and does not limit the scope of application of this specification. For those skilled in the art, various modifications and changes can be made to the process under the guidance of this specification. However, these corrections and changes are still within the scope of this specification.

The beneficial effects that the embodiments of this specification may bring include, but are not limited to: (1) The method for judging the entity object category through entity object data combines the feature information of the node to be predicted and the neighbor node, as well as the node attention weight and path attention The weight fully extracts all aspects of the information in the heterogeneous graph, so that the obtained information indicates that the information contained in the learning vector is richer, and the prediction result of the input prediction model is more accurate; (2) For the same type of path, different The feature information of neighbor nodes effectively extracts and utilizes the structural information in the heterogeneous graph, so that the obtained information learning representation vector contains richer semantic information. It should be noted that different embodiments may have different beneficial effects. In different embodiments, the possible beneficial effects may be any one or a combination of the above, or any other beneficial effects that may be obtained.

The basic concepts have been described above. Obviously, for those skilled in the art, the above detailed disclosure is only an example, and does not constitute a limitation to this specification. Although it is not explicitly stated here, those skilled in the art may make various modifications, improvements and amendments to this specification. Such modifications, improvements, and corrections are suggested in this specification, so such modifications, improvements, and corrections still belong to the spirit and scope of the exemplary embodiments of this specification.

Meanwhile, this specification uses specific words to describe the embodiments of this specification. For example, "one embodiment", "an embodiment", and/or "some embodiments" mean a certain feature, structure, or characteristic related to at least one embodiment of this specification. Therefore, it should be emphasized and noted that “one embodiment” or “one embodiment” or “an alternative embodiment” mentioned twice or more in different positions in this specification does not necessarily refer to the same embodiment. . In addition, some features, structures, or characteristics in one or more embodiments of this specification can be appropriately combined.

In addition, those skilled in the art can understand that various aspects of this specification can be explained and described through a number of patentable categories or situations, including any new and useful process, machine, product, or combination of substances, or a combination of them. Any new and useful improvements. Correspondingly, various aspects of this specification can be completely executed by hardware, can be completely executed by software (including firmware, resident software, microcode, etc.), or can be executed by a combination of hardware and software. The above hardware or software can all be referred to as "data block", "module", "engine", "unit", "component" or "system". In addition, various aspects of this specification may be embodied as a computer product located in one or more computer-readable media, and the product includes computer-readable program codes.

The computer storage medium may contain a propagated data signal containing a computer program code, for example on a baseband or as part of a carrier wave. The propagated signal may have multiple manifestations, including electromagnetic forms, optical forms, etc., or a suitable combination. The computer storage medium may be any computer readable medium other than the computer readable storage medium, and the medium may be connected to an instruction execution system, device, or device to realize communication, propagation, or transmission of the program for use. The program code located on the computer storage medium can be transmitted through any suitable medium, including radio, cable, fiber optic cable, RF, or similar medium, or any combination of the above medium.

The computer program codes required for the operation of each part of this manual can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python Etc., conventional programming languages such as C language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code can be run entirely on the user's computer, or run as an independent software package on the user's computer, or partly run on the user's computer and partly run on a remote computer, or run entirely on the remote computer or server. In the latter case, the remote computer can be connected to the user's computer through any network form, such as a local area network (LAN) or a wide area network (WAN), or connected to an external computer (for example, via the Internet), or in a cloud computing environment, or as a service Use software as a service (SaaS).

In addition, unless explicitly stated in the claims, the order of processing elements and sequences, the use of numbers and letters, or the use of other names described in this specification are not used to limit the order of processes and methods in this specification. Although the foregoing disclosure uses various examples to discuss some embodiments of the invention that are currently considered useful, it should be understood that such details are only for illustrative purposes, and the appended claims are not limited to the disclosed embodiments. On the contrary, the rights are The requirements are intended to cover all modifications and equivalent combinations that conform to the essence and scope of the embodiments of this specification. For example, although the system components described above can be implemented by hardware devices, they can also be implemented only by software solutions, such as installing the described system on an existing server or mobile device.

For the same reason, it should be noted that, in order to simplify the expressions disclosed in this specification and help the understanding of one or more embodiments of the invention, in the foregoing description of the embodiments of this specification, multiple features are sometimes combined into one embodiment. In the drawings or its description. However, this method of disclosure does not mean that the subject of the specification requires more features than those mentioned in the claims. In fact, the features of the embodiment are less than all the features of the single embodiment disclosed above.

In some embodiments, numbers describing the number of ingredients and attributes are used. It should be understood that such numbers used in the description of the embodiments use the modifier "about", "approximately" or "substantially" in some examples. Retouch. Unless otherwise stated, "approximately", "approximately" or "substantially" indicates that the number is allowed to vary by ±20%. Correspondingly, in some embodiments, the numerical parameters used in the specification and claims are approximate values, and the approximate values can be changed according to the required characteristics of individual embodiments. In some embodiments, the numerical parameter should consider the prescribed effective digits and adopt the method of general digit retention. Although the numerical ranges and parameters used to confirm the breadth of the ranges in some embodiments of this specification are approximate values, in specific embodiments, the setting of such numerical values is as accurate as possible within the feasible range.

For each patent, patent application, patent application publication and other materials cited in this specification, such as articles, books, specifications, publications, documents, etc., the entire contents are hereby incorporated into this specification as a reference. The application history documents that are inconsistent or conflict with the content of this specification are excluded, and the documents that restrict the broadest scope of the claims of this specification (currently or later appended to this specification) are also excluded. It should be noted that if there is any inconsistency or conflict between the description, definition, and/or use of terms in the accompanying materials of this manual and the content of this manual, the description, definition and/or use of terms in this manual shall prevail. .

Finally, it should be understood that the embodiments described in this specification are only used to illustrate the principles of the embodiments of this specification. Other variations may also fall within the scope of this specification. Therefore, as an example and not a limitation, the alternative configuration of the embodiment of the present specification can be regarded as consistent with the teaching of the present specification. Accordingly, the embodiments of this specification are not limited to the embodiments explicitly introduced and described in this specification.

Claims

A method for prediction based on a heterogeneous graph neural network model, the method comprising:

Acquiring heterogeneous graph data related to the predicted content, the heterogeneous graph data including a node to be predicted, neighbor nodes of the node to be predicted, and a path connecting the node to be predicted and the neighbor node, the The path includes at least one type;

Grouping the neighboring nodes based on the type of the path, so that the types of paths of the neighboring nodes in the same group are the same;

The node to be predicted, the grouped neighbor nodes, and the path between the nodes are input into the trained heterogeneous graph neural network model, and the representation vector of the node to be predicted is obtained and then input into the trained prediction model for prediction.
The method according to claim 1, wherein inputting the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into the trained heterogeneous graph neural network model, and obtaining the representation vector of the node to be predicted further comprises:

After fusing the node attention weight and/or the path attention weight, the representation vector of the node to be predicted is obtained.
The method according to claim 1, wherein inputting the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into the trained heterogeneous graph neural network model to obtain the representation vector of the node to be predicted comprises:

Determine the attention weight of the path based on the importance of the path;

The representation vector of the node to be predicted is determined based on the grouped fusion of the node information to be predicted and the attention weight of the path.
The method according to claim 3, inputting the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into the trained heterogeneous graph neural network model, and obtaining the representation vector of the node to be predicted further comprises:

The attention weight of the node is determined based on the importance of the neighbor nodes, and the representation vector of the node to be predicted is determined based on the information of the node to be predicted that is fused by the group, the attention weight of the node, and the attention weight of the path.
The method according to claim 1, wherein the prediction content includes predicting the category, risk level, or preference habits of the target object.
The method according to claim 1, wherein the trained heterogeneous graph neural network model and the trained prediction model are obtained by the following end-to-end training:

Based on the loss function of the prediction model, iteratively update the model parameters of the prediction model and the heterogeneous graph neural network model until the iteration cut-off condition is met.
The method of claim 6, wherein the end-to-end training further comprises:

Several heterogeneous graph data are used as training data, and the correct result of the node corresponding to the heterogeneous graph data is used as the label data of the training data. The parameters of the prediction model and the parameters of the heterogeneous graph neural network model use the training The data and the label data are updated through training iterations.
The method according to claim 1, wherein the inputting the node to be predicted, the neighbor nodes after grouping, and the path between the nodes into the trained heterogeneous graph neural network model to obtain the representation vector of the node to be predicted comprises :

According to the feature information of the neighbor nodes of the same group, obtain the group aggregation vector corresponding to the group one by one;

For each group of aggregation vectors, the feature information of the node to be predicted is fused to obtain grouped and fused node information to be predicted.
A prediction system based on a heterogeneous graph neural network model, the system comprising:

Heterogeneous graph acquisition module for acquiring heterogeneous graph data related to predicted content, the heterogeneous graph data including a node to be predicted, neighbor nodes of the node to be predicted, and connections between the node to be predicted and the neighbor A path between nodes, the path includes at least one type;

A grouping module, configured to group the neighboring nodes based on the type of the path, so that the types of paths of the neighboring nodes in the same group are the same;

The node prediction module is used to input the node to be predicted, the grouped neighbor nodes, and the path between the nodes into the trained heterogeneous graph neural network model to obtain the representation vector of the node to be predicted and then input the trained prediction The model makes predictions.
The system according to claim 9, wherein the node prediction module is further used to obtain the representation vector of the node to be predicted after fusing the node attention weight and/or the path attention weight.
The system according to claim 9, wherein the node prediction module is further configured to: determine the attention weight of the path based on the importance of the path; State the representation vector of the node to be predicted.
The system according to claim 11, wherein the node prediction module is further configured to:

The attention weight of the node is determined based on the importance of the neighbor nodes, and the representation vector of the node to be predicted is determined based on the information of the node to be predicted that is fused by the group, the attention weight of the node, and the attention weight of the path.
The system according to claim 9, wherein the prediction content includes the category, risk level, or preference habits of the predicted target object.
9. The system according to claim 9, further comprising a model training module for obtaining a trained heterogeneous graph neural network model and a trained prediction model by adopting the following end-to-end training:

Based on the loss function of the prediction model, iteratively update the model parameters of the prediction model and the heterogeneous graph neural network model until the iteration cut-off condition is met.
The system according to claim 14, wherein the model training module is further used for:

Several heterogeneous graph data are used as training data, and the correct result of the node corresponding to the heterogeneous graph data is used as the label data of the training data. The parameters of the prediction model and the parameters of the heterogeneous graph neural network model use the training The data and the label data are updated through training iterations.
The system according to claim 9, wherein the node prediction module is further configured to:

According to the feature information of the neighbor nodes of the same group, obtain the group aggregation vector corresponding to the group one by one;

For each group of aggregation vectors, the feature information of the node to be predicted is fused to obtain grouped and fused node information to be predicted.
A device for prediction based on a heterogeneous graph neural network model, comprising a processor, wherein the processor is configured to execute the method for prediction based on a heterogeneous graph neural network model according to any one of claims 1 to 8 .
A computer-readable storage medium that stores computer instructions. After the computer reads the computer instructions in the storage medium, the computer executes the heterogeneous graph-based neural network model according to any one of claims 1 to 8. Method of making predictions.