CN115017371A

CN115017371A - Target node determination method, storage medium, and program product

Info

Publication number: CN115017371A
Application number: CN202210617036.9A
Authority: CN
Inventors: 林田谦谨; 李旭瑞; 康杨杨; 孙常龙
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-06-01
Filing date: 2022-06-01
Publication date: 2022-09-06

Abstract

An embodiment of the present application provides a target node determination method, a storage medium, and a program product. The target node determination method includes: extracting, from a predetermined graph network, a prediction subgraph centered on a node to be predicted; obtaining a plurality of virtual subgraphs matched with the prediction subgraph, wherein each virtual subgraph is a subgraph network that is obtained in advance by training on a plurality of subgraph training samples and that at least partially matches the graph structures of the subgraph training samples; aligning each of the virtual subgraphs with the prediction subgraph based on the graph structure of the virtual subgraph and the graph structure of the prediction subgraph; and determining, according to the alignment result, whether the node to be predicted in the prediction subgraph is a target node.

Description

Target node determination method, storage medium, and program product

Technical Field

The embodiments of the present application relate to the field of computer technology, and in particular to a target node determination method, a storage medium, and a program product.

Background

Graph networks are widely used in fields such as e-commerce and academia. At present, prediction or recognition based on a graph network generally follows one of three approaches. The first is proximity-based: the embedded representation of a node is obtained by learning the probability that nodes are adjacent. The second is relation-based: an embedded representation is obtained by learning the probability that a triple exists in the graph network. The third is message-passing-based: the embedded representation of a node is obtained by aggregating the node's neighbors.

However, in these approaches the embedded representation is generally computed as a weighted average over information about the node. Such a computation does not make good use of the structural information in the graph network, which weakens the advantages of the graph network.

Disclosure of Invention

In view of the above, embodiments of the present application provide a target node determination scheme to at least partially solve the above problem.

According to a first aspect of embodiments of the present application, a target node determination method is provided, including: extracting, from a predetermined graph network, a prediction subgraph centered on a node to be predicted; obtaining a plurality of virtual subgraphs matched with the prediction subgraph, wherein each virtual subgraph is a subgraph network that is obtained in advance by training on a plurality of subgraph training samples and that at least partially matches the graph structures of the subgraph training samples; aligning each of the virtual subgraphs with the prediction subgraph based on the graph structure of the virtual subgraph and the graph structure of the prediction subgraph; and determining, according to the alignment result, whether the node to be predicted in the prediction subgraph is a target node.

According to a second aspect of the embodiments of the present application, a method for predicting an abnormal enterprise is provided, including: generating a graph network by taking enterprises and users related to the enterprises as nodes, and taking the association relationships between enterprises or between enterprises and users as edges; extracting, from the graph network, a prediction subgraph centered on the node corresponding to an enterprise to be predicted; obtaining a plurality of virtual subgraphs matched with the prediction subgraph, wherein each virtual subgraph is a subgraph network that is obtained in advance by training on a plurality of subgraph training samples and that at least partially matches the graph structure of a target subgraph centered on a node corresponding to an abnormal enterprise; aligning each of the virtual subgraphs with the prediction subgraph based on the graph structure of the virtual subgraph and the graph structure of the prediction subgraph; and determining, according to the alignment result, whether the node to be predicted in the prediction subgraph is a node corresponding to an abnormal enterprise.

According to a third aspect of embodiments herein, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the target node determination method as described above.

According to a fourth aspect of embodiments of the present application, there is provided a computer program product, which includes computer instructions for instructing a computing device to execute operations corresponding to the target node determination method described above.

According to the scheme provided by the embodiments of the present application, a prediction subgraph centered on a node to be predicted is extracted from a predetermined graph network, and a plurality of virtual subgraphs matched with the prediction subgraph are obtained. Each virtual subgraph is a subgraph network that is obtained in advance by training on a plurality of subgraph training samples and that at least partially matches the graph structures of the subgraph training samples; because of this match, the virtual subgraphs can serve as structural features of the subgraph training samples. Each virtual subgraph is then aligned with the prediction subgraph based on the graph structure of the virtual subgraph and the graph structure of the prediction subgraph, so that the degree to which the prediction subgraph matches the structural features of the subgraph training samples can be determined through the virtual subgraphs, and whether the node to be predicted is a target node can be determined according to the alignment result. The structural features of the subgraph training samples are thus learned directly through the virtual subgraphs, and the determination is made based on the virtual subgraphs, which greatly improves the utilization of the graph structure and thereby improves the accuracy of the determined target node. In addition, a virtual subgraph can be displayed intuitively as a graph and has intuitive semantics, so the virtual subgraph is highly interpretable, and the determination of the node to be predicted as a target node by the scheme of the embodiments is correspondingly more interpretable.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of the present application, and those skilled in the art may obtain other drawings from them.

FIG. 1A is a flowchart illustrating steps of a method for determining a target node according to a first embodiment of the present application;

FIG. 1B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 1A;

FIG. 2 is a flowchart illustrating steps of a method for determining a target node according to a second embodiment of the present application;

FIG. 3A is a flowchart illustrating steps of a method for determining a virtual subgraph in the present embodiment;

FIG. 3B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 3A;

FIG. 4A is a flowchart illustrating steps of a method for forecasting an abnormal enterprise according to an embodiment of the present disclosure;

FIG. 4B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 4A;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order that those skilled in the art may better understand the technical solutions in the embodiments of the present application, these technical solutions are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, and not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of protection of the embodiments of the present application.

In order to more clearly illustrate the present application, the following description is provided as an example of a usage scenario of the present application.

Graph networks are widely used in fields such as e-commerce and academia.

For example, in the e-commerce field, a graph network can be constructed from users and commodities, and commodities can be recommended to users according to the constructed graph network; in the academic field, a knowledge graph can be constructed from knowledge points, and other academic operations can be carried out based on the knowledge graph. In addition, graph networks can be applied in other areas, such as abnormal enterprise identification, identification of illicit (black-market) activity, and behavior prediction.

In a specific use of a graph network, a graph embedding representation of a node is generally computed from the graph network, and downstream processing units then perform prediction or recognition according to that graph embedding representation to obtain a prediction or recognition result.

At present, however, prediction or recognition based on a graph network generally follows one of three approaches. The first is proximity-based: the embedded representation of a node is obtained by learning the probability that nodes are adjacent. The second is relation-based: an embedded representation is obtained by learning the probability that a triple exists in the graph network. The third is message-passing-based: the embedded representation of a node is obtained by aggregating the node's neighbors.

In all of these approaches, the embedded representation is generally computed as a weighted average over information about the node. For example, in the first approach, the probability that nodes are adjacent is learned by counting information about the other nodes adjacent to a node; in the second approach, learning is performed by counting the probability that triples occur in the graph network; in the third approach, information about the neighbor nodes of a node is combined by a weighted average. In each case, what is learned is the feature distribution of the information associated with the node, not the structural information of the node within the graph network.

These approaches therefore do not make good use of the structural information in the graph network, and the advantages of the graph network are weakened.

Therefore, in the embodiment, a target node determination method is provided, in which structural information related to a target node in a graph network is learned through a virtual sub-graph, and whether a node to be predicted is the target node is determined according to the structural information, so that perception of the graph structure in the process of determining the target node is improved, and further, the utilization rate of the structural information in the graph network is improved.

Referring to FIG. 1A, which is a schematic flowchart of a target node determination method provided in an embodiment of the present application, the method includes:

S101, extracting a prediction subgraph taking a node to be predicted as a center from a predetermined graph network.

In this embodiment, the predetermined graph network may be a graph network generated according to information collected in advance. For example, if it is required to identify whether an enterprise is an abnormal enterprise, relevant information of several enterprises may be collected in advance, and a graph network may be generated according to the relevant information of several enterprises.

The graph network may be a network composed of nodes and edges for connecting the nodes, the edges connecting the nodes being used to represent an association relationship between the nodes.
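By way of illustration only, a graph network of this kind could be held in memory as a set of typed nodes plus typed edges. The sketch below is a minimal assumption-laden example and not the data structure of the embodiment: the class name `GraphNetwork`, its methods, and the node and edge type labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GraphNetwork:
    """Hypothetical container: typed nodes and typed, undirected edges."""
    node_types: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)       # (source, edge_type, target)

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, source, edge_type, target):
        # Both endpoints must already exist so every edge links known nodes.
        if source not in self.node_types or target not in self.node_types:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, edge_type, target))

    def neighbors(self, node_id):
        """Nodes directly associated with `node_id`, ignoring edge direction."""
        found = set()
        for source, _, target in self.edges:
            if source == node_id:
                found.add(target)
            elif target == node_id:
                found.add(source)
        return found


# Illustrative data only: two enterprises and one related user.
network = GraphNetwork()
network.add_node("enterprise_a", "enterprise")
network.add_node("enterprise_b", "enterprise")
network.add_node("user_1", "person")
network.add_edge("user_1", "legal_representative", "enterprise_a")
network.add_edge("enterprise_a", "investment", "enterprise_b")
```

Storing the edge type alongside each edge keeps the association relationship explicit, which later steps that depend on node types and edge types can reuse.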

In this step, when an object needs to be identified based on the graph network, a node to be predicted corresponding to the object may be determined, and a prediction subgraph with the node to be predicted as a center is extracted from the graph network. For example, when it is necessary to identify whether a certain enterprise is an abnormal enterprise, a node to be predicted corresponding to the enterprise may be determined from the graph network, and a prediction subgraph centered on the node to be predicted is extracted.

Specifically, the subgraph can be extracted from the graph network by starting from the node to be predicted as the center and following the edges that connect nodes as paths.
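The extraction described above can be sketched as a breadth-first traversal from the center node. This is only one possible reading: the hop limit `max_hops`, the function name `extract_subgraph`, and the adjacency-dictionary input are assumptions introduced for the example, since the text does not state how far from the center the subgraph extends.

```python
from collections import deque


def extract_subgraph(adjacency, center, max_hops=2):
    """Return (nodes, edges) reachable from `center` within `max_hops` edges."""
    # adjacency: dict mapping node -> iterable of neighbor nodes (undirected).
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the assumed hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    # Keep every edge whose two endpoints both lie inside the hop limit.
    edges = {
        frozenset((u, v))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Illustrative chain-like graph: C - A - B - D - E.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
nodes, edges = extract_subgraph(graph, "A", max_hops=2)
print(sorted(nodes))  # ['A', 'B', 'C', 'D']
```

Node E is three edges away from A, so it falls outside the two-hop prediction subgraph in this example.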

S102, obtaining a plurality of virtual sub-graphs matched with the prediction sub-graph.

In this embodiment, the virtual subgraph is a subgraph network which is obtained by training a plurality of subgraph training samples in advance and is at least partially matched with the graph structure of the subgraph training samples, so that a plurality of graph structure features of the subgraph training samples can be represented by a plurality of virtual subgraphs. In addition, the virtual subgraph can be presented in a graph form, so that the virtual subgraph has strong interpretability.

Taking enterprises as an example, co-investment relationships may exist among many enterprises. Sometimes co-investment is a means by which enterprises achieve a certain business purpose, and sometimes it results from strategic business investment. Generally, when the co-investment relationship serves a certain business purpose, the enterprises that share it are strongly related in their operations; when the co-investment results from strategic business investment, the enterprises that share it are often independent of one another. This difference is reflected directly in the graph structure.

In this embodiment, if a co-investment relationship that serves a certain business purpose has been identified, the subgraph corresponding to the enterprise concerned can be used as a subgraph training sample for training a virtual subgraph, so that the graph structure of the virtual subgraph represents a strong operational relationship among enterprises that share a co-investment relationship. Of course, the above description is merely illustrative and does not limit the present application.

S103, aligning each of the plurality of virtual subgraphs with the prediction subgraph based on the graph structure of the virtual subgraph and the graph structure of the prediction subgraph.

In this embodiment, the graph structure may specifically be information such as the topology of a virtual subgraph or of the prediction subgraph. For example, the graph structure of a virtual subgraph may be a triangle formed by three enterprise nodes, or a linear structure formed by one enterprise node and two user nodes. Since the virtual subgraphs characterize the graph structure features of the subgraph training samples, aligning a virtual subgraph with the prediction subgraph based on their graph structures makes it possible to determine whether the prediction subgraph includes the graph structure of the subgraph training samples.

Specifically, taking enterprises as an example, suppose a certain virtual subgraph represents a strong operational relationship among enterprises that share a co-investment relationship. If, after the prediction subgraph and this virtual subgraph are aligned, the graph structures of the two subgraphs are found to be highly similar, it can be determined that the co-investment relationship among the enterprises corresponding to the nodes in the prediction subgraph is one that serves a certain business purpose.

For a specific method for aligning the virtual sub-graph and the predicted sub-graph based on the graph structure, reference may be made to related technologies, which are not described herein again.

S104, determining whether the node to be predicted in the prediction subgraph is a target node according to the alignment result.

In this embodiment, if it is determined that the degree of similarity between the graph structure of the predicted subgraph and the graph structures of the virtual subgraphs is high according to the alignment result, it may be determined that the node to be predicted is the target node. Further, it may be determined that all nodes in the prediction subgraph are target nodes.
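As a minimal sketch of this decision, the alignment results can be treated as numeric scores and compared with a cut-off. The rule below (take the best-matching virtual subgraph and compare it with a fixed threshold) is an assumption made for illustration; the text does not specify the decision rule, and `is_target_node` and `threshold` are hypothetical names.

```python
def is_target_node(alignment_scores, threshold=0.8):
    """Return True when any virtual subgraph aligns at or above `threshold`."""
    # alignment_scores: one score per virtual subgraph, higher = more similar.
    if not alignment_scores:
        return False  # no virtual subgraph to compare against
    return max(alignment_scores) >= threshold


# Illustrative scores only.
print(is_target_node([0.2, 0.9, 0.4]))  # True
print(is_target_node([0.1, 0.3]))       # False
```

A learned prediction module could replace the fixed threshold; the sketch only shows how a high degree of similarity leads to a positive determination.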

The following provides an exemplary description of the present application with reference to a specific implementation scenario.

Referring to FIG. 1B, a diagram of a usage scenario for target node identification based on a graph network is shown.

Specifically, taking enterprises as an example, suppose it is desired to identify whether a certain enterprise is a target enterprise, for example an enterprise with high investment value or an enterprise with high investment risk. Information related to several enterprises may be collected, and a graph network may be established according to the collected information.

Then, subgraph training samples are extracted from the graph network, each centered on the node corresponding to an enterprise that has already been determined to have high investment value or high investment risk. A plurality of virtual subgraphs can be obtained by training on the subgraph training samples, such that the graph structure of each virtual subgraph at least partially matches the graph structure of the subgraph training samples. The structural features of the subgraph training samples can therefore be represented by the virtual subgraphs; for example, the graph structure of a virtual subgraph can represent a strong operational relationship among enterprises that share a co-investment relationship.

It should be noted that, because the trained virtual subgraph can be intuitively displayed through the pattern of the graph network and has intuitive semantics, the interpretability of the virtual subgraph is also higher, so that the interpretability of determining a certain node as a target node through the scheme provided by the embodiment is also higher.

When a specific enterprise is identified, a prediction subgraph which takes a node to be predicted corresponding to the enterprise to be identified as a center can be extracted from the graph network, and a plurality of virtual subgraphs which are matched with the prediction subgraph, namely a plurality of virtual subgraphs obtained by previous training, are obtained.

The virtual subgraphs and the prediction subgraph are then aligned by an alignment module based on the graph structure, and the alignment result is input to a prediction module, which outputs whether the node to be predicted is the target node. Specifically, the prediction module determines, according to the alignment result, whether the prediction subgraph includes the graph structure features of the subgraph training samples, and accordingly determines whether the enterprise corresponding to the node to be predicted is an enterprise with high investment value or an enterprise with high investment risk.

For example, if aligning the prediction subgraph with a virtual subgraph shows that the graph structures of the two are highly similar, and that virtual subgraph represents a strong operational relationship among enterprises that share a co-investment relationship, then the enterprises with a co-investment relationship in the prediction subgraph also have a strong operational relationship. In other words, the co-investment serves only to achieve a certain business purpose, which indicates that the investment risk of these enterprises may be higher.

According to the scheme provided by this embodiment, a prediction subgraph centered on a node to be predicted is extracted from a predetermined graph network, and a plurality of virtual subgraphs matched with the prediction subgraph are obtained. Each virtual subgraph is a subgraph network that is obtained in advance by training on a plurality of subgraph training samples and that at least partially matches the graph structures of the subgraph training samples; because of this match, the virtual subgraphs can serve as structural features of the subgraph training samples. Each virtual subgraph is then aligned with the prediction subgraph based on the graph structure of the virtual subgraph and the graph structure of the prediction subgraph, so that the degree to which the prediction subgraph matches the structural features of the subgraph training samples can be determined through the virtual subgraphs, and whether the node to be predicted is a target node can be determined according to the alignment result. The structural features of the subgraph training samples are thus learned directly through the virtual subgraphs, and the determination is made based on the virtual subgraphs, which greatly improves the utilization of the graph structure and thereby improves the accuracy of the determined target node. In addition, a virtual subgraph can be displayed intuitively as a graph and has intuitive semantics, so the virtual subgraph is highly interpretable, and the determination of the node to be predicted as a target node by the scheme of this embodiment is correspondingly more interpretable.

The target node determination method of the present embodiment may be performed by any suitable electronic device with data processing capabilities, including but not limited to a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like.

FIG. 2 is a flowchart illustrating the steps of a target node determination method according to the present application. As shown in the figure, the method includes:

S201, extracting a prediction subgraph taking a node to be predicted as a center from a predetermined graph network.

S202, obtaining a plurality of virtual subgraphs matched with the prediction subgraph.

The virtual subgraph is a subgraph network which is obtained by training a plurality of subgraph training samples in advance and at least partially matched with the graph structure of the subgraph training samples.

The specific implementation of steps S201-S202 can refer to steps S101 and S102 in the above embodiments, and are not described herein again.

S203, determining, according to the graph schema corresponding to the graph network, a meta-path that takes the node type of the node to be predicted as its starting point.

In this embodiment, the graph schema (graph mode) indicates the node types of the nodes in the graph network and the edge types of the edges connecting the nodes. Illustratively, taking enterprises as an example, the graph schema may indicate that the node types in the graph network include enterprise, person, and the like, and that the edge types include legal representative, investment, senior executive, shareholder, and the like.

A meta-path is a sequence in which node types and edge types alternate. For example, taking enterprises as an example, a meta-path may be: enterprise <-legal representative-> person <-investment-> enterprise, where "enterprise" and "person" are node types and the items inside "<- ->" are edge types.

In this embodiment, the node type of the node to be predicted can be determined according to the graph schema, together with the edge types of the edges that can connect to a node of that type; the node type of the node at the other end of each such edge can then be determined from the edge type. A meta-path starting from the node type of the node to be predicted is determined in this way, and repeating the procedure yields a sequence in which several node types and edge types alternate. For example, when the node type of the node to be predicted is "enterprise", the edge types connected to a node of the enterprise type may include "legal representative", "investment", "senior executive", and the like; the node at the other end of an edge of type "legal representative" may be of type "person", and the node at the other end of an edge of type "investment" may be of type "enterprise". Accordingly, the meta-paths that may be determined starting from the node type of the node to be predicted may include: enterprise-legal representative-person, enterprise-investment-enterprise, and so on.

Specifically, in this embodiment, determining the meta-path that takes the node type of the node to be predicted as its starting point according to the graph schema corresponding to the graph network includes: determining a meta-path tree according to the graph schema corresponding to the graph network, with the node type of the node to be predicted as the root node; and traversing the meta-path tree from the root node to obtain a plurality of meta-paths that take the node type of the node to be predicted as their starting point.

The meta-path tree is a set of meta-paths extending from a certain node type by taking the node type as a root node after the node type is determined; by determining and traversing the metapath tree, a more comprehensive metapath can be obtained, and the subsequent step S204 can be continuously executed according to the plurality of metapaths obtained by traversing, so that the virtual subgraph and the predicted subgraph are aligned based on the metapath tree, and the accuracy of the alignment result is ensured.

Specifically, in this embodiment, when the meta-path tree is traversed from the root node to obtain a plurality of meta-paths starting from the node type of the node to be predicted, each node type in the meta-path tree other than the root may correspond to one meta-path, namely the path from the root node to that node type. For example, if a three-layer meta-path tree has nine child nodes in total below the root node, the nine child nodes may correspond to nine meta-paths.
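The construction and traversal of a meta-path tree can be sketched as follows. This is a simplified illustration under stated assumptions: the schema is given as a dictionary from node type to (edge type, neighbor node type) pairs, the tree depth is capped by a hypothetical `max_depth` parameter, and `enumerate_meta_paths` is an illustrative name rather than a function of the embodiment.

```python
def enumerate_meta_paths(schema, root_type, max_depth=2):
    """List meta-paths starting at `root_type`, one per non-root tree node."""
    # schema: dict mapping node type -> list of (edge_type, neighbor_node_type).
    meta_paths = []
    stack = [((root_type,), 0)]  # (path from the root, depth of its last node)
    while stack:
        path, depth = stack.pop()
        if depth == max_depth:
            continue  # assumed depth limit of the meta-path tree
        for edge_type, next_type in schema.get(path[-1], ()):
            extended = path + (edge_type, next_type)
            meta_paths.append(extended)  # each tree node yields one meta-path
            stack.append((extended, depth + 1))
    return meta_paths


# Illustrative schema using the enterprise example from the text.
schema = {
    "enterprise": [("legal_representative", "person"), ("investment", "enterprise")],
    "person": [("legal_representative", "enterprise"), ("investment", "enterprise")],
}
paths = enumerate_meta_paths(schema, "enterprise", max_depth=2)
print(len(paths))  # 6
```

With two child node types per node type and two layers below the root, the tree has 2 + 4 = 6 non-root nodes, hence six meta-paths, each beginning with the node type of the node to be predicted.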

S204, for a virtual subgraph and the prediction subgraph to be aligned, extracting a virtual neighbor node set from the virtual subgraph and extracting a prediction neighbor node set from the prediction subgraph based on the meta-path.

In this embodiment, obtaining the virtual neighbor node set and the prediction neighbor node set based on the meta-path, and calculating the neighbor similarity from these sets, avoids the problem of high-order neighbor information being diluted by layer-by-layer propagation. In a specific extraction, a node whose node type is the same as the node type at the starting point of the meta-path is determined in the virtual subgraph, and, starting from that node, nodes are extracted along the edges of the virtual subgraph following the meta-path to obtain the virtual neighbor node set, in which each node has the same node type as the node type at the end of the meta-path. The prediction neighbor node set is extracted in the same way, which is not described again here.

Optionally, in this embodiment, step S204 may include: determining a corresponding virtual node from the virtual subgraph as the center of the virtual subgraph according to the node type of the starting point of the meta-path; taking the virtual node as a center, extracting a node which accords with the meta-path from the virtual subgraph to obtain the virtual neighbor node set; and taking the node to be predicted as a center, extracting the node which accords with the meta-path from the prediction subgraph, and obtaining the prediction neighbor node set.

For example, if the meta-path is "enterprise <-legal representative-> person <-investment-> enterprise", then extracting from the prediction subgraph centered on the node to be predicted corresponding to enterprise A yields, as the prediction neighbor node set, the set of nodes corresponding to the enterprises in which the legal representative of enterprise A has invested.

In this embodiment, extracting the virtual neighbor node set and the prediction neighbor node set along the meta-path is equivalent to extracting, from the virtual subgraph and the prediction subgraph, the content related to the semantics of that meta-path. Calculating the neighbor similarity in the subsequent steps from the extracted content largely prevents the various kinds of structural information in the graph network from being diluted, which ensures the accuracy of the calculated neighbor similarity and in turn improves the accuracy of the subsequent determination of whether the node to be predicted in the prediction subgraph is the target node.
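The neighbor extraction along a meta-path can be sketched as below. The sketch assumes a subgraph stored as typed, undirected edges and a dictionary of node types; the function name `neighbors_along_meta_path` and the sample data are hypothetical and serve only to illustrate the step.

```python
def neighbors_along_meta_path(edges, node_types, start, meta_path):
    """Return the nodes reached from `start` by following `meta_path`."""
    # edges: iterable of (source, edge_type, target), treated as undirected.
    # meta_path: (node_type, edge_type, node_type, ..., node_type).
    if node_types[start] != meta_path[0]:
        return set()  # the center must match the starting node type
    frontier = {start}
    for i in range(1, len(meta_path), 2):
        edge_type, next_type = meta_path[i], meta_path[i + 1]
        reached = set()
        for u, etype, v in edges:
            if etype != edge_type:
                continue
            if u in frontier and node_types[v] == next_type:
                reached.add(v)
            if v in frontier and node_types[u] == next_type:
                reached.add(u)
        frontier = reached
    return frontier


# Illustrative prediction subgraph centered on enterprise A.
node_types = {"A": "enterprise", "B": "enterprise", "C": "enterprise",
              "D": "enterprise", "P": "person"}
edges = [
    ("P", "legal_representative", "A"),
    ("P", "investment", "B"),
    ("P", "investment", "C"),
    ("A", "investment", "D"),
]
meta_path = ("enterprise", "legal_representative", "person",
             "investment", "enterprise")
print(sorted(neighbors_along_meta_path(edges, node_types, "A", meta_path)))
# ['B', 'C']
```

Enterprise D is invested in by A directly rather than by A's legal representative, so it does not satisfy this meta-path and is excluded from the neighbor set.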

S205, calculating the neighbor similarity of the virtual neighbor node set and the prediction neighbor node set.

Specifically, the node attributes in the virtual neighbor node set and in the prediction neighbor node set may each be summed, and the neighbor similarity may be calculated as the dot product or the cosine similarity of the two sums. Of course, these methods for calculating the similarity are only examples and are not intended to limit the present application.
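A minimal sketch of the cosine variant follows, assuming every node carries a numeric attribute vector of the same length. The helper names and the sample vectors are hypothetical; the dot-product variant would simply omit the normalization.

```python
import math


def sum_attributes(vectors):
    """Element-wise sum of equal-length attribute vectors."""
    return [sum(column) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def neighbor_similarity(virtual_neighbors, predicted_neighbors):
    """Cosine similarity between the summed attributes of two neighbor sets."""
    return cosine_similarity(
        sum_attributes(virtual_neighbors), sum_attributes(predicted_neighbors)
    )


# Illustrative attribute vectors only.
virtual_set = [[1.0, 0.0], [0.0, 1.0]]  # sums to [1.0, 1.0]
predicted_set = [[2.0, 2.0]]            # sums to [2.0, 2.0]
similarity = neighbor_similarity(virtual_set, predicted_set)  # close to 1.0
```

Because the attributes are summed before comparison, the two sets may contain different numbers of nodes and still be compared directly.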

S206, determining the alignment score of the virtual subgraph and the predicted subgraph according to the neighbor similarity.

Optionally, in this embodiment, the method further includes: calculating the node similarity of the virtual node and the node to be predicted; step S206 may include: and determining the alignment score of the virtual subgraph and the predicted subgraph according to the neighbor similarity and the node similarity.

Specifically, the similarity between the node attributes of the virtual node and the node to be predicted may be calculated as the node similarity.

In this embodiment, when the alignment score of the virtual sub-graph and the predicted sub-graph is determined according to the neighbor similarity and the node similarity, the node similarity may be multiplied by the neighbor similarity, and a multiplication result is used as the alignment score.

Optionally, in this embodiment, the method may further include: mapping the virtual subgraph and the prediction subgraph into a plurality of spaces based on a preset mapping function; and, in each space, executing steps S204 and S205 to obtain the neighbor similarity corresponding to each of the plurality of spaces. Correspondingly, step S206 may specifically include: determining the alignment score of the virtual subgraph and the prediction subgraph according to the neighbor similarity corresponding to each of the plurality of spaces.

Two subgraphs with the same graph structure may still differ considerably because their node attributes differ. Therefore, in this embodiment, the virtual subgraph and the prediction subgraph may be mapped into multiple spaces by a preset mapping function, and steps S204 and S205 are performed in each space, so as to obtain the neighbor similarity of the virtual subgraph and the prediction subgraph in different spaces, thereby further improving the accuracy of the determined alignment result.

Specifically, the preset mapping function may be a multilayer perceptron or the like. The specific mapping function may be predetermined by a person skilled in the art, or may be obtained by training on subgraph training samples, which is not limited in this embodiment.

When mapping into multiple spaces, the alignment results of the multiple spaces may be aggregated based on a preset calculation function, so as to obtain an alignment score that combines the alignment results of the multiple spaces. Specifically, the preset calculation function may be a convolution function or the like, which is not limited in this embodiment.
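The multi-space computation can be sketched as follows. In this sketch, each space is represented by a fixed linear projection matrix (a simplified stand-in for a multilayer perceptron), and the per-space similarities are combined by a weighted sum (a simplified stand-in for the preset calculation function). Both choices, and all names and values, are assumptions for illustration.

```python
def project(vector, matrix):
    """Apply a linear map; each row of `matrix` produces one output dimension."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_space_score(virtual_sum, predicted_sum, projections, weights):
    """Return (combined score, per-space dot-product similarities)."""
    per_space = [
        dot(project(virtual_sum, m), project(predicted_sum, m))
        for m in projections
    ]
    combined = sum(w * s for w, s in zip(weights, per_space))
    return combined, per_space

projections = [
    [[1.0, 0.0], [0.0, 1.0]],  # space 1: identity mapping
    [[1.0, 1.0]],              # space 2: one dimension summing both attributes
]
score, per_space = multi_space_score([1.0, 2.0], [3.0, 0.0], projections, [0.5, 0.5])
print(per_space, score)
```

In the example, the two attribute sums look dissimilar in the first space but similar in the second, which illustrates why comparing in several spaces can capture similarities that a single space would miss.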

The following describes an exemplary procedure for calculating the alignment result according to a specific scenario.

Once the virtual subgraph and the prediction subgraph to be aligned are determined, a meta-path tree corresponding to the graph schema of the two subgraphs can be obtained, in which each node of the meta-path tree corresponds to a node type in the subgraph, and each branch of the meta-path tree corresponds to an edge type in the subgraph. The meta-path tree may be obtained directly from the virtual subgraph or the prediction subgraph, for example, by traversing the virtual subgraph with the virtual node as the center and extracting the node type and edge type of each node therein. The root node of the meta-path tree is the node type corresponding to the virtual node and the node to be predicted.

After the meta-path tree is determined, the meta-path tree may be traversed level by level to obtain the meta-path corresponding to each node in the meta-path tree other than the root node.

After multiple meta-paths are obtained, for each meta-path, the virtual subgraph and the prediction subgraph may each be mapped into multiple spaces through the preset mapping function. In each space, the neighbor similarity is calculated according to the meta-path, and the node similarity between the virtual node and the node to be predicted is calculated; the node similarity in each space may then be multiplied by the neighbor similarity to obtain the alignment score corresponding to that space.

Then, for each meta-path, the alignment scores of the corresponding multiple spaces may be combined through a preset calculation function to obtain the alignment score corresponding to that meta-path, and the alignment scores corresponding to the multiple meta-paths are aggregated to obtain the alignment result.
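As a non-limiting sketch, the level-by-level traversal can be illustrated by expanding a graph schema breadth-first from the root node type, emitting one meta-path for every non-root tree node. The schema layout, the depth limit `max_depth`, and the function name are assumptions for illustration; here the tree is derived from a schema rather than from a concrete subgraph.

```python
from collections import deque

def enumerate_meta_paths(schema, root_type, max_depth):
    """Level-order expansion of a graph schema into meta-paths.

    schema: dict mapping node_type -> list of (edge_type, neighbor_node_type)
    Returns one meta-path per non-root tree node, shallowest first.
    """
    paths = []
    queue = deque([[root_type]])
    while queue:
        path = queue.popleft()
        depth = len(path) // 2  # number of edges already on the path
        if depth == max_depth:
            continue
        for edge_type, next_type in schema.get(path[-1], []):
            child = path + [edge_type, next_type]
            paths.append(child)   # each tree node yields its own meta-path
            queue.append(child)
    return paths

schema = {
    "enterprise": [("invest", "enterprise"), ("legal", "person")],
    "person": [("legal", "enterprise")],
}
meta_paths = enumerate_meta_paths(schema, "enterprise", max_depth=2)
for p in meta_paths:
    print(" -> ".join(p))
```

Because every intermediate tree node contributes a meta-path, first-order and higher-order meta-paths are produced together, without having to be listed by hand.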

S207, determining whether the node to be predicted in the prediction subgraph is a target node or not according to the alignment score.

In this embodiment, when there are a plurality of virtual subgraphs, whether the node to be predicted is the target node may be determined according to the alignment results of the respective virtual subgraphs with the prediction subgraph.

Specifically, in this embodiment, the alignment result may be input to a pre-trained encoder, prediction is performed according to the alignment result by the encoder, and a result of whether a node to be predicted in the prediction subgraph is a target node is output, where the encoder may specifically be a multilayer perceptron or the like. In this embodiment, the encoder may encode a plurality of alignment results to obtain an alignment coding vector, and then input the alignment coding vector into a prediction function (e.g., a multilayer perceptron, etc.), so as to obtain a prediction result used for indicating whether a node to be predicted is a target node.
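For illustration, the encoder followed by a prediction function can be sketched as a small feed-forward network over the alignment scores of several virtual subgraphs. The layer sizes, the ReLU and sigmoid activations, the decision threshold, and the parameter values below are hypothetical; in practice the parameters would be obtained by training.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer; each row of `weights` is one output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_target(alignment_scores, params, threshold=0.5):
    # Encoder: turns the alignment scores into an alignment coding vector.
    coding = relu(dense(alignment_scores, params["w1"], params["b1"]))
    # Prediction function: maps the coding vector to a probability.
    logit = dense(coding, params["w2"], params["b2"])[0]
    prob = sigmoid(logit)
    return prob, prob >= threshold

params = {  # hypothetical, untrained values
    "w1": [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]],
    "b1": [0.0, 0.1],
    "w2": [[1.5, -0.5]],
    "b2": [-0.2],
}
# One alignment score per virtual subgraph (three virtual subgraphs here).
prob, is_target = predict_target([0.9, 0.1, 0.4], params)
print(round(prob, 3), is_target)
```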

In this embodiment, the encoder may be trained with the virtual sub-graph in an end-to-end manner, and specifically, the training method may refer to the subsequent embodiment, which is not described herein again.

According to the scheme provided by the embodiment, the multiple meta paths are obtained by traversing the meta path tree, and then the neighbor similarity is calculated based on the meta paths, so that the neighbor similarity corresponding to each order of meta paths can be calculated, and the high-order neighbor information is prevented from being diluted due to layer-by-layer transmission; in addition, in the embodiment, the meta-path is obtained by traversing the meta-path tree, so that different meta-path information can be automatically extracted and combined, and the problem of semantic missing caused by artificially defining the meta-path is avoided; in addition, in the embodiment, the alignment between the virtual subgraph and the predicted subgraph is realized based on the meta-path, the semantic structure corresponding to the virtual subgraph is explicitly considered, and the trained virtual subgraph has intuitive semantics, so that the interpretability of the prediction result is increased.

Fig. 3A is a flowchart of a method for determining a virtual subgraph provided in an embodiment of the present application. As shown in fig. 3A, the method includes:

S301, generating a plurality of candidate virtual subgraphs based on the graph schema of the graph network.

The graph schema is used to indicate node types for nodes in the graph network and edge types for edges connecting nodes.

Optionally, in this embodiment, step S301 may specifically include: randomly generating nodes corresponding to the node types according to the node types and the edge types included in the graph schema, and randomly generating adjacency matrices corresponding to the edge types, to obtain topological structure information of the candidate virtual subgraph; and randomly generating node attribute information of each node to obtain the candidate virtual subgraph.

For example, in this embodiment, the nodes and edges in the virtual subgraph may be generated randomly based on the node types and edge types defined by the graph schema, so as to obtain the topology of the virtual subgraph, and then the node attributes of each node are generated, so as to obtain the candidate virtual subgraph.

Specifically, since there may be a plurality of edge types, adjacency matrices in one-to-one correspondence with the edge types may be randomly generated according to the graph schema, each adjacency matrix storing the edges of its edge type between nodes, and the candidate virtual subgraph is obtained from the adjacency matrices corresponding to the plurality of edge types.
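A minimal sketch of this random initialization is given below. The schema layout, the use of real-valued entries in [0, 1) as soft edge weights (which would make the adjacency matrices adjustable during training), and the Gaussian node attributes are assumptions for illustration and are not prescribed by this embodiment.

```python
import random

def random_candidate_subgraph(schema, nodes_per_type, attr_dim, seed=None):
    """Randomly generate one candidate virtual subgraph.

    schema: {"node_types": [...],
             "edge_types": {edge_type: (src_node_type, dst_node_type)}}
    Returns (nodes, attributes, adjacency), with one matrix per edge type.
    """
    rng = random.Random(seed)
    nodes = [(t, i) for t in schema["node_types"] for i in range(nodes_per_type)]
    attributes = {n: [rng.gauss(0.0, 1.0) for _ in range(attr_dim)] for n in nodes}
    adjacency = {}
    for edge_type, (src_t, dst_t) in schema["edge_types"].items():
        # Entries are nonzero only where the schema permits this edge type.
        adjacency[edge_type] = [
            [rng.random() if (s[0] == src_t and d[0] == dst_t) else 0.0
             for d in nodes]
            for s in nodes
        ]
    return nodes, attributes, adjacency

schema = {
    "node_types": ["enterprise", "person"],
    "edge_types": {"invest": ("person", "enterprise"),
                   "legal": ("person", "enterprise")},
}
nodes, attributes, adjacency = random_candidate_subgraph(schema, 2, 3, seed=0)
print(nodes)
print(sorted(adjacency))
```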

Referring to fig. 3B, a virtual sub-graph may be generated by the virtual sub-graph generation module according to the graph mode, and the implementation manner of the virtual sub-graph generation module may refer to related contents, which is not described herein again.

S302, extracting a sample subgraph taking a sample node as a center from a predetermined graph network, and determining a sample label for identifying the sample node as a target node to obtain a subgraph training sample.

Taking enterprises as an example, relevant data of each enterprise can be collected in advance, and a graph network corresponding to the enterprises is generated based on the node types and the edge types defined in the graph schema. Then, enterprises that have been determined to be at risk are selected from the plurality of enterprises as sample enterprises, a sample subgraph centered on the sample node corresponding to each sample enterprise is extracted from the graph network, and the risk label of the enterprise is used as the sample label, so as to obtain a subgraph training sample.

S303, aligning a plurality of candidate virtual subgraphs with the sample subgraphs respectively based on the graph structures of the candidate virtual subgraphs and the graph structures of the sample subgraphs.

The specific method for performing this step may refer to the above embodiments, and will not be described herein again.

Referring to fig. 3B, the candidate virtual subgraph and the sample subgraph in the subgraph training sample may be aligned by the alignment module, and the alignment result is output.

S304, predicting whether the sample node in the sample subgraph is a target node or not according to the alignment result to obtain a prediction label.

In this embodiment, a plurality of alignment results obtained by aligning the plurality of candidate virtual subgraphs with the sample subgraph may be input to an encoder, and the encoder performs prediction according to the plurality of alignment results and outputs the prediction label. Referring to fig. 3B, the alignment results may be input to the encoder, and the prediction label is output by the encoder.

For specific implementation of the encoder, reference may be made to related technologies, which are not described herein again.

S305, based on the difference between the sample label and the prediction label, adjusting the candidate virtual subgraph to enable the virtual subgraph to at least partially match with the graph structure of the sample subgraph.

When the encoder performs prediction according to the alignment result, step S305 may specifically include: adjusting the candidate virtual subgraph and the encoder based on the difference between the sample label and the prediction label, so that the virtual subgraph and the encoder are obtained simultaneously through end-to-end training.

For a specific adjustment method, reference may be made to related technologies, which are not described herein again.
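As a heavily simplified, non-limiting sketch of such joint adjustment, the example below reduces the candidate virtual subgraph to a single trainable attribute vector `v`, reduces the encoder to a scalar weight `w` and bias `b`, and uses a cross-entropy loss with plain gradient descent. All of these simplifications, the hyperparameters, and the sample data are assumptions for illustration; the present application does not prescribe a particular loss or optimizer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, dim, epochs=200, lr=0.5):
    """samples: list of (sample attribute vector, label in {0, 1})."""
    v = [0.1] * dim   # trainable stand-in for the candidate virtual subgraph
    w, b = 1.0, 0.0   # trainable stand-in for the encoder
    for _ in range(epochs):
        for x, y in samples:
            s = sum(vi * xi for vi, xi in zip(v, x))  # alignment score
            p = sigmoid(w * s + b)                    # prediction label (probability)
            g = p - y                                 # d(cross-entropy)/d(logit)
            grad_v = [g * w * xi for xi in x]         # gradient w.r.t. virtual subgraph
            grad_w, grad_b = g * s, g                 # gradients w.r.t. encoder
            v = [vi - lr * gv for vi, gv in zip(v, grad_v)]
            w -= lr * grad_w
            b -= lr * grad_b
    return v, w, b

def predict(x, v, w, b):
    return sigmoid(w * sum(vi * xi for vi, xi in zip(v, x)) + b)

samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
v, w, b = train(samples, dim=2)
print(predict([1.0, 0.0], v, w, b) > 0.5, predict([0.0, 1.0], v, w, b) < 0.5)
```

The point of the sketch is that the same label difference `g` drives the updates of both the virtual-subgraph parameters and the encoder parameters, which is what makes the training end to end.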

Fig. 4A is a flowchart illustrating steps of a method for forecasting an abnormal enterprise according to an embodiment of the present application, where the method includes:

S401, generating the graph network by taking enterprises and the users related to the enterprises as nodes, and taking the association relationships between enterprises or the association relationships between enterprises and users as edges.

Referring to fig. 4B, a graph schema is illustrated, in which the node types may include "enterprise" and "person", and the edge types may include "investment" for connecting enterprises with each other, as well as "investment", "legal representative", and "senior management" for connecting enterprises and persons.

In this embodiment, relevant data of each enterprise may be collected in advance, a graph network corresponding to the enterprise may be generated based on the node type and the edge type defined in the graph mode, and the graph network may be updated in real time or at regular time according to the relevant data of each enterprise.

S402, extracting a prediction subgraph which takes the node corresponding to the enterprise to be predicted as the center from the graph network.
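One possible way to extract a subgraph centered on a node is to keep every node within a fixed number of hops of the center, as sketched below. The hop limit `hops`, the edge-list layout, and the sample data are assumptions for illustration; this embodiment does not specify how the extent of the prediction subgraph is bounded.

```python
from collections import deque

def extract_centered_subgraph(edges, center, hops):
    """Return (nodes, edges) of the subgraph within `hops` of `center`.

    edges: list of (src, edge_type, dst), treated as undirected for reachability.
    """
    neighbors = {}
    for src, _, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    kept = set(distance)
    sub_edges = [e for e in edges if e[0] in kept and e[2] in kept]
    return kept, sub_edges

edges = [("p1", "invest", "e1"), ("p1", "legal", "e2"),
         ("e2", "invest", "e3"), ("p2", "legal", "e3")]
sub_nodes, sub_edges = extract_centered_subgraph(edges, "e1", hops=2)
print(sorted(sub_nodes), sub_edges)
```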

And S403, obtaining a plurality of virtual subgraphs matched with the predicted subgraphs.

The virtual subgraph is a subgraph network which is obtained by training a plurality of subgraph training samples in advance and at least partially matched with the graph structure of a target subgraph which takes the node corresponding to the abnormal enterprise as the center.

Referring to fig. 4B, a schematic diagram of three virtual subgraphs is shown, the graph pattern of the virtual subgraphs being the same as the graph pattern of the graph network in the above steps. Because the virtual subgraph can be displayed in a graph mode, the virtual subgraph has intuitive semantics, and the scheme provided by the embodiment has interpretability.

S404, aligning a plurality of virtual subgraphs with the prediction subgraphs respectively based on the graph structures of the virtual subgraphs and the graph structures of the prediction subgraphs.

S405, determining whether the node to be predicted in the prediction subgraph is a node corresponding to an abnormal enterprise or not according to the alignment result.

The alignment results corresponding to the virtual subgraphs can be input into a pre-trained encoder, and a prediction result is output through the encoder and used for indicating whether an enterprise to be predicted is an abnormal enterprise or not.

Similarly, if a certain enterprise is determined to be an abnormal enterprise, all enterprises included in the sub-graph centered on the node corresponding to the enterprise may also be determined to be abnormal enterprises.

Referring to fig. 5, a schematic structural diagram of an electronic device provided in an embodiment of the present application is shown. The specific embodiments of the present application do not limit the specific implementation of the electronic device.

As shown in fig. 5, the electronic device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.

Wherein:

the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.

The communication interface 504 is configured to communicate with other electronic devices or servers.

The processor 502 is configured to execute the program 510, and may specifically execute relevant steps in the above target node determination method embodiment.

In particular, program 510 may include program code comprising computer operating instructions.

The processor 502 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 506 is configured to store the program 510. The memory 506 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

For the specific implementation of each step in the program 510, reference may be made to the corresponding descriptions of the corresponding steps and units in the foregoing target node determination method embodiments, which are not repeated here. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the above-described devices and modules, reference may be made to the corresponding process descriptions in the foregoing method embodiments, which are not repeated here.

The present application further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the target node determination methods in the above multiple method embodiments.

An embodiment of the present application further provides a computer program product, which includes a computer instruction, where the computer instruction instructs a computing device to execute an operation corresponding to any target node determination method in the foregoing multiple method embodiments.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.

The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or an FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the target node determination methods described herein. Further, when a general-purpose computer accesses code for implementing the target node determination method shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the target node determination method shown herein.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.

The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of the patent protection of the embodiments of the present application should be defined by the claims.

Claims

1. A target node determination method includes:

extracting a prediction subgraph which takes a node to be predicted as a center from a predetermined graph network;

obtaining a plurality of virtual subgraphs matched with the predicted subgraphs, wherein the virtual subgraphs are subgraph networks which are obtained by pre-training a plurality of subgraph training samples and are at least partially matched with the graph structures of the subgraph training samples;

based on the graph structure of the virtual subgraph and the graph structure of the predicted subgraph, respectively aligning a plurality of virtual subgraphs with the predicted subgraph;

and determining whether the node to be predicted in the prediction subgraph is a target node or not according to the alignment result.

2. The method of claim 1, wherein the aligning a number of the virtual subgraphs to the predicted subgraphs based on the graph structure of the virtual subgraph and the graph structure of the predicted subgraph comprises:

determining a meta-path with the node type of the node to be predicted as a starting point according to a graph mode corresponding to the graph network, wherein the graph mode is used for indicating the node type of the node in the graph network and an edge type of an edge for connecting the node, and the meta-path is a sequence formed by alternating the node type and the edge type;

extracting a virtual neighbor node set from the virtual subgraph and a prediction neighbor node set from the prediction subgraph based on the meta-path aiming at the virtual subgraph and the prediction subgraph to be aligned;

calculating the neighbor similarity of the virtual neighbor node set and the prediction neighbor node set;

and determining the alignment score of the virtual subgraph and the predicted subgraph according to the neighbor similarity.

3. The method of claim 2, wherein the extracting a set of virtual neighbor nodes from the virtual subgraph and a set of predicted neighbor nodes from the predicted subgraph based on the meta-path for the virtual subgraph and the predicted subgraph to be aligned comprises:

determining a corresponding virtual node from the virtual subgraph as the center of the virtual subgraph according to the node type of the starting point of the meta-path;

taking the virtual node as a center, extracting a node which accords with the meta-path from the virtual subgraph to obtain the virtual neighbor node set;

and taking the node to be predicted as a center, extracting the node which accords with the meta-path from the prediction subgraph, and obtaining the prediction neighbor node set.

4. The method of claim 2, wherein the method further comprises:

calculating the node similarity of the virtual node and the node to be predicted;

the determining an alignment score of the virtual sub-graph and the predicted sub-graph according to the neighbor similarity comprises:

and determining the alignment score of the virtual subgraph and the predicted subgraph according to the neighbor similarity and the node similarity.

5. The method according to claim 2, wherein the determining a meta-path starting from the node type of the node to be predicted according to the graph mode corresponding to the graph network comprises:

determining a meta-path tree according to a graph mode corresponding to the graph network by taking the node type of the node to be predicted as a root node;

and traversing the meta-path tree from the root node to obtain a plurality of meta-paths with the node type of the node to be predicted as a starting point.

6. The method of claim 2, wherein the method further comprises:

mapping the virtual sub-graph and the prediction sub-graph into a plurality of spaces based on a preset mapping function;

in each space, executing the steps of aiming at the virtual subgraph and the predicted subgraph to be aligned, extracting a virtual neighbor node set from the virtual subgraph based on the meta-path, and extracting a predicted neighbor node set from the predicted subgraph to obtain the neighbor similarity corresponding to each of a plurality of spaces;

and determining the alignment score of the virtual subgraph and the predicted subgraph according to the neighbor similarity corresponding to each of a plurality of spaces.

7. The method of claim 1, wherein the virtual subgraph is trained by:

generating a number of candidate virtual subgraphs based on graph patterns of the graph network, the graph patterns being used for indicating node types of nodes in the graph network and edge types of edges used for connecting the nodes;

extracting a sample subgraph taking a sample node as a center from a predetermined graph network, and determining a sample label for identifying the sample node as a target node to obtain a subgraph training sample;

based on the graph structure of the candidate virtual subgraph and the graph structure of the sample subgraph, respectively aligning a plurality of candidate virtual subgraphs with the sample subgraph;

predicting whether the sample node in the sample subgraph is a target node or not according to an alignment result to obtain a prediction label;

based on a difference between the sample label and the prediction label, adjusting the candidate virtual subgraph such that the virtual subgraph at least partially matches a graph structure of the sample subgraph.

8. The method of claim 7, wherein the generating a number of candidate virtual subgraphs based on the graph schema of the graph network comprises:

randomly generating nodes corresponding to the node types according to the node types and the edge types included in the graph modes, and randomly generating an adjacency matrix corresponding to the edge types to obtain topological structure information of the candidate virtual subgraphs;

and randomly generating node attribute information of each node to obtain the candidate virtual subgraph.

9. The method of claim 7, wherein the predicting whether the sample node in the sample subgraph is a target node according to the alignment result to obtain a prediction label comprises:

respectively aligning a plurality of candidate virtual subgraphs with the sample subgraph to obtain a plurality of alignment results, inputting the alignment results into an encoder, predicting according to the alignment results through the encoder, and outputting the prediction label;

said adjusting said candidate virtual subgraph based on the difference between said sample label and said prediction label comprises:

adjusting the candidate virtual sub-graph and the encoder based on a difference between the sample label and the prediction label.

10. A prediction method of abnormal enterprises comprises the following steps:

generating a graph network by taking enterprises and users related to the enterprises as nodes and taking the association relationship between the enterprises or the association relationship between the enterprises and the users as edges;

extracting a prediction subgraph taking a node corresponding to an enterprise to be predicted as a center from the graph network;

obtaining a plurality of virtual subgraphs matched with the predicted subgraphs, wherein the virtual subgraphs are subgraph networks which are obtained by pre-training a plurality of subgraph training samples and are at least partially matched with the graph structure of a target subgraph which takes a node corresponding to an abnormal enterprise as a center;

and determining whether the node to be predicted in the prediction subgraph is a node corresponding to an abnormal enterprise or not according to the alignment result.

11. A computer storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.

12. A computer program product comprising computer instructions to instruct a computing device to perform operations corresponding to the method of any of claims 1-10.