WO2020015464A1

WO2020015464A1 - Method and apparatus for embedding relational network diagram

Info

Publication number: WO2020015464A1
Application number: PCT/CN2019/089022
Authority: WO
Inventors: 向彪; 刘子奇; 周俊; 李小龙
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2018-07-17
Filing date: 2019-05-29
Publication date: 2020-01-23
Also published as: CN109063041A; CN109063041B; TWI700599B; TW202006571A

Abstract

The embodiments of the present specification provide a method and apparatus for embedding a relational network graph into a multi-dimensional space. The method comprises: randomly determining an initial embedding vector Ci of each node i in the multi-dimensional space; obtaining the neighbor nodes of each node i and the strength of association between node i and each neighbor node; determining a current embedding vector of each neighbor node of node i; forming a position initial term and a position offset term of node i on the basis of a predetermined attenuation coefficient, the strengths of association, and the current embedding vectors of the neighbor nodes, and determining a current embedding vector of node i accordingly; and repeating these steps until a convergence condition is satisfied, at which point the embedding vector of each node i in the multi-dimensional space is determined. In this way, a relational network graph is efficiently embedded into the multi-dimensional space.

Description

Method and device for embedding relation network graph

Technical field

One or more embodiments of the present specification relate to the field of computer information processing, and in particular, to a method and an apparatus for embedding a relational network graph.

Background art

A relational network graph is a description of the relationships between entities in the real world, and is widely used in many kinds of computer information processing. Generally, a relational network graph contains a set of nodes and a set of edges. Nodes represent entities in the real world, and edges represent connections between those entities. For example, in a social network, a person is an entity, and a relationship or connection between two people is an edge.

In many cases, it is desirable to represent each node (entity) in a relational network graph with coordinate values in a multi-dimensional space, that is, to map each node to a point in the multi-dimensional space. The multi-dimensional space can be two-dimensional, three-dimensional, or higher-dimensional. Representing the nodes of the graph with coordinates in a multi-dimensional space can be used to calculate the similarity between nodes, find the community structure in the graph, predict edge connections that may form in the future, and visualize the graph. The process of mapping the nodes of a graph to a multi-dimensional space is called graph embedding.

Graph embedding is a very important basic technical capability. At present, a variety of graph embedding methods have been developed in the academic community, such as DeepWalk, node2vec, GraphRep and so on. However, because these algorithms use the Monte Carlo sampling method internally, the calculation efficiency is relatively low. When the scale of the graph becomes very large (such as the Alipay Friendship Network with more than 500 million nodes), it will consume huge computing resources to perform graph embedding calculations.

Therefore, it is hoped that there can be an improved scheme to perform the graph embedding process of the relation network graph more quickly and efficiently.

Summary of the invention

One or more embodiments of the present specification describe a graph embedding method for a relational network graph, which can efficiently embed nodes in a complex relational network graph into a multi-dimensional space to facilitate subsequent information processing.

According to a first aspect, a method for embedding a relational network graph into a multi-dimensional space is provided. The relational network graph includes a plurality of nodes, and the nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength. The method includes:

Randomly determine an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes;

For each node i, obtain the neighbor nodes connected to the node i, and the strength of association between the node i and each neighbor node;

Determine the current embedding vector of each neighbor node of the node i;

Obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the strength of association between the node i and each neighbor node;

Determine whether a predetermined convergence condition is satisfied, and if it is not satisfied, determine the current embedding vector of each neighbor node of the node i again and determine the current embedding vector Ei of the node i again, until the predetermined convergence condition is satisfied;

Determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i at the time the predetermined convergence condition is satisfied.

According to one embodiment, the neighbor node information of node i is obtained in the following manner:

Obtaining an adjacency matrix that records the network relationship of the relationship network graph, and the elements in the m-th row and the k-th column in the adjacency matrix correspond to the strength of the association between the m-th node and the k-th node;

Through the adjacency matrix, the neighbor nodes of node i and the strength of association between node i and each neighbor node are determined.

Further, determining the neighbor nodes of node i and the corresponding association strengths through the adjacency matrix includes:

Obtaining the i-th row element or i-th column element corresponding to node i in the adjacency matrix;

A node corresponding to a non-zero element in the i-th row element or an i-th column element is determined as a neighbor node of the node i; and a value of the non-zero element is determined as an association strength between the node i and a corresponding neighbor node.

According to one embodiment, the position initial term is determined based on the initial embedding vector Ci and the predetermined attenuation coefficient.

In one embodiment, the position offset of node i is obtained in the following manner:

Using the strength of the association between node i and each neighbor node as a weight, sum the current embedding vectors of each neighbor node to determine the center position of the neighbor;

The position offset term is determined based on at least the predetermined attenuation coefficient α and the neighbor center position.

In another embodiment, the position offset of node i is obtained in the following manner:

Determine the sum of the association strengths of node i and all its neighbors;

Determining the ratio of the correlation strength between node i and each neighboring node to the sum value as the relative correlation strength;

Sum the current embedding vectors of each neighbor node using the relative correlation strength as a weight to determine the center position of the neighbor;

The product of the center position of the neighbor and the predetermined attenuation coefficient α is used as the position offset term.

According to a possible design, the predetermined convergence condition may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or the sum, over all nodes, of the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value.

According to another possible design, the predetermined convergence condition may be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold number.

In one embodiment, the embedding vector of node i is determined as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its position initial term.

According to a second aspect, a device for embedding a relational network graph into a multi-dimensional space is provided. The relational network graph includes a plurality of nodes, and the nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength. The device includes:

An initial position determining unit configured to randomly determine an initial embedding vector Ci of each node i of the plurality of nodes in a multi-dimensional space;

A neighbor node determining unit configured to obtain, for each node i, the neighbor nodes connected to the node i, and the strength of association between the node i and each neighbor node;

A neighbor position determining unit configured to determine a current embedding vector of each neighbor node of the node i;

A node position determining unit configured to obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the strength of association between the node i and each neighbor node;

A condition determining unit configured to determine whether a predetermined convergence condition is satisfied, and if the predetermined convergence condition is not satisfied, cause the neighbor position determining unit to determine the current embedding vector of each neighbor node of the node i again, and cause the node position determining unit to determine the current embedding vector Ei of the node i again, until the predetermined convergence condition is satisfied;

An embedding position determining unit configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i at the time the predetermined convergence condition is satisfied.

According to a third aspect, a computer-readable storage medium is provided on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.

According to a fourth aspect, there is provided a computing device including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method of the first aspect is implemented.

Through the methods and devices provided in the embodiments of the present specification, a relational network graph can be efficiently embedded in a multi-dimensional space, which facilitates subsequent node information processing.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of a relationship network diagram according to an embodiment disclosed in the specification;

FIG. 2 illustrates a method of embedding a relational network graph into a multi-dimensional space according to one embodiment;

FIG. 3 shows an example of a relational network graph embedded in a two-dimensional space;

FIG. 4 shows a schematic block diagram of a graph embedding apparatus according to an embodiment.

Detailed description

The solutions provided in this specification are described below with reference to the drawings.

FIG. 1 is a schematic diagram of a relationship network diagram according to an embodiment disclosed in this specification. As shown in FIG. 1, the relationship network diagram includes multiple nodes. For clarity, these nodes are numbered in FIG. 1. Among these nodes, the nodes having an association relationship are connected by edges. In one example, the nodes in FIG. 1 represent people or users in a social network. An edge connecting two nodes means that the corresponding two users have a social association, such as transfers, messages, or other communications.

In one embodiment, the association relationship between the nodes also has different association strengths. For example, in one example, different association strengths are set for different social interaction behaviors, for example, the association strength of users who perform transfer interaction is 0.8, the association strength of users who perform message operations is 0.5, and so on. In one embodiment, in a case where the association relationship has different association strengths, the attribute of the edge or the weight of the edge may be used to represent the strength of the association between the two users connected to the edge.

In the relationship network diagram in FIG. 1, the location of each node is drawn only schematically, in order to show the nodes and the connection relationships between them. The network diagram itself does not define node positions. To obtain positions for the nodes, a graph embedding method is needed to map each node into a multi-dimensional space. The graph embedding method provided by the embodiments of the present specification is described below.

FIG. 2 illustrates a method for embedding a relational network graph into a multi-dimensional space according to an embodiment, where the relational network graph includes a plurality of nodes, and nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength. The method can be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in FIG. 2, the method includes:

Step 21, randomly determining an initial embedding vector Ci of each node i of the plurality of nodes in a multi-dimensional space;

Step 22, for each node i, obtaining the neighbor nodes connected to the node i and the strength of association between the node i and each neighbor node;

Step 23, determining the current embedding vector of each neighbor node of the node i;

Step 24, obtaining a position initial term and a position offset term of the node i, and determining a current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the strength of association between the node i and each neighbor node;

Step 25, determining whether a predetermined convergence condition is satisfied, and if it is not satisfied, determining the current embedding vector of each neighbor node of the node i again and determining the current embedding vector Ei of the node i again, until the predetermined convergence condition is satisfied;

Step 26, determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i at the time the predetermined convergence condition is satisfied.

The following describes how the above steps are performed.

First, in step 21, an initial embedding vector Ci in the multi-dimensional space is randomly determined for each node i of the plurality of nodes in the relational network graph. Assume that the relational network graph contains N nodes and that the dimension of the multi-dimensional space to be embedded into is s. Then, for each node i of the N nodes, an s-dimensional vector Ci is randomly generated as its initial embedding vector.
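Step 21 can be sketched as follows. This is a minimal illustration, not the specification's implementation: the function name `init_embeddings`, the `seed` parameter, and the choice of uniform draws in [-1, 1] are all assumptions, since the text only requires that each Ci be generated randomly.

```python
import random


def init_embeddings(num_nodes, dim, seed=None):
    """Return one randomly generated dim-dimensional vector C_i per node."""
    rng = random.Random(seed)
    # Assumption: uniform coordinates in [-1, 1]; the text does not fix a distribution.
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_nodes)]


# N = 5 nodes embedded into an s = 2 dimensional space.
C = init_embeddings(num_nodes=5, dim=2, seed=0)
print(len(C), len(C[0]))
```

Passing a fixed seed makes the random initial vectors reproducible, which is convenient when comparing runs.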

On the other hand, in step 22, for each node i, a neighbor node connected to the node i and the strength of the association between the node i and each neighbor node are obtained.

It can be understood that in a relational network graph, nodes with an association relationship are connected to each other, and nodes connected to each other are neighbor nodes of each other. In addition, it can be understood that the topology of the relational network graph can be recorded in a variety of ways. For example, in one example, the connection relationships of a relational network graph are recorded in a table. In this case, the neighbor node information of each node i and the strength of association between node i and each neighbor node can be read from that table.

In one embodiment, the connection relationships of the relational network graph are recorded by a matrix. For example, a matrix describing a relational network graph may be an adjacency matrix, a degree matrix, a Laplacian matrix, and the like. In one example, the neighbor information and association strength information of the nodes are obtained from an adjacency matrix that records the network relationships of the relational network graph.

Specifically, assuming that matrix A is an adjacency matrix of a relational network graph G, matrix A can be expressed as:

$$A = [a_{mk}]_{N \times N},$$

where the element a_mk in the m-th row and the k-th column corresponds to the strength of the association between node m and node k.

If there is no connection between two nodes and there is no association relationship, then the association strength between them is 0.
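As an illustration of how such an adjacency matrix might be filled in, the sketch below builds A from a list of interactions. The strengths 0.8 (transfer) and 0.5 (message) are the example values given earlier in this description; the function name `build_adjacency`, the symmetric (undirected) storage, and keeping the strongest interaction when a pair interacts in several ways are assumptions made only for this example.

```python
# Example association strengths per interaction type (values from the description).
STRENGTH = {"transfer": 0.8, "message": 0.5}


def build_adjacency(num_nodes, interactions):
    """Build an N x N adjacency matrix from (node_m, node_k, kind) records."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for m, k, kind in interactions:
        # Assumption: if two users interact in several ways, keep the strongest.
        weight = max(A[m][k], STRENGTH[kind])
        # Assumption: the association is undirected, so A is symmetric.
        A[m][k] = weight
        A[k][m] = weight
    return A


A = build_adjacency(3, [(0, 1, "transfer"), (0, 2, "message")])
for row in A:
    print(row)
```

Unconnected pairs, such as nodes 1 and 2 here, keep an association strength of 0.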

Through such an adjacency matrix, the neighbor information and association strength information of each node can be easily obtained. Specifically, for node i, the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix A are obtained, that is, a_ij or a_ji. If such an element is non-zero, the corresponding node j is determined as a neighbor of node i, and the value of the non-zero element is determined as the strength of association between node i and that neighbor node.
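Reading the neighbors of node i from row i of the adjacency matrix can be sketched as follows. The function name `neighbors_from_adjacency` and the dictionary return type are illustrative choices, not part of the specification.

```python
def neighbors_from_adjacency(A, i):
    """Return {j: a_ij} for every non-zero element in row i of A."""
    return {j: a_ij for j, a_ij in enumerate(A[i]) if a_ij != 0}


A = [
    [0.0, 0.8, 0.5],
    [0.8, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
print(neighbors_from_adjacency(A, 0))  # {1: 0.8, 2: 0.5}
print(neighbors_from_adjacency(A, 1))  # {0: 0.8}
```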

After determining the neighbor node j of each node i, in step 23, the current embedding vector Ej of each neighbor node j of node i is determined.

It can be understood that, since an initial embedding vector is randomly generated for each node in step 21, when step 23 is performed for the first time, the current embedding vector Ej of a neighbor node j whose current embedding vector has not yet been updated is its corresponding initial embedding vector Cj. The current embedding vector of each node is updated in subsequent iterations, as described in the following steps.

Based on the relevant information obtained for node i in steps 22 and 23, in step 24, the current embedding vector Ei of node i is determined. Specifically, the current embedding vector Ei of the node i can be considered to be composed of two parts: a position initial term VI and a position offset term VD:

Ei = VI + VD,

where the position initial term VI is determined based on the initial embedding vector Ci, and the position offset term VD is determined according to a predetermined attenuation coefficient α, the current embedding vector Ej of each neighbor node j, and the strength of association a_ij between the node i and each neighbor node.

In one embodiment, the position initial term VI of node i is its initial embedding vector Ci, that is:

VI = Ci.

In another embodiment, the position initial term may be the initial embedding vector Ci multiplied by a certain coefficient. For example, the coefficient may be related to the attenuation coefficient α introduced in the position offset term. Therefore, in one embodiment, the position initial term can be determined based on the initial embedding vector Ci and the attenuation coefficient α. Specifically, in one example, the position initial term VI is determined as:

VI = (1-α) Ci.

Generally, once the position initial term is determined, it remains fixed during subsequent update iterations.

On the other hand, the position offset term VD of the node i is also determined. According to at least one embodiment of the specification, the position offset term VD is determined according to a predetermined attenuation coefficient α, the current embedding vector Ej of each neighbor node j, and the association strength a_ij between the node i and each neighbor node.

The attenuation coefficient α is used to adjust the step size, or magnitude, of the position offset, and is generally preset to a value between 0 and 1.

In one embodiment, using the strength of association a_ij between node i and each neighbor node j as a weight, the current embedding vectors Ej of the neighbor nodes j are summed to determine the neighbor center position; the position offset term VD is then determined based on the predetermined attenuation coefficient α and the neighbor center position.

In one example, according to the above idea, the position offset term VD is determined as:

$$V_D = \alpha \sum_{j \in N(i)} a_{ij} E_j,$$

where N(i) represents the set of neighbor nodes of node i.

The above calculation method of VD is more suitable for the case where the association strength a_ij itself is defined between 0 and 1. If the range of the association strength a_ij is large, the attenuation coefficient α can be preset to a smaller value to compensate.
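The weighted-sum embodiment can be sketched as follows, assuming the offset is α times the a_ij-weighted sum of the neighbors' current embedding vectors, which is one reading of the description above. The function name `position_offset_weighted` and the data layout (a dictionary of neighbor strengths and a list of vectors) are illustrative assumptions.

```python
def position_offset_weighted(neighbors, E, alpha):
    """Compute V_D = alpha * sum_j a_ij * E_j over the neighbors j of node i.

    neighbors: {j: a_ij} for node i; E: list of current embedding vectors.
    """
    dim = len(E[0])
    center = [0.0] * dim  # neighbor center position (weighted sum)
    for j, a_ij in neighbors.items():
        for d in range(dim):
            center[d] += a_ij * E[j][d]
    return [alpha * c for c in center]


E = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
V_D = position_offset_weighted({1: 0.8, 2: 0.5}, E, alpha=0.5)
print(V_D)  # approximately [0.4, 0.5]
```

Because the weights are not normalized, the size of the offset grows with both the number of neighbors and the scale of a_ij, which is why a smaller α may be needed when a_ij is large.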

In another embodiment, the position offset term VD of node i is determined as follows: determine the sum value di of the association strengths between node i and all its neighbor nodes j; determine the ratio of the association strength a_ij between node i and each neighbor node j to the sum value di as the relative correlation strength; sum the current embedding vectors Ej of the neighbor nodes j, using the relative correlation strength as a weight, to determine the neighbor center position; and take the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term VD:

$$V_D = \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i} E_j,$$

where:

$$d_i = \sum_{j \in N(i)} a_{ij}.$$

In this way, the neighbor center position of node i is determined by taking into account the association strength between node i and each neighbor node j, and the attenuation coefficient is then used as an adjustment to determine the position offset term VD. The position offset term VD thus reflects the distance by which the node moves toward the neighbor center position.
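The relative-strength embodiment can be sketched in the same style. The function name `position_offset_normalized` is illustrative, and returning a zero offset for a node without neighbors is an assumption; the description does not discuss isolated nodes.

```python
def position_offset_normalized(neighbors, E, alpha):
    """Compute V_D = alpha * sum_j (a_ij / d_i) * E_j, with d_i = sum_j a_ij."""
    dim = len(E[0])
    d_i = sum(neighbors.values())
    if d_i == 0:
        # Assumption: an isolated node receives no offset.
        return [0.0] * dim
    center = [0.0] * dim  # neighbor center position (weighted average)
    for j, a_ij in neighbors.items():
        relative = a_ij / d_i  # relative correlation strength
        for d in range(dim):
            center[d] += relative * E[j][d]
    return [alpha * c for c in center]


E = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
V_D = position_offset_normalized({1: 3.0, 2: 1.0}, E, alpha=0.5)
print(V_D)  # [1.5, 0.5]
```

Since the relative strengths sum to 1, the neighbor center is a weighted average of the neighbor positions, so the offset stays on the same scale regardless of how large the raw strengths are.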

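According to a specific example, combining the foregoing position initial term with the position offset term determined according to the relative correlation strength as described above, the current embedding vector Ei of node i can be determined as:

$$E_i = V_I + \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i} E_j,$$

which, when the position initial term is taken as VI = (1-α) Ci, becomes:

$$E_i = (1-\alpha) C_i + \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i} E_j.$$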

The above describes various ways to determine the current embedding vector Ei of the node i.

According to either method, steps 23 and 24 above are performed for each node i in the relational network graph, thereby determining the current embedding vector for each node.

Next, in step 25, it is determined whether a predetermined convergence condition is satisfied. If the predetermined convergence condition is not satisfied, the method returns to steps 23 and 24 to determine the current embedding vector of each neighbor node of node i again, and to determine the current embedding vector Ei of node i again.

It can be understood that steps 23 and 24 above are performed for each node in the relational network graph, so each time the loop of steps 23 and 24 is executed, the current embedding vector of each node is updated. Correspondingly, when step 23 is performed for the (n+1)-th time, the current embedding vector Ej of a neighbor node j of the same node i differs from that at the n-th time; it is in fact the current embedding vector determined for that node when step 24 was executed the n-th time. In this way, the position offset term in step 24 changes each time the loop is executed, so that the current embedding vector of each node i is continually updated.

Such a loop is repeatedly performed until a predetermined convergence condition is satisfied.
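The loop over steps 23 and 24 can be sketched end to end as follows. This is a hedged illustration under several assumptions that the text leaves open: the position initial term is taken as (1-α)Ci, the offset uses the relative correlation strength, all nodes are updated from the previous round's vectors (a synchronous update), association strengths are non-negative, and the loop simply runs a fixed number of rounds. The function name `embed_graph` and its parameters are hypothetical.

```python
import random


def embed_graph(A, dim, alpha=0.5, num_iters=20, seed=0):
    """Iterate E_i = (1 - alpha) * C_i + alpha * sum_j (a_ij / d_i) * E_j."""
    rng = random.Random(seed)
    n = len(A)
    # Step 21: random initial embedding vectors C_i.
    C = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    E = [c[:] for c in C]  # before any update, E_j equals C_j
    for _ in range(num_iters):
        new_E = []
        for i in range(n):
            d_i = sum(A[i])  # assumes non-negative association strengths
            vec = [(1.0 - alpha) * c for c in C[i]]  # position initial term
            if d_i > 0:
                for j, a_ij in enumerate(A[i]):
                    if a_ij != 0:
                        for d in range(dim):
                            # position offset term, built from last round's E_j
                            vec[d] += alpha * (a_ij / d_i) * E[j][d]
            new_E.append(vec)
        E = new_E  # synchronous update (an assumption)
    return C, E


A = [[0.0, 0.8, 0.5], [0.8, 0.0, 0.0], [0.5, 0.0, 0.0]]
C, E = embed_graph(A, dim=2, alpha=0.5, num_iters=20)
print(E)
```

With α between 0 and 1 and normalized weights, each round shrinks the change in E by at least a factor of α, so a modest number of rounds is typically enough for the vectors to settle.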

In one embodiment, the predetermined convergence condition is set according to an offset adjustment amount that corresponds to an offset between a position determined this time and a position determined previously.

Specifically, in one embodiment, the predetermined convergence condition may be set such that, for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value. For example, for the N nodes in a relational network graph, if the difference between the current embedding vector of each node and its previously determined embedding vector, that is, the offset distance, is less than a distance threshold, this means that the position adjustment of the nodes has become sufficiently small and the positions of the nodes are stabilizing and converging, so that the convergence condition is reached.

In another embodiment, the predetermined convergence condition may be set such that the sum, over all nodes, of the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value. That is, consider the sum DT of the offset distances of the N nodes:

$$D_T = \sum_{i=1}^{N} D_i,$$

where Di is the offset distance of node i, that is, the difference between its current embedding vector and the embedding vector determined last time.

When the sum of the offset distances DT is less than a certain threshold value, it means that the overall position adjustment of the nodes is small, and the positions of the nodes are stabilizing and converging, so that the convergence condition is reached.
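The two offset-based convergence checks can be sketched as follows. Measuring the "difference" between two embedding vectors as their Euclidean distance is an assumption, as are the function names; the thresholds stand in for the first and second predetermined values.

```python
import math


def offset_distances(E_prev, E_curr):
    """Return D_i for each node: distance between its previous and current vectors."""
    # Assumption: the offset distance is the Euclidean distance.
    return [math.dist(p, c) for p, c in zip(E_prev, E_curr)]


def converged_per_node(E_prev, E_curr, first_threshold):
    """True if every node's offset distance D_i is below the threshold."""
    return all(d < first_threshold for d in offset_distances(E_prev, E_curr))


def converged_total(E_prev, E_curr, second_threshold):
    """True if D_T, the sum of all D_i, is below the threshold."""
    return sum(offset_distances(E_prev, E_curr)) < second_threshold


E_prev = [[0.0, 0.0], [1.0, 1.0]]
E_curr = [[0.3, 0.4], [1.0, 1.0]]
print(converged_per_node(E_prev, E_curr, 0.6))
print(converged_total(E_prev, E_curr, 0.4))
```

In this example node 0 moved by about 0.5 and node 1 did not move, so the per-node check passes with a threshold of 0.6 while the total check fails with a threshold of 0.4.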

In another embodiment, based on experience, a preset number of executions of the loop may be used as the convergence condition. That is, when the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold number, the convergence condition is considered satisfied. Based on experience, this number of executions can generally be set between 10 and 20.

If the convergence condition is satisfied, then exit the loop and proceed to step 26 to determine the embedding vector Qi of each node i in the multi-dimensional space based on at least the current embedding vector Ei of each node i that meets the predetermined convergence condition.

In one embodiment, the current embedding vector Ei of each node i that satisfies the convergence condition is used as its embedding vector Qi, that is, Qi = Ei.

In another embodiment, in order to reduce the influence of the randomly generated initial embedding vector, the embedding vector of node i is determined as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is met and its position initial term, that is:

Qi = Ei - VI,

where VI is associated with the initial embedding vector Ci, for example equal to Ci, or equal to Ci multiplied by a certain coefficient, such as (1-α) Ci.

In this way, the embedding vector of each node i in the multi-dimensional space is determined.
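Step 26 can be sketched as follows for the embodiment that subtracts the position initial term. The function name `final_embedding` and the convention that `alpha=None` selects VI = Ci, while a numeric `alpha` selects VI = (1-α)Ci, are assumptions made for this example.

```python
def final_embedding(E_i, C_i, alpha=None):
    """Return Q_i = E_i - V_I, with V_I = C_i or V_I = (1 - alpha) * C_i."""
    scale = 1.0 if alpha is None else 1.0 - alpha
    return [e - scale * c for e, c in zip(E_i, C_i)]


E_i = [2.0, 1.0]
C_i = [1.0, 2.0]
print(final_embedding(E_i, C_i))             # [1.0, -1.0]
print(final_embedding(E_i, C_i, alpha=0.5))  # [1.5, 0.0]
```

For the other embodiment, Qi = Ei, no further computation is needed.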

Based on the embedding vectors determined in this way, the nodes in the relational network graph can be embedded into the multi-dimensional space. The nodes embedded in the multi-dimensional space have position information, and since the connection relationship and connection strength between nodes are considered in the embedding process, the position information also reflects the association relationship between the nodes. For example, there is a stronger association between nodes that are close to each other in a multi-dimensional space. In this way, it is very helpful for further processing of node relationship information, such as clustering nodes, finding groups formed by nodes, calculating similarity between nodes, predicting potential edge connections of nodes, and so on. When the relational network diagram is embedded in a two-dimensional space or a three-dimensional space, it is also very helpful for the visual presentation of the relational network.
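As one illustration of such downstream processing, the sketch below ranks the nodes closest to a given node in the embedded space. Using Euclidean distance as the measure of closeness, and the function name `nearest_nodes`, are assumptions; the description only states that nearby nodes are more strongly associated.

```python
import math


def nearest_nodes(Q, i, k=3):
    """Return the indices of the k nodes whose embedding vectors are closest to Q[i]."""
    others = [(math.dist(Q[i], q), j) for j, q in enumerate(Q) if j != i]
    return [j for _, j in sorted(others)[:k]]


# Hypothetical embedding vectors for four nodes in a two-dimensional space.
Q = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.3]]
print(nearest_nodes(Q, 0, k=2))  # [1, 3]
```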

FIG. 3 shows an example of a relational network diagram embedded in a two-dimensional space. More specifically, FIG. 3 is an example of embedding the relationship network graph of FIG. 1 into a two-dimensional space by using the method shown in FIG. 2. Compared to the nodes in FIG. 1, which are arranged arbitrarily for the sake of illustration, the positions of the nodes in FIG. 3 contain more information, because they reflect the association relationships between the nodes. Some nodes are very close to each other, which means that these nodes have a stronger association relationship. Moreover, the distribution of node positions also reveals potential clusters of nodes. Such information is beneficial to the further processing of node information in the relational network.

According to another aspect, an embodiment of the present specification further provides an apparatus for embedding a relational network graph into a multi-dimensional space, wherein the relational network graph to be embedded includes a plurality of nodes, and the nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength. FIG. 4 shows a schematic block diagram of a graph embedding apparatus according to an embodiment. As shown in FIG. 4, the graph embedding apparatus 400 includes:

an initial position determining unit 41 configured to randomly determine an initial embedding vector Ci of each node i of the plurality of nodes in the multi-dimensional space;

a neighbor node determining unit 42 configured to obtain, for each node i, the neighbor nodes connected to the node i, and the strength of association between the node i and each neighbor node;

a neighbor position determining unit 43 configured to determine the current embedding vector of each neighbor node of the node i;

a node position determining unit 44 configured to obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the strength of association between the node i and each neighbor node;

a condition determining unit 45 configured to determine whether a predetermined convergence condition is satisfied, and if the predetermined convergence condition is not satisfied, cause the neighbor position determining unit to determine the current embedding vector of each neighbor node of the node i again, and cause the node position determining unit to determine the current embedding vector Ei of the node i again, until the predetermined convergence condition is satisfied; and

an embedding position determining unit 46 configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i at the time the predetermined convergence condition is satisfied.

According to an embodiment, the neighbor node determining unit 42 is configured to: obtain an adjacency matrix that records the connection relationships of the relational network graph, wherein the element in the m-th row and the k-th column of the adjacency matrix corresponds to the association strength between the m-th node and the k-th node; and determine, through the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.

Further, in a specific example, the neighbor node determination unit 42 determines neighbor node information in the following manner: obtaining the i-th row elements or the i-th column elements corresponding to the node i in the adjacency matrix; determining the nodes corresponding to the non-zero elements among the i-th row elements or the i-th column elements as neighbor nodes of the node i; and determining the value of each non-zero element as the association strength between the node i and the corresponding neighbor node.
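
The row lookup described above can be illustrated with a minimal sketch. The function name `neighbors_of` and the 4-node matrix are hypothetical and chosen only for illustration; the sketch assumes the adjacency matrix is held as a dense list of rows.

```python
def neighbors_of(adjacency, i):
    """Return {neighbor index: association strength} for node i.

    Reads row i of the adjacency matrix. Every non-zero entry marks a
    neighbor, and the value of that entry is the association strength.
    """
    return {k: w for k, w in enumerate(adjacency[i]) if w != 0}


# Hypothetical 4-node graph: entry [m][k] is the association strength
# between node m and node k, and 0 means "not connected".
adjacency = [
    [0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

print(neighbors_of(adjacency, 0))  # {1: 2.0, 3: 1.0}
```

For a large, sparse graph a sparse row representation would likely be preferable to a dense matrix, but the lookup logic stays the same.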

In one embodiment, the node position determination unit 44 includes an initial term determination module 441 configured to determine the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.

In one embodiment, the node position determination unit 44 includes an offset term determination module 442 configured to determine the position offset term.

In one example, the offset term determination module 442 is configured to: sum the current embedding vectors of the neighbor nodes, using the association strength between node i and each neighbor node as a weight, to determine a neighbor center position; and determine the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.

In another example, the offset term determination module 442 is configured to: determine the sum of the association strengths between node i and all of its neighbor nodes; determine the ratio of the association strength between node i and each neighbor node to the sum as a relative association strength; sum the current embedding vectors of the neighbor nodes, using the relative association strength as a weight, to determine the neighbor center position; and use the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.
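
The normalized variant described in the preceding example can be sketched as follows. The function name `position_offset_term` and the sample values are hypothetical; vectors are represented as plain Python lists, and the node is assumed to have at least one neighbor.

```python
def position_offset_term(neighbor_vectors, strengths, alpha):
    """Compute alpha times the strength-weighted neighbor center position.

    neighbor_vectors: current embedding vectors of the neighbors of node i
    strengths: association strength between node i and each neighbor
    alpha: the predetermined attenuation coefficient
    """
    total = sum(strengths)  # sum of all association strengths of node i
    dim = len(neighbor_vectors[0])
    center = [0.0] * dim
    for vec, w in zip(neighbor_vectors, strengths):
        relative = w / total  # relative association strength
        for d in range(dim):
            center[d] += relative * vec[d]
    return [alpha * c for c in center]


# Two neighbors in a 2-dimensional space with strengths 3 and 1.
offset = position_offset_term([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0], alpha=0.5)
print(offset)  # [0.375, 0.125]
```

Because the relative strengths sum to 1, the neighbor center position always lies within the convex hull of the neighbor vectors, so a node with a few very strong associations is not pushed arbitrarily far away.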

According to a possible design, the predetermined convergence condition on which the condition determining unit 45 relies may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or the sum, over all nodes, of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value.

According to a possible design, the predetermined convergence condition may also be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined count threshold.
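
The three convergence conditions above can be sketched in one helper. The specification does not state how the "difference" between two embedding vectors is measured; the sketch assumes Euclidean distance. The function name `has_converged` and its parameter names are hypothetical.

```python
import math


def has_converged(previous, current, iteration, *,
                  per_node_tol=None, total_tol=None, max_iterations=None):
    """Check the convergence conditions; any enabled condition suffices.

    previous, current: lists of embedding vectors from the last round and
    this round. The difference is measured as Euclidean distance, which is
    an assumption of this sketch.
    """
    diffs = [math.dist(p, c) for p, c in zip(previous, current)]
    # Condition 1: every node moved less than the first predetermined value.
    if per_node_tol is not None and all(d < per_node_tol for d in diffs):
        return True
    # Condition 2: the summed movement is below the second predetermined value.
    if total_tol is not None and sum(diffs) < total_tol:
        return True
    # Condition 3: the number of rounds reached the count threshold.
    if max_iterations is not None and iteration >= max_iterations:
        return True
    return False


previous = [[0.0, 0.0], [1.0, 1.0]]
current = [[0.0, 0.001], [1.0, 1.0]]
print(has_converged(previous, current, 1, per_node_tol=0.01))  # True
```

In practice a count threshold is usually combined with one of the difference-based conditions so that the iteration terminates even if the tolerance is never reached.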

In one embodiment, the embedding position determining unit 46 is configured to determine the embedding vector of the node i as the difference between the current embedding vector Ei of the node i when the predetermined convergence condition is satisfied and its position initial term.
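
The cooperation of units 41 to 46 can be illustrated with a minimal end-to-end sketch. Several details are assumptions of this sketch rather than statements of the specification: the position initial term is taken as (1 − α)·Ci, the current embedding vector Ei is taken as the sum of the position initial term and the position offset term, all nodes are updated synchronously in each round, the difference between rounds is measured as Euclidean distance, and the initial vectors are drawn uniformly from [−1, 1]. The function name `embed_graph` and its parameters are hypothetical.

```python
import math
import random


def embed_graph(adjacency, dim, alpha=0.5, tol=1e-6, max_iterations=1000, seed=0):
    """Iteratively embed a weighted graph into a dim-dimensional space."""
    rng = random.Random(seed)
    n = len(adjacency)
    # Unit 41: random initial embedding vector Ci for every node.
    initial = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    # Assumed form of the position initial term: (1 - alpha) * Ci.
    initial_term = [[(1.0 - alpha) * x for x in c] for c in initial]
    current = [list(c) for c in initial]

    for _round in range(max_iterations):
        updated = []
        for i in range(n):
            # Unit 42: neighbors and association strengths from row i.
            neighbors = [(k, w) for k, w in enumerate(adjacency[i]) if w != 0]
            total = sum(w for _k, w in neighbors)
            # Unit 44: position initial term plus position offset term.
            vec = list(initial_term[i])
            for k, w in neighbors:
                # Unit 43 supplies current[k], the neighbor's current vector.
                for d in range(dim):
                    vec[d] += alpha * (w / total) * current[k][d]
            updated.append(vec)
        # Unit 45: stop when the summed movement falls below the tolerance.
        change = sum(math.dist(p, q) for p, q in zip(current, updated))
        current = updated
        if change < tol:
            break

    # Unit 46: final embedding is Ei minus the position initial term.
    return [[e - t for e, t in zip(ei, ti)]
            for ei, ti in zip(current, initial_term)]


# Hypothetical 3-node chain: node 1 is connected to nodes 0 and 2.
chain = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]
for vector in embed_graph(chain, dim=2):
    print(vector)
```

Under these assumptions each round is a contraction with factor α when 0 ≤ α < 1, so the iteration settles on a fixed point regardless of the random starting vectors. A node without neighbors has a zero position offset term, so its final embedding is the zero vector.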

Through the above methods and devices, a complex relational network graph can be quickly and efficiently embedded in a multi-dimensional space of any dimension, thereby facilitating subsequent node information processing.

According to another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.

According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.

Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

The specific embodiments described above further describe the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made on the basis of the technical solution of the present invention shall be included in the scope of protection of the present invention.

Claims

A method for embedding a relational network graph into a multi-dimensional space, wherein the relational network graph includes a plurality of nodes, and nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength, the method comprising:

Randomly determining an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes;

For each node i, obtaining the neighbor nodes connected to the node i, and the association strength between the node i and each neighbor node;

Determining the current embedding vector of each neighbor node of the node i;

Obtaining a position initial term and a position offset term of the node i, and determining a current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the association strength between the node i and each neighbor node;

Determining whether a predetermined convergence condition is satisfied, and if the predetermined convergence condition is not satisfied, determining the current embedding vector of each neighbor node of the node i again, and determining the current embedding vector Ei of the node i again, until the predetermined convergence condition is satisfied;

Determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
The method according to claim 1, wherein obtaining a neighbor node connected to the node i, and the strength of the association between the node i and each neighbor node comprises:

Obtaining an adjacency matrix that records the network relationship of the relationship network graph, and the elements in the m-th row and the k-th column in the adjacency matrix correspond to the strength of the association between the m-th node and the k-th node;

Through the adjacency matrix, the neighbor nodes of node i and the strength of association between node i and each neighbor node are determined.
The method according to claim 2, wherein determining the neighbor node of node i through the adjacency matrix, and the strength of association between node i and each neighboring node include:

Obtaining the i-th row element or the i-th column element corresponding to the node i in the adjacency matrix;

A node corresponding to a non-zero element in the i-th row element or an i-th column element is determined as a neighbor node of the node i; and a value of the non-zero element is determined as an association strength between the node i and a corresponding neighbor node.
The method according to claim 1, wherein the obtaining the position initial term of the node i comprises determining the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.
The method according to claim 1, wherein obtaining a position offset term of the node i comprises:

Summing the current embedding vectors of the neighbor nodes, using the association strength between node i and each neighbor node as a weight, to determine the neighbor center position;

The position offset term is determined based on at least the predetermined attenuation coefficient α and the neighbor center position.
The method according to claim 1, wherein obtaining a position offset term of the node i comprises:

Determining the sum of the association strengths between node i and all of its neighbor nodes;

Determining the ratio of the association strength between node i and each neighbor node to the sum as a relative association strength;

Summing the current embedding vectors of the neighbor nodes, using the relative association strength as a weight, to determine the neighbor center position;

The product of the center position of the neighbor and the predetermined attenuation coefficient α is used as the position offset term.
The method of claim 1, wherein the predetermined convergence conditions include:

For each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than the first predetermined value; or

The sum of the differences between the current embedding vector determined this time and the current embedding vector determined last time for each node is less than the second predetermined value.
The method according to claim 1, wherein the predetermined convergence condition comprises: the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined count threshold.
The method according to claim 1, wherein determining the embedding vector of each node i in the multi-dimensional space comprises determining the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its position initial term.
A device for embedding a relational network graph into a multi-dimensional space, wherein the relational network graph includes a plurality of nodes, and nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength, the device comprising:

An initial position determining unit configured to randomly determine an initial embedding vector Ci of each node i of the plurality of nodes in a multi-dimensional space;

The neighbor node determining unit is configured to obtain, for each node i, a neighbor node connected to the node i, and an association strength between the node i and each neighbor node;

A neighbor location determining unit configured to determine a current embedding vector of each neighbor node of the node i;

The node position determining unit is configured to obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the association strength between the node i and each neighbor node;

The condition determining unit is configured to determine whether a predetermined convergence condition is satisfied, and if the predetermined convergence condition is not satisfied, cause the neighbor position determining unit to determine the current embedding vector of each neighbor node of the node i again, and the node position determining unit to determine the current embedding vector Ei of node i again, until the predetermined convergence condition is satisfied;

The embedding position determining unit is configured to determine an embedding vector of each node i in the multi-dimensional space based on at least a current embedding vector Ei of each node i that satisfies the predetermined convergence condition.
The apparatus according to claim 10, wherein the neighbor node determination unit is configured to:

Obtaining an adjacency matrix that records the network relationship of the relationship network graph, and the elements in the m-th row and the k-th column in the adjacency matrix correspond to the strength of the association between the m-th node and the k-th node;

Through the adjacency matrix, the neighbor nodes of node i and the strength of association between node i and each neighbor node are determined.
The apparatus according to claim 11, wherein the neighbor node determination unit is configured to:

Obtaining the i-th row element or the i-th column element corresponding to the node i in the adjacency matrix;

A node corresponding to a non-zero element in the i-th row element or an i-th column element is determined as a neighbor node of the node i; and a value of the non-zero element is determined as an association strength between the node i and a corresponding neighbor node.
The apparatus according to claim 10, wherein the node position determination unit comprises an initial term determination module configured to determine the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.
The apparatus according to claim 10, wherein the node position determination unit comprises an offset term determination module configured to:

Using the strength of the association between node i and each neighbor node as a weight, sum the current embedding vectors of each neighbor node to determine the center position of the neighbor;

The position offset term is determined based on at least the predetermined attenuation coefficient α and the neighbor center position.
The apparatus according to claim 10, wherein the node position determination unit comprises an offset term determination module configured to:

Determine the sum of the association strengths between node i and all of its neighbor nodes;

Determine the ratio of the association strength between node i and each neighbor node to the sum as a relative association strength;

Sum the current embedding vectors of the neighbor nodes, using the relative association strength as a weight, to determine the neighbor center position;

The product of the center position of the neighbor and the predetermined attenuation coefficient α is used as the position offset term.
The apparatus according to claim 10, wherein the predetermined convergence conditions include:

For each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than the first predetermined value; or

The sum of the differences between the current embedding vector determined this time and the current embedding vector determined last time for each node is less than the second predetermined value.
The apparatus according to claim 10, wherein the predetermined convergence condition comprises: the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined count threshold.
The apparatus according to claim 10, wherein the embedding position determining unit is configured to determine the embedding vector of the node i as the difference between the current embedding vector Ei of the node i when the predetermined convergence condition is satisfied and its position initial term.
A computer-readable storage medium having stored thereon a computer program, and when the computer program is executed in a computer, causes the computer to execute the method according to any one of claims 1-9.
A computing device comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method according to any one of claims 1-9 is implemented.