WO2020015464A1 - Method and apparatus for embedding relational network diagram - Google Patents

Publication number
WO2020015464A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
neighbor
embedding vector
association
strength
Prior art date
Application number
PCT/CN2019/089022
Other languages
French (fr)
Chinese (zh)
Inventor
向彪
刘子奇
周俊
李小龙
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2020015464A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Definitions

  • One or more embodiments of the present specification relate to the field of computer information processing, and in particular, to a method and an apparatus for embedding a relational network graph.
  • A relational network graph is a description of the relationships between entities in the real world, and is currently widely used in various kinds of computer information processing.
  • Generally, a relational network graph contains a set of nodes and a set of edges. Nodes represent entities in the real world, and edges represent connections between those entities. For example, in a social network, a person is an entity, and a relationship or connection between people is an edge.
  • In many cases, it is desirable to represent each node (entity) in a relational network graph with coordinate values in a multi-dimensional space, that is, to map each node into a multi-dimensional space and use points in that space to represent the nodes of the graph.
  • The multi-dimensional space can be 2D, 3D, or higher-dimensional. Representing the nodes in the graph with coordinates in a multi-dimensional space can be used to calculate the similarity between nodes, find the community structure in the graph, predict edge connections that may form in the future, and visualize the graph.
  • The process of mapping nodes in a graph to a multi-dimensional space is called graph embedding.
  • Graph embedding is a very important basic technical capability.
  • A variety of graph embedding methods have been developed in the academic community, such as DeepWalk, node2vec, and GraphRep.
  • However, because these algorithms use Monte Carlo sampling internally, their computational efficiency is relatively low.
  • When the scale of the graph becomes very large (for example, the Alipay friend relationship network has more than 500 million nodes), graph embedding calculations consume huge computing resources.
  • One or more embodiments of the present specification describe a graph embedding method for a relational network graph, which can efficiently embed nodes in a complex relational network graph into a multi-dimensional space to facilitate subsequent information processing.
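  • According to a first aspect, a method for embedding a relational network graph into a multi-dimensional space is provided. The relational network graph includes a plurality of nodes, and the nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength.
  • The method includes:
  • Randomly determine an initial embedding vector Ci of each node i of the plurality of nodes in the multi-dimensional space;
  • For each node i, obtain the neighbor nodes connected to the node i, and the association strength between the node i and each neighbor node;
  • Determine a current embedding vector of each neighbor node of the node i;
  • Obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the association strength between the node i and each neighbor node;
  • Repeat the determination of the current embedding vectors of the neighbor nodes and of the current embedding vector Ei of the node i until a predetermined convergence condition is satisfied;
  • Based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied, determine the embedding vector of each node i in the multi-dimensional space.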
  • The neighbor node information of node i may be obtained in the following manner:
  • obtaining an adjacency matrix that records the connection relationships of the relational network graph, where the element in the m-th row and the k-th column of the adjacency matrix corresponds to the association strength between the m-th node and the k-th node;
  • determining, through the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.
  • Determining the neighbor nodes of node i and the respective association strengths through the adjacency matrix includes:
  • obtaining the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix; determining a node corresponding to a non-zero element among those elements as a neighbor node of node i; and determining the value of the non-zero element as the association strength between node i and the corresponding neighbor node.
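  • The position initial term may be determined based on the initial embedding vector Ci and the predetermined attenuation coefficient.
  • The position offset term of node i may be obtained in the following manner:
  • summing the current embedding vectors of the neighbor nodes, weighted by the association strength between node i and each neighbor node, to determine a neighbor center position; and determining the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.
  • Alternatively, the position offset term of node i may be obtained in the following manner:
  • determining the sum of the association strengths between node i and all of its neighbor nodes; taking the ratio of the association strength between node i and each neighbor node to that sum as a relative association strength; summing the current embedding vectors of the neighbor nodes, weighted by the relative association strength, to determine the neighbor center position; and taking the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.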
  • The predetermined convergence condition may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or, the sum over all nodes of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value.
  • The predetermined convergence condition may also be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold.
  • The embedding vector of node i may be determined as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its initial position.
  • A device for embedding a relational network graph into a multi-dimensional space is also provided. The relational network graph includes a plurality of nodes, and the nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength.
  • The device includes:
  • an initial position determining unit, configured to randomly determine an initial embedding vector Ci of each node i of the plurality of nodes in the multi-dimensional space;
  • a neighbor node determining unit, configured to obtain, for each node i, the neighbor nodes connected to the node i and the association strength between the node i and each neighbor node;
  • a neighbor position determining unit, configured to determine a current embedding vector of each neighbor node of the node i;
  • a node position determining unit, configured to obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the association strength between the node i and each neighbor node;
  • a condition determining unit, configured to determine whether a predetermined convergence condition is satisfied and, if it is not satisfied, to cause the neighbor position determining unit to determine the current embedding vector of each neighbor node of the node i again and the node position determining unit to determine the current embedding vector Ei of the node i again, until the predetermined convergence condition is satisfied;
  • an embedding position determining unit, configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
  • A computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • A computing device is also provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method of the first aspect is implemented.
  • In this way, a relational network graph can be efficiently embedded in a multi-dimensional space, which facilitates subsequent node information processing.
  • FIG. 1 is a schematic diagram of a relationship network diagram according to an embodiment disclosed in the specification
  • FIG. 2 illustrates a method of embedding a relational network graph into a multi-dimensional space according to one embodiment
  • FIG. 3 shows an example of a relational network graph embedded in a two-dimensional space
  • FIG. 4 shows a schematic block diagram of a graph embedding apparatus according to an embodiment.
  • FIG. 1 is a schematic diagram of a relationship network diagram according to an embodiment disclosed in this specification.
  • The relational network graph includes multiple nodes. For clarity, these nodes are numbered in FIG. 1.
  • Nodes having an association relationship are connected by edges.
  • The nodes in FIG. 1 represent people or users in a social network.
  • Two nodes connected by an edge indicate that the corresponding two users have a social association, such as transfers, messages, or communications.
  • The association relationships between nodes may also have different association strengths.
  • For example, different association strengths are set for different social interaction behaviors: the association strength of users who interact by transfer is 0.8, the association strength of users who interact by message is 0.5, and so on.
  • The attribute or weight of an edge may be used to represent the association strength between the two users connected by that edge.
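The mapping from interaction type to edge weight can be sketched in a few lines of Python. This is a minimal illustration only: the strengths 0.8 and 0.5 come from the example above, while the function name `build_adjacency`, the undirected (symmetric) treatment of edges, and the rule of keeping the strongest interaction when a pair of users has several are assumptions that the text does not specify.

```python
# Association strength per interaction type, taken from the example in the text.
STRENGTH_BY_INTERACTION = {"transfer": 0.8, "message": 0.5}


def build_adjacency(num_nodes, interactions):
    """Build a symmetric weighted adjacency matrix from (u, v, kind) records."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, kind in interactions:
        weight = STRENGTH_BY_INTERACTION[kind]
        # Assumption: when a pair has several interactions, keep the strongest.
        A[u][v] = A[v][u] = max(A[u][v], weight)
    return A


# Users 0 and 1 both transferred and messaged; users 0 and 2 only messaged.
A = build_adjacency(3, [(0, 1, "transfer"), (0, 2, "message"), (0, 1, "message")])
for row in A:
    print(row)
```

Other aggregation rules (for example, summing strengths) would fit the description equally well; the choice here is only for illustration.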
  • The node locations in FIG. 1 are only schematic; the relational network graph itself does not assign positions to nodes.
  • A graph embedding method is therefore needed to map each node into a multi-dimensional space. The graph embedding method provided by the embodiments of this specification is described below.
  • FIG. 2 illustrates a method for embedding a relational network graph into a multi-dimensional space according to an embodiment, where the relational network graph includes a plurality of nodes, and nodes having an association relationship among the multiple nodes are connected to each other with a certain association strength.
  • The above method can be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities.
  • As shown in FIG. 2, the method includes: step 21, randomly determining an initial embedding vector Ci of each node i of the plurality of nodes in the multi-dimensional space; step 22, for each node i, obtaining the neighbor nodes connected to the node i and the association strength between the node i and each neighbor node; step 23, determining the current embedding vector of each neighbor node of the node i; step 24, obtaining a position initial term and a position offset term of the node i, and determining a current embedding vector Ei of the node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the association strength between the node i and each neighbor node; step 25, determining whether a predetermined convergence condition is satisfied; if the predetermined convergence condition is not satisfied, determining again, for each neighbor node of the node i, the current embed
  • In step 21, an initial embedding vector Ci in the multi-dimensional space is randomly determined for each node i of the plurality of nodes in the relational network graph.
  • Assuming that the relational network graph contains N nodes and the dimension of the multi-dimensional space to be embedded into is s, then for each node i of the N nodes, an s-dimensional vector Ci is randomly generated as its initial embedding vector.
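Step 21 can be illustrated with a short Python sketch. The text only says that each Ci is generated randomly; the uniform distribution on [-1, 1], the fixed seed, and the name `init_embeddings` are assumptions made for this example.

```python
import random


def init_embeddings(num_nodes, dim, seed=0):
    """Return one random dim-dimensional initial vector C_i per node.

    Assumption: coordinates are drawn uniformly from [-1, 1]; the source
    does not specify a distribution.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_nodes)]


# N = 5 nodes embedded into an s = 2 dimensional space.
C = init_embeddings(num_nodes=5, dim=2)
for i, c in enumerate(C):
    print(i, c)
```

Seeding the generator makes a run reproducible, which is convenient when comparing embeddings across parameter settings.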
  • In step 22, for each node i, the neighbor nodes connected to the node i and the association strength between the node i and each neighbor node are obtained.
  • In a relational network graph, nodes with an association relationship are connected to each other, and nodes connected to each other are neighbor nodes of each other.
  • The topology of the relational network graph can be recorded in a variety of ways. In one example, the connection relationships of the relational network graph are recorded in a chart. In this case, the neighbor node information of each node i and the association strength between node i and its neighbor nodes can be read from that chart.
  • In another example, the connection relationships of the relational network graph are recorded by a matrix.
  • Matrices that describe a relational network graph include the adjacency matrix, the degree matrix, the Laplacian matrix, and the like.
  • The neighbor information and association strength information of the nodes can be obtained from an adjacency matrix that records the connection relationships of the relational network graph.
  • Assume that matrix A is the adjacency matrix of a relational network graph G with N nodes. Matrix A can then be expressed as an N × N matrix: $A = (a_{mk})_{N \times N}$
  • where the element $a_{mk}$ in the m-th row and the k-th column corresponds to the association strength between node m and node k.
  • Through such an adjacency matrix, the neighbor information and association strength information of each node can be easily obtained.
  • Specifically, the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix A are obtained, that is, $a_{ij}$ or $a_{ji}$;
  • if such an element is non-zero, node j is determined as a neighbor of node i, and the value of the non-zero element is determined as the association strength between node i and that neighbor node.
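The neighbor lookup described above can be sketched as follows. The function name `neighbors_from_adjacency` and the list-of-lists matrix representation are illustrative assumptions; the logic follows the text in treating any non-zero element of the i-th row or i-th column as a neighbor.

```python
def neighbors_from_adjacency(A, i):
    """Map each neighbor j of node i to the association strength a_ij.

    A node j is a neighbor if the row element A[i][j] or the column
    element A[j][i] is non-zero; the non-zero value is the strength.
    """
    strengths = {}
    for j in range(len(A)):
        a = A[i][j] if A[i][j] != 0 else A[j][i]
        if a != 0:
            strengths[j] = a
    return strengths


A = [
    [0.0, 0.8, 0.5],
    [0.8, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
print(neighbors_from_adjacency(A, 0))  # {1: 0.8, 2: 0.5}
print(neighbors_from_adjacency(A, 1))  # {0: 0.8}
```

For very large graphs a dense matrix would be impractical, and a sparse representation would be a natural substitute; that choice is outside what the text specifies.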
  • In step 23, the current embedding vector Ej of each neighbor node j of node i is determined.
  • Because an initial embedding vector is randomly generated for each node in step 21, when step 23 is first performed, for any neighbor node j whose current embedding vector has not yet been updated, the current embedding vector Ej is its initial embedding vector Cj.
  • The current embedding vector of each node is updated in subsequent iterations, as described in the following steps.
  • In step 24, the current embedding vector Ei of node i is determined.
  • The current embedding vector Ei of node i can be considered to be composed of two parts, a position initial term VI and a position offset term VD: $E_i = VI + VD$
  • The position initial term VI is determined based on the initial embedding vector Ci, and the position offset term VD is determined according to a predetermined attenuation coefficient α, the current embedding vector Ej of each neighbor node j, and the association strength aij between node i and each neighbor node.
  • The position initial term VI of node i may be its initial embedding vector Ci, that is: $VI = C_i$
  • Alternatively, the position initial term may be the initial embedding vector Ci multiplied by a certain coefficient.
  • The coefficient may be related to the attenuation coefficient α introduced in the position offset term. Therefore, in one embodiment, the position initial term can be determined based on the initial embedding vector Ci and the attenuation coefficient α. Specifically, in one example, the position initial term VI is determined as: $VI = (1 - \alpha) C_i$
  • Once the position initial term is determined, it remains fixed during subsequent update iterations.
  • The position offset term VD of node i is also determined.
  • The position offset term VD is determined according to the predetermined attenuation coefficient α, the current embedding vector Ej of each neighbor node j, and the association strength aij between node i and each neighbor node.
  • The attenuation coefficient α adjusts the step size or magnitude of the position offset, and is generally preset to a value between 0 and 1.
  • In one approach, the current embedding vectors Ej of the neighbor nodes j are summed, weighted by the association strength aij, to determine the neighbor center position; the position offset term VD is then determined based on the predetermined attenuation coefficient α and the neighbor center position.
  • For example, the position offset term VD is determined as: $VD = \alpha \sum_{j \in N(i)} a_{ij} E_j$
  • where N(i) represents the set of neighbor nodes of node i.
  • This form of VD is more suitable when the association strength aij itself is defined between 0 and 1. If the range of aij is large, the attenuation coefficient can be preset to a smaller value.
  • In another approach, the position offset term VD of node i is determined as follows: determine the sum di of the association strengths between node i and all its neighbor nodes j; take the ratio of the association strength aij between node i and each neighbor node j to the sum di as the relative association strength; sum the current embedding vectors Ej of the neighbor nodes j, weighted by the relative association strength, to determine the neighbor center position; and take the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term VD.
  • That is, the position offset term VD is determined as: $VD = \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i} E_j$, where $d_i = \sum_{j \in N(i)} a_{ij}$.
  • In this way, the position offset term VD reflects the distance moved toward the neighbor center position.
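The two ways of forming the position offset term can be compared in a small Python sketch. The function name `position_offset`, the dictionary-based neighbor representation, and the `normalize` flag are assumptions for illustration; with `normalize=False` the raw association strengths weight the neighbor vectors, and with `normalize=True` the relative association strengths (aij divided by di) are used.

```python
def position_offset(i, E, neighbors, alpha, normalize=True):
    """Compute the position offset term VD of node i.

    E         : list of current embedding vectors, one per node
    neighbors : dict mapping node -> {neighbor j: association strength a_ij}
    alpha     : attenuation coefficient, typically between 0 and 1
    normalize : if True, weight by a_ij / d_i (relative strength);
                otherwise weight by a_ij directly
    """
    nbrs = neighbors.get(i, {})
    d_i = sum(nbrs.values())  # sum of association strengths of node i
    dim = len(E[i])
    center = [0.0] * dim      # neighbor center position
    for j, a_ij in nbrs.items():
        w = a_ij / d_i if normalize else a_ij
        for k in range(dim):
            center[k] += w * E[j][k]
    return [alpha * c for c in center]


E = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: {1: 0.8, 2: 0.5}}
print(position_offset(0, E, neighbors, alpha=0.5, normalize=False))
print(position_offset(0, E, neighbors, alpha=0.5, normalize=True))
```

A node without neighbors gets a zero offset in this sketch, because the loop body never runs; how isolated nodes should be handled is not stated in the text.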
  • The current embedding vector Ei of node i can then be determined as: $E_i = VI + \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i} E_j$
  • Steps 23 and 24 above are performed for each node i in the relational network graph, thereby determining the current embedding vector of each node.
  • In step 25, it is determined whether a predetermined convergence condition is satisfied. If it is not satisfied, the process returns to steps 23 and 24 to determine again the current embedding vector of each neighbor node of node i, and again the current embedding vector Ei of node i.
  • Because steps 23 and 24 are performed for every node in the relational network graph, each time the loop of steps 23 and 24 is executed, the current embedding vector of each node is updated.
  • Therefore, when step 23 is performed for the (n+1)-th time, for the same node i, the current embedding vector Ej of its neighbor node j differs from that at the n-th time.
  • Step 24 is then executed on this basis, so the position offset term in step 24 changes each time the loop is executed, and the current embedding vector of each node i is continually updated.
  • This loop is repeated until the predetermined convergence condition is satisfied.
  • The predetermined convergence condition may be set according to an offset adjustment amount, which corresponds to the offset between the position determined this time and the position determined previously.
  • For example, the predetermined convergence condition may be set such that, for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value. For the N nodes in a relational network graph, if the difference between the current embedding vector of each node and its previously determined embedding vector, that is, the offset distance, is less than a distance threshold, the position adjustments of the nodes have become small and the node positions are stabilizing and converging, so the convergence condition is reached.
  • Alternatively, the predetermined convergence condition may be set such that the sum, over all nodes, of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value. That is, consider the sum DT of the offset distances of the N nodes: $DT = \sum_{i=1}^{N} D_i$
  • where Di is the offset distance of node i, that is, the difference between its current embedding vector and the embedding vector determined last time.
  • A preset number of executions of the loop may also be used as the convergence condition. That is, when the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold, the convergence condition is considered satisfied.
  • The number of executions can generally be set to between 10 and 20.
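The three convergence conditions above can be expressed as one small check. This is a sketch under stated assumptions: the text speaks only of a "difference" or "offset distance" between successive embedding vectors, so the Euclidean distance used here is an assumption, as are the function and parameter names.

```python
import math


def offset_distance(e_new, e_old):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(e_new, e_old)))


def converged(E_new, E_old, iteration,
              per_node_tol=None, total_tol=None, max_iters=None):
    """Return True if any configured convergence condition is satisfied."""
    distances = [offset_distance(a, b) for a, b in zip(E_new, E_old)]
    # Condition 1: every node moved less than the first predetermined value.
    if per_node_tol is not None and all(d < per_node_tol for d in distances):
        return True
    # Condition 2: the sum DT of all offsets is below the second value.
    if total_tol is not None and sum(distances) < total_tol:
        return True
    # Condition 3: the loop has run a predetermined number of times.
    if max_iters is not None and iteration >= max_iters:
        return True
    return False


E_old = [[0.0, 0.0], [1.0, 1.0]]
E_new = [[0.003, 0.004], [1.0, 1.0]]  # node 0 moved by about 0.005
print(converged(E_new, E_old, iteration=3, per_node_tol=0.01))
print(converged(E_new, E_old, iteration=3, per_node_tol=0.001))
```

Leaving a threshold as `None` disables that condition, so the same function covers each of the alternatives described in the text.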
  • Once the predetermined convergence condition is satisfied, step 26 determines the embedding vector Qi of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
  • The embedding vector of node i may be determined as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is met and its initial position, that is: $Q_i = E_i - VI$
  • where VI is associated with the initial embedding vector Ci, for example equal to Ci, or equal to Ci multiplied by a certain coefficient, such as (1 − α) Ci.
  • Through the above process, the nodes in the relational network graph are embedded into the multi-dimensional space.
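Steps 21 to 26 can be put together in a compact end-to-end sketch. It is not the patented implementation: the choice of VI = (1 − α) Ci, the relative-strength form of the offset term, the synchronous update (all nodes updated from the previous iteration's vectors), the Euclidean total-offset test, and all names and default values are assumptions chosen from the options the text describes.

```python
import random


def embed_graph(A, dim=2, alpha=0.5, max_iters=20, tol=1e-6, seed=0):
    """Embed the nodes of a weighted graph given by adjacency matrix A."""
    rng = random.Random(seed)
    n = len(A)
    # Step 21: random initial embedding vector C_i for every node.
    C = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    # Step 22: neighbors and association strengths from the adjacency matrix.
    nbrs = [{j: A[i][j] for j in range(n) if A[i][j] != 0} for i in range(n)]
    # Position initial term VI = (1 - alpha) * C_i, fixed across iterations.
    VI = [[(1.0 - alpha) * c for c in C[i]] for i in range(n)]
    E = [list(c) for c in C]  # before any update, E_j equals C_j
    for _ in range(max_iters):
        E_new = []
        for i in range(n):
            d_i = sum(nbrs[i].values())
            vd = [0.0] * dim  # position offset term VD
            for j, a_ij in nbrs[i].items():
                for k in range(dim):
                    vd[k] += alpha * (a_ij / d_i) * E[j][k]  # step 23 + 24
            E_new.append([VI[i][k] + vd[k] for k in range(dim)])
        # Step 25: stop when the total offset DT falls below the tolerance.
        total = sum(
            sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
            for p, q in zip(E_new, E)
        )
        E = E_new
        if total < tol:
            break
    # Step 26: Q_i = E_i - VI.
    return [[E[i][k] - VI[i][k] for k in range(dim)] for i in range(n)]


A = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0],  # node 2 has no neighbors
]
Q = embed_graph(A, dim=2, alpha=0.5)
for i, q in enumerate(Q):
    print(i, q)
```

With these choices an isolated node ends at the origin, since its offset term is always zero. An in-place update that reuses vectors already refreshed in the current pass would also be consistent with the description of step 23.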
  • The nodes embedded in the multi-dimensional space have position information, and since the connection relationships and connection strengths between nodes are considered in the embedding process, the position information also reflects the association relationships between the nodes. For example, nodes that are close to each other in the multi-dimensional space have a stronger association. This is very helpful for further processing of node relationship information, such as clustering nodes, finding groups formed by nodes, calculating the similarity between nodes, and predicting potential edge connections between nodes.
  • If the relational network graph is embedded in a two-dimensional or three-dimensional space, this is also very helpful for visual presentation of the relational network.
  • FIG. 3 shows an example of a relational network graph embedded in a two-dimensional space. More specifically, FIG. 3 is an example of embedding the relational network graph of FIG. 1 into a two-dimensional space by using the method shown in FIG. 2.
  • The positions of the nodes in FIG. 3 contain more information, reflecting the association relationships between the nodes. Some nodes are very close to each other, which means that these nodes have a stronger association relationship.
  • An embodiment of the present specification further provides an apparatus for embedding a relational network graph into a multi-dimensional space, where the relational network graph to be embedded includes a plurality of nodes, and the nodes having an association relationship among the plurality of nodes are connected to each other with a certain association strength.
  • FIG. 4 shows a schematic block diagram of a graph embedding apparatus according to an embodiment.
  • As shown in FIG. 4, the graph embedding apparatus 400 includes: an initial position determining unit 41, configured to randomly determine an initial embedding vector Ci of each node i of the plurality of nodes in the multi-dimensional space; a neighbor node determining unit 42, configured to obtain, for each node i, the neighbor nodes connected to the node i and the association strength between the node i and each neighbor node; a neighbor position determining unit 43, configured to determine the current embedding vector of each neighbor node of the node i; a node position determining unit 44, configured to obtain a position initial term and a position offset term of the node i, and determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each neighbor node, and the association strength between the node i and each neighbor node;
  • The neighbor node determining unit 42 may be configured to: obtain an adjacency matrix that records the connection relationships of the relational network graph, where the element in the m-th row and the k-th column of the adjacency matrix corresponds to the association strength between the m-th node and the k-th node; and determine, through the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.
  • The neighbor node determining unit 42 may determine neighbor node information in the following manner: obtaining the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix; determining a node corresponding to a non-zero element among those elements as a neighbor node of node i; and determining the value of the non-zero element as the association strength between node i and the corresponding neighbor node.
  • The node position determining unit 44 may include an initial term determination module 441, configured to determine the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.
  • The node position determining unit 44 may also include an offset term determination module 442, configured to determine the position offset term.
  • The offset term determination module 442 may be configured to: sum the current embedding vectors of the neighbor nodes, weighted by the association strength between node i and each neighbor node, to determine the neighbor center position; and determine the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.
  • Alternatively, the offset term determination module 442 may be configured to: determine the sum of the association strengths between node i and all its neighbor nodes; take the ratio of the association strength between node i and each neighbor node to that sum as the relative association strength; sum the current embedding vectors of the neighbor nodes, weighted by the relative association strength, to determine the neighbor center position; and take the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.
  • The predetermined convergence condition on which the condition determining unit 45 is based may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or, the sum over all nodes of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value.
  • The predetermined convergence condition may also be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold.
  • The embedding position determining unit 46 may be configured to determine the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its initial position.
  • In this way, a complex relational network graph can be quickly and efficiently embedded in a multi-dimensional space of any dimension, facilitating subsequent node information processing.
  • A computer-readable storage medium is also provided, having a computer program stored thereon; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
  • A computing device is also provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.
  • The functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof.
  • The functions may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium.

Abstract

The embodiments of the present description provide a method and apparatus for embedding a relational network graph into a multi-dimensional space. The method comprises: randomly determining an initial embedding vector Ci of each node i in the multi-dimensional space; obtaining the neighbor nodes of each node i and the association strength between node i and each neighbor node; determining a current embedding vector of each neighbor node of node i; forming a position initial term and a position offset term of node i on the basis of a predetermined attenuation coefficient, the association strengths, and the current positions of the neighbor nodes, and determining a current embedding vector of node i accordingly; and repeating these steps until a convergence condition is satisfied, at which point the embedding vector of each node i in the multi-dimensional space can be determined. In this way, a relational network graph is efficiently embedded into the multi-dimensional space.

Description

关系网络图嵌入的方法及装置Method and device for embedding relation network graph 技术领域Technical field
本说明书一个或多个实施例涉及计算机信息处理领域,尤其涉及关系网络图嵌入的方法和装置。One or more embodiments of the present specification relate to the field of computer information processing, and in particular, to a method and an apparatus for embedding a relational network graph.
背景技术Background technique
A relational network graph describes the relationships between entities in the real world and is widely used in computer information processing. In general, a relational network graph contains a set of nodes and a set of edges: nodes represent real-world entities, and edges represent the connections between them. For example, in a social network, each person is an entity, and a relationship or connection between two people is an edge.
In many cases, it is desirable to represent each node (entity) in a relational network graph by coordinates in a multi-dimensional space, that is, to map each node into a multi-dimensional space so that a point in that space represents the node. The space may be two- or three-dimensional, or of higher dimension. Representing nodes by coordinates in a multi-dimensional space can be used to compute the similarity between nodes, discover community structure in the graph, predict edges that may form in the future, and visualize the graph. The process of mapping the nodes of a graph into a multi-dimensional space is called graph embedding.
Graph embedding is an important foundational technique. A variety of graph embedding methods have been developed in academia, such as DeepWalk, node2vec, and GraphRep. However, because these algorithms rely internally on Monte Carlo sampling, their computational efficiency is relatively low. When the graph becomes very large (for example, the Alipay friend-relationship network has more than 500 million nodes), graph embedding consumes enormous computing resources.
An improved scheme is therefore desired that performs graph embedding of relational network graphs more quickly and efficiently.
Summary of the Invention
One or more embodiments of the present specification describe a graph embedding method for a relational network graph that can efficiently embed the nodes of a complex relational network graph into a multi-dimensional space to facilitate subsequent information processing.
According to a first aspect, a method of embedding a relational network graph into a multi-dimensional space is provided, the relational network graph including a plurality of nodes, where nodes having an association relationship are connected to each other with a certain association strength, the method including:
randomly determining an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes;
for each node i, obtaining the neighbor nodes connected to node i and the association strength between node i and each neighbor node;
determining the current embedding vector of each neighbor node of node i;
obtaining a position initial term and a position offset term of node i, and determining a current embedding vector Ei of node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vectors of the neighbor nodes, and the association strength between node i and each neighbor node;
determining whether a predetermined convergence condition is satisfied and, if it is not, determining again the current embedding vector of each neighbor node of node i and determining again the current embedding vector Ei of node i, until the predetermined convergence condition is satisfied;
determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
According to one implementation, the neighbor node information of node i is obtained as follows:
obtaining an adjacency matrix that records the network relationships of the relational network graph, where the element in row m and column k of the adjacency matrix corresponds to the association strength between node m and node k;
determining, from the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.
Further, determining the neighbor nodes of node i and the association strengths from the adjacency matrix includes:
obtaining the elements of row i or column i of the adjacency matrix, which correspond to node i;
determining the nodes corresponding to the non-zero elements in row i or column i to be the neighbor nodes of node i, and determining the value of each non-zero element to be the association strength between node i and the corresponding neighbor node.
According to one embodiment, the position initial term is determined based on the initial embedding vector Ci and the predetermined attenuation coefficient.
In one embodiment, the position offset term of node i is obtained as follows:
summing the current embedding vectors of the neighbor nodes, weighted by the association strength between node i and each neighbor node, to determine a neighbor center position;
determining the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.
In another embodiment, the position offset term of node i is obtained as follows:
determining the sum of the association strengths between node i and all of its neighbor nodes;
determining the ratio of the association strength between node i and each neighbor node to that sum as a relative association strength;
summing the current embedding vectors of the neighbor nodes, weighted by the relative association strengths, to determine a neighbor center position;
taking the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.
According to one possible design, the predetermined convergence condition may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a first predetermined value; or, the sum over all nodes of the differences between the current embedding vector determined this time and the one determined the previous time is less than a second predetermined value.
According to another possible design, the predetermined convergence condition may be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined count threshold.
In one embodiment, the embedding vector of node i is determined to be the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and the position initial term of node i.
According to a second aspect, an apparatus for embedding a relational network graph into a multi-dimensional space is provided, the relational network graph including a plurality of nodes, where nodes having an association relationship are connected to each other with a certain association strength, the apparatus including:
an initial position determining unit, configured to randomly determine an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes;
a neighbor node determining unit, configured to obtain, for each node i, the neighbor nodes connected to node i and the association strength between node i and each neighbor node;
a neighbor position determining unit, configured to determine the current embedding vector of each neighbor node of node i;
a node position determining unit, configured to obtain a position initial term and a position offset term of node i, and to determine a current embedding vector Ei of node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vectors of the neighbor nodes, and the association strength between node i and each neighbor node;
a condition determining unit, configured to determine whether a predetermined convergence condition is satisfied and, if it is not, to cause the neighbor position determining unit to determine again the current embedding vector of each neighbor node of node i and the node position determining unit to determine again the current embedding vector Ei of node i, until the predetermined convergence condition is satisfied;
an embedding position determining unit, configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
According to a third aspect, a computer-readable storage medium is provided on which a computer program is stored, where the computer program, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method of the first aspect.
With the method and apparatus provided by the embodiments of the present specification, a relational network graph can be efficiently embedded into a multi-dimensional space, which facilitates subsequent processing of node information.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a relational network graph according to one embodiment disclosed in this specification;
FIG. 2 illustrates a method of embedding a relational network graph into a multi-dimensional space according to one embodiment;
FIG. 3 shows an example of a relational network graph embedded into a two-dimensional space;
FIG. 4 shows a schematic block diagram of a graph embedding apparatus according to one embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
FIG. 1 is a schematic diagram of a relational network graph according to one embodiment disclosed in this specification. As shown in FIG. 1, the relational network graph includes a plurality of nodes, which are numbered in FIG. 1 for clarity. Nodes that have an association relationship are connected by edges. In one example, the nodes in FIG. 1 represent people or users in a social network, and an edge between two nodes indicates that the two corresponding users have a social association, such as a transfer, a message, or a communication.
In one embodiment, the association relationships between nodes also have different association strengths. In one example, different association strengths are set for different social interactions: users who interact through transfers have an association strength of 0.8, users who interact through messages have an association strength of 0.5, and so on. In one embodiment, where association relationships have different strengths, an attribute or weight of an edge may represent the association strength between the two users that the edge connects.
In the relational network graph of FIG. 1, node positions are drawn schematically only to show the nodes and the connections between them. The relational network graph itself does not specify node positions. To obtain node positions, a graph embedding method is needed to map each node into a multi-dimensional space. The graph embedding method provided by the embodiments of this specification is described below.
FIG. 2 illustrates a method of embedding a relational network graph into a multi-dimensional space according to one embodiment, where the relational network graph includes a plurality of nodes, and nodes having an association relationship are connected to each other with a certain association strength. The method may be performed by any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in FIG. 2, the method includes: step 21, randomly determining an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes; step 22, for each node i, obtaining the neighbor nodes connected to node i and the association strength between node i and each neighbor node; step 23, determining the current embedding vector of each neighbor node of node i; step 24, obtaining a position initial term and a position offset term of node i, and determining a current embedding vector Ei of node i according to the position initial term and the position offset term, where the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vectors of the neighbor nodes, and the association strength between node i and each neighbor node; step 25, determining whether a predetermined convergence condition is satisfied and, if it is not, determining again the current embedding vector of each neighbor node of node i and determining again the current embedding vector Ei of node i, until the predetermined convergence condition is satisfied; and step 26, determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied. The manner in which each of these steps is performed is described below.
First, in step 21, an initial embedding vector Ci in the multi-dimensional space is randomly determined for each node i of the relational network graph. Assuming that the relational network graph contains N nodes and that the multi-dimensional space has dimension s, an s-dimensional vector Ci is randomly generated for each node i of the N nodes as its initial embedding vector.
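Step 21 can be sketched as follows. This is a minimal illustration only: the function name `init_embeddings`, the `seed` parameter, and the choice of uniform values in [-1, 1] are assumptions, since the specification says only that Ci is generated randomly.

```python
import random

def init_embeddings(num_nodes, dim, seed=None):
    """Return one random dim-dimensional vector C_i per node (step 21).

    The uniform range [-1, 1] is an arbitrary illustrative choice; the
    specification does not prescribe a distribution.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_nodes)]

# N = 5 nodes embedded into an s = 2 dimensional space.
C = init_embeddings(num_nodes=5, dim=2, seed=0)
```

Fixing the seed makes the random initialization reproducible, which is convenient when comparing runs.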
In step 22, for each node i, the neighbor nodes connected to node i and the association strength between node i and each neighbor node are obtained.
In a relational network graph, nodes that have an association relationship are connected to each other, and connected nodes are neighbors of each other. The topology of a relational network graph can be recorded in several ways. In one example, the connections are recorded in a table, in which case the neighbor information of each node i and the association strengths between node i and its neighbors can be read from the table.
In one embodiment, the connections of the relational network graph are recorded in a matrix. Matrices that describe a relational network graph include the adjacency matrix, the degree matrix, and the Laplacian matrix. In one example, the neighbor information and association strength information of the nodes are obtained from the adjacency matrix that records the network relationships of the relational network graph.
Specifically, assuming that matrix A is the adjacency matrix of a relational network graph G, matrix A can be expressed as:
$$A = [a_{mk}]_{N \times N},$$
where the element a_mk in row m and column k corresponds to the association strength between node m and node k.
If two nodes are not connected and have no association relationship, the association strength between them is 0.
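As an illustration of this representation, the sketch below builds such an adjacency matrix from a list of weighted edges. The helper `build_adjacency` and the edge-list input format are hypothetical, and the matrix is assumed to be symmetric (an undirected graph), which the specification does not require.

```python
def build_adjacency(num_nodes, weighted_edges):
    """Build an N x N adjacency matrix from (m, k, strength) triples.

    Entries for unconnected node pairs stay 0. Symmetry is an assumption
    made for this sketch (undirected associations).
    """
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for m, k, strength in weighted_edges:
        A[m][k] = strength
        A[k][m] = strength  # assumption: the association is mutual
    return A

# Strengths 0.8 (transfer) and 0.5 (message) follow the example values above.
A = build_adjacency(3, [(0, 1, 0.8), (1, 2, 0.5)])
```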
With such an adjacency matrix, the neighbor information and association strength information of each node are easy to obtain. Specifically, for node i, the elements of row i or column i of the adjacency matrix A, that is, a_ij or a_ji, are obtained; each node j corresponding to a non-zero element in that row or column is determined to be a neighbor node of node i, and the value of the non-zero element is determined to be the association strength between node i and that neighbor node.
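The row lookup described above can be sketched as below. The name `neighbors_of` and the dictionary return type are illustrative assumptions; the adjacency matrix is assumed to be stored as a list of rows.

```python
def neighbors_of(adjacency, i):
    """Return {j: a_ij} for every non-zero element in row i."""
    return {j: a_ij for j, a_ij in enumerate(adjacency[i]) if a_ij != 0}

A = [
    [0.0, 0.8, 0.5],
    [0.8, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
print(neighbors_of(A, 0))  # {1: 0.8, 2: 0.5}
```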
Once the neighbor nodes j of each node i have been determined, in step 23 the current embedding vector Ej of each neighbor node j of node i is determined.
Because an initial embedding vector was randomly generated for every node in step 21, the first time step 23 is performed the current embedding vector Ej of a neighbor node j whose current embedding vector has not yet been updated is its initial embedding vector Cj. The current embedding vector of each node is updated iteratively afterwards, as described in the subsequent steps.
Based on the information obtained for node i in steps 22 and 23, the current embedding vector Ei of node i is determined in step 24. Specifically, the current embedding vector Ei of node i can be regarded as consisting of two parts, a position initial term VI and a position offset term VD:
$$E_i = VI + VD,$$
where the position initial term VI is determined based on the initial embedding vector Ci, and the position offset term VD is determined according to the predetermined attenuation coefficient α, the current embedding vector Ej of each neighbor node j, and the association strength a_ij between node i and each neighbor node.
In one embodiment, the position initial term VI of node i is its initial embedding vector Ci, that is:
$$VI = C_i.$$
In another embodiment, the position initial term may be the initial embedding vector Ci multiplied by a coefficient. For example, the coefficient may be related to the attenuation coefficient α introduced in the position offset term. In one embodiment, therefore, the position initial term is determined based on the initial embedding vector Ci and the attenuation coefficient α. Specifically, in one example, the position initial term VI is determined as:
$$VI = (1 - \alpha)\, C_i.$$
In general, once the position initial term has been determined, it remains fixed throughout the subsequent update iterations.
The position offset term VD of node i must also be determined. According to at least one embodiment of the specification, the position offset term VD is determined according to the predetermined attenuation coefficient α, the current embedding vector Ej of each neighbor node j, and the association strength a_ij between node i and each neighbor node.
The attenuation coefficient α adjusts the step size, or magnitude, of the position offset, and is generally preset to a value between 0 and 1.
In one embodiment, the current embedding vectors Ej of the neighbor nodes j are summed, weighted by the association strength a_ij between node i and each neighbor node j, to determine a neighbor center position; the position offset term VD is then determined based on the predetermined attenuation coefficient α and the neighbor center position.
In one example, following this idea, the position offset term VD is determined as:
$$VD = \alpha \sum_{j \in N(i)} a_{ij}\, E_j,$$
where N(i) denotes the set of neighbor nodes of node i.
This way of computing VD is best suited to cases where the association strength a_ij is itself defined between 0 and 1. If a_ij has a larger range, the attenuation coefficient can be preset to a smaller value.
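A minimal sketch of this first variant follows, assuming VD is the attenuation coefficient times the strength-weighted sum of the neighbors' current embedding vectors. The function name `offset_term` and the `{j: a_ij}` input format are hypothetical.

```python
def offset_term(alpha, weights, embeddings):
    """Compute VD = alpha * sum over neighbors j of a_ij * E_j for one node.

    weights    -- {j: a_ij} for the neighbors j of the node
    embeddings -- list of current embedding vectors, indexed by node
    """
    dim = len(embeddings[0])
    center = [0.0] * dim  # strength-weighted neighbor center position
    for j, a_ij in weights.items():
        for d in range(dim):
            center[d] += a_ij * embeddings[j][d]
    return [alpha * c for c in center]

E = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
VD = offset_term(0.5, {1: 0.5, 2: 0.5}, E)  # [0.5, 1.0]
```

Because the weights are not normalized here, the size of VD grows with the number and strength of the neighbors, which is why a smaller α is suggested when strengths are large.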
In another embodiment, the position offset term VD of node i is determined as follows: the sum di of the association strengths between node i and all of its neighbor nodes j is determined; the ratio of the association strength a_ij between node i and each neighbor node j to the sum di is determined as a relative association strength; the current embedding vectors Ej of the neighbor nodes j are summed, weighted by the relative association strengths, to determine a neighbor center position; and the product of the neighbor center position and the predetermined attenuation coefficient α is taken as the position offset term VD.
In one example, following this idea, the position offset term VD is determined as:
$$VD = \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i}\, E_j,$$
where:
$$d_i = \sum_{j \in N(i)} a_{ij}.$$
In this way, the neighbor center position of node i is determined with the association strength between node i and each neighbor node j taken into account, and the position offset term VD is then obtained by scaling with the attenuation coefficient, so that VD reflects the distance by which node i is shifted toward its neighbor center.
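The relative-strength variant can be sketched as follows. The name `relative_offset_term` is hypothetical, and returning a zero offset for a node without neighbors is an assumption added to avoid division by zero; the specification does not discuss isolated nodes.

```python
def relative_offset_term(alpha, weights, embeddings):
    """Compute VD = alpha * sum over neighbors j of (a_ij / d_i) * E_j.

    weights    -- {j: a_ij} for the neighbors j of the node
    embeddings -- list of current embedding vectors, indexed by node
    """
    dim = len(embeddings[0])
    d_i = sum(weights.values())  # total association strength of the node
    if d_i == 0:
        return [0.0] * dim  # assumption: an isolated node gets no offset
    center = [0.0] * dim
    for j, a_ij in weights.items():
        relative = a_ij / d_i  # relative association strength
        for d in range(dim):
            center[d] += relative * embeddings[j][d]
    return [alpha * c for c in center]

E = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
VD = relative_offset_term(0.5, {1: 1.0, 2: 3.0}, E)  # [0.5, 1.5]
```

With normalized weights the neighbor center is a weighted average, so it always lies within the region spanned by the neighbors regardless of how large the raw strengths are.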
According to one specific example, combining the position initial term described above with the position offset term determined from the relative association strengths, the current embedding vector Ei of node i can be determined as:
$$E_i = (1 - \alpha)\, C_i + \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i}\, E_j.$$
Several ways of determining the current embedding vector Ei of node i have been described above.
Using any of these, steps 23 and 24 are performed for every node i in the relational network graph, so that a current embedding vector is determined for each node.
Next, in step 25, it is determined whether the predetermined convergence condition is satisfied. If it is not, the method returns to steps 23 and 24: the current embedding vector of each neighbor node of node i is determined again, and the current embedding vector Ei of node i is determined again.
Because steps 23 and 24 are performed for every node in the relational network graph, each pass through the loop of steps 23 and 24 updates the current embedding vector of every node. Accordingly, when step 23 is performed for the (n+1)-th time, the current embedding vector Ej of a neighbor node j of a given node i differs from the one used in the n-th pass; what the (n+1)-th pass uses is the current embedding vector of each node as it stood after step 24 of the n-th pass. The position offset term in step 24 therefore changes on every pass through the loop, and the current embedding vector of each node i is continually updated.
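One pass of steps 23 and 24 over all nodes can be sketched as below. The sketch assumes the variant with VI = (1 - α)Ci and relative association strengths, and assumes that every node is updated from the vectors of the previous pass; the name `update_all` and the list-of-rows matrix format are hypothetical.

```python
def update_all(adjacency, C, E_prev, alpha):
    """Return the new current embedding vectors after one pass.

    Every node reads only E_prev (the vectors from the previous pass),
    so the order in which nodes are processed does not matter.
    """
    dim = len(C[0])
    E_new = []
    for i, row in enumerate(adjacency):
        d_i = sum(row)
        vec = [(1.0 - alpha) * c for c in C[i]]  # position initial term VI
        if d_i > 0:  # assumption: isolated nodes receive no offset
            for j, a_ij in enumerate(row):
                if a_ij != 0:
                    for d in range(dim):
                        vec[d] += alpha * (a_ij / d_i) * E_prev[j][d]
        E_new.append(vec)
    return E_new

A = [[0.0, 1.0], [1.0, 0.0]]
C = [[1.0], [3.0]]
E1 = update_all(A, C, C, alpha=0.5)  # first pass starts from E = C
```

Building a fresh list rather than updating in place keeps the pass consistent with the description that the (n+1)-th pass uses the vectors from the end of the n-th pass.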
This loop is repeated until the predetermined convergence condition is satisfied.
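For the variant with VI = (1 - α)Ci and relative association strengths, the repeated loop can be written in matrix form, which shows what it converges to. This is a sketch under stated assumptions that do not appear in the specification: all association strengths are non-negative, every node has at least one neighbor, 0 ≤ α < 1, and all nodes are updated from the previous pass. E and C denote the N × s matrices whose rows are Ei and Ci, D is the diagonal matrix of the sums di, and P is introduced here for the row-normalized adjacency matrix.

```latex
% Assumptions: a_{ij} \ge 0, d_i > 0 for every node, 0 \le \alpha < 1.
% P = D^{-1} A, with D = \mathrm{diag}(d_1, \dots, d_N); each row of P sums to 1.
\begin{aligned}
E^{(n+1)} &= (1-\alpha)\, C + \alpha\, P\, E^{(n)}, \qquad E^{(0)} = C, \\
E^{*} &= (1-\alpha)\,(I - \alpha P)^{-1} C, \\
\lVert E^{(n+1)} - E^{*} \rVert_{\infty} &\le \alpha\, \lVert E^{(n)} - E^{*} \rVert_{\infty}.
\end{aligned}
```

Under these assumptions the distance to the fixed point E* shrinks by at least a factor α on every pass, so the loop converges geometrically and each pass costs only one sweep over the edges.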
In one embodiment, the predetermined convergence condition is set according to an offset adjustment amount, which corresponds to the offset between the position determined this time and the position determined the previous time.
Specifically, in one embodiment, the predetermined convergence condition may be that, for each node, the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a first predetermined value. For example, for the N nodes in the relational network graph, if the difference between each node's current embedding vector and its previously determined embedding vector, that is, its offset distance, is less than a distance threshold, then the position adjustments have become sufficiently small, the node positions have stabilized, and the convergence condition is met.
In another embodiment, the predetermined convergence condition may be that the sum over all nodes of the differences between the current embedding vector determined this time and the one determined the previous time is less than a second predetermined value. That is, the total DT of the offset distances of the N nodes is considered:
$$DT = \sum_{i=1}^{N} D_i,$$
where Di is the offset distance of node i, that is, the difference between its current embedding vector and its previously determined embedding vector.
When the total offset distance DT is less than a threshold, the overall position adjustment of the nodes is small, the node positions have stabilized, and the convergence condition is met.
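The two offset-based convergence conditions can be sketched as follows. Measuring the offset distance Di as a Euclidean distance is an assumption (the specification speaks only of a difference or offset distance), and the function names are hypothetical.

```python
import math

def offset_distances(E_prev, E_cur):
    """D_i for each node: Euclidean distance between successive vectors."""
    return [math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))
            for prev, cur in zip(E_prev, E_cur)]

def converged_per_node(E_prev, E_cur, first_threshold):
    """True when every node's offset distance is below the threshold."""
    return all(d < first_threshold for d in offset_distances(E_prev, E_cur))

def converged_total(E_prev, E_cur, second_threshold):
    """True when the total offset distance DT is below the threshold."""
    return sum(offset_distances(E_prev, E_cur)) < second_threshold

E_prev = [[0.0, 0.0], [0.0, 0.0]]
E_cur = [[3.0, 4.0], [0.0, 0.0]]
print(offset_distances(E_prev, E_cur))  # [5.0, 0.0]
```

The per-node test guarantees that no single node is still moving, whereas the total test bounds the aggregate movement and needs a threshold that scales with the number of nodes.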
In another embodiment, the number of passes through the loop may be preset from experience and used as the convergence condition. That is, the convergence condition is considered satisfied when the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined count threshold. Based on experience, this count can generally be set between 10 and 20.
If the convergence condition is satisfied, the loop is exited and the method proceeds to step 26, in which the embedding vector Qi of each node i in the multi-dimensional space is determined based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
In one embodiment, the current embedding vector Ei of each node i when the convergence condition is satisfied is taken as its embedding vector Qi, that is, Qi = Ei.
In another embodiment, to reduce the influence of the randomly generated initial embedding vector, the embedding vector of node i is determined to be the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its position initial term, that is:
$$Q_i = E_i - VI,$$
where VI is associated with the initial embedding vector Ci; for example, VI equals Ci, or Ci multiplied by a coefficient, such as (1 - α)Ci.
The embedding vector of each node i in the multi-dimensional space is thus determined.
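Steps 21 to 26 can be put together in one short sketch. It assumes the variant with VI = (1 - α)Ci and relative association strengths, a total Euclidean offset as the convergence test, and a cap on the number of passes; the function `embed_graph` and all of its parameters are hypothetical names, not part of the specification.

```python
import math
import random

def embed_graph(adjacency, dim=2, alpha=0.5, tol=1e-6, max_iters=20,
                seed=0, subtract_initial=False):
    """Embed a graph given as an N x N adjacency matrix (list of rows)."""
    n = len(adjacency)
    rng = random.Random(seed)
    # Step 21: random initial embedding vectors C_i (range is an assumption).
    C = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    VI = [[(1.0 - alpha) * c for c in row] for row in C]  # fixed initial terms
    E = [row[:] for row in C]
    for _ in range(max_iters):
        E_new = []
        for i, row in enumerate(adjacency):  # steps 22 to 24 for every node
            d_i = sum(row)
            vec = VI[i][:]
            if d_i > 0:  # assumption: isolated nodes receive no offset
                for j, a_ij in enumerate(row):
                    if a_ij != 0:
                        for d in range(dim):
                            vec[d] += alpha * (a_ij / d_i) * E[j][d]
            E_new.append(vec)
        # Step 25: total offset distance DT between successive passes.
        total = sum(math.sqrt(sum((x - y) ** 2 for x, y in zip(p, c)))
                    for p, c in zip(E, E_new))
        E = E_new
        if total < tol:
            break
    # Step 26: Q_i = E_i, or Q_i = E_i - VI to damp the random start.
    if subtract_initial:
        return [[e - v for e, v in zip(Ei, Vi)] for Ei, Vi in zip(E, VI)]
    return E

A = [[0.0, 0.8, 0.5], [0.8, 0.0, 0.0], [0.5, 0.0, 0.0]]
Q = embed_graph(A, dim=2, alpha=0.5)
```

The default cap of 20 passes follows the range of 10 to 20 mentioned above; the tolerance and α values are placeholders to be tuned for a given graph.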
With the embedding vectors determined in this way, the nodes of the relational network graph can be embedded into the multi-dimensional space. Nodes embedded in the multi-dimensional space have position information, and because the embedding process takes the connections between nodes and their strengths into account, the position information also reflects the association relationships between nodes. For example, nodes that are close to each other in the multi-dimensional space are more strongly associated. This greatly facilitates further processing of node relationship information, such as clustering nodes, discovering groups formed by nodes, computing the similarity between nodes, and predicting potential edges between nodes. When the relational network graph is embedded into a two-dimensional or three-dimensional space, it also greatly facilitates visualization of the relational network.
FIG. 3 shows an example of a relational network graph embedded in a two-dimensional space. More specifically, FIG. 3 is an example of embedding the relational network graph of FIG. 1 into a two-dimensional space using the method shown in FIG. 2. Compared with FIG. 1, where the nodes are placed arbitrarily for illustration, the node positions in FIG. 3 carry more information and reflect the association relationships between nodes. Some nodes are located very close to each other, which means that they are more strongly associated. The distribution of node positions also reveals potential node clusters. Such information benefits further processing of the node information in the relational network.
According to another aspect, an embodiment of the present specification further provides an apparatus for embedding a relational network graph into a multi-dimensional space, wherein the relational network graph to be embedded includes a plurality of nodes, and nodes among the plurality of nodes that have an association relationship are connected to each other with a certain association strength. FIG. 4 shows a schematic block diagram of a graph embedding apparatus according to one embodiment. As shown in FIG. 4, the graph embedding apparatus 400 includes: an initial position determining unit 41, configured to randomly determine an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes; a neighbor node determining unit 42, configured to obtain, for each node i, the neighbor nodes connected to the node i and the association strength between the node i and each neighbor node; a neighbor position determining unit 43, configured to determine the current embedding vector of each neighbor node of the node i; a node position determining unit 44, configured to obtain a position initial term and a position offset term of the node i, and to determine the current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vectors of the neighbor nodes, and the association strength between the node i and each neighbor node; a condition determining unit 45, configured to determine whether a predetermined convergence condition is satisfied and, when it is not satisfied, to cause the neighbor position determining unit to determine again the current embedding vector of each neighbor node of the node i and the node position determining unit to determine again the current embedding vector Ei of the node i, until the predetermined convergence condition is satisfied; and an embedding position determining unit 46, configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
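The cooperation of units 41 to 46 can be sketched as a single loop. This is a minimal illustration, not the apparatus itself, and it rests on several assumptions: the current embedding vector is taken as the sum of the position initial term (1-α)Ci and a position offset term equal to α times the strength-weighted average of the neighbors' current vectors; association strengths are positive; an isolated node gets a zero offset; and convergence is judged by the total absolute change across all nodes. The function name `embed_graph` and all parameter names are hypothetical.

```python
import random


def embed_graph(adj, dim=2, alpha=0.8, tol=1e-9, max_iter=10_000, seed=0):
    """Iteratively embed a weighted graph; returns (E, C) as lists of vectors."""
    rng = random.Random(seed)
    n = len(adj)
    # Unit 41: random initial embedding vector Ci for every node.
    c = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    e = [vec[:] for vec in c]  # current embedding vectors Ei start at Ci
    for _ in range(max_iter):
        new_e = []
        for i in range(n):
            # Unit 42: neighbors are the non-zero entries of row i.
            nbrs = [(j, w) for j, w in enumerate(adj[i]) if w != 0]
            total = sum(w for _, w in nbrs)
            vec = []
            for d in range(dim):
                # Units 43 and 44: neighbor center from the neighbors'
                # current vectors, then initial term plus offset term.
                center = sum(w * e[j][d] for j, w in nbrs) / total if nbrs else 0.0
                vec.append((1.0 - alpha) * c[i][d] + alpha * center)
            new_e.append(vec)
        # Unit 45: total change across all nodes as the convergence measure.
        change = sum(abs(a - b) for u, v in zip(new_e, e) for a, b in zip(u, v))
        e = new_e
        if change < tol:
            break
    return e, c


# Hypothetical 4-node graph: two strongly linked pairs joined by a weak edge.
adj = [
    [0, 5, 0, 0],
    [5, 0, 1, 0],
    [0, 1, 0, 5],
    [0, 0, 5, 0],
]
e, c = embed_graph(adj)
# Unit 46: one option is Qi = Ei - (1 - alpha) * Ci.
q = [[ei - (1.0 - 0.8) * ci for ei, ci in zip(ev, cv)] for ev, cv in zip(e, c)]
```

All nodes are updated from the previous round's vectors, which keeps the sketch simple; an implementation could equally update nodes in place.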
According to one implementation, the neighbor node determining unit 42 is configured to: obtain an adjacency matrix that records the network relationships of the relational network graph, wherein the element in the m-th row and k-th column of the adjacency matrix corresponds to the association strength between the m-th node and the k-th node; and determine, from the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.
Further, in a specific example, the neighbor node determining unit 42 determines the neighbor node information as follows: obtaining the i-th row or the i-th column of the adjacency matrix, which corresponds to node i; determining the nodes corresponding to the non-zero elements of that row or column as the neighbor nodes of node i; and determining the value of each non-zero element as the association strength between node i and the corresponding neighbor node.
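The row lookup described above can be sketched as follows, assuming the adjacency matrix is stored as a list of rows. The function name `neighbors_from_adjacency` and the sample matrix are hypothetical.

```python
def neighbors_from_adjacency(adj, i):
    """Map each neighbor index of node i to its association strength.

    A neighbor is any node whose entry in row i of the matrix is non-zero.
    """
    return {j: w for j, w in enumerate(adj[i]) if w != 0}


# Hypothetical symmetric adjacency matrix for three nodes.
adj = [
    [0, 2, 0],
    [2, 0, 3],
    [0, 3, 0],
]
nbrs_of_1 = neighbors_from_adjacency(adj, 1)
```

For an undirected graph the matrix is symmetric, so reading row i or column i gives the same result.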
In one embodiment, the node position determining unit 44 includes an initial term determining module 441, configured to determine the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.
In one embodiment, the node position determining unit 44 includes an offset term determining module 442, configured to determine the position offset term.
In one example, the offset term determining module 442 is configured to: sum the current embedding vectors of the neighbor nodes, weighted by the association strength between node i and each neighbor node, to determine a neighbor center position; and determine the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.
In another example, the offset term determining module 442 is configured to: determine the sum of the association strengths between node i and all of its neighbor nodes; determine the ratio of the association strength between node i and each neighbor node to that sum, as a relative association strength; sum the current embedding vectors of the neighbor nodes, weighted by the relative association strengths, to determine the neighbor center position; and take the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.
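The second example, with relative association strengths, can be worked through numerically. The sketch below is an illustration only; the function name `position_offset` and the sample values are hypothetical, and vectors are plain lists of floats.

```python
def position_offset(neighbor_vectors, strengths, alpha):
    """Position offset term: alpha times the strength-weighted neighbor center."""
    total = sum(strengths)
    # Relative association strength of each neighbor; the weights sum to 1.
    relative = [w / total for w in strengths]
    dim = len(neighbor_vectors[0])
    center = [
        sum(r * vec[d] for r, vec in zip(relative, neighbor_vectors))
        for d in range(dim)
    ]
    return [alpha * x for x in center]


# Two neighbors at (0, 0) and (4, 8) with strengths 1 and 3:
# relative strengths 0.25 and 0.75, neighbor center (3, 6).
offset = position_offset([[0.0, 0.0], [4.0, 8.0]], [1.0, 3.0], alpha=0.5)
```

Because the relative strengths sum to 1, the neighbor center always lies within the region spanned by the neighbors, and α controls how far node i is pulled toward it.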
According to one possible design, the predetermined convergence condition used by the condition determining unit 45 may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a first predetermined value; or, the sum over all nodes of the differences between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a second predetermined value.
According to another possible design, the predetermined convergence condition may also be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold.
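The three conditions can be checked side by side, as in the sketch below. Measuring the "difference" between two vectors by the Euclidean norm is an assumption of this example, as are the function and parameter names; any condition left as None is simply not applied.

```python
import math


def node_changes(prev, curr):
    """Euclidean norm of the change in each node's embedding vector."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))
        for p, c in zip(prev, curr)
    ]


def is_converged(prev, curr, iteration,
                 first_value=None, second_value=None, max_iterations=None):
    """Return True if any enabled convergence condition holds."""
    changes = node_changes(prev, curr)
    # Condition 1: every node changed by less than the first predetermined value.
    if first_value is not None and all(ch < first_value for ch in changes):
        return True
    # Condition 2: the total change is less than the second predetermined value.
    if second_value is not None and sum(changes) < second_value:
        return True
    # Condition 3: the number of update rounds reached the threshold.
    if max_iterations is not None and iteration >= max_iterations:
        return True
    return False


prev = [[0.0, 0.0], [1.0, 1.0]]
curr = [[0.0, 0.001], [1.0, 1.0]]
done = is_converged(prev, curr, iteration=1, first_value=0.01)
```

Combining an iteration cap with one of the difference-based conditions is a common way to guarantee termination while still stopping early once the vectors have settled.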
In one embodiment, the embedding position determining unit 46 is configured to determine the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its position initial term.
With the above method and apparatus, a complex relational network graph can be embedded quickly and effectively into a multi-dimensional space of any dimension, which facilitates subsequent processing of node information.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described in conjunction with FIG. 2.
According to an embodiment of yet another aspect, a computing device is further provided, including a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method described in conjunction with FIG. 2.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (20)

  1. A method for embedding a relational network graph into a multi-dimensional space, the relational network graph comprising a plurality of nodes, wherein nodes among the plurality of nodes that have an association relationship are connected to each other with a certain association strength, the method comprising:
    randomly determining an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes;
    for each node i, obtaining neighbor nodes connected to the node i, and the association strength between the node i and each neighbor node;
    determining a current embedding vector of each neighbor node of the node i;
    obtaining a position initial term and a position offset term of the node i, and determining a current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vectors of the neighbor nodes, and the association strength between the node i and each neighbor node;
    determining whether a predetermined convergence condition is satisfied, and when the predetermined convergence condition is not satisfied, determining again the current embedding vector of each neighbor node of the node i and determining again the current embedding vector Ei of the node i, until the predetermined convergence condition is satisfied; and
    determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
  2. The method according to claim 1, wherein obtaining the neighbor nodes connected to the node i, and the association strength between the node i and each neighbor node, comprises:
    obtaining an adjacency matrix that records the network relationships of the relational network graph, wherein the element in the m-th row and k-th column of the adjacency matrix corresponds to the association strength between the m-th node and the k-th node; and
    determining, from the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.
  3. The method according to claim 2, wherein determining, from the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node comprises:
    obtaining the i-th row or the i-th column of the adjacency matrix, which corresponds to node i; and
    determining the nodes corresponding to the non-zero elements of the i-th row or the i-th column as the neighbor nodes of node i, and determining the value of each non-zero element as the association strength between node i and the corresponding neighbor node.
  4. The method according to claim 1, wherein obtaining the position initial term of the node i comprises determining the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.
  5. The method according to claim 1, wherein obtaining the position offset term of the node i comprises:
    summing the current embedding vectors of the neighbor nodes, weighted by the association strength between node i and each neighbor node, to determine a neighbor center position; and
    determining the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.
  6. The method according to claim 1, wherein obtaining the position offset term of the node i comprises:
    determining the sum of the association strengths between node i and all of its neighbor nodes;
    determining the ratio of the association strength between node i and each neighbor node to the sum, as a relative association strength;
    summing the current embedding vectors of the neighbor nodes, weighted by the relative association strengths, to determine a neighbor center position; and
    taking the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.
  7. The method according to claim 1, wherein the predetermined convergence condition comprises:
    for each node, the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a first predetermined value; or
    the sum over all nodes of the differences between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a second predetermined value.
  8. The method according to claim 1, wherein the predetermined convergence condition comprises: the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold.
  9. The method according to claim 1, wherein determining the embedding vector of each node i in the multi-dimensional space comprises determining the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its position initial term.
  10. An apparatus for embedding a relational network graph into a multi-dimensional space, the relational network graph comprising a plurality of nodes, wherein nodes among the plurality of nodes that have an association relationship are connected to each other with a certain association strength, the apparatus comprising:
    an initial position determining unit, configured to randomly determine an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes;
    a neighbor node determining unit, configured to obtain, for each node i, neighbor nodes connected to the node i, and the association strength between the node i and each neighbor node;
    a neighbor position determining unit, configured to determine a current embedding vector of each neighbor node of the node i;
    a node position determining unit, configured to obtain a position initial term and a position offset term of the node i, and to determine a current embedding vector Ei of the node i according to the position initial term and the position offset term, wherein the position initial term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vectors of the neighbor nodes, and the association strength between the node i and each neighbor node;
    a condition determining unit, configured to determine whether a predetermined convergence condition is satisfied and, when the predetermined convergence condition is not satisfied, to cause the neighbor position determining unit to determine again the current embedding vector of each neighbor node of the node i and the node position determining unit to determine again the current embedding vector Ei of the node i, until the predetermined convergence condition is satisfied; and
    an embedding position determining unit, configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is satisfied.
  11. The apparatus according to claim 10, wherein the neighbor node determining unit is configured to:
    obtain an adjacency matrix that records the network relationships of the relational network graph, wherein the element in the m-th row and k-th column of the adjacency matrix corresponds to the association strength between the m-th node and the k-th node; and
    determine, from the adjacency matrix, the neighbor nodes of node i and the association strength between node i and each neighbor node.
  12. The apparatus according to claim 11, wherein the neighbor node determining unit is configured to:
    obtain the i-th row or the i-th column of the adjacency matrix, which corresponds to node i; and
    determine the nodes corresponding to the non-zero elements of the i-th row or the i-th column as the neighbor nodes of node i, and determine the value of each non-zero element as the association strength between node i and the corresponding neighbor node.
  13. The apparatus according to claim 10, wherein the node position determining unit comprises an initial term determining module, configured to determine the position initial term based on the initial embedding vector Ci and the predetermined attenuation coefficient.
  14. The apparatus according to claim 10, wherein the node position determining unit comprises an offset term determining module, configured to:
    sum the current embedding vectors of the neighbor nodes, weighted by the association strength between node i and each neighbor node, to determine a neighbor center position; and
    determine the position offset term based at least on the predetermined attenuation coefficient α and the neighbor center position.
  15. The apparatus according to claim 10, wherein the node position determining unit comprises an offset term determining module, configured to:
    determine the sum of the association strengths between node i and all of its neighbor nodes;
    determine the ratio of the association strength between node i and each neighbor node to the sum, as a relative association strength;
    sum the current embedding vectors of the neighbor nodes, weighted by the relative association strengths, to determine a neighbor center position; and
    take the product of the neighbor center position and the predetermined attenuation coefficient α as the position offset term.
  16. The apparatus according to claim 10, wherein the predetermined convergence condition comprises:
    for each node, the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a first predetermined value; or
    the sum over all nodes of the differences between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a second predetermined value.
  17. The apparatus according to claim 10, wherein the predetermined convergence condition comprises: the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold.
  18. The apparatus according to claim 10, wherein the embedding position determining unit is configured to determine the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its position initial term.
  19. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method according to any one of claims 1-9.
  20. A computing device, comprising a memory and a processor, characterized in that the memory stores executable code, and the processor, when executing the executable code, implements the method according to any one of claims 1-9.
PCT/CN2019/089022 2018-07-17 2019-05-29 Method and apparatus for embedding relational network diagram WO2020015464A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810784744.5A CN109063041B (en) 2018-07-17 2018-07-17 Method and device for embedding relational network graph
CN201810784744.5 2018-07-17

Publications (1)

Publication Number Publication Date
WO2020015464A1 true WO2020015464A1 (en) 2020-01-23

Family

ID=64816992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089022 WO2020015464A1 (en) 2018-07-17 2019-05-29 Method and apparatus for embedding relational network diagram

Country Status (3)

Country Link
CN (1) CN109063041B (en)
TW (1) TWI700599B (en)
WO (1) WO2020015464A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063041B (en) * 2018-07-17 2020-04-07 阿里巴巴集团控股有限公司 Method and device for embedding relational network graph
CN109992700A (en) * 2019-01-22 2019-07-09 阿里巴巴集团控股有限公司 The method and apparatus for obtaining the insertion vector of relational network figure interior joint
CN110119475B (en) * 2019-01-29 2020-01-07 成都信息工程大学 POI recommendation method and system
CN109919316B (en) * 2019-03-04 2021-03-12 腾讯科技(深圳)有限公司 Method, device and equipment for acquiring network representation learning vector and storage medium
CN110032665B (en) * 2019-03-25 2023-11-17 创新先进技术有限公司 Method and device for determining graph node vector in relational network graph
CN110515986B (en) * 2019-08-27 2023-01-06 腾讯科技(深圳)有限公司 Processing method and device of social network diagram and storage medium
CN112149000B (en) * 2020-09-09 2021-12-17 浙江工业大学 Online social network user community discovery method based on network embedding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100121792A1 (en) * 2007-01-05 2010-05-13 Qiong Yang Directed Graph Embedding
CN107145977A (en) * 2017-04-28 2017-09-08 电子科技大学 A kind of method that structured attributes deduction is carried out to online social network user
CN107392782A (en) * 2017-06-29 2017-11-24 上海斐讯数据通信技术有限公司 Corporations' construction method, device and computer-processing equipment based on word2Vec
CN107633263A (en) * 2017-08-30 2018-01-26 清华大学 Network embedding grammar based on side
CN109063041A (en) * 2018-07-17 2018-12-21 阿里巴巴集团控股有限公司 The method and device of relational network figure insertion

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080033897A1 (en) * 2006-08-02 2008-02-07 Lloyd Kenneth A Object Oriented System and Method of Graphically Displaying and Analyzing Complex Systems
TW201115366A (en) * 2009-10-27 2011-05-01 Hon Hai Prec Ind Co Ltd System and method for analyzing relationships among persons
CN103838964B (en) * 2014-02-25 2017-01-18 中国科学院自动化研究所 Social relationship network generation method and device based on artificial transportation system
TWI575470B (en) * 2014-06-26 2017-03-21 國立臺灣大學 A global relationship model and a relationship search method for internet social networks
CN106445988A (en) * 2016-06-01 2017-02-22 上海坤士合生信息科技有限公司 Intelligent big data processing method and system
CN108062551A (en) * 2017-06-28 2018-05-22 浙江大学 A kind of figure Feature Extraction System based on adjacency matrix, figure categorizing system and method

Also Published As

Publication number Publication date
CN109063041A (en) 2018-12-21
CN109063041B (en) 2020-04-07
TWI700599B (en) 2020-08-01
TW202006571A (en) 2020-02-01

Similar Documents

Publication Publication Date Title
WO2020015464A1 (en) Method and apparatus for embedding relational network diagram
CN109194707B (en) Distributed graph embedding method and device
CN114787824A (en) Combined hybrid model
CN111400555B (en) Graph data query task processing method and device, computer equipment and storage medium
US9350969B2 (en) Target region filling involving source regions, depth information, or occlusions
CN111291768B (en) Image feature matching method and device, equipment and storage medium
CN110334757A (en) Secret protection clustering method and computer storage medium towards big data analysis
US9380286B2 (en) Stereoscopic target region filling
CN112767551B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
WO2018094719A1 (en) Method for generating point cloud map, computer system, and device
US20170243127A1 (en) Graph based techniques for predicting results
CN111460234A (en) Graph query method and device, electronic equipment and computer readable storage medium
CN114202632A (en) Grid linear structure recovery method and device, electronic equipment and storage medium
KR102239588B1 (en) Image processing method and apparatus
CN109952742B (en) Graph structure processing method, system, network device and storage medium
Nicholas A dividing rectangles algorithm for stochastic simulation optimization
US9679363B1 (en) System and method for reducing image noise
Lakemond et al. Resection-intersection bundle adjustment revisited
CN113537308A (en) Two-stage k-means clustering processing system and method based on localized differential privacy
US11393069B2 (en) Image processing apparatus, image processing method, and computer readable recording medium
Berger et al. Fast multiple histogram computation using Kruskal's algorithm
KR100965843B1 (en) A method on extracting the boundary vector of object obtained from laser scanning data
KR101700829B1 (en) Parallel particle-based fluid simulation system and method thereof
WO2022100118A1 (en) Model processing method and related device
US20220382741A1 (en) Graph embeddings via node-property-aware fast random projection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19836979

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19836979

Country of ref document: EP

Kind code of ref document: A1