TWI700599B

TWI700599B - Method and device for embedding relationship network diagram, computer readable storage medium and computing equipment

Info

Publication number: TWI700599B
Application number: TW108115553A
Authority: TW
Inventors: 向彪; 劉子奇; 周俊; 小龍李
Original assignee: 香港商阿里巴巴集團服務有限公司
Priority date: 2018-07-17
Filing date: 2019-05-06
Publication date: 2020-08-01
Also published as: CN109063041A; CN109063041B; TW202006571A; WO2020015464A1

Abstract

The embodiments of this specification provide a method and device for embedding a relationship network graph into a multi-dimensional space. In the method, an initial embedding vector Ci in the multi-dimensional space is first randomly determined for each node i. Then, for each node i, its adjacent nodes and the association strength between node i and each adjacent node are obtained, and the current embedding vector of each adjacent node of node i is determined. Based on a predetermined attenuation coefficient, the association strengths, and the current positions of the adjacent nodes, an initial position term and a position offset term of node i are formed, and the current embedding vector of node i is determined from them. These steps are repeated until a convergence condition is satisfied, at which point the embedding vector of each node i in the multi-dimensional space can be determined. In this way, the relationship network graph is efficiently embedded into the multi-dimensional space.

Description

Method and device for embedding relationship network diagram, computer readable storage medium and computing equipment

One or more embodiments of this specification relate to the field of computer information processing, and in particular to methods and devices for embedding relationship network graphs.

A relationship network graph describes the relationships between entities in the real world and is widely used in many kinds of computer information processing. In general, a relationship network graph contains a set of nodes and a set of edges: the nodes represent real-world entities, and the edges represent the connections between those entities. In a social network, for example, people are the entities, and the relationships or connections between people are the edges.

In many cases it is desirable to represent each node (entity) in a relationship network graph by coordinates in a multi-dimensional space, that is, to map each node into a multi-dimensional space so that points in the space stand for nodes in the graph. The space may be 2-dimensional, 3-dimensional, or of higher dimension. Representing nodes by such coordinates can be used to calculate the similarity between nodes, discover community structure in the graph, predict edges that may form in the future, and visualize the graph. The process of mapping the nodes of a graph into a multi-dimensional space is called graph embedding.

Graph embedding is a very important basic technical capability. Academia has developed a variety of graph embedding methods, such as DeepWalk, node2vec, and GraphRep. However, because these algorithms all use Monte Carlo sampling internally, their computational efficiency is relatively low. When the graph becomes very large (for example, the Alipay friend relationship network has more than 500 million nodes), graph embedding consumes enormous computing resources.

An improved scheme is therefore desired that performs graph embedding of relationship network graphs more quickly and effectively.

One or more embodiments of this specification describe a graph embedding method for a relationship network graph that can efficiently embed the nodes of a complex relationship network graph into a multi-dimensional space to facilitate subsequent information processing.

According to a first aspect, a method is provided for embedding a relationship network graph into a multi-dimensional space. The relationship network graph includes a plurality of nodes, and nodes that have an association relationship are connected to each other with a certain association strength. The method includes: randomly determining an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes; for each node i, obtaining the adjacent nodes connected to node i and the association strength between node i and each adjacent node; determining the current embedding vector of each adjacent node of node i; obtaining an initial position term and a position offset term of node i, and determining the current embedding vector Ei of node i according to the initial position term and the position offset term, where the initial position term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each adjacent node, and the association strength between node i and each adjacent node; determining whether a predetermined convergence condition is satisfied and, if it is not, again determining the current embedding vector of each adjacent node of node i and again determining the current embedding vector Ei of node i, until the predetermined convergence condition is satisfied; and determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i that satisfies the predetermined convergence condition.

According to one implementation, the adjacent-node information of node i is obtained as follows: obtaining an adjacency matrix that records the network relationships of the relationship network graph, where the element in row m and column k of the adjacency matrix corresponds to the association strength between node m and node k; and determining, from the adjacency matrix, the adjacent nodes of node i and the association strength between node i and each adjacent node.

Further, determining the adjacent nodes of node i and the association strengths from the adjacency matrix includes: obtaining the elements of row i or column i of the adjacency matrix that correspond to node i; determining the nodes corresponding to the non-zero elements in row i or column i as the adjacent nodes of node i; and determining the value of each non-zero element as the association strength between node i and the corresponding adjacent node.

According to one embodiment, the initial position term is determined based on the initial embedding vector Ci and the predetermined attenuation coefficient.

In one embodiment, the position offset term of node i is obtained as follows: summing the current embedding vectors of the adjacent nodes, weighted by the association strength between node i and each adjacent node, to determine an adjacent center position; and determining the position offset term based at least on the predetermined attenuation coefficient α and the adjacent center position.

In another embodiment, the position offset term of node i is obtained as follows: determining the sum of the association strengths between node i and all its adjacent nodes; determining the ratio of the association strength between node i and each adjacent node to that sum as a relative association strength; summing the current embedding vectors of the adjacent nodes, weighted by the relative association strengths, to determine the adjacent center position; and taking the product of the adjacent center position and the predetermined attenuation coefficient α as the position offset term.

According to one possible design, the predetermined convergence condition may be that, for every node, the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a first predetermined value; or that the sum, over all nodes, of the difference between the current embedding vector determined this time and the current embedding vector determined the previous time is less than a second predetermined value.

According to another possible design, the predetermined convergence condition may be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined threshold.

In one embodiment, the embedding vector of node i is determined as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its initial position term.

According to a second aspect, a device is provided for embedding a relationship network graph into a multi-dimensional space. The relationship network graph includes a plurality of nodes, and nodes that have an association relationship are connected to each other with a certain association strength. The device includes: an initial position determining unit configured to randomly determine an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes; an adjacent node determining unit configured to obtain, for each node i, the adjacent nodes connected to node i and the association strength between node i and each adjacent node; an adjacent position determining unit configured to determine the current embedding vector of each adjacent node of node i; a node position determining unit configured to obtain an initial position term and a position offset term of node i and to determine the current embedding vector Ei of node i according to the initial position term and the position offset term, where the initial position term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each adjacent node, and the association strength between node i and each adjacent node; a condition determining unit configured to determine whether a predetermined convergence condition is satisfied and, if it is not, to cause the adjacent position determining unit to determine again the current embedding vector of each adjacent node of node i and the node position determining unit to determine again the current embedding vector Ei of node i, until the predetermined convergence condition is satisfied; and an embedding position determining unit configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i that satisfies the predetermined convergence condition.

According to a third aspect, a computer-readable storage medium is provided on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.

According to a fourth aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method of the first aspect.

With the method and device provided by the embodiments of this specification, a relationship network graph can be efficiently embedded into a multi-dimensional space, which facilitates subsequent processing of node information.
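As an illustration only, the method of the first aspect can be sketched in Python using the standard library. The sketch assumes the relative-association-strength variant of the position offset term, an initial position term of (1 - α)Ci, non-negative association strengths, a synchronous update of all nodes in each pass, and the sum of Euclidean offsets as the convergence measure. The function name `embed_graph`, its parameters, and the sampling range of the initial vectors are hypothetical choices, not taken from the specification.

```python
import math
import random


def embed_graph(adj, dim=2, alpha=0.5, tol=1e-6, max_iters=100, seed=0):
    """Embed a weighted graph given as an adjacency matrix.

    adj[i][j] is the association strength between nodes i and j (0 = no edge).
    Returns (current embedding vectors, number of passes executed).
    """
    rng = random.Random(seed)
    n = len(adj)
    # Random initial embedding vectors C_i; the range [-1, 1] is an assumption.
    C = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    E = [vec[:] for vec in C]  # before any update, E_i equals C_i
    iterations = 0
    for iterations in range(1, max_iters + 1):
        new_E = []
        for i in range(n):
            d_i = sum(adj[i])  # sum of association strengths of node i
            vec = []
            for k in range(dim):
                center_k = 0.0
                if d_i > 0:
                    # Adjacent center position, weighted by a_ij / d_i.
                    center_k = sum(adj[i][j] / d_i * E[j][k] for j in range(n))
                # Initial position term plus position offset term.
                vec.append((1.0 - alpha) * C[i][k] + alpha * center_k)
            new_E.append(vec)
        # Convergence measure: sum of the offset distances of all nodes.
        total_shift = sum(math.dist(old, new) for old, new in zip(E, new_E))
        E = new_E
        if total_shift < tol:
            break
    return E, iterations


adj = [
    [0.0, 0.8, 0.0, 0.0],
    [0.8, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]
embeddings, iterations = embed_graph(adj, dim=2, alpha=0.5)
```

The `max_iters` cap plays the role of the iteration-count convergence condition, and `tol` plays the role of the second predetermined value. Subtracting the initial position term from each returned vector would give the embodiment in which the final embedding is the difference between Ei and that term.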

The solutions provided in this specification are described below with reference to the drawings.

FIG. 1 is a schematic diagram of a relationship network graph according to an embodiment disclosed in this specification. As shown in FIG. 1, the relationship network graph includes multiple nodes, which are numbered in FIG. 1 for clarity. Nodes that have an association relationship are connected by edges. In one example, the nodes in FIG. 1 represent people or users in a social network, and an edge connecting two nodes indicates that the two corresponding users are socially associated, for example through transfers, messages, or communications. In one embodiment, the association relationships between nodes also have different association strengths. In one example, different association strengths are set for different social interaction behaviors: the association strength between users who interact by transfer is 0.8, the association strength between users who interact by leaving messages is 0.5, and so on.
In one embodiment, where association relationships have different association strengths, an attribute or a weight of an edge may be used to represent the association strength between the two users connected by that edge.

In the relationship network graph of FIG. 1, the position of each node is drawn schematically only in order to show the nodes and the connections between them. The relationship network graph itself does not specify node positions. To obtain positions, a graph embedding method is needed to map each node into a multi-dimensional space. The graph embedding method provided by the embodiments of this specification is described below.

FIG. 2 shows a method for embedding a relationship network graph into a multi-dimensional space according to an embodiment, where the relationship network graph includes a plurality of nodes and nodes having an association relationship are connected to each other with a certain association strength. The method can be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in FIG. 2, the method includes:

Step 21: randomly determine an initial embedding vector Ci in the multi-dimensional space for each node i of the plurality of nodes.
Step 22: for each node i, obtain the adjacent nodes connected to node i and the association strength between node i and each adjacent node.
Step 23: determine the current embedding vector of each adjacent node of node i.
Step 24: obtain an initial position term and a position offset term of node i, and determine the current embedding vector Ei of node i according to them, where the initial position term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each adjacent node, and the association strength between node i and each adjacent node.
Step 25: determine whether a predetermined convergence condition is satisfied; if it is not, determine the current embedding vector of each adjacent node of node i again and determine the current embedding vector Ei of node i again, until the predetermined convergence condition is satisfied.
Step 26: determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i that satisfies the predetermined convergence condition.

The execution of each of these steps is described below.

First, in step 21, an initial embedding vector Ci in the multi-dimensional space is randomly determined for each node i of the relationship network graph. Assuming that the graph contains N nodes and that the multi-dimensional space has dimension s, an s-dimensional vector Ci is randomly generated for each node i of the N nodes as its initial embedding vector.
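A minimal sketch of step 21 in Python might look as follows. The specification only says that the vectors are generated randomly, so the uniform range [-1, 1], the function name `init_embeddings`, and the `seed` parameter are assumptions made for illustration.

```python
import random


def init_embeddings(num_nodes, dim, seed=None):
    """Return one random dim-dimensional initial vector C_i per node."""
    rng = random.Random(seed)
    # Uniform sampling in [-1, 1] is an illustrative choice, not from the text.
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_nodes)]


# N = 5 nodes embedded in an s = 2 dimensional space.
C = init_embeddings(num_nodes=5, dim=2, seed=0)
```

Passing a fixed seed makes the otherwise random starting positions reproducible, which can help when comparing runs.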
In step 22, for each node i, the adjacent nodes connected to node i and the association strength between node i and each adjacent node are obtained. In the relationship network graph, nodes that have an association relationship are connected to each other, and nodes connected to each other are adjacent nodes of one another. The topology of the graph can be recorded in several ways. In one example, the connection relationships are recorded in a chart, in which case the adjacent-node information of each node i and the association strength between node i and each adjacent node can be read from that chart. In another embodiment, the connection relationships are recorded in a matrix. Matrices that describe a relationship network graph include the adjacency matrix, the degree matrix, and the Laplacian matrix. In one example, the adjacency information and association strength information of the nodes are obtained from the adjacency matrix that records the network relationships of the graph. Specifically, assuming that matrix A is the adjacency matrix of the relationship network graph G, A can be expressed as:

$$A = [a_{mk}]_{N \times N}$$

where the element a_mk in row m and column k corresponds to the association strength between node m and node k. If two nodes are not connected and have no association relationship, the association strength between them is 0. With such an adjacency matrix, the adjacency information and association strength information of each node can be obtained simply.
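Reading adjacency information from such a matrix can be sketched as below, treating every non-zero entry in row i as an adjacent node of node i. The function name `adjacent_nodes` and the example matrix are hypothetical.

```python
def adjacent_nodes(adj, i):
    """Map each adjacent node j of node i to the association strength a_ij."""
    return {j: a_ij for j, a_ij in enumerate(adj[i]) if a_ij != 0}


# Example adjacency matrix for three nodes; 0 means "no edge".
A = [
    [0.0, 0.8, 0.5],
    [0.8, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
print(adjacent_nodes(A, 0))  # {1: 0.8, 2: 0.5}
print(adjacent_nodes(A, 1))  # {0: 0.8}
```

For a large, sparse graph a dense matrix would waste memory, so an adjacency-list or sparse-matrix layout is a natural alternative; the lookup logic stays the same.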
Specifically, for node i, the elements of row i or column i of the adjacency matrix A that correspond to node i, that is, a_ij or a_ji, are obtained; each node j corresponding to a non-zero element in row i or column i is determined to be an adjacent node of node i, and the value of the non-zero element is determined to be the association strength between node i and that adjacent node.

Once the adjacent nodes j of each node i have been determined, the current embedding vector Ej of each adjacent node j of node i is determined in step 23. Because an initial embedding vector was randomly generated for every node in step 21, when step 23 is executed for the first time, the current embedding vector Ej of an adjacent node j whose current embedding vector has not yet been updated is its initial embedding vector Cj. The current embedding vector of each node is updated iteratively afterwards, as described in the subsequent steps.

Based on the information obtained for node i in steps 22 and 23, the current embedding vector Ei of node i is determined in step 24. Specifically, Ei can be regarded as composed of two parts, an initial position term VI and a position offset term VD:

$$E_i = \mathrm{VI} + \mathrm{VD}$$

where the initial position term VI is determined based on the initial embedding vector Ci, and the position offset term VD is determined according to the predetermined attenuation coefficient α, the current embedding vector Ej of each adjacent node j, and the association strength a_ij between node i and each adjacent node. In one embodiment, the initial position term VI of node i is simply its initial embedding vector Ci, that is:

$$\mathrm{VI} = C_i$$

In another embodiment, the initial position term may be the initial embedding vector Ci multiplied by a coefficient.
For example, the coefficient may be related to the attenuation coefficient α introduced in the position offset term. Thus, in one embodiment, the initial position term can be determined based on the initial embedding vector Ci and the attenuation coefficient α. Specifically, in one example, the initial position term VI is determined as:

$$\mathrm{VI} = (1 - \alpha)\, C_i$$

Generally, once the initial position term has been determined, it remains fixed during the subsequent update iterations.

The position offset term VD of node i must also be determined. According to at least one embodiment of this specification, VD is determined according to the predetermined attenuation coefficient α, the current embedding vector Ej of each adjacent node j, and the association strength a_ij between node i and each adjacent node. The attenuation coefficient α adjusts the step size, or magnitude, of the position offset adjustment and is generally preset to a value between 0 and 1.

In one embodiment, the current embedding vectors Ej of the adjacent nodes j are summed, using the association strength a_ij between node i and each adjacent node j as the weight, to determine an adjacent center position; the position offset term VD is then determined based on the predetermined attenuation coefficient α and the adjacent center position. In one example, following this idea, the position offset term VD is determined as:

$$\mathrm{VD} = \alpha \sum_{j \in N(i)} a_{ij}\, E_j$$

where N(i) denotes the set of adjacent nodes of node i. This way of calculating VD is better suited to cases where the association strength a_ij is itself defined between 0 and 1. If the range of a_ij is larger, the attenuation coefficient can be preset to a smaller value.

In another embodiment, the position offset term VD of node i is determined as follows: determine the sum di of the association strengths between node i and all its adjacent nodes j; determine the ratio of the association strength a_ij between node i and each adjacent node j to the sum di as the relative association strength; sum the current embedding vectors Ej of the adjacent nodes j, using the relative association strengths as weights, to determine the adjacent center position; and take the product of the adjacent center position and the predetermined attenuation coefficient α as the position offset term VD. In one example, following this idea, the position offset term VD is determined as:

$$\mathrm{VD} = \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i}\, E_j$$

where:

$$d_i = \sum_{j \in N(i)} a_{ij}$$

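The two ways of forming the position offset term described above, weighting the adjacent nodes' current embedding vectors by the raw association strengths or by the relative association strengths a_ij/di and then scaling by α, can be compared in a short Python sketch. The function names and the example numbers are hypothetical, and the sketch assumes node i has at least one adjacent node.

```python
def offset_weighted(alpha, strengths, adjacent_vecs):
    """VD = alpha * sum_j a_ij * E_j, using the raw association strengths."""
    dim = len(adjacent_vecs[0])
    return [
        alpha * sum(a * e[k] for a, e in zip(strengths, adjacent_vecs))
        for k in range(dim)
    ]


def offset_relative(alpha, strengths, adjacent_vecs):
    """VD = alpha * sum_j (a_ij / d_i) * E_j, with d_i = sum_j a_ij."""
    d_i = sum(strengths)
    relative = [a / d_i for a in strengths]
    return offset_weighted(alpha, relative, adjacent_vecs)


strengths = [0.8, 0.5]                     # a_ij for two adjacent nodes
adjacent_vecs = [[1.0, 0.0], [0.0, 2.0]]   # their current embedding vectors E_j
vd_weighted = offset_weighted(0.5, strengths, adjacent_vecs)
vd_relative = offset_relative(0.5, strengths, adjacent_vecs)
```

With relative strengths the weights sum to 1, so the adjacent center position is a weighted average that stays inside the region spanned by the adjacent nodes regardless of how large the raw strengths are.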
In this way, the adjacent center position of node i is determined by taking into account the association strength between node i and each adjacent node j, and the position offset term VD is then determined with the attenuation coefficient as the adjustment, so that VD reflects the distance by which the node is shifted toward the adjacent center. According to a specific example, combining the initial position term described above with the position offset term determined from the relative association strength, the current embedding vector Ei of node i can be determined as:

$$E_i = (1 - \alpha)\, C_i + \alpha \sum_{j \in N(i)} \frac{a_{ij}}{d_i}\, E_j$$

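As a worked example of a single update, the sketch below computes the current embedding vector of one node as (1 - α) times its initial embedding vector plus α times the average of its adjacent nodes' current embedding vectors, weighted by relative association strength. The function name `update_node` and the numbers are hypothetical, chosen so the arithmetic is easy to follow.

```python
def update_node(c_i, alpha, strengths, adjacent_vecs):
    """E_i = (1 - alpha) * C_i + alpha * sum_j (a_ij / d_i) * E_j."""
    d_i = sum(strengths)  # total association strength of node i
    return [
        (1.0 - alpha) * c_i[k]
        + alpha * sum(a / d_i * e[k] for a, e in zip(strengths, adjacent_vecs))
        for k in range(len(c_i))
    ]


# Node i starts at the origin and has two adjacent nodes with strengths 1 and 3.
# Relative strengths are 0.25 and 0.75, so the adjacent center is (1.0, 3.0).
E_i = update_node(
    c_i=[0.0, 0.0],
    alpha=0.5,
    strengths=[1.0, 3.0],
    adjacent_vecs=[[4.0, 0.0], [0.0, 4.0]],
)
print(E_i)  # [0.5, 1.5]
```

The node moves halfway (α = 0.5) from its initial position toward the adjacent center, which lies closer to the more strongly associated adjacent node.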
The foregoing describes multiple ways to determine the current embedding vector Ei of node i. Whichever way is used, steps 23 and 24 above are performed for each node i in the relationship network graph, so that a current embedding vector is determined for each node. Next, in step 25, it is determined whether the predetermined convergence condition is satisfied. If it is not, the process returns to steps 23 and 24: the current embedding vector of each adjacent node of node i is determined again, and the current embedding vector Ei of node i is determined again. It can be understood that, because steps 23 and 24 are performed for each node in the relationship network graph, the current embedding vector of every node is updated each time the loop of steps 23 and 24 is executed. Correspondingly, when step 23 is executed for the (n+1)th time, the current embedding vector Ej of an adjacent node j of the same node i differs from that at the nth execution; in fact, the (n+1)th execution uses the current embedding vector of each node obtained when step 24 was executed for the nth time. The position offset term in step 24 therefore changes each time the loop is executed, so that the current embedding vector of each node i is continually updated. This loop is repeated until the predetermined convergence condition is satisfied.
In one embodiment, the predetermined convergence condition is set according to an offset adjustment amount, which corresponds to the offset between the position determined this time and the position determined last time. Specifically, in one embodiment, the predetermined convergence condition may be that, for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value. For example, for the N nodes in the relationship network graph, if the difference between the current embedding vector of each node and the embedding vector determined last time, that is, the offset distance, is less than a distance threshold, then the position adjustments of the nodes have become sufficiently small, and the node positions have stabilized and converged, so the convergence condition is met. In another embodiment, the predetermined convergence condition may be that the sum, over all nodes, of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value. In other words, consider the sum DT of the offset distances of the N nodes:

$$D_T = \sum_{i=1}^{N} D_i$$
Where Di is the offset distance of node i, that is, the difference between its current embedding vector and the embedding vector determined last time. When the sum DT of the offset distances is less than a certain threshold, the overall position adjustment of the nodes is small, and the node positions have stabilized and converged, so the convergence condition is met. In another embodiment, the number of loop executions is preset, based on experience, as the convergence condition. That is, when the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined number threshold, the convergence condition is considered satisfied. Based on experience, this number can generally be set between 10 and 20. If the convergence condition is met, the loop is exited and the process proceeds to step 26, in which the embedding vector Qi of each node i in the multi-dimensional space is determined based at least on the current embedding vector Ei of each node i at the time the predetermined convergence condition is met. In one embodiment, the current embedding vector Ei of each node i when the convergence condition is met is used directly as its embedding vector Qi, that is, Qi = Ei. In another embodiment, in order to reduce the influence of the randomly generated initial embedding vector, the embedding vector of node i is determined as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is met and its initial position term, namely Qi = Ei - VI, where VI is associated with the initial embedding vector Ci, for example equal to Ci, or equal to Ci multiplied by a certain coefficient, such as (1-α)Ci. In this way, the embedding vector of each node i in the multi-dimensional space is determined. Based on the embedding vectors determined in this way, the nodes in the relationship network graph can be embedded in the multi-dimensional space.
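The overall loop of steps 23 to 26 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the initial position term is (1-α)Ci, weights neighbors by the relative correlation strength, measures the offset distance Di as a Euclidean distance, and stops either when the summed offset DT falls below a threshold or when an iteration cap is reached. The function name `embed_graph` and all of its parameters and defaults are hypothetical.

```python
import math
import random


def embed_graph(neighbors, dim=2, alpha=0.5, tol=1e-6, max_iters=20,
                seed=0, subtract_initial=False):
    """Embed a weighted graph; neighbors[i] maps adjacent node j to a_ij."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    # Random initial embedding vector Ci for every node.
    C = {i: [rng.random() for _ in range(dim)] for i in nodes}
    E = {i: list(C[i]) for i in nodes}

    for _ in range(max_iters):  # iteration cap as one convergence condition
        new_E = {}
        for i in nodes:
            # Initial position term, assumed to be (1 - alpha) * Ci.
            vec = [(1.0 - alpha) * c for c in C[i]]
            d_i = sum(neighbors[i].values())
            if d_i > 0:
                # Offset term uses the vectors from the previous pass.
                for j, a_ij in neighbors[i].items():
                    w = alpha * a_ij / d_i
                    for k in range(dim):
                        vec[k] += w * E[j][k]
            new_E[i] = vec
        # DT: sum of per-node offset distances Di (Euclidean, an assumption).
        total_offset = sum(math.dist(new_E[i], E[i]) for i in nodes)
        E = new_E
        if total_offset < tol:
            break

    if subtract_initial:
        # Variant Qi = Ei - VI, with VI assumed to be (1 - alpha) * Ci.
        return {i: [e - (1.0 - alpha) * c for e, c in zip(E[i], C[i])]
                for i in nodes}
    return E  # variant Qi = Ei
```

Each pass builds `new_E` from the vectors of the previous pass, which mirrors the statement that the (n+1)th execution uses the vectors obtained in the nth execution. The default cap of 20 follows the 10 to 20 range mentioned in the text.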
The nodes embedded in the multi-dimensional space carry position information, and because the connection relationships and connection strengths between nodes are taken into account during embedding, this position information also reflects the relationships between the nodes. For example, nodes that are close together in the multi-dimensional space have a stronger relationship. This is very helpful for subsequent processing of node relationship information, such as clustering nodes, discovering groups formed by nodes, calculating the similarity between nodes, and predicting potential edge connections between nodes. When the relationship network graph is embedded in a two-dimensional or three-dimensional space, the embedding also lends itself to visual presentation of the relationship network. FIG. 3 shows an example of a relationship network graph embedded in a two-dimensional space. More specifically, FIG. 3 is an example of embedding the relationship network graph of FIG. 1 into a two-dimensional space using the method shown in FIG. 2. Compared with the nodes in FIG. 1, which are placed randomly for illustration, the positions of the nodes in FIG. 3 contain more information and reflect the relationships between the nodes. Some nodes are located very close together, which means that these nodes have a stronger relationship. Moreover, the distribution of node positions shows that the nodes form potential node clusters. Such information facilitates further processing of node information in the relationship network.
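As a small illustration of such downstream use, the following Python sketch ranks the nodes closest to a given node in the embedded space. The description does not specify a distance measure, so the use of Euclidean distance here is an assumption, and the function name `nearest_nodes` is hypothetical.

```python
import math


def nearest_nodes(Q, i, k=3):
    """Return up to k nodes whose embedding vectors lie closest to node i.

    Q maps each node to its embedding vector (list of floats).
    Euclidean distance is an assumption made for this example.
    """
    ranked = sorted(
        (math.dist(Q[i], Q[j]), j) for j in Q if j != i
    )
    return [j for _, j in ranked[:k]]
```

A shortlist of this kind could serve as a starting point for similarity calculation or for proposing candidate edges between nodes.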
According to another aspect, an embodiment of this specification also provides a device for embedding a relationship network graph into a multi-dimensional space, wherein the relationship network graph to be embedded includes a plurality of nodes, and nodes having an association relationship among the plurality of nodes are connected to each other with a certain correlation strength. FIG. 4 shows a schematic block diagram of a graph embedding device according to an embodiment. As shown in FIG. 4, the graph embedding device 400 includes: an initial position determining unit 41, configured to randomly determine an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes; an adjacent node determining unit 42, configured to obtain, for each node i, the adjacent nodes connected to node i and the correlation strength between node i and each adjacent node; an adjacent position determining unit 43, configured to determine the current embedding vector of each adjacent node of node i; a node position determining unit 44, configured to obtain the initial position term and the position offset term of node i, and to determine the current embedding vector Ei of node i according to the initial position term and the position offset term, where the initial position term is determined based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each adjacent node, and the correlation strength between node i and each adjacent node; a condition determining unit 45, configured to determine whether the predetermined convergence condition is met and, if it is not met, to cause the adjacent position determining unit 43 to determine the current embedding vector of each adjacent node of node i again and the node position determining unit 44 to determine the current embedding vector Ei of node i again, until the predetermined convergence condition is satisfied; and an embedding position determining unit 46, configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is met.
According to an embodiment, the adjacent node determining unit 42 is configured to: obtain an adjacency matrix that records the network relationship of the relationship network graph, where the element in the m-th row and k-th column of the adjacency matrix corresponds to the correlation strength between the m-th node and the k-th node; and determine, through the adjacency matrix, the adjacent nodes of node i and the correlation strength between node i and each adjacent node. Further, in a specific example, the adjacent node determining unit 42 determines the adjacent node information as follows: obtain the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix; determine the nodes corresponding to the non-zero elements among the i-th row elements or the i-th column elements as the adjacent nodes of node i; and determine the value of each non-zero element as the correlation strength between node i and the corresponding adjacent node. In an embodiment, the node position determining unit 44 includes an initial term determining module 441, configured to determine the initial position term based on the initial embedding vector Ci and the predetermined attenuation coefficient. In one embodiment, the node position determining unit 44 includes an offset term determining module 442 for determining the position offset term.
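The adjacency-matrix lookup attributed to the adjacent node determining unit 42 might be sketched in Python as follows. This is only an illustrative sketch: the matrix is assumed to be a list of lists with 0-based node indices, and the function name `neighbors_from_adjacency` and its `by_column` flag are hypothetical.

```python
def neighbors_from_adjacency(A, i, by_column=False):
    """Return {j: a_ij} for the non-zero entries of row (or column) i.

    A is a square adjacency matrix given as a list of lists, where
    A[m][k] is the correlation strength between node m and node k.
    Indices are 0-based here, which is an assumption of this sketch.
    """
    # Pick the i-th row, or the i-th column if requested.
    entries = [row[i] for row in A] if by_column else A[i]
    # Non-zero elements identify adjacent nodes; their values are a_ij.
    return {j: a for j, a in enumerate(entries) if a != 0}
```

For an undirected graph the matrix is symmetric, so the row and the column give the same result; for a directed graph the choice between them determines which edge direction counts as adjacency.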
In one example, the offset term determining module 442 is configured to: sum the current embedding vectors of the adjacent nodes, using the correlation strength between node i and each adjacent node as the weight, to determine the adjacent center position; and determine the position offset term based at least on the predetermined attenuation coefficient α and the adjacent center position. In another example, the offset term determining module 442 is configured to: determine the sum of the correlation strengths between node i and all of its adjacent nodes; take the ratio of the correlation strength between node i and each adjacent node to the sum as the relative correlation strength; sum the current embedding vectors of the adjacent nodes, using the relative correlation strength as the weight, to determine the adjacent center position; and take the product of the adjacent center position and the predetermined attenuation coefficient α as the position offset term. According to a possible design, the predetermined convergence condition on which the condition determining unit 45 relies may be: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or the sum, over all nodes, of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value. According to a possible design, the predetermined convergence condition may also be that the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined number threshold.
In one embodiment, the embedding position determining unit 46 is configured to determine the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its initial position term. Through the above method and device, a complex relationship network graph can be embedded quickly and effectively into a multi-dimensional space of any dimension, thereby facilitating subsequent processing of node information. According to another embodiment, there is also provided a computer-readable storage medium having a computer program stored thereon; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2. According to another embodiment, there is also provided a computing device including a memory and a processor, where the memory stores executable code and, when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented. Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, etc. made on the basis of the technical solution of the present invention shall fall within the scope of protection of the present invention.

41‧‧‧Initial position determining unit
42‧‧‧Adjacent node determining unit
43‧‧‧Adjacent position determining unit
44‧‧‧Node position determining unit
45‧‧‧Condition determining unit
46‧‧‧Embedding position determining unit
400‧‧‧Graph embedding device
441‧‧‧Initial term determining module
442‧‧‧Offset term determining module

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort. FIG. 1 is a schematic diagram of a relationship network graph according to an embodiment disclosed in this specification; FIG. 2 shows a method of embedding a relationship network graph into a multi-dimensional space according to an embodiment; FIG. 3 shows an example of a relationship network graph embedded in a two-dimensional space; FIG. 4 shows a schematic block diagram of a graph embedding device according to an embodiment.

Claims

A method for embedding a relationship network graph into a multi-dimensional space, the relationship network graph comprising a plurality of nodes, wherein nodes having an association relationship among the plurality of nodes are connected to each other with a certain correlation strength, the method comprising: randomly determining an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes; for each node i, obtaining the adjacent nodes connected to the node i, and the correlation strength between the node i and each adjacent node; determining the current embedding vector of each adjacent node of the node i; obtaining an initial position term and a position offset term of the node i, and determining the current embedding vector Ei of the node i according to the initial position term and the position offset term, wherein the initial position term is determined at least based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each adjacent node, and the correlation strength between the node i and each adjacent node; determining whether a predetermined convergence condition is met and, if the predetermined convergence condition is not met, determining the current embedding vector of each adjacent node of the node i again and determining the current embedding vector Ei of the node i again, until the predetermined convergence condition is met; and determining the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is met.

The method according to claim 1, wherein obtaining the adjacent nodes connected to the node i, and the correlation strength between the node i and each adjacent node, comprises: obtaining an adjacency matrix that records the network relationship of the relationship network graph, wherein the element in the m-th row and k-th column of the adjacency matrix corresponds to the correlation strength between the m-th node and the k-th node; and determining, through the adjacency matrix, the adjacent nodes of node i and the correlation strength between node i and each adjacent node.

The method according to claim 2, wherein determining, through the adjacency matrix, the adjacent nodes of node i and the correlation strength between node i and each adjacent node comprises: obtaining the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix; determining the nodes corresponding to the non-zero elements among the i-th row elements or the i-th column elements as the adjacent nodes of node i; and determining the value of each non-zero element as the correlation strength between node i and the corresponding adjacent node.

The method according to claim 1, wherein obtaining the initial position term of the node i comprises determining the initial position term based on the initial embedding vector Ci and the predetermined attenuation coefficient.

The method according to claim 1, wherein obtaining the position offset term of the node i comprises: summing the current embedding vectors of the adjacent nodes, using the correlation strength between the node i and each adjacent node as the weight, to determine an adjacent center position; and determining the position offset term based at least on the predetermined attenuation coefficient α and the adjacent center position.

The method according to claim 1, wherein obtaining the position offset term of the node i comprises: determining the sum of the correlation strengths between the node i and all of its adjacent nodes; taking the ratio of the correlation strength between the node i and each adjacent node to the sum as the relative correlation strength; summing the current embedding vectors of the adjacent nodes, using the relative correlation strength as the weight, to determine an adjacent center position; and taking the product of the adjacent center position and the predetermined attenuation coefficient α as the position offset term.

The method according to claim 1, wherein the predetermined convergence condition comprises: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or the sum, over all nodes, of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value.

The method according to claim 1, wherein the predetermined convergence condition comprises: the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined number threshold.

The method according to claim 1, wherein determining the embedding vector of each node i in the multi-dimensional space comprises determining the embedding vector of node i as the difference between the current embedding vector Ei of node i when the predetermined convergence condition is satisfied and its initial position term.

A device for embedding a relationship network graph into a multi-dimensional space, the relationship network graph comprising a plurality of nodes, wherein nodes having an association relationship among the plurality of nodes are connected to each other with a certain correlation strength, the device comprising: an initial position determining unit, configured to randomly determine an initial embedding vector Ci, in the multi-dimensional space, of each node i among the plurality of nodes; an adjacent node determining unit, configured to obtain, for each node i, the adjacent nodes connected to the node i, and the correlation strength between the node i and each adjacent node; an adjacent position determining unit, configured to determine the current embedding vector of each adjacent node of the node i; a node position determining unit, configured to obtain an initial position term and a position offset term of the node i, and to determine the current embedding vector Ei of the node i according to the initial position term and the position offset term, wherein the initial position term is determined at least based on the initial embedding vector Ci, and the position offset term is determined according to a predetermined attenuation coefficient α, the current embedding vector of each adjacent node, and the correlation strength between the node i and each adjacent node; a condition determining unit, configured to determine whether a predetermined convergence condition is met and, if the predetermined convergence condition is not met, to cause the adjacent position determining unit to determine the current embedding vector of each adjacent node of the node i again and the node position determining unit to determine the current embedding vector Ei of the node i again, until the predetermined convergence condition is met; and an embedding position determining unit, configured to determine the embedding vector of each node i in the multi-dimensional space based at least on the current embedding vector Ei of each node i when the predetermined convergence condition is met.

The device according to claim 10, wherein the adjacent node determining unit is configured to: obtain an adjacency matrix that records the network relationship of the relationship network graph, wherein the element in the m-th row and k-th column of the adjacency matrix corresponds to the correlation strength between the m-th node and the k-th node; and determine, through the adjacency matrix, the adjacent nodes of node i and the correlation strength between node i and each adjacent node.

The device according to claim 11, wherein the adjacent node determining unit is configured to: obtain the i-th row elements or the i-th column elements corresponding to node i in the adjacency matrix; determine the nodes corresponding to the non-zero elements among the i-th row elements or the i-th column elements as the adjacent nodes of node i; and determine the value of each non-zero element as the correlation strength between node i and the corresponding adjacent node.

The device according to claim 10, wherein the node position determining unit includes an initial term determining module, configured to determine the initial position term based on the initial embedding vector Ci and the predetermined attenuation coefficient.

The device according to claim 10, wherein the node position determining unit includes an offset term determining module, configured to: sum the current embedding vectors of the adjacent nodes, using the correlation strength between node i and each adjacent node as the weight, to determine an adjacent center position; and determine the position offset term based at least on the predetermined attenuation coefficient α and the adjacent center position.

The device according to claim 10, wherein the node position determining unit includes an offset item determining module configured to: determine the sum of the associated strengths of node i and all its neighboring nodes; determine the relationship between node i and each neighboring node The ratio of the correlation strength between the two to the sum value is used as the relative correlation strength; using the relative correlation strength as the weight, sum the current embedding vectors of each neighboring node to determine the neighboring center position; the neighboring center position and the predetermined The product of the attenuation coefficient α is used as the position offset term.

The device according to claim 10, wherein the predetermined convergence condition comprises: for each node, the difference between the current embedding vector determined this time and the current embedding vector determined last time is less than a first predetermined value; or the sum, over all nodes, of the differences between the current embedding vector determined this time and the current embedding vector determined last time is less than a second predetermined value.

The device according to claim 10, wherein the predetermined convergence condition comprises: the number of times the current embedding vector Ei of each node i has been determined reaches a predetermined number threshold.

The device according to claim 10, wherein the embedding position determining unit is configured to determine the embedding vector of node i as the difference between the current embedding vector Ei of node i and its initial position item when the predetermined convergence condition is satisfied.

A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to execute the method according to any one of claims 1 to 9.

A computing device, comprising a memory and a processor, characterized in that executable code is stored in the memory, and when the processor executes the executable code, the method according to any one of claims 1 to 9 is implemented.