CN109614397B - Method and device for acquiring node sequence of relational network based on distributed system - Google Patents

Info

Publication number: CN109614397B
Authority: CN (China)
Application number: CN201811278432.3A
Other languages: Chinese (zh)
Other versions: CN109614397A
Inventors: 杨新星, 周俊, 李小龙
Current assignees: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Original assignee: Advanced New Technologies Co Ltd
Legal status: Active
Prior art keywords: node, nodes, edge vector, servers, type
Events:
  • Application CN201811278432.3A filed by Advanced New Technologies Co Ltd (priority application)
  • Publication of CN109614397A
  • Application granted
  • Publication of CN109614397B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the present specification provide a method and a device for acquiring a node sequence of a relational network based on a distributed system. The method comprises: acquiring the out-degree values of a plurality of first nodes among a plurality of nodes, and sending the acquired data to at least one first designated server among a plurality of servers; receiving an accumulation matrix from the plurality of servers; acquiring the node identifiers and types of the adjacent nodes of the at least one first node; calculating the positions of the node identifiers of these adjacent nodes in an edge vector to obtain partial elements of the edge vector, and sending the partial elements to at least one second designated server among the plurality of servers; receiving the edge vector from the plurality of servers; and sequentially and randomly acquiring a plurality of node identifiers as the node sequence according to a predetermined path, wherein the predetermined path specifies the type of the node corresponding to each of the node identifiers.

Description

Method and device for acquiring node sequence of relational network based on distributed system
Technical Field
The embodiments of the present specification relate to the technical field of machine learning, and in particular to a method and a device for acquiring a node sequence of a relational network based on a distributed system.
Background
Internet recommendation scenarios involve a large amount of graph computation. A typical example is personalized recommendation: a relational network is built from users' historical behavior, and node sequences of the relational network are determined with a random walk algorithm, from which the commodities a user is likely to want or purchase can be mined, improving user satisfaction and purchase intent. The random walk algorithm is therefore one of the most basic and important components of graph computation and plays a critical role in data mining. In the prior art, random walk algorithms are typically implemented on a single machine, and one such algorithm treats users and commodities as nodes of the same kind, forming a single network. However, with the development of the Internet, the numbers of users and commodities have grown explosively, in some cases to the order of billions, and single-machine implementations of the meta-path random walk algorithm can no longer meet current demands.
Thus, there is a need for a more efficient solution for obtaining a sequence of nodes of a relational network.
Disclosure of Invention
Embodiments of the present specification aim to provide a more efficient solution for acquiring a node sequence of a relational network based on a distributed system, so as to address the deficiencies of the prior art.
To achieve the above object, one aspect of the present specification provides a method of acquiring a node sequence of a relational network based on a distributed system, the distributed system including a plurality of servers and a plurality of working machines, and the relational network including a plurality of interconnected nodes, wherein each node has a node identifier, one of at least one type, out-degree values respectively corresponding to the respective types, and adjacent nodes. The method is performed in a first working machine among the plurality of working machines and includes:
acquiring the out-degree values of each of a plurality of first nodes among the plurality of nodes, and sending the acquired data to at least one first designated server among the plurality of servers;
receiving an accumulation matrix from the plurality of servers, the accumulation matrix showing, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of all nodes whose node identifiers precede the node's identifier;
acquiring the node identifiers and types of the respective adjacent nodes of the at least one first node;
calculating, based on the node identifiers and types of the respective adjacent nodes of the at least one first node and on the accumulation matrix, the positions of those adjacent nodes' identifiers in an edge vector, so as to obtain partial elements of the edge vector, and sending the partial elements of the edge vector to at least one second designated server among the plurality of servers, wherein the edge vector comprises at least one part respectively corresponding to the at least one type, and each part holds, in sequence, the node identifiers of the adjacent nodes of the corresponding type of each of the plurality of nodes;
receiving the edge vector from the plurality of servers; and
sequentially and randomly acquiring, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, a plurality of node identifiers as the node sequence according to a predetermined path, wherein the predetermined path specifies the type of the node corresponding to each of the plurality of node identifiers.
In one embodiment, sequentially and randomly acquiring the plurality of node identifiers as the node sequence includes randomly selecting, according to the predetermined path, one node identifier from the node identifiers of the nodes of a first predetermined type among the plurality of nodes as the first node identifier of the node sequence.
In one embodiment, sequentially and randomly acquiring the plurality of node identifiers as the node sequence comprises:
acquiring, according to the predetermined path and based on the accumulation matrix, the out-degree value of a second predetermined type of the node corresponding to the first node identifier;
randomly acquiring a first integer based on the out-degree value; and
calculating a position in the edge vector based on the first integer, the second predetermined type, the first node identifier, the accumulation matrix and the edge vector, so as to obtain the node identifier at that position as a second node identifier.
In one embodiment, the relational network is a bipartite graph network, and the at least one type includes a user type and a commodity type.
In one embodiment, the element in row i, column j of the accumulation matrix is the sum of the type-i out-degree values of the nodes whose node identifiers are 0 through j-1, where i and j are both counted from 0.
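Written as a formula, with $d_i(k)$ denoting the type-$i$ out-degree value of node $k$ and $N$ the number of nodes (notation introduced here for illustration only), the accumulation matrix of this embodiment is:

```latex
A_{i,j} = \sum_{k=0}^{j-1} d_i(k), \qquad 0 \le j \le N,
\qquad\text{hence}\qquad
A_{i,0} = 0, \quad A_{i,N} = \sum_{k=0}^{N-1} d_i(k), \quad d_i(j) = A_{i,j+1} - A_{i,j}.
```

The last identity shows how a single node's out-degree of a given type can be read back from two adjacent entries of the accumulation matrix, and $A_{i,N}$ (the last column) is the total number of type-$i$ edges.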
In one embodiment, the type-i adjacent nodes of the node whose node identifier is j occupy consecutive positions in the edge vector starting from a first position, and their node identifiers are arranged at those positions in ascending order, wherein the column number of the first position in the edge vector equals the starting column number of the part of the edge vector corresponding to type i plus the element value in row i, column j of the accumulation matrix, and column numbers in the edge vector are counted from 0.
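A minimal single-process sketch of how a working machine might compute its partial elements of the edge vector from this position rule. It assumes plain Python lists for the accumulation matrix (`acc[i][j]`, with N+1 columns) and for the adjacency lists, and it assumes the starting column of the type-i part equals the total number of edges of all lower-numbered types. The function name `edge_vector_elements` and the small graph in the usage note are hypothetical, not the data of FIG. 3.

```python
def edge_vector_elements(nodes, adjacency, node_types, acc):
    """Return {position: neighbour_id} for the adjacent nodes of `nodes`.

    acc[t][j] is the accumulation-matrix entry for type t and column j
    (sum of type-t out-degrees of nodes 0..j-1); it has N + 1 columns.
    """
    num_nodes = len(acc[0]) - 1
    # Assumed: the part for type t starts after all edges of types 0..t-1.
    starts = [0]
    for row in acc[:-1]:
        starts.append(starts[-1] + row[num_nodes])

    elements = {}
    for j in nodes:
        for t in range(len(acc)):
            # Type-t neighbours of node j, in ascending identifier order.
            same_type = sorted(n for n in adjacency[j] if node_types[n] == t)
            first = starts[t] + acc[t][j]  # the "first position" of the rule
            for offset, n in enumerate(same_type):
                elements[first + offset] = n
    return elements
```

For a hypothetical 4-node graph with types `[0, 1, 0, 1]`, adjacency `[[1, 2, 3], [0], [0, 3], [0, 2]]` and accumulation matrix `[[0, 1, 2, 3, 5], [0, 2, 2, 3, 3]]`, two working machines handling nodes `[0, 1]` and `[2, 3]` produce disjoint position maps that together fill the edge vector `[2, 0, 0, 0, 2, 1, 3, 3]`. Because each node's block of positions is fixed by the accumulation matrix, no coordination between working machines is needed at this step.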
In one embodiment, sequentially and randomly acquiring the plurality of node identifiers according to the predetermined path ends in either of the following cases:
the number of acquired node identifiers reaches a predetermined number;
the next node identifier cannot be found.
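The walk described in the embodiments above (first node drawn from the first type on the path, then a random integer below the out-degree of the next required type, then an index into the edge vector) can be sketched as follows. Assumptions: the predetermined path is given as a list of types that is repeated cyclically until `max_len` identifiers have been collected, all random choices are uniform, and the accumulation matrix and edge vector use the layout described above. `meta_path_walk` is a hypothetical name, not an interface from the specification.

```python
import random


def meta_path_walk(meta_path, max_len, node_types, acc, edge_vec, rng=random):
    """Return one node sequence whose k-th node has type meta_path[k % len(meta_path)]."""
    num_nodes = len(node_types)
    # Starting column of each type's part of the edge vector (assumed layout).
    starts = [0]
    for row in acc[:-1]:
        starts.append(starts[-1] + row[num_nodes])

    # First node identifier: uniform over nodes of the first type on the path.
    candidates = [n for n in range(num_nodes) if node_types[n] == meta_path[0]]
    if not candidates:
        return []
    walk = [rng.choice(candidates)]

    while len(walk) < max_len:                  # stop: predetermined number reached
        t = meta_path[len(walk) % len(meta_path)]
        u = walk[-1]
        degree = acc[t][u + 1] - acc[t][u]      # type-t out-degree of node u
        if degree == 0:                         # stop: next node cannot be found
            break
        r = rng.randrange(degree)               # the "first integer"
        walk.append(edge_vec[starts[t] + acc[t][u] + r])
    return walk
```

Passing a seeded `random.Random` instance as `rng` makes walks reproducible, which is convenient when each working machine generates its own share of sequences.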
In one embodiment, the step of sequentially and randomly acquiring the plurality of node identifiers according to the predetermined path is repeated in a loop, and the method further comprises writing the node sequences obtained through a predetermined number of iterations, i.e., a predetermined number of rows, into the database.
In one embodiment, the loop of sequentially and randomly acquiring the plurality of node identifiers according to the predetermined path stops when the total number of rows of node sequences acquired by all of the plurality of working machines reaches a preset value.
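A sketch of the batching and stopping rules in these two embodiments, simulated with threads in a single process: each thread stands in for a working machine, a lock-protected counter stands in for the global row total, and a list stands in for the database. In a real deployment the counter would have to be shared across machines (for example through the servers), which the specification does not detail; `RowBudget`, `run_worker` and `make_walk` are hypothetical names.

```python
import threading


class RowBudget:
    """Hypothetical shared counter standing in for the global row total."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def try_claim(self):
        """Reserve one row; return False once the preset total is reached."""
        with self._lock:
            if self.count >= self.limit:
                return False
            self.count += 1
            return True


def run_worker(budget, make_walk, batch_size, database, db_lock):
    """Generate walks until the global budget is exhausted, writing them in batches."""
    batch = []
    while budget.try_claim():
        batch.append(make_walk())
        if len(batch) == batch_size:
            with db_lock:
                database.extend(batch)  # write a predetermined number of rows at once
            batch = []
    if batch:  # flush the final, possibly shorter batch
        with db_lock:
            database.extend(batch)
```

For example, three threads sharing `RowBudget(10)` with `batch_size=3` write exactly 10 rows in total, however the work happens to be split among them.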
In one embodiment, sequentially and randomly acquiring a plurality of node identifiers as the node sequence according to the predetermined path, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, includes receiving the accumulation matrix and the edge vector from the plurality of servers in the event that the first working machine loses its in-memory data.
Another aspect of the present specification provides a method of acquiring a node sequence of a relational network based on a distributed system, the distributed system including a plurality of servers and a plurality of working machines, and the relational network including a plurality of interconnected nodes, wherein each node has a node identifier, one of at least one type, out-degree values respectively corresponding to the respective types, and adjacent nodes. The method is performed in a first server among the plurality of servers and comprises:
receiving, from at least one of the working machines, a plurality of out-degree values corresponding to a designated plurality of second nodes among the plurality of nodes;
calculating at least some elements of an accumulation matrix using corresponding out-degree data received from other servers among the plurality of servers, and sending those elements of the accumulation matrix to each of the working machines, wherein the accumulation matrix shows, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of all nodes whose node identifiers precede the node's identifier;
receiving partial elements of an edge vector from at least one of the working machines, wherein the edge vector includes at least one part respectively corresponding to the at least one type, and each part holds, in sequence, the node identifiers of the adjacent nodes of the corresponding type of each of the plurality of nodes; and
sending the partial elements of the edge vector to each of the working machines.
In one embodiment, the method further comprises, after computing at least some elements of the accumulation matrix, sending those elements to at least one other server among the plurality of servers as a backup.
In one embodiment, the method further comprises, after obtaining the partial elements of the edge vector, sending them to at least one other server among the plurality of servers as a backup.
In one embodiment, the method further comprises, before sending the partial elements of the edge vector to each of the working machines, receiving the corresponding partial elements of the edge vector from all other servers among the plurality of servers to obtain the complete edge vector, wherein sending the partial elements of the edge vector to each of the working machines comprises sending the complete edge vector to each of the working machines.
In one embodiment, the method further comprises, after sending the partial elements of the edge vector to each of the working machines, sending at least some elements of the accumulation matrix and the partial elements of the edge vector to a working machine in response to a request from that working machine.
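For the embodiment in which a server gathers the partial elements of the edge vector from all other servers, a minimal sketch of the joining step follows. It assumes each server's partial elements arrive as a mapping from edge-vector position to node identifier and that the total length is known from the last column of the accumulation matrix; `assemble_edge_vector` is a hypothetical helper that also checks that every position is filled exactly once.

```python
def assemble_edge_vector(partials, length):
    """Join {position: node_id} maps from several servers into one edge vector."""
    vector = [None] * length
    for part in partials:
        for pos, node_id in part.items():
            if vector[pos] is not None:
                raise ValueError(f"position {pos} received twice")
            vector[pos] = node_id
    missing = [pos for pos, value in enumerate(vector) if value is None]
    if missing:
        raise ValueError(f"positions never received: {missing}")
    return vector
```

The consistency checks are a design choice of this sketch: a gap or overlap would indicate that the accumulation matrix and the adjacency data read by the working machines disagree.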
Another aspect of the present specification provides an apparatus for acquiring a node sequence of a relational network based on a distributed system, the distributed system including a plurality of servers and a plurality of working machines, and the relational network including a plurality of interconnected nodes, wherein each node has a node identifier, one of at least one type, out-degree values respectively corresponding to the respective types, and adjacent nodes. The apparatus is implemented in a first working machine among the plurality of working machines and includes:
a first obtaining unit configured to obtain the out-degree values of each of a plurality of first nodes among the plurality of nodes, and to send the obtained data to at least one first designated server among the plurality of servers;
a first receiving unit configured to receive an accumulation matrix from the plurality of servers, the accumulation matrix showing, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of all nodes whose node identifiers precede the node's identifier;
a second obtaining unit configured to obtain the node identifiers and types of the respective adjacent nodes of the at least one first node;
a calculation unit configured to calculate, based on the node identifiers and types of the respective adjacent nodes of the at least one first node and on the accumulation matrix, the positions of those adjacent nodes' identifiers in an edge vector, so as to obtain partial elements of the edge vector, and to send the partial elements of the edge vector to at least one second designated server among the plurality of servers, wherein the edge vector includes at least one part respectively corresponding to the at least one type, and each part holds, in sequence, the node identifiers of the adjacent nodes of the corresponding type of each of the plurality of nodes;
a second receiving unit configured to receive the edge vector from the plurality of servers; and
a third obtaining unit configured to sequentially and randomly obtain a plurality of node identifiers as the node sequence according to a predetermined path, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, wherein the predetermined path specifies the type of the node corresponding to each of the plurality of node identifiers.
In one embodiment, the third obtaining unit is further configured to randomly select, according to the predetermined path, one node identifier from the node identifiers of the nodes of a first predetermined type among the plurality of nodes as the first node identifier of the node sequence.
In one embodiment, the third obtaining unit includes:
a first acquisition subunit configured to acquire, according to the predetermined path and based on the accumulation matrix, the out-degree value of a second predetermined type of the node corresponding to the first node identifier;
a second acquisition subunit configured to randomly acquire a first integer based on the out-degree value; and
a calculation subunit configured to calculate a position in the edge vector based on the first integer, the second predetermined type, the first node identifier, the accumulation matrix and the edge vector, so as to acquire the node identifier at that position as a second node identifier.
In one embodiment, the third obtaining unit stops in either of the following cases:
the number of acquired node identifiers reaches a predetermined number;
the next node identifier cannot be found.
In one embodiment, the third obtaining unit runs in a loop, and the apparatus further comprises a writing unit configured to write the node sequences obtained through a predetermined number of iterations, i.e., a predetermined number of rows, into the database.
In one embodiment, the third obtaining unit stops looping when the total number of rows of node sequences acquired by all of the plurality of working machines reaches a preset value.
In one embodiment, the third obtaining unit includes a receiving subunit configured to receive the accumulation matrix and the edge vector from the plurality of servers in the event that the first working machine loses its in-memory data.
Another aspect of the present specification provides an apparatus for acquiring a node sequence of a relational network based on a distributed system, the distributed system including a plurality of servers and a plurality of working machines, and the relational network including a plurality of interconnected nodes, wherein each node has a node identifier, one of at least one type, out-degree values respectively corresponding to the respective types, and adjacent nodes. The apparatus is implemented in a first server among the plurality of servers and comprises:
a first receiving unit configured to receive, from at least one of the working machines, a plurality of out-degree values corresponding to a designated plurality of second nodes among the plurality of nodes;
a computing unit configured to compute at least some elements of an accumulation matrix using corresponding out-degree data received from other servers among the plurality of servers, and to send those elements of the accumulation matrix to each of the working machines, wherein the accumulation matrix shows, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of all nodes whose node identifiers precede the node's identifier;
a second receiving unit configured to receive partial elements of an edge vector from at least one of the working machines, wherein the edge vector includes at least one part respectively corresponding to the at least one type, and each part holds, in sequence, the node identifiers of the adjacent nodes of the corresponding type of each of the plurality of nodes; and
a first transmitting unit configured to transmit the partial elements of the edge vector to each of the working machines.
In one embodiment, the apparatus further comprises a second sending unit configured to send at least some elements of the accumulation matrix, after computing them, to at least one other server among the plurality of servers as a backup.
In one embodiment, the apparatus further includes a third sending unit configured to send the partial elements of the edge vector, after acquiring them, to at least one other server among the plurality of servers as a backup.
In one embodiment, the apparatus further comprises a third receiving unit configured to receive, before the partial elements of the edge vector are sent to each of the working machines, the corresponding partial elements of the edge vector from all other servers among the plurality of servers to obtain the complete edge vector, wherein the first transmitting unit is further configured to transmit the complete edge vector to each of the working machines.
In one embodiment, the apparatus further comprises a fourth transmitting unit configured to transmit, after the partial elements of the edge vector have been sent to each of the working machines, at least some elements of the accumulation matrix and the partial elements of the edge vector to a working machine in response to a request from that working machine.
Another aspect of the present disclosure provides a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements any of the methods described above.
With the solution for acquiring a node sequence of a relational network based on a distributed system according to the embodiments of the present specification, computing power grows almost in proportion to the number of machines performing parallel computation, and the parameter-server distributed architecture makes it possible to process graphs with a very large number of nodes. In addition, the solution optimizes the way the graph is stored, minimizing memory usage.
Drawings
The embodiments of the present specification can be understood more clearly from the following description with reference to the accompanying drawings:
FIG. 1 shows a schematic diagram of a distributed system 100 for acquiring a sequence of nodes of a relational network according to an embodiment of the present description;
FIG. 2 illustrates a method for obtaining a sequence of nodes of a relational network based on a distributed system, according to an embodiment of the present description;
FIG. 3 schematically illustrates a relational network and corresponding graph data;
FIG. 4 schematically shows the process of step S202;
FIG. 5 schematically shows the process of step S208;
FIG. 6 schematically shows the process of step S212;
FIG. 7 illustrates a method for obtaining a sequence of nodes of a relational network based on a distributed system, according to an embodiment of the present disclosure;
FIG. 8 illustrates an apparatus 800 for obtaining a sequence of nodes of a relational network based on a distributed system; and
Fig. 9 illustrates an apparatus 900 for acquiring a sequence of nodes of a relational network based on a distributed system according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present specification will be described below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a distributed system 100 for acquiring a node sequence of a relational network according to an embodiment of the present specification. As shown in FIG. 1, the system 100 includes a database 11, a plurality of working machines 12 (three are shown schematically), and a plurality of servers 13 (three are shown schematically). When the system 100 carries out the method according to the embodiments of the present specification, first, the working machines 12 each read, in parallel, part of the graph data from the database 11, namely the out-degree values for the different types of their share of the nodes in the relational network, and send this data to at least one designated server 13. Next, the servers 13 compute the accumulation matrix from the out-degree values they have received and send it to every working machine 12. Each working machine 12 then reads, again in parallel, the node identifiers and types of the adjacent nodes of its nodes from the database 11, calculates from this data and the accumulation matrix the positions of those adjacent nodes' identifiers in the edge vector to obtain partial elements of the edge vector, and sends the partial elements to the designated server 13. After obtaining their partial elements of the edge vector, the servers each send them to every working machine 12, so that each working machine 12 obtains all elements of the edge vector. Finally, each working machine 12 randomly acquires node sequences of the relational network according to a predetermined path, based on the accumulation matrix and the edge vector.
It is to be understood that the system 100 shown in FIG. 1 and the above description of it are merely illustrative, and the distributed system according to the embodiments of the present specification is not limited thereto. For example, the system need not include the database itself; the working machines may instead acquire data from a database over a network or the like. As another example, upon receiving their partial elements of the edge vector, the servers 13 may send them to a designated server, which joins the partial elements into the complete edge vector and sends it to each working machine 12.
FIG. 2 shows a method for acquiring a node sequence of a relational network based on a distributed system according to an embodiment of the present specification, the distributed system including a plurality of servers and a plurality of working machines, and the relational network including a plurality of interconnected nodes, wherein each node has a node identifier, one of at least one type, out-degree values respectively corresponding to the respective types, and adjacent nodes. The method is performed in a first working machine among the plurality of working machines and includes:
in step S202, the out-degree values of each of a plurality of first nodes among the plurality of nodes are acquired, and the acquired data is sent to at least one first designated server among the plurality of servers;
in step S204, an accumulation matrix is received from the plurality of servers, the accumulation matrix showing, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of all nodes whose node identifiers precede the node's identifier;
in step S206, the node identifier and type of each adjacent node of each of the at least one first node are acquired;
in step S208, based on the node identifiers and types of the respective adjacent nodes of the at least one first node and on the accumulation matrix, the positions of those adjacent nodes' identifiers in an edge vector are calculated to obtain partial elements of the edge vector, and the partial elements are sent to at least one second designated server among the plurality of servers, wherein the edge vector includes at least one part respectively corresponding to the at least one type, and each part holds, in sequence, the node identifiers of the adjacent nodes of the corresponding type of each of the plurality of nodes;
in step S210, the edge vector is received from the plurality of servers; and
in step S212, a plurality of node identifiers are sequentially and randomly acquired as the node sequence according to a predetermined path, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, wherein the predetermined path specifies the type of the node corresponding to each of the plurality of node identifiers.
The distributed system is a parameter server system, which mainly comprises two parts: a server cluster (a plurality of server nodes) and a working machine cluster (a plurality of working machine nodes). The server cluster mainly provides model parallelism: the full set of model parameters, such as the complete accumulation matrix and edge vector, is maintained in the memory of the server cluster. The working machine cluster reads different parts of the graph data from the database and performs computation in parallel; for example, the method shown in FIG. 2 is performed in parallel on each working machine.
The relational network may be any of various relational networks whose nodes are of multiple types, such as two types, three types, and so on. For example, the relational network may be a bipartite graph network including two types of nodes, namely user nodes and commodity nodes. FIG. 3 schematically shows a relational network and the corresponding graph data. The left side of FIG. 3 shows a relational network comprising six nodes with node identifiers (ids) 0, 1, …, 5. The number outside each circle is the node's type; as can be seen from the figure, each of the six nodes is of type 0 or type 1. The table (database) on the right side of FIG. 3 shows the graph data corresponding to the relational network on the left: from left to right, the first column is the node identifier, the second column is the node's type-0 out-degree value, the third column is its type-1 out-degree value, the fourth column lists its adjacent nodes, and the fifth column is its node type. The out-degree value of a given type for a given node is the number of that node's adjacent nodes of that type. For example, node 0 has adjacent nodes 1 and 5, where node 5 is of type 0 and node 1 is of type 1; node 0 therefore has a type-0 out-degree value of 1 and a type-1 out-degree value of 1.
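The out-degree columns of the graph-data table follow directly from the adjacency lists and node types. A minimal sketch, assuming adjacency lists indexed by node identifier; the helper name `typed_out_degrees` and the example graph in the usage note are hypothetical, not the data of FIG. 3.

```python
from collections import Counter


def typed_out_degrees(adjacency, node_types, num_types):
    """degrees[node][t] = number of adjacent nodes of type t (the type-t out-degree value)."""
    degrees = []
    for neighbours in adjacency:
        counts = Counter(node_types[n] for n in neighbours)
        degrees.append([counts[t] for t in range(num_types)])
    return degrees
```

For a hypothetical graph with types `[0, 1, 0, 1]` and adjacency `[[1, 2, 3], [0], [0, 3], [0, 2]]`, this yields `[[1, 2], [1, 0], [1, 1], [2, 0]]`: node 0, for instance, has one type-0 neighbour (node 2) and two type-1 neighbours (nodes 1 and 3).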
First, in step S202, the out-degree values of each of a plurality of first nodes are acquired, and the acquired data is sent to at least one first designated server among the plurality of servers.
Fig. 4 schematically shows the process in step S202. Fig. 4 shows work machines 0, 1 and 2 and servers 0, 1 and 2; these numbers of work machines and servers are merely illustrative, not restrictive. The first work machine may be any of work machines 0, 1 and 2, and work machine 0 is used as the example below. In Fig. 4, work machine 0 acquires, for example from the database shown in Fig. 3, the type-0 and type-1 out-degree values of node 0 and of node 1. Work machine 0 stores this out-degree information in a degree matrix; the degree matrix of each work machine is shown in the lower part of Fig. 4. In the degree matrix of work machine 0, row 0 corresponds to type 0 and row 1 to type 1; column 0 does not correspond to any node, column 1 corresponds to node 0, and column 2 corresponds to node 1. In general, column j of the degree matrix corresponds to node j-1, where rows i and columns j are both counted from 0. The element in row i, column j is the type-i out-degree value of node j-1. For example, the element in row 0, column 1 is the type-0 out-degree value of node 0, namely 1, and the element in row 0, column 2 is the type-0 out-degree value of node 1, namely 2. Both elements of column 0 (for type 0 and type 1) are set to 0. Each work machine acquires only part of the data in the database and therefore holds only some elements of the degree matrix; the elements held by all work machines together form the complete degree matrix, shown in the upper left of Fig. 4.
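As an illustration only, the following Python sketch shows how each work machine could fill its own columns of the degree matrix and how those columns combine into the complete matrix. Fig. 3 is not reproduced in this description, so the adjacency list `NEIGHBORS` is a hypothetical six-node graph. It was chosen to agree with the values quoted here: node 0 has neighbors 1 and 5, node 1 has neighbors 0 and 2, and nodes 0, 2, 3 and 5 are of type 0. The split of nodes over work machines and all function names are likewise assumptions.

```python
# Hypothetical graph (assumption): Fig. 3 is not reproduced in the text, so the
# edges beyond those quoted for nodes 0 and 1 are made up for illustration.
TYPES = [0, 1, 0, 0, 1, 0]                      # TYPES[v] is the type of node v
NEIGHBORS = {0: [1, 5], 1: [0, 2], 2: [1, 3, 4],
             3: [2, 4], 4: [2, 3, 5], 5: [0, 4]}
NUM_TYPES = 2


def partial_degree_columns(nodes):
    """Degree-matrix columns computed by one work machine for its own nodes.

    Returns {column: [type-0 out-degree, type-1 out-degree]}; column j holds
    node j - 1, so column 0 is never produced here and stays 0 after merging.
    """
    cols = {}
    for v in nodes:
        counts = [0] * NUM_TYPES
        for u in NEIGHBORS[v]:
            counts[TYPES[u]] += 1               # an edge has the type of its neighbor
        cols[v + 1] = counts
    return cols


def merge_degree_matrix(partials, num_nodes):
    """Combine all partial columns into the full matrix (rows = types)."""
    degree = [[0] * (num_nodes + 1) for _ in range(NUM_TYPES)]
    for part in partials:
        for col, counts in part.items():
            for t, c in enumerate(counts):
                degree[t][col] = c
    return degree


# Assumed split of nodes over work machines 0, 1 and 2.
partials = [partial_degree_columns(ns) for ns in ([0, 1], [2, 3], [4, 5])]
DEGREE = merge_degree_matrix(partials, num_nodes=6)
print(DEGREE)   # [[0, 1, 2, 1, 1, 3, 1], [0, 1, 0, 2, 1, 0, 1]]
```

Row 0, columns 1 and 2 reproduce the values 1 and 2 quoted above. The remaining entries follow only from the hypothetical graph.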
After obtaining its degree matrix, work machine 0 sends the data to the designated servers, here server 0 and server 1. In this distributed system, server 0 is designated to receive columns 0 and 1 of the degree matrix, and server 1 is designated to receive columns 2 and 3. Work machine 0 therefore sends columns 0 and 1 of its degree matrix to server 0 and column 2 to server 1. This designation of servers is illustrative rather than limiting; as another example, columns 5 and 6 of the degree matrix of work machine 2 are sent to one designated server, namely server 2.
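The routing rule described above amounts to grouping a work machine's columns by their designated server. In the sketch below, the table `COLUMN_OWNER` is an assumption. The text assigns columns 0 and 1 to server 0, columns 2 and 3 to server 1, and columns 5 and 6 to server 2, but it does not state who owns column 4, so assigning it to server 2 is a guess. The type-1 values for nodes 0 and 1 come from a hypothetical graph.

```python
# Assumed column ownership: the text fixes columns 0-1 -> server 0,
# 2-3 -> server 1 and 5-6 -> server 2; column 4 -> server 2 is a guess.
COLUMN_OWNER = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}


def route_columns(partial_columns, owner=COLUMN_OWNER):
    """Group one work machine's degree-matrix columns by designated server.

    partial_columns maps column index -> [out-degree per type];
    the result maps server id -> {column index: [out-degree per type]}.
    """
    outbox = {}
    for col, values in sorted(partial_columns.items()):
        outbox.setdefault(owner[col], {})[col] = values
    return outbox


# Work machine 0: reserved column 0 plus the columns of nodes 0 and 1
# (the type-1 entries come from a hypothetical graph).
machine0 = {0: [0, 0], 1: [1, 1], 2: [2, 0]}
print(route_columns(machine0))   # {0: {0: [0, 0], 1: [1, 1]}, 1: {2: [2, 0]}}
```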
In step S204, an accumulation matrix is received from the plurality of servers. For each of the plurality of nodes and each of the at least one type, the accumulation matrix gives the sum of the out-degree values, for that type, of all nodes whose node identifiers precede that node.
After receiving the partial degree-matrix elements sent by the work machines, the plurality of servers obtain the complete degree matrix and compute the accumulation matrix from it; the computation is described in detail below. The upper right table in Fig. 4 schematically shows the accumulation matrix. The element in row i, column j of the accumulation matrix equals the sum of columns 0 through j of row i of the degree matrix. Because column 0 of the degree matrix is 0, this is the total type-i out-degree of nodes 0 to j-1. For example, the element in row 0, column 2 of the accumulation matrix (i = 0, j = 2, both counted from 0) equals the sum of columns 0, 1 and 2 of row 0 of the degree matrix, that is, 0 + 1 + 2 = 3. The accumulation matrix gives the position, in the edge vector described below, of each node's neighboring nodes of each type. For example, the type-0 neighboring nodes of node 2 start at the position given by row 0, column 2 of the accumulation matrix, namely 3. The accumulation matrix may therefore also be called an index matrix, because it indexes the positions of each node's neighboring nodes in the edge vector. The last column of the accumulation matrix gives the number of edges of each type in the relational network: the values 9 and 5 in the last column mean that the network has 9 edges of type 0 and 5 edges of type 1. Here, a given node and any one of its neighboring nodes form a directed edge, and the type of that directed edge is the type of the neighboring node.
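The accumulation matrix is a per-type running sum of the degree matrix. It plays a role similar to the row-pointer array of a compressed sparse row (CSR) layout. The sketch below computes it with `itertools.accumulate`. The degree matrix belongs to a hypothetical six-node graph chosen to agree with the values quoted in this description, such as idx[0][2] = 3 and the totals 9 and 5; it is not a copy of Fig. 4.

```python
from itertools import accumulate

# Degree matrix of a hypothetical 6-node graph: rows are types, column j is
# node j - 1, column 0 is the reserved all-zero column.
DEGREE = [
    [0, 1, 2, 1, 1, 3, 1],   # type 0
    [0, 1, 0, 2, 1, 0, 1],   # type 1
]


def accumulation_matrix(degree):
    """idx[i][j] = degree[i][0] + ... + degree[i][j], i.e. the number of
    type-i edges belonging to nodes 0 .. j-1."""
    return [list(accumulate(row)) for row in degree]


IDX = accumulation_matrix(DEGREE)
edges_per_type = [row[-1] for row in IDX]   # last column: edge count per type
print(IDX)             # [[0, 1, 3, 4, 5, 8, 9], [0, 1, 1, 3, 4, 4, 5]]
print(edges_per_type)  # [9, 5]
```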
After obtaining the complete accumulation matrix, the plurality of servers send it to each work machine, such as work machines 0, 1 and 2, so that work machine 0 receives the accumulation matrix from the plurality of servers.
In step S206, node identifiers and types of respective neighboring nodes of the at least one first node are obtained.
For example, work machine 0 learns from the database shown in Fig. 3 that the neighboring nodes of node 0 are node 1 and node 5, and that node 1 is of type 1 and node 5 is of type 0. Similarly, work machine 0 obtains the node identifiers and types of the neighboring nodes of node 1 from the database.
In step S208, based on the node identifiers and types of the respective neighboring nodes of the at least one first node and the accumulation matrix, an arrangement position of the node identifiers of the respective neighboring nodes of the at least one first node in an edge vector is calculated to obtain a partial element of the edge vector, and the partial element of the edge vector is sent to at least one second designated server in the plurality of servers, where the edge vector includes at least one portion corresponding to the at least one type, and node identifiers of the respective neighboring nodes of the respective corresponding types of the plurality of nodes are sequentially arranged in each portion.
Fig. 5 schematically shows the process in step S208. The edge vector contains the information of all edges of the relational network. As described above, the last column of the accumulation matrix in Fig. 4 gives the number of edges of each type, 9 of type 0 and 5 of type 1, so the edge vector contains 14 elements. Fig. 5 shows the edge vector, whose columns are counted from 0, i.e. columns 0, 1, …, 13. Columns 0 to 8 hold the node identifiers of the type-0 neighboring nodes of nodes 0 to 5, arranged in node order. Columns 9 to 13 hold the node identifiers of the type-1 neighboring nodes of nodes 0 to 5, also arranged in node order. In other words, the edge vector divides the edge information into a type-0 part and a type-1 part, and within each part every edge is represented by the node identifier of the neighboring node of the corresponding type.
This structure makes it easy to locate the neighboring nodes of any node in the edge vector. As described above, the accumulation matrix gives the positions (columns) in the edge vector of each node's neighboring nodes of each type. Specifically, if node j has several neighboring nodes of type i, the column number j' in the edge vector of the node identifier of each of these neighboring nodes is given by equation (1):
j' = s(i) + idx[i][j] + m    (1)
where s(i) is the starting column, in the edge vector, of the node identifiers of type-i neighboring nodes. In the edge vector shown in Fig. 5, s(0), the starting position of the type-0 part, is 0, and s(1), the starting position of the type-1 part, is 9. idx[i][j] is the element in row i, column j of the accumulation matrix, with i and j counted from 0. m is the index, counted from 0, of the neighboring node among node j's type-i neighboring nodes sorted by node identifier in ascending order.
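Equation (1) can be sketched as a pair of small Python helpers. `type_offsets` derives s(i) from the last column of the accumulation matrix. This is an inference from the definitions above: s(1) = 9 equals the number of type-0 edges. The matrix `IDX` belongs to a hypothetical graph consistent with the quoted values, and the function names are illustrative.

```python
from itertools import accumulate

# Accumulation matrix of a hypothetical graph (entries quoted in the text:
# idx[0][0] = 0, idx[0][1] = 1, idx[0][2] = 3, idx[1][0] = 0, totals 9 and 5).
IDX = [
    [0, 1, 3, 4, 5, 8, 9],   # type 0
    [0, 1, 1, 3, 4, 4, 5],   # type 1
]


def type_offsets(idx):
    """s(i): first edge-vector column of the type-i part.

    s(0) = 0 and s(i) = number of edges of types 0 .. i-1 (last column of idx).
    """
    totals = [row[-1] for row in idx]
    return [0] + list(accumulate(totals))[:-1]


def edge_position(idx, s, i, j, m):
    """Equation (1): column of the m-th (0-based, ascending id) type-i neighbor of node j."""
    return s[i] + idx[i][j] + m


S = type_offsets(IDX)
print(S)                                   # [0, 9]
print(edge_position(IDX, S, 1, 0, 0))      # 9: node 1 as neighbor of node 0
print(edge_position(IDX, S, 0, 1, 1))      # 2: node 2 as neighbor of node 1
```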
Taking work machine 0 as an example, it needs to find the positions (columns) in the edge vector of the neighboring nodes of node 0, namely node 1 (type 1) and node 5 (type 0), and of the neighboring nodes of node 1, namely node 0 (type 0) and node 2 (type 0). For neighboring node 1 of node 0, which is of type 1, i = 1 and j = 0, so by equation (1)
j' = s(1) + idx[1][0] + 0 = 9 + 0 + 0 = 9,
and column 9 of the edge vector holds the node identifier of node 1, i.e. 1.
For neighboring node 5 of node 0, which is of type 0, i = 0 and j = 0, so by equation (1)
j' = s(0) + idx[0][0] + 0 = 0 + 0 + 0 = 0,
and column 0 of the edge vector holds the node identifier of node 5, i.e. 5.
For neighboring node 0 of node 1, which is of type 0, i = 0 and j = 1, so by equation (1)
j' = s(0) + idx[0][1] + 0 = 0 + 1 + 0 = 1,
and column 1 of the edge vector holds the node identifier of node 0, i.e. 0.
Neighboring node 2 of node 1 is also of type 0. Sorted by node identifier, node 1's type-0 neighboring nodes are node 0 and node 2 (0 < 2). Node 0 therefore takes m = 0 and occupies the starting position, column 1, while node 2 takes m = 1 and occupies the next position, column 2:
j' = s(0) + idx[0][1] + 1 = 0 + 1 + 1 = 2,
so column 2 of the edge vector holds the node identifier of node 2, i.e. 2.
Each work machine calculates the positions in the edge vector of the node identifiers of the neighboring nodes of its first nodes, thereby obtaining partial elements of the edge vector, and then sends these elements to at least one second designated server among the plurality of servers. For example, as shown in Fig. 5, the system designates server 0 to receive columns 0 to 4 of the edge vector and server 2 to receive columns 9 to 13. Work machine 0 therefore sends the elements it obtained for columns 0 to 2 to server 0 and the element for column 9 to server 2.
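The following sketch shows how one work machine might fill its share of the edge vector with equation (1) and then group the results by destination server. Several inputs are assumptions: the graph, every accumulation-matrix entry not quoted in the text, and the assignment of columns 5 to 8 to server 1. The helper names are hypothetical.

```python
# Hypothetical graph and accumulation matrix (only partly quoted in the text).
TYPES = [0, 1, 0, 0, 1, 0]
NEIGHBORS = {0: [1, 5], 1: [0, 2], 2: [1, 3, 4],
             3: [2, 4], 4: [2, 3, 5], 5: [0, 4]}
IDX = [[0, 1, 3, 4, 5, 8, 9], [0, 1, 1, 3, 4, 4, 5]]
S = [0, 9]                       # start column of each type's part


def partial_edge_vector(nodes):
    """Map edge-vector column -> neighbor id for the nodes of one work machine."""
    part = {}
    for j in nodes:
        seen = [0] * len(S)                      # per-type counter m for node j
        for u in sorted(NEIGHBORS[j]):           # ascending ids, as m requires
            i = TYPES[u]
            part[S[i] + IDX[i][j] + seen[i]] = u  # equation (1)
            seen[i] += 1
    return part


def owner(col):
    """Assumed designation: 0-4 -> server 0, 5-8 -> server 1, 9-13 -> server 2."""
    return 0 if col <= 4 else (1 if col <= 8 else 2)


part0 = partial_edge_vector([0, 1])             # work machine 0 holds nodes 0 and 1
outbox = {}
for col, nid in sorted(part0.items()):
    outbox.setdefault(owner(col), {})[col] = nid
print(outbox)   # {0: {0: 5, 1: 0, 2: 2}, 2: {9: 1}}
```

The columns and values for work machine 0 match the worked example above: column 0 holds 5, column 1 holds 0, column 2 holds 2, and column 9 holds 1.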
In step S210, the edge vector is received from the plurality of servers. After receiving the partial elements of the edge vector from the work machines, the servers obtain the complete edge vector and send it to every work machine, so that each work machine obtains the complete edge vector from the plurality of servers.
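On the receiving side, assembling the complete edge vector amounts to placing each reported (column, node identifier) pair and checking that no column is missing. The description does not specify the consistency checks below; they are an assumed safeguard. The partial pieces come from a hypothetical graph.

```python
def assemble_edge_vector(pieces, length):
    """Merge {column: node id} pieces from the work machines into one list.

    The checks for missing or conflicting columns are an assumed safeguard.
    """
    vec = [None] * length            # None marks "not yet received" (ids may be 0)
    for piece in pieces:
        for col, nid in piece.items():
            if vec[col] is not None and vec[col] != nid:
                raise ValueError(f"conflicting values for column {col}")
            vec[col] = nid
    missing = [c for c, v in enumerate(vec) if v is None]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return vec


# Pieces from work machines 0, 1 and 2 for a hypothetical graph.
pieces = [
    {0: 5, 1: 0, 2: 2, 9: 1},
    {3: 3, 4: 2, 10: 1, 11: 4, 12: 4},
    {5: 2, 6: 3, 7: 5, 8: 0, 13: 4},
]
print(assemble_edge_vector(pieces, 14))
# [5, 0, 2, 3, 2, 2, 3, 5, 0, 1, 1, 4, 4, 4]
```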
In step S212, a plurality of node identifiers are acquired one after another, each chosen at random, according to a predetermined path to form the node sequence. The selection is based on the types of the plurality of nodes, the accumulation matrix and the edge vector, and the predetermined path defines the type of the node corresponding to each of these node identifiers.
Fig. 6 schematically shows the process in step S212. The predetermined path, i.e. the meta path, defines the type of each node in the node sequence. In the example of Fig. 6, the meta path is 0, 1, 0, that is, the node sequence starts with a node of type 0 and then alternates between type 1 and type 0. The lower part of Fig. 6 schematically shows several rows of node sequences, and the node sequence in the fourth row, (2, 1, 0, 1 … 5, 4), is used below as the example node sequence. In it, node 2, corresponding to the first node identifier 2, is of type 0; node 1, corresponding to the second node identifier 1, is of type 1; node 0, corresponding to the third node identifier 0, is of type 0; and the remaining node identifiers continue to alternate by the same rule.
To acquire the first node identifier of the node sequence, all nodes of type 0 in the relational network are collected, and one of their node identifiers is chosen at random as the first node identifier. Referring to Fig. 3, node identifier 2 is randomly chosen from the node identifiers 0, 2, 3 and 5 of nodes 0, 2, 3 and 5.
To acquire the second node identifier of the node sequence, the predetermined path first determines a second predetermined type. The out-degree value for that type of the node corresponding to the first node identifier is then obtained from the accumulation matrix using equation (2):
dgr[i][j]=idx[i][j+1]-idx[i][j] (2)
where dgr[i][j] is the type-i out-degree value of node j, and idx[i][j] is as defined for equation (1). For example, to obtain the second node identifier of the example node sequence, the predetermined path requires the type-1 out-degree value of node 2. By equation (2) and the accumulation matrix, this is dgr[1][2] = idx[1][3] - idx[1][2] = 3 - 1 = 2.
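Equation (2) recovers any out-degree value from two adjacent accumulation-matrix entries, so out-degrees can be read from the accumulation matrix alone. Below is a minimal sketch that uses the accumulation matrix of a hypothetical graph matching the quoted entries idx[1][2] = 1 and idx[1][3] = 3.

```python
# Accumulation matrix of a hypothetical graph; idx[1][2] = 1 and idx[1][3] = 3
# match the example in the text.
IDX = [
    [0, 1, 3, 4, 5, 8, 9],   # type 0
    [0, 1, 1, 3, 4, 4, 5],   # type 1
]


def out_degree(idx, i, j):
    """Equation (2): dgr[i][j], the number of type-i neighbors of node j."""
    return idx[i][j + 1] - idx[i][j]


print(out_degree(IDX, 1, 2))                      # 2
print([out_degree(IDX, 0, j) for j in range(6)])  # [1, 2, 1, 1, 3, 1]
```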
Then, a first integer is chosen at random based on the out-degree value. If the type-i out-degree value of the node identified by the first node identifier is denoted D0, an integer k is drawn at random from [0, D0-1] as the first integer. For the example node sequence, D0 = 2, so k is drawn at random from [0, 1]; suppose k = 0 is drawn.
Finally, the position in the edge vector is calculated from the first integer, the second predetermined type, the first node identifier, the accumulation matrix and the edge vector, and the node identifier at that position is taken as the second node identifier. Specifically, equation (1) is evaluated with the second predetermined type as i, the first node identifier as j and k as m, which gives j'. The node identifier in column j' of the edge vector is the second node identifier of the node sequence. For the example node sequence, i = 1, j = 2 and m = 0, so equation (1) gives j' = 9 + 1 + 0 = 10. Column 10 of the edge vector holds 1, so the second node identifier of the example node sequence is 1.
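One step of the walk combines equations (2) and (1): read D0, draw k from [0, D0-1], then index the edge vector. The sketch below assumes uniform sampling via Python's `random.randint`, which includes both bounds; the description only says the integer is chosen at random. The edge vector belongs to a hypothetical graph whose column 10 holds 1, as in the example. The function and stub-class names are illustrative.

```python
import random

# Hypothetical graph data consistent with the worked example.
IDX = [[0, 1, 3, 4, 5, 8, 9], [0, 1, 1, 3, 4, 4, 5]]
EDGES = [5, 0, 2, 3, 2, 2, 3, 5, 0, 1, 1, 4, 4, 4]
S = [0, 9]                                   # start column of each type's part


def next_node(j, i, rng=random):
    """Pick a random type-i neighbor of node j, or None if node j has none."""
    d0 = IDX[i][j + 1] - IDX[i][j]           # equation (2)
    if d0 == 0:
        return None                          # the next node cannot be found
    k = rng.randint(0, d0 - 1)               # first integer, uniform over [0, D0-1]
    return EDGES[S[i] + IDX[i][j] + k]       # equation (1) with m = k


class FirstIndex:
    """Deterministic stand-in for an RNG that always returns k = 0."""

    def randint(self, a, b):
        return a


print(next_node(2, 1, FirstIndex()))   # 1, as in the example (k = 0, j' = 10)
```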
The third node identifier of the node sequence is obtained in the same way as the second, and the details are not repeated here.
In one embodiment, sequentially and randomly acquiring the plurality of node identifiers according to the predetermined path ends in either of the following cases: the number of acquired node identifiers reaches a predetermined number, or the next node identifier cannot be found. In the lower part of Fig. 6, the node identifier count (walk length) is the preset number of node identifiers in each row of node sequences. Acquisition stops once the number of generated node identifiers equals this count, or once the next node identifier cannot be found by the method above.
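Putting the pieces together, a single node sequence could be generated as below. In this sketch, the meta path is represented as a repeating list of types, for example `[0, 1]` for the alternating path of Fig. 6. That representation, the function name and the graph data are all assumptions for illustration. The loop ends under the two conditions just listed.

```python
import random

# Hypothetical graph data (types, accumulation matrix, edge vector).
TYPES = [0, 1, 0, 0, 1, 0]
IDX = [[0, 1, 3, 4, 5, 8, 9], [0, 1, 1, 3, 4, 4, 5]]
EDGES = [5, 0, 2, 3, 2, 2, 3, 5, 0, 1, 1, 4, 4, 4]
S = [0, 9]


def meta_path_walk(meta_path, walk_length, rng=random):
    """One node sequence whose k-th node has type meta_path[k % len(meta_path)].

    Stops after walk_length identifiers or when no next node can be found.
    """
    # First node: uniform choice among nodes of the first predetermined type.
    candidates = [v for v, t in enumerate(TYPES) if t == meta_path[0]]
    if not candidates:
        return []
    seq = [rng.choice(candidates)]
    while len(seq) < walk_length:
        i = meta_path[len(seq) % len(meta_path)]   # required type of next node
        j = seq[-1]
        d0 = IDX[i][j + 1] - IDX[i][j]             # equation (2)
        if d0 == 0:
            break                                  # next node cannot be found
        k = rng.randint(0, d0 - 1)
        seq.append(EDGES[S[i] + IDX[i][j] + k])    # equation (1)
    return seq


print(meta_path_walk([0, 1], 8, random.Random(0)))
```

In this hypothetical graph, every type-0 node has a type-1 neighbor and vice versa, so the alternating walk always reaches its full length. A path such as `[1, 1]` stops after one node because no type-1 node has a type-1 neighbor.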
In one embodiment, the step of sequentially and randomly acquiring the plurality of node identifiers according to the predetermined path is repeated in a loop a predetermined number of times, and the method further comprises writing the predetermined number of rows of node sequences obtained by these loops into the database. For example, in the lower part of Fig. 6, the number of rows is the preset number of rows to be written into the database. After the step has been looped that many times to obtain that many rows of node sequences, the rows may be written into the database.
In one embodiment, the loop of sequentially and randomly acquiring the plurality of node identifiers according to the predetermined path stops when the total number of rows of node sequences acquired by all of the work machines reaches a preset value.
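The row loop and the system-wide stopping rule might be organized as follows. The description does not specify how the rows produced by all work machines are totalled; the shared `RowBudget` counter stands in for that mechanism. The list `database` stands in for the database write. Both are hypothetical, and threads merely simulate the work machines.

```python
import threading


class RowBudget:
    """Hypothetical shared counter for the rows produced by all work machines."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self._lock = threading.Lock()

    def take(self):
        """Reserve one row; return False once the preset total is reached."""
        with self._lock:
            if self.used >= self.limit:
                return False
            self.used += 1
            return True


def worker(walk, rows_per_write, budget, database):
    """Loop the walk until the shared budget is used up, writing rows to
    `database` (a list standing in for the database) in batches."""
    batch = []
    while budget.take():
        batch.append(walk())
        if len(batch) == rows_per_write:
            database.extend(batch)
            batch = []
    if batch:
        database.extend(batch)       # flush the final partial batch
    return database


# Three simulated work machines sharing a budget of 10 rows.
budget, db = RowBudget(10), []
machines = [threading.Thread(target=worker, args=(lambda: [2, 1, 0, 1], 3, budget, db))
            for _ in range(3)]
for m in machines:
    m.start()
for m in machines:
    m.join()
print(len(db))   # 10
```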
Fig. 7 illustrates, according to an embodiment of the present specification, a method for acquiring a node sequence of a relational network based on a distributed system that includes a plurality of servers and a plurality of work machines. The relational network includes a plurality of interconnected nodes, each of which has a node identifier, one of at least one type, a plurality of out-degree values corresponding respectively to the types, and neighboring nodes. The method is performed in a first server of the plurality of servers and includes:
in step S702, a plurality of out-degree values corresponding to a designated plurality of second nodes among the plurality of nodes are received from at least one of the work machines;
in step S704, at least part of the elements of an accumulation matrix are calculated using corresponding out-degree data received from other servers of the plurality of servers, and are sent to each of the work machines, wherein, for each of the plurality of nodes and each of the at least one type, the accumulation matrix gives the sum of the out-degree values, for that type, of all nodes whose node identifiers precede that node;
in step S706, partial elements of an edge vector are received from at least one of the work machines, wherein the edge vector includes at least one portion corresponding respectively to the at least one type, and each portion holds, in sequence, the node identifiers of the neighboring nodes of the corresponding type of the plurality of nodes; and
in step S708, a part of the elements of the edge vector is transmitted to each of the working machines.
The method of Fig. 7 is the server-side counterpart of the work-machine operations of Figs. 4 to 6, and can therefore also be described with reference to Figs. 4 to 6.
First, in step S702, a plurality of out-degree values corresponding to a designated plurality of second nodes among the plurality of nodes are received from at least one of the work machines. This step can be described with reference to Fig. 4. Taking server 1 as an example, it receives column 2 of the degree matrix (node 1) from work machine 0 and column 3 (node 2) from work machine 1. These columns hold the type-0 and type-1 out-degree values of node 1 and node 2, respectively. That is, in the figure, the distributed system designates server 1 to receive the out-degree values of nodes 1 and 2, so nodes 1 and 2 are the plurality of second nodes.
In step S704, at least part of the elements of the accumulation matrix are calculated using corresponding out-degree data received from other servers of the plurality of servers, and are sent to each of the work machines. For each of the plurality of nodes and each of the at least one type, the accumulation matrix gives the sum of the out-degree values, for that type, of all nodes whose node identifiers precede that node.
As described above, the element in row i, column j of the accumulation matrix equals the sum of columns 0 through j of row i of the degree matrix. In one embodiment, each server calculates only the accumulation-matrix columns assigned to it, obtaining the degree-matrix elements it needs from the corresponding other servers. For example, server 1 is assigned columns 2 and 3. To compute column 2 of the accumulation matrix, it receives column 1 of the degree matrix from server 0 and adds columns 1 and 2 of the degree matrix for each type, since column 0 is always 0. Column 3 then follows by adding column 3 of the degree matrix.
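The per-server computation in this embodiment can be sketched in two stages. First, the server sums the degree columns that precede its own block, which is the data it would request from other servers. Then it keeps a running sum over its own columns. The column layout in `SERVER_COLUMNS` is an assumption, and the column values come from a hypothetical graph consistent with the quoted numbers.

```python
# Degree-matrix columns held by each server: {column: [type-0, type-1]}.
# The layout and the values come from a hypothetical graph (assumption).
SERVER_COLUMNS = {
    0: {0: [0, 0], 1: [1, 1]},
    1: {2: [2, 0], 3: [1, 2]},
    2: {4: [1, 1], 5: [3, 0], 6: [1, 1]},
}
NUM_TYPES = 2


def local_accumulation(server_id):
    """Accumulation-matrix columns owned by server_id, {column: [per type]}.

    Assumes each server owns a contiguous block of columns.
    """
    own = SERVER_COLUMNS[server_id]
    first = min(own)
    # Per-type sum of every degree column before this server's block,
    # i.e. the values it would request from the other servers.
    prefix = [0] * NUM_TYPES
    for sid, cols in SERVER_COLUMNS.items():
        if sid == server_id:
            continue
        for col, vals in cols.items():
            if col < first:
                for t in range(NUM_TYPES):
                    prefix[t] += vals[t]
    # Running sum over the server's own columns.
    result = {}
    for col in sorted(own):
        for t in range(NUM_TYPES):
            prefix[t] += own[col][t]
        result[col] = list(prefix)
    return result


print(local_accumulation(1))   # {2: [3, 1], 3: [4, 3]}
```

For server 1, the result [3, 1] for column 2 matches idx[0][2] = 3 and idx[1][2] = 1 from the worked examples.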
In one embodiment, the system designates one server (e.g., a first server) of the plurality of servers to obtain a complete degree matrix by obtaining its respective degree matrix element values from all other servers, and calculates an accumulation matrix based on the complete degree matrix.
In one embodiment, after computing at least a portion of the elements of the accumulation matrix, the at least a portion of the elements of the accumulation matrix are transmitted to at least one other server of the plurality of servers to backup the at least a portion of the elements of the accumulation matrix.
In one embodiment, each server obtains only a portion of the elements of the accumulation matrix and each server sends its respective portion of the elements to each work machine, thereby allowing each work machine to obtain the complete accumulation matrix. In one embodiment, a designated server (e.g., a first server) obtains a complete accumulation matrix and sends the accumulation matrix to each work machine such that each work machine obtains a complete accumulation matrix.
In step S706, partial elements of the edge vector are received from at least one of the work machines. The edge vector includes at least one portion corresponding respectively to the at least one type, and each portion holds, in sequence, the node identifiers of the neighboring nodes of the corresponding type of the plurality of nodes. This step can be described with reference to Fig. 5. For example, the system designates server 0 to receive columns 0 to 4 of the edge vector, so server 0 receives the elements of columns 0 to 2 from work machine 0 and those of columns 3 and 4 from work machine 1. For details of the edge vector, see the description of step S208 above.
In one embodiment, after the partial elements of the edge vector are obtained, the partial elements of the edge vector are sent to at least one other server of the plurality of servers to backup the partial elements of the edge vector.
In step S708, a part of the elements of the edge vector is transmitted to each of the working machines.
In one embodiment, each of the plurality of servers sends a portion of the elements of its respective edge vector to each work machine, such that each work machine obtains the complete edge vector.
In one embodiment, a server (e.g., a first server) of a plurality of servers is designated to receive a portion of the elements of the respective edge vector from all other servers of the plurality of servers to obtain the edge vector and to send the edge vector to each of the work machines before sending the portion of the elements of the edge vector to each of the work machines. In one embodiment, the first server, after obtaining the complete edge vector, sends the edge vector to at least one other server to backup the edge vector.
In one embodiment, after the partial elements of the edge vector have been sent to each work machine, at least part of the elements of the accumulation matrix and the partial elements of the edge vector are sent to a work machine in response to a request from that work machine. This embodiment suits particular scenarios. For example, work machine 1 may crash and restart unexpectedly, losing all data in its memory. By sending data requests to the servers, it re-acquires the accumulation matrix and the edge vector from the plurality of servers and can then continue generating node sequences.
Fig. 8 illustrates an apparatus 800 for obtaining a sequence of nodes of a relational network based on a distributed system comprising a plurality of servers and a plurality of work machines, the relational network comprising a plurality of nodes connected to each other, wherein each node has a node identification, one of at least one type, a plurality of out-degree values corresponding to respective types, and neighboring nodes, the apparatus being implemented in a first work machine of the plurality of work machines, comprising:
a first obtaining unit 81 configured to obtain the out-degree values of each of a plurality of first nodes and send the obtained data to at least one first designated server of the plurality of servers;
a first receiving unit 82 configured to receive an accumulation matrix from the plurality of servers, wherein, for each of the plurality of nodes and each of the at least one type, the accumulation matrix gives the sum of the out-degree values, for that type, of all nodes whose node identifiers precede that node;
a second obtaining unit 83 configured to obtain node identifiers and types of respective neighboring nodes of the at least one first node;
A calculating unit 84 configured to calculate, based on node identifiers and types of respective adjacent nodes of the at least one first node and the accumulation matrix, arrangement positions of the node identifiers of respective adjacent nodes of the at least one first node in an edge vector, to obtain partial elements of the edge vector, and to transmit the partial elements of the edge vector to at least one second designated server of the plurality of servers, wherein the edge vector includes at least one portion corresponding to the at least one type, respectively, in each of which node identifiers of respective adjacent nodes of the respective corresponding types of the plurality of nodes are sequentially arranged;
a second receiving unit 85 configured to receive the edge vectors from the plurality of servers; and
a third obtaining unit 86, configured to sequentially and randomly obtain a plurality of node identifiers as the node sequence according to a predetermined path, based on respective types of the plurality of nodes, the accumulation matrix and the edge vector, respectively, where the predetermined path defines types of respective nodes to which the plurality of node identifiers respectively correspond.
In one embodiment, the third obtaining unit 86 is further configured to randomly select, according to the predetermined path, one node identifier from the node identifiers of the nodes of a first predetermined type among the plurality of nodes, as the first node identifier of the node sequence.
In one embodiment, the third obtaining unit 86 includes:
a first obtaining subunit 861 configured to obtain, according to the predetermined path and based on the accumulation matrix, the out-degree value for a second predetermined type of the node corresponding to the first node identifier;
a second obtaining subunit 862 configured to randomly obtain a first integer based on the out-degree value; and
a calculating subunit 863 configured to calculate, based on the first integer, the second predetermined type, the first node identifier, the accumulation matrix, and the edge vector, an arrangement position in the edge vector, thereby obtaining a node identifier corresponding to the arrangement position as a second node identifier.
In one embodiment, the third obtaining unit 86 stops in either of the following cases:
the number of the plurality of node identifiers reaches a predetermined number;
the next node identity cannot be found.
In one embodiment, the third obtaining unit 86 is executed in a predetermined number of loops, and the apparatus further comprises a writing unit 87 configured to write the predetermined number of rows of node sequences obtained by these loops to the database.
In one embodiment, the third obtaining unit 86 stops looping when the total number of rows of node sequences acquired by all of the work machines reaches a preset value.
In one embodiment, the third obtaining unit includes a receiving subunit 864 configured to receive the accumulation matrix and the edge vector from the plurality of servers in a case where the first working machine loses memory data.
Fig. 9 illustrates, according to an embodiment of the present specification, an apparatus 900 for acquiring a node sequence of a relational network based on a distributed system that includes a plurality of servers and a plurality of work machines. The relational network includes a plurality of interconnected nodes, each of which has a node identifier, one of at least one type, a plurality of out-degree values corresponding respectively to the types, and neighboring nodes. The apparatus is implemented in a first server of the plurality of servers and comprises:
a first receiving unit 91 configured to receive, from at least one of the work machines, a plurality of out-degree values corresponding respectively to a designated plurality of second nodes among the plurality of nodes;
a calculating unit 92 configured to calculate at least part of the elements of an accumulation matrix using corresponding out-degree data received from other servers of the plurality of servers, and to send at least part of the elements of the accumulation matrix to each of the work machines, wherein, for each of the plurality of nodes and each of the at least one type, the accumulation matrix gives the sum of the out-degree values, for that type, of all nodes whose node identifiers precede that node;
A second receiving unit 93 configured to receive partial elements of an edge vector from at least one of the working machines, wherein the edge vector includes at least one portion respectively corresponding to the at least one type, and node identifications of respective corresponding types of neighboring nodes of the plurality of nodes are sequentially arranged in each of the portions; and
a first transmitting unit 94 configured to transmit a part of the elements of the edge vector to each of the working machines.
In an embodiment, the apparatus further comprises a second sending unit 95 configured to send at least part of the elements of the accumulation matrix to at least one other server of the plurality of servers for backing up at least part of the elements of the accumulation matrix after calculating the at least part of the elements of the accumulation matrix.
In one embodiment, the apparatus further includes a third sending unit 96 configured to send the partial element of the edge vector to at least one other server of the plurality of servers to backup the partial element of the edge vector after the partial element of the edge vector is acquired.
In one embodiment, the apparatus further comprises a third receiving unit 97 configured to receive, before the partial elements of the edge vector are sent to each of the work machines, the corresponding partial elements of the edge vector from all other servers of the plurality of servers to obtain the complete edge vector, wherein the first transmitting unit 94 is further configured to send the edge vector to each of the work machines.
In an embodiment, the apparatus further comprises a fourth sending unit 98 configured to send at least part of the elements of the accumulation matrix and part of the elements of the edge vector to the respective working machine in response to a request of the working machine after sending part of the elements of the edge vector to each of the working machines.
Another aspect of the present disclosure provides a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements any of the methods described above.
With the solution according to the embodiments of the present specification for acquiring the node sequence of a relational network based on a distributed system, computing capacity increases nearly in proportion to the number of machines computing in parallel, and the parameter-server distributed architecture allows very large-scale graphs to be processed. In addition, the scheme optimizes how the graph is stored, minimizing memory usage.
Each embodiment in this specification is described progressively. Identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above illustrate the general principles of the invention and are not intended to limit its scope. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to be included within its scope of protection.

Claims (31)

1. A method of obtaining a sequence of nodes of a relational network based on a distributed system, the distributed system comprising a plurality of servers and a plurality of work machines, the relational network comprising a plurality of nodes connected to each other, wherein each node has a node identification, one of at least one type, a plurality of out-degree values respectively corresponding to respective types, and neighboring nodes, the method being performed in a first work machine of the plurality of work machines, comprising:
acquiring the plurality of out-degree values of each of a plurality of first nodes among the plurality of nodes, and transmitting the acquired data to at least one first designated server among the plurality of servers;
receiving an accumulation matrix from the plurality of servers, the accumulation matrix showing, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of the respective nodes whose node identifications precede the node identification of that node;
acquiring node identifiers and types of respective adjacent nodes of the at least one first node;
calculating an arrangement position of node identifiers of respective adjacent nodes of the at least one first node in an edge vector based on node identifiers and types of respective adjacent nodes of the at least one first node and the accumulation matrix to acquire partial elements of the edge vector, and sending the partial elements of the edge vector to at least one second designated server in the plurality of servers, wherein the edge vector comprises at least one part corresponding to the at least one type respectively, and node identifiers of respective adjacent nodes of the respective corresponding types of the plurality of nodes are sequentially arranged in each part;
receiving the edge vector from the plurality of servers; and
sequentially and randomly acquiring, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, a plurality of node identifications as the node sequence according to a predetermined path, wherein the predetermined path defines the types of the nodes respectively corresponding to the plurality of node identifications.
2. The method of claim 1, wherein sequentially and randomly acquiring the plurality of node identifications as the node sequence comprises randomly acquiring, according to the predetermined path, one node identification from the node identifications of the nodes of a first predetermined type among the plurality of nodes as the first node identification of the node sequence.
3. The method of claim 2, wherein sequentially and randomly acquiring the plurality of node identifications as the node sequence comprises:
acquiring, according to the predetermined path and based on the accumulation matrix, the out-degree value of a second predetermined type of the node corresponding to the first node identification;
randomly acquiring a first integer based on the out-degree value; and
calculating an arrangement position in the edge vector based on the first integer, the second predetermined type, the first node identification, the accumulation matrix and the edge vector, so as to acquire the node identification at that arrangement position as a second node identification.
4. The method of claim 1, wherein the relationship network is a bipartite graph network, the at least one type comprising a user type and a commodity type.
5. The method of claim 1, wherein the element in the ith row and jth column of the accumulation matrix is the sum of the type-i out-degree values of the nodes whose node identifications are 0 through j-1, wherein i and j are both counted from 0.
6. The method of claim 5, wherein the type-i neighboring nodes of the node whose node identification is j occupy at least one position in the edge vector starting from a first position, their node identifications being arranged in ascending order, wherein the column number of the first position in the edge vector equals the starting column number of the portion of the edge vector corresponding to type i plus the value of the element in the ith row and jth column of the accumulation matrix, and column numbers in the edge vector are counted from 0.
7. The method of claim 1, wherein sequentially randomly acquiring the plurality of node identities, respectively, according to the predetermined path ends in any of the following cases:
the number of the plurality of node identifications reaches a predetermined number;
the next node identity cannot be found.
8. The method of claim 1, wherein the step of sequentially and randomly acquiring the plurality of node identifications according to the predetermined path is looped a plurality of times, the method further comprising writing a predetermined number of rows of node sequences acquired through the predetermined number of loops into a database.
9. The method of claim 8, wherein the step of sequentially and randomly acquiring the plurality of node identifications according to the predetermined path stops looping when the total number of rows of node sequences acquired by the plurality of work machines reaches a predetermined value.
10. The method of claim 1, wherein sequentially and randomly acquiring the plurality of node identifications as the node sequence according to the predetermined path, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, comprises receiving the accumulation matrix and the edge vector from the plurality of servers in the event that the first work machine loses its in-memory data.
11. A method of obtaining a sequence of nodes of a relational network based on a distributed system, the distributed system comprising a plurality of servers and a plurality of work machines, the relational network comprising a plurality of nodes connected to each other, wherein each node has a node identification, one of at least one type, a plurality of out-degree values corresponding to respective types, and neighboring nodes, the method being performed in a first server of the plurality of servers, comprising:
receiving a plurality of out-degree values from at least one of the work machines, the plurality of out-degree values corresponding to a designated plurality of second nodes among the plurality of nodes;
calculating at least part of the elements of an accumulation matrix by receiving corresponding out-degree value data from other servers of the plurality of servers, and transmitting the at least part of the elements of the accumulation matrix to each of the work machines, wherein the accumulation matrix shows, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of the respective nodes whose node identifications precede the node identification of that node;
receiving partial elements of an edge vector from at least one of the work machines, wherein the edge vector includes at least one portion corresponding to the at least one type, respectively, each of the portions having node identifications of respective corresponding types of neighboring nodes of the plurality of nodes sequentially arranged therein; and
sending the partial elements of the edge vector to each of the work machines.
12. The method of claim 11, further comprising, after computing at least a portion of the elements of the accumulation matrix, transmitting the at least a portion of the elements of the accumulation matrix to at least one other server of the plurality of servers to backup the at least a portion of the elements of the accumulation matrix.
13. The method of claim 11, further comprising, after obtaining the partial elements of the edge vector, sending the partial elements of the edge vector to at least one other server of the plurality of servers to backup the partial elements of the edge vector.
14. The method of claim 11, further comprising, prior to sending the partial elements of the edge vectors to each of the work machines, receiving the partial elements of the corresponding edge vectors from all other servers of the plurality of servers to obtain the edge vectors, wherein sending the partial elements of the edge vectors to each of the work machines comprises sending the edge vectors to each of the work machines.
15. The method of claim 11, further comprising, after transmitting the partial elements of the edge vector to each of the work machines, transmitting at least a portion of the elements of the accumulation matrix and the partial elements of the edge vector to the respective work machine in response to a request of the work machine.
16. An apparatus for obtaining a sequence of nodes of a relational network based on a distributed system, the distributed system comprising a plurality of servers and a plurality of work machines, the relational network comprising a plurality of nodes connected to each other, wherein each node has a node identification, one of at least one type, a plurality of out-degree values respectively corresponding to respective types, and neighboring nodes, the apparatus being implemented in a first work machine of the plurality of work machines, comprising:
a first obtaining unit configured to obtain the plurality of out-degree values of each of a plurality of first nodes among the plurality of nodes, and to send the obtained data to at least one first designated server among the plurality of servers;
a first receiving unit configured to receive an accumulation matrix from the plurality of servers, the accumulation matrix showing, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of the respective nodes whose node identifications precede the node identification of that node;
a second obtaining unit configured to obtain node identifiers and types of respective neighboring nodes of the at least one first node;
a calculation unit configured to calculate, based on the node identifiers and types of the respective adjacent nodes of the at least one first node and the accumulation matrix, the arrangement positions of the node identifiers of the respective adjacent nodes of the at least one first node in an edge vector, so as to obtain partial elements of the edge vector, and to send the partial elements of the edge vector to at least one second designated server among the plurality of servers, wherein the edge vector includes at least one portion respectively corresponding to the at least one type, and in each portion the node identifiers of the adjacent nodes of the corresponding type of the plurality of nodes are sequentially arranged;
a second receiving unit configured to receive the edge vector from the plurality of servers; and
a third obtaining unit configured to sequentially and randomly obtain, based on the respective types of the plurality of nodes, the accumulation matrix and the edge vector, a plurality of node identifiers as the node sequence according to a predetermined path, wherein the predetermined path defines the type of each node to which the plurality of node identifiers respectively correspond.
17. The apparatus of claim 16, wherein the third obtaining unit is further configured to randomly obtain, as the first node identifier of the node sequence, one node identifier from node identifiers of each of a plurality of nodes of the first predetermined type among the plurality of nodes according to a predetermined path.
18. The apparatus of claim 17, wherein the third acquisition unit comprises:
a first acquisition subunit configured to acquire, according to the predetermined path and based on the accumulation matrix, the out-degree value of a second predetermined type of the node corresponding to the first node identifier;
a second acquisition subunit configured to randomly acquire a first integer based on the out-degree value; and
a calculating subunit configured to calculate an arrangement position in the edge vector based on the first integer, the second predetermined type, the first node identifier, the accumulation matrix and the edge vector, so as to acquire the node identifier at that arrangement position as a second node identifier.
19. The apparatus of claim 16, wherein the relationship network is a bipartite graph network, the at least one type comprising a user type and a commodity type.
20. The apparatus of claim 16, wherein the element in the ith row and jth column of the accumulation matrix is the sum of the type-i out-degree values of the nodes whose node identifications are 0 through j-1, wherein i and j are both counted from 0.
21. The apparatus of claim 20, wherein in the edge vector, an arrangement position of at least one neighboring node of a type i of a node identified as j in the edge vector is at least one position in the edge vector starting from a first position, and respective node identifications of the at least one neighboring node are arranged in the at least one position in order from small to large, wherein a column number of the first position in the edge vector is equal to a starting column number of a portion of the edge vector corresponding to the type i plus an element value of an ith row and a jth column in the accumulation matrix, wherein a column number in the edge vector is counted from 0.
22. The apparatus of claim 16, wherein the third acquisition unit stops in any of the following cases:
the number of the plurality of node identifiers reaches a predetermined number;
the next node identity cannot be found.
23. The apparatus according to claim 16, wherein the third acquisition unit operates in a plurality of loops, the apparatus further comprising a writing unit configured to write a predetermined number of rows of node sequences acquired through the predetermined number of loops into a database.
24. The apparatus of claim 23, wherein the third acquisition unit stops looping when the total number of rows of node sequences acquired by the plurality of working machines reaches a predetermined value.
25. The apparatus of claim 16, wherein the third acquisition unit comprises a receiving subunit configured to receive the accumulation matrix and the edge vector from the plurality of servers in the event that the first work machine loses its in-memory data.
26. An apparatus for obtaining a sequence of nodes of a relational network based on a distributed system, the distributed system comprising a plurality of servers and a plurality of work machines, the relational network comprising a plurality of nodes connected to each other, wherein each node has a node identification, one of at least one type, a plurality of out-degree values respectively corresponding to respective types, and neighboring nodes, the apparatus being implemented in a first server of the plurality of servers, comprising:
a first receiving unit configured to receive, from at least one of the working machines, a plurality of out-degree values respectively corresponding to a designated plurality of second nodes among the plurality of nodes;
a computing unit configured to compute at least part of the elements of an accumulation matrix by receiving corresponding out-degree value data from other servers of the plurality of servers, and to send the at least part of the elements of the accumulation matrix to each of the working machines, wherein the accumulation matrix shows, for each of the plurality of nodes and for each of the at least one type, the sum of the out-degree values of that type of the respective nodes whose node identifications precede the node identification of that node;
a second receiving unit configured to receive partial elements of an edge vector from at least one of the working machines, wherein the edge vector includes at least one portion respectively corresponding to the at least one type, and node identifications of respective corresponding types of neighboring nodes of the plurality of nodes are sequentially arranged in each of the portions; and
a first transmitting unit configured to transmit the partial elements of the edge vector to each of the working machines.
27. The apparatus of claim 26, further comprising a second transmitting unit configured to transmit at least a portion of the elements of the accumulation matrix to at least one other server of the plurality of servers to backup at least a portion of the elements of the accumulation matrix after calculating the at least a portion of the elements of the accumulation matrix.
28. The apparatus of claim 26, further comprising a third sending unit configured to send the partial element of the edge vector to at least one other server of the plurality of servers to backup the partial element of the edge vector after the partial element of the edge vector is acquired.
29. The apparatus of claim 26, further comprising a third receiving unit configured to receive the partial elements of the respective edge vectors from all other servers of the plurality of servers to obtain the edge vectors before transmitting the partial elements of the edge vectors to each of the work machines, wherein the first transmitting unit is further configured to transmit the edge vectors to each of the work machines.
30. The apparatus of claim 26, further comprising a fourth transmitting unit configured to transmit at least a portion of the elements of the accumulation matrix and a portion of the elements of the edge vector to respective work machines in response to a request of the work machines after transmitting a portion of the elements of the edge vector to each of the work machines.
31. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-15.
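Claims 2, 3 and 7 describe a walk along the predetermined path that needs only the accumulation matrix and the edge vector. For the next type in the path, the walk reads the current node's out-degree from the matrix, draws a random integer below it, and indexes the edge vector. The following minimal sketch assumes a small bipartite example: users 0 and 1 (type 0), commodities 2 and 3 (type 1), and edges 0-2, 0-3 and 1-3. It also assumes a hypothetical extra column N in the accumulation matrix. Uniform sampling with `random.Random.randrange` is an assumption, because the claims only say "randomly". The names `meta_path_walk` and `sample_rows` are illustrative.

```python
import random
from typing import List

# Hypothetical bipartite example: users 0, 1 (type 0); commodities 2, 3 (type 1).
NODE_TYPES = [0, 0, 1, 1]
# ACC[t][j]: sum of type-t out-degrees of nodes 0..j-1; the extra column j = N
# (assumed here) lets ACC[t][j + 1] - ACC[t][j] give node j's type-t out-degree.
ACC = [[0, 0, 0, 1, 3], [0, 2, 3, 3, 3]]
# PART_START[t]: first column of the type-t portion of the edge vector.
PART_START = [0, 3]
EDGE = [0, 0, 1, 2, 3, 3]


def meta_path_walk(node_types: List[int], acc: List[List[int]], part_start: List[int],
                   edge: List[int], meta_path: List[int], rng: random.Random) -> List[int]:
    """Sample one node sequence whose k-th node has type meta_path[k] (non-empty path)."""
    # First node: chosen uniformly among nodes of the first type (cf. claim 2).
    candidates = [v for v, t in enumerate(node_types) if t == meta_path[0]]
    if not candidates:
        return []
    seq = [rng.choice(candidates)]
    for next_type in meta_path[1:]:
        cur = seq[-1]
        # Out-degree of the current node toward next_type, read from the matrix.
        degree = acc[next_type][cur + 1] - acc[next_type][cur]
        if degree == 0:
            break  # the next node identification cannot be found (cf. claim 7)
        first_integer = rng.randrange(degree)  # uniform in [0, degree)
        position = part_start[next_type] + acc[next_type][cur] + first_integer
        seq.append(edge[position])
    return seq


def sample_rows(num_rows: int, meta_path: List[int], seed: int = 0) -> List[List[int]]:
    """Repeat the walk to produce a fixed number of rows (cf. claims 8 and 9)."""
    rng = random.Random(seed)
    return [meta_path_walk(NODE_TYPES, ACC, PART_START, EDGE, meta_path, rng)
            for _ in range(num_rows)]
```

Each step costs two matrix lookups and one edge-vector read, with no per-node adjacency lists. For example, `sample_rows(3, [0, 1, 0, 1])` yields user-commodity-user-commodity sequences in which consecutive nodes are adjacent.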
CN201811278432.3A 2018-10-30 2018-10-30 Method and device for acquiring node sequence of relational network based on distributed system Active CN109614397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811278432.3A CN109614397B (en) 2018-10-30 2018-10-30 Method and device for acquiring node sequence of relational network based on distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811278432.3A CN109614397B (en) 2018-10-30 2018-10-30 Method and device for acquiring node sequence of relational network based on distributed system

Publications (2)

Publication Number Publication Date
CN109614397A CN109614397A (en) 2019-04-12
CN109614397B true CN109614397B (en) 2023-06-20

Family

ID=66002567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811278432.3A Active CN109614397B (en) 2018-10-30 2018-10-30 Method and device for acquiring node sequence of relational network based on distributed system

Country Status (1)

Country Link
CN (1) CN109614397B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436947A (en) * 2008-12-17 2009-05-20 Sun Yat-sen University Expandable island type multicast transmission system suitable for IPTV stream medium business
CN101534205A (en) * 2008-03-11 2009-09-16 China Netcom Group Broadband Service Application National Engineering Laboratory Co., Ltd. Application layer multicast service realizing method, terminal and system thereof
CN101562556A (en) * 2008-04-15 2009-10-21 Huawei Technologies Co., Ltd. Method, device and system for reducing network coding cost
CN102318288A (en) * 2011-07-29 2012-01-11 Huawei Technologies Co., Ltd. Node sequencing and choosing method, apparatus and system
CN103457757A (en) * 2012-05-29 2013-12-18 Tata Consultancy Services Ltd. Method and system for network transaction monitoring using transaction flow signatures
CN104158840A (en) * 2014-07-09 2014-11-19 Northeastern University Method for calculating node similarity of chart in distributing manner
CN105075179A (en) * 2013-02-05 2015-11-18 Cisco Technology, Inc. Learning machine based detection of abnormal network performance
CN106953801A (en) * 2017-01-24 2017-07-14 Shanghai Jiao Tong University Stochastic shortest route implementation method based on hierarchical structure learning automaton
CN107750471A (en) * 2015-06-25 2018-03-02 Airspan Networks Inc. The radio network configuration determined using the path loss between node
CN108021610A (en) * 2017-11-02 2018-05-11 Alibaba Group Holding Ltd. Random walk, random walk method, apparatus and equipment based on distributed system

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20050243736A1 (en) * 2004-04-19 2005-11-03 International Business Machines Corporation System, method, and service for finding an optimal collection of paths among a plurality of paths between two nodes in a complex network
US7626544B2 (en) * 2006-10-17 2009-12-01 Ut-Battelle, Llc Robust low-frequency spread-spectrum navigation system
US8396855B2 (en) * 2010-05-28 2013-03-12 International Business Machines Corporation Identifying communities in an information network
US10361926B2 (en) * 2017-03-03 2019-07-23 Nec Corporation Link prediction with spatial and temporal consistency in dynamic networks


Non-Patent Citations (2)

Title
"Is random walk truly memoryless — Traffic analysis and source location privacy under random walks"; Rui Shi; 2013 Proceedings IEEE INFOCOM; 2013-07-25; pp. 3021-3029 *
"Social network user classification method using random walk"; He Chaobo et al.; Computer Science; 2015-02-15; pp. 198-203 *

Also Published As

Publication number Publication date
CN109614397A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN109087054B (en) Collaborative office data stream processing method, device, computer equipment and storage medium
CN110824587B (en) Image prediction method, image prediction device, computer equipment and storage medium
CN101208696A (en) Aggregating data with complex operations
CN109324901B (en) Deep learning distributed computing method, system and node based on block chain
Ribeiro et al. Efficient heuristics for the workover rig routing problem with a heterogeneous fleet and a finite horizon
CN106875167B (en) Detection method and device for fund transaction path in electronic payment process
CN109614397B (en) Method and device for acquiring node sequence of relational network based on distributed system
CN113239114B (en) Data storage method and device, storage medium and electronic device
CN112989211B (en) Method and system for determining information similarity
CN112560939B (en) Model verification method and device and computer equipment
US10824663B2 (en) Adverse information based ontology reinforcement
CN103049486A (en) Processing method and system for synergizing filter distances
CN109993338B (en) Link prediction method and device
CN116049708A (en) Association relation screening method and device based on atlas
CN111290713B (en) Data storage method and device, electronic equipment and storage medium
CN107203550B (en) Data processing method and database server
CN114880315A (en) Service information cleaning method and device, computer equipment and storage medium
CN108564135B (en) Method for constructing framework program and realizing high-performance computing program running time prediction
CN110717727A (en) Information processing method and device based on multiple platforms, computer equipment and storage medium
CN113064720B (en) Object allocation method, device, server and storage medium
Wei et al. Connecting AI Learning and Blockchain Mining in 6G Systems
CN112115488B (en) Data processing method and device and electronic equipment
CN103049489B (en) For the treatment of the method and system of collaborative filtering distance
CN117290560B (en) Method and device for acquiring graph data in graph calculation task
CN116909816B (en) Database recovery method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: Mailbox 847, 4th Floor, Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
GR01 Patent grant