CN111061918A

CN111061918A - Graph data processing method and device and storage medium

Info

Publication number: CN111061918A
Application number: CN201811204753.9A
Authority: CN
Inventors: 张宏; 王龙; 李道彪; 陈哲嘉; 沈秋军
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-10-16
Filing date: 2018-10-16
Publication date: 2020-04-24
Anticipated expiration: 2038-10-16
Also published as: CN111061918B

Abstract

The invention discloses a method and a device for processing graph data and a storage medium, and belongs to the technical field of data processing. The method comprises the following steps: scanning graph information of graph data to obtain a neighbor node information set of each node in the graph data, wherein the neighbor node information set of each node comprises information of all neighbor nodes of that node; determining at least one of a first-order relationship path and a second-order relationship path of each node according to the neighbor node information set of each node; and determining the multi-order relationship path of each node through an iterative splicing algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node. In the embodiment of the invention, the multi-order relationship path of each node can be determined by scanning the graph information in the graph database only once, so that the pressure on the graph database is reduced.

Description

Graph data processing method and device and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing graph data, and a storage medium.

Background

Currently, data with certain association relationships can be structured as a graph and stored as graph data. The graph information of the graph data includes node information and edge information; that is, the graph data generally includes a plurality of nodes and a plurality of edges, and the edge between two adjacent nodes can be used to characterize the path relationship between those two nodes. For example, in a network, each IP (Internet Protocol) address is used as a node, and the communication relationship between two IP addresses is used as an edge, so that graph data corresponding to the network can be obtained.

In some application scenarios, there is often a need to determine multi-order relationship paths for the nodes in graph data. Currently, it is generally necessary to repeatedly read the node information and edge information of the graph data to determine a multi-order relationship path of a node. For example, when determining the second-order relationship path of node S, node B adjacent to node S may be queried according to the edge information of node S, and then node C adjacent to node B may be queried according to the edge information of node B, so that the second-order relationship path of node S is determined to be S->B->C. For another example, when determining the second-order relationship path of node Q, node B adjacent to node Q may be queried according to the edge information of node Q, and then node C adjacent to node B may be queried according to the edge information of node B, so that the second-order relationship path of node Q is determined to be Q->B->C.

However, in the above implementation, since node S and node Q both have an association relationship with node B and node C, node B, node C and the related edge information need to be read both when determining the multi-order relationship path of node S and when determining that of node Q. Thus, when the data volume is large and many relationship paths need to be determined, nodes such as node B and node C are read repeatedly, which increases the pressure on the graph database.

Disclosure of Invention

The embodiment of the invention provides a method and a device for processing graph data and a storage medium, which can solve the problem that the pressure of a graph database is increased because data needs to be read repeatedly in the related art. The technical scheme is as follows:

in a first aspect, a method for processing graph data is provided, where the method includes:

scanning graph information of graph data to obtain a neighbor node information set of each node in the graph data, wherein the neighbor node information set of each node comprises information of all neighbor nodes of each node;

determining at least one of a first-order relationship path and a second-order relationship path of each node according to the neighbor node information set of each node;

and determining the multi-order relation path of each node through an iterative splicing algorithm based on at least one of the first-order relation path and the second-order relation path of each node.

Optionally, the determining the multi-order relationship path of each node through an iterative concatenation algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node includes:

according to the target order to be determined, determining a first value by the formula

$M = \lfloor \log_2 N \rfloor$

wherein M is the first value, N is the target order, and $\lfloor \cdot \rfloor$ denotes rounding down;

and determining the multi-order relation path of each node through an iterative splicing algorithm according to the target order, the first numerical value and at least one of the first-order relation path and the second-order relation path of each node.

Optionally, the determining, according to the target order, the first numerical value, and at least one of a first order relationship path and a second order relationship path of each node, a multi-order relationship path of each node through an iterative concatenation algorithm includes:

determining a second value according to the first value by the formula $R = 2^{M}$, wherein R is the second value;

when the target order is equal to the second numerical value, determining a multi-order relation path of each node through an iterative splicing algorithm based on the second-order relation path of each node;

and when the target order is not equal to the second numerical value, determining the multi-order relation path of each node through an iterative splicing algorithm based on at least one of the first-order relation path and the second-order relation path of each node.

Optionally, the determining the multi-order relationship path of each node through an iterative concatenation algorithm based on the second-order relationship path of each node includes:

obtaining an i-order relationship path of a target node, wherein the target node is the end point of the i-order relationship path of each node, and splicing the i-order relationship path of each node with the i-order relationship path of the target node to obtain a 2i-order relationship path of each node;

determining whether 2i is equal to the target order;

when 2i is not equal to the target order, setting i equal to 2i, and returning to the step of obtaining an i-order relationship path of a target node and splicing the i-order relationship path of each node with the i-order relationship path of the target node to obtain a 2i-order relationship path of each node;

and when 2i is equal to the target order, determining the obtained 2i-order relationship path as the multi-order relationship path of each node.

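Optionally, when the target order is not equal to the second value, the determining the multi-order relationship path of each node through an iterative splicing algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node includes:

determining a third value according to the first value by the formula $K = 2^{M-1}$, wherein K is the third value;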

determining a difference between the target order and the second value;

when the difference is greater than the third value, determining a 2R-order relationship path of each node through an iterative splicing algorithm based on the second-order relationship path of each node, and deleting the last (2R-N)-order portion of the 2R-order relationship path of each node to obtain the multi-order relationship path of each node;

when the difference is smaller than the third value, determining an R-order relationship path of each node through an iterative splicing algorithm based on the second-order relationship path of each node, determining an (N-R)-order relationship path of the end point of the R-order relationship path of each node through the iterative splicing algorithm based on the first-order relationship path and the second-order relationship path of each node, and splicing the R-order relationship path of each node with the determined (N-R)-order relationship path of the end point to obtain the multi-order relationship path of each node.

Optionally, when the graph data is undirected graph data, before the splicing the i-order relationship path of each node with the i-order relationship path of the target node, the method further includes:

detecting whether the adjacent node in the i-order relationship path of the target node is the same as the ith node in the i-order relationship path of each node;

and when the adjacent node in the i-order relationship path of the target node is different from the ith node in the i-order relationship path of each node, executing the step of splicing the i-order relationship path of each node and the i-order relationship path of the target node.
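The check for undirected graph data can be sketched as follows. This is only one reading of the text, stated as an assumption: the "adjacent node" in the target node's path is taken to be the node immediately after the target node, and the "ith node" in each node's i-order path is taken to be the node immediately before its end point, so that splicing is skipped when it would walk straight back over the edge just traversed. The function name `can_splice_undirected` and the tuple representation of paths are hypothetical.

```python
def can_splice_undirected(left, right):
    """Return True when splicing `right` onto `left` does not immediately backtrack.

    `left` is an i-order path of a node and `right` is an i-order path of the
    target node (the end point of `left`); both are tuples of node IDs.
    """
    if left[-1] != right[0]:
        raise ValueError("right path must start at the end point of left path")
    # Assumed reading: compare the node after the target node in `right`
    # with the node before the end point in `left`.
    return right[1] != left[-2]


left = ("A", "B")                      # first-order path A-B
print(can_splice_undirected(left, ("B", "A")))   # walks back to A
print(can_splice_undirected(left, ("B", "C")))   # moves on to C
```

Under this reading, A-B followed by B-A is rejected, while A-B followed by B-C is spliced into A-B-C.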

Optionally, after determining the multi-order relationship path of each node by using an iterative concatenation algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node, the method further includes:

and taking the node information of each node as a row key, and storing the paths with the same order number in at least one multi-order relational path of each node to the same row of a database.

In a second aspect, an apparatus for processing graph data is provided, the apparatus comprising:

the scanning module is used for scanning graph information of graph data to obtain a neighbor node information set of each node in the graph data, wherein the neighbor node information set of each node comprises information of all neighbor nodes of each node;

a first determining module, configured to determine at least one of a first order relationship path and a second order relationship path of each node according to the neighboring node information set of each node;

and the second determining module is used for determining the multi-order relation path of each node through an iterative stitching algorithm based on at least one of the first-order relation path and the second-order relation path of each node.

Optionally, the second determining module is configured to:

according to the target order to be determined, by formula

Optionally, the second determining module is configured to:

determining whether the 2i is equal to the target order;

Optionally, the second determining module is configured to:

determining a difference between the target order and the second value;

Optionally, the apparatus further comprises:

a detection module, configured to detect, when the graph data is undirected graph data, whether an adjacent node in an i-order relationship path of the target node is the same as an ith node in the i-order relationship path of each node;

the second determining module is configured to splice the i-order relationship path of each node with the i-order relationship path of the target node when an adjacent node in the i-order relationship path of the target node is different from the ith node in the i-order relationship path of each node.

Optionally, the apparatus further comprises:

and the storage module is used for taking the node information of each node as a row key, and storing the paths with the same order among the at least one multi-order relationship path of each node to the same row of a database.

In a third aspect, a computer-readable storage medium is provided, and the computer-readable storage medium stores instructions that, when executed by a processor, implement the method for processing graph data according to the first aspect.

In a fourth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for processing graph data according to the first aspect described above.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

The graph information of the graph data is scanned to obtain a neighbor node information set of each node in the graph data, and at least one of a first-order relationship path and a second-order relationship path of each node is determined according to the neighbor node information set of each node. Then, the multi-order relationship path of each node may be determined by an iterative splicing algorithm based on at least one of the determined first-order relationship path and second-order relationship path. Because the multi-order relationship path of each node can be determined by scanning the graph information in the graph database only once, the pressure on the graph database is reduced.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a flow diagram illustrating a method for processing graph data in accordance with an exemplary embodiment;

FIG. 2 is a diagram illustrating directed graph data in accordance with an exemplary embodiment;

FIG. 3 is a schematic diagram illustrating a second order relationship path for a node in accordance with an illustrative embodiment;

FIG. 4 is a diagram illustrating a relationship path of a node in accordance with an illustrative embodiment;

FIG. 5 is a diagram illustrating a relationship path of a node in accordance with an illustrative embodiment;

FIG. 6 is a flow chart illustrating a method of processing graph data in accordance with another exemplary embodiment;

FIG. 7 is a diagram illustrating undirected graph data in accordance with an exemplary embodiment;

FIG. 8 is a diagram illustrating a relationship path for a node in accordance with an illustrative embodiment;

FIG. 9 is a diagram illustrating a relationship path for a node in accordance with an illustrative embodiment;

FIG. 10 is a diagram illustrating a relationship path for a node in accordance with an illustrative embodiment;

FIG. 11 is a diagram illustrating a relationship path for a node in accordance with an illustrative embodiment;

FIG. 12 is a block diagram illustrating an apparatus for processing graph data in accordance with an exemplary embodiment;

FIG. 13 is a block diagram illustrating an apparatus for processing graph data in accordance with an exemplary embodiment;

FIG. 14 is a block diagram illustrating an apparatus for processing graph data in accordance with an exemplary embodiment;

FIG. 15 is a block diagram illustrating a computer device according to an example embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Before describing the processing method of the graph data provided by the embodiment of the present invention in detail, terms, application scenarios and implementation environments related to the embodiment of the present invention are briefly described.

First, terms related to the embodiments of the present invention will be briefly described.

Graph data: composed of multiple nodes and multiple edges. In a graph database, graph data is typically stored as graph information, which typically includes node information and edge information.

Spark: a fast, general-purpose computing engine designed for large-scale data processing, which can be used to build large-scale, low-latency data analysis applications. The data organization structure in Spark can be abstracted as the RDD (Resilient Distributed Dataset), a data model specific to Spark that represents a read-only, partitioned data set.

JanusGraph: a distributed graph database. Graph data is generally stored in JanusGraph, whose storage backend uses HBase and whose indexing is implemented based on Elasticsearch. In implementation, both the storage encoding format and the Elasticsearch index can be optimized according to actual requirements.

Order: a relationship path between any two nodes in the graph data.

Degree: the degree of a node refers to the number of edges associated with the node. Further, for a directed graph, the degree of a node can be subdivided into in-degree and out-degree. The in-degree of a node is the number of edges, among all the edges associated with the node, that take the node as an end point; the out-degree is the number of edges, among all the edges associated with the node, that take the node as a start point.

Secondly, the application scenarios related to the embodiment of the invention are briefly introduced.

In the overall analysis and processing of graph data, determining the multi-order relationship path of each node in the graph data is significant. At present, when graph data is processed, data in the graph database needs to be acquired repeatedly, which increases the pressure on the graph database. Therefore, the embodiment of the invention provides a graph data processing method that reads data from the graph database only once and determines the multi-order relationship path of each node based on the read data, so that the pressure on the graph database is reduced. For a specific implementation, refer to the following embodiments.

Finally, a brief description of an implementation environment to which embodiments of the present invention relate will be given.

The graph data processing method provided by the embodiment of the invention can be executed by a computer device; further, the computer device may include JanusGraph, HBase and the like for storing data. In some embodiments, the computer device may be a tablet computer, a desktop computer, a portable computer, a notebook computer, or the like, which is not limited by the embodiments of the present invention.

After the terms, application scenarios and implementation environments related to the embodiments of the present invention have been described, the processing method for graph data provided by the embodiments of the present invention is described in detail below with reference to the accompanying drawings. Since graph data includes directed graphs and undirected graphs, the processing method is described for the directed graph by the embodiment shown in fig. 1 and for the undirected graph by the embodiment shown in fig. 6, as follows.

Fig. 1 is a flowchart illustrating a method for processing graph data according to an exemplary embodiment. Taking application of the method to the foregoing implementation environment as an example, the method for processing graph data may include the following implementation steps:

Step 101: scanning graph information of the graph data to obtain a neighbor node information set of each node in the graph data, wherein the neighbor node information set of each node comprises information of all neighbor nodes of that node.

In a graph database, when nodes are stored, information for all edges associated with the nodes is also stored. Thus, after the graph information of the graph data in the graph database is scanned, the information of all the adjacent nodes of each node can be determined, and therefore the adjacent node information set of each node is obtained. Further, in a possible implementation, Spark may be used for the scanning process.

It should be noted that, in the embodiment of the present invention, each node includes at least one neighboring node, and therefore, at least one piece of neighboring node information may be scanned for each node, so as to obtain a set of neighboring node information of each node.

The node information of each node may be used to uniquely identify a node, and for example, the node information may be an ID of the node.

In the embodiment of the present invention, when the graph data is a directed graph, the neighbor node information set of each node includes an in-edge point information set and an out-edge point information set. The in-edge point information set of each node includes the node information of all nodes that are adjacent to the node and whose edge points to the node, and the out-edge point information set of each node includes the node information of all nodes that are adjacent to the node and to which an edge from the node points.

For example, referring to fig. 2, fig. 2 is a schematic diagram illustrating directed graph data according to an example embodiment. The neighbor node information set of node A includes an in-edge point information set and an out-edge point information set, wherein the in-edge point information set includes {B, C} and the out-edge point information set includes {D, E}.

Further, if Spark is used for scanning, the RDD obtained after the scanning process has the format (vertexID, inlist, outlist), where vertexID represents the node information of a node, inlist represents the in-edge point information set, and outlist represents the out-edge point information set; both inlist and outlist may be represented as lists.
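The record format above can be illustrated with a minimal sketch in plain Python rather than Spark. The function name `scan_neighbors` and the sample edge list are hypothetical; the edges are chosen on the assumption that node A in fig. 2 has incoming edges from B and C and outgoing edges to D and E.

```python
from collections import defaultdict


def scan_neighbors(edges):
    """One pass over directed edges (src, dst) -> {vertexID: (inlist, outlist)}."""
    inlist = defaultdict(list)
    outlist = defaultdict(list)
    vertices = []          # vertices in order of first appearance
    seen = set()
    for src, dst in edges:
        outlist[src].append(dst)   # dst is an out-edge point of src
        inlist[dst].append(src)    # src is an in-edge point of dst
        for v in (src, dst):
            if v not in seen:
                seen.add(v)
                vertices.append(v)
    return {v: (sorted(inlist[v]), sorted(outlist[v])) for v in vertices}


# Hypothetical edge records standing in for the scanned graph information.
edges = [("B", "A"), ("C", "A"), ("A", "D"), ("A", "E")]
records = scan_neighbors(edges)
print(records["A"])
```

Each edge is read exactly once, and every later step works only on `records`, which mirrors the single-scan idea of step 101.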

Step 102: determining at least one of a first-order relationship path and a second-order relationship path of each node according to the neighbor node information set of each node.

It is understood that after the neighbor node information set of each node is obtained, the first-order relationship path and the second-order relationship path of each node can be determined. For example, for node A, the corresponding RDD record is (A, {B}, {C}), from which it can be known that the first-order relationship path of node A is A->C. In general, the first-order relationship path of a node refers to its out-degree first-order relationship path.

For the second-order relationship path of each node in the directed graph data, the path may be determined based on the in-degree first-order relationship path and the out-degree first-order relationship path of each node. As in the above example, according to the neighbor node information set of node A, the in-degree first-order relationship path of node A is B->A and the out-degree first-order relationship path of node A is A->C, so that a second-order relationship path of node B is B->A->C.

Since the set of neighbor node information for each node is already known, based on this, at least one of the first order relationship path and the second order relationship path for each node can be determined. For example, referring to fig. 3, fig. 3 is a schematic diagram illustrating a second order relationship path of a node according to an exemplary embodiment.
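As a rough sketch of step 102, the first-order and second-order paths can be derived from records of the form (vertexID, inlist, outlist) without further reads. The function names and the dictionary layout below are assumptions for illustration; the sample records reproduce the (A, {B}, {C}) example from the text.

```python
def first_order_paths(records):
    """Out-degree first-order paths: one (v, w) per out-edge point w of v."""
    return {v: [(v, w) for w in outs] for v, (_ins, outs) in records.items()}


def second_order_paths(records):
    """Join each in-edge u->mid with each out-edge mid->w to get u->mid->w."""
    paths = {v: [] for v in records}
    for mid, (ins, outs) in records.items():
        for u in ins:
            for w in outs:
                # The path is filed under its start node u.
                paths.setdefault(u, []).append((u, mid, w))
    return paths


# Records in (inlist, outlist) form, matching the example for node A.
records = {"A": (["B"], ["C"]), "B": ([], ["A"]), "C": (["A"], [])}
first = first_order_paths(records)
second = second_order_paths(records)
print(first["A"])
print(second["B"])
```

Node A contributes the second-order path B->A->C because B is its only in-edge point and C its only out-edge point.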

After at least one of the first-order relationship path and the second-order relationship path of each node is obtained, a relationship path of any order can be determined for each node through an iterative splicing algorithm based on at least one of the first-order relationship path and the second-order relationship path. For example, a third-order relationship path may be generated in reverse: the first-order relationship paths ending at a node are determined from the in-edge points of that node and are then prepended to the second-order relationship path of that node. Referring to fig. 4, the in-edge points of node B are F and G, so the third-order relationship paths of node F and node G can be generated from the second-order relationship path of node B. That is, the N-order relationship path of each node can be computed by combining a P-order path and an (N-P)-order path, provided that the last node of the P-order path is the same as the start node of the (N-P)-order path, as shown for example in FIG. 5.
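The splicing rule (a P-order path joined to an (N-P)-order path that starts where the first one ends) can be sketched as below. The helper name `splice` and the sample paths are hypothetical; the second-order path B->A->C is assumed for node B purely for illustration.

```python
def splice(left_paths, right_paths_by_start):
    """Join each left path with every right path that starts at its end node."""
    joined = []
    for left in left_paths:
        for right in right_paths_by_start.get(left[-1], []):
            # Drop the duplicated junction node when concatenating.
            joined.append(left + right[1:])
    return joined


# First-order paths ending at B (from its in-edge points F and G).
first_order_into_b = [("F", "B"), ("G", "B")]
# Assumed second-order path of node B, keyed by start node.
second_order = {"B": [("B", "A", "C")]}

third_order = splice(first_order_into_b, second_order)
print(third_order)
```

Here P = 1 and N - P = 2, so the results are third-order paths starting at F and G.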

In implementation, whether to determine the first-order relationship path of a node, the second-order relationship path of a node, or both depends on the order of the multi-order relationship path that actually needs to be determined. For the specific implementation policy, refer to steps 103 to 104.

Step 103: determining a first value according to the target order to be determined.

Further, according to the target order to be determined, the first value is determined by formula (1), wherein formula (1) is:

$M = \lfloor \log_2 N \rfloor$ (1)

wherein M is the first value and N is the target order. That is, according to the target order to be determined, the first value M satisfying formula (1) is determined, wherein the notation $\lfloor \cdot \rfloor$ means rounding down. In addition, the target order can be set by a user according to actual requirements.

It should be noted that the operation is described here with base 2 as an example; in other embodiments, other numbers may be used as the base, and the embodiment of the present invention is not limited in this respect.
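The worked examples in this description (N = 8 with M = 3, N = 7 with M = 2, N = 9 with M = 3) are consistent with the first value being the base-2 logarithm of the target order, rounded down. A small sketch under that assumption follows; the function name `first_value` is hypothetical.

```python
def first_value(n):
    """First value M = floor(log2(N)) for a positive integer target order N.

    Integer bit arithmetic is used instead of math.log2 to avoid
    floating-point rounding for large N.
    """
    if n < 1:
        raise ValueError("target order must be a positive integer")
    return n.bit_length() - 1


for n in (7, 8, 9):
    print(n, first_value(n))
```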

Step 104: determining the multi-order relationship path of each node through an iterative splicing algorithm according to the target order, the first value, and at least one of the first-order relationship path and the second-order relationship path of each node.

In a possible implementation manner, determining the multi-order relationship path of each node through the iterative splicing algorithm according to the target order, the first value, and at least one of the first-order relationship path and the second-order relationship path of each node may include the following implementation steps 1041 to 1043:

1041: according to the first value, a second value is determined by formula (2).

Wherein, formula (2) is: $R = 2^{M}$, and R is the second value.

Further, after determining the second value, the second value may be compared to a target order to determine whether the second value is equal to the target order. When the comparison results are different, the corresponding implementation processes are also different, for example, according to the comparison results, the following steps 1042 and 1043 may be adopted.

1042: when the target order is equal to the second value, determining the multi-order relationship path of each node through an iterative splicing algorithm based on the second-order relationship path of each node.

For example, when the target order is 8, M is 3, and the above operation determines that the second value R is equal to the target order. In this case, the multi-order relationship path of each node may be determined by iterative splicing based on the second-order relationship path of each node.

Further, a specific implementation of determining the multi-order relationship path of each node through an iterative splicing algorithm based on the second-order relationship path of each node may include: obtaining an i-order relationship path of a target node, wherein the target node is the end point of the i-order relationship path of each node, and splicing the i-order relationship path of each node with the i-order relationship path of the target node to obtain a 2i-order relationship path of each node; determining whether 2i is equal to the target order; when 2i is not equal to the target order, setting i equal to 2i and returning to the step of obtaining an i-order relationship path of the target node and splicing the i-order relationship path of each node with the i-order relationship path of the target node to obtain a 2i-order relationship path of each node; and when 2i is equal to the target order, determining the obtained 2i-order relationship path as the multi-order relationship path of each node.

It should be noted that, in the implementation process, when the i-order relationship path of the target node is determined and i is greater than 2, the i-order relationship path of the target node itself needs to be determined by an iterative splicing algorithm based on the second-order relationship path of the target node. That is, the second-order relationship path of the target node is spliced with the second-order relationship path of its end point to obtain the fourth-order relationship path of the target node. Then, the fourth-order relationship path of the end point of that fourth-order path is determined in the same way, by splicing two second-order relationship paths, and so on until the i-order relationship path of the target node is determined.

For example, when the target order is 8, assume that the 8-order relationship path of node A is to be determined and that the second-order relationship path of node A is A->B->C. The target node is then node C, and the second-order relationship path of node C is obtained; assume it is C->D->E. Splicing the second-order relationship path of node A with the second-order relationship path of node C gives the fourth-order relationship path of node A, A->B->C->D->E. Since the order obtained is not equal to the target order, iterative splicing continues; that is, the fourth-order relationship path of node E needs to be obtained. It is obtained by iterative splicing based on the second-order relationship path of node E. For example, if the second-order relationship path of node E is E->F->G, the second-order relationship path of node G is obtained; if it is G->H->I, the second-order relationship paths of node E and node G are spliced to obtain the fourth-order relationship path of node E, E->F->G->H->I. The fourth-order relationship path of node A and the fourth-order relationship path of node E are then spliced to obtain the 8-order relationship path of node A, A->B->C->D->E->F->G->H->I. Since the order obtained is now equal to the target order, the iterative splicing operation ends, and the 8-order relationship path of node A is obtained.
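The doubling procedure of step 1042 can be sketched as follows, using the chain A through I from the example. The function names and the dictionary-of-tuples representation are assumptions for illustration, not the embodiment's actual data structures.

```python
def double_paths(paths_by_start):
    """Splice every i-order path with the i-order paths of its end node."""
    return {
        start: [
            left + right[1:]                      # drop the shared junction node
            for left in paths
            for right in paths_by_start.get(left[-1], [])
        ]
        for start, paths in paths_by_start.items()
    }


def power_of_two_paths(second_order, target_order):
    """Double from order 2 until target_order (a power of two >= 2) is reached."""
    if target_order < 2 or target_order & (target_order - 1):
        raise ValueError("target_order must be a power of two >= 2")
    order, paths = 2, second_order
    while order != target_order:
        paths = double_paths(paths)
        order *= 2
    return paths


# Second-order paths along the assumed chain A->B->...->I.
nodes = "ABCDEFGHI"
second_order = {nodes[k]: [tuple(nodes[k:k + 3])] for k in range(len(nodes) - 2)}

result = power_of_two_paths(second_order, 8)
print("->".join(result["A"][0]))
```

All nodes are doubled together in each round, so the fourth-order path of node E is already available when the fourth-order path of node A needs it; nodes whose paths cannot be extended (for example node B at order 8 on this chain) simply end up with an empty list.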

1043: when the target order is not equal to the second value, determining the multi-order relationship path of each node through an iterative splicing algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node.

Specifically, step 1043 may include: determining a third value according to the first value by the formula $K = 2^{M-1}$, wherein K is the third value, and determining the difference between the target order and the second value. When the difference is greater than the third value, the 2R-order relationship path of each node is determined through an iterative splicing algorithm based on the second-order relationship path of each node, and the last (2R-N)-order portion of the 2R-order relationship path of each node is deleted to obtain the multi-order relationship path of each node. When the difference is smaller than the third value, the R-order relationship path of each node is determined through an iterative splicing algorithm based on the second-order relationship path of each node, the (N-R)-order relationship path of the end point of the R-order relationship path of each node is determined through the iterative splicing algorithm based on the first-order relationship path and the second-order relationship path of each node, and the R-order relationship path of each node is spliced with the determined (N-R)-order relationship path of the end point to obtain the multi-order relationship path of each node.

For example, assume that the target order is 7 and that the multi-order relationship path of node A is to be determined. The above calculation gives M = 2 and R = 4, so a fourth-order relationship path can be obtained by splicing two second-order relationship paths. Because the difference between N and R, which is 3, is greater than the third value (K = 2), two fourth-order relationship paths are spliced to obtain an eighth-order relationship path of node A. Assume that this eighth-order relationship path is A->B->C->D->E->F->G->H->I. The last (2R-N) orders of the eighth-order relationship path are then deleted, that is, the last one order is deleted, so that the seventh-order relationship path of node A is A->B->C->D->E->F->G->H.

For another example, assume that the target order is 9 and that the multi-order relationship path of node A is to be determined. The above calculation gives M = 3 and R = 8. Because the difference between N and R, which is 1, is smaller than the third value (K = 4), the 8th-order relationship path of node A is obtained through the iterative splicing algorithm based on the second-order relationship path of node A; assume that it is A->B->C->D->E->F->G->H->I. In addition, the (9-8)-order relationship path of the end point of the 8th-order relationship path of node A needs to be determined, that is, the first-order relationship path of node I; assume that it is I->J. The 8th-order relationship path of node A and the first-order relationship path of node I are then spliced to obtain the 9th-order relationship path of node A, namely A->B->C->D->E->F->G->H->I->J.
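The arithmetic behind the two examples above can be checked with a short Python sketch. The function name `plan_for_order` and the strategy strings are hypothetical. The first value is computed as the largest M with 2^M not exceeding N, which is an assumption inferred from the worked examples (N = 7 gives M = 2, N = 9 gives M = 3). The case in which the difference equals the third value is not described in the text and is reported as such.

```python
def plan_for_order(n):
    """Return (M, R, K, strategy) for a target order n >= 2."""
    m = n.bit_length() - 1  # assumed: largest M with 2**M <= n
    r = 2 ** m              # second value R = 2^M
    k = 2 ** (m - 1)        # third value K = 2^(M-1)
    diff = n - r            # difference between target order and second value
    if diff == 0:
        strategy = "double up to R"
    elif diff > k:
        # Overshoot to order 2R, then delete the last (2R - N) orders.
        strategy = f"build order {2 * r}, drop last {2 * r - n}"
    elif diff < k:
        # Build order R, then append the (N - R)-order path of the end point.
        strategy = f"build order {r}, append order {diff}"
    else:
        strategy = "not specified in the text (difference equals K)"
    return m, r, k, strategy


for target in (7, 8, 9):
    print(target, plan_for_order(target))
```

For N = 7 the sketch selects the overshoot-and-truncate branch, and for N = 9 the build-and-extend branch, matching the two examples.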

It should be noted that, the above step 103 and step 104 are used to implement an operation of determining the multi-order relationship path of each node through an iterative concatenation algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node.

Further, because high-order relationship paths are very numerous, aggregating them in Spark is time-consuming and memory-intensive. For example, in an environment on the scale of 8 million, the third-order relationship paths amount to about 170 GB when output to HDFS. HBase may therefore be used for the final aggregation and storage of the data, and the determined relationship paths of each node may be stored separately according to their order.

Specifically, the node information of each node is used as a row key, and those of the node's multi-order relationship paths that have the same path order are stored in the same row of the database. That is, for each node, paths of the same order are written to the same row of HBase, which enables fast aggregation and improves aggregation efficiency.
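One possible reading of this storage layout is sketched below with plain Python dictionaries standing in for an HBase table. No HBase client API is used. The function name `group_paths_by_row`, the column names `order_1`, `order_2`, and the choice of one column per path order are assumptions made for illustration only.

```python
from collections import defaultdict


def group_paths_by_row(paths):
    """Group paths into rows keyed by start node, with one column per path order."""
    rows = defaultdict(lambda: defaultdict(list))
    for path in paths:
        order = len(path) - 1  # a path with k edges has order k
        rows[path[0]][f"order_{order}"].append("->".join(path))
    # Convert to plain dicts so the result is easy to inspect.
    return {row_key: dict(columns) for row_key, columns in rows.items()}


rows = group_paths_by_row([
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "C"],
    ["B", "C"],
])
print(rows["A"])
```

Because all same-order paths of a node land under one row key, reading the paths of a given order for a node is a single keyed lookup rather than a scan.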

In the embodiment of the invention, the graph information of the graph data is scanned to obtain the adjacent node information set of each node in the graph data, and at least one of the first-order relationship path and the second-order relationship path of each node can be determined according to the adjacent node information set of each node. Then, a multi-order relationship path of each node may be determined by an iterative stitching algorithm based on at least one of the determined first order relationship path and second order relationship path. Because the multi-order relation path of each node can be determined only by scanning the graph information in the graph database, the pressure of the graph database is reduced.

Fig. 6 is a flowchart illustrating a processing method of graph data according to another exemplary embodiment, which is described herein by taking as an example that the processing method of graph data is applied to the above implementation environment, and the processing method of graph data may include the following implementation steps:

Step 601: scanning graph information of the graph data to obtain a neighbor node information set of each node in the graph data, wherein the neighbor node information set of each node comprises information of all neighbor nodes of that node.

In a graph database, when nodes are stored, information for all edges associated with the nodes is also stored. Therefore, after the graph information of the graph data in the graph database is scanned, the information of all the adjacent nodes of each node can be determined, so that the adjacent node information set of each node is obtained, and the first-order information of all the nodes is obtained. Further, in a possible implementation, Spark may be used for the scanning process.

Further, if Spark is used for scanning, each RDD record after the scan has the format (vertex, vertexlist), where vertex represents the node information of a node and vertexlist represents the set of neighbor node information of that node, which may be represented as a list. For example, referring to Fig. 7, which illustrates undirected graph data according to an exemplary embodiment, the neighbor node information set of node A includes B, C, D, and E.
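The record format described above can be illustrated without Spark by a small Python sketch that turns an undirected edge list into (node, neighbor list) records. The function name `neighbor_sets` is hypothetical, and the edge list contains only the four edges of node A mentioned in the text; the rest of Fig. 7 is not reproduced here.

```python
def neighbor_sets(edges):
    """Build sorted (node, neighbor list) records from undirected edges."""
    adjacency = {}
    for u, v in edges:
        # An undirected edge contributes a neighbor in both directions.
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return sorted((node, sorted(neighbors)) for node, neighbors in adjacency.items())


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]
records = neighbor_sets(edges)
print(records[0])
```

In a Spark job the same grouping would typically be expressed as a key-based aggregation over the scanned edges; the dictionary here only mirrors the resulting record shape.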

Step 602: determining at least one of a first-order relationship path and a second-order relationship path of each node according to the neighbor node information set of each node.

It is understood that after the neighbor node information set of each node is obtained, the first-order relationship path and the second-order relationship path of each node can be determined. For example, the RDD record corresponding to node A is (A, {B, C, D, E}), from which it follows that the first-order relationship paths of node A include A->B, A->C, A->D, and A->E.

For the second-order relationship path of each node in the undirected graph data, the second-order relationship path may be determined based on the first-order relationship path of each node, as in the above example, the second-order relationship path may be determined according to the neighboring node information set of the node a, as shown in fig. 8.
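As a rough sketch of how first-order and second-order paths follow from the neighbor sets, consider the Python below. The function names and the small adjacency map are hypothetical (node F is invented for the example), and skipping the immediate back-step start->mid->start is an assumption made here for undirected data, not a detail taken from Fig. 8.

```python
def first_order_paths(adjacency):
    """One first-order path per (node, neighbor) pair."""
    return {node: [[node, nb] for nb in neighbors]
            for node, neighbors in adjacency.items()}


def second_order_paths(adjacency):
    """Extend each first-order path by one more hop through the middle node."""
    result = {}
    for start, neighbors in adjacency.items():
        result[start] = [
            [start, mid, end]
            for mid in neighbors
            for end in adjacency.get(mid, [])
            if end != start  # assumed: do not step straight back to the start
        ]
    return result


adjacency = {"A": ["B", "C"], "B": ["A", "F"], "C": ["A"], "F": ["B"]}
first = first_order_paths(adjacency)
second = second_order_paths(adjacency)
print(first["A"], second["A"])
```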

After at least one of the first-order relationship path and the second-order relationship path of each node is obtained, a relationship path of any order can be determined for each node through the iterative splicing algorithm based on at least one of the first-order relationship path and the second-order relationship path. For example, a third-order relationship path may be generated in the reverse direction by using first-order relationships. Referring to Fig. 9 and Fig. 10, the incoming edges of node B come from nodes F, G, and H, so the third-order relationship paths generated from the second-order relationship paths of node B may include: F->B->A->E, G->B->A->E, H->B->A->E, F->B->A->D, G->B->A->D, H->B->A->D, F->B->A->C, G->B->A->C, H->B->A->C.
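The reverse generation in this example can be sketched as prepending each predecessor of a node to the paths that start at that node. The function name `prepend_predecessors` and the `in_neighbors` map are hypothetical; the node names and the second-order paths of node B are those given in the text.

```python
def prepend_predecessors(in_neighbors, paths_from_node, node):
    """Extend paths starting at `node` by one hop backwards along incoming edges."""
    return [
        [pred] + path
        for path in paths_from_node
        for pred in in_neighbors.get(node, [])
    ]


in_neighbors = {"B": ["F", "G", "H"]}  # incoming edges of node B
second_order_of_b = [["B", "A", "E"], ["B", "A", "D"], ["B", "A", "C"]]
third_order = prepend_predecessors(in_neighbors, second_order_of_b, "B")
for path in third_order:
    print("->".join(path))
```

Three predecessors combined with three second-order paths yield the nine third-order paths listed above.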

That is, the N-order relationship path of each node can be computed by combining a P-order path with an (N-P)-order path, provided that the last node of the P-order path is the same as the start node of the (N-P)-order path. In implementation, whether to determine the first-order relationship path of a node, the second-order relationship path of a node, or both depends on the order of the multi-order relationship path that actually needs to be determined; for the specific strategy, see steps 603 to 604.

Step 603: determining a first value according to the target order to be determined.

Further, the first value is determined from the target order to be determined by formula (1), where formula (1) is M = ⌊log2(N)⌋, M is the first value, N is the target order, and ⌊·⌋ denotes rounding down. That is, the first value M satisfying 2^M ≤ N < 2^(M+1) is determined according to the target order to be determined.

Step 604: determining the multi-order relationship path of each node through the iterative splicing algorithm according to the target order, the first value, and at least one of the first-order relationship path and the second-order relationship path of each node.

6041: determining a second value from the first value by formula (2).

Formula (2) is R = 2^M, where R is the second value.

6042: when the target order is equal to the second value, determining the multi-order relationship path of each node through the iterative splicing algorithm based on the second-order relationship path of each node.

Because edges in an undirected graph have no direction, repeated paths must be handled during splicing. In this embodiment, before the i-order relationship path of each node is spliced with the i-order relationship path of the target node, it is detected whether the adjacent node in the i-order relationship path of the target node (the node immediately following the target node) is the same as the i-th node in the i-order relationship path of that node (the node immediately preceding its end point). When the two nodes are different, the i-order relationship path of the node is spliced with the i-order relationship path of the target node. When the two nodes are the same, the splicing operation is not performed.

For example, referring to Fig. 11, the end point of the fourth-order relationship path of node M is node C, and the fourth node of that path is node A. The adjacent node in the second-order relationship path of node C is also node A. Because these two nodes are the same, the second-order relationship path of node C is determined to be a repeated path, and the fourth-order relationship path of node M is not spliced with the second-order relationship path of node C, so that a repeated path is avoided.
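The repeated-path check can be sketched in Python as a guard applied before two paths are joined. The function names are hypothetical, and because Fig. 11 is not reproduced here, the intermediate nodes P and Q and the node X are invented placeholders; only M, A, and C come from the text.

```python
def is_backtrack(left, right):
    """True if `right` immediately steps back to the node before the end of `left`."""
    return len(left) >= 2 and len(right) >= 2 and right[1] == left[-2]


def splice_undirected(left, right):
    """Join two paths at their shared node unless that would retrace an edge."""
    if left[-1] != right[0] or is_backtrack(left, right):
        return None
    return left + right[1:]


fourth_order_of_m = ["M", "P", "Q", "A", "C"]  # ends at C, fourth node is A
repeated = splice_undirected(fourth_order_of_m, ["C", "A", "X"])
accepted = splice_undirected(fourth_order_of_m, ["C", "D", "E"])
print(repeated, accepted)
```

The first call returns `None` because the path of node C steps straight back to node A, while the second call produces a sixth-order path.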

In addition, in the splicing process referred to below, the repetitive path processing is also required according to the above implementation.

6043: when the target order is not equal to the second value, determining the multi-order relationship path of each node through the iterative splicing algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node.

It should be noted that, the foregoing step 603 and step 604 are used to implement an operation of determining the multi-order relationship path of each node through an iterative concatenation algorithm based on at least one of the first-order relationship path and the second-order relationship path of each node.

In the embodiment of the invention, the graph information of the graph data is scanned to obtain the neighbor node information set of each node in the graph data, and at least one of the first-order relationship path and the second-order relationship path of each node can be determined according to the neighbor node information set of each node. Then, the multi-order relationship path of each node may be determined through the iterative splicing algorithm based on at least one of the determined first-order relationship path and second-order relationship path. Because the multi-order relationship path of each node can be determined merely by scanning the graph information in the graph database, the information of a point that appears in multiple paths, as most points in the graph data do, is obtained only once rather than repeatedly. Thus, the pressure on the graph database is reduced.

Fig. 12 is a schematic structural diagram illustrating a processing apparatus of graph data, which may be implemented by software, hardware, or a combination of the two, according to an exemplary embodiment. The apparatus may include:

a scanning module 1201, configured to scan graph information of graph data to obtain a neighbor node information set of each node in the graph data, where the neighbor node information set of each node includes information of all neighbor nodes of each node;

a first determining module 1202, configured to determine at least one of a first order relationship path and a second order relationship path of each node according to the neighboring node information set of each node;

a second determining module 1203, configured to determine, based on at least one of the first order relationship path and the second order relationship path of each node, a multi-order relationship path of each node through an iterative concatenation algorithm.

Optionally, the second determining module 1203 is configured to:

according to the target order to be determined, by formula

Optionally, the second determining module 1203 is configured to:

determining whether the 2i is equal to the target order;

Optionally, the second determining module 1203 is configured to:

determining a difference between the target order and the second value;

Optionally, referring to fig. 13, the apparatus further includes:

a detecting module 1204, configured to detect, when the graph data is undirected graph data, whether an adjacent node in an i-th order relationship path of the target node is the same as an ith node in the i-th order relationship path of each node;

the second determining module 1203 is configured to splice the i-order relationship path of each node with the i-order relationship path of the target node when an adjacent node in the i-order relationship path of the target node is different from the i-th node in the i-order relationship path of each node.

Optionally, referring to fig. 14, the apparatus further includes:

the storage module 1205 is configured to use the node information of each node as a row key, and to store those of the at least one multi-order relationship path of each node that have the same path order in the same row of the database.

It should be noted that: in the processing apparatus for graph data provided in the above embodiment, when implementing the processing method for graph data, only the division of each functional module is illustrated, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the processing apparatus for graph data provided in the foregoing embodiment and the processing method embodiment for graph data belong to the same concept, and details of a specific implementation process thereof are referred to in the method embodiment and are not described herein again.

FIG. 15 shows a block diagram of a computer device 1500 provided in an exemplary embodiment of the invention. The computer device 1500 may be a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The computer device 1500 may also be referred to by other names, such as user device, portable computer device, laptop computer device, or desktop computer device.

Generally, computer device 1500 includes: a processor 1501 and memory 1502.

Processor 1501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 1501 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 1501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.

The memory 1502 may include one or more computer-readable storage media, which may be non-transitory. The memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1502 is used to store at least one instruction for execution by processor 1501 to implement the graph data processing methods provided by method embodiments herein.

In some embodiments, computer device 1500 may also optionally include: a peripheral interface 1503 and at least one peripheral. The processor 1501, memory 1502, and peripheral interface 1503 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1503 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1504, touch screen display 1505, camera 1506, audio circuitry 1507, positioning assembly 1508, and power supply 1509.

The peripheral interface 1503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1501 and the memory 1502. In some embodiments, the processor 1501, memory 1502, and peripheral interface 1503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1501, the memory 1502, and the peripheral interface 1503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.

The Radio Frequency circuit 1504 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 1504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1504 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 1504 can communicate with other computer devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1504 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 1505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1505 is a touch display screen, the display screen 1505 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1501 as a control signal for processing. In this case, the display screen 1505 may also be used to provide at least one of virtual buttons and a virtual keyboard, also referred to as at least one of soft buttons and a soft keyboard. In some embodiments, there may be one display screen 1505, disposed on the front panel of the computer device 1500; in other embodiments, there may be at least two display screens 1505, each disposed on a different surface of the computer device 1500 or arranged in a folded design; in still other embodiments, the display screen 1505 may be a flexible display disposed on a curved surface or a folded surface of the computer device 1500. Furthermore, the display screen 1505 may be configured in a non-rectangular irregular shape, that is, as a shaped screen. The display screen 1505 can be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1506 is used to capture images or video. Optionally, the camera assembly 1506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the computer device, and the rear camera is disposed on the rear surface of the computer device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuitry 1507 may include a microphone and speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1501 for processing or inputting the electric signals to the radio frequency circuit 1504 to realize voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and located at different locations on the computing device 1500. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1501 or the radio frequency circuit 1504 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1507 may also include a headphone jack.

The positioning component 1508 is used to determine the current geographic location of the computer device 1500 for navigation or LBS (Location Based Service). The positioning component 1508 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

The power supply 1509 is used to supply power to the various components in the computer device 1500. The power supply 1509 may be alternating current, direct current, disposable or rechargeable. When the power supply 1509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, the computer device 1500 also includes one or more sensors 1510. The one or more sensors 1510 include, but are not limited to: acceleration sensor 1511, gyro sensor 1512, pressure sensor 1513, fingerprint sensor 1514, optical sensor 1515, and proximity sensor 1516.

The acceleration sensor 1511 can detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the computer apparatus 1500. For example, the acceleration sensor 1511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1501 may control the touch screen display 1505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1511. The acceleration sensor 1511 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 1512 may detect a body direction and a rotation angle of the computer device 1500, and the gyro sensor 1512 and the acceleration sensor 1511 cooperate to collect a 3D motion of the user on the computer device 1500. The processor 1501 may implement the following functions according to the data collected by the gyro sensor 1512: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

The pressure sensor 1513 may be disposed on a side bezel of the computer device 1500 and/or underneath the touch screen display 1505. When the pressure sensor 1513 is disposed on the side frame of the computer device 1500, the user's holding signal to the computer device 1500 may be detected, and the processor 1501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1513. When the pressure sensor 1513 is disposed at a lower layer of the touch display 1505, the processor 1501 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 1505. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 1514 is configured to capture a fingerprint of the user, and the processor 1501 identifies the user based on the fingerprint captured by the fingerprint sensor 1514, or the fingerprint sensor 1514 identifies the user based on the captured fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 1514 may be disposed on the front, back, or side of the computer device 1500. When a physical key or vendor Logo is provided on the computer device 1500, the fingerprint sensor 1514 may be integrated with the physical key or vendor Logo.

The optical sensor 1515 is used to collect ambient light intensity. In one embodiment, processor 1501 may control the brightness of the display on touch screen 1505 based on the intensity of ambient light collected by optical sensor 1515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1505 is turned down. In another embodiment, the processor 1501 may also dynamically adjust the shooting parameters of the camera assembly 1506 based on the ambient light intensity collected by the optical sensor 1515.

The proximity sensor 1516, also known as a distance sensor, is typically provided on the front panel of the computer device 1500. The proximity sensor 1516 is used to measure the distance between the user and the front of the computer device 1500. In one embodiment, when the proximity sensor 1516 detects that the distance between the user and the front of the computer device 1500 is gradually decreasing, the processor 1501 controls the touch display 1505 to switch from the screen-on state to the screen-off state; when the proximity sensor 1516 detects that the distance between the user and the front of the computer device 1500 is gradually increasing, the processor 1501 controls the touch display 1505 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the architecture shown in FIG. 15 is not intended to be limiting of the computer device 1500, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.

Embodiments of the present application further provide a non-transitory computer-readable storage medium, and when instructions in the storage medium are executed by a processor of a mobile computer device, the mobile computer device is enabled to execute the graph data processing method provided in the above-described illustrated embodiments.

Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, enable the computer to execute the method for processing graph data provided in the above-described illustrative embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for processing graph data, the method comprising:

2. The method of claim 1, wherein determining the multi-order relationship path for each node by an iterative concatenation algorithm based on at least one of the first order relationship path and the second order relationship path for each node comprises:

according to the target order to be determined, by formula

3. The method of claim 2, wherein determining the multi-order relationship path for each node by an iterative concatenation algorithm based on the target order, the first value, and at least one of the first order relationship path and the second order relationship path for each node comprises:

4. The method as claimed in claim 3, wherein said determining the multi-order relationship path of each node by an iterative concatenation algorithm based on the second-order relationship path of each node comprises:

determining whether the 2i is equal to the target order;

5. The method of claim 4, wherein determining the multi-order relationship path for each node by an iterative concatenation algorithm based on at least one of the first order relationship path and the second order relationship path for each node comprises:

determining a third value from the first value by the formula K = 2^(M-1), said K being said third value;

determining a difference between the target order and the second value;

6. The method according to claim 4 or 5, wherein when the graph data is undirected graph data, before the splicing the i-th order relationship path of each node with the i-th order relationship path of the target node, further comprising:

7. The method according to any one of claims 1-5, wherein after determining the multi-order relationship path for each node by an iterative concatenation algorithm based on at least one of the first order relationship path and the second order relationship path for each node, further comprising:

8. An apparatus for processing graph data, the apparatus comprising:

9. The apparatus of claim 8, wherein the second determination module is to:

according to the target order to be determined, by formula

10. The apparatus of claim 9, wherein the second determination module is to:

11. The apparatus of claim 10, wherein the second determination module is to:

determining whether the 2i is equal to the target order;

12. The apparatus of claim 11, wherein the second determination module is to:

determining a difference between the target order and the second value;

13. The apparatus of claim 11 or 12, wherein the apparatus further comprises:

14. The apparatus of any one of claims 8-12, further comprising:

15. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of any of the methods of claims 1-7.