CN112559631B - Data processing method and device of distributed graph database and electronic equipment

Info

Publication number: CN112559631B
Application number: CN202011480363.1A
Authority: CN
Inventors: 王益飞; 汪洋; 王宇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2023-09-26
Anticipated expiration: 2040-12-15
Also published as: CN112559631A

Abstract

The disclosure provides a data processing method and device of a distributed graph database and electronic equipment, relates to the technical field of computers, and particularly relates to artificial intelligence fields such as knowledge graphs and distributed storage. The specific implementation scheme is as follows: obtaining edge data to be added by a first node in a distributed graph database, wherein the edge data comprises an edge identifier and corresponding edge relationship data; extracting a node identifier of the first node from the edge identifier; acquiring, according to the node identifier, the identifier of the target fragment to which the first node belongs; and distributing, according to the identifier of the target fragment, the edge identifier and the corresponding edge relationship data to the target fragment of the distributed graph database for storage. This reduces the amount of computation for data distribution in the distributed graph database, saves the time and cost of data distribution, and improves data query efficiency.

Description

Data processing method and device of distributed graph database and electronic equipment

Technical Field

The disclosure relates to the technical field of computers, in particular to the technical field of artificial intelligence such as knowledge graph, distributed storage and the like, and particularly relates to a data processing method and device of a distributed graph database and electronic equipment.

Background

In the related art, when data is stored in a distributed graph database, a static distributed graph database is mainly handled with a community-division-type algorithm: the algorithm operates on all nodes and edges in the distributed graph database, determines the closeness between nodes, divides the distributed graph database into a plurality of subgraphs according to the closeness, and stores the data of each subgraph in the same fragment.

However, such an algorithm operates on the entire distributed graph database. For a graph entity newly added to the distributed graph database, the subgraph in which the graph entity is located can only be determined by combining the newly added graph entity with the entire distributed graph database, so the amount of computation is large, the time is long, and the cost is high.

Disclosure of Invention

The present disclosure provides a data processing method, apparatus, electronic device, storage medium, and computer program product for a distributed graph database.

According to an aspect of the present disclosure, there is provided a data processing method of a distributed graph database, including: obtaining edge data to be added by a first node in a distributed graph database, wherein the edge data comprises an edge identifier and corresponding edge relationship data; extracting a node identifier of the first node from the edge identifier; acquiring, according to the node identifier, the identifier of the target fragment to which the first node belongs; and distributing, according to the identifier of the target fragment, the edge identifier and the corresponding edge relationship data to the target fragment of the distributed graph database for storage.

According to another aspect of the present disclosure, there is provided a data processing apparatus of a distributed graph database, including: a first obtaining module, configured to obtain edge data to be added by a first node in the distributed graph database, where the edge data includes an edge identifier and corresponding edge relationship data; an extraction module, configured to extract the node identifier of the first node from the edge identifier; a second obtaining module, configured to obtain, according to the node identifier, the identifier of the target fragment to which the first node belongs; and a first distribution module, configured to distribute, according to the identifier of the target fragment, the edge identifier and the corresponding edge relationship data to the target fragment of the distributed graph database for storage.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data processing method of a distributed graph database as described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the data processing method of a distributed graph database as described above.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the data processing method of a distributed graph database as described above.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a flow diagram of a method of data processing of a distributed graph database according to a first embodiment of the present disclosure;

FIG. 2 is a flow diagram of a method of data processing of a distributed graph database according to a second embodiment of the present disclosure;

FIG. 3 is a flow diagram of a method of data processing of a distributed graph database according to a third embodiment of the present disclosure;

FIG. 4 is an example diagram of a data processing method of a distributed graph database according to a third embodiment of the present disclosure;

FIG. 5 is a block diagram of a data processing apparatus of a distributed graph database according to a fourth embodiment of the present disclosure;

FIG. 6 is a block diagram of a data processing apparatus of a distributed graph database according to a fifth embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device for implementing a data processing method of a distributed graph database in accordance with an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It can be understood that in the related art, when data is stored in a distributed graph database, a static distributed graph database is mainly handled with a community-division-type algorithm: the algorithm operates on each node and each edge in the distributed graph database, determines the closeness between nodes, divides the distributed graph database into a plurality of subgraphs according to the closeness, and stores the data of each subgraph in the same fragment.

In the method of the present disclosure, after the edge data to be added by a first node in a distributed graph database is obtained, the node identifier of the first node is first extracted from the edge identifier included in the edge data; the identifier of the target fragment to which the first node belongs is then obtained according to the node identifier; and the edge identifier included in the edge data and the corresponding edge relationship data are distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. This reduces the amount of computation for data distribution in the distributed graph database, saves the time and cost of data distribution, and improves data query efficiency.

Data processing methods, apparatuses, electronic devices, non-transitory computer readable storage media, and computer program products of a distributed graph database of embodiments of the present disclosure are described below with reference to the accompanying drawings.

First, a data processing method of the distributed graph database provided in the present disclosure will be described in detail with reference to fig. 1.

Fig. 1 is a flow diagram of a data processing method of a distributed graph database according to a first embodiment of the present disclosure. It should be noted that the data processing method of the distributed graph database provided in this embodiment may be applied to storing a knowledge graph in the distributed graph database. The method is executed by a data processing device of the distributed graph database, hereinafter referred to as the data processing device. The data processing device may be an electronic device, or may be configured in an electronic device, so as to process data of the distributed graph database, reduce the amount of computation for data distribution in the distributed graph database, save the time and cost of data distribution, and improve data query efficiency.

The electronic device may be any stationary or mobile computing device capable of performing data processing, for example, a mobile computing device such as a notebook computer, a smart phone, a wearable device, or a stationary computing device such as a desktop computer, or a server, or other types of computing devices, which is not limited in this disclosure.

As shown in fig. 1, the data processing method of the distributed graph database may include the following steps:

Step 101, obtaining edge data to be added by a first node in a distributed graph database, wherein the edge data comprises: an edge identifier and corresponding edge relationship data.

It will be appreciated that a graph database is generally based on the graph, a data structure from graph theory, and that a graph is made up of two kinds of core elements: nodes (also called points) together with the attributes on nodes, and edges (also called relationships) together with the attributes on edges.

In the embodiment of the disclosure, the first node is any node in the distributed graph database, and the edge data to be added by the first node is edge data of an edge to be added to the distributed graph database and having a connection relationship with the first node.

It should be noted that, since the edge having the connection relationship with the first node may include one or more edges, in the embodiment of the disclosure, the edge data may include edge data of one edge, or edge data of a plurality of edges, which is not limited in the disclosure.

The edge identifier is used for uniquely representing one edge having a connection relationship with the first node, and specifically may include an edge label, the node identifier of the first node, a direction parameter, and other attribute information. The edge label is the label of the edge and is used for uniquely identifying the edge; the direction parameter is used for representing the direction of the edge; and the node identifier of the first node is used for uniquely identifying the first node. The edge relationship data corresponds to the edge identifier and includes attribute information and the like of the edge corresponding to the edge identifier.

For example, assume that the graph entity includes the nodes "Zhang San" and "Li Si", and that between these nodes there are an edge pointing from "Zhang San" to "Li Si" with the attribute "teacher" and an edge pointing from "Li Si" to "Zhang San" with the attribute "student"; that is, "Zhang San" is the teacher of "Li Si". When the edge data corresponding to the node "Zhang San" is added to the distributed graph database, then for the edge pointing from the node "Zhang San" to the node "Li Si", the edge identifier includes the edge label of that edge, the node identifier "Zhang San", and the direction parameter "outgoing edge", and the corresponding edge relationship data includes "teacher"; for the edge pointing from the node "Li Si" to the node "Zhang San", the edge identifier includes the edge label of that edge, the node identifier "Zhang San", and the direction parameter "incoming edge", and the corresponding edge relationship data includes "student".
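By way of illustration only, the two pieces of edge data in this example might be represented as follows. This is a minimal sketch: the class name `EdgeData`, its field names, and the edge label strings are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass


# Hypothetical container for one piece of edge data; the field names are
# illustrative and do not come from the disclosure.
@dataclass(frozen=True)
class EdgeData:
    direction: str     # direction parameter: "outgoing edge" or "incoming edge"
    edge_label: str    # label that uniquely identifies the edge (made-up values below)
    node_id: str       # node identifier of the first node
    relationship: str  # edge relationship data, e.g. "teacher"


# Both edges are recorded as edge data of the first node "Zhang San",
# once as its outgoing edge and once as its incoming edge.
edges_of_zhang_san = [
    EdgeData("outgoing edge", "label_zhang_to_li", "Zhang San", "teacher"),
    EdgeData("incoming edge", "label_li_to_zhang", "Zhang San", "student"),
]
```

Both records carry the same node identifier, which is what later allows them to be routed to the same fragment.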

Step 102, extracting the node identification of the first node from the edge identification.

It can be understood that, when the edge identifier includes attribute information such as an edge tag, an identifier of the first node, and a direction parameter, after obtaining edge data to be added by the first node in the distributed graph database, the node identifier of the first node may be extracted from the edge identifier included in the edge data.

Step 103, obtaining the identification of the target fragment to which the first node belongs according to the node identification.

Step 104, distributing the edge identifier and the corresponding edge relationship data to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

The identifier of the target fragment is used for uniquely identifying the target fragment in the distributed graph database and can be set according to requirements.

In the embodiment of the disclosure, when data is stored in the distributed graph database, the data related to the same node is stored in the same fragment; that is, each node has a fragment to which it belongs. Note that the fragments to which different nodes belong may be the same or different, which is not limited in this disclosure.

Accordingly, after the node identifier of the first node is obtained, the identifier of the target fragment to which the first node belongs can be obtained according to the node identifier, and the edge identifier and the corresponding edge relationship data are then distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

Through the above process, after the edge data to be added by the first node in the distributed graph database is obtained, the edge data is distributed to the target fragment to which the first node belongs for storage. When the edge data of the first node is added to the distributed graph database, only the edge identifier in the edge data needs to be operated on, and there is no need to combine the newly added graph entity with the entire distributed graph database. This reduces the amount of computation for data distribution in the distributed graph database and saves the time and cost of data distribution.

It may be appreciated that node data of the first node may also be stored in the target fragment to which the first node belongs. The node data may include the node identifier of the first node and corresponding node attribute data. For example, the node data of the first node may include the node identifier "1" and attribute data such as the age and sex of the user corresponding to the node "1".

In the embodiment of the disclosure, since the node data of the first node, the edge data of the first node and other data related to the first node are stored on the target fragment to which the first node belongs, when the data related to the first node is queried in the distributed graph database, the node data of the first node and the edge data of all edges related to the first node can be obtained only by initiating a query request to one fragment, namely the target fragment, so that the query efficiency of the data in the distributed graph database is improved.

According to the data processing method of the distributed graph database, edge data to be added of a first node in the distributed graph database are firstly obtained, wherein the edge data comprise edge identifiers and corresponding edge relationship data, then node identifiers of the first node are extracted from the edge identifiers comprised by the edge data, identifiers of target fragments to which the first node belongs are obtained according to the node identifiers, and then the edge identifiers comprised by the edge data and the corresponding edge relationship data are distributed to the target fragments of the distributed graph database for storage according to the identifiers of the target fragments. Therefore, the calculation amount of data distribution in the distributed graph database is reduced, the time and the cost of data distribution are saved, and the query efficiency of the data in the distributed graph database is improved.
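Steps 101 to 104 can be sketched as a single routing function. This is only an illustrative outline: the function `distribute_edge`, the in-memory `fragments` dictionary, and the two toy helper functions are assumptions introduced here, and the toy helpers merely stand in for whatever extraction and fragment-lookup logic an implementation uses.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the fragments of the database:
# fragment identifier -> list of (edge identifier, edge relationship data).
Fragments = Dict[int, List[Tuple[str, str]]]


def distribute_edge(
    fragments: Fragments,
    edge_id: str,
    relationship: str,
    extract_node_id: Callable[[str], str],
    fragment_of: Callable[[str], int],
) -> int:
    """Store one piece of edge data on the fragment of its first node."""
    node_id = extract_node_id(edge_id)   # step 102
    fragment_id = fragment_of(node_id)   # step 103
    fragments.setdefault(fragment_id, []).append((edge_id, relationship))  # step 104
    return fragment_id


# Toy stand-ins, chosen only so that the sketch runs; they are assumptions.
def toy_extract(edge_id: str) -> str:
    return edge_id.rsplit("_", 1)[-1]


def toy_fragment_of(node_id: str) -> int:
    return len(node_id) % 4


fragments: Fragments = {}
target = distribute_edge(
    fragments, "outgoing edge_label1_id1", "teacher", toy_extract, toy_fragment_of
)
```

Note that the routing decision depends only on the edge identifier itself; no other node or edge of the graph is consulted.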

As can be seen from the above analysis, in the embodiment of the present disclosure, after the edge data to be added by the first node in the distributed graph database is obtained, the node identifier of the first node may be extracted from the edge identifier included in the edge data, the identifier of the target fragment to which the first node belongs is then obtained according to the node identifier, and the edge identifier and the corresponding edge relationship data are then distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. In the following, with reference to fig. 2, the process of extracting the node identifier of the first node from the edge identifier included in the edge data, and of obtaining the identifier of the target fragment to which the first node belongs according to the node identifier, is further described.

Fig. 2 is a flow diagram of a data processing method of a distributed graph database according to a second embodiment of the present disclosure. As shown in fig. 2, the data processing method of the distributed graph database may include the following steps:

Step 201, obtaining edge data to be added by a first node in a distributed graph database, wherein the edge data comprises: an edge identifier and corresponding edge relationship data.

The specific implementation process and principle of the above step 201 may refer to the description of the above embodiment, which is not repeated herein.

Step 202, obtaining a first spacer, which is the last-ordered spacer in the edge identifier.

Step 203, obtaining the content located after the first spacer in the edge identifier.

Step 204, determining the content as the node identifier of the first node.

In an exemplary embodiment, the edge identifier may be formed by concatenating the following parameters: the direction parameter, the edge label, and the node identifier of the first node, with a spacer arranged between each two adjacent parameters.

The edge label is the label of the edge corresponding to the edge identifier and is used for uniquely identifying the edge; the direction parameter is used for representing the direction of the edge corresponding to the edge identifier, such as an outgoing edge, where the first node points to another node, or an incoming edge, where another node points to the first node; the node identifier of the first node is used for uniquely identifying the first node. The spacer may be any character capable of separating two adjacent parameters and may be set as required; for example, it may be "_".

For example, assuming that the edge tag included in the edge identifier is "label1", the node identifier of the first node is "id1", and the edge corresponding to the edge identifier is the outgoing edge of the first node pointing to other nodes, the edge identifier may be "outgoing edge_label1_id1"; assuming that the edge label included by the edge identifier is "label2", the node identifier of the first node is "id2", and the edge corresponding to the edge identifier is the incoming edge of the other nodes pointing to the first node, the edge identifier may be "incoming edge_label2_id2".

Accordingly, in an exemplary embodiment, the first spacer, that is, the last-ordered spacer in the edge identifier, may be obtained, and the content located after the first spacer in the edge identifier may then be obtained and determined to be the node identifier of the first node.

By obtaining the last-ordered first spacer in the edge identifier, obtaining the content located after it, and determining that content to be the node identifier of the first node, the node identifier of the first node is extracted from the edge identifier accurately and with little computation. This further reduces the amount of computation for data distribution in the distributed graph database and saves the time and cost of data distribution.
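Steps 202 to 204 amount to a split at the last spacer. A minimal sketch follows, assuming the spacer "_" from the example above; the function name `extract_node_id` and the error handling for a missing spacer are illustrative additions.

```python
SPACER = "_"  # the example spacer from the text; any separating character could be used


def extract_node_id(edge_id: str, spacer: str = SPACER) -> str:
    """Return the content located after the last spacer in the edge identifier."""
    last = edge_id.rfind(spacer)  # position of the last-ordered spacer (step 202)
    if last == -1:
        # Behaviour for a malformed identifier is an assumption of this sketch.
        raise ValueError(f"no spacer {spacer!r} in edge identifier {edge_id!r}")
    return edge_id[last + len(spacer):]  # content after it (steps 203 and 204)


out_node = extract_node_id("outgoing edge_label1_id1")
in_node = extract_node_id("incoming edge_label2_id2")
```

Because only the last spacer is consulted, the edge label may itself contain the spacer character without affecting the result; the sketch does assume that the node identifier does not contain it.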

Step 205, obtaining binary data corresponding to the node identifier.

Step 206, performing a hash operation on the binary data to obtain the identifier of the target fragment to which the first node belongs.

In an exemplary embodiment, after the node identifier of the first node is obtained, binary data corresponding to the node identifier of the first node may be obtained, and hash operation is performed on the binary data, where a result of the hash operation is an identifier of a target fragment to which the first node belongs.

The hash operation process of the binary data corresponding to the node identifier of the first node may refer to a hash operation technology in the related art, which is not described herein.

Through the process, the identification of the target fragment to which the first node belongs is obtained according to the node identification of the first node, the operation amount of the process for obtaining the identification of the target fragment to which the first node belongs is less, the operation amount of data distribution in a distributed graph database is further reduced, and the time and cost of data distribution are saved.
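Steps 205 and 206 might look as follows. The disclosure does not name a hash function or say how the hash result relates to the number of fragments, so the UTF-8 encoding, the use of CRC32, the modulo reduction, and the fragment count `NUM_FRAGMENTS` are all assumptions of this sketch.

```python
import zlib

NUM_FRAGMENTS = 8  # assumed number of fragments; not specified in the disclosure


def fragment_id_for(node_id: str, num_fragments: int = NUM_FRAGMENTS) -> int:
    """Map a node identifier to the identifier of its target fragment."""
    binary = node_id.encode("utf-8")  # step 205: binary data of the node identifier
    # Step 206: hash the binary data. CRC32 and the modulo reduction are
    # illustrative choices, not taken from the disclosure.
    return zlib.crc32(binary) % num_fragments


first = fragment_id_for("id1")
second = fragment_id_for("id1")
```

The property that matters is determinism: the same node identifier always yields the same fragment identifier, so every piece of data keyed by that node lands on one fragment.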

Step 207, distributing the edge identifier and the corresponding edge relationship data to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

Specifically, after the identifier of the target fragment to which the first node belongs is obtained, the edge identifier and the corresponding edge relationship data can be distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

According to the data processing method of the distributed graph database of this embodiment, after the edge data to be added by a first node in the distributed graph database is obtained, the edge data including an edge identifier and corresponding edge relationship data, the first spacer, which is the last-ordered spacer in the edge identifier, is obtained; the content located after the first spacer in the edge identifier is then obtained and determined to be the node identifier of the first node; binary data corresponding to the node identifier is obtained, and a hash operation is performed on the binary data to obtain the identifier of the target fragment to which the first node belongs; and the edge identifier and the corresponding edge relationship data are distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. This reduces the amount of computation for data distribution in the distributed graph database, saves the time and cost of data distribution, and improves data query efficiency.

As can be seen from the above analysis, in the embodiment of the present disclosure, when the edge data of the first node is added to the distributed graph database, the edge identifier included in the edge data and the corresponding edge relationship data may be distributed to the target fragment to which the first node belongs for storage. In one possible implementation, the node data of the first node may also be stored on the target fragment to which the first node belongs. The process of storing the node data of the first node on that target fragment is further described below with reference to fig. 3.

Fig. 3 is a flow chart of a data processing method of a distributed graph database according to a third embodiment of the present disclosure. As shown in fig. 3, the data processing method of the distributed graph database may include the following steps:

Step 301, obtaining node data of a first node to be added to a distributed graph database, wherein the node data includes: a node identifier and corresponding node attribute data.

The node identifier of the first node is used for uniquely identifying the first node, and can be set according to requirements. The node attribute data corresponding to the first node may include attribute data such as age, sex, etc. of the user corresponding to the first node.

Step 302, obtaining the identification of the target fragment to which the first node belongs according to the node identification.

Step 303, distributing the node identifier and the corresponding node attribute data to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

After the node data of the first node to be added to the distributed graph database is obtained, the identifier of the target fragment to which the first node belongs can be obtained according to the node identifier included in the node data, and then the node identifier and the corresponding node attribute data are distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

Through the above process, the node data of the first node to be added to the distributed graph database is distributed, according to the node identifier included in that node data, to the target fragment to which the first node belongs for storage. When the node data of the first node is added to the distributed graph database, only the node identifier in the node data needs to be operated on, which reduces the amount of computation for data distribution in the distributed graph database and saves the time and cost of data distribution.

Step 304, obtaining edge data to be added by a first node in a distributed graph database, wherein the edge data comprises: edge identification and corresponding edge relationship data.

Step 305, extracting the node identifier of the first node from the edge identifier.

Step 306, obtaining the identification of the target fragment to which the first node belongs according to the node identification.

Step 307, distributing the edge identifier and the corresponding edge relationship data to the target fragment of the distributed graph database for storage according to the identifier of the target fragment.

The specific implementation process and principle of the steps 304-307 may refer to the description of the foregoing embodiments, which is not repeated herein.

It should be noted that, in an exemplary embodiment, steps 304-307 may be performed after step 303, or steps 304-307 may be performed simultaneously with steps 301-303, and the execution timing of the above steps is not limited in the embodiments of the present disclosure.

Through the process, the edge data to be added by the first node and the node data of the first node in the distributed graph database can be distributed to the target fragments to which the first node belongs for storage.
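The co-location of node data and edge data can be illustrated with a small in-memory model. The class `ShardedGraph`, the fragment layout, the CRC32-modulo hash, and the sample attribute values are hypothetical and serve only to show that both write paths derive the fragment from the same node identifier.

```python
import zlib
from collections import defaultdict
from typing import Dict

NUM_FRAGMENTS = 4  # assumed fragment count


def fragment_id_for(node_id: str) -> int:
    # CRC32 modulo the fragment count is an illustrative hash choice.
    return zlib.crc32(node_id.encode("utf-8")) % NUM_FRAGMENTS


class ShardedGraph:
    """Toy in-memory model: each fragment holds node data and edge data."""

    def __init__(self) -> None:
        self.fragments: Dict[int, dict] = defaultdict(
            lambda: {"nodes": {}, "edges": {}}
        )

    def add_node(self, node_id: str, attributes: dict) -> int:
        fid = fragment_id_for(node_id)  # step 302
        self.fragments[fid]["nodes"][node_id] = attributes  # step 303
        return fid

    def add_edge(self, edge_id: str, relationship: str) -> int:
        node_id = edge_id.rsplit("_", 1)[-1]  # step 305
        fid = fragment_id_for(node_id)  # step 306
        self.fragments[fid]["edges"][edge_id] = relationship  # step 307
        return fid


graph = ShardedGraph()
node_fragment = graph.add_node("1", {"age": 30})  # sample attribute, made up
edge_fragment = graph.add_edge("outgoing edge_label1_1", "teacher")
```

Since both methods hash the same node identifier, `node_fragment` and `edge_fragment` are necessarily equal, regardless of the order in which the two calls are made.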

It will be appreciated that, in a graph entity to be added to the distributed graph database, each node typically has an original identifier (also referred to as an offline identifier), while in the distributed graph database, for convenience of data management, a corresponding internal identifier (also referred to as an online identifier) may be set for each node, and the mapping relationship between the offline identifier and the online identifier of each node is stored in the distributed graph database. For example, assuming that the graph entity includes 5 nodes in total whose offline identifiers are "node A", "Zhang San", "Li Si", "node B" and "node C", corresponding online identifiers "1", "2", "3", "4" and "5" may be set for the nodes for convenience of data management, and the mapping relationship between the offline identifiers and the online identifiers of the 5 nodes may be stored in the distributed graph database.
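The mapping in this example can be written out directly. The sketch below assumes online identifiers are assigned sequentially in the order the nodes are listed, which matches the example but is not stated as a requirement; the function name `assign_online_ids` is hypothetical.

```python
from itertools import count
from typing import Dict, Iterable


def assign_online_ids(offline_ids: Iterable[str]) -> Dict[str, str]:
    """Assign sequential online identifiers "1", "2", ... to offline identifiers."""
    counter = count(1)
    return {offline: str(next(counter)) for offline in offline_ids}


offline_to_online = assign_online_ids(
    ["node A", "Zhang San", "Li Si", "node B", "node C"]
)
```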

In the embodiment of the disclosure, the node identifier of the first node may be an online identifier of the first node in the distributed graph database.

Referring to fig. 4, in the embodiment of the present disclosure, when node data of any node is added to the distributed graph database, the identifier of the target fragment to which the node belongs may be obtained according to the online identifier of the node included in the node data, and the node identifier included in the node data and the corresponding node attribute data are then distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. When edge data of any node, including the direction parameter "outgoing edge" or "incoming edge", the edge label, the online identifier of the node, and the like, is added to the distributed graph database, the online identifier of the node can be extracted from the edge data, the identifier of the target fragment to which the node belongs is obtained according to the online identifier of the node, and the edge data is then distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. Because the identifier of the target fragment is determined according to the online identifier of the node both when node data and when edge data of the node are stored, the node data and the edge data of the node can be stored on the same fragment of the distributed graph database.

When the mapping relationship between the offline identifier and the online identifier of a node is stored in the distributed graph database, the identifier of the target fragment can be obtained according to the offline identifier of the node, and the mapping relationship is then stored on the target fragment determined according to the offline identifier. That is, in the embodiment of the present disclosure, the mapping relationship between the offline identifier and the online identifier of a node may be stored on a different fragment from the node data and the edge data of the node.
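The two placements can be contrasted in a short sketch. The hash (CRC32 modulo an assumed fragment count) and the function name `place_node` are illustrative; the point shown is only that the mapping is keyed by the offline identifier while the node data and edge data are keyed by the online identifier.

```python
import zlib
from typing import Dict

NUM_FRAGMENTS = 4  # assumed fragment count


def fragment_id_for(identifier: str) -> int:
    # CRC32 modulo the fragment count is an illustrative hash choice.
    return zlib.crc32(identifier.encode("utf-8")) % NUM_FRAGMENTS


def place_node(offline_id: str, online_id: str) -> Dict[str, int]:
    """Return where the mapping and the data of one node would be stored."""
    return {
        # The offline -> online mapping is placed by hashing the offline identifier.
        "mapping_fragment": fragment_id_for(offline_id),
        # Node data and edge data are placed by hashing the online identifier.
        "data_fragment": fragment_id_for(online_id),
    }


placement = place_node("Zhang San", "2")
```

The two fragment identifiers are computed independently, so they may or may not coincide for a given node.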

It may be appreciated that, in the embodiment of the present disclosure, the node data and the edge data of the same node are stored on the same fragment, which is determined according to the node identifier of the node. Therefore, when data related to a node is queried in the distributed graph database (for example, the node data of the node, the edge data of the node, or both), the identifier of the to-be-queried fragment to which the node to be queried belongs may be obtained according to the node identifier of the node to be queried, and the node data and/or the edge data of the node to be queried may then be obtained from that fragment according to its identifier.

That is, the data processing method of the distributed graph database provided by the present disclosure may further include the following steps:

acquiring a node identifier of a node to be queried;

acquiring the identification of the to-be-queried fragment to which the node to be queried belongs according to the node identification of the node to be queried;

and acquiring node data and/or edge data of the node to be queried from the fragment to be queried according to the identification of the fragment to be queried.

The node identifier of the node to be queried may be an online identifier of the node to be queried in the distributed graph database.

It should be noted that the distributed graph database stores a mapping relationship between the offline identifier and the online identifier of each node of the graph entity. When the obtained node identifier of the node to be queried is not the online identifier of the node, the online identifier of the node to be queried may first be obtained according to this stored mapping relationship, and the identifier of the to-be-queried fragment to which the node to be queried belongs is then obtained according to the online identifier of the node to be queried.
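The two-step lookup described above (offline identifier to online identifier, then online identifier to fragment) might look like the following sketch. The fragment count, the CRC32 hash, the in-memory mapping tables, and the names `store_mapping` and `resolve_query_fragment` are assumptions made for illustration only.

```python
import zlib

NUM_FRAGMENTS = 4  # assumed fragment count, for illustration only

# Per-fragment table holding offline-identifier -> online-identifier mappings.
mapping_tables = [dict() for _ in range(NUM_FRAGMENTS)]


def fragment_id_for(identifier: str) -> int:
    """Hash any identifier to a fragment (CRC32 is an assumed hash)."""
    return zlib.crc32(identifier.encode("utf-8")) % NUM_FRAGMENTS


def store_mapping(offline_id: str, online_id: str) -> None:
    """Store the mapping on the fragment chosen by the OFFLINE identifier."""
    mapping_tables[fragment_id_for(offline_id)][offline_id] = online_id


def resolve_query_fragment(node_id: str, is_online: bool) -> tuple:
    """Return (online_id, fragment_id) for the node to be queried."""
    if not is_online:
        # Look up the online identifier on the fragment that owns the mapping.
        node_id = mapping_tables[fragment_id_for(node_id)][node_id]
    # The data fragment is always derived from the ONLINE identifier.
    return node_id, fragment_id_for(node_id)
```

Note that the fragment holding the mapping and the fragment holding the node's data are computed from different identifiers, so they are not guaranteed to coincide.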

It should be noted that, in the process of obtaining the identifier of the to-be-queried fragment to which the to-be-queried node belongs according to the node identifier of the to-be-queried node, reference may be made to the method for obtaining the identifier of the target fragment to which the first node belongs according to the node identifier of the first node in the above embodiment, which is not described herein.

The identifier of the to-be-queried fragment to which the node to be queried belongs is obtained according to the node identifier of the node to be queried, and the node data and/or the edge data of the node to be queried are then obtained from the to-be-queried fragment according to that identifier, so that the data related to the node is queried in the distributed graph database. Because the node data and the edge data of the same node are stored on the same fragment, which is determined according to the node identifier of the node, a query request needs to be initiated only to the to-be-queried fragment in order to obtain the node data, the edge data, or both, which improves the query efficiency of the data in the distributed graph database.
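The single-fragment query can be illustrated with the sketch below. It assumes a hypothetical layout in which each fragment keeps a node table and an edge table in memory, uses CRC32 as a stand-in hash, and assumes that node identifiers do not contain the spacer; the names `put` and `query` are illustrative only.

```python
import zlib

NUM_FRAGMENTS = 4  # assumed fragment count, for illustration only
SPACER = "_"       # assumed spacer between the parts of an edge key

# Each fragment is modeled as two in-memory tables (hypothetical layout).
fragments = [{"nodes": {}, "edges": {}} for _ in range(NUM_FRAGMENTS)]


def fragment_id_for(online_id: str) -> int:
    """Map an online identifier to a fragment (CRC32 is an assumed hash)."""
    return zlib.crc32(online_id.encode("utf-8")) % NUM_FRAGMENTS


def put(online_id: str, attributes: dict, edges: dict) -> None:
    """Store a node and its edges on the fragment owned by the node."""
    frag = fragments[fragment_id_for(online_id)]
    frag["nodes"][online_id] = attributes
    for (direction, label), relation in edges.items():
        frag["edges"][SPACER.join([direction, label, online_id])] = relation


def query(online_id: str, want_node: bool = True, want_edges: bool = True) -> dict:
    """Fetch node data and/or edge data by contacting a single fragment."""
    frag = fragments[fragment_id_for(online_id)]  # the only fragment contacted
    result = {}
    if want_node:
        result["node"] = frag["nodes"].get(online_id)
    if want_edges:
        # Keep only edge keys whose trailing part is this node's identifier.
        result["edges"] = {
            key: value
            for key, value in frag["edges"].items()
            if key.rpartition(SPACER)[2] == online_id
        }
    return result
```

The `want_node` and `want_edges` flags correspond to the "node data and/or edge data" wording: whichever combination is requested, no other fragment has to be consulted.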

It should be noted that, when node data and edge data of a node are stored on a fragment of the distributed graph database, the node data and the edge data may be stored in a key-value format. Key-value storage is a manner of storing data as key-value pairs, in which each key field corresponds to a unique field value.

For example, in the distributed graph database, the mapping relationship between the offline identifier and the online identifier of the node may be stored in the following manner:

key: offline identifier of the node, value: online identifier of the node

That is, for any node, the key field may store an offline identification of the node, and the field value corresponding to the key field may store an online identification of the node.

In the distributed graph database, node data of nodes may be stored in the following manner:

key: online identifier of the node, value: node attribute data

That is, for any node, the key field may store an online identifier of the node, and the field value corresponding to the key field may store node attribute data of the node.

In the distributed graph database, the edge data of the nodes may be stored in the following manner:

key: direction parameter_edge label_node internal identifier, value: edge relationship data

That is, for any node, the key field may store the edge identifier of the node, which includes information such as the direction parameter, the edge label, and the internal identifier of the node, and the field value corresponding to the key field may store the edge relationship data of the node.
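The three key-value layouts listed above can be written down as small helper functions. This is a hedged sketch: the underscore spacer follows the layout shown in the text, while the function names and the use of Python tuples and dictionaries for keys and values are assumptions made for illustration.

```python
SPACER = "_"  # spacer between the parts of an edge key, as in the layout above


def mapping_entry(offline_id: str, online_id: str) -> tuple:
    """key: offline identifier of the node, value: online identifier."""
    return offline_id, online_id


def node_entry(online_id: str, attributes: dict) -> tuple:
    """key: online identifier of the node, value: node attribute data."""
    return online_id, attributes


def edge_entry(direction: str, label: str, node_id: str, relation: dict) -> tuple:
    """key: direction, edge label and node identifier joined by the spacer."""
    return SPACER.join([direction, label, node_id]), relation
```

For example, `edge_entry("out", "follows", "1001", {"to": "1002"})` yields the key `out_follows_1001`, so the node identifier always sits after the last spacer of the edge key.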

In the data processing method of the distributed graph database provided by the embodiment of the disclosure, node data of a first node to be added to the distributed graph database is obtained, where the node data includes a node identifier and corresponding node attribute data; the identifier of the target fragment to which the first node belongs is obtained according to the node identifier; and the node identifier and the corresponding node attribute data are distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. Edge data to be added by the first node in the distributed graph database is also obtained, where the edge data includes an edge identifier and corresponding edge relationship data; the node identifier of the first node is extracted from the edge identifier; the identifier of the target fragment to which the first node belongs is obtained according to the node identifier; and the edge identifier and the corresponding edge relationship data are distributed to the target fragment of the distributed graph database for storage according to the identifier of the target fragment. Therefore, the node data of any node to be added to the distributed graph database and the edge data to be added by that node are stored on the same fragment to which the node belongs, so that the amount of computation during data distribution is small, the data distribution time is short, the data distribution cost is low, and the data query efficiency is high.

A data processing apparatus of a distributed graph database provided by the present disclosure is described below with reference to fig. 5.

Fig. 5 is a schematic structural view of a data processing apparatus of a distributed graph database according to a fourth embodiment of the present disclosure.

As shown in fig. 5, a data processing apparatus 500 of a distributed graph database provided in the present disclosure includes: a first acquisition module 501, an extraction module 502, a second acquisition module 503, a first distribution module 504.

The first obtaining module 501 is configured to obtain edge data to be added by a first node in the distributed graph database, where the edge data includes: edge identification and corresponding edge relation coefficient data;

an extracting module 502, configured to extract a node identifier of the first node from the edge identifiers;

a second obtaining module 503, configured to obtain, according to the node identifier, an identifier of a target fragment to which the first node belongs; and

the first distribution module 504 is configured to distribute, according to the identifier of the target fragment, the edge identifier and the corresponding edge relationship coefficient data to the target fragment of the distributed graph database for storage.

It should be noted that, the data processing device of the distributed graph database provided in this embodiment may execute the data processing method of the distributed graph database described in the foregoing embodiment. The data processing device of the distributed graph database can be electronic equipment or can be configured in the electronic equipment so as to realize data processing of the distributed graph database, reduce the operation amount of data distribution in the distributed graph database, save the time and the cost of data distribution and improve the query efficiency of the data.

It should be noted that the foregoing description of the embodiments of the data processing method of the distributed graph database is also applicable to the data processing apparatus of the distributed graph database provided in the present disclosure, and will not be repeated herein.

According to the data processing device of the distributed graph database, edge data to be added of a first node in the distributed graph database are firstly obtained, wherein the edge data comprise edge identifiers and corresponding edge relationship data, then node identifiers of the first node are extracted from the edge identifiers comprised by the edge data, identifiers of target fragments to which the first node belongs are obtained according to the node identifiers, and then the edge identifiers comprised by the edge data and the corresponding edge relationship data are distributed to the target fragments of the distributed graph database for storage according to the identifiers of the target fragments. Therefore, the calculation amount of data distribution in the distributed graph database is reduced, the time and the cost of data distribution are saved, and the data query efficiency is improved.

A data processing apparatus of a distributed graph database provided by the present disclosure is described below with reference to fig. 6.

Fig. 6 is a schematic structural view of a data processing apparatus of a distributed graph database according to a fifth embodiment of the present disclosure.

As shown in fig. 6, the data processing apparatus 600 of the distributed graph database may specifically include: a first acquisition module 601, an extraction module 602, a second acquisition module 603, and a first distribution module 604, wherein 601 to 604 in fig. 6 have the same functions and structures as 501 to 504 in fig. 5.

In an exemplary embodiment, the edge identifier is obtained by concatenating the following parameters: a direction parameter, an edge label, and the node identifier of the first node, with a spacer disposed between every two adjacent parameters. Correspondingly, as shown in fig. 6, the extracting module 602 includes:

a first acquiring unit 6021 for acquiring a first spacer, namely the spacer positioned last in the edge identifier;

a second acquisition unit 6022 for acquiring the content located after the first spacer in the edge identifier; and

a determining unit 6023 for determining the content as a node identification of the first node.
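The behavior of units 6021 to 6023 can be sketched as a single function that finds the last spacer and returns what follows it. The underscore spacer and the function name `extract_node_id` are assumptions for illustration; the sketch also assumes that the node identifier itself never contains the spacer.

```python
SPACER = "_"  # assumed spacer character


def extract_node_id(edge_id: str, spacer: str = SPACER) -> str:
    """Return the content located after the last spacer in the edge identifier."""
    # rpartition splits on the LAST occurrence of the spacer.
    _head, found, tail = edge_id.rpartition(spacer)
    if not found:
        raise ValueError(f"edge identifier {edge_id!r} contains no spacer")
    return tail
```

Splitting on the last spacer rather than the first means an edge label that itself contains the spacer (for example `works_at`) does not affect the extracted node identifier.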

In an exemplary embodiment, as shown in fig. 6, the second obtaining module 603 includes:

a third acquiring unit 6031 for acquiring binary data corresponding to the node identifier;

The fourth obtaining unit 6032 is configured to perform hash operation on the binary data, and obtain an identifier of the target fragment to which the first node belongs.
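Units 6031 and 6032 can be illustrated by the sketch below, which converts the node identifier to bytes and hashes them to a fragment number. The disclosure does not name a specific hash function or encoding here, so the UTF-8 encoding, the CRC32 hash, the modulo reduction, and the function name `target_fragment_id` are all assumptions.

```python
import zlib


def target_fragment_id(node_id: str, num_fragments: int) -> int:
    """Hash the binary form of a node identifier to a fragment number."""
    if num_fragments <= 0:
        raise ValueError("num_fragments must be positive")
    binary = node_id.encode("utf-8")  # binary data for the identifier (assumed encoding)
    digest = zlib.crc32(binary)       # unsigned 32-bit hash (assumed hash function)
    return digest % num_fragments     # fragment identifier in [0, num_fragments)
```

Any deterministic hash would serve the same purpose: the same node identifier always maps to the same fragment, with no lookup table needed at distribution time.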

In an exemplary embodiment, as shown in fig. 6, the apparatus may further include:

a third obtaining module 605 is configured to obtain node data of a first node to be added to the distributed graph database, where the node data includes: node identification and corresponding node attribute data;

a fourth obtaining module 606, configured to obtain, according to the node identifier, an identifier of a target fragment to which the first node belongs;

the second distributing module 607 is configured to distribute, according to the identifier of the target fragment, the node identifier and the corresponding node attribute data to the target fragment of the distributed graph database for storage;

a fifth obtaining module 608, configured to obtain a node identifier of a node to be queried;

a sixth obtaining module 609, configured to obtain, according to the node identifier of the node to be queried, an identifier of a fragment to be queried to which the node to be queried belongs;

and a seventh obtaining module 610, configured to obtain node data and/or edge data of the node to be queried from the fragment to be queried according to the identifier of the fragment to be queried.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as a data processing method of a distributed graph database. For example, in some embodiments, the data processing method of the distributed graph database may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the data processing method of the distributed graph database described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the data processing method of the distributed graph database in any other suitable way (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.

The disclosure relates to the technical field of computers, in particular to the technical field of artificial intelligence such as knowledge graphs and distributed storage.

It should be noted that artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and other major directions.

According to the technical scheme of the embodiment of the disclosure, first edge data to be added of a first node in a distributed graph database is obtained, wherein the edge data comprises edge identifiers and corresponding edge relationship coefficient data, then node identifiers of the first node are extracted from the edge identifiers included in the edge data, identifiers of target fragments to which the first node belongs are obtained according to the node identifiers, and then the edge identifiers included in the edge data and the corresponding edge relationship coefficient data are distributed to the target fragments of the distributed graph database for storage according to the identifiers of the target fragments. Therefore, the calculation amount of data distribution in the distributed graph database is reduced, the time and the cost of data distribution are saved, and the data query efficiency is improved.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A data processing method of a distributed graph database, comprising:

obtaining edge data to be added by a first node in a distributed graph database, wherein the edge data comprises: edge identification and corresponding edge relation coefficient data; the edge mark is obtained by splicing the following parameters: direction parameters, edge labels, and node identification of the first node; and a spacer is arranged between two adjacent parameters;

extracting the node identifier of the first node from the edge identifier, including: obtaining a first spacer that is positioned last in the edge identifier, obtaining the content located after the first spacer in the edge identifier, and determining the content as the node identifier of the first node;

acquiring the identification of the target fragment to which the first node belongs according to the node identification; and

and distributing the edge identifier and the corresponding edge relationship coefficient data to the target fragments of the distributed graph database for storage according to the identifiers of the target fragments.

2. The method of claim 1, wherein the obtaining, according to the node identifier, the identifier of the target fragment to which the first node belongs includes:

acquiring binary data corresponding to the node identification;

and carrying out hash operation on the binary data to obtain the identification of the target fragment to which the first node belongs.

3. The method of claim 1, wherein prior to obtaining edge data to be added by the first node in the distributed graph database, further comprising:

obtaining node data of the first node to be added to the distributed graph database, wherein the node data comprises: node identification and corresponding node attribute data;

and distributing the node identification and the corresponding node attribute data to the target fragments of the distributed graph database for storage according to the identifications of the target fragments.

4. A method according to any one of claims 1-3, further comprising:

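acquiring a node identifier of a node to be queried;

acquiring, according to the node identifier of the node to be queried, an identifier of a to-be-queried fragment to which the node to be queried belongs; and

acquiring node data and/or edge data of the node to be queried from the to-be-queried fragment according to the identifier of the to-be-queried fragment.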

5. A data processing apparatus of a distributed graph database, comprising:

the first obtaining module is configured to obtain edge data to be added by a first node in the distributed graph database, where the edge data includes: edge identification and corresponding edge relation coefficient data; the edge mark is obtained by splicing the following parameters: direction parameters, edge labels, and node identification of the first node; and a spacer is arranged between two adjacent parameters;

the extraction module is used for extracting the node identification of the first node from the edge identification;

The second acquisition module is used for acquiring the identification of the target fragment to which the first node belongs according to the node identification; and

the first distribution module is used for distributing the edge identifier and the corresponding edge relationship coefficient data to the target fragments of the distributed graph database for storage according to the identifiers of the target fragments;

wherein, the extraction module includes:

a first obtaining unit, configured to obtain a first spacer ordered at the last in the edge identifier;

a second obtaining unit, configured to obtain content located after the first spacer in the edge identifier; and

and the determining unit is used for determining the content as the node identification of the first node.

6. The apparatus of claim 5, wherein the second acquisition module comprises:

a third obtaining unit, configured to obtain binary data corresponding to the node identifier;

and a fourth obtaining unit, configured to perform hash operation on the binary data, and obtain an identifier of a target fragment to which the first node belongs.

7. The apparatus of claim 5, further comprising:

a third obtaining module, configured to obtain node data of the first node to be added to the distributed graph database, where the node data includes: node identification and corresponding node attribute data;

A fourth obtaining module, configured to obtain, according to the node identifier, an identifier of a target fragment to which the first node belongs; and

and the second distributing module is used for distributing the node identification and the corresponding node attribute data to the target fragments of the distributed graph database for storage according to the identifications of the target fragments.

8. The apparatus of any of claims 5-7, further comprising:

a fifth obtaining module, configured to obtain a node identifier of a node to be queried;

a sixth obtaining module, configured to obtain, according to a node identifier of the node to be queried, an identifier of a fragment to be queried to which the node to be queried belongs;

and a seventh acquisition module, configured to acquire node data and/or edge data of the node to be queried from the fragment to be queried according to the identifier of the fragment to be queried.

9. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.

10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-4.