CN116150263B - Distributed graph calculation engine - Google Patents

Distributed graph calculation engine

Info

Publication number
CN116150263B
Authority
CN
China
Prior art keywords
graph
distributed
node
data
module
Prior art date
Legal status
Active
Application number
CN202211240196.2A
Other languages
Chinese (zh)
Other versions
CN116150263A (en)
Inventor
孟英谦
彭龙
杜宏博
李胜昌
梁冬
鲁东民
葛晋鹏
郭亚辉
米丽媛
饶雷
张帅
邵鹏志
王乃正
薛行
徐天敕
王嘉岩
随秋林
Current Assignee
China North Computer Application Technology Research Institute
Original Assignee
China North Computer Application Technology Research Institute
Priority date
Filing date
Publication date
Application filed by China North Computer Application Technology Research Institute
Priority to CN202211240196.2A
Publication of CN116150263A
Application granted
Publication of CN116150263B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465 - Query processing support for facilitating data mining operations in structured databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a distributed graph computation engine, belongs to the technical field of graph computation, and addresses the deficiencies of existing distributed graph computation engines in compilation and storage. The distributed graph computation engine includes: a distributed graph storage engine module, which constructs a distributed graph database system in a "multi-Master-multi-Worker" mode and is used to manage and control graph data and perform data processing operations; a Cypher compiler, which implements syntactic and semantic interpretation of the standard OpenCypher language, compiles the interpreted OpenCypher operation commands into a distributed logical execution plan, and generates from it a physical execution plan to be executed in a distributed environment; a distributed graph execution engine module, which provides users with real-time graph query and offline graph analysis services; a graph analysis algorithm module, which is used to construct graph mining models; and an OpenCypher interface module, which enables users to access the distributed graph computation engine through the extended OpenCypher language.

Description

Distributed graph calculation engine
Technical Field
The invention relates to the technical field of graph computation, in particular to a distributed graph computation engine.
Background
Graph computation is an enabling technique of artificial intelligence. The basic capabilities of artificial intelligence can be roughly divided into three parts: Understanding, Reasoning, and Learning, abbreviated URL. Graph computation is closely related to these URL capabilities; for example, obtaining an objective, complete and comprehensive knowledge of the whole real world requires understanding capability. Graph computation techniques can fully characterize and describe the relationships among all things. Graph computation is regarded by the industry as an important cornerstone of next-generation artificial intelligence and as key to the transition of artificial intelligence from data-driven perceptual intelligence to cognitive intelligence that understands semantic associations.
At present, distributed graph computation engines have defects in terms of compilation and storage, which severely limits their range of application.
Disclosure of Invention
In view of the above analysis, embodiments of the present invention are directed to a distributed graph computation engine that overcomes the drawbacks of existing distributed graph computation engines in terms of compilation and storage.
The embodiment of the invention provides a distributed graph calculation engine, which comprises the following components:
the distributed graph storage engine module is used for constructing a distributed graph database system in a 'multi-Master-multi-Worker' mode and is used for managing and controlling graph data and performing data processing operation;
the Cypher compiler is used for realizing grammar and semantic interpretation of the standard OpenCypher language; it is also used for compiling the interpreted OpenCypher operation command into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
the distributed graph execution engine module is used for providing real-time graph query and offline graph analysis services for users;
the graph analysis algorithm module integrates various distributed graph calculation algorithms and deep learning graph algorithms and is used for constructing a graph mining model;
the OpenCypher interface module is used for enabling a user to access the distributed graph calculation engine through the extended OpenCypher language.
Based on the scheme, the invention also makes the following improvements:
further, the distributed graph computing engine further comprises a RestAPI interface module;
the RestAPI interface module provides a standard RESTful interface for obtaining the calculation state of the graph, executing the addition, deletion and modification check of the graph data and constructing a graph algorithm conforming to the service model.
Further, the distributed graph computation engine further comprises a native graph storage format module;
the native graph storage format module is used for storing the graph data in the distributed graph database system in an efficient compression format by means of a graph partitioning algorithm.
Further, in the native graph storage format module, the graph partitioning algorithm is LSM Tree.
Further, in the distributed graph storage engine module,
a plurality of Masters form a Master Group, which is responsible for meta-information management, task scheduling and load balancing functions;
the Workers provide data processing operations, including reading, updating and deleting of graph data.
Further, the distributed graph storage engine module divides the graph data into a plurality of partitions in the process of data processing operation; a partition is the minimum logical storage unit of each Worker;
and several Workers are selected as hosts for each partition, and consistency of the graph data among the multiple copies of the partition is managed through the Raft protocol.
Further, in the Cypher compiler, the generating a physical execution plan for execution in a distributed environment according to the distributed logical execution plan includes:
optimizing the distributed logic execution plan according to preset filtering conditions;
and performing physical mapping on the optimized distributed logic execution plan to generate a physical execution plan executed in a distributed environment.
Further, the distributed graph execution engine module adopts a GraphMaster-Slave architecture.
Further, in the GraphMaster-Slave architecture,
the GraphWorker node is managed by a GraphSlave node;
the information flow in the interaction process between the user's OpenCypher interface and the distributed graph calculation engine is divided into control flow information and data flow information; the control flow information is exchanged between the GraphMaster and the GraphSlave nodes; and the data flow information uploaded by the user is sent directly to the GraphSlave node for processing, without being forwarded by the GraphMaster node.
Further, the GraphWorker node is a process pulled up by the GraphSlave node using the fork function and the exec function;
the responsibilities of the GraphWorker node are as follows:
establishing an upstream-downstream relationship with the corresponding GraphWorker nodes according to the task topology information of the GraphSlave;
executing task information issued by the GraphSlave node, receiving the location information of the dynamic link library sent by the GraphSlave node, and calling a dlopen-family function in the dynamic link library module to load the .so file;
receiving data sent by upstream GraphWorker nodes, calling user-defined code to process the data, and sending the processed data to downstream nodes or placing it in local storage;
and reporting the resource usage and task execution status of the node to the GraphSlave node.
Compared with the prior art, the invention has at least one of the following beneficial effects:
according to the distributed graph calculation engine provided by the invention, the compiling mode of the Cyper compiler is improved, the filtering flow is increased, the problem of the traditional distributed graph calculation engine in the aspect of compiling is effectively solved, and the compiling effect is effectively improved. Meanwhile, by improving the architecture of the distributed graph storage engine module, the original graph storage format is changed by means of the graph partitioning algorithm, so that the storage rate of graph data is effectively improved, and the problem of the storage aspect of the existing distributed graph calculation engine is solved.
In the invention, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a schematic diagram of a distributed graph computing engine according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a distributed graph storage engine module according to embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the internal collaboration relationship of the storage layer of the distributed graph storage engine module according to embodiment 1 of the present invention;
fig. 4 is a schematic diagram of a storage structure of an LSM Tree provided in embodiment 1 of the present invention;
fig. 5 is a schematic structural diagram of a distributed graph execution engine module according to embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of another distributed graph computing engine according to embodiment 1 of the present invention;
fig. 7 is a flowchart of a distributed graph computing engine triggering method according to embodiment 2 of the present invention.
Detailed Description
Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and together with the description serve to explain the principles of the invention, and are not intended to limit the scope of the invention.
Example 1
Embodiment 1 of the present invention discloses a distributed graph computation engine, whose schematic structural diagram is shown in FIG. 1, including:
the distributed graph storage engine module is used for constructing a distributed graph database system in a 'multi-Master-multi-Worker' mode and is used for managing and controlling graph data and performing data processing operation;
the Cypher compiler is used for realizing grammar and semantic interpretation of the standard OpenCypher language; it is also used for compiling the interpreted OpenCypher operation command into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
the distributed graph execution engine module is used for providing real-time graph query and offline graph analysis services for users;
the graph analysis algorithm module integrates various distributed graph calculation algorithms and deep learning graph algorithms and is used for constructing a graph mining model;
the OpenCypher interface module is used for enabling a user to access the distributed graph calculation engine through the extended OpenCypher language.
In addition, the distributed graph calculation engine provided by this embodiment further comprises a RestAPI interface module and a native graph storage format module, wherein:
the RestAPI interface module provides a standard RESTful interface for obtaining the graph computation state, performing add, delete, modify and query operations on the graph data, and constructing graph algorithms conforming to the service model;
the native graph storage format module is used for storing the graph data in the distributed graph database system in an efficient compression format by means of a graph partitioning algorithm.
Each module of the distributed graph calculation engine provided in this embodiment is described below:
(1) Distributed graph storage engine module
This module adopts a "multi-Master-multi-Worker" architecture, in which a plurality of Masters form a Master Group responsible for functions such as meta-information management, task scheduling and load balancing, while the Workers act as the actual storage roles for graph data, providing data processing operations including reading, updating (including "writing") and deleting graph data. The storage engine ensures data consistency and high availability through the Raft protocol. The composition of the distributed graph storage engine module is shown in fig. 2.
To ensure fault tolerance and high availability of the distributed system, additional designs are required for the Master and the Worker respectively. For the Master, since the data is reported by the Workers, an HA Group consisting of multiple Master processes is sufficient for high availability. For a Worker providing data read and write services, however, failure of a process, a disk or a server would make the graph data unreadable and unwritable, and therefore unavailable.
To solve this problem, the distributed graph storage engine module divides the graph data into a plurality of partitions during data processing operations, a partition being the minimum logical storage unit of each Worker. For each partition, several (3 or more) Workers are chosen as hosts, and the consistency of the data among the multiple copies of the partition is managed through the Raft protocol, as sketched below.
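The following is a minimal, illustrative Python sketch (all names and sizes are assumptions, not the patented implementation) of how graph records could be hashed into partitions and how each partition could be assigned several Workers as replica hosts:

import hashlib

NUM_PARTITIONS = 64          # assumed number of partitions
REPLICATION_FACTOR = 3       # "several (3 or more) Workers are chosen as hosts"
workers = ["worker-%d" % i for i in range(8)]   # hypothetical Worker processes

def partition_of(record_key):
    # Map a graph record to a partition by hashing its key.
    digest = hashlib.md5(record_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def replica_hosts(partition_id):
    # Choose REPLICATION_FACTOR distinct Workers to host copies of the partition.
    # A real placement policy would also consider load and disk usage.
    start = partition_id % len(workers)
    return [workers[(start + i) % len(workers)] for i in range(REPLICATION_FACTOR)]

pid = partition_of("vertex:alice")
print(pid, replica_hosts(pid))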
The Raft protocol provides eventual consistency and brings lower data latency than the strong consistency of HDFS; it is also easier to understand and maintain than the Paxos protocol.
The working mechanism of the Raft protocol is introduced below:
the storage layer receives the read-write request of the message processing module, and sends a reply after the asynchronous processing is completed. The storage layer simultaneously provides a storage interface beyond the normal request mechanism to achieve the underlying optimization that increases the computational speed.
The storage layer internally comprises a multi-copy hot backup function based on a Raft consistency protocol, so that one write request can be applied to a plurality of hosts. When one host fails, the upper message processing module can detect the failure, so that the Master host is switched, and the read-write request processing of the client is not affected.
The storage layer directly controls the reading and writing of the disk through the file system. The storage module has a function of balancing the utilization rate of a plurality of disks, so that the loads of the disks are uniform, and the request processing bottleneck caused by uneven loads is avoided.
A schematic diagram of the internal collaboration relationships of the storage layer of the distributed graph storage engine module is shown in fig. 3, in which the arrows indicate the flow of single-host data writing. The write process is taken as an example to describe the interaction processes inside the storage module of the graph processing computation engine.
First, the upper message processing module sends data encapsulated as a write event to the GraphDB. In the distributed graph storage engine module, each graph in the database corresponds to one GraphDB instance. To implement the disk read-write balancing function mentioned above, each graph is divided into a plurality of GraphShards, and data records are allocated to the corresponding GraphShards according to their hash values. Different GraphShards of the same graph may have data storage paths on different disks, so full utilization of the disks is achieved through this data bucketing, as sketched below.
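A minimal sketch of this hash-based bucketing, with hypothetical disk mount points, shard counts and path layout (these are assumptions for illustration, not the engine's actual layout):

import hashlib

DISKS = ["/data/disk0", "/data/disk1", "/data/disk2"]   # hypothetical mount points
SHARDS_PER_GRAPH = 6

def shard_of(record_key):
    # Route a record to one GraphShard of its graph by hashing the record key.
    return int(hashlib.sha1(record_key.encode()).hexdigest(), 16) % SHARDS_PER_GRAPH

def storage_path(graph_name, shard_id):
    # Spread shards round-robin over the available disks to balance write load.
    disk = DISKS[shard_id % len(DISKS)]
    return "%s/%s/shard-%d" % (disk, graph_name, shard_id)

key = "edge:alice->bob"
sid = shard_of(key)
print(sid, storage_path("snap_test", sid))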
The unit of multi-copy backup in the distributed graph storage engine module is the GraphShard. The GraphShard class includes a Raft message synchronization mechanism: Raft helps the GraphShard synchronize the write events received from the GraphDB among the same GraphShard on multiple hosts. The GraphShard does not parse an event into written data immediately after receiving the write event from the GraphDB; the write actions are actually triggered by write events synchronized across the cluster by the Raft master.
To ensure that the final state of the multi-copy data is consistent, the graph processing computation engine uses the Raft consistency protocol as the coordination mechanism among the multiple copies and uses the Raft log to back up the written data for a short period of time.
Raft is a distributed storage consistency algorithm. In a distributed system, to avoid the serious consequence that data becomes completely unavailable or is lost because the single storage node serving the only copy fails, the data can be stored as multiple backup copies kept on different storage servers, any of which can provide the service. If a suitable algorithm ensures that the contents stored by each server for the same piece of data remain consistent, and the cluster can switch to another normal server with suitable logic when the serving storage server fails, then the quality of the distributed storage service can be guaranteed. Raft serves exactly such systems.
Storage consistency can be classified into strong consistency, eventual consistency, and weak consistency according to how strictly differences are allowed between the copies of the same piece of data on different servers. According to the CAP theorem, a distributed system cannot guarantee Consistency, Availability and Partition tolerance at the same time; however, by way of trade-off, the system can achieve the "BASE" effect: Basically Available, Soft state, Eventually consistent. The eventual consistency achieved by Raft means that the data on each node eventually reaches a consistent state after sufficient time has elapsed.
RPC messages between the machines of a Raft cluster can be divided into two types: the AppendEntries RPC (AE) and the RequestVote RPC (RV). AE is used by the leader to append entries to the followers. RV is sent by a candidate (a third state besides leader and follower, which exists only during an election) to request votes from the other nodes.
If, after some time, a follower does not detect the periodic heartbeat message from the leader (the leader uses an AE carrying no actual entries as the heartbeat), its state becomes candidate, and at this point the Raft cluster begins electing a leader. In Raft there is the concept of a term, which denotes one leader's period of dominance; each term has a unique, incrementing sequence number called the term ID. The leader attaches its term ID to its own AEs. When a new candidate appears, it adds 1 to the term ID of the previous leader as its own term ID and attaches this new term ID to the RVs it broadcasts to all other machines. Any non-candidate node that receives an RV with a term ID greater than any ID it has seen replies to the RV and updates the "maximum ID it has seen"; if, in addition, the candidate described in this RV holds a sufficiently new log (see below), the follower votes for this candidate. Thus no node casts two votes for the same term ID.
If a candidate receives a sufficient number of votes (the votes received plus its own vote form a majority of the cluster), it starts sending heartbeats, announces its leadership for this term, and begins serving. When a leader fails, several followers may time out at the same moment and send their RV broadcasts at roughly the same time, so that none can obtain enough votes in the new term; a vote-waiting timeout combined with a random wait is therefore used to avoid repeated conflicts in which no leader can be elected. While waiting for enough votes, a candidate of course also watches whether another candidate announces itself as the winner. If none does, then after the timeout each candidate starts a new term ID again, but waits a random period before broadcasting its RV (a candidate polled before broadcasting its own RV may simply vote instead). Because the random waits differ in length, the candidate whose wait ends first eventually wins.
The Raft cluster lets the leader uniformly process all client requests, converting each write request into a log entry and then sending the log to the followers with AE. When the leader creates a log entry, it attaches two attributes: the term ID and the log index. The term ID is the ID of the leader's own term and is used for log entries of the current term, while the log index is also an incrementing sequence number that is continuous throughout the run of the cluster: across terms, the log index keeps incrementing by 1 instead of being reset to zero. Since the election mechanism guarantees that one term ID corresponds to exactly one determined leader node, a unique log entry can be identified by the combination of term ID and log index.
After the leader generates the log of a write operation, it sends the log entries to the followers via AE, which add the entries to their own log stores in order. Once a majority of the cluster's nodes (including the leader itself) have successfully stored a log entry (the followers reply with their own storage status), the leader considers that entry and all earlier entries safely stored, replies to the client that the write operation succeeded, informs the cluster that the write operations in all logs up to this point can actually be performed, and modifies its own stored data.
When a new leader takes office, its own log store is used as the reference and the other nodes are aligned with it: logs that run ahead of the leader's are truncated, and logs that lag behind are gradually filled in from the leader's log store. During log replication, however, the speeds of the individual followers may differ greatly for various reasons; if a leader suddenly fails and a follower whose replication is particularly slow is elected leader, a large number of write operations could be lost. To avoid this, a write operation is considered successful only when the log is guaranteed to have been copied to a majority of machines. As mentioned earlier when explaining the election mechanism, the RV includes the candidate's log version information, i.e. the term ID and log index of its last log entry; if a voting follower finds that the candidate's version information is older than its own, it refuses to vote for it. Since only write operations copied to the logs of a majority of nodes are reported as successful, and a majority of votes must be obtained to be elected, the finally elected leader necessarily holds all the logs of the operations reported successful by the previous leader, so none of them are lost. A minimal sketch of this bookkeeping follows.
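The following is a minimal, illustrative Python sketch (not the patent's code) of the Raft bookkeeping just described: each log entry carries a term ID and a log index, a follower only votes for a candidate whose last log entry is at least as up to date as its own, and an entry counts as committed once a majority of the cluster has stored it:

from dataclasses import dataclass

@dataclass
class LogEntry:
    term: int        # term ID of the leader that created the entry
    index: int       # log index, monotonically increasing across terms
    command: str     # e.g. a serialized write event

def candidate_log_up_to_date(candidate_last, voter_last):
    # RequestVote check: compare last entries by term first, then by index.
    if candidate_last.term != voter_last.term:
        return candidate_last.term > voter_last.term
    return candidate_last.index >= voter_last.index

def committed(match_index, entry_index, cluster_size):
    # An entry is safe once a majority of nodes (leader included) has stored it.
    replicas = sum(1 for idx in match_index.values() if idx >= entry_index)
    return replicas + 1 > cluster_size // 2   # "+1" counts the leader itself

match_index = {"follower-1": 7, "follower-2": 5}   # highest index stored per follower
print(committed(match_index, 6, cluster_size=3))   # True: the leader plus follower-1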
(2) Native graph storage format module
Through the native graph storage format module, graph data can be stored across the cluster in a policy-driven, distributed manner, which gives good scalability and, in theory, the ability to store graphs of any scale.
The working mechanism is as follows:
and the LSM Tree (Log Structured Merge Tree) is used as a storage model of the graph data, so that higher writing speed is realized.
Btre sees a disk as a fixed-size page, which is the smallest unit of reading and writing. One page points to some other pages and forms a tree structure with high fan-out (high degree of outages). Because the data is stored in blocks, when the BTrees add data or delete data, one page is probably not put down, or a plurality of pages are sparse, and page splitting and page merging operations occur at the moment, the method is not suitable for mass graph data storage operation, and LSM Tree is not page-oriented operation, so that the advantage of sequential writing can be better utilized, and writing is generally faster. The storage structure of the LSM Tree is shown in fig. 4, wherein the LSM Tree has the following three important components.
1)MemTable
The MemTable is an in-memory data structure used to hold recently updated data, organized in order by key. The LSM tree does not explicitly define how the data must be kept ordered; for example, HBase uses a skip list to guarantee the ordering of keys in memory.
Because the data is temporarily stored in memory, which is not reliable storage, it would be lost if power were cut; therefore the reliability of the data is generally ensured by WAL (Write-Ahead Logging).
2)Immutable MemTable
When the MemTable reaches a certain size, it is converted into an Immutable MemTable, an intermediate state in turning the MemTable into an SSTable. Write operations are handled by a new MemTable, so data update operations are not blocked during the conversion.
3)SSTable(Sorted String Table)
The SSTable is an ordered set of key-value pairs and is the on-disk data structure of the LSM tree. To speed up reads of SSTables, key lookup can be accelerated by building a key index and Bloom filters.
As its name suggests, the LSM Tree (Log-Structured Merge-Tree) keeps all operation records of data insertion, modification, deletion, etc. (note: operation records) in memory, and when these operations reach a certain volume they are written to disk sequentially in batch. This differs from the B+ tree: a B+ tree update directly modifies the value where the original data resides, whereas data updates in the LSM tree are journaled, a data update being completed simply by appending an update record. The purpose of this design is to flush Immutable MemTables to persistent storage continuously as sequential writes, without modifying keys in earlier SSTables, thus guaranteeing sequential writing.
In this embodiment, the distributed graph computation engine writes data from the ordered MemTable in memory to disk to form sst files, and the different sst files form logically leveled data. Data in lower levels (such as Level 0) is newer than data in higher levels (such as Level 2); since each sst file is wholly ordered and there is no key-range overlap between the sst files, the average search efficiency of the data is O(k × log(n/k)).
As the number of sst files grows, a large number of small files creates storage pressure and affects read speed. The LSM Tree model therefore introduces the concept of multi-level merging: small files are merged downwards level by level, which reduces the number of files on one hand and, on the other hand, still guarantees the model in which lower levels hold newer data. A minimal sketch of this write and read path follows.
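The following minimal Python sketch illustrates the LSM write and read path described above under assumed sizes: writes go to an ordered in-memory MemTable, which is frozen and flushed as a sorted, immutable run (an sst-like file) when it exceeds a threshold, and reads check the MemTable first and then the runs from newest to oldest. It is illustrative only, not the engine's storage code:

MEMTABLE_LIMIT = 4   # assumed flush threshold

class TinyLSM:
    def __init__(self):
        self.memtable = {}     # recent updates, kept in memory
        self.sstables = []     # newest first; each entry is a sorted list of (key, value)

    def put(self, key, value):
        # A real engine would first append the operation to a WAL for durability.
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # Freeze the MemTable and write it out as one sorted, immutable run.
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.sstables:          # newer runs shadow older data
            lo, hi = 0, len(run) - 1
            while lo <= hi:                # binary search within the sorted run
                mid = (lo + hi) // 2
                if run[mid][0] == key:
                    return run[mid][1]
                if run[mid][0] < key:
                    lo = mid + 1
                else:
                    hi = mid - 1
        return None

db = TinyLSM()
for i in range(10):
    db.put("v%d" % i, i)
print(db.get("v3"), len(db.sstables))      # 3 2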
(3) Cypher compiler
In this embodiment, the Cypher compiler provides some extended language compilation capabilities in addition to the standard functionality.
In this embodiment, the working process of the Cypher compiler includes the following steps: lexical and grammar analysis, semantic analysis, logical execution plan generation, and physical execution plan generation. These are explained as follows:
Lexical analysis is the first stage of compilation, the process of converting the OpenCypher character sequence into a sequence of words (tokens); the lexical analyzer typically exists in the form of functions for the parser to call. The parser checks the OpenCypher grammar against meta-information and multi-storage abstraction information and constructs an abstract syntax tree composed of the input words.
Semantic analysis is a logical stage of the compilation process whose task is the context-dependent property checking and type checking of a structurally correct source program. Semantic analysis examines the source program for semantic errors and collects type information for the code generation stage. One such task is type checking: examining whether each operator has operands permitted by the language specification; when the program does not conform to the specification, the compiler should report an error.
After the semantic interpretation is completed, the OpenCypher operation command can be compiled into a distributed logic execution plan. In a distributed logic execution plan, unoptimized code often executes inefficiently; therefore, optimization is required after the distributed logic execution plan is obtained. Specifically, the distributed logic execution plan may be optimized according to preset filtering conditions.
In this embodiment, the following filtering conditions may be set: deleting common sub-queries (CSEs); filtering out unused columns; and filtering out unused partitions. In the implementation, after unused columns are filtered, those columns are not read out when the table is scanned (column pruning), and the data under filtered-out partitions does not need to be read at all (partition pruning). In addition, other filtering conditions can be preset; in particular, filter conditions from implicit joins and windows are pushed down as far as possible into the table scan (predicate pushdown, PPD), so that the engine can improve query efficiency through index- or scan-level filtering, and constant propagation computes values that can be precisely determined at compile time, avoiding repeated calculation at run time.
Physical mapping is then performed on the optimized distributed logic execution plan to generate a physical execution plan executed in a distributed environment, as sketched below.
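A minimal sketch of the plan-level optimizations listed above (constant folding, predicate pushdown into the scan, and unused-column pruning) applied to a toy logical plan; the operator names and plan representation are assumptions for illustration, not the engine's API:

scan = {"op": "ScanVertices", "label": "Person",
        "columns": ["uid", "name", "age", "bio"], "filters": []}
filter_op = {"op": "Filter", "predicate": ("age", ">", "18 + 3"), "child": scan}
project = {"op": "Project", "columns": ["uid", "name"], "child": filter_op}

def optimize(plan):
    proj, flt, sc = plan, plan["child"], plan["child"]["child"]
    # Constant folding: evaluate compile-time constants once instead of per row.
    col, op, expr = flt["predicate"]
    folded = eval(expr, {"__builtins__": {}})          # "18 + 3" -> 21
    # Predicate pushdown: let the scan apply the filter while reading the data.
    sc["filters"].append((col, op, folded))
    # Column pruning: only read columns needed by the projection or the filter.
    needed = set(proj["columns"]) | {col}
    sc["columns"] = [c for c in sc["columns"] if c in needed]
    return {"op": "Project", "columns": proj["columns"], "child": sc}

print(optimize(project))
# the scan now reads only uid, name and age, and filters age > 21 itself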
(4) Distributed graph execution engine module
In this embodiment, the distributed graph execution engine module is configured to provide users with real-time graph query and offline graph analysis capabilities. Its computation capability expands linearly with the number of nodes, supporting graph analysis over massive numbers of vertices and edges and accelerating graph query and analysis tasks by exploiting data locality.
The distributed graph execution engine module adopts a GraphMaster-Slave architecture, and a structural schematic diagram is shown in FIG. 5 and is described as follows:
the graphworkbench node is managed by the GraphSlave node. The information flow in the interaction process of the user OpenCypher interface and the distributed graph calculation engine is divided into control flow information and data flow information, wherein the control flow information is interacted between the GraphMaster and GraphSlave nodes; and the data flow information uploaded by the user is directly sent to the GraphSlave node for processing without being forwarded by the GraphMaster node.
a) GraphMaster node architecture design
The GraphMaster is the control node of the whole distributed computing system; it is responsible for managing all control information of the whole platform and is managed by a master-slave node consistency protocol. The GraphMaster node, the Second GraphMaster node and the Third GraphMaster node run on three servers and synchronize control information through the TCP protocol. Abnormalities or faults among the three GraphMaster nodes are detected through a heartbeat protocol. When the primary GraphMaster node becomes abnormal, the Second GraphMaster node immediately detects the abnormality and takes over the primary GraphMaster's control of the whole system in time, ensuring high reliability of the system and avoiding the single-point-of-failure problem.
The GraphMaster node cluster controls all GraphSlave nodes and GraphWorker nodes. It generates the task topology graph to be executed for a user by reading the execution plan, and, through the control-information data management model and the resource allocation and scheduling algorithm model, dynamically determines which GraphSlave nodes are to execute and to which GraphSlave nodes the specified tasks are issued. Each GraphSlave node then schedules GraphWorker nodes, which schedule the dynamic link library, acquire the data to be processed and process it.
As the control and management node of the system, the GraphMaster node runs the core system management models such as the control-information data management model and the resource scheduling and allocation algorithm model. The GraphMaster node has 5 main sub-modules, which together maintain the management operation of the GraphMaster node, specifically:
Task management sub-module: dynamically generates the task execution flow and formulates the task operation strategy from the information provided by the XML file management module and the dynamic link library scheduling module.
Resource-aware scheduling algorithm sub-module: receives the task flow generated by the task management module and dynamically generates the task operation strategy in combination with the usage of system resources. The resource-aware scheduling algorithm comprises three core algorithms: the system-initialization resource-aware scheduling algorithm, the resource reconfiguration scheduling algorithm during system operation, and the system disaster-recovery scheduling algorithm.
Heartbeat keep-alive sub-module: the nodes of the GraphMaster cluster need to be checked regularly to determine whether they are operating normally; the heartbeat keep-alive protocol detects whether the other node is abnormal by continuously sending heartbeat messages between the nodes.
Master-slave node fault-tolerance algorithm sub-module: runs the master-slave node consistency protocol, maintains state management among the GraphMaster node, the Second GraphMaster node and the Third GraphMaster node, and ensures that, when the GraphMaster node is abnormal, the system can rapidly schedule the Second GraphMaster node to take over system control and management.
Consistent hash disk storage sub-module: the disk persistence management module for control-information data; it ensures that, when the system goes down or is restarted, the historical operating state can be rapidly recovered from disk so that system initialization completes quickly, and it provides query scheduling of control-information data during operation (a minimal sketch of consistent hashing follows).
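A minimal consistent-hashing sketch for the disk persistence of control-information data mentioned above; the ring parameters and storage locations are assumptions for illustration only:

import bisect, hashlib

class HashRing:
    def __init__(self, nodes, vnodes=32):
        # Place several virtual nodes per physical location to smooth the load.
        self.ring = sorted(
            (self._hash("%s#%d" % (node, v)), node)
            for node in nodes for v in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def locate(self, key):
        # Walk clockwise on the ring to the first virtual node at or after the key.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["/meta/disk0", "/meta/disk1", "/meta/disk2"])   # hypothetical paths
print(ring.locate("task:job-42"), ring.locate("worker:node-7"))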
b) GraphSlave node architecture design
The main responsibilities of the GraphSlave node include:
and after receiving task information issued by the GraphMaster, pulling up the corresponding GraphWorker node, and sending node online information to the GraphMaster node.
And operating the GraphWorker node to execute a service environment preparation flow, wherein the preparation flow comprises that the GraphSlave dynamically pulls up N GraphWorker nodes, connects upstream and downstream GraphWorkers to form task topology flows, and the GraphWorker pulls up a corresponding dynamic link library, and releases process resources after the task execution is finished.
And a heartbeat keep-alive protocol exists between the GraphMaster node and the GraphSlave node, so that the situation that the opposite side operates abnormally is ensured to be timely known between the GraphMaster node and the GraphSlave node. The GraphSlave also receives task running information of the graphworkbench and feeds the task running information back to the GraphMaster in time.
The GraphSlave counts the resource use condition of the node system, the number of GraphWorker nodes and the resource use condition of each GraphWorker node. And periodically transmits to the GraphMaster node.
Graphworkbench management submodule: responsible for creating or ending graphworkbench computing nodes.
A resource collection sub-module: and collecting the resource usage of all GraphWorks on the GraphSlave node. Timely sending to GraphMaster.
Heartbeat keep-alive module: and sending heartbeat information to the GraphMaster node at regular time, wherein the heartbeat protocol can contain resource information or task information. And receiving and replying the heartbeat sent by the GraphSlave.
A task scheduling sub-module: and receiving task information sent by the GraphMaster, scheduling the GraphWorker pulled up by the node, establishing an upstream-downstream relationship of the GraphWorker node, and executing task flow.
c) GraphWorker architecture design
The GraphWorker node is a process pulled up by the GraphSlave node using the fork and exec functions.
The main responsibilities of the GraphWorker node are as follows (a sketch of the fork/exec and dlopen mechanisms follows this list):
Establishing upstream-downstream relationships with the corresponding GraphWorker nodes according to the task topology information of the GraphSlave.
Executing task information issued by the GraphSlave node, receiving the location information of the dynamic link library sent by the GraphSlave node, and calling a dlopen-family function in the dynamic link library module to load the .so file.
Receiving data sent by upstream GraphWorker nodes, calling user-defined code to process the data, and sending the processed data to downstream nodes or placing it in local storage.
Reporting the resource usage and task execution status of this node to the GraphSlave node.
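The following Python sketch illustrates the two mechanisms named above, pulling up a worker process with fork+exec and loading a user-supplied .so file through the dlopen family (exposed in Python via ctypes). All paths and names are hypothetical; this is not the patented code and runs on POSIX systems only:

import ctypes
import os

def pull_up_worker(worker_binary, task_args):
    # Fork the current process and exec the worker image in the child (fork + exec).
    pid = os.fork()
    if pid == 0:                                   # child: becomes the worker process
        try:
            os.execv(worker_binary, [worker_binary] + task_args)
        except OSError:
            os._exit(1)                            # exec failed; never return to parent code
    return pid                                     # parent keeps managing the child

def load_user_plugin(so_path, symbol):
    # CDLL wraps dlopen(3); attribute lookup wraps dlsym(3) to resolve the symbol.
    lib = ctypes.CDLL(so_path)
    return getattr(lib, symbol)

if __name__ == "__main__":
    child = pull_up_worker("/usr/bin/env", ["true"])   # stand-in for a worker binary
    os.waitpid(child, 0)                               # reap the finished child process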
(5) Graph analysis algorithm module
The graph analysis algorithm module integrates internal and external algorithm modules and provides interfaces for data and computation over RDDs. The built-in algorithm library comprises PageRank, Connected Components, Fast-Unfolding and other basic algorithms, while NLP, NLU and deep learning algorithms are attached externally to adapt to graph computing service scenarios.
The graph analysis algorithm module integrates multiple distributed graph computation algorithms and deep learning graph algorithms to construct graph mining models. The supported graph algorithms include: StarNet, PageRank, Strongly Connected Components, Label Propagation, K-core, Bow Tie, Graph Centrality, Fraud Rank, Heavy Edge Detector, and Motif Finding.
Graph analysis provides a Cypher query interface, presents the query results in 2D, and provides various page operations on the query results.
(6) OpenCypher interface module
The OpenCypher interface module is used for enabling a user to access the graph processing computation engine through the extended OpenCypher language. Besides the standard functions, it provides a number of extension languages and graph function computation capabilities; the graph functions comprise basic functions, aggregation functions, mathematical functions, string operation functions, collection operation functions and crux-cypher internal functions, so as to meet the requirements of graph computation and complex query flows.
Quick access to graph processing capability can be provided to the data provider, through which a common query language can be shared with the interfaces of the distributed graph computation engine.
An OpenCypher query statement follows the format shown below:
[MATCH WHERE]
[OPTIONAL MATCH WHERE]
[WITH[ORDER BY][SKIP][LIMIT]]
RETURN[ORDER BY][SKIP][LIMIT];
Functions greatly improve the efficiency of graph computation and complex queries: code that may need to be executed repeatedly is encapsulated into a function and called wherever it is needed, which enables code reuse and, more importantly, ensures code consistency. The commonly used basic functions are shown in Table 1, and the commonly used graph computation aggregation functions are shown in Table 2.
Table 1: Graph computation basic function table
Table 2: Graph computation aggregation function table
Function name | Parameters | Function
count | structure | Returns the number of parameters in the result set
min | values | Returns the minimum value of the parameter in the set
max | values | Returns the maximum value of the parameter in the set
sum | values | Returns the sum of the data in the set
(7) RestAPI interface module
This module is used for providing a standard RESTful interface to obtain the graph computation state; it can also be used to add, delete, modify and query the graph through the JAVA API and to construct graph algorithms conforming to the service model.
The working process is as follows: the RestAPI interface design of the distributed graph computation engine is described using the query API as an example.
Request path: api/stiller/cypher
Request type: POST
Query parameter examples:
{
"cypher_graph":"snap_test",
"cypher_input":"match(a)-[f]-(b)return a,flimit 5;",
"execution_mode":0,
"result_form":0,
"vertex_attr_filter":{
"flag":0,
"filters":[{
"label":"__all__",
"attrs":["uid"]
}]
},
"edge_attr_filter":{
"flag":0,
"filters":[{
"label":"__all__",
"attrs":["uid"]
}]
}
}
Format example of the attribute filter:
Scenario for introducing the attribute filter into the parameters: in the query result, a point or edge may have many attribute values, so the returned JSON data can be large, yet in many cases some of the attributes are of no interest to the querying user. Therefore, the attributes that the querying user does not care about can be filtered out when the data is returned, so that only the attribute information the user cares about is returned.
Because data of different labels has different attributes, different filtering conditions are provided for different labels. All labels can also be referred to collectively by the string "__all__", in which case attribute filtering is performed on the data of all labels.
Two forms of attribute filtering are provided: returning only certain attributes, or not returning certain attributes. A minimal client-side sketch of calling this API follows.
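A minimal client-side sketch of calling the query API above with Python's requests library; the request path and field names follow the example in this embodiment, while the host, port and response handling are assumptions:

import requests

payload = {
    "cypher_graph": "snap_test",
    "cypher_input": "match (a)-[f]-(b) return a, f limit 5;",
    "execution_mode": 0,
    "result_form": 0,
    # Return only the "uid" attribute for vertices and edges of every label.
    "vertex_attr_filter": {"flag": 0, "filters": [{"label": "__all__", "attrs": ["uid"]}]},
    "edge_attr_filter": {"flag": 0, "filters": [{"label": "__all__", "attrs": ["uid"]}]},
}

resp = requests.post("http://localhost:8080/api/stiller/cypher", json=payload)  # assumed host/port
resp.raise_for_status()
print(resp.json())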
Another structural schematic diagram of the distributed graph computing engine is shown in fig. 6; it mainly comprises a storage layer, a computation layer and an interface layer. At the storage layer, GraphDB (native graph storage) provides the underlying storage with efficient compression capability for graph data, using vertex tables, edge tables and ordered index tables. By means of the graph partitioning algorithm, graph data can be stored across the cluster in a policy-driven, distributed manner; consistency and scalability of the GraphDB partition copies are achieved through the distributed storage engine, theoretically enabling graph data storage at any scale. At the computation layer, with the distributed graph computation engine and its built-in graph algorithms, real-time graph query and offline graph analysis capability can be provided to users at the same time; the computation capability expands linearly with the number of nodes, supporting graph analysis over massive vertices and edges and accelerating graph query and analysis tasks using data locality. At the interface layer, OpenCypher is implemented; besides the standard functions, some extension languages are provided to meet the requirements of graph computation and complex queries.
In summary, the distributed graph computing engine provided in this embodiment improves the compilation mode of the Cypher compiler and adds a filtering flow, effectively solving the compilation problems of existing distributed graph computing engines and improving compilation performance. Meanwhile, by improving the architecture of the distributed graph storage engine module and changing the native graph storage format with the help of the graph partitioning algorithm, the storage rate of graph data is effectively improved, solving the storage problems of existing distributed graph computing engines.
Example 2
Embodiment 2 of the present invention discloses a triggering method for a distributed graph computing engine; a flowchart is shown in fig. 7, comprising the following steps:
step S1: receiving an OpenCypher operation instruction;
step S2: starting a Cypher compiler, performing grammar and semantic interpretation on an OpenCypher operation instruction, compiling the interpreted OpenCypher operation instruction into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
step S3: registering with the GraphMaster and applying for resources (CPU cores and memory); the GraphMaster obtains the tasks to be executed according to the physical execution plan, decomposes them into a plurality of primary tasks, and distributes each primary task to a different GraphSlave;
step S4: each GraphSlave decomposes the received primary task into a plurality of secondary tasks and distributes each secondary task to a different Worker, and the Workers execute the corresponding secondary tasks; during this process, the Workers also send task monitoring reports to the GraphMaster.
Step S5: after all tasks to be executed are completed, resource logout is applied to the GraphMaster, and the next OpenCypher operation instruction is waited to be received.
When the next OpenCypher operation instruction is received, the process jumps to step S1, and steps S1 to S5 are repeatedly executed.
In step S3, the GraphMaster is also responsible for maintaining the overall execution control state of the system.
During the execution of step S4, the following are also performed simultaneously:
the Workers send task monitoring reports to the GraphMaster;
the graph data in the distributed graph database system is stored in an efficient compressed format by the native graph storage format module in the graph computation engine;
the graph computation state is obtained, graph data is added, deleted, modified and queried, and graph algorithms conforming to the service model are constructed through the standard RESTful interface provided by the RestAPI interface module in the graph computation engine;
real-time graph query and offline graph analysis services are provided to users through the distributed graph execution engine module in the graph computation engine.
A minimal sketch of the two-level task decomposition in steps S3 and S4 follows.
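The following Python sketch illustrates the two-level task decomposition in steps S3 and S4: the GraphMaster splits the physical plan into primary tasks for the GraphSlaves, and each GraphSlave splits its primary task into secondary tasks for its Workers. All task and node names are assumptions for illustration, not the engine's real API:

def master_decompose(physical_plan, slaves):
    # GraphMaster: assign plan fragments (primary tasks) to GraphSlaves round-robin.
    assignment = {s: [] for s in slaves}
    for i, fragment in enumerate(physical_plan):
        assignment[slaves[i % len(slaves)]].append(fragment)
    return assignment

def slave_decompose(primary_tasks, workers):
    # GraphSlave: split each primary task into per-Worker secondary tasks.
    assignment = {w: [] for w in workers}
    for i, task in enumerate(primary_tasks):
        assignment[workers[i % len(workers)]].append(task + "/secondary-%d" % i)
    return assignment

plan = ["scan-partition-0", "scan-partition-1", "expand-edges", "aggregate"]
primary = master_decompose(plan, ["slave-a", "slave-b"])
for slave, tasks in primary.items():
    print(slave, slave_decompose(tasks, ["worker-1", "worker-2"]))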
For the specific implementation process of this method embodiment, reference may be made to the above system embodiment, which is not repeated here.
Since the principle of this embodiment is the same as that of the system embodiment, the method also has the corresponding technical effects of the system embodiment.
Those skilled in the art will appreciate that all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing the associated hardware, and the program may be stored on a computer-readable storage medium, such as a magnetic disk, an optical disk, a read-only memory or a random access memory.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (6)

1. A distributed graph computation engine, comprising:
the distributed graph storage engine module is used for constructing a distributed graph database system in a 'multi-Master-multi-Worker' mode and is used for managing and controlling graph data and performing data processing operation;
the Cypher compiler is used for realizing grammar and semantic interpretation of the standard OpenCypher language; it is also used for compiling the interpreted OpenCypher operation command into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
the distributed graph execution engine module is used for providing real-time graph query and offline graph analysis services for users;
the graph analysis algorithm module integrates various distributed graph calculation algorithms and deep learning graph algorithms and is used for constructing a graph mining model;
the OpenCypher interface module is used for enabling a user to access the distributed graph computation engine through the extended OpenCypher language;
in the Cypher compiler, the generating a physical execution plan for execution in a distributed environment according to the distributed logical execution plan includes:
optimizing the distributed logic execution plan according to preset filtering conditions; the filtering conditions include: deleting common sub-queries; filtering unused columns; and filtering unused partitions;
performing physical mapping on the optimized distributed logic execution plan to generate a physical execution plan executed in a distributed environment;
the distributed graph execution engine module adopts a GraphMaster-Slave architecture; in the GraphMaster-Slave architecture,
the GraphWorker node is managed by a GraphSlave node;
the information flow in the interaction process of the OpenCypher interface of the user and the distributed graph calculation engine is divided into control flow information and data flow information; the control flow information is interacted between the GraphMaster and the GraphSlave nodes; the data flow information uploaded by the user is directly sent to the GraphSlave node for processing without being forwarded by the GraphMaster node;
the GraphWorker node is a process pulled up by a GraphSlave node by using a fork function and an exec function;
the responsibilities of the GraphWorker node are as follows:
establishing an upstream-downstream relationship with a corresponding GraphWorker node according to task topology information of the GraphSlave;
executing task information issued by the GraphSlave node, receiving the location information of the dynamic link library sent by the GraphSlave node, and calling a dlopen-family function in the dynamic link library module to load the .so file;
receiving data sent by upstream GraphWorker nodes, calling user-defined code to process the data, and sending the processed data to downstream nodes or placing it in local storage;
and reporting the resource use condition and the task execution condition of the node to the GraphSlave node.
2. The distributed graph computation engine of claim 1, further comprising a RestAPI interface module;
the RestAPI interface module provides a standard RESTful interface for obtaining the graph computation state, performing add, delete, modify and query operations on the graph data, and constructing graph algorithms conforming to the service model.
3. The distributed graph computation engine of claim 2, further comprising a native graph storage format module;
the native graph storage format module is used for storing the graph data in the distributed graph database system in an efficient compression format by means of a graph partitioning algorithm.
4. A distributed graph computation engine according to claim 3, wherein in the native graph storage format module, the graph partitioning algorithm is LSM Tree.
5. The distributed graph computation engine of any of claims 1-4, wherein in the distributed graph storage engine module,
a plurality of masters form a Master Group and are used for being responsible for meta-information management, task scheduling and load balancing functions;
worker provides data processing operations including reading, updating, and deleting of graph data.
6. The distributed graph computation engine of claim 5, wherein the distributed graph storage engine module divides the graph data into a plurality of partitions during data processing operations; a partition is the minimum logical storage unit of each Worker;
and several Workers are selected as hosts for each partition, and consistency of the graph data among the multiple copies of the partition is managed through the Raft protocol.
CN202211240196.2A 2022-10-11 2022-10-11 Distributed graph calculation engine Active CN116150263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211240196.2A CN116150263B (en) 2022-10-11 2022-10-11 Distributed graph calculation engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211240196.2A CN116150263B (en) 2022-10-11 2022-10-11 Distributed graph calculation engine

Publications (2)

Publication Number Publication Date
CN116150263A CN116150263A (en) 2023-05-23
CN116150263B (en) 2023-07-25

Family

ID=86339567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211240196.2A Active CN116150263B (en) 2022-10-11 2022-10-11 Distributed graph calculation engine

Country Status (1)

Country Link
CN (1) CN116150263B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117972154A (en) * 2024-03-27 2024-05-03 支付宝(杭州)信息技术有限公司 Graph data processing method and graph calculation engine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063486A (en) * 2014-07-03 2014-09-24 四川中亚联邦科技有限公司 Big data distributed storage method and system
CN106681820A (en) * 2016-12-30 2017-05-17 西北工业大学 Message combination based extensible big data computing method
CN112667644A (en) * 2021-01-20 2021-04-16 浪潮云信息技术股份公司 Hybrid index memory database storage engine management method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072194B (en) * 2015-08-27 2018-05-29 南京大学 A kind of storage data in distributed file system repair structure and restorative procedure
WO2018170276A2 (en) * 2017-03-15 2018-09-20 Fauna, Inc. Methods and systems for a database
CN113420517B (en) * 2021-05-28 2023-01-06 清华大学 FPGA virtualization hardware system stack design oriented to cloud deep learning reasoning
CN113900788A (en) * 2021-10-20 2022-01-07 咪咕文化科技有限公司 Distributed work scheduling method and distributed workflow engine system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063486A (en) * 2014-07-03 2014-09-24 四川中亚联邦科技有限公司 Big data distributed storage method and system
CN106681820A (en) * 2016-12-30 2017-05-17 西北工业大学 Message combination based extensible big data computing method
CN112667644A (en) * 2021-01-20 2021-04-16 浪潮云信息技术股份公司 Hybrid index memory database storage engine management method

Also Published As

Publication number Publication date
CN116150263A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
KR102307371B1 (en) Data replication and data failover within the database system
CN115562676B (en) Triggering method of graph calculation engine
CN110196871B (en) Data warehousing method and system
Baker et al. Megastore: Providing scalable, highly available storage for interactive services.
Bailis et al. Bolt-on causal consistency
Dean et al. MapReduce: Simplified data processing on large clusters
US7779298B2 (en) Distributed job manager recovery
US5495606A (en) System for parallel processing of complex read-only database queries using master and slave central processor complexes
CN111949454B (en) Database system based on micro-service component and related method
US20180004777A1 (en) Data distribution across nodes of a distributed database base system
GB2472620A (en) Distributed transaction processing and committal by a transaction manager
CN108369601A (en) Promotion attribute in relational structure data
CN108369599A (en) Duplication control between redundant data center
CN108431807A (en) The duplication of structured data in partition data memory space
US20230110826A1 (en) Log execution method and apparatus, computer device and storage medium
CN116150263B (en) Distributed graph calculation engine
CN116009428A (en) Industrial data monitoring system and method based on stream computing engine and medium
Margara et al. A model and survey of distributed data-intensive systems
Leibert et al. Automatic management of partitioned, replicated search services
Jacobs et al. Bad to the bone: Big active data at its core
CN116010452A (en) Industrial data processing system and method based on stream type calculation engine and medium
Vilaça et al. On the expressiveness and trade-offs of large scale tuple stores
Höger Fault tolerance in parallel data processing systems
Petrescu Replication in Raft vs Apache Zookeeper
Orensa A design framework for efficient distributed analytics on structured big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant