CN116150263B - Distributed graph calculation engine - Google Patents

Distributed graph calculation engine

Info

Publication number
CN116150263B
Authority
CN
China
Prior art keywords
graph
distributed
node
data
module
Prior art date
Legal status
Active
Application number
CN202211240196.2A
Other languages
Chinese (zh)
Other versions
CN116150263A (en)
Inventor
孟英谦
彭龙
杜宏博
李胜昌
梁冬
鲁东民
葛晋鹏
郭亚辉
米丽媛
饶雷
张帅
邵鹏志
王乃正
薛行
徐天敕
王嘉岩
随秋林
Current Assignee
China North Computer Application Technology Research Institute
Original Assignee
China North Computer Application Technology Research Institute
Priority date
Filing date
Publication date
Application filed by China North Computer Application Technology Research Institute
Priority to CN202211240196.2A
Publication of CN116150263A
Application granted
Publication of CN116150263B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465 - Query processing support for facilitating data mining operations in structured databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a distributed graph computation engine, belongs to the technical field of graph computation, and addresses the deficiencies of existing distributed graph computation engines in compilation and storage. The distributed graph computation engine includes: a distributed graph storage engine module, which constructs a distributed graph database system in a "multi-Master-multi-Worker" mode and is used to manage and control graph data and perform data processing operations; a Cypher compiler, which implements syntactic and semantic interpretation of the standard OpenCypher language, compiles the interpreted OpenCypher operation commands into a distributed logical execution plan, and generates from it a physical execution plan to be executed in a distributed environment; a distributed graph execution engine module, which provides users with real-time graph query and offline graph analysis services; a graph analysis algorithm module, which is used to construct graph mining models; and an OpenCypher interface module, which enables users to access the distributed graph computation engine through the extended OpenCypher language.

Description

Distributed graph calculation engine
Technical Field
The invention relates to the technical field of graph computation, in particular to a distributed graph computation engine.
Background
Graph computation is an enabling technique of artificial intelligence. The basic capabilities of artificial intelligence can be roughly divided into three parts: Understanding, Reasoning, and Learning, abbreviated URL. Graph computation is closely related to these URL capabilities; for example, obtaining an objective, complete and comprehensive knowledge of the whole real world requires understanding capability. Graph computation techniques can fully characterize and describe the relationships among all things. Graph computation is regarded by the industry as an important cornerstone of next-generation artificial intelligence and as key to the transition of artificial intelligence from data-driven perceptual intelligence to cognitive intelligence that understands semantic associations.
At present, distributed graph computation engines have defects in terms of compilation and storage, which severely limits their range of application.
Disclosure of Invention
In view of the above analysis, embodiments of the present invention are directed to a distributed graph computation engine that overcomes the drawbacks of existing distributed graph computation engines in terms of compilation and storage.
The embodiment of the invention provides a distributed graph calculation engine, which comprises the following components:
the distributed graph storage engine module is used for constructing a distributed graph database system in a 'multi-Master-multi-Worker' mode and is used for managing and controlling graph data and performing data processing operation;
the Cypher compiler is used for realizing grammar and semantic interpretation of the standard OpenCypher language; it is also used for compiling the interpreted OpenCypher operation command into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
the distributed graph execution engine module is used for providing real-time graph query and offline graph analysis services for users;
the graph analysis algorithm module integrates various distributed graph calculation algorithms and deep learning graph algorithms and is used for constructing a graph mining model;
the OpenCypher interface module is used for enabling a user to access the distributed graph calculation engine through the extended OpenCypher language.
Based on the scheme, the invention also makes the following improvements:
further, the distributed graph computing engine further comprises a RestAPI interface module;
the RestAPI interface module provides a standard RESTful interface for obtaining the calculation state of the graph, executing the addition, deletion and modification check of the graph data and constructing a graph algorithm conforming to the service model.
Further, the distributed graph computation engine further comprises a native graph storage format module;
the native graph storage format module is used for storing the graph data in the distributed graph database system in an efficient compression format by means of a graph partitioning algorithm.
Further, in the native graph storage format module, the graph partitioning algorithm is LSM Tree.
Further, in the distributed graph storage engine module,
a plurality of Masters form a Master Group, which is responsible for meta-information management, task scheduling and load balancing functions;
the Workers provide data processing operations, including reading, updating and deleting of graph data.
Further, the distributed graph storage engine module divides the graph data into a plurality of partitions in the process of data processing operation; a partition is the minimum logical storage unit of each Worker;
and several Workers are selected as hosts for each partition, and consistency of the graph data among the multiple copies of the partition is managed through the Raft protocol.
Further, in the Cypher compiler, the generating a physical execution plan for execution in a distributed environment according to the distributed logical execution plan includes:
optimizing the distributed logic execution plan according to preset filtering conditions;
and performing physical mapping on the optimized distributed logic execution plan to generate a physical execution plan executed in a distributed environment.
Further, the distributed graph execution engine module adopts a GraphMaster-Slave architecture.
Further, in the GraphMaster-Slave architecture,
the GraphWorker node is managed by a GraphSlave node;
the information flow in the interaction process between the user's OpenCypher interface and the distributed graph calculation engine is divided into control flow information and data flow information; the control flow information is exchanged between the GraphMaster and the GraphSlave nodes; and the data flow information uploaded by the user is sent directly to the GraphSlave node for processing, without being forwarded by the GraphMaster node.
Further, the GraphWorker node is a process pulled up by the GraphSlave node using the fork function and the exec function;
the responsibilities of the GraphWorker node are as follows:
establishing an upstream-downstream relationship with the corresponding GraphWorker nodes according to the task topology information of the GraphSlave;
executing task information issued by the GraphSlave node, receiving the location information of the dynamic link library sent by the GraphSlave node, and calling a dlopen-family function in the dynamic link library module to load the .so file;
receiving data sent by upstream GraphWorker nodes, calling user-defined code to process the data, and sending the processed data to downstream nodes or placing it in local storage;
and reporting the resource usage and task execution status of the node to the GraphSlave node.
Compared with the prior art, the invention has at least one of the following beneficial effects:
according to the distributed graph calculation engine provided by the invention, the compiling mode of the Cyper compiler is improved, the filtering flow is increased, the problem of the traditional distributed graph calculation engine in the aspect of compiling is effectively solved, and the compiling effect is effectively improved. Meanwhile, by improving the architecture of the distributed graph storage engine module, the original graph storage format is changed by means of the graph partitioning algorithm, so that the storage rate of graph data is effectively improved, and the problem of the storage aspect of the existing distributed graph calculation engine is solved.
In the invention, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a schematic diagram of a distributed graph computing engine according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a distributed graph storage engine module according to embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the internal collaboration relationship of the storage layer of the distributed graph storage engine module according to embodiment 1 of the present invention;
fig. 4 is a schematic diagram of a storage structure of an LSM Tree provided in embodiment 1 of the present invention;
fig. 5 is a schematic structural diagram of a distributed graph execution engine module according to embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of another distributed graph computing engine according to embodiment 1 of the present invention;
fig. 7 is a flowchart of a distributed graph computing engine triggering method according to embodiment 2 of the present invention.
Detailed Description
Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and together with the description serve to explain the principles of the invention, and are not intended to limit the scope of the invention.
Example 1
Embodiment 1 of the present invention discloses a distributed graph computation engine, whose schematic structural diagram is shown in FIG. 1, including:
the distributed graph storage engine module is used for constructing a distributed graph database system in a 'multi-Master-multi-Worker' mode and is used for managing and controlling graph data and performing data processing operation;
the Cypher compiler is used for realizing grammar and semantic interpretation of the standard OpenCypher language; it is also used for compiling the interpreted OpenCypher operation command into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
the distributed graph execution engine module is used for providing real-time graph query and offline graph analysis services for users;
the graph analysis algorithm module integrates various distributed graph calculation algorithms and deep learning graph algorithms and is used for constructing a graph mining model;
the OpenCypher interface module is used for enabling a user to access the distributed graph calculation engine through the extended OpenCypher language.
In addition, the distributed graph calculation engine provided by this embodiment further comprises a RestAPI interface module and a native graph storage format module, wherein:
the RestAPI interface module provides a standard RESTful interface for obtaining the graph computation state, performing add, delete, modify and query operations on the graph data, and constructing graph algorithms conforming to the service model;
the native graph storage format module is used for storing the graph data in the distributed graph database system in an efficient compression format by means of a graph partitioning algorithm.
Each module of the distributed graph calculation engine provided in this embodiment is described below:
(1) Distributed graph storage engine module
This module adopts a "multi-Master-multi-Worker" architecture, in which a plurality of Masters form a Master Group responsible for functions such as meta-information management, task scheduling and load balancing, while the Workers act as the actual storage roles for graph data, providing data processing operations including reading, updating (including "writing") and deleting graph data. The storage engine ensures data consistency and high availability through the Raft protocol. The composition of the distributed graph storage engine module is shown in fig. 2.
To ensure fault tolerance and high availability of the distributed system, additional designs are required for the Master and the Worker respectively. For the Master, since the data is reported by the Workers, an HA Group consisting of multiple Master processes is sufficient for high availability. For a Worker providing data read and write services, however, failure of a process, a disk or a server would make the graph data unreadable and unwritable, and therefore unavailable.
To solve this problem, the distributed graph storage engine module divides the graph data into a plurality of partitions during data processing operations, a partition being the minimum logical storage unit of each Worker. For each partition, several (3 or more) Workers are chosen as hosts, and the consistency of the data among the multiple copies of the partition is managed through the Raft protocol, as sketched below.
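The following is a minimal, illustrative Python sketch (all names and sizes are assumptions, not the patented implementation) of how graph records could be hashed into partitions and how each partition could be assigned several Workers as replica hosts:

import hashlib

NUM_PARTITIONS = 64          # assumed number of partitions
REPLICATION_FACTOR = 3       # "several (3 or more) Workers are chosen as hosts"
workers = ["worker-%d" % i for i in range(8)]   # hypothetical Worker processes

def partition_of(record_key):
    # Map a graph record to a partition by hashing its key.
    digest = hashlib.md5(record_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def replica_hosts(partition_id):
    # Choose REPLICATION_FACTOR distinct Workers to host copies of the partition.
    # A real placement policy would also consider load and disk usage.
    start = partition_id % len(workers)
    return [workers[(start + i) % len(workers)] for i in range(REPLICATION_FACTOR)]

pid = partition_of("vertex:alice")
print(pid, replica_hosts(pid))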
The Raft protocol provides eventual consistency and brings lower data latency than the strong consistency of HDFS; it is also easier to understand and maintain than the Paxos protocol.
The working mechanism of the Raft protocol is introduced below:
the storage layer receives the read-write request of the message processing module, and sends a reply after the asynchronous processing is completed. The storage layer simultaneously provides a storage interface beyond the normal request mechanism to achieve the underlying optimization that increases the computational speed.
The storage layer internally comprises a multi-copy hot backup function based on a Raft consistency protocol, so that one write request can be applied to a plurality of hosts. When one host fails, the upper message processing module can detect the failure, so that the Master host is switched, and the read-write request processing of the client is not affected.
The storage layer directly controls the reading and writing of the disk through the file system. The storage module has a function of balancing the utilization rate of a plurality of disks, so that the loads of the disks are uniform, and the request processing bottleneck caused by uneven loads is avoided.
A schematic diagram of the internal collaboration relationships of the storage layer of the distributed graph storage engine module is shown in fig. 3, in which the arrows indicate the flow of single-host data writing. The write process is taken as an example to describe the interaction processes inside the storage module of the graph processing computation engine.
First, the upper message processing module sends data encapsulated as a write event to the GraphDB. In the distributed graph storage engine module, each graph in the database corresponds to one GraphDB instance. To implement the disk read-write balancing function mentioned above, each graph is divided into a plurality of GraphShards, and data records are allocated to the corresponding GraphShards according to their hash values. Different GraphShards of the same graph may have data storage paths on different disks, so full utilization of the disks is achieved through this data bucketing, as sketched below.
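A minimal sketch of this hash-based bucketing, with hypothetical disk mount points, shard counts and path layout (these are assumptions for illustration, not the engine's actual layout):

import hashlib

DISKS = ["/data/disk0", "/data/disk1", "/data/disk2"]   # hypothetical mount points
SHARDS_PER_GRAPH = 6

def shard_of(record_key):
    # Route a record to one GraphShard of its graph by hashing the record key.
    return int(hashlib.sha1(record_key.encode()).hexdigest(), 16) % SHARDS_PER_GRAPH

def storage_path(graph_name, shard_id):
    # Spread shards round-robin over the available disks to balance write load.
    disk = DISKS[shard_id % len(DISKS)]
    return "%s/%s/shard-%d" % (disk, graph_name, shard_id)

key = "edge:alice->bob"
sid = shard_of(key)
print(sid, storage_path("snap_test", sid))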
The unit of multi-copy backup in the distributed graph storage engine module is the GraphShard. The GraphShard class includes a Raft message synchronization mechanism: Raft helps the GraphShard synchronize the write events received from the GraphDB among the same GraphShard on multiple hosts. The GraphShard does not parse an event into written data immediately after receiving the write event from the GraphDB; the write actions are actually triggered by write events synchronized across the cluster by the Raft master.
To ensure that the final state of the multi-copy data is consistent, the graph processing computation engine uses the Raft consistency protocol as the coordination mechanism among the multiple copies and uses the Raft log to back up the written data for a short period of time.
Raft is a distributed storage consistency algorithm. In a distributed system, to avoid the serious consequence that data becomes completely unavailable or is lost because the single storage node serving the only copy fails, the data can be stored as multiple backup copies kept on different storage servers, any of which can provide the service. If a suitable algorithm ensures that the contents stored by each server for the same piece of data remain consistent, and the cluster can switch to another normal server with suitable logic when the serving storage server fails, then the quality of the distributed storage service can be guaranteed. Raft serves exactly such systems.
Storage consistency can be classified into strong consistency, eventual consistency, and weak consistency according to how strictly differences are allowed between the copies of the same piece of data on different servers. According to the CAP theorem, a distributed system cannot guarantee Consistency, Availability and Partition tolerance at the same time; however, by way of trade-off, the system can achieve the "BASE" effect: Basically Available, Soft state, Eventually consistent. The eventual consistency achieved by Raft means that the data on each node eventually reaches a consistent state after sufficient time has elapsed.
RPC messages between the machines of a Raft cluster can be divided into two types: the AppendEntries RPC (AE) and the RequestVote RPC (RV). AE is used by the leader to append entries to the followers. RV is sent by a candidate (a third state besides leader and follower, which exists only during an election) to request votes from the other nodes.
If, after some time, a follower does not detect the periodic heartbeat message from the leader (the leader uses an AE carrying no actual entries as the heartbeat), its state becomes candidate, and at this point the Raft cluster begins electing a leader. In Raft there is the concept of a term, which denotes one leader's period of dominance; each term has a unique, incrementing sequence number called the term ID. The leader attaches its term ID to its own AEs. When a new candidate appears, it adds 1 to the term ID of the previous leader as its own term ID and attaches this new term ID to the RVs it broadcasts to all other machines. Any non-candidate node that receives an RV with a term ID greater than any ID it has seen replies to the RV and updates the "maximum ID it has seen"; if, in addition, the candidate described in this RV holds a sufficiently new log (see below), the follower votes for this candidate. Thus no node casts two votes for the same term ID.
If a candidate receives a sufficient number of votes (the votes received plus its own vote form a majority of the cluster), it starts sending heartbeats, announces its leadership for this term, and begins serving. When a leader fails, several followers may time out at the same moment and send their RV broadcasts at roughly the same time, so that none can obtain enough votes in the new term; a vote-waiting timeout combined with a random wait is therefore used to avoid repeated conflicts in which no leader can be elected. While waiting for enough votes, a candidate of course also watches whether another candidate announces itself as the winner. If none does, then after the timeout each candidate starts a new term ID again, but waits a random period before broadcasting its RV (a candidate polled before broadcasting its own RV may simply vote instead). Because the random waits differ in length, the candidate whose wait ends first eventually wins.
The Raft cluster lets the leader uniformly process all client requests, converting each write request into a log entry and then sending the log to the followers with AE. When the leader creates a log entry, it attaches two attributes: the term ID and the log index. The term ID is the ID of the leader's own term and is used for log entries of the current term, while the log index is also an incrementing sequence number that is continuous throughout the run of the cluster: across terms, the log index keeps incrementing by 1 instead of being reset to zero. Since the election mechanism guarantees that one term ID corresponds to exactly one determined leader node, a unique log entry can be identified by the combination of term ID and log index.
After the leader generates the log of a write operation, it sends the log entries to the followers via AE, which add the entries to their own log stores in order. Once a majority of the cluster's nodes (including the leader itself) have successfully stored a log entry (the followers reply with their own storage status), the leader considers that entry and all earlier entries safely stored, replies to the client that the write operation succeeded, informs the cluster that the write operations in all logs up to this point can actually be performed, and modifies its own stored data.
When a new leader takes office, its own log store is used as the reference and the other nodes are aligned with it: logs that run ahead of the leader's are truncated, and logs that lag behind are gradually filled in from the leader's log store. During log replication, however, the speeds of the individual followers may differ greatly for various reasons; if a leader suddenly fails and a follower whose replication is particularly slow is elected leader, a large number of write operations could be lost. To avoid this, a write operation is considered successful only when the log is guaranteed to have been copied to a majority of machines. As mentioned earlier when explaining the election mechanism, the RV includes the candidate's log version information, i.e. the term ID and log index of its last log entry; if a voting follower finds that the candidate's version information is older than its own, it refuses to vote for it. Since only write operations copied to the logs of a majority of nodes are reported as successful, and a majority of votes must be obtained to be elected, the finally elected leader necessarily holds all the logs of the operations reported successful by the previous leader, so none of them are lost. A minimal sketch of this bookkeeping follows.
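The following is a minimal, illustrative Python sketch (not the patent's code) of the Raft bookkeeping just described: each log entry carries a term ID and a log index, a follower only votes for a candidate whose last log entry is at least as up to date as its own, and an entry counts as committed once a majority of the cluster has stored it:

from dataclasses import dataclass

@dataclass
class LogEntry:
    term: int        # term ID of the leader that created the entry
    index: int       # log index, monotonically increasing across terms
    command: str     # e.g. a serialized write event

def candidate_log_up_to_date(candidate_last, voter_last):
    # RequestVote check: compare last entries by term first, then by index.
    if candidate_last.term != voter_last.term:
        return candidate_last.term > voter_last.term
    return candidate_last.index >= voter_last.index

def committed(match_index, entry_index, cluster_size):
    # An entry is safe once a majority of nodes (leader included) has stored it.
    replicas = sum(1 for idx in match_index.values() if idx >= entry_index)
    return replicas + 1 > cluster_size // 2   # "+1" counts the leader itself

match_index = {"follower-1": 7, "follower-2": 5}   # highest index stored per follower
print(committed(match_index, 6, cluster_size=3))   # True: the leader plus follower-1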
(2) Native graph storage format module
Through the native graph storage format module, graph data can be stored across the cluster in a policy-driven, distributed manner, which gives good scalability and, in theory, the ability to store graphs of any scale.
The working mechanism is as follows:
and the LSM Tree (Log Structured Merge Tree) is used as a storage model of the graph data, so that higher writing speed is realized.
Btre sees a disk as a fixed-size page, which is the smallest unit of reading and writing. One page points to some other pages and forms a tree structure with high fan-out (high degree of outages). Because the data is stored in blocks, when the BTrees add data or delete data, one page is probably not put down, or a plurality of pages are sparse, and page splitting and page merging operations occur at the moment, the method is not suitable for mass graph data storage operation, and LSM Tree is not page-oriented operation, so that the advantage of sequential writing can be better utilized, and writing is generally faster. The storage structure of the LSM Tree is shown in fig. 4, wherein the LSM Tree has the following three important components.
1)MemTable
The MemTable is an in-memory data structure used to hold recently updated data, organized in order by key. The LSM tree does not explicitly define how the data must be kept ordered; for example, HBase uses a skip list to guarantee the ordering of keys in memory.
Because the data is temporarily stored in memory, which is not reliable storage, it would be lost if power were cut; therefore the reliability of the data is generally ensured by WAL (Write-Ahead Logging).
2)Immutable MemTable
When the MemTable reaches a certain size, it is converted into an Immutable MemTable, an intermediate state in turning the MemTable into an SSTable. Write operations are handled by a new MemTable, so data update operations are not blocked during the conversion.
3)SSTable(Sorted String Table)
The SSTable is an ordered set of key-value pairs and is the on-disk data structure of the LSM tree. To speed up reads of SSTables, key lookup can be accelerated by building a key index and Bloom filters.
As its name suggests, the LSM Tree (Log-Structured Merge-Tree) keeps all operation records of data insertion, modification, deletion, etc. (note: operation records) in memory, and when these operations reach a certain volume they are written to disk sequentially in batch. This differs from the B+ tree: a B+ tree update directly modifies the value where the original data resides, whereas data updates in the LSM tree are journaled, a data update being completed simply by appending an update record. The purpose of this design is to flush Immutable MemTables to persistent storage continuously as sequential writes, without modifying keys in earlier SSTables, thus guaranteeing sequential writing.
In this embodiment, the distributed graph computation engine writes data from the ordered MemTable in memory to disk to form sst files, and the different sst files form logically leveled data. Data in lower levels (such as Level 0) is newer than data in higher levels (such as Level 2); since each sst file is wholly ordered and there is no key-range overlap between the sst files, the average search efficiency of the data is O(k × log(n/k)).
As the number of sst files grows, a large number of small files creates storage pressure and affects read speed. The LSM Tree model therefore introduces the concept of multi-level merging: small files are merged downwards level by level, which reduces the number of files on one hand and, on the other hand, still guarantees the model in which lower levels hold newer data. A minimal sketch of this write and read path follows.
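The following minimal Python sketch illustrates the LSM write and read path described above under assumed sizes: writes go to an ordered in-memory MemTable, which is frozen and flushed as a sorted, immutable run (an sst-like file) when it exceeds a threshold, and reads check the MemTable first and then the runs from newest to oldest. It is illustrative only, not the engine's storage code:

MEMTABLE_LIMIT = 4   # assumed flush threshold

class TinyLSM:
    def __init__(self):
        self.memtable = {}     # recent updates, kept in memory
        self.sstables = []     # newest first; each entry is a sorted list of (key, value)

    def put(self, key, value):
        # A real engine would first append the operation to a WAL for durability.
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # Freeze the MemTable and write it out as one sorted, immutable run.
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.sstables:          # newer runs shadow older data
            lo, hi = 0, len(run) - 1
            while lo <= hi:                # binary search within the sorted run
                mid = (lo + hi) // 2
                if run[mid][0] == key:
                    return run[mid][1]
                if run[mid][0] < key:
                    lo = mid + 1
                else:
                    hi = mid - 1
        return None

db = TinyLSM()
for i in range(10):
    db.put("v%d" % i, i)
print(db.get("v3"), len(db.sstables))      # 3 2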
(3) Cypher compiler
In this embodiment, the Cypher compiler provides some extended language compilation capabilities in addition to the standard functionality.
In this embodiment, the working process of the Cypher compiler includes the following steps: lexical and grammar analysis, semantic analysis, logical execution plan generation, and physical execution plan generation. These are explained as follows:
Lexical analysis is the first stage of compilation, the process of converting the OpenCypher character sequence into a sequence of words (tokens); the lexical analyzer typically exists in the form of functions for the parser to call. The parser checks the OpenCypher grammar against meta-information and multi-storage abstraction information and constructs an abstract syntax tree composed of the input words.
Semantic analysis is a logical stage of the compilation process whose task is the context-dependent property checking and type checking of a structurally correct source program. Semantic analysis examines the source program for semantic errors and collects type information for the code generation stage. One such task is type checking: examining whether each operator has operands permitted by the language specification; when the program does not conform to the specification, the compiler should report an error.
After the semantic interpretation is completed, the OpenCypher operation command can be compiled into a distributed logic execution plan. In a distributed logic execution plan, unoptimized code often executes inefficiently; therefore, optimization is required after the distributed logic execution plan is obtained. Specifically, the distributed logic execution plan may be optimized according to preset filtering conditions.
In this embodiment, the following filtering conditions may be set: deleting common sub-queries (CSEs); filtering out unused columns; and filtering out unused partitions. In the implementation, after unused columns are filtered, those columns are not read out when the table is scanned (column pruning), and the data under filtered-out partitions does not need to be read at all (partition pruning). In addition, other filtering conditions can be preset; in particular, filter conditions from implicit joins and windows are pushed down as far as possible into the table scan (predicate pushdown, PPD), so that the engine can improve query efficiency through index- or scan-level filtering, and constant propagation computes values that can be precisely determined at compile time, avoiding repeated calculation at run time.
Physical mapping is then performed on the optimized distributed logic execution plan to generate a physical execution plan executed in a distributed environment, as sketched below.
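A minimal sketch of the plan-level optimizations listed above (constant folding, predicate pushdown into the scan, and unused-column pruning) applied to a toy logical plan; the operator names and plan representation are assumptions for illustration, not the engine's API:

scan = {"op": "ScanVertices", "label": "Person",
        "columns": ["uid", "name", "age", "bio"], "filters": []}
filter_op = {"op": "Filter", "predicate": ("age", ">", "18 + 3"), "child": scan}
project = {"op": "Project", "columns": ["uid", "name"], "child": filter_op}

def optimize(plan):
    proj, flt, sc = plan, plan["child"], plan["child"]["child"]
    # Constant folding: evaluate compile-time constants once instead of per row.
    col, op, expr = flt["predicate"]
    folded = eval(expr, {"__builtins__": {}})          # "18 + 3" -> 21
    # Predicate pushdown: let the scan apply the filter while reading the data.
    sc["filters"].append((col, op, folded))
    # Column pruning: only read columns needed by the projection or the filter.
    needed = set(proj["columns"]) | {col}
    sc["columns"] = [c for c in sc["columns"] if c in needed]
    return {"op": "Project", "columns": proj["columns"], "child": sc}

print(optimize(project))
# the scan now reads only uid, name and age, and filters age > 21 itself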
(4) Distributed graph execution engine module
In this embodiment, the distributed graph execution engine module is configured to provide users with real-time graph query and offline graph analysis capabilities. Its computation capability expands linearly with the number of nodes, supporting graph analysis over massive numbers of vertices and edges and accelerating graph query and analysis tasks by exploiting data locality.
The distributed graph execution engine module adopts a GraphMaster-Slave architecture, and a structural schematic diagram is shown in FIG. 5 and is described as follows:
the graphworkbench node is managed by the GraphSlave node. The information flow in the interaction process of the user OpenCypher interface and the distributed graph calculation engine is divided into control flow information and data flow information, wherein the control flow information is interacted between the GraphMaster and GraphSlave nodes; and the data flow information uploaded by the user is directly sent to the GraphSlave node for processing without being forwarded by the GraphMaster node.
a) GraphMaster node architecture design
The GraphMaster is the control node of the whole distributed computing system; it is responsible for managing all control information of the whole platform and is managed by a master-slave node consistency protocol. The GraphMaster node, the Second GraphMaster node and the Third GraphMaster node run on three servers and synchronize control information through the TCP protocol. Abnormalities or faults among the three GraphMaster nodes are detected through a heartbeat protocol. When the primary GraphMaster node becomes abnormal, the Second GraphMaster node immediately detects the abnormality and takes over the primary GraphMaster's control of the whole system in time, ensuring high reliability of the system and avoiding the single-point-of-failure problem.
The GraphMaster node cluster controls all GraphSlave nodes and GraphWorker nodes. It generates the task topology graph to be executed for a user by reading the execution plan, and, through the control-information data management model and the resource allocation and scheduling algorithm model, dynamically determines which GraphSlave nodes are to execute and to which GraphSlave nodes the specified tasks are issued. Each GraphSlave node then schedules GraphWorker nodes, which schedule the dynamic link library, acquire the data to be processed and process it.
As the control and management node of the system, the GraphMaster node runs the core system management models such as the control-information data management model and the resource scheduling and allocation algorithm model. The GraphMaster node has 5 main sub-modules, which together maintain the management operation of the GraphMaster node, specifically:
Task management sub-module: dynamically generates the task execution flow and formulates the task operation strategy from the information provided by the XML file management module and the dynamic link library scheduling module.
Resource-aware scheduling algorithm sub-module: receives the task flow generated by the task management module and dynamically generates the task operation strategy in combination with the usage of system resources. The resource-aware scheduling algorithm comprises three core algorithms: the system-initialization resource-aware scheduling algorithm, the resource reconfiguration scheduling algorithm during system operation, and the system disaster-recovery scheduling algorithm.
Heartbeat keep-alive sub-module: the nodes of the GraphMaster cluster need to be checked regularly to determine whether they are operating normally; the heartbeat keep-alive protocol detects whether the other node is abnormal by continuously sending heartbeat messages between the nodes.
Master-slave node fault-tolerance algorithm sub-module: runs the master-slave node consistency protocol, maintains state management among the GraphMaster node, the Second GraphMaster node and the Third GraphMaster node, and ensures that, when the GraphMaster node is abnormal, the system can rapidly schedule the Second GraphMaster node to take over system control and management.
Consistent hash disk storage sub-module: the disk persistence management module for control-information data; it ensures that, when the system goes down or is restarted, the historical operating state can be rapidly recovered from disk so that system initialization completes quickly, and it provides query scheduling of control-information data during operation (a minimal sketch of consistent hashing follows).
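A minimal consistent-hashing sketch for the disk persistence of control-information data mentioned above; the ring parameters and storage locations are assumptions for illustration only:

import bisect, hashlib

class HashRing:
    def __init__(self, nodes, vnodes=32):
        # Place several virtual nodes per physical location to smooth the load.
        self.ring = sorted(
            (self._hash("%s#%d" % (node, v)), node)
            for node in nodes for v in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def locate(self, key):
        # Walk clockwise on the ring to the first virtual node at or after the key.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["/meta/disk0", "/meta/disk1", "/meta/disk2"])   # hypothetical paths
print(ring.locate("task:job-42"), ring.locate("worker:node-7"))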
b) GraphSlave node architecture design
The main responsibilities of the GraphSlave node include:
and after receiving task information issued by the GraphMaster, pulling up the corresponding GraphWorker node, and sending node online information to the GraphMaster node.
And operating the GraphWorker node to execute a service environment preparation flow, wherein the preparation flow comprises that the GraphSlave dynamically pulls up N GraphWorker nodes, connects upstream and downstream GraphWorkers to form task topology flows, and the GraphWorker pulls up a corresponding dynamic link library, and releases process resources after the task execution is finished.
And a heartbeat keep-alive protocol exists between the GraphMaster node and the GraphSlave node, so that the situation that the opposite side operates abnormally is ensured to be timely known between the GraphMaster node and the GraphSlave node. The GraphSlave also receives task running information of the graphworkbench and feeds the task running information back to the GraphMaster in time.
The GraphSlave counts the resource use condition of the node system, the number of GraphWorker nodes and the resource use condition of each GraphWorker node. And periodically transmits to the GraphMaster node.
Graphworkbench management submodule: responsible for creating or ending graphworkbench computing nodes.
A resource collection sub-module: and collecting the resource usage of all GraphWorks on the GraphSlave node. Timely sending to GraphMaster.
Heartbeat keep-alive module: and sending heartbeat information to the GraphMaster node at regular time, wherein the heartbeat protocol can contain resource information or task information. And receiving and replying the heartbeat sent by the GraphSlave.
A task scheduling sub-module: and receiving task information sent by the GraphMaster, scheduling the GraphWorker pulled up by the node, establishing an upstream-downstream relationship of the GraphWorker node, and executing task flow.
c) GraphWorker architecture design
The GraphWorker node is a process pulled up by the GraphSlave node using the fork and exec functions.
The main responsibilities of the GraphWorker node are as follows (a sketch of the fork/exec and dlopen mechanisms follows this list):
Establishing upstream-downstream relationships with the corresponding GraphWorker nodes according to the task topology information of the GraphSlave.
Executing task information issued by the GraphSlave node, receiving the location information of the dynamic link library sent by the GraphSlave node, and calling a dlopen-family function in the dynamic link library module to load the .so file.
Receiving data sent by upstream GraphWorker nodes, calling user-defined code to process the data, and sending the processed data to downstream nodes or placing it in local storage.
Reporting the resource usage and task execution status of this node to the GraphSlave node.
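The following Python sketch illustrates the two mechanisms named above, pulling up a worker process with fork+exec and loading a user-supplied .so file through the dlopen family (exposed in Python via ctypes). All paths and names are hypothetical; this is not the patented code and runs on POSIX systems only:

import ctypes
import os

def pull_up_worker(worker_binary, task_args):
    # Fork the current process and exec the worker image in the child (fork + exec).
    pid = os.fork()
    if pid == 0:                                   # child: becomes the worker process
        try:
            os.execv(worker_binary, [worker_binary] + task_args)
        except OSError:
            os._exit(1)                            # exec failed; never return to parent code
    return pid                                     # parent keeps managing the child

def load_user_plugin(so_path, symbol):
    # CDLL wraps dlopen(3); attribute lookup wraps dlsym(3) to resolve the symbol.
    lib = ctypes.CDLL(so_path)
    return getattr(lib, symbol)

if __name__ == "__main__":
    child = pull_up_worker("/usr/bin/env", ["true"])   # stand-in for a worker binary
    os.waitpid(child, 0)                               # reap the finished child process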
(5) Graph analysis algorithm module
The graph analysis algorithm module integrates internal and external algorithm modules and provides interfaces for data and computation over RDDs. The built-in algorithm library comprises PageRank, Connected Components, Fast-Unfolding and other basic algorithms, while NLP, NLU and deep learning algorithms are attached externally to adapt to graph computing service scenarios.
The graph analysis algorithm module integrates multiple distributed graph computation algorithms and deep learning graph algorithms to construct graph mining models. The supported graph algorithms include: StarNet, PageRank, Strongly Connected Components, Label Propagation, K-core, Bow Tie, Graph Centrality, Fraud Rank, Heavy Edge Detector, and Motif Finding.
Graph analysis provides a Cypher query interface, presents the query results in 2D, and provides various page operations on the query results.
(6) OpenCypher interface module
The OpenCypher interface module is used for enabling a user to access the graph processing computation engine through the extended OpenCypher language. Besides the standard functions, it provides a number of extension languages and graph function computation capabilities; the graph functions comprise basic functions, aggregation functions, mathematical functions, string operation functions, collection operation functions and crux-cypher internal functions, so as to meet the requirements of graph computation and complex query flows.
Quick access to graph processing capability can be provided to the data provider, through which a common query language can be shared with the interfaces of the distributed graph computation engine.
An OpenCypher query statement follows the format shown below:
[MATCH WHERE]
[OPTIONAL MATCH WHERE]
[WITH[ORDER BY][SKIP][LIMIT]]
RETURN[ORDER BY][SKIP][LIMIT];
Functions greatly improve the efficiency of graph computation and complex queries: code that may need to be executed repeatedly is encapsulated into a function and called wherever it is needed, which enables code reuse and, more importantly, ensures code consistency. The commonly used basic functions are shown in Table 1, and the commonly used graph computation aggregation functions are shown in Table 2.
Table 1: Graph computation basic function table
Table 2: Graph computation aggregation function table
Function name | Parameters | Function
count | structure | Returns the number of parameters in the result set
min | values | Returns the minimum value of the parameter in the set
max | values | Returns the maximum value of the parameter in the set
sum | values | Returns the sum of the data in the set
(7) RestAPI interface module
This module is used for providing a standard RESTful interface to obtain the graph computation state; it can also be used to add, delete, modify and query the graph through the JAVA API and to construct graph algorithms conforming to the service model.
The working process is as follows: the RestAPI interface design of the distributed graph computation engine is described using the query API as an example.
Request path: api/stiller/cypher
Request type: POST
Query parameter examples:
{
"cypher_graph":"snap_test",
"cypher_input":"match(a)-[f]-(b)return a,flimit 5;",
"execution_mode":0,
"result_form":0,
"vertex_attr_filter":{
"flag":0,
"filters":[{
"label":"__all__",
"attrs":["uid"]
}]
},
"edge_attr_filter":{
"flag":0,
"filters":[{
"label":"__all__",
"attrs":["uid"]
}]
}
}
Format example of the attribute filter:
Scenario for introducing the attribute filter into the parameters: in the query result, a point or edge may have many attribute values, so the returned JSON data can be large, yet in many cases some of the attributes are of no interest to the querying user. Therefore, the attributes that the querying user does not care about can be filtered out when the data is returned, so that only the attribute information the user cares about is returned.
Because data of different labels has different attributes, different filtering conditions are provided for different labels. All labels can also be referred to collectively by the string "__all__", in which case attribute filtering is performed on the data of all labels.
Two forms of attribute filtering are provided: returning only certain attributes, or not returning certain attributes. A minimal client-side sketch of calling this API follows.
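A minimal client-side sketch of calling the query API above with Python's requests library; the request path and field names follow the example in this embodiment, while the host, port and response handling are assumptions:

import requests

payload = {
    "cypher_graph": "snap_test",
    "cypher_input": "match (a)-[f]-(b) return a, f limit 5;",
    "execution_mode": 0,
    "result_form": 0,
    # Return only the "uid" attribute for vertices and edges of every label.
    "vertex_attr_filter": {"flag": 0, "filters": [{"label": "__all__", "attrs": ["uid"]}]},
    "edge_attr_filter": {"flag": 0, "filters": [{"label": "__all__", "attrs": ["uid"]}]},
}

resp = requests.post("http://localhost:8080/api/stiller/cypher", json=payload)  # assumed host/port
resp.raise_for_status()
print(resp.json())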
Another structural schematic diagram of the distributed graph computing engine is shown in fig. 6; it mainly comprises a storage layer, a computation layer and an interface layer. At the storage layer, GraphDB (native graph storage) provides the underlying storage with efficient compression capability for graph data, using vertex tables, edge tables and ordered index tables. By means of the graph partitioning algorithm, graph data can be stored across the cluster in a policy-driven, distributed manner; consistency and scalability of the GraphDB partition copies are achieved through the distributed storage engine, theoretically enabling graph data storage at any scale. At the computation layer, with the distributed graph computation engine and its built-in graph algorithms, real-time graph query and offline graph analysis capability can be provided to users at the same time; the computation capability expands linearly with the number of nodes, supporting graph analysis over massive vertices and edges and accelerating graph query and analysis tasks using data locality. At the interface layer, OpenCypher is implemented; besides the standard functions, some extension languages are provided to meet the requirements of graph computation and complex queries.
In summary, the distributed graph computing engine provided in this embodiment improves the compilation mode of the Cypher compiler and adds a filtering flow, effectively solving the compilation problems of existing distributed graph computing engines and improving compilation performance. Meanwhile, by improving the architecture of the distributed graph storage engine module and changing the native graph storage format with the help of the graph partitioning algorithm, the storage rate of graph data is effectively improved, solving the storage problems of existing distributed graph computing engines.
Example 2
Embodiment 2 of the present invention discloses a triggering method for a distributed graph computing engine; a flowchart is shown in fig. 7, comprising the following steps:
step S1: receiving an OpenCypher operation instruction;
step S2: starting a Cypher compiler, performing grammar and semantic interpretation on an OpenCypher operation instruction, compiling the interpreted OpenCypher operation instruction into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
step S3: registering with the GraphMaster and applying for resources (CPU cores and memory); the GraphMaster obtains the tasks to be executed according to the physical execution plan, decomposes them into a plurality of primary tasks, and distributes each primary task to a different GraphSlave;
step S4: each GraphSlave decomposes the received primary task into a plurality of secondary tasks and distributes each secondary task to a different Worker, and the Workers execute the corresponding secondary tasks; during this process, the Workers also send task monitoring reports to the GraphMaster.
Step S5: after all tasks to be executed are completed, resource logout is applied to the GraphMaster, and the next OpenCypher operation instruction is waited to be received.
When the next OpenCypher operation instruction is received, the process jumps to step S1, and steps S1 to S5 are repeatedly executed.
In step S3, the GraphMaster is also responsible for maintaining the overall execution control state of the system.
During the execution of step S4, the following are also performed simultaneously:
the Workers send task monitoring reports to the GraphMaster;
the graph data in the distributed graph database system is stored in an efficient compressed format by the native graph storage format module in the graph computation engine;
the graph computation state is obtained, graph data is added, deleted, modified and queried, and graph algorithms conforming to the service model are constructed through the standard RESTful interface provided by the RestAPI interface module in the graph computation engine;
real-time graph query and offline graph analysis services are provided to users through the distributed graph execution engine module in the graph computation engine.
A minimal sketch of the two-level task decomposition in steps S3 and S4 follows.
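The following Python sketch illustrates the two-level task decomposition in steps S3 and S4: the GraphMaster splits the physical plan into primary tasks for the GraphSlaves, and each GraphSlave splits its primary task into secondary tasks for its Workers. All task and node names are assumptions for illustration, not the engine's real API:

def master_decompose(physical_plan, slaves):
    # GraphMaster: assign plan fragments (primary tasks) to GraphSlaves round-robin.
    assignment = {s: [] for s in slaves}
    for i, fragment in enumerate(physical_plan):
        assignment[slaves[i % len(slaves)]].append(fragment)
    return assignment

def slave_decompose(primary_tasks, workers):
    # GraphSlave: split each primary task into per-Worker secondary tasks.
    assignment = {w: [] for w in workers}
    for i, task in enumerate(primary_tasks):
        assignment[workers[i % len(workers)]].append(task + "/secondary-%d" % i)
    return assignment

plan = ["scan-partition-0", "scan-partition-1", "expand-edges", "aggregate"]
primary = master_decompose(plan, ["slave-a", "slave-b"])
for slave, tasks in primary.items():
    print(slave, slave_decompose(tasks, ["worker-1", "worker-2"]))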
For the specific implementation process of this method embodiment, reference may be made to the above system embodiment, which is not repeated here.
Since the principle of this embodiment is the same as that of the system embodiment, the method also has the corresponding technical effects of the system embodiment.
Those skilled in the art will appreciate that all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing the associated hardware, and the program may be stored on a computer-readable storage medium, such as a magnetic disk, an optical disk, a read-only memory or a random access memory.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (6)

1. A distributed graph computation engine, comprising:
the distributed graph storage engine module is used for constructing a distributed graph database system in a 'multi-Master-multi-Worker' mode and is used for managing and controlling graph data and performing data processing operation;
the Cypher compiler is used for realizing grammar and semantic interpretation of the standard OpenCypher language; it is also used for compiling the interpreted OpenCypher operation command into a distributed logic execution plan, and generating a physical execution plan executed in a distributed environment according to the distributed logic execution plan;
the distributed graph execution engine module is used for providing real-time graph query and offline graph analysis services for users;
the graph analysis algorithm module integrates various distributed graph calculation algorithms and deep learning graph algorithms and is used for constructing a graph mining model;
the OpenCypher interface module is used for enabling a user to access the distributed graph computation engine through the extended OpenCypher language;
in the Cypher compiler, the generating a physical execution plan for execution in a distributed environment according to the distributed logical execution plan includes:
optimizing the distributed logic execution plan according to preset filtering conditions; the filtering conditions include: deleting common sub-queries; filtering unused columns; and filtering unused partitions;
performing physical mapping on the optimized distributed logic execution plan to generate a physical execution plan executed in a distributed environment;
the distributed graph execution engine module adopts a GraphMaster-Slave architecture; in the GraphMaster-Slave architecture,
the GraphWorker node is managed by a GraphSlave node;
the information flow in the interaction process of the OpenCypher interface of the user and the distributed graph calculation engine is divided into control flow information and data flow information; the control flow information is interacted between the GraphMaster and the GraphSlave nodes; the data flow information uploaded by the user is directly sent to the GraphSlave node for processing without being forwarded by the GraphMaster node;
the GraphWorker node is a process pulled up by a GraphSlave node by using a fork function and an exec function;
the responsibilities of the GraphWorker node are as follows:
establishing an upstream-downstream relationship with a corresponding GraphWorker node according to task topology information of the GraphSlave;
executing task information issued by the GraphSlave node, receiving the location information of the dynamic link library sent by the GraphSlave node, and calling a dlopen-family function in the dynamic link library module to load the .so file;
receiving data sent by upstream GraphWorker nodes, calling user-defined code to process the data, and sending the processed data to downstream nodes or placing it in local storage;
and reporting the resource use condition and the task execution condition of the node to the GraphSlave node.
2. The distributed graph computation engine of claim 1, further comprising a RestAPI interface module;
the RestAPI interface module provides a standard RESTful interface for obtaining the graph computation state, performing add, delete, modify and query operations on the graph data, and constructing graph algorithms conforming to the service model.
3. The distributed graph computation engine of claim 2, further comprising a native graph storage format module;
the native graph storage format module is used for storing the graph data in the distributed graph database system in an efficient compression format by means of a graph partitioning algorithm.
4. A distributed graph computation engine according to claim 3, wherein in the native graph storage format module, the graph partitioning algorithm is LSM Tree.
5. The distributed graph computation engine of any of claims 1-4, wherein in the distributed graph storage engine module,
a plurality of masters form a Master Group and are used for being responsible for meta-information management, task scheduling and load balancing functions;
worker provides data processing operations including reading, updating, and deleting of graph data.
6. The distributed graph computation engine of claim 5, wherein the distributed graph storage engine module divides the graph data into a plurality of partitions during data processing operations; a partition is the minimum logical storage unit of each Worker;
and several Workers are selected as hosts for each partition, and consistency of the graph data among the multiple copies of the partition is managed through the Raft protocol.
CN202211240196.2A 2022-10-11 2022-10-11 Distributed graph calculation engine Active CN116150263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211240196.2A CN116150263B (en) 2022-10-11 2022-10-11 Distributed graph calculation engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211240196.2A CN116150263B (en) 2022-10-11 2022-10-11 Distributed graph calculation engine

Publications (2)

Publication Number Publication Date
CN116150263A CN116150263A (en) 2023-05-23
CN116150263B (en) 2023-07-25

Family

ID=86339567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211240196.2A Active CN116150263B (en) 2022-10-11 2022-10-11 Distributed graph calculation engine

Country Status (1)

Country Link
CN (1) CN116150263B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117972154A (en) * 2024-03-27 2024-05-03 支付宝(杭州)信息技术有限公司 Graph data processing method and graph calculation engine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063486A (en) * 2014-07-03 2014-09-24 四川中亚联邦科技有限公司 Big data distributed storage method and system
CN106681820A (en) * 2016-12-30 2017-05-17 西北工业大学 Message combination based extensible big data computing method
CN112667644A (en) * 2021-01-20 2021-04-16 浪潮云信息技术股份公司 Hybrid index memory database storage engine management method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072194B (en) * 2015-08-27 2018-05-29 南京大学 A kind of storage data in distributed file system repair structure and restorative procedure
WO2018170276A2 (en) * 2017-03-15 2018-09-20 Fauna, Inc. Methods and systems for a database
CN113420517B (en) * 2021-05-28 2023-01-06 清华大学 FPGA virtualization hardware system stack design oriented to cloud deep learning reasoning
CN113900788A (en) * 2021-10-20 2022-01-07 咪咕文化科技有限公司 Distributed work scheduling method and distributed workflow engine system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063486A (en) * 2014-07-03 2014-09-24 四川中亚联邦科技有限公司 Big data distributed storage method and system
CN106681820A (en) * 2016-12-30 2017-05-17 西北工业大学 Message combination based extensible big data computing method
CN112667644A (en) * 2021-01-20 2021-04-16 浪潮云信息技术股份公司 Hybrid index memory database storage engine management method

Also Published As

Publication number Publication date
CN116150263A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
KR102307371B1 (en) Data replication and data failover within the database system
CN115562676B (en) Triggering method of graph calculation engine
CN110196871B (en) Data warehousing method and system
Baker et al. Megastore: Providing scalable, highly available storage for interactive services.
Bailis et al. Bolt-on causal consistency
Dean et al. MapReduce: Simplified data processing on large clusters
US7779298B2 (en) Distributed job manager recovery
US5495606A (en) System for parallel processing of complex read-only database queries using master and slave central processor complexes
CN111949454B (en) Database system based on micro-service component and related method
US20180004777A1 (en) Data distribution across nodes of a distributed database base system
GB2472620A (en) Distributed transaction processing and committal by a transaction manager
CN108369601A (en) Promotion attribute in relational structure data
CN108369599A (en) Duplication control between redundant data center
CN108431807A (en) The duplication of structured data in partition data memory space
US20230110826A1 (en) Log execution method and apparatus, computer device and storage medium
CN116150263B (en) Distributed graph calculation engine
CN116009428A (en) Industrial data monitoring system and method based on stream computing engine and medium
Margara et al. A model and survey of distributed data-intensive systems
Leibert et al. Automatic management of partitioned, replicated search services
Jacobs et al. Bad to the bone: Big active data at its core
CN116010452A (en) Industrial data processing system and method based on stream type calculation engine and medium
Vilaça et al. On the expressiveness and trade-offs of large scale tuple stores
Höger Fault tolerance in parallel data processing systems
Petrescu Replication in Raft vs Apache Zookeeper
Orensa A design framework for efficient distributed analytics on structured big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant