CN115544172A - Method and system for synchronizing data among clusters of one master and multiple slaves in real time - Google Patents


Info

Publication number
CN115544172A
Authority
CN
China
Prior art keywords
data
cluster
node
master
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211498391.5A
Other languages
Chinese (zh)
Inventor
岳通
王玉珏
吴敏
叶小萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ouruozhi Technology Co ltd
Original Assignee
Hangzhou Ouruozhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ouruozhi Technology Co ltd filed Critical Hangzhou Ouruozhi Technology Co ltd
Priority to CN202211498391.5A priority Critical patent/CN115544172A/en
Publication of CN115544172A publication Critical patent/CN115544172A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/3006: Monitoring arrangements where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Abstract

The application relates to a method and system for real-time synchronization of data among clusters of one master and multiple slaves. The method comprises inter-cluster metadata synchronization and inter-cluster attribute data synchronization for one master and multiple slaves: the data to be synchronized on the master cluster is sent to the pipeline node of each slave cluster through the meta listener node and/or the storage listener node, and is then processed by the pipeline node and forwarded to the corresponding slave cluster node, completing data synchronization between one master cluster and multiple slave clusters. The method and system solve the problem of how to perform real-time one-master-multiple-slave data synchronization in a graph database, realize real-time synchronization of the graph database's constantly changing relationship-network data among the clusters, and effectively guarantee data consistency.

Description

Method and system for synchronizing data among clusters of one master and multiple slaves in real time
Technical Field
The present application relates to the field of graph databases, and in particular, to a method and system for real-time synchronization of data among clusters of a master and multiple slaves.
Background
With the rapid development of big data and artificial intelligence, very-large-scale relationship networks have gradually come into wide use in fields such as social recommendation, risk control, the Internet of Things, blockchain, and security. Distributed graph databases are one of the technical cornerstones of all of these applications, and the amount of data they must process grows geometrically. For graph database clusters containing billions of vertices and trillions of edges, a backup cluster must be constructed to guarantee high availability of the data, and data between the primary and backup clusters must be synchronized in real time.
Because relationship-network data changes in real time, how to synchronize one master and multiple slaves (one master cluster and multiple slave clusters) in real time across clusters with a huge data volume is a problem every graph database has to face, covering both synchronization of the full data and real-time synchronization of incremental data.
At present, no effective solution is provided for the problem of how to perform real-time data synchronization of a master and multiple slaves in a graph database in the related art.
Disclosure of Invention
The embodiment of the application provides a method and a system for data real-time synchronization among clusters of one master and multiple slaves, so as to at least solve the problem of how to perform real-time data synchronization of one master and multiple slaves in a graph database in the related art.
In a first aspect, an embodiment of the present application provides a method for real-time synchronization of inter-cluster data of one master and multiple slaves, where the method includes inter-cluster metadata synchronization and inter-cluster attribute data synchronization for one master and multiple slaves;
the inter-cluster metadata synchronization of one master and multiple slaves includes:
when the master cluster node is a metadata management node, starting a meta listener node on the master cluster node to receive synchronization information sent by the leader copy in the data fragment corresponding to the metadata management node;
if the synchronization information is a pre-written log, executing first synchronization preprocessing, and if the synchronization information is snapshot data, executing second synchronization preprocessing;
periodically traversing the meta listener node information in the graph space directory, and sending the preprocessed data to the pipeline node of each slave cluster through the meta listener node;
and processing the data through the pipeline node and then sending the processed data to the metadata management node of the corresponding slave cluster.
In some embodiments, the inter-cluster attribute data synchronization of one master and multiple slaves includes:
when the master cluster node is a storage node, starting a storage listener node on the master cluster, creating a learner copy, and adding the learner copy into the raft group where the data fragment of the storage node is located;
transmitting the pre-written logs stored in the leader copy of the data fragment to the learner copy based on the raft consistency principle;
regularly traversing the information of the storage listener nodes under the learner copy, and sending the logs to the pipeline nodes of each slave cluster through the storage listener nodes;
and processing the data through the pipeline node and then sending the processed data to the storage nodes of the corresponding slave cluster.
In some embodiments, if the synchronization information is a pre-written log, performing the first synchronization preprocessing includes:
parsing the pre-written log to obtain its corresponding graph space ID, and writing the pre-written log into the pre-written log file of the corresponding graph space according to that ID.
In some embodiments, if the synchronization information is snapshot data, performing the second synchronization preprocessing includes:
first emptying the pre-written log files of all graph spaces, then parsing the snapshot data to obtain the corresponding graph space IDs, and writing the snapshot data into the pre-written log files of the corresponding graph spaces according to those IDs.
In some embodiments, sending the logs through the storage listener node to the pipeline nodes of each slave cluster comprises:
judging whether the current synchronized log is in a pre-written log file of the learner copy;
if yes, directly reading the pre-written log through the storage listener node, and sending the pre-written log to a pipeline node of a slave cluster;
if not, the snapshot data of the leader copy is pulled through the storage listener nodes and is sent to the pipeline nodes of the slave cluster.
In some embodiments, before the data is processed by the pipeline node and sent to the corresponding slave cluster node, the method further comprises:
and receiving the data through the pipeline node, and judging whether the data is legal or not according to the graph space ID and the log ID to be synchronized on the slave cluster.
In some embodiments, processing the data by the pipeline node and sending the processed data to the corresponding slave cluster node comprises:
traversing through the pipeline nodes to obtain the graph space directories to be synchronized on the slave clusters, wherein each graph space in the graph space directories generates a task to be put into a task management queue;
during the execution of a task, generating a plurality of subtasks according to the information of the data fragment corresponding to the task, wherein one data fragment corresponds to one subtask;
and according to the parallelism, executing the subtasks in parallel until all the tasks in the task management queue are executed, namely, sending the data to the corresponding nodes of the slave cluster.
In some embodiments, the graph space is a basic unit of data synchronization between the master cluster and the slave cluster, and the graph space required to perform data synchronization is readable and writable in the master cluster and is read only in the slave cluster.
In some embodiments, the data slices of the metadata management node store metadata of the graph space; the data fragment of the storage node stores attribute data of the graph space.
In a second aspect, an embodiment of the present application provides a system for real-time synchronization of data among clusters of a master and multiple slaves, where the system includes a master cluster module and a slave cluster module;
the main cluster module is used for starting a meta-listener node on a main cluster to receive synchronous information sent by a leader copy in a data fragment corresponding to the meta-data management node when the main cluster node is the meta-data management node; if the synchronous information is a pre-written log, executing first synchronous preprocessing, and if the synchronous information is snapshot data, executing second synchronous preprocessing;
the main cluster module is used for periodically traversing meta-listener node information under a graph space directory and sending the data after synchronous preprocessing to the pipeline nodes of each slave cluster through the meta-listener nodes;
and the slave cluster module is used for processing the data through the pipeline node and then sending the processed data to the corresponding slave cluster node.
Compared with the related art, the method and system for real-time synchronization of inter-cluster data of one master and multiple slaves provided by the embodiments of the present application include inter-cluster metadata synchronization and inter-cluster attribute data synchronization for one master and multiple slaves. The data to be synchronized on the master cluster is sent to the pipeline node of each slave cluster through the meta listener node and/or the storage listener node, then processed by the pipeline node and sent to the corresponding slave cluster node, completing data synchronization between one master cluster and multiple slave clusters. This solves the problem of how to perform real-time one-master-multiple-slave data synchronization in a graph database, realizes real-time synchronization of the constantly changing relationship-network data in the graph database among the clusters, and effectively guarantees data consistency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an architecture of a Nebula Graph cluster according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for synchronizing data between clusters of a master and multiple slaves according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating inter-cluster metadata synchronization for a master multiple slaves according to an embodiment of the present application;
FIG. 4 is a flow diagram illustrating synchronization of inter-cluster attribute data of a master multiple slaves according to an embodiment of the present application;
FIG. 5 is a block diagram of a master-to-slave inter-cluster data synchronization system according to an embodiment of the present application;
fig. 6 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 51, master cluster module; 52, slave cluster module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The terms "a", "an", "the", and similar referents used herein do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including", "comprising", "having", and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a list of steps or modules (elements) is not limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to it. References to "connected", "coupled", and the like are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect. The term "plurality" means two or more. "And/or" describes an association between objects and indicates three possible relationships; for example, "A and/or B" may indicate that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. The terms "first", "second", "third", and the like merely distinguish similar objects and do not denote a particular ordering.
The embodiments of the present invention relate to the following terms:
in the Nebula Graph distributed graph database, the core elements composing the property graph are of two kinds: points (vertex) and edges (edge).
Point (vertex): a point consists of a tag, representing the type of the point, and the attributes corresponding to that tag, representing one or more attributes the tag owns. A point has at least one type (tag) and may have several. The set of attributes corresponding to a tag is called its schema.
Edge (edge): the relationship between two points. An edge here refers to a directed edge, and consists of a type and attributes. The type identifies a specific kind of edge; a positive type value denotes the outgoing edge and a negative type value the corresponding reverse edge. The set of attributes of an edge is likewise called a schema.
Index (index): the index is a structure for sorting according to the attribute values of one or more columns of the tag or the edge, and the index can be used for quickly accessing specific data in the corresponding tag or edge.
Graph space (space): graph spaces are mutually independent entities; a tag, edge, tag index, or edge index must belong to some space. A graph space may contain any number of tags, edges, and their indexes.
Data fragment (partition): the physical storage partition of Nebula Graph. One space can have multiple data fragments, and each data fragment has multiple copies, such as a leader, followers, and learners, distributed on different nodes; the consistency of the copies of a data fragment is guaranteed by the raft distributed protocol.
Raft: a highly available consensus algorithm widely used in engineering. The raft leader can be read and written; a follower is currently read-only; a learner is like a follower but does not participate in leader election.
Metadata management node (meta): a node that manages metadata.
Storage node (storage): a node storing data.
Compute node (graph): a stateless node for computing.
Listener node (listener): a node that holds a learner copy in the raft group of each partition. The listener for meta metadata is called the meta listener (meta listener node), and the listener for storage attribute data is called the storage listener (storage listener node).
Pipeline node (drainer): a node on the slave cluster used for inter-cluster data synchronization.
Nebula Graph cluster: the unit which is composed of the computing node, the metadata management node and the storage node and provides service to the outside in a unified mode.
Fig. 1 is a schematic diagram of the architecture of a Nebula Graph cluster according to an embodiment of the present application. As shown in fig. 1, a minimal Nebula Graph cluster consists of 1 compute node (graph), 3 metadata management nodes (meta), and 3 storage nodes (storage).
The metadata is stored on data fragment 0 (partition 0) of the meta nodes, and the 3 copies in its raft group guarantee high availability of the metadata, which is why there are 3 meta nodes.
The attribute data is stored on a storage node, and 3 data fragments (partitions) are provided, each partition has 3 copies, and each storage node has a leader of the partition (the leader of the partition 1 is on the storage node 1, the leader of the partition 2 is on the storage node 2, and the leader of the partition 3 is on the storage node 3). Copies of the same partition on different machines constitute a raft group.
The basic unit of data synchronization between the master cluster and each slave cluster is a graph space (space). If the master cluster contains the space A and the space B, the data of the space A can be synchronized to the slave clusters, and when the space is taken as a unit, the master cluster does not need to know the distribution condition of each partition of the space in the slave clusters. For a certain space to be synchronized, the number of partitions, the number of machines and the number of copies in the two clusters may be different.
And controlling the read-write mode of the space to be synchronized in the master-slave cluster by setting a variable of the space level. The space to be synchronized in the master cluster can be read and written at the console end, and the space in the slave cluster can be read only at the console end. At this time, the space in the slave cluster can only write data in a data synchronization mode.
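The space-level read-write control above can be sketched as a simple gate (a hypothetical illustration; the patent does not specify this interface):

```python
def write_allowed(cluster_role, via_sync_pipeline=False):
    """Space-level write gate: a synchronized space is read-write on the
    master cluster; on a slave cluster it is read-only at the console and
    accepts writes only through the data-synchronization pipeline."""
    if cluster_role == "master":
        return True
    return via_sync_pipeline
```

Under this assumption, console writes on a slave cluster are rejected, while writes arriving from the drainer are applied.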
Therefore, if the synchronization relationship from the master cluster to a certain space of each slave cluster is established, any write of the space in the master cluster, including the write of metadata and data, can be synchronized to each slave cluster in real time. Therefore, real-time synchronization of data among clusters is a real-time dynamic process for capturing changed data.
Furthermore, the patent application No. 202210267788.7 discloses a method supporting real-time synchronization of data between clusters of one master and one slave within the same graph space, where different spaces of the master cluster can be synchronized to different single slave clusters. Building on that application, the present invention further supports real-time synchronization of data among clusters of one master and multiple slaves within the same graph space. Fig. 2 is a schematic flowchart of a method for synchronizing data between clusters of one master and multiple slaves according to an embodiment of the present application.
The method in the embodiment of the application comprises the steps of inter-cluster metadata synchronization of a master and a plurality of slaves and inter-cluster attribute data synchronization of a master and a plurality of slaves; fig. 3 is a schematic flowchart of inter-cluster metadata synchronization of a master and multiple slaves according to an embodiment of the present application, and as shown in fig. 3, the inter-cluster metadata synchronization of a master and multiple slaves includes the following steps:
step S302, the main cluster node is a metadata management node (meta), and a meta listener node (meta listener) is started on the main cluster node to receive synchronization information sent by a leader copy (meta leader) in a data fragment corresponding to the metadata management node;
it should be noted that, after the meta folder is started, the meta folder determines whether to send the wal log or send snapshot data (snapshot data) according to its own pre-written log (wal log) and the wal log that the meta folder last received.
Step S304, if the synchronous information is a pre-written log, executing a first synchronous preprocessing, and if the synchronous information is snapshot data, executing a second synchronous preprocessing;
specifically, if the synchronization information is a pre-written log, a graph space ID corresponding to the pre-written log is obtained through analysis, and then the pre-written log is written into a pre-written log file corresponding to a graph space according to the graph space ID.
Preferably, for the meta listener, if the wal logs of the meta leader are received, they are first written into the wal file (pre-written log file). The wal file is then read: starting from the entry after the last applied log (recorded as last applied log Id), the space Id of each wal log is parsed, the log is written into the wal file of the corresponding space, and last applied log Id is updated, until all wal logs have been read and processed.
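This demultiplexing of the shared wal file into per-space wal files can be sketched as follows (dict-based stand-ins for the real file structures; all names are hypothetical):

```python
def apply_wal_logs(wal_file, last_applied_log_id, space_wal_files):
    """Starting after the last applied log Id, parse each wal entry's
    space Id, append the entry to that space's wal file, and advance
    the applied Id until all entries are processed."""
    for entry in wal_file:
        if entry["log_id"] <= last_applied_log_id:
            continue  # already applied in an earlier pass
        space_wal_files.setdefault(entry["space_id"], []).append(entry)
        last_applied_log_id = entry["log_id"]
    return last_applied_log_id
```

Re-running the pass with the returned Id is idempotent, since already-applied entries are skipped.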
Specifically, if the synchronization information is snapshot data, the pre-written log files of each graph space are emptied first, the snapshot data is analyzed to obtain a corresponding graph space ID, and then the snapshot data is written into the pre-written log files of the corresponding graph space according to the graph space ID.
Preferably, for the meta listener, if snapshot data of the meta leader is received, the meta listener first empties all wal files, including the wal file of each space. Each piece of snapshot data is then parsed to obtain its space id, and the data is written, in wal-log form, into the wal file of the corresponding space according to that space id.
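The snapshot branch differs from the wal branch in that it first discards all per-space wal files; a minimal sketch under the same hypothetical data shapes:

```python
def apply_snapshot(snapshot_rows, space_wal_files):
    """Snapshot handling: clear every per-space wal file first, then rewrite
    each snapshot row, keyed by its parsed space id, as a wal-style entry."""
    space_wal_files.clear()
    for row in snapshot_rows:
        space_wal_files.setdefault(row["space_id"], []).append(row)
    return space_wal_files
```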
Step S306, periodically traversing meta-listener node information in the graph space directory, and sending the data after synchronous preprocessing to the pipeline nodes of each slave cluster through the meta-listener nodes;
preferably, the background thread of the meta folder will periodically traverse each space directory.
A space directory contains the wave file of the space and a plurality of listener information. Each listener information contains a listenerID, and a last wal log Id that the listener successfully sent to the slave cluster drainer.
For each space, all listeners under the space directory are traversed. For each listener, reading the wal file of the space where the listener is located, and starting from the wallog of the next piece of last wal log Id, sending the wal log to the corresponding slave cluster drainer. And updating the last wal log ID until all logs of the wal file of the space are sent. Then the next listener is started.
Taking the space1 of meta listener in fig. 3 as an example, the space1 contains a plurality of listener information (meta listener node information) such as listener1 and listener 2. The background thread processes listener1 first. Read last wal log Id of listener1, then read the wal file of space1 starting with the next wal log of last wal log Id. Until the wal file of space1 is read. The lower last wal log Id is updated. Then processes listenn 2, followed by all listener and space in sequence.
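One pass of this background thread over a single space can be sketched as follows (hypothetical names; `send_fn` stands in for the RPC to the slave cluster drainer):

```python
def listener_pass(space_wal, listeners, send_fn):
    """One background pass over a space directory: for each listener, send
    every wal entry newer than its last successfully sent log Id, then
    advance that Id so the next pass resumes where this one stopped."""
    for listener in listeners:
        for entry in space_wal:
            if entry["log_id"] > listener["last_sent_log_id"]:
                send_fn(listener["id"], entry)
                listener["last_sent_log_id"] = entry["log_id"]
```

Each listener keeps its own cursor, so slave clusters that lag behind (like listener2 in the example) catch up independently.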
And step S308, processing the data through the pipeline nodes and then sending the processed data to the metadata management nodes corresponding to the slave clusters.
The method in the embodiment of the application comprises the steps of inter-cluster metadata synchronization of a master and a plurality of slaves and inter-cluster attribute data synchronization of the master and the plurality of slaves; fig. 4 is a schematic flowchart of synchronization of attribute data between clusters of one master and multiple slaves according to an embodiment of the present application, and as shown in fig. 4, synchronization of attribute data between clusters of one master and multiple slaves includes the following steps:
step S402, the main cluster node is a storage node, a storage listener node is started on the main cluster, a learner copy is created, and the learner copy is added into a raft group where the data fragments of the storage node are located;
it should be noted that, after the storage list is started, a part leader (part is an abbreviation of partition, that is, an abbreviation of data fragment) of the storage node sends the wal log according to its own wal log information and the wal log information of the leader copy of the part on the storage list.
Step S404, based on the consistency principle of raft, transmitting the pre-written log stored in the leader copy of the data fragment to the learner copy;
specifically, part leader (the leader copy of the data fragment) does not send snapshot data to part leader (the leader copy of the data fragment), and the part leader is only consistent with the wal file of the part leader copy based on the raft consistency principle. For example, at the start of storage list, the wall log Id of part leader is from 100 to 1000. Then the wal log for leanner for part is also from 100-1000.
Step S406, periodically traversing the information of the storage listener nodes under the learner copy, and sending the logs to the pipeline nodes of each slave cluster through the storage listener nodes;
specifically, the node information of the storage monitor under the learner copy is traversed periodically, and whether the current synchronous log is in a pre-written log file of the learner copy is judged; if yes, directly reading the pre-written log through the storage listener node, and sending the pre-written log to the pipeline node of the slave cluster; if not, the snapshot data of the leader copy is pulled through the storage listener node and is sent to the pipeline nodes of the slave cluster.
Preferably, there is one background thread for each leaner copy of part on the storage list.
The background thread will periodically go through the listener information (storage listener node information) under the part leaner, and handle the task of synchronizing data for each listener information, each listener information containing listenerId and last successfully applied logId (last wal log Id).
For each listener information, different operations are performed according to the fact that the next wal log of the last successfully applied logId is not in the wal file of the part leaner, respectively:
if the next wal log of the logId successfully applied in the last step of the listener information is in the wal file, reading the wal log, sending the wal log to the corresponding slave cluster drainer node, and updating the last wal log Id. And then reading the next wal log continuously until all the wal logs in the wal file are read.
And if the next wal log of the logId which is successfully applied by the last listener information is not in the wal file and does not reach the last wal log of the wal file, pulling the snapshot data by the listener deparater and sending the snapshot data to the corresponding slave cluster tractor node. Updating own last wal log Id according to commit log Id (transaction status record) of part leader when snapshot is pulled. Then, the next piece of data of last wal log Id is continuously read from the wal file until all the wal log in the wal file is read.
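The per-listener decision between the two branches reduces to a range check on the learner's wal file; a hedged sketch with hypothetical names:

```python
def plan_send(next_log_id, wal_first_id, wal_last_id):
    """Decide how to ship data for one listener: read straight from the
    learner's wal file when the next needed log is still present there;
    otherwise fall back to pulling snapshot data from the leader copy."""
    if wal_first_id <= next_log_id <= wal_last_id:
        return "send_wal"
    return "pull_snapshot"
```

A listener that has fallen behind the oldest retained wal entry thus triggers the snapshot path, after which it resumes reading the wal.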
Step S408: the data is processed through the pipeline node and then sent to the storage node of the corresponding slave cluster.
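The drainer (pipeline node) side of step S408 can be sketched as follows, combining it with the validity check of claim 6 (data is judged legal according to the graph space ID and the log Id to be synchronized on the slave cluster). The class and method names (`Drainer`, `handle`, `apply`) are illustrative assumptions, not the actual implementation:

```python
class Drainer:
    """Hypothetical pipeline-node sketch: validate, then forward to storage."""

    def __init__(self, known_space_ids, next_log_ids, storage_client):
        self.known_space_ids = known_space_ids  # graph spaces configured for sync
        self.next_log_ids = next_log_ids        # space_id -> next expected logId
        self.storage_client = storage_client    # slave-cluster storage node client

    def handle(self, space_id, log_id, payload):
        # Reject data for graph spaces not configured for synchronization.
        if space_id not in self.known_space_ids:
            return False
        # Reject out-of-order log Ids; the slave only applies the log it expects next.
        if log_id != self.next_log_ids.get(space_id):
            return False
        self.storage_client.apply(space_id, log_id, payload)
        self.next_log_ids[space_id] = log_id + 1
        return True
```

Tracking the next expected logId per graph space is one simple way to make the legality check of claim 6 concrete; a real drainer would also need to persist this state.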
Through steps S302 to S306 and steps S402 to S408 in the embodiment of the present application, the problem of performing real-time one-master-multiple-slave data synchronization in a graph database is solved: real-time synchronization, among clusters of one master and multiple slaves, of relationship network data that changes in real time in the graph database is realized, and data consistency is effectively ensured.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as by a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases the steps illustrated or described may be performed in an order different from that described here.
The embodiment of the present application provides a system for real-time synchronization of data among clusters of a master and multiple slaves, fig. 5 is a block diagram of a structure of a system for real-time synchronization of data among clusters of a master and multiple slaves according to the embodiment of the present application, and as shown in fig. 5, the system includes a master cluster module 51 and a slave cluster module 52;
the master cluster module 51 is configured to, when a master cluster node is a metadata management node, start a meta-listener node on the master cluster to receive the synchronous information sent by the leader copy in the data fragment corresponding to the metadata management node; if the synchronous information is a pre-written log, execute the first synchronous preprocessing, and if the synchronous information is snapshot data, execute the second synchronous preprocessing;
the master cluster module 51 is further configured to periodically traverse the meta-listener node information in the graph space directory and send the synchronously preprocessed data to the pipeline node of each slave cluster through the meta-listener node;
and the slave cluster module 52 is configured to process the data through the pipeline node and send the processed data to the corresponding slave cluster node.
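The module split above can be sketched as a minimal wiring example. All names (`MasterClusterModule`, `SlaveClusterModule`, `sync_once`, `receive`) are assumptions for illustration, and for brevity the slave cluster module itself plays the drainer role here:

```python
class MasterClusterModule:
    """Hypothetical sketch of module 51: per-graph-space periodic traversal."""

    def __init__(self, space_wal_files, drainers):
        self.space_wal_files = space_wal_files  # space_id -> preprocessed logs
        self.drainers = drainers                # one drainer client per slave cluster

    def sync_once(self):
        # One round of the periodic traversal of meta-listener node information:
        # push each graph space's preprocessed data to every slave cluster.
        for space_id, logs in self.space_wal_files.items():
            for drainer in self.drainers:
                for log in logs:
                    drainer.receive(space_id, log)


class SlaveClusterModule:
    """Hypothetical sketch of module 52: apply what the drainer delivers."""

    def __init__(self):
        self.applied = []

    def receive(self, space_id, log):
        # Process the data and hand it to the corresponding slave cluster node.
        self.applied.append((space_id, log))
```

Because every slave cluster gets its own drainer client, adding a slave is just appending to `drainers`, which matches the one-master-multiple-slaves topology described above.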
Through the master cluster module 51 and the slave cluster module 52 in the embodiment of the present application, the problem of performing real-time one-master-multiple-slave data synchronization in a graph database is solved: real-time synchronization, among clusters of one master and multiple slaves, of relationship network data that changes in real time in the graph database is realized, and data consistency is effectively ensured.
It should be noted that the above modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented by hardware, the modules may be located in the same processor, or may be distributed across different processors in any combination.
The present embodiment also provides an electronic device, comprising a memory having a computer program stored therein and a processor configured to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the method for real-time synchronization of inter-cluster data in a graph database in the foregoing embodiments, an embodiment of the present application provides a storage medium for implementation. The storage medium has a computer program stored thereon; when executed by a processor, the computer program implements the method for real-time synchronization of inter-cluster data in a graph database according to any of the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for real-time synchronization of data between clusters in a graph database. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In an embodiment, fig. 6 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present application, and as shown in fig. 6, there is provided an electronic device, which may be a server, and an internal structure diagram of which may be as shown in fig. 6. The electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor is used for providing calculation and control capability, the network interface is used for communicating with an external terminal through network connection, the internal memory is used for providing an environment for an operating system and the running of a computer program, the computer program is executed by the processor to realize a method for synchronizing data among clusters in a database in real time, and the database is used for storing the data.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing related hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the invention patent. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

1. A method for real-time synchronization of data among clusters of one master and multiple slaves, wherein the method comprises inter-cluster metadata synchronization of the one master and multiple slaves and inter-cluster attribute data synchronization of the one master and multiple slaves;
the inter-cluster metadata synchronization of the one master and multiple slaves comprises:
when a master cluster node is a metadata management node, starting a meta-listener node on the master cluster to receive synchronous information sent by a leader copy in a data fragment corresponding to the metadata management node;
if the synchronous information is a pre-written log, executing first synchronous preprocessing, and if the synchronous information is snapshot data, executing second synchronous preprocessing;
periodically traversing meta-listener node information in a graph space directory, and sending the synchronously preprocessed data to the pipeline nodes of each slave cluster through the meta-listener nodes;
and processing the data through the pipeline node and then sending the processed data to the metadata management node corresponding to the slave cluster.
2. The method of claim 1, wherein the inter-cluster attribute data synchronization of the one master and multiple slaves comprises:
when a master cluster node is a storage node, starting a storage listener node on the master cluster, creating a learner copy, and adding the learner copy to the raft group of a data fragment of the storage node;
transmitting the pre-written logs stored in the leader copy of the data fragment to the learner copy based on the raft consistency protocol;
periodically traversing the storage listener node information under the learner copy, and sending the logs to the pipeline node of each slave cluster through the storage listener node;
and processing the data through the pipeline node and then sending the processed data to the storage nodes corresponding to the slave clusters.
3. The method of claim 1, wherein if the synchronization information is a pre-written log, performing a first synchronization pre-processing comprises:
if the synchronous information is a pre-written log, analyzing to obtain a graph space ID corresponding to the pre-written log, and writing the pre-written log into a pre-written log file corresponding to a graph space according to the graph space ID.
4. The method of claim 1, wherein if the synchronization information is snapshot data, performing a second synchronization pre-processing comprises:
if the synchronous information is snapshot data, the pre-written log files of all the graph spaces are emptied, the snapshot data are analyzed to obtain corresponding graph space IDs, and then the snapshot data are written into the pre-written log files of the corresponding graph spaces according to the graph space IDs.
5. The method of claim 2, wherein sending the logs to the pipeline node of each slave cluster through the storage listener node comprises:
judging whether the current synchronized log is in a pre-written log file of the learner copy;
if yes, directly reading the pre-written log through the storage listener node, and sending the pre-written log to a pipeline node of a slave cluster;
and if not, pulling the snapshot data of the leader copy through the storage listener node, and sending the snapshot data to the pipeline nodes of the slave cluster.
6. The method according to claim 1 or 2, wherein before the processed data is sent to a corresponding slave cluster node through the pipeline node, the method further comprises:
and receiving the data through the pipeline nodes, and judging whether the data is legal or not according to the graph space ID and the log ID to be synchronized on the slave cluster.
7. The method according to claim 1 or 2, wherein processing the data through the pipeline node and then sending the processed data to a corresponding slave cluster node comprises:
traversing through the pipeline nodes to obtain the graph space directories to be synchronized on the slave clusters, wherein each graph space in the graph space directories generates a task to be put into a task management queue;
during the execution of a task, generating a plurality of subtasks according to the information of the data fragment corresponding to the task, wherein one data fragment corresponds to one subtask;
and executing the subtasks in parallel according to the parallelism until all the tasks in the task management queue have been executed, at which point the data has been sent to the corresponding slave cluster nodes.
8. The method of claim 1, wherein the graph space is a basic unit of data synchronization between the master cluster and the slave cluster, and the graph space required for data synchronization is readable and writable in the master cluster and is read only in the slave cluster.
9. The method of claim 1, wherein the data slices of the metadata management node store metadata of the graph space; the data fragments of the storage nodes store attribute data of the graph space.
10. A system for real-time synchronization of data among clusters of a master and a plurality of slaves, wherein the system is applied to the method of any one of claims 1 to 9, and comprises a master cluster module and a slave cluster module;
the master cluster module is used for, when a master cluster node is a metadata management node, starting a meta-listener node on the master cluster to receive the synchronous information sent by the leader copy in the data fragment corresponding to the metadata management node; if the synchronous information is a pre-written log, executing the first synchronous preprocessing, and if the synchronous information is snapshot data, executing the second synchronous preprocessing;
the master cluster module is further used for periodically traversing the meta-listener node information in the graph space directory and sending the synchronously preprocessed data to the pipeline node of each slave cluster through the meta-listener node;
and the slave cluster module is used for processing the data through the pipeline node and then sending the processed data to the corresponding slave cluster node.
CN202211498391.5A 2022-11-28 2022-11-28 Method and system for synchronizing data among clusters of one master and multiple slaves in real time Pending CN115544172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211498391.5A CN115544172A (en) 2022-11-28 2022-11-28 Method and system for synchronizing data among clusters of one master and multiple slaves in real time

Publications (1)

Publication Number Publication Date
CN115544172A true CN115544172A (en) 2022-12-30

Family

ID=84721733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211498391.5A Pending CN115544172A (en) 2022-11-28 2022-11-28 Method and system for synchronizing data among clusters of one master and multiple slaves in real time

Country Status (1)

Country Link
CN (1) CN115544172A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019091324A1 (en) * 2017-11-07 2019-05-16 阿里巴巴集团控股有限公司 Data synchronization method and device, and electronic device
WO2019154394A1 (en) * 2018-02-12 2019-08-15 中兴通讯股份有限公司 Distributed database cluster system, data synchronization method and storage medium
US20200004746A1 (en) * 2018-07-02 2020-01-02 Baxter International Inc. Graph database for outbreak tracking and management
CN110795503A (en) * 2019-10-18 2020-02-14 北京达佳互联信息技术有限公司 Multi-cluster data synchronization method and related device of distributed storage system
CN112417033A (en) * 2020-10-19 2021-02-26 中国科学院计算机网络信息中心 Method and system for realizing multi-node data consistency of distributed graph database
CN112825525A (en) * 2019-11-20 2021-05-21 北京百度网讯科技有限公司 Method and apparatus for processing transactions
CN114579671A (en) * 2022-05-09 2022-06-03 高伟达软件股份有限公司 Inter-cluster data synchronization method and device
CN114661818A (en) * 2022-03-17 2022-06-24 杭州欧若数网科技有限公司 Method, system, and medium for real-time synchronization of data between clusters in a graph database


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张晨东等: "基于Raft一致性协议的高可用性实现", 《华东师范大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349384A (en) * 2023-12-04 2024-01-05 四川才子软件信息网络有限公司 Database synchronization method, system and equipment
CN117349384B (en) * 2023-12-04 2024-03-15 四川才子软件信息网络有限公司 Database synchronization method, system and equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221230