CN112416884A - Data synchronization method and system - Google Patents

Data synchronization method and system

Info

Publication number
CN112416884A
CN112416884A
Authority
CN
China
Prior art keywords
data
subsystem
node
file
synchronized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011319138.XA
Other languages
Chinese (zh)
Inventor
李睿
孙谦晨
王娟
庆祖良
耿东山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Jiangsu Co Ltd
Priority to CN202011319138.XA
Publication of CN112416884A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/178: Techniques for file synchronisation in file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14: Error detection or correction of the data by redundancy in operation
    • G06F 11/1402: Saving, restoring, recovering or retrying
    • G06F 11/1446: Point-in-time backing up or restoration of persistent data
    • G06F 11/1448: Management of the data involved in backup or backup restore
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/11: File system administration, e.g. details of archiving or snapshots
    • G06F 16/128: Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/172: Caching, prefetching or hoarding of files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/1734: Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/18: File system types
    • G06F 16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F 16/1815: Journaling file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/18: File system types
    • G06F 16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/52: Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F 9/524: Deadlock detection or avoidance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/54: Interprogram communication
    • G06F 9/544: Buffers; Shared memory; Pipes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/54: Interprogram communication
    • G06F 9/546: Message passing systems or structures, e.g. queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/54: Indexing scheme relating to G06F9/54
    • G06F 2209/548: Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method and system. The data synchronization method comprises: acquiring first data to be synchronized of a first node within a preset historical time period, the first node being a node in a main cluster; generating a disk snapshot of the first data to be synchronized; and sending the disk snapshot to a distributed file subsystem, so that a second node, which is a node in a backup cluster, acquires the disk snapshot from the distributed file subsystem to achieve data synchronization. With the data synchronization method and system provided by the application, network resource consumption can be effectively reduced, the time taken by data synchronization can be shortened, and data synchronization efficiency can be improved.

Description

Data synchronization method and system
Technical Field
The present application relates to the field of wireless communication technologies, and in particular, to a data synchronization method and system.
Background
With the continuous development of internet technology, internet enterprises generally provide network services to users in the form of a main cluster and a backup cluster, and data is synchronized between the two clusters to guarantee quality of service.
At present, when data synchronization is performed between the main cluster and the backup cluster, each node in the main cluster needs to establish a network connection with each node in the backup cluster; each node in the main cluster then sends its cached data to every node in the backup cluster over these connections, thereby achieving data synchronization between the main cluster and the backup cluster. However, as internet technology develops, the number of nodes in the main and backup clusters and the amount of data cached on each node keep growing, so data synchronization according to the existing method occupies more and more network resources.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data synchronization method and system, which can reduce network resource consumption in data synchronization.
The technical scheme of the application is as follows:
in a first aspect, a data synchronization method is provided, which is applied to a data synchronization system, and the method may include:
acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
generating a disk snapshot of first data to be synchronized;
and sending the disk snapshot to the distributed file subsystem so that a second node acquires the disk snapshot from the distributed file subsystem to realize data synchronization, wherein the second node is a node in the backup cluster.
In some embodiments, the data synchronization system includes a preset file subsystem;
generating a disk snapshot of first data to be synchronized, comprising:
generating a disk snapshot corresponding to first data to be synchronized based on a preset file subsystem;
sending the disk snapshot to a distributed file system, comprising:
and asynchronously sending the disk snapshot to the distributed file subsystem through the preset file subsystem.
In some embodiments, the preset file subsystem is the Ignite file subsystem.
In some embodiments, the data synchronization system includes a publish-subscribe subsystem and a message queue subsystem;
the method further comprises the following steps:
acquiring second data to be synchronized of the first node in real time;
converting the second data to be synchronized into a pre-written log file;
and sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem.
In some embodiments, before sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem, the method further comprises:
acquiring the last generation time of the disk snapshot;
and acquiring the target pre-written log file according to the last generation time.
In some embodiments, sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem comprises:
sending the pre-written log file to a message queue subsystem through a publish-subscribe subsystem;
and sending the pre-written log file to the second node through a preset cache channel by using the message queue subsystem.
In some embodiments, sending the pre-written log file to the second node through the preset cache channel by using the message queue subsystem includes:
the message queue subsystem acquires the distributed lock through a synchronous client;
and sending the pre-written log file to the second node by using the distributed lock through the synchronous client.
In some embodiments, the distributed lock is implemented by zookeeper.
In some embodiments, the pre-written log file includes at least one of a data operation time, a data operation type, a data operation cache type, and a data content corresponding to the pre-written log file;
after sending the pre-written log file to the message queue subsystem through the publish-subscribe subsystem, the method further comprises:
and recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
In a second aspect, a data synchronization system is provided, including:
the data acquisition device is used for acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
the snapshot generating device is used for generating a disk snapshot of the first data to be synchronized;
and the data sending device is used for sending the disk snapshot to the distributed file subsystem so that the second node acquires the disk snapshot from the distributed file subsystem to realize data synchronization, and the second node is a node in the backup cluster.
The technical solutions provided by the embodiments of the application have at least the following beneficial effects:
According to the embodiments of the application, a disk snapshot of the first data to be synchronized of the first node in the main cluster is generated and sent to the distributed file subsystem. In this way, on the one hand, the second node in the backup cluster can achieve data synchronization simply by obtaining the disk snapshot from the distributed file subsystem; each node in the main cluster no longer has to transmit the data to be synchronized over a network connection to every node in the backup cluster, so network resource consumption can be effectively reduced. On the other hand, carrying out data synchronization between the main cluster and the backup cluster by means of disk snapshots shortens the time taken by data synchronization and thus improves data synchronization efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.
Fig. 1 is a schematic flow chart of a data synchronization method according to an embodiment of the present application;
fig. 2 is a schematic view of a data synchronization method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a WAL file structure provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data synchronization system according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
As stated in the background, as the number of nodes in the main and backup clusters and the amount of data cached on each node of the main cluster increase, the prior-art data synchronization method occupies more and more network resources.
Specifically, data synchronization between the main cluster and the backup cluster is mainly realized by transmitting data from the main cluster to the backup cluster. Taking a backup cluster with 100 node servers (hereinafter referred to as nodes) as an example, each node in the main cluster needs to establish a network connection with each of the 100 node servers in the backup cluster in order to transmit the data to be synchronized in real time. In an actual scenario, the main cluster has many nodes and a large volume of cached data to be synchronized, so the load is heavy; performing data synchronization between the main and backup clusters according to the existing method therefore occupies a large amount of network resources.
To address this technical problem, the application provides a data synchronization method and system: a disk snapshot of the first data to be synchronized of a first node in the main cluster is generated and sent to a distributed file subsystem. In this way, on the one hand, the second node in the backup cluster can achieve data synchronization by obtaining the disk snapshot from the distributed file subsystem, so each node in the main cluster no longer needs to transmit the data to be synchronized over a network connection to every node in the backup cluster, and network resource consumption can be effectively reduced. On the other hand, performing data synchronization between the main cluster and the backup cluster by means of disk snapshots shortens the time taken by data synchronization and improves data synchronization efficiency.
The following describes the data synchronization method provided in the embodiments of the present application in detail.
Fig. 1 shows a flowchart of a data synchronization method provided in an embodiment of the present application. The method may be executed by a data synchronization system, whose structure is described later with reference to fig. 4 and is not repeated here. As shown in fig. 1, the data synchronization method provided in this embodiment may include the following steps:
s110, first data to be synchronized of the first node in a preset historical period is obtained.
The first node may be any node in the main cluster, as shown in fig. 2.
As an example, the preset historical time period may be a preset period to which the data to be synchronized belongs, such as the 1 hour before the current time.
The first data to be synchronized may be the data to be synchronized that is generated by a service client within the preset historical time period and cached in the first node. For example, if the preset historical time period is 1 hour, the first data to be synchronized is the data to be synchronized generated by the service client within the 1 hour before the current time and cached in the first node.
The data to be synchronized may be data generated by the first node that needs to be synchronized to the nodes of the backup cluster.
When data synchronization between the main cluster and the backup cluster is performed, the first data to be synchronized generated by the first node within the preset historical time period can be acquired. Taking a preset historical time period of 2 hours and a current time of 10:00 as an example, the data to be synchronized generated by the first node from 8:00 to 10:00 can be acquired as the first data to be synchronized.
And S120, generating a disk snapshot of the first data to be synchronized.
After the first data to be synchronized of the first node within the preset historical time period is acquired, a disk snapshot of the first data to be synchronized can be generated. Because a disk snapshot backs up the data of an entire disk volume quickly, the backup of the persisted data of the distributed cache can be completed quickly. Therefore, directly generating a disk snapshot of the first data to be synchronized takes little time and occupies few network resources.
S130, the disk snapshot is sent to the distributed file subsystem, so that the second node obtains the disk snapshot from the distributed file subsystem to achieve data synchronization.
The second node may be any node in the backup cluster shown in fig. 2.
After the disk snapshot of the first data to be synchronized is generated, the disk snapshot may be sent to the distributed file subsystem, for example, by mapping the disk snapshot into the distributed file subsystem. As shown in fig. 2, the distributed file subsystem may be a Hadoop Distributed File System (HDFS) that is accessible to both the nodes of the main cluster and the nodes of the backup cluster. In this way, the second node can obtain the disk snapshot from the distributed file subsystem, and data synchronization between the first node and the second node is achieved.
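As an illustration only, the sketch below shows one way a main-cluster node could push a locally generated snapshot directory into HDFS using the standard Hadoop FileSystem API. The local and HDFS paths and the namenode address are hypothetical values, not values defined by the patent.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotUploader {

    /**
     * Copies a locally generated disk-snapshot directory into HDFS so that
     * backup-cluster nodes can pull it later. Paths and the HDFS URI are
     * illustrative assumptions only.
     */
    public static void uploadSnapshot(String localSnapshotDir, String hdfsTargetDir) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical namenode address of the shared distributed file subsystem
        FileSystem hdfs = FileSystem.get(new URI("hdfs://namenode:8020"), conf);

        // keep the local copy (delSrc = false) and overwrite any stale upload
        hdfs.copyFromLocalFile(false, true, new Path(localSnapshotDir), new Path(hdfsTargetDir));
        hdfs.close();
    }

    public static void main(String[] args) throws Exception {
        uploadSnapshot("/data/snapshots/2020-11-23T10-00", "/sync/snapshots/node-1/2020-11-23T10-00");
    }
}
```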
It can be understood that the data synchronization method provided in the embodiment of the present application may be executed periodically, may be executed at any set time, or may be executed when a synchronization instruction is received. The disk snapshots in the distributed file subsystem may be deleted after being acquired by the second node, or may be deleted periodically.
By generating a disk snapshot of the first data to be synchronized of the first node in the main cluster and sending it to the distributed file subsystem, the embodiments of the application let the second node in the backup cluster achieve data synchronization simply by obtaining the disk snapshot from the distributed file subsystem. Each node in the main cluster therefore no longer needs to transmit the data to be synchronized over a network connection to every node in the backup cluster, which effectively reduces network resource consumption. In addition, performing data synchronization between the main cluster and the backup cluster by means of disk snapshots shortens the time taken by data synchronization and improves data synchronization efficiency.
In addition, because service processing is highly real-time, the peak and valley times and values of the service access requests are not fixed. When the volume of service access requests reaches or approaches its peak, the network load of each first node is high and network response time increases. If the prior-art data synchronization method were used at this point, it would further increase the network response delay, making the service processing delay too high and affecting data processing. With the data synchronization method provided by the embodiments of the application, network resource consumption can be effectively reduced, so the service processing delay is not worsened and data processing is not affected.
In some embodiments, the data synchronization system may include a preset file subsystem, and based on this, the disk snapshot may be synchronized to the distributed file subsystem through the preset file subsystem, and accordingly, a specific implementation manner of the step S120 may be as follows:
generating a disk snapshot corresponding to first data to be synchronized based on a preset file subsystem;
in this case, the specific implementation method of step S130 may be:
and asynchronously sending the disk snapshot to the distributed file subsystem through the preset file subsystem.
As an example, the preset file subsystem may be a preset file system, which may be used to cache a disk snapshot of the first data to be synchronized.
As a specific example, when generating the disk snapshot of the first data to be synchronized, the disk snapshot may be generated based on the preset file subsystem, and the disk snapshot in the preset file subsystem may then be asynchronously mapped to the distributed file subsystem. In this way, the second node can read the disk snapshot from the distributed file subsystem to achieve data synchronization. Mapping the snapshot file of the preset file subsystem to the distributed file subsystem reduces the time needed to move the disk file into the distributed file subsystem, which further reduces network resource consumption, shortens the time taken by data synchronization, and further improves data synchronization efficiency.
In some embodiments, referring to fig. 2, the preset file subsystem may be the Ignite file subsystem, i.e., the Ignite file system (IGFS) shown in the figure.
As a specific example, the Ignite file subsystem may be the native file system of the main and backup clusters. The disk snapshot file may be mapped to HDFS through the Ignite file subsystem, and the backup cluster likewise obtains the disk snapshot file through HDFS, so that synchronization of the base data is completed quickly.
Using the native Ignite file subsystem therefore reduces the cost of data synchronization. The Ignite file subsystem is an in-memory layer that can delegate files to other file systems, so it can be transparently plugged into a Hadoop runtime environment as a transparent caching layer in front of files stored in HDFS. When the Ignite file subsystem is used as an HDFS caching layer, Input/Output (I/O) consumption can be effectively reduced, latency lowered, and throughput improved.
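For readers unfamiliar with IGFS, the following is a minimal sketch of application code accessing the cache layer through the ordinary Hadoop FileSystem interface, assuming an Apache Ignite 2.x deployment where IGFS was still available and exposed through its Hadoop adapter. The adapter class name, the igfs:// URI, and the endpoint port reflect that Ignite release as documented and should be treated as assumptions rather than values quoted from the patent.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IgfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Map the igfs:// scheme to Ignite's Hadoop FileSystem adapter
        // (class name and scheme as documented for Ignite 2.x IGFS; assumptions).
        conf.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");

        // Writes land in the in-memory IGFS layer first and are written through
        // to the secondary HDFS file system configured on the Ignite nodes.
        FileSystem igfs = FileSystem.get(URI.create("igfs://igfs@127.0.0.1:10500/"), conf);
        try (FSDataOutputStream out = igfs.create(new Path("/sync/snapshots/node-1/part-0.bin"), true)) {
            out.write(new byte[] {1, 2, 3});
        }
        igfs.close();
    }
}
```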
In some embodiments, referring to fig. 2, the data synchronization system may further include a publish-subscribe subsystem (i.e., the publish-subscribe system in fig. 2) and a message queue subsystem (i.e., the MQ system in fig. 2), so as to implement real-time data synchronization, and accordingly, the method further includes the following processes:
acquiring second data to be synchronized of the first node in real time;
converting the second data to be synchronized into a pre-written log file;
and sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem.
As a specific example, based on an event notification mechanism of the distributed file subsystem, this embodiment adapts the publish-subscribe subsystem as follows:
1. Extend the publish-subscribe events and add a synchronous data transmission event;
2. Transmit over separate channels divided by cache group, so as to balance the transmission pressure and avoid overloading any single channel;
3. Send the data to be synchronized to the subscribing client through the publish-subscribe subsystem.
Based on the above data synchronization system, real-time data synchronization can be achieved. During operation, the first node generates data to be synchronized in real time, namely the second data to be synchronized. The data synchronization system may obtain the second data to be synchronized from the first node and convert it into a Write-Ahead Log (WAL) file, i.e., write the second data to be synchronized into a WAL file, the WAL log in fig. 2. A shared cache mechanism may be used to ensure that all data to be synchronized is flushed into the WAL file.
Referring to fig. 3, the content of a WAL file may include a timestamp (Idx), the data to be synchronized (data), and an end marker (off); the data structure of the data to be synchronized in the WAL file may include the name of the data to be synchronized, the synchronization operation time, the operation type, and a key-value pair (K-V).
Then, referring to fig. 2, the WAL file may be sent to the publish-subscribe subsystem, which sends the WAL log to the Message Queue (MQ) subsystem through a subscribing client. The MQ subsystem sends the WAL file to the second node in the backup cluster through a synchronization client, and the second node parses the WAL file with a parsing thread pool to obtain the data to be synchronized, thereby achieving data synchronization.
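Purely as an illustration of the record layout summarized above (timestamp, payload, end marker, with the payload carrying cache name, operation time, operation type, and key-value pair), a plain Java value class might look like the following. The field names are our own shorthand for the elements shown in fig. 3, not identifiers defined by the patent.

```java
import java.io.Serializable;

/**
 * One WAL entry as sketched in fig. 3: a timestamp (Idx), the payload to be
 * synchronized (data), and an end offset/marker (off). Names are illustrative.
 */
public class WalRecord implements Serializable {

    /** Payload: which cache, when and how it was modified, and the K-V pair itself. */
    public static class Payload implements Serializable {
        public final String cacheName;      // name of the data to be synchronized
        public final long operationTime;    // synchronization operation time
        public final String operationType;  // e.g. PUT / REMOVE (assumed values)
        public final Object key;
        public final Object value;

        public Payload(String cacheName, long operationTime, String operationType,
                       Object key, Object value) {
            this.cacheName = cacheName;
            this.operationTime = operationTime;
            this.operationType = operationType;
            this.key = key;
            this.value = value;
        }
    }

    public final long idx;        // timestamp / index of the record
    public final Payload data;    // data to be synchronized
    public final long endOffset;  // end marker of the record in the WAL file

    public WalRecord(long idx, Payload data, long endOffset) {
        this.idx = idx;
        this.data = data;
        this.endOffset = endOffset;
    }
}
```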
Converting the data to be synchronized into WAL files therefore realizes incremental data synchronization: the WAL file survives even if the process crashes, so the accuracy of the data to be synchronized can be controlled at the source. In addition, the publish-subscribe subsystem and the message queue subsystem can control the accuracy of the data to be synchronized between the main and backup clusters, which improves data synchronization accuracy. Moreover, compared with the forward synchronization of the prior art, the data synchronization method provided by this embodiment avoids excessive network resource consumption on a single node; parsing the WAL files with a parsing thread pool reduces the service processing pressure on the cluster at service peaks, and relying on the publish-subscribe subsystem keeps the cluster pressure relatively smooth, so the cluster is more stable.
In some embodiments, the target pre-written log file may also be obtained according to the generation time of the disk snapshot, and a specific implementation manner of determining the second data to be synchronized may be as follows:
acquiring the last generation time of the disk snapshot;
and acquiring a target pre-written log file according to the last generation time.
The target pre-written log files are the pre-written log files generated after the time at which the disk snapshot was last generated. They can be regarded as the pre-written log files corresponding to the data that has not been synchronized since the last disk snapshot, i.e., the incremental synchronization data.
As a specific example, consider that the real-time data synchronization method may not run continuously due to a failure, a restart, or other reasons. Therefore, before real-time data synchronization is executed for the first time, the last generation time of the disk snapshot can be obtained, and the target pre-written log files that still need to be synchronized are determined according to that time. The target pre-written log files generated after the last generation time are then obtained and sent to the second node through the publish-subscribe subsystem and the message queue subsystem, and the second node parses these WAL files to obtain the data to be synchronized, thereby achieving data synchronization. A sketch of this selection step is given below.
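The following minimal sketch of the selection step assumes the WAL files live in a local directory and expose their creation time through file metadata; both the directory layout and the file-naming pattern are assumptions of ours, not details from the patent.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TargetWalSelector {

    /**
     * Returns the WAL files created after the last disk-snapshot time, i.e. the
     * "target pre-written log files" that still need incremental synchronization.
     */
    public static List<Path> selectTargetWalFiles(Path walDir, Instant lastSnapshotTime) throws IOException {
        List<Path> targets = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(walDir, "*.wal")) {
            for (Path file : files) {
                BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
                if (attrs.creationTime().toInstant().isAfter(lastSnapshotTime)) {
                    targets.add(file);
                }
            }
        }
        return targets;
    }
}
```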
In this way, determining the target pre-written log files based on the last generation time of the disk snapshot avoids missing data during synchronization and ensures the reliability of the data to be synchronized at the source; it also reduces repeated synchronization of data to some extent, which further improves the integrity and accuracy of data synchronization and further reduces resource consumption. In addition, the impact on the service processing flow is reduced.
Moreover, because the WAL files always exist, even if a network fault occurs, the target pre-written log files can still be determined from the last generation time of the disk snapshot and data synchronization can be carried out based on them, which further ensures the accuracy of the synchronization result.
In some embodiments, data synchronization may be performed according to the cache channel, and accordingly, a specific implementation manner of sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem may be as follows:
sending the pre-written log file to a message queue subsystem through a publish-subscribe subsystem;
and sending the pre-written log file to the second node through a preset cache channel by using the message queue subsystem.
As an example, a preset cache channel may be a channel preset for transmitting WAL files of different types, kinds, or topics. For example, a topic may be preset for each cache or cache group, and a channel is allocated to each topic to transmit the data to be synchronized, i.e., the WAL files, corresponding to that topic.
As a specific example, referring to fig. 2, after the second data to be synchronized is converted into a WAL file, the WAL file may be sent to the publish-subscribe subsystem, which sends the WAL log to the message queue (MQ) subsystem through the subscribing client. The MQ subsystem determines the topic to which the WAL file belongs and the preset cache channel corresponding to that topic, and then sends the WAL file to the second node through the preset cache channel using a synchronization client, so as to achieve data synchronization. Synchronizing data through preset cache channels in this way further reduces network resource consumption and relieves network load pressure.
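To make the channel-per-topic idea concrete, here is a small sketch that derives a topic name from the cache group of each WAL record and hands the record to a generic message-queue producer. The producer interface and the topic naming convention are placeholders of ours, since the patent does not name a specific message queue product.

```java
/** Minimal producer abstraction standing in for the MQ subsystem's client API. */
interface MqProducer {
    void send(String topic, byte[] payload);
}

public class ChannelRouter {
    private final MqProducer producer;

    public ChannelRouter(MqProducer producer) {
        this.producer = producer;
    }

    /**
     * One preset channel (topic) per cache group, so no single channel has to
     * carry all of the WAL traffic.
     */
    public void forward(String cacheGroup, byte[] walRecordBytes) {
        String topic = "sync-" + cacheGroup;   // hypothetical naming convention
        producer.send(topic, walRecordBytes);
    }
}
```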
In some embodiments, the data synchronization may be implemented by using a distributed lock, and the specific implementation method for sending the pre-written log file to the second node by using the message queue subsystem through the preset cache channel may be as follows:
the message queue subsystem acquires the distributed lock through a synchronous client;
and sending the pre-written log file to the second node by using the distributed lock through the synchronous client.
As a specific example, when the MQ subsystem sends the WAL file to the second node, referring to fig. 2, it may first acquire the distributed lock, also called a cluster lock, through the synchronization client, and then send the WAL file to the second node through the synchronization client based on the distributed lock, so as to achieve data synchronization between the main and backup clusters. The distributed lock distinguishes the synchronization client from the service client and prevents the two from running concurrently: the synchronization client acquires the distributed lock first, and only after its consumption is finished does the service client access the data and complete data processing.
In some embodiments, the distributed lock described above may be implemented by zookeeper.
As a specific example, the synchronization client may write a node to ZooKeeper to confirm the switch between the main and backup clusters, and it automatically exits after consumption is completed. The service client monitors the parent directory and only starts accessing and processing service data when no child path remains under the parent directory. A rough sketch of this coordination is given below.
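The sketch below writes this coordination against the plain ZooKeeper client API, with session handling, watches, and error paths omitted; the znode paths are hypothetical, and the parent node is assumed to exist already.

```java
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SyncCoordination {
    // hypothetical paths; the parent znode is assumed to exist already
    private static final String PARENT = "/cache-sync";
    private static final String LOCK_CHILD = PARENT + "/sync-in-progress";

    /** Synchronization client: announce that synchronization is running. */
    static void acquire(ZooKeeper zk) throws Exception {
        // ephemeral, so the node disappears if the synchronization client dies
        zk.create(LOCK_CHILD, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    /** Synchronization client: release after the WAL files have been consumed. */
    static void release(ZooKeeper zk) throws Exception {
        zk.delete(LOCK_CHILD, -1);
    }

    /** Business client: only touch the cache once no synchronization child remains. */
    static boolean businessMayProceed(ZooKeeper zk) throws Exception {
        List<String> children = zk.getChildren(PARENT, false);
        return children.isEmpty();
    }
}
```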
In some embodiments, the pre-written log file may include at least one of a data operation time, a data operation type, a data operation cache type, and a data content corresponding to the pre-written log file.
Correspondingly, at this time, after the pre-written log file is sent to the message queue subsystem through the publish-subscribe subsystem, the following steps may also be performed:
and recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
As a specific example, after receiving the WAL file, the MQ subsystem may obtain and record one or more of a data operation time, a data operation type, a data operation cache type, and data content corresponding to the WAL file.
In this way, the data synchronization result can be confirmed without exporting and comparing the cached data of the main and backup clusters. Instead, the records kept by the MQ subsystem can be checked to confirm the data synchronization result; that is, data tracing and confirmation of the synchronization result are completed through the MQ subsystem, so tracing and verification of the data synchronization result can be done more intuitively and dynamically while the time order of the data is preserved. On this basis, when the main and backup clusters need to be switched, the synchronized data can be pulled from the MQ message system, the synchronization operation data can be reconstructed from the cache name and operation type, and reverse data synchronization can be performed. An illustrative sketch of such a record follows.
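As an illustration of the kind of record the MQ subsystem could keep for later verification, the sketch below appends one line per consumed WAL record to a local audit log; the field order, separator, and log location are our own choices, not details from the patent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

public class SyncAuditLog {
    private final Path logFile;

    public SyncAuditLog(Path logFile) {
        this.logFile = logFile;
    }

    /** Record operation time, operation type, and cache type of a consumed WAL record. */
    public void record(Instant operationTime, String operationType, String cacheType) throws IOException {
        String line = operationTime + "," + operationType + "," + cacheType + System.lineSeparator();
        // append-only log so earlier records (and their time order) are preserved
        Files.write(logFile, line.getBytes(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```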
The above is a data synchronization method provided in the embodiments of the present application, and based on the data synchronization method, the embodiments of the present application also provide a data synchronization system, and the following describes the data synchronization system provided in the embodiments of the present application.
Fig. 4 shows a schematic structural diagram of a data synchronization system provided in an embodiment of the present application, and as shown in fig. 4, the data synchronization system 400 may include:
a data obtaining device 410, configured to obtain first data to be synchronized of a first node in a preset history period, where the first node is a node in a master cluster;
a snapshot generating device 420, configured to generate a disk snapshot of the first data to be synchronized;
and a data sending device 430, configured to send the disk snapshot to the distributed file subsystem, so that a second node obtains the disk snapshot from the distributed file subsystem to implement data synchronization, where the second node is a node in the backup cluster.
In some embodiments, the data synchronization system 400 may further include a preset file subsystem;
the snapshot generating apparatus 420 may be specifically configured to:
generating a disk snapshot corresponding to first data to be synchronized based on a preset file subsystem;
the data transmission device 430 may include:
the first data sending module may be configured to asynchronously send the disk snapshot to the distributed file subsystem through the preset file subsystem.
In some embodiments, the preset file subsystem may be the Ignite file subsystem.
In some embodiments, the data synchronization system 400 may also include a publish-subscribe subsystem and a message queue subsystem;
the data acquisition device 410 may include:
the first data acquisition module may be configured to acquire second data to be synchronized of the first node in real time;
the conversion module can be used for converting the second data to be synchronized into a pre-written log file;
the data transmission device 430 may include:
and the second data sending module can be used for sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem.
In some embodiments, the data synchronization system 400 may further include:
the time obtaining module can be used for obtaining the last generation time of the disk snapshot;
the data acquiring device 410 may further include:
and the second data acquisition module may be configured to acquire the target pre-written log file according to the last generation time.
In some embodiments, the second data sending module may include:
the first sending unit may be configured to send the pre-written log file to the message queue subsystem through the publish-subscribe subsystem;
the second sending unit may be configured to send the pre-written log file to the second node through the preset cache channel by using the message queue subsystem.
In some embodiments, the second sending unit may include:
the first acquiring subunit may be used for the message queue subsystem to acquire the distributed lock through the synchronization client;
and the second sending subunit may be configured to send the prewritten log file to the second node through the synchronization client using the distributed lock.
In some embodiments, the distributed lock may be implemented by zookeeper.
In some embodiments, the pre-written log file may include at least one of a data operation time, a data operation type, a data operation cache type, and a data content corresponding to the pre-written log file;
the data synchronization system may further include:
and the recording device can be used for recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
The data synchronization system provided in the embodiments of the present application may execute the data synchronization method provided in each of the embodiments, and the specific implementation principle and technical effect are similar, and for brevity, no further description is given here.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods and systems according to the present application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable information processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable information processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable information processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable information processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data synchronization method is applied to a data synchronization system, and the method comprises the following steps:
acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
generating a disk snapshot of the first data to be synchronized;
and sending the disk snapshot to a distributed file subsystem so that a second node acquires the disk snapshot from the distributed file subsystem to realize data synchronization, wherein the second node is a node in a backup cluster.
2. The method of claim 1, wherein the data synchronization system comprises a preset file subsystem;
the generating the disk snapshot of the first data to be synchronized includes:
generating a disk snapshot corresponding to the first to-be-synchronized data based on a preset file subsystem;
the sending the disk snapshot to a distributed file system includes:
and asynchronously sending the disk snapshot to a distributed file subsystem through the preset file subsystem.
3. The method of claim 2, wherein the preset file subsystem is an Ignite file subsystem.
4. The method of claim 1, wherein the data synchronization system comprises a publish-subscribe subsystem and a message queue subsystem;
the method further comprises the following steps:
acquiring second data to be synchronized of the first node in real time;
converting the second data to be synchronized into a pre-written log file;
and sending the pre-written log to the second node through a publish-subscribe subsystem and a message queue subsystem.
5. The method of claim 4, wherein before sending the pre-written log to the second node via a publish-subscribe subsystem and a message queue subsystem, further comprising:
acquiring the last generation time of the disk snapshot;
and acquiring a target pre-written log file according to the last generation time.
6. The method of claim 4, wherein sending the pre-written log to the second node via a publish-subscribe subsystem and a message queue subsystem comprises:
sending the pre-written log file to the message queue subsystem through the publish-subscribe subsystem;
and sending the pre-written log file to the second node through a preset cache channel by using the message queue subsystem.
7. The method of claim 6, wherein sending the pre-written log file to the second node via a predetermined cache channel using the message queue subsystem comprises:
the message queue subsystem acquires a distributed lock through a synchronous client;
and sending the pre-written log file to the second node by the synchronization client through the distributed lock.
8. The method of claim 6, wherein the distributed lock is implemented by zookeeper.
9. The method according to any one of claims 4 to 7, wherein the pre-written log file comprises at least one of a data operation time, a data operation type, a data operation cache type and a data content corresponding to the pre-written log file;
after the sending of the pre-written log file to the message queue subsystem by the publish-subscribe subsystem, the method further comprises:
and recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
10. A data synchronization system, comprising:
the data acquisition device is used for acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
the snapshot generating device is used for generating a disk snapshot of the first data to be synchronized;
and the data sending device is used for sending the disk snapshot to the distributed file subsystem so as to enable a second node to acquire the disk snapshot from the distributed file subsystem to realize data synchronization, and the second node is a node in the backup cluster.
Application CN202011319138.XA, filed 2020-11-23, priority date 2020-11-23: Data synchronization method and system. Published as CN112416884A. Status: Pending.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319138.XA CN112416884A (en) 2020-11-23 2020-11-23 Data synchronization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319138.XA CN112416884A (en) 2020-11-23 2020-11-23 Data synchronization method and system

Publications (1)

Publication Number Publication Date
CN112416884A true CN112416884A (en) 2021-02-26

Family

ID=74777284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319138.XA Pending CN112416884A (en) 2020-11-23 2020-11-23 Data synchronization method and system

Country Status (1)

Country Link
CN (1) CN112416884A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860630A (en) * 2021-04-08 2021-05-28 广州趣丸网络科技有限公司 Real-time transformation data storage method and device, electronic equipment and storage medium
CN113282245A (en) * 2021-06-15 2021-08-20 中国建设银行股份有限公司 Method for auditing supply and host platform
CN114610817A (en) * 2022-05-12 2022-06-10 恒生电子股份有限公司 Data synchronization method and device, multi-active system, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101188566A (en) * 2007-12-13 2008-05-28 沈阳东软软件股份有限公司 A method and system data buffering and synchronization under cluster environment
CN103678045A (en) * 2013-12-31 2014-03-26 曙光云计算技术有限公司 Data backup method for virtual machines
CN107087038A (en) * 2017-06-29 2017-08-22 珠海市魅族科技有限公司 A kind of method of data syn-chronization, synchronizer, device and storage medium
CN108123976A (en) * 2016-11-30 2018-06-05 阿里巴巴集团控股有限公司 Data back up method, apparatus and system between cluster
CN108664356A (en) * 2018-05-03 2018-10-16 吉林亿联银行股份有限公司 A kind of database backup method and device, Database Systems
WO2019091324A1 (en) * 2017-11-07 2019-05-16 阿里巴巴集团控股有限公司 Data synchronization method and device, and electronic device
CN110737719A (en) * 2019-09-06 2020-01-31 深圳平安通信科技有限公司 Data synchronization method, device, equipment and computer readable storage medium
CN110807013A (en) * 2018-08-03 2020-02-18 阿里巴巴集团控股有限公司 Data migration method and device for distributed data storage cluster

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101188566A (en) * 2007-12-13 2008-05-28 沈阳东软软件股份有限公司 A method and system data buffering and synchronization under cluster environment
CN103678045A (en) * 2013-12-31 2014-03-26 曙光云计算技术有限公司 Data backup method for virtual machines
CN108123976A (en) * 2016-11-30 2018-06-05 阿里巴巴集团控股有限公司 Data back up method, apparatus and system between cluster
CN107087038A (en) * 2017-06-29 2017-08-22 珠海市魅族科技有限公司 A kind of method of data syn-chronization, synchronizer, device and storage medium
WO2019091324A1 (en) * 2017-11-07 2019-05-16 阿里巴巴集团控股有限公司 Data synchronization method and device, and electronic device
CN108664356A (en) * 2018-05-03 2018-10-16 吉林亿联银行股份有限公司 A kind of database backup method and device, Database Systems
CN110807013A (en) * 2018-08-03 2020-02-18 阿里巴巴集团控股有限公司 Data migration method and device for distributed data storage cluster
CN110737719A (en) * 2019-09-06 2020-01-31 深圳平安通信科技有限公司 Data synchronization method, device, equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孟庆祥 et al., "OpenGIS Design and Development Tutorial: Design and Development Based on QGIS+PostGIS", vol. 1, Wuhan: Wuhan University Press, 31 August 2018, pages 38-40 *
艾利克斯洪木尔, "Cloud Computing Architecture Design Patterns", vol. 1, Wuhan: Huazhong University of Science and Technology Press, 31 October 2017, pages 155-159 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860630A (en) * 2021-04-08 2021-05-28 广州趣丸网络科技有限公司 Real-time transformation data storage method and device, electronic equipment and storage medium
CN113282245A (en) * 2021-06-15 2021-08-20 中国建设银行股份有限公司 Method for auditing supply and host platform
CN113282245B (en) * 2021-06-15 2024-04-12 中国建设银行股份有限公司 Method for auditing supply number and host platform
CN114610817A (en) * 2022-05-12 2022-06-10 恒生电子股份有限公司 Data synchronization method and device, multi-active system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112416884A (en) Data synchronization method and system
CN108920153B (en) Docker container dynamic scheduling method based on load prediction
US12013764B2 (en) Past-state backup generator and interface for database systems
CN110825420A (en) Configuration parameter updating method, device, equipment and storage medium for distributed cluster
US20160241441A1 (en) Method and apparatus for changing configurations
CN106021315B (en) Log management method and system for application program
CN114064211B (en) Video stream analysis system and method based on end-side-cloud computing architecture
CN110569269A (en) data synchronization method and system
CN111865632B (en) Switching method of distributed data storage cluster and switching instruction sending method and device
CN111064626B (en) Configuration updating method, device, server and readable storage medium
CN107391276A (en) Distributed monitor method, interception control device and system
CN106452836B (en) main node setting method and device
CN111913933B (en) Power grid historical data management method and system based on unified support platform
CN116304390B (en) Time sequence data processing method and device, storage medium and electronic equipment
CN115587118A (en) Task data dimension table association processing method and device and electronic equipment
CN111601299A (en) Information association backfill system under 5G framework
US11042454B1 (en) Restoration of a data source
CN108733808B (en) Big data software system switching method, system, terminal equipment and storage medium
CN107566341B (en) Data persistence storage method and system based on federal distributed file storage system
CN114625566A (en) Data disaster tolerance method and device, electronic equipment and storage medium
CN112019362B (en) Data transmission method, device, server, terminal, system and storage medium
CN114500289B (en) Control plane recovery method, device, control node and storage medium
CN115473858A (en) Data transmission method and streaming data transmission system
CN112685486B (en) Data management method and device for database cluster, electronic equipment and storage medium
CN113641385A (en) Distributed application parameter distribution system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination