CN112416884A - Data synchronization method and system - Google Patents

Data synchronization method and system

Info

Publication number
CN112416884A
CN112416884A
Authority
CN
China
Prior art keywords
data
subsystem
node
file
synchronized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011319138.XA
Other languages
Chinese (zh)
Inventor
李睿
孙谦晨
王娟
庆祖良
耿东山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Jiangsu Co Ltd
Priority to CN202011319138.XA
Publication of CN112416884A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/178: Techniques for file synchronisation in file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14: Error detection or correction of the data by redundancy in operation
    • G06F 11/1402: Saving, restoring, recovering or retrying
    • G06F 11/1446: Point-in-time backing up or restoration of persistent data
    • G06F 11/1448: Management of the data involved in backup or backup restore
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/11: File system administration, e.g. details of archiving or snapshots
    • G06F 16/128: Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/172: Caching, prefetching or hoarding of files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/1734: Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/18: File system types
    • G06F 16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F 16/1815: Journaling file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/18: File system types
    • G06F 16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/52: Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F 9/524: Deadlock detection or avoidance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/54: Interprogram communication
    • G06F 9/544: Buffers; Shared memory; Pipes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/54: Interprogram communication
    • G06F 9/546: Message passing systems or structures, e.g. queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/54: Indexing scheme relating to G06F9/54
    • G06F 2209/548: Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method and system. The data synchronization method comprises: acquiring first data to be synchronized of a first node within a preset historical time period, the first node being a node in a main cluster; generating a disk snapshot of the first data to be synchronized; and sending the disk snapshot to a distributed file subsystem, so that a second node, which is a node in a backup cluster, acquires the disk snapshot from the distributed file subsystem to achieve data synchronization. With the data synchronization method and system provided by the application, network resource consumption can be effectively reduced, the time taken by data synchronization can be shortened, and data synchronization efficiency can be improved.

Description

Data synchronization method and system
Technical Field
The present application relates to the field of wireless communication technologies, and in particular, to a data synchronization method and system.
Background
With the continuous development of internet technology, internet enterprises generally provide network services to users in the form of a main cluster and a backup cluster, and data is synchronized between the two clusters to guarantee quality of service.
At present, when data synchronization is performed between the main cluster and the backup cluster, each node in the main cluster needs to establish a network connection with each node in the backup cluster; each node in the main cluster then sends its cached data to every node in the backup cluster over these connections, thereby achieving data synchronization between the main cluster and the backup cluster. However, as internet technology develops, the number of nodes in the main and backup clusters and the amount of data cached on each node keep growing, so data synchronization according to the existing method occupies more and more network resources.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data synchronization method and system, which can reduce network resource consumption in data synchronization.
The technical scheme of the application is as follows:
in a first aspect, a data synchronization method is provided, which is applied to a data synchronization system, and the method may include:
acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
generating a disk snapshot of first data to be synchronized;
and sending the disk snapshot to the distributed file subsystem so that a second node acquires the disk snapshot from the distributed file subsystem to realize data synchronization, wherein the second node is a node in the backup cluster.
In some embodiments, the data synchronization system includes a preset file subsystem;
generating a disk snapshot of first data to be synchronized, comprising:
generating a disk snapshot corresponding to first data to be synchronized based on a preset file subsystem;
sending the disk snapshot to a distributed file system, comprising:
and asynchronously sending the disk snapshot to the distributed file subsystem through the preset file subsystem.
In some embodiments, the preset file subsystem is the Ignite file subsystem.
In some embodiments, the data synchronization system includes a publish-subscribe subsystem and a message queue subsystem;
the method further comprises the following steps:
acquiring second data to be synchronized of the first node in real time;
converting the second data to be synchronized into a pre-written log file;
and sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem.
In some embodiments, before sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem, the method further comprises:
acquiring the last generation time of the disk snapshot;
and acquiring the target pre-written log file according to the last generation time.
In some embodiments, sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem comprises:
sending the pre-written log file to a message queue subsystem through a publish-subscribe subsystem;
and sending the pre-written log file to the second node through a preset cache channel by using the message queue subsystem.
In some embodiments, sending the pre-written log file to the second node through the preset cache channel by using the message queue subsystem includes:
the message queue subsystem acquires the distributed lock through a synchronous client;
and sending the pre-written log file to the second node by using the distributed lock through the synchronous client.
In some embodiments, the distributed lock is implemented by zookeeper.
In some embodiments, the pre-written log file includes at least one of a data operation time, a data operation type, a data operation cache type, and a data content corresponding to the pre-written log file;
after sending the pre-written log file to the message queue subsystem through the publish-subscribe subsystem, the method further comprises:
and recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
In a second aspect, a data synchronization system is provided, including:
the data acquisition device is used for acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
the snapshot generating device is used for generating a disk snapshot of the first data to be synchronized;
and the data sending device is used for sending the disk snapshot to the distributed file subsystem so that the second node acquires the disk snapshot from the distributed file subsystem to realize data synchronization, and the second node is a node in the backup cluster.
The technical solutions provided by the embodiments of the application have at least the following beneficial effects:
According to the embodiments of the application, a disk snapshot of the first data to be synchronized of the first node in the main cluster is generated and sent to the distributed file subsystem. In this way, on the one hand, the second node in the backup cluster can achieve data synchronization simply by obtaining the disk snapshot from the distributed file subsystem; each node in the main cluster no longer has to transmit the data to be synchronized over a network connection to every node in the backup cluster, so network resource consumption can be effectively reduced. On the other hand, carrying out data synchronization between the main cluster and the backup cluster by means of disk snapshots shortens the time taken by data synchronization and thus improves data synchronization efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.
Fig. 1 is a schematic flow chart of a data synchronization method according to an embodiment of the present application;
fig. 2 is a schematic view of a data synchronization method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a WAL file structure provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data synchronization system according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
As stated in the background, as the number of nodes in the main and backup clusters and the amount of data cached on each node of the main cluster increase, the prior-art data synchronization method occupies more and more network resources.
Specifically, data synchronization between the main cluster and the backup cluster is mainly realized by transmitting data from the main cluster to the backup cluster. Taking a backup cluster with 100 node servers (hereinafter referred to as nodes) as an example, each node in the main cluster needs to establish a network connection with each of the 100 node servers in the backup cluster in order to transmit the data to be synchronized in real time. In an actual scenario, the main cluster has many nodes and a large volume of cached data to be synchronized, so the load is heavy; performing data synchronization between the main and backup clusters according to the existing method therefore occupies a large amount of network resources.
To address this technical problem, the application provides a data synchronization method and system: a disk snapshot of the first data to be synchronized of a first node in the main cluster is generated and sent to a distributed file subsystem. In this way, on the one hand, the second node in the backup cluster can achieve data synchronization by obtaining the disk snapshot from the distributed file subsystem, so each node in the main cluster no longer needs to transmit the data to be synchronized over a network connection to every node in the backup cluster, and network resource consumption can be effectively reduced. On the other hand, performing data synchronization between the main cluster and the backup cluster by means of disk snapshots shortens the time taken by data synchronization and improves data synchronization efficiency.
The following describes the data synchronization method provided in the embodiments of the present application in detail.
Fig. 1 shows a flowchart of a data synchronization method provided in an embodiment of the present application. The method may be executed by a data synchronization system, whose structure is described later with reference to fig. 4 and is not repeated here. As shown in fig. 1, the data synchronization method provided in this embodiment may include the following steps:
s110, first data to be synchronized of the first node in a preset historical period is obtained.
The first node may be any node in the main cluster, as shown in fig. 2.
As an example, the preset historical time period may be a preset period to which the data to be synchronized belongs, such as the 1 hour before the current time.
The first data to be synchronized may be the data to be synchronized that is generated by a service client within the preset historical time period and cached in the first node. For example, if the preset historical time period is 1 hour, the first data to be synchronized is the data to be synchronized generated by the service client within the 1 hour before the current time and cached in the first node.
The data to be synchronized may be data generated by the first node that needs to be synchronized to the nodes of the backup cluster.
When data synchronization between the main cluster and the backup cluster is performed, the first data to be synchronized generated by the first node within the preset historical time period can be acquired. Taking a preset historical time period of 2 hours and a current time of 10:00 as an example, the data to be synchronized generated by the first node from 8:00 to 10:00 can be acquired as the first data to be synchronized.
And S120, generating a disk snapshot of the first data to be synchronized.
After the first data to be synchronized of the first node within the preset historical time period is acquired, a disk snapshot of the first data to be synchronized can be generated. Because a disk snapshot backs up the data of an entire disk volume quickly, the backup of the persisted data of the distributed cache can be completed quickly. Therefore, directly generating a disk snapshot of the first data to be synchronized takes little time and occupies few network resources.
S130, the disk snapshot is sent to the distributed file subsystem, so that the second node obtains the disk snapshot from the distributed file subsystem to achieve data synchronization.
The second node may be any node in the backup cluster shown in fig. 2.
After the disk snapshot of the first data to be synchronized is generated, the disk snapshot may be sent to the distributed file subsystem, for example, by mapping the disk snapshot into the distributed file subsystem. As shown in fig. 2, the distributed file subsystem may be a Hadoop Distributed File System (HDFS) that is accessible to both the nodes of the main cluster and the nodes of the backup cluster. In this way, the second node can obtain the disk snapshot from the distributed file subsystem, and data synchronization between the first node and the second node is achieved.
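As an illustration only, the sketch below shows one way a main-cluster node could push a locally generated snapshot directory into HDFS using the standard Hadoop FileSystem API. The local and HDFS paths and the namenode address are hypothetical values, not values defined by the patent.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotUploader {

    /**
     * Copies a locally generated disk-snapshot directory into HDFS so that
     * backup-cluster nodes can pull it later. Paths and the HDFS URI are
     * illustrative assumptions only.
     */
    public static void uploadSnapshot(String localSnapshotDir, String hdfsTargetDir) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical namenode address of the shared distributed file subsystem
        FileSystem hdfs = FileSystem.get(new URI("hdfs://namenode:8020"), conf);

        // keep the local copy (delSrc = false) and overwrite any stale upload
        hdfs.copyFromLocalFile(false, true, new Path(localSnapshotDir), new Path(hdfsTargetDir));
        hdfs.close();
    }

    public static void main(String[] args) throws Exception {
        uploadSnapshot("/data/snapshots/2020-11-23T10-00", "/sync/snapshots/node-1/2020-11-23T10-00");
    }
}
```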
It can be understood that the data synchronization method provided in the embodiment of the present application may be executed periodically, may be executed at any set time, or may be executed when a synchronization instruction is received. The disk snapshots in the distributed file subsystem may be deleted after being acquired by the second node, or may be deleted periodically.
By generating a disk snapshot of the first data to be synchronized of the first node in the main cluster and sending it to the distributed file subsystem, the embodiments of the application let the second node in the backup cluster achieve data synchronization simply by obtaining the disk snapshot from the distributed file subsystem. Each node in the main cluster therefore no longer needs to transmit the data to be synchronized over a network connection to every node in the backup cluster, which effectively reduces network resource consumption. In addition, performing data synchronization between the main cluster and the backup cluster by means of disk snapshots shortens the time taken by data synchronization and improves data synchronization efficiency.
In addition, because service processing is highly real-time, the peak and valley times and values of the service access requests are not fixed. When the volume of service access requests reaches or approaches its peak, the network load of each first node is high and network response time increases. If the prior-art data synchronization method were used at this point, it would further increase the network response delay, making the service processing delay too high and affecting data processing. With the data synchronization method provided by the embodiments of the application, network resource consumption can be effectively reduced, so the service processing delay is not worsened and data processing is not affected.
In some embodiments, the data synchronization system may include a preset file subsystem, and based on this, the disk snapshot may be synchronized to the distributed file subsystem through the preset file subsystem, and accordingly, a specific implementation manner of the step S120 may be as follows:
generating a disk snapshot corresponding to first data to be synchronized based on a preset file subsystem;
in this case, the specific implementation method of step S130 may be:
and asynchronously sending the disk snapshot to the distributed file subsystem through the preset file subsystem.
As an example, the preset file subsystem may be a preset file system, which may be used to cache a disk snapshot of the first data to be synchronized.
As a specific example, when generating the disk snapshot of the first data to be synchronized, the disk snapshot may be generated based on the preset file subsystem, and the disk snapshot in the preset file subsystem may then be asynchronously mapped to the distributed file subsystem. In this way, the second node can read the disk snapshot from the distributed file subsystem to achieve data synchronization. Mapping the snapshot file of the preset file subsystem to the distributed file subsystem reduces the time needed to move the disk file into the distributed file subsystem, which further reduces network resource consumption, shortens the time taken by data synchronization, and further improves data synchronization efficiency.
In some embodiments, referring to fig. 2, the preset file subsystem may be the Ignite file subsystem, i.e., the Ignite file system (IGFS) shown in the figure.
As a specific example, the Ignite file subsystem may be the native file system of the main and backup clusters. The disk snapshot file may be mapped to HDFS through the Ignite file subsystem, and the backup cluster likewise obtains the disk snapshot file through HDFS, so that synchronization of the base data is completed quickly.
Using the native Ignite file subsystem therefore reduces the cost of data synchronization. The Ignite file subsystem is an in-memory layer that can delegate files to other file systems, so it can be transparently plugged into a Hadoop runtime environment as a transparent caching layer in front of files stored in HDFS. When the Ignite file subsystem is used as an HDFS caching layer, Input/Output (I/O) consumption can be effectively reduced, latency lowered, and throughput improved.
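For readers unfamiliar with IGFS, the following is a minimal sketch of application code accessing the cache layer through the ordinary Hadoop FileSystem interface, assuming an Apache Ignite 2.x deployment where IGFS was still available and exposed through its Hadoop adapter. The adapter class name, the igfs:// URI, and the endpoint port reflect that Ignite release as documented and should be treated as assumptions rather than values quoted from the patent.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IgfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Map the igfs:// scheme to Ignite's Hadoop FileSystem adapter
        // (class name and scheme as documented for Ignite 2.x IGFS; assumptions).
        conf.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");

        // Writes land in the in-memory IGFS layer first and are written through
        // to the secondary HDFS file system configured on the Ignite nodes.
        FileSystem igfs = FileSystem.get(URI.create("igfs://igfs@127.0.0.1:10500/"), conf);
        try (FSDataOutputStream out = igfs.create(new Path("/sync/snapshots/node-1/part-0.bin"), true)) {
            out.write(new byte[] {1, 2, 3});
        }
        igfs.close();
    }
}
```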
In some embodiments, referring to fig. 2, the data synchronization system may further include a publish-subscribe subsystem (i.e., the publish-subscribe system in fig. 2) and a message queue subsystem (i.e., the MQ system in fig. 2), so as to implement real-time data synchronization, and accordingly, the method further includes the following processes:
acquiring second data to be synchronized of the first node in real time;
converting the second data to be synchronized into a pre-written log file;
and sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem.
As a specific example, based on an event notification mechanism of the distributed file subsystem, this embodiment adapts the publish-subscribe subsystem as follows:
1. Extend the publish-subscribe events and add a synchronous data transmission event;
2. Transmit over separate channels divided by cache group, so as to balance the transmission pressure and avoid overloading any single channel;
3. Send the data to be synchronized to the subscribing client through the publish-subscribe subsystem.
Based on the above data synchronization system, real-time data synchronization can be achieved. During operation, the first node generates data to be synchronized in real time, namely the second data to be synchronized. The data synchronization system may obtain the second data to be synchronized from the first node and convert it into a Write-Ahead Log (WAL) file, i.e., write the second data to be synchronized into a WAL file, the WAL log in fig. 2. A shared cache mechanism may be used to ensure that all data to be synchronized is flushed into the WAL file.
Referring to fig. 3, the content of a WAL file may include a timestamp (Idx), the data to be synchronized (data), and an end marker (off); the data structure of the data to be synchronized in the WAL file may include the name of the data to be synchronized, the synchronization operation time, the operation type, and a key-value pair (K-V).
Then, referring to fig. 2, the WAL file may be sent to the publish-subscribe subsystem, which sends the WAL log to the Message Queue (MQ) subsystem through a subscribing client. The MQ subsystem sends the WAL file to the second node in the backup cluster through a synchronization client, and the second node parses the WAL file with a parsing thread pool to obtain the data to be synchronized, thereby achieving data synchronization.
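Purely as an illustration of the record layout summarized above (timestamp, payload, end marker, with the payload carrying cache name, operation time, operation type, and key-value pair), a plain Java value class might look like the following. The field names are our own shorthand for the elements shown in fig. 3, not identifiers defined by the patent.

```java
import java.io.Serializable;

/**
 * One WAL entry as sketched in fig. 3: a timestamp (Idx), the payload to be
 * synchronized (data), and an end offset/marker (off). Names are illustrative.
 */
public class WalRecord implements Serializable {

    /** Payload: which cache, when and how it was modified, and the K-V pair itself. */
    public static class Payload implements Serializable {
        public final String cacheName;      // name of the data to be synchronized
        public final long operationTime;    // synchronization operation time
        public final String operationType;  // e.g. PUT / REMOVE (assumed values)
        public final Object key;
        public final Object value;

        public Payload(String cacheName, long operationTime, String operationType,
                       Object key, Object value) {
            this.cacheName = cacheName;
            this.operationTime = operationTime;
            this.operationType = operationType;
            this.key = key;
            this.value = value;
        }
    }

    public final long idx;        // timestamp / index of the record
    public final Payload data;    // data to be synchronized
    public final long endOffset;  // end marker of the record in the WAL file

    public WalRecord(long idx, Payload data, long endOffset) {
        this.idx = idx;
        this.data = data;
        this.endOffset = endOffset;
    }
}
```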
Converting the data to be synchronized into WAL files therefore realizes incremental data synchronization: the WAL file survives even if the process crashes, so the accuracy of the data to be synchronized can be controlled at the source. In addition, the publish-subscribe subsystem and the message queue subsystem can control the accuracy of the data to be synchronized between the main and backup clusters, which improves data synchronization accuracy. Moreover, compared with the forward synchronization of the prior art, the data synchronization method provided by this embodiment avoids excessive network resource consumption on a single node; parsing the WAL files with a parsing thread pool reduces the service processing pressure on the cluster at service peaks, and relying on the publish-subscribe subsystem keeps the cluster pressure relatively smooth, so the cluster is more stable.
In some embodiments, the target pre-written log file may also be obtained according to the generation time of the disk snapshot, and a specific implementation manner of determining the second data to be synchronized may be as follows:
acquiring the last generation time of the disk snapshot;
and acquiring a target pre-written log file according to the last generation time.
The target pre-written log files are the pre-written log files generated after the time at which the disk snapshot was last generated. They can be regarded as the pre-written log files corresponding to the data that has not been synchronized since the last disk snapshot, i.e., the incremental synchronization data.
As a specific example, consider that the real-time data synchronization method may not run continuously due to a failure, a restart, or other reasons. Therefore, before real-time data synchronization is executed for the first time, the last generation time of the disk snapshot can be obtained, and the target pre-written log files that still need to be synchronized are determined according to that time. The target pre-written log files generated after the last generation time are then obtained and sent to the second node through the publish-subscribe subsystem and the message queue subsystem, and the second node parses these WAL files to obtain the data to be synchronized, thereby achieving data synchronization. A sketch of this selection step is given below.
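The following minimal sketch of the selection step assumes the WAL files live in a local directory and expose their creation time through file metadata; both the directory layout and the file-naming pattern are assumptions of ours, not details from the patent.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TargetWalSelector {

    /**
     * Returns the WAL files created after the last disk-snapshot time, i.e. the
     * "target pre-written log files" that still need incremental synchronization.
     */
    public static List<Path> selectTargetWalFiles(Path walDir, Instant lastSnapshotTime) throws IOException {
        List<Path> targets = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(walDir, "*.wal")) {
            for (Path file : files) {
                BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
                if (attrs.creationTime().toInstant().isAfter(lastSnapshotTime)) {
                    targets.add(file);
                }
            }
        }
        return targets;
    }
}
```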
In this way, determining the target pre-written log files based on the last generation time of the disk snapshot avoids missing data during synchronization and ensures the reliability of the data to be synchronized at the source; it also reduces repeated synchronization of data to some extent, which further improves the integrity and accuracy of data synchronization and further reduces resource consumption. In addition, the impact on the service processing flow is reduced.
Moreover, because the WAL files always exist, even if a network fault occurs, the target pre-written log files can still be determined from the last generation time of the disk snapshot and data synchronization can be carried out based on them, which further ensures the accuracy of the synchronization result.
In some embodiments, data synchronization may be performed according to the cache channel, and accordingly, a specific implementation manner of sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem may be as follows:
sending the pre-written log file to a message queue subsystem through a publish-subscribe subsystem;
and sending the pre-written log file to the second node through a preset cache channel by using the message queue subsystem.
As an example, a preset cache channel may be a channel preset for transmitting WAL files of different types, kinds, or topics. For example, a topic may be preset for each cache or cache group, and a channel is allocated to each topic to transmit the data to be synchronized, i.e., the WAL files, corresponding to that topic.
As a specific example, referring to fig. 2, after the second data to be synchronized is converted into a WAL file, the WAL file may be sent to the publish-subscribe subsystem, which sends the WAL log to the message queue (MQ) subsystem through the subscribing client. The MQ subsystem determines the topic to which the WAL file belongs and the preset cache channel corresponding to that topic, and then sends the WAL file to the second node through the preset cache channel using a synchronization client, so as to achieve data synchronization. Synchronizing data through preset cache channels in this way further reduces network resource consumption and relieves network load pressure.
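To make the channel-per-topic idea concrete, here is a small sketch that derives a topic name from the cache group of each WAL record and hands the record to a generic message-queue producer. The producer interface and the topic naming convention are placeholders of ours, since the patent does not name a specific message queue product.

```java
/** Minimal producer abstraction standing in for the MQ subsystem's client API. */
interface MqProducer {
    void send(String topic, byte[] payload);
}

public class ChannelRouter {
    private final MqProducer producer;

    public ChannelRouter(MqProducer producer) {
        this.producer = producer;
    }

    /**
     * One preset channel (topic) per cache group, so no single channel has to
     * carry all of the WAL traffic.
     */
    public void forward(String cacheGroup, byte[] walRecordBytes) {
        String topic = "sync-" + cacheGroup;   // hypothetical naming convention
        producer.send(topic, walRecordBytes);
    }
}
```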
In some embodiments, the data synchronization may be implemented by using a distributed lock, and the specific implementation method for sending the pre-written log file to the second node by using the message queue subsystem through the preset cache channel may be as follows:
the message queue subsystem acquires the distributed lock through a synchronous client;
and sending the pre-written log file to the second node by using the distributed lock through the synchronous client.
As a specific example, when the MQ subsystem sends the WAL file to the second node, referring to fig. 2, it may first acquire the distributed lock, also called a cluster lock, through the synchronization client, and then send the WAL file to the second node through the synchronization client based on the distributed lock, so as to achieve data synchronization between the main and backup clusters. The distributed lock distinguishes the synchronization client from the service client and prevents the two from running concurrently: the synchronization client acquires the distributed lock first, and only after its consumption is finished does the service client access the data and complete data processing.
In some embodiments, the distributed lock described above may be implemented by zookeeper.
As a specific example, the synchronization client may write a node to ZooKeeper to confirm the switch between the main and backup clusters, and it automatically exits after consumption is completed. The service client monitors the parent directory and only starts accessing and processing service data when no child path remains under the parent directory. A rough sketch of this coordination is given below.
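The sketch below writes this coordination against the plain ZooKeeper client API, with session handling, watches, and error paths omitted; the znode paths are hypothetical, and the parent node is assumed to exist already.

```java
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SyncCoordination {
    // hypothetical paths; the parent znode is assumed to exist already
    private static final String PARENT = "/cache-sync";
    private static final String LOCK_CHILD = PARENT + "/sync-in-progress";

    /** Synchronization client: announce that synchronization is running. */
    static void acquire(ZooKeeper zk) throws Exception {
        // ephemeral, so the node disappears if the synchronization client dies
        zk.create(LOCK_CHILD, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    /** Synchronization client: release after the WAL files have been consumed. */
    static void release(ZooKeeper zk) throws Exception {
        zk.delete(LOCK_CHILD, -1);
    }

    /** Business client: only touch the cache once no synchronization child remains. */
    static boolean businessMayProceed(ZooKeeper zk) throws Exception {
        List<String> children = zk.getChildren(PARENT, false);
        return children.isEmpty();
    }
}
```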
In some embodiments, the pre-written log file may include at least one of a data operation time, a data operation type, a data operation cache type, and a data content corresponding to the pre-written log file.
Correspondingly, at this time, after the pre-written log file is sent to the message queue subsystem through the publish-subscribe subsystem, the following steps may also be performed:
and recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
As a specific example, after receiving the WAL file, the MQ subsystem may obtain and record one or more of a data operation time, a data operation type, a data operation cache type, and data content corresponding to the WAL file.
In this way, the data synchronization result can be confirmed without exporting and comparing the cached data of the main and backup clusters. Instead, the records kept by the MQ subsystem can be checked to confirm the data synchronization result; that is, data tracing and confirmation of the synchronization result are completed through the MQ subsystem, so tracing and verification of the data synchronization result can be done more intuitively and dynamically while the time order of the data is preserved. On this basis, when the main and backup clusters need to be switched, the synchronized data can be pulled from the MQ message system, the synchronization operation data can be reconstructed from the cache name and operation type, and reverse data synchronization can be performed. An illustrative sketch of such a record follows.
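As an illustration of the kind of record the MQ subsystem could keep for later verification, the sketch below appends one line per consumed WAL record to a local audit log; the field order, separator, and log location are our own choices, not details from the patent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

public class SyncAuditLog {
    private final Path logFile;

    public SyncAuditLog(Path logFile) {
        this.logFile = logFile;
    }

    /** Record operation time, operation type, and cache type of a consumed WAL record. */
    public void record(Instant operationTime, String operationType, String cacheType) throws IOException {
        String line = operationTime + "," + operationType + "," + cacheType + System.lineSeparator();
        // append-only log so earlier records (and their time order) are preserved
        Files.write(logFile, line.getBytes(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```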
The above is a data synchronization method provided in the embodiments of the present application, and based on the data synchronization method, the embodiments of the present application also provide a data synchronization system, and the following describes the data synchronization system provided in the embodiments of the present application.
Fig. 4 shows a schematic structural diagram of a data synchronization system provided in an embodiment of the present application, and as shown in fig. 4, the data synchronization system 400 may include:
a data obtaining device 410, configured to obtain first data to be synchronized of a first node in a preset history period, where the first node is a node in a master cluster;
a snapshot generating device 420, configured to generate a disk snapshot of the first data to be synchronized;
and a data sending device 430, configured to send the disk snapshot to the distributed file subsystem, so that a second node obtains the disk snapshot from the distributed file subsystem to implement data synchronization, where the second node is a node in the backup cluster.
In some embodiments, the data synchronization system 400 may further include a preset file subsystem;
the snapshot generating apparatus 420 may be specifically configured to:
generating a disk snapshot corresponding to first data to be synchronized based on a preset file subsystem;
the data transmission device 430 may include:
the first data sending module may be configured to asynchronously send the disk snapshot to the distributed file subsystem through the preset file subsystem.
In some embodiments, the preset file subsystem may be the Ignite file subsystem.
In some embodiments, the data synchronization system 400 may also include a publish-subscribe subsystem and a message queue subsystem;
the data acquisition device 410 may include:
the first data acquisition module may be configured to acquire second data to be synchronized of the first node in real time;
the conversion module can be used for converting the second data to be synchronized into a pre-written log file;
the data transmission device 430 may include:
and the second data sending module can be used for sending the pre-written log to the second node through the publish-subscribe subsystem and the message queue subsystem.
In some embodiments, the data synchronization system 400 may further include:
the time obtaining module can be used for obtaining the last generation time of the disk snapshot;
the data acquiring device 410 may further include:
and the second data acquisition module may be configured to acquire the target pre-written log file according to the last generation time.
In some embodiments, the second data sending module may include:
the first sending unit may be configured to send the pre-written log file to the message queue subsystem through the publish-subscribe subsystem;
the second sending unit may be configured to send the pre-written log file to the second node through the preset cache channel by using the message queue subsystem.
In some embodiments, the second sending unit may include:
the first acquiring subunit may be used for the message queue subsystem to acquire the distributed lock through the synchronization client;
and the second sending subunit may be configured to send the prewritten log file to the second node through the synchronization client using the distributed lock.
In some embodiments, the distributed lock may be implemented by zookeeper.
In some embodiments, the pre-written log file may include at least one of a data operation time, a data operation type, a data operation cache type, and a data content corresponding to the pre-written log file;
the data synchronization system may further include:
and the recording device can be used for recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
The data synchronization system provided in the embodiments of the present application may execute the data synchronization method provided in each of the embodiments, and the specific implementation principle and technical effect are similar, and for brevity, no further description is given here.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods and systems according to the present application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable information processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable information processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable information processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable information processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data synchronization method is applied to a data synchronization system, and the method comprises the following steps:
acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
generating a disk snapshot of the first data to be synchronized;
and sending the disk snapshot to a distributed file subsystem so that a second node acquires the disk snapshot from the distributed file subsystem to realize data synchronization, wherein the second node is a node in a backup cluster.
2. The method of claim 1, wherein the data synchronization system comprises a preset file subsystem;
the generating the disk snapshot of the first data to be synchronized includes:
generating a disk snapshot corresponding to the first to-be-synchronized data based on a preset file subsystem;
the sending the disk snapshot to a distributed file system includes:
and asynchronously sending the disk snapshot to a distributed file subsystem through the preset file subsystem.
3. The method of claim 2, wherein the preset file subsystem is an Ignite file subsystem.
4. The method of claim 1, wherein the data synchronization system comprises a publish-subscribe subsystem and a message queue subsystem;
the method further comprises the following steps:
acquiring second data to be synchronized of the first node in real time;
converting the second data to be synchronized into a pre-written log file;
and sending the pre-written log to the second node through a publish-subscribe subsystem and a message queue subsystem.
5. The method of claim 4, wherein before sending the pre-written log to the second node via a publish-subscribe subsystem and a message queue subsystem, further comprising:
acquiring the last generation time of the disk snapshot;
and acquiring a target pre-written log file according to the last generation time.
6. The method of claim 4, wherein sending the pre-written log to the second node via a publish-subscribe subsystem and a message queue subsystem comprises:
sending the pre-written log file to the message queue subsystem through the publish-subscribe subsystem;
and sending the pre-written log file to the second node through a preset cache channel by using the message queue subsystem.
7. The method of claim 6, wherein sending the pre-written log file to the second node via a predetermined cache channel using the message queue subsystem comprises:
the message queue subsystem acquires a distributed lock through a synchronous client;
and sending the pre-written log file to the second node by the synchronization client through the distributed lock.
8. The method of claim 6, wherein the distributed lock is implemented by zookeeper.
9. The method according to any one of claims 4 to 7, wherein the pre-written log file comprises at least one of a data operation time, a data operation type, a data operation cache type and a data content corresponding to the pre-written log file;
after the sending of the pre-written log file to the message queue subsystem by the publish-subscribe subsystem, the method further comprises:
and recording at least one of the data operation time, the data operation type and the data operation cache type of the pre-written log file through the message queue subsystem.
10. A data synchronization system, comprising:
the data acquisition device is used for acquiring first to-be-synchronized data of a first node in a preset historical time period, wherein the first node is a node in a main cluster;
the snapshot generating device is used for generating a disk snapshot of the first data to be synchronized;
and the data sending device is used for sending the disk snapshot to the distributed file subsystem so as to enable a second node to acquire the disk snapshot from the distributed file subsystem to realize data synchronization, and the second node is a node in the backup cluster.
Application CN202011319138.XA, filed 2020-11-23, priority date 2020-11-23: Data synchronization method and system. Published as CN112416884A. Status: Pending.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319138.XA CN112416884A (en) 2020-11-23 2020-11-23 Data synchronization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319138.XA CN112416884A (en) 2020-11-23 2020-11-23 Data synchronization method and system

Publications (1)

Publication Number Publication Date
CN112416884A true CN112416884A (en) 2021-02-26

Family

ID=74777284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319138.XA Pending CN112416884A (en) 2020-11-23 2020-11-23 Data synchronization method and system

Country Status (1)

Country Link
CN (1) CN112416884A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860630A (en) * 2021-04-08 2021-05-28 广州趣丸网络科技有限公司 Real-time transformation data storage method and device, electronic equipment and storage medium
CN113282245A (en) * 2021-06-15 2021-08-20 中国建设银行股份有限公司 Method for auditing supply and host platform
CN114610817A (en) * 2022-05-12 2022-06-10 恒生电子股份有限公司 Data synchronization method and device, multi-active system, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101188566A (en) * 2007-12-13 2008-05-28 沈阳东软软件股份有限公司 A method and system data buffering and synchronization under cluster environment
CN103678045A (en) * 2013-12-31 2014-03-26 曙光云计算技术有限公司 Data backup method for virtual machines
CN107087038A (en) * 2017-06-29 2017-08-22 珠海市魅族科技有限公司 A kind of method of data syn-chronization, synchronizer, device and storage medium
CN108123976A (en) * 2016-11-30 2018-06-05 阿里巴巴集团控股有限公司 Data back up method, apparatus and system between cluster
CN108664356A (en) * 2018-05-03 2018-10-16 吉林亿联银行股份有限公司 A kind of database backup method and device, Database Systems
WO2019091324A1 (en) * 2017-11-07 2019-05-16 阿里巴巴集团控股有限公司 Data synchronization method and device, and electronic device
CN110737719A (en) * 2019-09-06 2020-01-31 深圳平安通信科技有限公司 Data synchronization method, device, equipment and computer readable storage medium
CN110807013A (en) * 2018-08-03 2020-02-18 阿里巴巴集团控股有限公司 Data migration method and device for distributed data storage cluster

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101188566A (en) * 2007-12-13 2008-05-28 沈阳东软软件股份有限公司 A method and system data buffering and synchronization under cluster environment
CN103678045A (en) * 2013-12-31 2014-03-26 曙光云计算技术有限公司 Data backup method for virtual machines
CN108123976A (en) * 2016-11-30 2018-06-05 阿里巴巴集团控股有限公司 Data back up method, apparatus and system between cluster
CN107087038A (en) * 2017-06-29 2017-08-22 珠海市魅族科技有限公司 A kind of method of data syn-chronization, synchronizer, device and storage medium
WO2019091324A1 (en) * 2017-11-07 2019-05-16 阿里巴巴集团控股有限公司 Data synchronization method and device, and electronic device
CN108664356A (en) * 2018-05-03 2018-10-16 吉林亿联银行股份有限公司 A kind of database backup method and device, Database Systems
CN110807013A (en) * 2018-08-03 2020-02-18 阿里巴巴集团控股有限公司 Data migration method and device for distributed data storage cluster
CN110737719A (en) * 2019-09-06 2020-01-31 深圳平安通信科技有限公司 Data synchronization method, device, equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孟庆祥 et al., "OpenGIS Design and Development Tutorial: Design and Development Based on QGIS+PostGIS", vol. 1, Wuhan: Wuhan University Press, 31 August 2018, pages 38-40 *
艾利克斯洪木尔, "Cloud Computing Architecture Design Patterns", vol. 1, Wuhan: Huazhong University of Science and Technology Press, 31 October 2017, pages 155-159 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860630A (en) * 2021-04-08 2021-05-28 广州趣丸网络科技有限公司 Real-time transformation data storage method and device, electronic equipment and storage medium
CN113282245A (en) * 2021-06-15 2021-08-20 中国建设银行股份有限公司 Method for auditing supply and host platform
CN113282245B (en) * 2021-06-15 2024-04-12 中国建设银行股份有限公司 Method for auditing supply number and host platform
CN114610817A (en) * 2022-05-12 2022-06-10 恒生电子股份有限公司 Data synchronization method and device, multi-active system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112416884A (en) Data synchronization method and system
CN108920153B (en) Docker container dynamic scheduling method based on load prediction
US12013764B2 (en) Past-state backup generator and interface for database systems
CN110825420A (en) Configuration parameter updating method, device, equipment and storage medium for distributed cluster
US20160241441A1 (en) Method and apparatus for changing configurations
CN106021315B (en) Log management method and system for application program
CN114064211B (en) Video stream analysis system and method based on end-side-cloud computing architecture
CN110569269A (en) data synchronization method and system
CN111865632B (en) Switching method of distributed data storage cluster and switching instruction sending method and device
CN111064626B (en) Configuration updating method, device, server and readable storage medium
CN107391276A (en) Distributed monitor method, interception control device and system
CN106452836B (en) main node setting method and device
CN111913933B (en) Power grid historical data management method and system based on unified support platform
CN116304390B (en) Time sequence data processing method and device, storage medium and electronic equipment
CN115587118A (en) Task data dimension table association processing method and device and electronic equipment
CN111601299A (en) Information association backfill system under 5G framework
US11042454B1 (en) Restoration of a data source
CN108733808B (en) Big data software system switching method, system, terminal equipment and storage medium
CN107566341B (en) Data persistence storage method and system based on federal distributed file storage system
CN114625566A (en) Data disaster tolerance method and device, electronic equipment and storage medium
CN112019362B (en) Data transmission method, device, server, terminal, system and storage medium
CN114500289B (en) Control plane recovery method, device, control node and storage medium
CN115473858A (en) Data transmission method and streaming data transmission system
CN112685486B (en) Data management method and device for database cluster, electronic equipment and storage medium
CN113641385A (en) Distributed application parameter distribution system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination