CN112711641A - Data synchronization method and device for distributed system and computer readable storage medium

Info

Publication number: CN112711641A
Application number: CN202110056333.6A
Authority: CN
Inventors: 闫巨龙; 刘洋
Original assignee: Jiangsu Yuncongxihe Artificial Intelligence Co ltd
Current assignee: Jiangsu Yuncongxihe Artificial Intelligence Co ltd
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2021-04-27

Abstract

The invention relates to the technical field of distributed system control, and in particular provides a data synchronization method and device for a distributed system and a computer-readable storage medium, aiming to solve the technical problem of how to synchronize the data of a distributed system efficiently and reliably. To this end, with the method according to embodiments of the invention, the service engines that need data synchronization can be quickly located, and the original data to be synchronized determined, according to the working state of the service engines and the abnormal timestamp, and each service engine to be synchronized can then be accurately synchronized against the determined original data. This overcomes the low efficiency and susceptibility to data loss of the manual adjustment approach used in the prior art. In addition, when a large number of service engines need data synchronization, all of them can be synchronized at the same time, which further improves the data synchronization efficiency of the distributed system.

Description

Data synchronization method and device for distributed system and computer readable storage medium

Technical Field

The invention relates to the technical field of distributed system control, in particular to a data synchronization method and device of a distributed system and a computer readable storage medium.

Background

In a distributed system, the same original data usually has multiple backup copies, and different copies may be stored on different nodes. If a node fails and its stored backup data becomes inconsistent with the original data, the operational reliability and stability of the distributed system are severely affected. To keep the backup data on each node consistent with the original data, the current practice is mainly to migrate data to the node manually after the node recovers from the failure. However, when a large amount of data has to be migrated, the migration takes a long time, during which the distributed system cannot operate normally, and data is easily lost, which reduces the operational reliability and stability of the distributed system.

Disclosure of Invention

In order to overcome the above-mentioned drawbacks, the present invention is proposed to provide a data synchronization method, apparatus and computer-readable storage medium for a distributed system that solve or at least partially solve the technical problem of how to efficiently and reliably synchronize data of the distributed system.

In a first aspect, a data synchronization method for a distributed system is provided, where the method includes:

acquiring the working state of each service engine in each copy group in the distributed system;

acquiring service engines to be synchronized according to the working state of each service engine;

acquiring an abnormal time stamp of each service engine to be synchronized;

querying the original data to be synchronized of each service engine to be synchronized from a preset original database according to the abnormal timestamp;

respectively carrying out data synchronization on each service engine to be synchronized according to the original data to be synchronized;

wherein the copy data stored in each copy group is different, the copy data stored in each service engine in the same copy group is the same, and the abnormal timestamp represents the time at which the exception occurred in the service engine.
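
The five steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the `ServiceEngine` class, the state label `TO_BE_SYNCED`, and the layout of `original_db` (record key mapped to a `(write_time, value)` pair) are all hypothetical, and the sketch only adds missing records without handling deletions.

```python
from dataclasses import dataclass, field

TO_BE_SYNCED = "to_be_synchronized"  # hypothetical label for the working state


@dataclass
class ServiceEngine:
    name: str
    state: str = "valid"
    abnormal_ts: float = 0.0                      # time at which the exception occurred
    replica: dict = field(default_factory=dict)   # record key -> value


def select_engines_to_sync(copy_groups):
    """Steps 1-2: read every engine's working state, keep those awaiting sync."""
    return [e for group in copy_groups for e in group if e.state == TO_BE_SYNCED]


def query_original(original_db, since):
    """Step 4: original records written at or after `since`.

    `original_db` maps key -> (write_time, value); this layout is an assumption.
    """
    return {k: v for k, (ts, v) in original_db.items() if ts >= since}


def synchronize(copy_groups, original_db):
    """Run steps 1-5 in order; return the names of the engines that were synced."""
    synced = []
    for engine in select_engines_to_sync(copy_groups):
        pending = query_original(original_db, engine.abnormal_ts)  # steps 3-4
        engine.replica.update(pending)                             # step 5
        synced.append(engine.name)
    return synced
```

For example, an engine whose abnormal timestamp is 10.0 receives only the original records written at or after that time, while engines in the valid state are left untouched.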

In one technical solution of the foregoing data synchronization method, the step of "querying the original data to be synchronized of each service engine to be synchronized from a preset original database according to the abnormal timestamp" specifically includes:

judging whether the abnormal timestamp is zero time or not;

if so, acquiring original data stored in the preset original database in a query time interval formed from zero time to current time, and taking the original data as the original data to be synchronized;

if not, acquiring a query start time t according to the abnormal timestamp, acquiring the original data stored in the preset original database within the query time period from the query start time t to the current time, and taking that original data as the original data to be synchronized;

wherein the query start time t = t₁ - Δt₁, t₁ represents the abnormal timestamp, Δt₁ represents a preset first time variable, and Δt₁ ≥ 0.
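
The choice of query period can be illustrated with a short Python sketch. The function names and the use of plain floating-point seconds for times are assumptions made for illustration; the text does not say what happens when Δt₁ exceeds t₁, so the sketch does not clamp the result.

```python
def query_start_time(abnormal_ts: float, delta_t1: float = 0.0) -> float:
    """Return the start of the query period for one engine.

    A zero abnormal timestamp means "query from zero time";
    otherwise the start is t = t1 - delta_t1, with delta_t1 >= 0.
    """
    if delta_t1 < 0:
        raise ValueError("delta_t1 must be >= 0")
    if abnormal_ts == 0:
        return 0.0
    return abnormal_ts - delta_t1


def query_interval(abnormal_ts: float, now: float, delta_t1: float = 0.0):
    """Query period from the start time to the current time `now`."""
    return (query_start_time(abnormal_ts, delta_t1), now)
```

With t₁ = 60, Δt₁ = 5 and a current time of 100, the query period is (55, 100); with a zero abnormal timestamp it is (0, 100) regardless of Δt₁.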

In one technical solution of the data synchronization method, each service engine includes one or more replica databases, and replica data stored in each replica database in the same service engine is different;

the step of "performing data synchronization on each service engine to be synchronized according to the original data to be synchronized" specifically includes:

acquiring the replica data of each replica database in each service engine to be synchronized;

performing synchronous data comparison according to the original data to be synchronized and the replica data of each replica database;

acquiring a duplicate database to be synchronized and corresponding data to be synchronized in each service engine to be synchronized according to a comparison result of the synchronization data, wherein the data to be synchronized comprises data to be added to the duplicate database and/or data to be deleted from the duplicate database;

and performing data addition and/or data deletion operation on each replica database to be synchronized according to the data to be synchronized so as to complete data synchronization of the service engine to be synchronized.
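
The comparison and the resulting add/delete operations can be sketched as below. This is a hedged illustration: the dictionary representation of a replica database, the function names, and the assumption that the original data and the replica data being compared cover the same set of records are not taken from the text, which does not specify how deletions are detected.

```python
def diff_replica(original: dict, replica: dict):
    """Compare original data with one replica database.

    Returns (to_add, to_delete): records missing or different in the replica,
    and keys present in the replica but absent from the original data.
    """
    to_add = {k: v for k, v in original.items()
              if k not in replica or replica[k] != v}
    to_delete = sorted(k for k in replica if k not in original)
    return to_add, to_delete


def plan_sync(original_by_db: dict, replica_dbs: dict) -> dict:
    """Find the replica databases that need synchronization and their changes."""
    plan = {}
    for db_name, replica in replica_dbs.items():
        to_add, to_delete = diff_replica(original_by_db.get(db_name, {}), replica)
        if to_add or to_delete:
            plan[db_name] = (to_add, to_delete)
    return plan


def apply_sync(plan: dict, replica_dbs: dict) -> None:
    """Perform the deletions and additions on each database in the plan."""
    for db_name, (to_add, to_delete) in plan.items():
        for key in to_delete:
            del replica_dbs[db_name][key]
        replica_dbs[db_name].update(to_add)
```

Only replica databases with a non-empty difference appear in the plan, which matches the idea of first identifying the replica databases to be synchronized and then operating on those alone.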

In an embodiment of the foregoing data synchronization method, after the step of "completing data synchronization of the service engine to be synchronized", the method further includes:

acquiring the data synchronization completion time t₂ of each service engine to be synchronized;

acquiring the original data stored in the preset original database within the query time period [t₂ - Δt₂, t₂ + Δt₂] and taking it as the original data to be verified, wherein Δt₂ represents a preset second time variable and Δt₂ > 0;

and performing data verification on each service engine that has completed data synchronization according to the original data to be verified.

In a technical solution of the above data synchronization method, "performing data verification on each service engine that completes data synchronization according to the original data to be verified" specifically includes:

comparing the verification data according to the original data to be verified and the duplicate data of each duplicate database completing data synchronization in the service engine;

acquiring a duplicate database to be adjusted and corresponding data to be adjusted according to a comparison result of the check data, wherein the data to be adjusted comprises data to be added to the duplicate database and/or data to be deleted from the duplicate database;

and performing data addition and/or data deletion operation on each copy database to be adjusted according to the data to be adjusted so as to complete data verification.
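
A minimal Python sketch of the verification pass follows. The names, the `(write_time, value)` layout of `original_db`, and the restriction to additions are assumptions for illustration; deletions would be handled analogously. One plausible reading, not stated in the text, is that the window around t₂ catches records written to the original database while synchronization was finishing.

```python
def verification_window(t2: float, delta_t2: float):
    """Query period [t2 - delta_t2, t2 + delta_t2], with delta_t2 > 0."""
    if delta_t2 <= 0:
        raise ValueError("delta_t2 must be > 0")
    return (t2 - delta_t2, t2 + delta_t2)


def records_in_window(original_db: dict, window) -> dict:
    """Original records whose write time falls inside the closed window."""
    start, end = window
    return {k: v for k, (ts, v) in original_db.items() if start <= ts <= end}


def verify(original_db: dict, replica: dict, t2: float, delta_t2: float):
    """Compare the replica against the original data to be verified and repair it.

    Returns the sorted keys that had to be added or corrected.
    """
    expected = records_in_window(original_db, verification_window(t2, delta_t2))
    to_adjust = {k: v for k, v in expected.items()
                 if k not in replica or replica[k] != v}
    replica.update(to_adjust)
    return sorted(to_adjust)
```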

In an aspect of the foregoing data synchronization method, the method further includes:

storing the real-time working state of each service engine and the database state of each replica database by adopting a state data table, wherein the database states comprise a data deleting state and a data adding state;

and/or the method further comprises:

judging whether a call fault or a restart of the service engine is detected; if so, adjusting the working state of the service engine from the valid state to the to-be-synchronized state; if not, keeping the working state of the service engine as the valid state;

after the data synchronization is carried out on the service engine to be synchronized, the working state of the service engine to be synchronized is adjusted from a state to be synchronized to a temporary storage state;

after data verification is carried out on the service engine which completes data synchronization, the working state of the service engine is adjusted to be a valid state from the temporary storage state.
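
The three working-state adjustments form a small state machine, sketched below in Python. The enum member names and the event strings are hypothetical. The text does not say what happens when a fault is detected while an engine is already in the to-be-synchronized or temporary storage state, so the sketch simply leaves the state unchanged for any transition it does not list.

```python
from enum import Enum


class EngineState(Enum):
    VALID = "valid"
    TO_BE_SYNCED = "to_be_synchronized"
    TEMPORARY = "temporary_storage"


# (current state, event) -> next state; event names are illustrative only.
_TRANSITIONS = {
    (EngineState.VALID, "fault_or_restart"): EngineState.TO_BE_SYNCED,
    (EngineState.TO_BE_SYNCED, "sync_done"): EngineState.TEMPORARY,
    (EngineState.TEMPORARY, "verified"): EngineState.VALID,
}


def next_state(state: EngineState, event: str) -> EngineState:
    """Return the state after `event`; unlisted transitions keep the state."""
    return _TRANSITIONS.get((state, event), state)
```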

responding to received service engine calling fault information, acquiring fault time when a service engine in the service engine calling fault information has calling fault, and setting an abnormal timestamp for the corresponding service engine according to the fault time;

and/or the method further comprises:

regularly detecting whether each service engine is restarted or not;

when a certain service engine is detected to be restarted, acquiring the media type of a storage medium for storing copy data in the service engine;

if the media type is a temporary storage media, setting an abnormal time stamp of the service engine according to zero time; and if the media type is a non-temporary storage media, acquiring the restarting time of the service engine when the service engine is restarted, and setting an abnormal timestamp of the service engine according to the restarting time.
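
The two ways of setting the abnormal timestamp can be sketched as follows. The function names and the media type strings `"temporary"` and `"non_temporary"` are assumptions. Setting the timestamp to zero time for temporary storage is consistent with the full query from zero time described earlier, presumably because such storage loses its copy data on restart.

```python
ZERO_TIME = 0.0


def timestamp_for_call_fault(fault_time: float) -> float:
    """Call fault: the abnormal timestamp is set from the fault time."""
    return fault_time


def timestamp_for_restart(media_type: str, restart_time: float) -> float:
    """Restart: the abnormal timestamp depends on the storage media type."""
    if media_type == "temporary":
        return ZERO_TIME        # triggers a query over the whole history
    if media_type == "non_temporary":
        return restart_time     # only data since the restart is queried
    raise ValueError(f"unknown media type: {media_type!r}")
```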

In a second aspect, a data synchronization apparatus for a distributed system is provided, the apparatus comprising:

a service engine state acquisition module configured to acquire an operating state of each service engine in each replica group within the distributed system;

the service engine to be synchronized acquisition module is configured to acquire service engines to be synchronized according to the working state of each service engine;

an exception timestamp acquiring module configured to acquire an exception timestamp of each of the service engines to be synchronized;

a to-be-synchronized raw data acquisition module configured to query the to-be-synchronized raw data of each to-be-synchronized service engine from a preset raw database according to the abnormal timestamp;

the data synchronization module is configured to perform data synchronization on each service engine to be synchronized according to the original data to be synchronized;

In an aspect of the above data synchronization apparatus, the to-be-synchronized original data obtaining module is further configured to perform the following operations:

judging whether the abnormal timestamp is zero time or not;

In one embodiment of the data synchronization apparatus, each service engine includes one or more replica databases, and replica data stored in each replica database in the same service engine is different;

the data synchronization module comprises a copy data acquisition submodule, a synchronization data acquisition submodule, a to-be-synchronized data acquisition submodule and a data synchronization submodule;

the duplicate data acquisition sub-module is configured to acquire duplicate data of each duplicate database in each service engine to be synchronized;

the synchronous data acquisition sub-module is configured to perform synchronous data comparison according to the original data to be synchronized and the duplicate data of each duplicate database;

the to-be-synchronized data acquisition submodule is configured to acquire a duplicate database to be synchronized and corresponding data to be synchronized in each to-be-synchronized service engine according to a comparison result of the synchronized data, wherein the data to be synchronized comprises data to be added to the duplicate database and/or data to be deleted from the duplicate database;

the data synchronization submodule is configured to perform data addition and/or data deletion operations on each replica database to be synchronized according to the data to be synchronized, so as to complete data synchronization of the service engine to be synchronized.

In one technical solution of the above data synchronization device, the device further includes a data verification module, where the data verification module includes a synchronization completion time acquisition sub-module, an original data acquisition sub-module to be verified, and a data verification sub-module;

the synchronization completion time acquisition submodule is configured to acquire the data synchronization completion time t₂ of each service engine to be synchronized;

the original data to be verified acquisition submodule is configured to acquire the original data stored in the preset original database within the query time period [t₂ - Δt₂, t₂ + Δt₂] and take it as the original data to be verified, wherein Δt₂ represents a preset second time variable and Δt₂ > 0;

the data verification submodule is configured to perform data verification on each service engine that has completed data synchronization according to the original data to be verified.

In one technical solution of the above data synchronization device, the data verification sub-module includes a verification data comparison unit, a to-be-verified data acquisition unit, and a data verification unit;

the verification data comparison unit is configured to perform verification data comparison according to the original data to be verified and the duplicate data of each duplicate database completing data synchronization in the service engine;

the to-be-verified data acquisition unit is configured to acquire a to-be-adjusted duplicate database and corresponding to-be-adjusted data according to a verification data comparison result, wherein the to-be-adjusted data comprises data to be added to the duplicate database and/or data to be deleted from the duplicate database;

the data checking unit is configured to perform data adding and/or data deleting operation on each copy database to be adjusted according to the data to be adjusted so as to complete data checking.

In one technical solution of the above data synchronization apparatus, the apparatus further includes a status data table and/or a service engine status adjustment module, where the service engine status adjustment module includes a first status adjustment submodule, a second status adjustment submodule, and a third status adjustment submodule;

the state data table is configured to store a real-time working state of each service engine and a database state of each replica database, wherein the database states comprise a data deletion state and a data addition state;

the first state adjustment submodule is configured to judge whether a call fault or a restart of a service engine is detected; if so, to adjust the working state of the service engine from the valid state to the to-be-synchronized state; if not, to keep the working state of the service engine as the valid state;

the second state adjusting submodule is configured to adjust the working state of the service engine to be synchronized from a state to be synchronized to a temporary storage state after the data synchronization of the service engine to be synchronized is carried out;

the third state adjusting submodule is configured to adjust the working state of the service engine from the temporary storage state to the valid state after the data verification is performed on the service engine which completes the data synchronization.

In one technical solution of the above data synchronization apparatus, the apparatus further includes an abnormal timestamp setting module, where the abnormal timestamp setting module includes a first timestamp setting submodule and/or a second timestamp setting submodule;

the first timestamp setting submodule is configured to respond to received service engine calling fault information, obtain a fault moment when a service engine in the service engine calling fault information has a calling fault, and set an abnormal timestamp for the corresponding service engine according to the fault moment;

the second timestamp setting sub-module is configured to:

regularly detecting whether each service engine is restarted or not;

In a third aspect, a control device is provided, which comprises a processor and a storage device, wherein the storage device is adapted to store a plurality of program codes, and the program codes are adapted to be loaded and run by the processor to execute the data synchronization method of the distributed system according to any one of the above-mentioned technical solutions of the data synchronization method of the distributed system.

In a fourth aspect, a computer-readable storage medium is provided, in which a plurality of program codes are stored, the program codes being adapted to be loaded and executed by a processor to perform the data synchronization method of the distributed system according to any one of the above-mentioned technical aspects of the data synchronization method of the distributed system.

One or more technical schemes of the invention at least have one or more of the following beneficial effects:

in the technical scheme of the invention, an abnormal time stamp (representing the time when the service engine is abnormal) can be set for the service engine when the service engine is abnormal, the original data of the service engine in the period from the abnormal occurrence to the normal recovery can be locked according to the abnormal time stamp, and whether the data synchronization of the service engine is needed or not can be analyzed according to the comparison result of the original data and the copy data (backup data) stored in the service engine. If the copy data is consistent with the original data, data synchronization is not needed; if the duplicate data is inconsistent with the original data, data synchronization is required. Specifically, the working state of each service engine in each replica group in the distributed system may be obtained first, the service engines to be synchronized are obtained according to the working state of each service engine, the original data to be synchronized of each service engine to be synchronized is queried from a preset original database according to the abnormal timestamp of each service engine to be synchronized, and finally, data synchronization is performed on each service engine to be synchronized according to the original data to be synchronized. By the method, the service engines to be synchronized, which need to perform data synchronization, can be quickly positioned according to the working state of the service engines and the abnormal timestamp, the original data to be synchronized is determined, accurate data synchronization can be performed on each service engine to be synchronized according to the determined original data to be synchronized, and the defects that in the prior art, the efficiency is low and the data is easily lost due to the fact that data synchronization is performed in a manual adjustment mode are overcome. 
In addition, because each service engine in the distributed system is independent, even if a large number of service engines to be synchronized need to be subjected to data synchronization, each service engine to be synchronized can be subjected to data synchronization at the same time, and thus the data synchronization efficiency of the distributed system is further improved.

Drawings

Embodiments of the invention are described below with reference to the accompanying drawings, in which:

FIG. 1 is a flow diagram illustrating the main steps of a data synchronization method for a distributed system according to one embodiment of the present invention;

FIG. 2 is a flow diagram illustrating the main steps of data synchronization for each service engine to be synchronized according to one embodiment of the present invention;

FIG. 3 is a flow diagram illustrating the main steps of data verification for a service engine that has completed data synchronization according to one embodiment of the present invention;

FIG. 4 is a block diagram of the main structure of a data synchronization apparatus of a distributed system according to one embodiment of the present invention;

FIG. 5 is a schematic diagram of an application scenario of the present invention.

List of reference numerals:

11: a service engine state acquisition module; 12: a service engine to be synchronized acquisition module; 13: an abnormal timestamp acquisition module; 14: a module for acquiring original data to be synchronized; 15: a data synchronization module.

Detailed Description

Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory; may comprise software components such as program code; or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.

Some terms to which the present invention relates are explained first.

An engine refers to a computer program or a supporting part of a system, and a service engine is a service-oriented engine that provides a set of internal components for a service system to support its services. For example, the service engine may be the service engine of a face recognition system, which provides a 1:N face recognition function (identifying, from among N face image samples, the sample that matches the current face image).

A distributed system refers to a distributed service system made up of multiple service engines. In an embodiment of the present invention, a distributed system may include a pre-provisioned original database and a plurality of replica groups, each replica group may include a plurality of service engines, and each service engine may include one or more replica databases. The original database stores all original data to be used by the distributed system, and the duplicate database stores duplicate data (backup data) of the original data. Further, in the embodiment of the present invention, the copy data stored in each copy group is different, and the copy data stored in each service engine in the same copy group is the same, that is, the number of service engines in the copy group determines the copy number (backup number) of the original data, and if the number of service engines in the copy group is 2, each original data has two copy data, and the two copy data are respectively stored in different service engines in the same copy group. When the data volume is large, the duplicate data to be stored by each service engine can be divided into multiple copies (each copy is different), and each copy is stored in a different duplicate database of the service engine. An example is as follows: the copy group a comprises a service engine a1 and a service engine a2, the service engine a1 comprises a copy database a11 and a copy database a12, the service engine a2 comprises a copy database a21 and a copy database a22, the backup data to be stored in the copy group a are face images of all teachers and students in school α and school β in a certain area, two identical teacher and student face images of school α can be stored in the copy database a11 and the copy database a21 respectively, and two identical teacher and student face images of school β can be stored in the copy database a12 and the copy database a22 respectively.
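
The copy group A example can be written down as a small Python data structure to make the two invariants explicit: engines in the same copy group hold the same data, and replica databases within one engine hold different data. The dictionary layout and the content labels are hypothetical stand-ins for the actual face images.

```python
# Copy group A: engine name -> {replica database name -> stored content}
copy_group_a = {
    "a1": {"a11": "school_alpha_faces", "a12": "school_beta_faces"},
    "a2": {"a21": "school_alpha_faces", "a22": "school_beta_faces"},
}


def copy_count(group: dict) -> int:
    """The number of engines in a copy group equals the number of copies."""
    return len(group)


def engines_hold_same_data(group: dict) -> bool:
    """Every engine in the group stores the same overall copy data."""
    contents = [sorted(dbs.values()) for dbs in group.values()]
    return all(c == contents[0] for c in contents)


def databases_hold_different_data(engine: dict) -> bool:
    """Replica databases inside one engine store different parts of the data."""
    return len(set(engine.values())) == len(engine)
```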

It should be noted that the service engines in the same copy group may be disposed on the same server, or may be disposed on different servers. Similarly, the service engines in different copy sets may be located on different servers, or may be located on the same server.

The same original data of the distributed system usually has a plurality of backup data, and different backup data may be stored in different nodes, and if a certain node fails, the stored backup data is inconsistent with the original data, which will greatly affect the operational reliability and stability of the distributed system. At present, the conventional data synchronization method of a distributed system is mainly to perform data migration on nodes in a manual adjustment mode after the nodes are recovered from a fault, so that backup data on different nodes are kept consistent. However, when data migration needs to be performed on a large amount of data, not only a long time is required to be consumed to cause that the distributed system cannot normally operate, but also data loss is easily caused, and thus the operation reliability and stability of the distributed system are reduced.

In the embodiment of the invention, an abnormal time stamp (indicating the time when the service engine is abnormal) can be set for the service engine when the service engine is abnormal, the original data of the service engine in the period from the abnormal occurrence to the normal recovery can be locked according to the abnormal time stamp, and whether the data synchronization of the service engine is needed or not can be analyzed according to the comparison result of the original data and the copy data (backup data) stored in the service engine. If the copy data is consistent with the original data, data synchronization is not needed; if the duplicate data is inconsistent with the original data, data synchronization is required. Specifically, the working state of each service engine in each replica group in the distributed system may be obtained first, the service engines to be synchronized are obtained according to the working state of each service engine, the original data to be synchronized of each service engine to be synchronized is queried from a preset original database according to the abnormal timestamp of each service engine to be synchronized, and finally, data synchronization is performed on each service engine to be synchronized according to the original data to be synchronized. The preset original database can be a database arranged on a background server in the distributed system, and the background server can be in communication connection with each service engine. 
By the method, the service engines to be synchronized, which need to perform data synchronization, can be quickly positioned according to the working state of the service engines and the abnormal timestamp, the original data to be synchronized is determined, accurate data synchronization can be performed on each service engine to be synchronized according to the determined original data to be synchronized, and the defects that in the prior art, the efficiency is low and the data is easily lost due to the fact that data synchronization is performed in a manual adjustment mode are overcome. In addition, because each service engine in the distributed system is independent, even if a large number of service engines to be synchronized need to be subjected to data synchronization, each service engine to be synchronized can be subjected to data synchronization at the same time, and thus the data synchronization efficiency of the distributed system is further improved.
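
Because the service engines are independent of one another, their synchronization can run concurrently. The sketch below uses a thread pool from the Python standard library as one possible way to do this; the function names, the dictionary representation of a replica, and the choice of threads rather than processes or separate hosts are assumptions, not details from the text.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_engine(engine_name: str, replica: dict, pending: dict):
    """Synchronize one engine; returns its name and resulting record count."""
    replica.update(pending)
    return engine_name, len(replica)


def sync_all(engines: dict, pending_by_engine: dict, max_workers: int = 4) -> dict:
    """Synchronize every engine concurrently; each task touches only its own replica."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(sync_engine, name, replica, pending_by_engine.get(name, {}))
            for name, replica in engines.items()
        ]
        return dict(f.result() for f in futures)
```

No locking is needed in this sketch because no two tasks share a replica, which mirrors the independence of the service engines.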

Referring to FIG. 5, in an example application scenario of the present invention, the distributed system performs face recognition on teacher and student images of school A and school B. The system may include a middleware server and two copy groups (copy group 1 and copy group 2 shown in FIG. 5), where copy group 1 includes face recognition service engine 11 and face recognition service engine 12, and copy group 2 includes face recognition service engine 21 and face recognition service engine 22. The middleware server is provided with an original database that stores the teacher and student images (original data) of schools A and B. The original data in the original database is divided into two parts by school: the teacher and student images of school A are stored in copy group 1, and those of school B are stored in copy group 2. Accordingly, two identical copies of the school A images are stored in face recognition service engine 11 and face recognition service engine 12 of copy group 1, respectively, and two identical copies of the school B images are stored in face recognition service engine 21 and face recognition service engine 22 of copy group 2, respectively. In practical applications, if it is detected that face recognition service engine 11 has restarted after a power failure, an abnormal timestamp is set for face recognition service engine 11 according to the restart time, and its working state is adjusted to the to-be-synchronized state. Face recognition service engine 11 is then identified as a service engine to be synchronized according to its working state, the original data to be synchronized is acquired from the original database of the middleware server according to its abnormal timestamp, and data synchronization is performed on face recognition service engine 11 according to that data. After the data synchronization is completed and the data passes verification, the working state of face recognition service engine 11 is adjusted to the valid state, so that it can provide the face recognition service normally.

Referring to FIG. 1, which is a flow chart illustrating the main steps of a data synchronization method of a distributed system according to an embodiment of the present invention, the data synchronization method in this embodiment mainly includes the following steps:

Step S101: acquiring the working state of each service engine in each copy group in the distributed system. The copy data stored in each copy group is different, and the copy data stored in each service engine in the same copy group is the same.

The working state of the service engine can comprise a valid state, a to-be-synchronized state, a temporary storage state and the like. The valid state indicates that the service engine can normally provide the relevant service; the state to be synchronized represents that data synchronization needs to be carried out on the service engine, and the service engine cannot normally provide related services; the temporary storage state indicates that the data synchronization of the service engine is completed, and the relevant service can be normally provided if the data is verified to be correct.

In order to acquire the working state of each service engine accurately and quickly, in one implementation of this embodiment, a state data table may be used to store the real-time working state of each service engine, and the service engines to be synchronized can then be obtained quickly by querying the state data table. Further, in this embodiment, the working state of a service engine may be adjusted in real time according to how the service engine is running. Specifically, the working state may be adjusted according to the following steps: judge whether a call failure or a restart of the service engine is detected; if so, adjust the working state of the service engine from the valid state to the to-be-synchronized state; if not, keep the working state of the service engine as the valid state. After data synchronization has been performed on a service engine to be synchronized, its working state is adjusted from the to-be-synchronized state to the temporary storage state. After the data of a service engine that has completed data synchronization has been verified, its working state is adjusted from the temporary storage state to the valid state. Further, in one embodiment, in addition to the real-time working state of each service engine, the state data table may also store the database state of each replica database, so that the data operation being performed by each replica database can be retrieved accurately. Database states include, but are not limited to, a data deletion state and a data addition state.
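The state transitions described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the in-memory dictionary `state_table`, the engine identifiers, and all function names are hypothetical stand-ins and are not part of the described embodiment, which only requires that a state data table record the working state of each service engine.

```python
from enum import Enum


class EngineState(Enum):
    VALID = "valid"
    TO_BE_SYNCED = "to_be_synchronized"
    STAGED = "temporary_storage"


# Hypothetical in-memory stand-in for the state data table:
# engine id -> current working state.
state_table = {"engine_11": EngineState.VALID, "engine_12": EngineState.VALID}


def on_fault_or_restart(engine_id):
    """Call failure or restart detected: mark the engine to-be-synchronized."""
    state_table[engine_id] = EngineState.TO_BE_SYNCED


def on_sync_done(engine_id):
    """Synchronization finished: to-be-synchronized -> temporary storage."""
    if state_table[engine_id] is EngineState.TO_BE_SYNCED:
        state_table[engine_id] = EngineState.STAGED


def on_verified(engine_id):
    """Verification finished: temporary storage -> valid."""
    if state_table[engine_id] is EngineState.STAGED:
        state_table[engine_id] = EngineState.VALID


def engines_to_sync():
    """Query the table for every engine awaiting synchronization."""
    return sorted(
        engine_id
        for engine_id, state in state_table.items()
        if state is EngineState.TO_BE_SYNCED
    )
```

In this sketch an engine can only reach the valid state again by passing through the temporary storage state, which mirrors the order of adjustment given in the text.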

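Step S102: acquire the service engines to be synchronized according to the working state of each service engine. A service engine whose working state is the to-be-synchronized state is a service engine to be synchronized.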

The exception timestamp indicates the time at which the service engine exception occurred.

In one implementation of the embodiment of the present invention, a different exception timestamp setting method may be used for each exception type in order to set an exception timestamp for each service engine in which an exception occurs. The exception types include, but are not limited to, service engine call failures and service engine restarts. A service engine call failure refers to a failure to call the service engine caused by problems such as being unable to communicate with the service engine normally. A service engine restart refers either to a restart caused by problems such as a power failure, or to a restart that is deliberately triggered when data has been added to the original database and data synchronization therefore needs to be performed on the corresponding service engine. The exception timestamp setting methods for these two exception types, service engine call failure and service engine restart, are described below.

First, service engine call failure

In this embodiment, in response to received service engine call failure information, the failure time at which the call failure occurred is obtained from that information, and an exception timestamp is set for the corresponding service engine according to the failure time; that is, the exception timestamp may be set to the failure time. An example is as follows: if a call failure of service engine 1 is detected at 13:00, an exception timestamp of "13:00" is set for service engine 1.

Second, service engine restart

In this embodiment, whether each service engine has restarted may be detected periodically. When a service engine is detected to have restarted, the medium type of the storage medium that stores the replica data in that service engine is acquired. If the medium type is a temporary storage medium, the exception timestamp of the service engine is set according to zero time. If the medium type is a non-temporary storage medium, the restart time of the service engine is acquired, and the exception timestamp of the service engine is set according to the restart time.

A temporary storage medium is a storage medium whose stored data is completely cleared when the service engine restarts. A non-temporary storage medium is a storage medium that is not affected by a restart of the service engine: its stored data is not erased whether or not the service engine restarts.

An example is as follows: if service engine 1 is detected to have restarted at 13:00 and its storage medium is a temporary storage medium, an exception timestamp of "00:00" is set. If service engine 1 is detected to have restarted at 13:00 and its storage medium is a non-temporary storage medium, an exception timestamp of "13:00" is set.
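The two setting methods can be summarized in a short Python sketch. It assumes, purely for illustration, that zero time is represented by `datetime.min` and that the medium type is passed in as a boolean; the names `ZERO_TIME`, `timestamp_for_call_failure`, and `timestamp_for_restart` are hypothetical and do not come from the embodiment.

```python
from datetime import datetime

# Assumed representation of "zero time"; the embodiment does not fix one.
ZERO_TIME = datetime.min


def timestamp_for_call_failure(failure_time):
    """Call failure: the exception timestamp is the failure time itself."""
    return failure_time


def timestamp_for_restart(restart_time, medium_is_temporary):
    """Restart: zero time if the replica data was cleared, else the restart time."""
    return ZERO_TIME if medium_is_temporary else restart_time
```

Returning zero time for a temporary storage medium signals to the later query step that all original data must be fetched, because nothing survives the restart.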

Step S103: acquire the exception timestamp of each service engine to be synchronized.

Step S104: query the original data to be synchronized of each service engine to be synchronized from a preset original database according to the exception timestamp.

If the exception timestamp is zero time, the replica data in the service engine may have been completely cleared, so all of the original data in the original database needs to be retrieved for data synchronization. If the exception timestamp is not zero time, only part of the data in the service engine is inconsistent with the data in the original database, and it suffices to acquire the original data stored during the period from the occurrence of the exception to the recovery from the exception. Specifically, in one implementation of this embodiment, the original data to be synchronized may be obtained according to the following steps 11 to 13:

Step 11: judge whether the exception timestamp is zero time; if yes, go to step 12; if not, go to step 13.

Step 12: acquire the original data stored in the preset original database within the query time period from zero time to the current time, and take it as the original data to be synchronized; that is, all of the original data in the original database is taken as the original data to be synchronized.

Step 13: acquire a query start time t according to the exception timestamp, acquire the original data stored in the preset original database within the query time period from the query start time t to the current time, and take it as the original data to be synchronized. Here the query start time is t = t₁ - Δt₁, where t₁ denotes the exception timestamp and Δt₁ denotes a preset first time variable with Δt₁ ≥ 0. By setting Δt₁, either the original data in the period from the exception timestamp to the current time (Δt₁ = 0) or the original data in the period from a certain time before the exception timestamp to the current time (Δt₁ > 0) can be obtained. Through steps 11 to 13, the original data to be used for data synchronization can be acquired quickly and accurately according to the exception timestamp.
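Steps 11 to 13 amount to choosing a query time period and filtering the original database by storage time. The following Python sketch shows one way to express this, assuming that zero time is represented by `datetime.min` and that each original record is a `(stored_at, item)` pair; the record layout and the names `query_window` and `select_raw_data` are illustrative assumptions only.

```python
from datetime import datetime, timedelta

# Assumed representation of "zero time"; the embodiment does not fix one.
ZERO_TIME = datetime.min


def query_window(exception_ts, now, delta_t1=timedelta(0)):
    """Return the (start, end) query time period for one service engine."""
    if delta_t1 < timedelta(0):
        raise ValueError("delta_t1 must be greater than or equal to 0")
    if exception_ts == ZERO_TIME:
        # Step 12: zero time means all original data is needed.
        return (ZERO_TIME, now)
    # Step 13: t = t1 - delta_t1.
    return (exception_ts - delta_t1, now)


def select_raw_data(records, window):
    """Keep the items whose storage time falls inside the query time period."""
    start, end = window
    return [item for stored_at, item in records if start <= stored_at <= end]
```

A positive `delta_t1` widens the window backwards, which gives a margin for records stored shortly before the exception was detected.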

Step S105: perform data synchronization on each service engine to be synchronized according to the original data to be synchronized.

In this embodiment, the original data to be synchronized may be compared with the replica data stored in the service engine, and the synchronization operation that needs to be performed (data addition and/or data deletion) is determined according to the comparison result, so that the replica data in the service engine is kept consistent with the corresponding original data and data synchronization is completed.

As can be seen from the foregoing explanation of the terms of the distributed system, in the embodiment of the present invention each service engine may include one or more replica databases, and the replica data stored in each replica database in the same service engine is different. Therefore, when data synchronization is performed on a service engine, it is first determined whether each replica database needs data synchronization, the replica databases to be synchronized are screened out, and data synchronization is performed only on those replica databases, which further improves synchronization efficiency. Specifically, referring to fig. 2, in one implementation of this embodiment, data synchronization may be performed on the service engine to be synchronized according to steps S201 to S204 shown in fig. 2:

Step S201: acquire the replica data of each replica database in each service engine to be synchronized.

Step S202: perform a synchronization data comparison between the original data to be synchronized and the replica data of each replica database.

Step S203: acquire, according to the result of the synchronization data comparison, the replica databases to be synchronized in each service engine to be synchronized and the corresponding data to be synchronized.

If the replica data in a replica database is the same as the original data to be synchronized, data synchronization does not need to be performed on that replica database. If the replica data in a replica database differs from the original data to be synchronized, data synchronization needs to be performed on that replica database, and it is a replica database to be synchronized. The result of the synchronization data comparison also determines exactly which data the data to be synchronized includes; in this embodiment, the data to be synchronized may include data to be added to the replica database and/or data to be deleted from the replica database.

An example is as follows: if the original data to be synchronized includes data [d1, d2, d3, d5, d6] and the replica data stored in a replica database is [d1, d2, d3, d4], then it may be determined that the data to be synchronized of the replica database includes: adding data d5 and d6, and deleting data d4.

Step S204: perform data addition and/or data deletion operations on each replica database to be synchronized according to the data to be synchronized, so as to complete data synchronization of the service engine to be synchronized.
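The comparison in steps S202 and S203 and the update in step S204 can be sketched in Python as a set difference followed by an add/delete pass. The sketch assumes that data items are hashable identifiers such as "d1"; the function names `diff_replica` and `apply_sync` are hypothetical and chosen only for this illustration.

```python
def diff_replica(raw_to_sync, replica):
    """Compare original data with replica data (steps S202 and S203).

    Returns (to_add, to_delete): items missing from the replica and
    items present in the replica but absent from the original data.
    """
    raw_set, replica_set = set(raw_to_sync), set(replica)
    to_add = sorted(raw_set - replica_set)
    to_delete = sorted(replica_set - raw_set)
    return to_add, to_delete


def apply_sync(replica, to_add, to_delete):
    """Apply the deletions and additions to a replica (step S204)."""
    doomed = set(to_delete)
    kept = [item for item in replica if item not in doomed]
    return kept + list(to_add)
```

Applied to the example above, `diff_replica(["d1", "d2", "d3", "d5", "d6"], ["d1", "d2", "d3", "d4"])` yields d5 and d6 to add and d4 to delete. A replica database for which both lists are empty does not need synchronization and can be skipped.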

When the method described in steps S201 to S204 is applied to each service engine to be synchronized, data synchronization can be performed on a large number of service engines to be synchronized at the same time, which significantly improves the data synchronization efficiency of the distributed system.
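Because each service engine holds its own replica databases, the per-engine work is independent and can be dispatched concurrently. The Python sketch below illustrates this with a thread pool from the standard library. The data layout (a dictionary of replica databases per engine, and original data keyed by replica database name) and the names `sync_engine` and `sync_all` are assumptions made for the example; the embodiment does not prescribe a particular concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_engine(engine_id, raw_by_db, replicas):
    """Synchronize one engine; return its id and the databases that changed."""
    changed = []
    for db_name, wanted in raw_by_db.items():
        # Only replica databases that differ from the original data are touched.
        if set(replicas.get(db_name, [])) != set(wanted):
            replicas[db_name] = list(wanted)
            changed.append(db_name)
    return engine_id, changed


def sync_all(raw_by_db, engines, max_workers=4):
    """Run sync_engine for every engine at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(sync_engine, engine_id, raw_by_db, replicas)
            for engine_id, replicas in engines.items()
        ]
        return dict(future.result() for future in futures)
```

Each worker writes only to the replica dictionary of its own engine and reads the original data without modifying it, so no locking is needed in this sketch.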

To guard against part of the data failing to synchronize because a data deletion or data addition operation went wrong, after data synchronization of the service engine to be synchronized is completed, the synchronized data can be verified to check whether it is consistent with the original data, that is, to verify the accuracy of the data synchronization. Specifically, referring to fig. 3, in one implementation of this embodiment, after data synchronization of the service engine to be synchronized is completed (step S105), data verification may be performed according to steps S301 to S303 shown in fig. 3.

Step S301: acquire the data synchronization completion time t₂ of each service engine to be synchronized. In this embodiment, if data synchronization needs to be performed on multiple replica databases in the service engine to be synchronized, the time at which the last replica database completes data synchronization may be used as the data synchronization completion time t₂ of the service engine to be synchronized.

Step S302: acquire the original data stored in the preset original database within the query time period [t₂ - Δt₂, t₂ + Δt₂], and take it as the original data to be verified, where Δt₂ denotes a preset second time variable and Δt₂ > 0. Note that, in this embodiment, a person skilled in the art can set the value of Δt₂ flexibly, as long as the resulting time period is bounded by one moment before and one moment after the data synchronization completion time t₂.
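The verification time period of steps S301 and S302 can be computed as follows. This Python sketch is illustrative only; the names `completion_time` and `verification_window` are hypothetical.

```python
from datetime import datetime, timedelta


def completion_time(db_completion_times):
    """Step S301: t2 is the time at which the last replica database finished."""
    return max(db_completion_times)


def verification_window(t2, delta_t2):
    """Step S302: return the query time period [t2 - delta_t2, t2 + delta_t2]."""
    if delta_t2 <= timedelta(0):
        raise ValueError("delta_t2 must be greater than 0")
    return (t2 - delta_t2, t2 + delta_t2)
```

Requiring `delta_t2` to be strictly positive ensures that the period extends to both sides of the completion time, so data written to the original database while synchronization was finishing is also covered by the check.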

Step S303: perform data verification on each service engine that has completed data synchronization according to the original data to be verified.

In the present embodiment, data verification can be performed separately for each service engine that completes data synchronization according to the following steps 21 to 23.

Step 21: perform a verification data comparison between the original data to be verified and the replica data of each replica database in the service engine that has completed data synchronization. Here the service engine is a former service engine to be synchronized whose data synchronization is complete, and the replica databases are those on which data synchronization was performed.

Step 22: acquire, according to the result of the verification data comparison, the replica databases to be adjusted and the corresponding data to be adjusted, where the data to be adjusted includes data to be added to the replica database and/or data to be deleted from the replica database.

An example is as follows: if the original data to be verified includes data [d1, d2, d3, d5, d6] and the replica data stored in a replica database that has completed data synchronization is [d1, d2, d3, d4], then it may be determined that the data to be adjusted of the replica database includes: adding data d5 and d6, and deleting data d4.

Step 23: perform data addition and/or data deletion operations on each replica database to be adjusted according to the data to be adjusted, so as to complete data verification.

When the method of steps 21 to 23 is applied to each service engine that has completed data synchronization, data verification can be performed on a large number of service engines at the same time, which improves both the efficiency and the accuracy of data synchronization in the distributed system.

As described in the above method embodiment, the service engines to be synchronized can be located quickly according to the working state of each service engine, the original data to be synchronized can be determined according to the exception timestamp, and each service engine to be synchronized can then be synchronized accurately according to that original data. This overcomes the low efficiency and the risk of data loss that arise in the prior art when data synchronization is performed by manual adjustment. In addition, because the service engines in the distributed system are independent of one another, even when a large number of service engines need data synchronization, all of them can be synchronized at the same time, which further improves the data synchronization efficiency of the distributed system. Further, the data synchronization method described in the method embodiment may also be executed periodically on the distributed system, for example once every 7 days, to ensure consistency between the replica data and the original data in the distributed system.

It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.

Furthermore, the invention also provides a data synchronization device of the distributed system.

Referring to fig. 4, fig. 4 is a main structural block diagram of a data synchronization apparatus of a distributed system according to an embodiment of the present invention. As shown in fig. 4, the data synchronization apparatus of the distributed system in the embodiment of the present invention mainly includes a service engine state obtaining module 11, a to-be-synchronized service engine obtaining module 12, an exception timestamp obtaining module 13, a to-be-synchronized original data obtaining module 14, and a data synchronization module 15. In some embodiments, one or more of the service engine state obtaining module 11, the to-be-synchronized service engine obtaining module 12, the exception timestamp obtaining module 13, the to-be-synchronized original data obtaining module 14, and the data synchronization module 15 may be merged together into one module.

In some embodiments, the service engine state obtaining module 11 may be configured to obtain the working state of each service engine in each copy group in the distributed system; the to-be-synchronized service engine obtaining module 12 may be configured to obtain the service engines to be synchronized according to the working state of each service engine; the exception timestamp obtaining module 13 may be configured to obtain the exception timestamp of each service engine to be synchronized; the to-be-synchronized original data obtaining module 14 may be configured to query the original data to be synchronized of each service engine to be synchronized from a preset original database according to the exception timestamp; and the data synchronization module 15 may be configured to perform data synchronization on each service engine to be synchronized according to the original data to be synchronized. The replica data stored in each copy group is different, the replica data stored in each service engine in the same copy group is the same, and the exception timestamp represents the time at which the service engine exception occurred. In one embodiment, for a description of the specific implementation functions, refer to steps S101 to S105.

In one embodiment, the to-be-synchronized original data obtaining module 14 may be further configured to perform the following operations: judge whether the exception timestamp is zero time; if so, acquire the original data stored in the preset original database within the query time period from zero time to the current time, and take it as the original data to be synchronized; if not, acquire a query start time t according to the exception timestamp, acquire the original data stored in the preset original database within the query time period from the query start time t to the current time, and take it as the original data to be synchronized. Here the query start time is t = t₁ - Δt₁, where t₁ denotes the exception timestamp and Δt₁ denotes a preset first time variable with Δt₁ ≥ 0. In one embodiment, for a description of the specific implementation functions, refer to step S104.

In one embodiment, each service engine may include one or more replica databases, each replica database storing replica data that is different in the same service engine; the data synchronization module 15 may include a duplicate data acquisition sub-module, a synchronization data acquisition sub-module, a to-be-synchronized data acquisition sub-module, and a data synchronization sub-module. The duplicate data acquisition sub-module can be configured to acquire the duplicate data of each duplicate database in each service engine to be synchronized; the synchronous data acquisition submodule can be configured to perform synchronous data comparison according to original data to be synchronized and the replica data of each replica database; the to-be-synchronized data acquisition submodule can be configured to acquire the replica database to be synchronized and corresponding data to be synchronized in each service engine to be synchronized according to the result of the comparison of the synchronization data, wherein the data to be synchronized can include data to be added to the replica database and/or data to be deleted from the replica database; the data synchronization submodule can be configured to perform data addition and/or data deletion operations on each replica database to be synchronized according to the data to be synchronized, so as to complete data synchronization of the service engine to be synchronized. In one embodiment, the description of the specific implementation function may refer to steps S201 to S204.

In one embodiment, the data synchronization apparatus shown in fig. 4 may further include a data verification module, and the data verification module may include a synchronization completion time acquisition sub-module, a to-be-verified original data acquisition sub-module, and a data verification sub-module. The synchronization completion time acquisition sub-module may be configured to acquire the data synchronization completion time t₂ of each service engine to be synchronized; the to-be-verified original data acquisition sub-module may be configured to acquire the original data stored in the preset original database within the query time period [t₂ - Δt₂, t₂ + Δt₂] and take it as the original data to be verified, where Δt₂ denotes a preset second time variable and Δt₂ > 0; the data verification sub-module may be configured to perform data verification on each service engine that has completed data synchronization according to the original data to be verified. In one embodiment, for a description of the specific implementation functions, refer to steps S301 to S303.

In one embodiment, the data verification sub-module may include a verification data comparison unit, a to-be-verified data acquisition unit, and a data verification unit. The verification data comparison unit can be configured to perform verification data comparison according to the original data to be verified and the duplicate data of each duplicate database completing data synchronization in the service engine; the to-be-verified data acquisition unit can be configured to acquire the duplicate database to be adjusted and the corresponding data to be adjusted according to the result of the comparison of the verification data, wherein the data to be adjusted can comprise data to be added into the duplicate database and/or data to be deleted from the duplicate database; the data checking unit can be configured to perform data adding and/or data deleting operation on each duplicate database to be adjusted according to the data to be adjusted so as to complete data checking. In one embodiment, the description of the specific implementation function may refer to that in step S303.

In one embodiment, the data synchronization apparatus shown in fig. 4 may further include a state data table and a service engine state adjustment module, and the service engine state adjustment module may include a first state adjustment sub-module, a second state adjustment sub-module, and a third state adjustment sub-module. The state data table may be configured to store the real-time working state of each service engine and the database state of each replica database, where the database states include a data deletion state and a data addition state. The first state adjustment sub-module may be configured to judge whether a call failure or a restart of the service engine is detected; if so, adjust the working state of the service engine from the valid state to the to-be-synchronized state; if not, keep the working state of the service engine as the valid state. The second state adjustment sub-module may be configured to adjust the working state of the service engine to be synchronized from the to-be-synchronized state to the temporary storage state after data synchronization has been performed on it. The third state adjustment sub-module may be configured to adjust the working state of the service engine from the temporary storage state to the valid state after data verification has been performed on the service engine that completed data synchronization. In one embodiment, for a description of the specific implementation functions, refer to step S101.

In one embodiment, the data synchronization apparatus shown in fig. 4 may further include an abnormal timestamp setting module, and the abnormal timestamp setting module may include a first timestamp setting sub-module and/or a second timestamp setting sub-module. The first timestamp setting submodule can be configured to respond to the received service engine calling fault information, obtain a fault moment when the service engine calls a fault in the service engine calling fault information, and set an abnormal timestamp for the corresponding service engine according to the fault moment. The second timestamp setting submodule may be configured to: regularly detecting whether each service engine is restarted or not; when a certain service engine is detected to be restarted, acquiring the media type of a storage medium for storing copy data in the service engine; if the media type is a temporary storage media, setting an abnormal time stamp of the service engine according to the zero time; and if the media type is a non-temporary storage media, acquiring the restarting time when the service engine is restarted, and setting the abnormal timestamp of the service engine according to the restarting time. In one embodiment, the description of the specific implementation function may be referred to in step S102.

The data synchronization apparatus of the distributed system described above is used to carry out the embodiments of the data synchronization method of the distributed system shown in fig. 1 to 3, and its technical principles, the technical problems it solves, and the technical effects it produces are similar to those of the method embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the data synchronization apparatus and the related descriptions may refer to the content described in the embodiments of the data synchronization method of the distributed system, and are not repeated here.

It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals according to legislation and patent practice.

Further, the invention also provides a computer readable storage medium. In one computer-readable storage medium embodiment according to the present invention, a computer-readable storage medium may be configured to store a program that executes the data synchronization method of the distributed system of the above-described method embodiment, and the program may be loaded and executed by a processor to implement the data synchronization method of the above-described distributed system. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The computer readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer readable storage medium is a non-transitory computer readable storage medium in the embodiment of the present invention.

Furthermore, the invention also provides a control device. In an embodiment of the control device according to the present invention, the control device comprises a processor and a storage device. The storage device may be configured to store a program for executing the data synchronization method of the distributed system of the above method embodiment, and the processor may be configured to execute the programs in the storage device, including but not limited to the program for executing the data synchronization method of the distributed system of the above method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The control device may be a control apparatus formed of various electronic devices.

Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.

Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.

So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A method for data synchronization in a distributed system, the method comprising:

acquiring an abnormal time stamp of each service engine to be synchronized;

2. The data synchronization method of the distributed system according to claim 1, wherein the step of querying the original data to be synchronized of each service engine to be synchronized from a preset original database according to the abnormal timestamp specifically comprises:

judging whether the abnormal timestamp is zero time or not;

3. The data synchronization method for the distributed system according to claim 1, wherein each service engine comprises one or more replica databases, and each replica database in the same service engine stores different replica data;

4. The data synchronization method of the distributed system according to claim 3, wherein after the step of completing the data synchronization of the service engine to be synchronized, the method further comprises:

5. The data synchronization method of the distributed system according to claim 4, wherein the step of performing data verification on each service engine that completes data synchronization according to the original data to be verified respectively specifically comprises:

6. The method for data synchronization in a distributed system according to claim 5, wherein the method further comprises:

and/or,

the method further comprises the following steps:

7. The method for data synchronization of a distributed system according to any one of claims 1 to 6, wherein the method further comprises:

and/or,

the method further comprises the following steps:

periodically detecting whether each service engine has been restarted;

8. A data synchronization apparatus for a distributed system, the apparatus comprising:

9. The data synchronization apparatus of the distributed system according to claim 8, wherein the raw data to be synchronized acquisition module is further configured to:

determine whether the abnormal timestamp is the zero time;

10. The data synchronization device of the distributed system according to claim 8, wherein each of the service engines comprises one or more replica databases, and each replica database in the same service engine stores different replica data;

11. The data synchronization device of the distributed system according to claim 10, further comprising a data verification module, wherein the data verification module comprises a synchronization completion time acquisition sub-module, an original data acquisition sub-module to be verified, and a data verification sub-module;

The to-be-verified original data acquisition sub-module is configured to take the original data stored in the preset original database within the query time period [t₂ − Δt₂, t₂ + Δt₂] as the original data to be verified, wherein Δt₂ represents a preset second time variable and Δt₂ > 0;
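For illustration only, the time-window selection recited above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `OriginalRecord` type, its `stored_at` field, the function name `select_original_data_to_verify`, and the use of an in-memory list in place of the preset original database are all hypothetical, and both ends of the query time period are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OriginalRecord:
    """Hypothetical row of the preset original database."""
    record_id: str
    stored_at: float  # assumed storage time, e.g. seconds since the epoch


def select_original_data_to_verify(
    records: List[OriginalRecord], t2: float, delta_t2: float
) -> List[OriginalRecord]:
    """Return the records stored within [t2 - delta_t2, t2 + delta_t2].

    delta_t2 corresponds to the preset second time variable, which must be
    greater than 0. Both interval ends are treated as inclusive (assumption).
    """
    if delta_t2 <= 0:
        raise ValueError("delta_t2 must be greater than 0")
    lower, upper = t2 - delta_t2, t2 + delta_t2
    return [r for r in records if lower <= r.stored_at <= upper]


# Usage with made-up data: only records inside the window are selected.
sample = [
    OriginalRecord("a", 90.0),
    OriginalRecord("b", 100.0),
    OriginalRecord("c", 110.0),
    OriginalRecord("d", 111.0),
]
to_verify = select_original_data_to_verify(sample, t2=100.0, delta_t2=10.0)
```

In a real deployment the same filter would more likely be expressed as a range query against the original database rather than a scan over an in-memory list.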

12. The data synchronization device of the distributed system according to claim 11, wherein the data verification sub-module includes a verification data comparison unit, a to-be-verified data acquisition unit, and a data verification unit;

13. The data synchronization apparatus of the distributed system according to claim 12, wherein the apparatus further comprises a status data table and/or a service engine status adjustment module, the service engine status adjustment module comprises a first status adjustment submodule, a second status adjustment submodule, and a third status adjustment submodule;

14. The data synchronization apparatus of a distributed system according to any one of claims 8 to 13, wherein the apparatus further comprises an exception timestamp setting module, the exception timestamp setting module comprising a first timestamp setting sub-module and/or a second timestamp setting sub-module;

the second timestamp setting sub-module is configured to:

periodically detect whether each service engine has been restarted;

15. A control device comprising a processor and a storage device adapted to store a plurality of program codes, characterized in that said program codes are adapted to be loaded and run by said processor to perform the data synchronization method of the distributed system according to any of claims 1 to 7.

16. A computer readable storage medium having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by a processor to perform the data synchronization method of the distributed system of any of claims 1 to 7.