CN116226286A - Data synchronization method, storage medium and equipment

Info

Publication number: CN116226286A
Application number: CN202310252561.XA
Authority: CN
Inventors: 王凯龙
Original assignee: Beijing Kingbase Information Technologies Co Ltd
Current assignee: Beijing Kingbase Information Technologies Co Ltd
Priority date: 2023-03-15
Filing date: 2023-03-15
Publication date: 2023-06-06

Abstract

The invention provides a data synchronization method, a storage medium, and a device. The data synchronization method comprises: acquiring one piece of incremental data from each source database; sending the acquired incremental data to a coordination node, the coordination node being located between a target database and the source databases and associating the synchronization links between the target database and each source database; determining the incremental data with the highest priority in the coordination node; and delivering the incremental data with the highest priority to the target database. In many-to-one data synchronization scenarios, the invention ensures the accuracy of data synchronization and speeds up synchronization.

Description

Data synchronization method, storage medium and equipment

Technical Field

The present invention relates to the field of databases, and in particular, to a data synchronization method, a storage medium, and a device.

Background

Real-time data synchronization with a data synchronization tool is generally divided into three stages. The first stage initializes and loads the existing (stock) data to establish a baseline for synchronization. The second stage performs incremental data synchronization, using the baseline established by the initial load as the reference. The third stage periodically compares and verifies the source data against the target data to confirm that no data was lost during synchronization.

One usage scenario of data synchronization is many-to-one synchronization. For example, data in source databases A and B is synchronized into target database C. In the prior art, a synchronization link is deployed between target database C and each of source databases A and B; the two links are independent of each other, and each synchronizes the data of its own source database into target database C.

However, in practical applications, synchronizing the data of source databases A and B into target database C may have to follow a particular order. For example, a piece of data from source database A may need to be inserted into target database C before a piece of data from source database B; if the order is wrong, synchronization may fail. This needs to be improved.

Disclosure of Invention

It is an object of the present invention to provide a data synchronization method, a storage medium and a device that overcome or at least partially solve the above-mentioned problems.

It is a further object of the invention to reduce the complexity of the processing logic of the coordination node.

In particular, the present invention provides a data synchronization method, comprising:

acquiring a piece of incremental data from each source database;

sending the acquired incremental data to a coordination node, wherein the coordination node is located between a target database and the source databases and is used for associating the synchronization links between the target database and each source database;

determining the incremental data with highest priority in the coordination node;

and delivering the incremental data with the highest priority to the target database.

Optionally, the step of obtaining a piece of incremental data from each source database includes:

parsing a log table of each source database;

and obtaining incremental data carrying a global mark from the parsed log table.

Optionally, the global mark is a timestamp, and the step of determining the incremental data with the highest priority in the coordination node includes:

comparing the time stamp of each increment data in the coordination node;

and taking the increment data with the smallest time stamp as the increment data with the highest priority.

Optionally, the global mark is a change number or a serial number, and the step of determining the incremental data with the highest priority in the coordination node includes:

converting the change number or serial number of each piece of incremental data in the coordination node into a timestamp;

comparing the timestamps of the pieces of incremental data in the coordination node;

and taking the incremental data with the smallest timestamp as the incremental data with the highest priority.

Optionally, after the step of delivering the incremental data with the highest priority to the target database, the method further includes:

judging whether incremental data still exists in a source end database of delivered incremental data;

if yes, acquiring the next incremental data from a source database of the delivered incremental data;

and sending the acquired incremental data to a coordination node.

Optionally, after the step of determining whether the incremental data still exists in the source database of the delivered incremental data, the method further includes:

and if the incremental data does not exist, returning a piece of virtual data to the coordination node, wherein the priority of the virtual data is lower than that of the incremental data existing in the coordination node.

Optionally, the timestamp of the virtual data is set to the time at which the virtual data is returned to the coordination node.

Optionally, the step of delivering the incremental data with the highest priority to the target database includes:

transmitting the incremental data with the highest priority to an executing node, wherein the executing node is positioned between the coordination node and the target database;

and processing the incremental data with the highest priority, and delivering the incremental data to a target database after the processing is completed.

According to another aspect of the present invention, there is also provided a machine-readable storage medium having stored thereon a machine-executable program which, when executed by a processor, implements the data synchronization method of any of the above.

According to yet another aspect of the present invention, there is also provided a computer device comprising a memory, a processor and a machine executable program stored on the memory and running on the processor, and the processor implementing the data synchronization method of any one of the above when executing the machine executable program.

According to the data synchronization method of the invention, one piece of incremental data is first acquired from each source database; the acquired incremental data is then sent to a coordination node; the incremental data with the highest priority in the coordination node is determined; and finally that incremental data is delivered to the target database. That is, the invention introduces a coordination node into the many-to-one synchronization links and connects each originally independent synchronization link to the coordination node. By determining the highest-priority incremental data at the coordination node and delivering it to the target database first, the accuracy of data synchronization can be ensured and synchronization failures can be avoided.

In the data synchronization method, the executing node is arranged between the coordination node and the target database, and in the process of delivering the incremental data with the highest priority to the target database, the incremental data with the highest priority can be sent to the executing node first, then the executing node processes the incremental data, and the processed incremental data is delivered to the target database. Therefore, the processing logic complexity of the coordination node can be obviously reduced, the throughput of data is improved, and the data synchronization speed is increased.

The above, as well as additional objectives, advantages, and features of the present invention will become apparent to those skilled in the art from the following detailed description of a specific embodiment of the present invention when read in conjunction with the accompanying drawings.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a schematic architecture diagram of a many-to-one data synchronization scheme in the prior art;

FIG. 2 is a schematic architecture diagram of a data synchronization scheme in accordance with one embodiment of the invention;

FIG. 3 is a flow chart of a method of data synchronization according to one embodiment of the invention;

FIG. 4 is a schematic flow chart diagram after data delivery in accordance with one embodiment of the present invention;

FIG. 5 is a schematic architecture diagram of a data synchronization scheme in accordance with another embodiment of the invention;

FIG. 6 is a schematic diagram of a machine-readable storage medium according to one embodiment of the invention;

FIG. 7 is a schematic diagram of a computer device according to one embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 is a schematic architecture diagram of a many-to-one data synchronization scheme in the prior art. Referring to Fig. 1, in a many-to-one data synchronization scenario, data in source databases A and B is, for example, synchronized into target database C. In the prior art, a synchronization link is deployed between target database C and each of source databases A and B; the two links are independent of each other, and each synchronizes the data of its own source database into target database C. As mentioned above, synchronizing the data of source databases A and B into target database C may have to follow a particular order. For example, a piece of data from source database A may need to be inserted into target database C before a piece of data from source database B, and synchronization may fail if the order is wrong.

Fig. 2 is a schematic architecture diagram of a data synchronization scheme according to an embodiment of the present invention. Referring to Fig. 2, in order to solve the above problem, the present invention introduces a coordination node into the many-to-one synchronization links and performs data synchronization based on the coordination node, so as to ensure the accuracy of data synchronization.

Fig. 3 is a flowchart of a data synchronization method according to an embodiment of the present invention, and referring to fig. 3, the data synchronization method of the present invention at least includes the following steps S302 to S308.

Step S302, respectively obtaining a piece of incremental data from each source database.

Step S304, the obtained incremental data is sent to a coordination node, and the coordination node is located between the target database and each source database and is used for associating synchronous links between the target database and each source database.

Step S306, determining the incremental data with highest priority in the coordination node.

Step S308, delivering the incremental data with the highest priority to the target database.

With this data synchronization method, one piece of incremental data is first acquired from each source database; the acquired incremental data is then sent to the coordination node; the incremental data with the highest priority in the coordination node is determined; and finally that incremental data is delivered to the target database. That is, the invention introduces a coordination node into the many-to-one synchronization links and connects each originally independent synchronization link to the coordination node. By determining the highest-priority incremental data at the coordination node and delivering it to the target database first, the accuracy of data synchronization can be ensured and synchronization failures can be avoided.
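Steps S302 to S308 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Increment` class, the `sync_one_round` function, and the in-memory lists standing in for the source and target databases are all hypothetical, and the priority is assumed to be "smallest timestamp first", as in the timestamp embodiment described later. For brevity the coordination node is modelled as a dictionary rebuilt each round from the head of every source queue.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Increment:
    source: str       # name of the source database the record came from
    timestamp: float  # global mark, assumed here to be a timestamp
    payload: str      # the change itself, e.g. a statement to replay


def sync_one_round(sources: Dict[str, List[Increment]],
                   target: List[Increment]) -> Increment:
    """Run S302-S308 once and return the record that was delivered."""
    # S302/S304: one pending record per source is placed in the coordination node.
    coordinator = {name: queue[0] for name, queue in sources.items() if queue}
    if not coordinator:
        raise ValueError("no pending incremental data in any source")
    # S306: the record with the smallest timestamp has the highest priority.
    winner = min(coordinator.values(), key=lambda inc: inc.timestamp)
    # S308: deliver the winner to the target and consume it from its source.
    sources[winner.source].pop(0)
    target.append(winner)
    return winner
```

Calling `sync_one_round` repeatedly drains the sources in global timestamp order, regardless of which source each record came from.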

It can be understood that data changes such as insertions and deletions are recorded in the log table of the source database. In step S302 above, the step of obtaining one piece of incremental data from each source database may be: first parsing the log table of each source database, and then obtaining incremental data carrying a global mark from the parsed log table.

In general, the source database records incremental data in a log table. The global mark may be a timestamp of the current system, or a change number or serial number of the database, and the database provides a function for converting the change number or serial number into a time. The global mark has different names in different databases: in an Oracle database it is the SCN (System Change Number, change number), in a SQL Server database it is the LSN (Log Sequence Number, serial number), and so on.
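As a rough sketch of what "parsing the log table and obtaining incremental data with a global mark" might look like, the following Python function filters data-changing rows out of a parsed log and yields them in order of their global mark. The row layout (dictionaries with `mark`, `op`, and `table` keys) and the function name `parse_log_table` are assumptions made for illustration; real log formats differ per database.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Operations that change data; other log rows (e.g. COMMIT) are skipped.
DATA_CHANGES = {"INSERT", "UPDATE", "DELETE"}


def parse_log_table(rows: Iterable[Dict[str, object]]) -> Iterator[Tuple[object, str, str]]:
    """Yield (global_mark, operation, table) for each data-changing row.

    Rows are emitted in ascending order of their global mark, so the
    caller can take them one at a time as incremental data.
    """
    changes = [row for row in rows if row["op"] in DATA_CHANGES]
    for row in sorted(changes, key=lambda r: r["mark"]):
        yield (row["mark"], str(row["op"]), str(row["table"]))
```

Because the function is a generator, a caller can pull exactly one piece of incremental data at a time with `next()`, which matches the one-record-per-source behaviour of step S302.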

In an alternative embodiment of the present invention, the global mark may be a timestamp, and the step of determining the incremental data with the highest priority in the coordination node may be: comparing the timestamps of the pieces of incremental data in the coordination node, and then taking the incremental data with the smallest timestamp as the incremental data with the highest priority. That is, when the global mark is a timestamp, the incremental data with the highest priority can be determined directly by comparing the timestamps.

In another alternative embodiment of the present invention, the global mark may be a change number or a serial number, and the step of determining the incremental data with the highest priority in the coordination node may be: converting the change number or serial number of each piece of incremental data in the coordination node into a timestamp, then comparing the timestamps of the pieces of incremental data in the coordination node, and then taking the incremental data with the smallest timestamp as the incremental data with the highest priority. That is, when the global mark is a change number or a serial number, it can be converted into a timestamp by a database function, and the incremental data with the highest priority can then be determined more easily by comparing timestamps.
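The reason for converting before comparing is that change numbers or serial numbers issued by different databases are not directly comparable with one another. The sketch below assumes each source supplies its own converter; the converters used in the usage example are made-up linear mappings, standing in for whatever conversion function the real database provides (Oracle, for instance, documents an `SCN_TO_TIMESTAMP` function). The name `pick_highest_priority` is hypothetical.

```python
from typing import Callable, Dict, Tuple


def pick_highest_priority(
    candidates: Dict[str, int],
    to_timestamp: Dict[str, Callable[[int], float]],
) -> Tuple[str, float]:
    """Return (source name, timestamp) of the highest-priority candidate.

    candidates maps each source name to the change/serial number of its
    pending record; to_timestamp maps each source name to a converter
    from that number to a timestamp (assumed to be supplied per source).
    """
    # Convert every source-specific number into a common timestamp scale.
    normalized = {src: to_timestamp[src](mark) for src, mark in candidates.items()}
    # The smallest timestamp has the highest priority.
    winner = min(normalized, key=lambda src: normalized[src])
    return winner, normalized[winner]
```

For example, with `candidates = {"A": 40, "B": 50}` and converters `lambda n: 1000.0 + n * 0.5` for A and `lambda n: 900.0 + n * 2.0` for B, source B is selected even though its raw number is larger, because its converted timestamp (1000.0) is earlier than A's (1020.0).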

Fig. 4 is a schematic flowchart of the process after data delivery according to one embodiment of the present invention. Referring to Fig. 4, after the incremental data with the highest priority is delivered to the target database, the method may further include the following steps S402 to S408:

Step S402, judging whether incremental data still exists in the source database of the delivered incremental data; if so, executing step S404; if not, executing step S406.

Step S404, obtaining the next piece of incremental data from the source database of the delivered incremental data, and returning to step S304, that is, sending the acquired incremental data to the coordination node.

Step S406, returning a piece of virtual data to the coordination node, the priority of the virtual data being lower than that of the incremental data existing in the coordination node, and then returning to step S306, that is, determining the incremental data with the highest priority in the coordination node.

It can be understood that when incremental data still exists in the source database of the delivered incremental data, the next piece of incremental data is acquired and sent to the coordination node, and the incremental data with the highest priority in the coordination node is then determined again. When no incremental data exists in the source database of the delivered incremental data, a piece of virtual data can be returned to the coordination node. Because the priority of the virtual data is lower than that of the other incremental data in the coordination node, it provides a reference for the synchronization of the other incremental data, so that data synchronization of the other source databases is not blocked.

In the embodiment of the invention, the timestamp of the virtual data is set to the time at which it is returned to the coordination node. That is, when the virtual data is returned to the coordination node, the current system time can be used as its timestamp, which makes it easier to ensure that the priority of the virtual data is lower than that of any incremental data already in the coordination node.
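The refill loop of steps S402 to S406, including the virtual data, can be sketched as follows. This is an illustrative model rather than the patented implementation: the names `synchronize` and `VIRTUAL` are hypothetical, source databases are modelled as in-memory deques of `(timestamp, payload)` pairs, and the system clock is injected as a `clock` callable, assumed to return a time later than every pending record. The sketch stops once only virtual data remains, whereas a real synchronization link would keep polling.

```python
import heapq
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Record = Tuple[float, str, str]  # (timestamp, source name, payload)
VIRTUAL = "<virtual>"            # hypothetical marker for virtual data


def synchronize(sources: Dict[str, Deque[Tuple[float, str]]],
                clock: Callable[[], float]) -> List[Record]:
    """Deliver all pending records in timestamp order and return them."""
    delivered: List[Record] = []

    def fetch(name: str) -> Record:
        # S402/S404: if the source still has data, take its next record.
        if sources[name]:
            timestamp, payload = sources[name].popleft()
            return (timestamp, name, payload)
        # S406: otherwise return virtual data stamped with the current time.
        return (clock(), name, VIRTUAL)

    # The coordination node holds exactly one record per source.
    coordinator = [fetch(name) for name in sources]
    heapq.heapify(coordinator)
    while any(record[2] != VIRTUAL for record in coordinator):
        record = heapq.heappop(coordinator)  # smallest timestamp first
        if record[2] != VIRTUAL:
            delivered.append(record)         # deliver to the target
        # Refill the slot of the source whose record was just taken.
        heapq.heappush(coordinator, fetch(record[1]))
    return delivered
```

Because the virtual data carries the current time, it sorts after every real record already waiting in the coordination node, so an exhausted source never holds back records from the other sources.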

Fig. 5 is a schematic architecture diagram of a data synchronization scheme according to another embodiment of the present invention. Referring to Fig. 5, an execution node is further disposed between the coordination node and the target database. In step S308 above, the step of delivering the incremental data with the highest priority to the target database may be: first sending the incremental data with the highest priority to the execution node, then processing that incremental data at the execution node, and delivering it to the target database after the processing is completed. Once introduced, the coordination node may become the bottleneck of the whole system. Therefore, the coordination node is responsible only for comparing the timestamps of the incremental data, while the processing of the incremental data is completed by the execution node. This significantly reduces the complexity of the processing logic of the coordination node, improves data throughput, and speeds up data synchronization.
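The division of labour between the two nodes can be sketched in Python as two small functions. The names `coordinate` and `execute`, the `process` callback, and the list standing in for the target database are assumptions for illustration. The coordination step uses the standard-library `heapq.merge`, which requires each input stream to already be sorted by timestamp; that is assumed here to hold for each source's log.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[float, str]  # (timestamp, raw change)


def coordinate(*streams: Iterable[Record]) -> Iterator[Record]:
    """Coordination node: only orders records by timestamp, nothing else."""
    return heapq.merge(*streams, key=lambda record: record[0])


def execute(ordered: Iterable[Record],
            process: Callable[[str], str],
            target: List[str]) -> None:
    """Execution node: processes each record, then delivers it to the target."""
    for _, change in ordered:
        target.append(process(change))
```

Keeping `process` out of `coordinate` means that any costly per-record work, such as transformation or statement generation, does not slow down the ordering step.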

The present embodiment also provides a machine-readable storage medium 10 and a computer device 20. FIG. 6 is a schematic diagram of a machine-readable storage medium 10 according to one embodiment of the invention. Fig. 7 is a schematic diagram of a computer device 20 according to one embodiment of the invention.

The machine-readable storage medium 10 has stored thereon a machine-executable program 11, which when executed by a processor, implements the data synchronization method of any of the above embodiments.

The computer device 20 may include a memory 210, a processor 220, and a machine executable program 11 stored on the memory 210 and running on the processor 220, and the processor 220 implements the data synchronization method of any of the embodiments described above when executing the machine executable program 11.

It should be noted that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any machine-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.

For the purposes of this description of embodiments, a machine-readable storage medium 10 can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the machine-readable storage medium 10 may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.

The computer device 20 may be, for example, a server, a desktop computer, a notebook computer, a tablet computer, or a smartphone. In some examples, computer device 20 may be a cloud computing node. The computer device 20 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer device 20 may be implemented in a distributed cloud computing environment where remote processing devices coupled via a communications network perform tasks. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.

Computer device 20 may include a processor 220 adapted to execute stored instructions, a memory 210 providing temporary storage for the operation of the instructions during operation. Processor 220 may be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. Memory 210 may include Random Access Memory (RAM), read only memory, flash memory, or any other suitable storage system.

Processor 220 may be connected via a system interconnect (e.g., PCI-Express, etc.) to an I/O interface (input/output interface) adapted to connect computer device 20 to one or more I/O devices (input/output devices). The I/O devices may include, for example, a keyboard and a pointing device, which may include a touch pad or touch screen, among others. The I/O device may be a built-in component of the computer device 20 or may be a device externally connected to the computer device 20.

The processor 220 may also be linked through a system interconnect to a display interface suitable for connecting the computer device 20 to a display device. The display device may include a display screen as a built-in component of the computer device 20. The display device may also include a computer monitor, television, projector, or the like, that is externally connected to the computer device 20. Further, a network interface controller (NIC) may be adapted to connect the computer device 20 to a network through a system interconnect. In some embodiments, the NIC may use any suitable interface or protocol (such as an internet small computer system interface, etc.) to transfer data. The network may be a cellular network, a radio network, a Wide Area Network (WAN), a Local Area Network (LAN), or the internet, among others. A remote device may be connected to the computer device 20 through the network.

The flowcharts provided by this embodiment are not intended to indicate that the operations of the method are to be performed in any particular order, or that all of the operations of the method are to be included in every case. Furthermore, the method may include additional operations. Additional variations may be made to the above-described methods within the scope of the technical ideas provided by the methods of the present embodiments.

By now it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been shown and described herein in detail, many other variations or modifications of the invention consistent with the principles of the invention may be directly ascertained or inferred from the present disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.

Claims

1. A method of data synchronization, comprising:

respectively acquiring incremental data from each source database;

2. The data synchronization method according to claim 1, wherein the step of acquiring a piece of incremental data from each source database includes:

analyzing a log table of each source database;

3. The data synchronization method of claim 2, wherein the global marker is a timestamp, and the step of determining the highest priority incremental data in the coordinator node comprises:

comparing the time stamp of each increment data in the coordination node;

4. The data synchronization method according to claim 2, wherein the global flag is a change number or a serial number, and the step of determining the incremental data with the highest priority in the coordinator node includes:

comparing the time stamp of each increment data in the coordination node;

5. The data synchronization method of claim 1, wherein after the step of delivering the highest priority delta data to the target database, further comprising:

and sending the acquired incremental data to a coordination node.

6. The data synchronization method according to claim 5, wherein after the step of determining whether the delta data still exists in the source database of delivered delta data, further comprising:

7. The data synchronization method according to claim 6, wherein,

the timestamp of the virtual data is set to the time at which the virtual data is returned to the coordination node.

8. The data synchronization method of claim 1, wherein delivering the highest priority delta data to the target database comprises:

and processing the incremental data with the highest priority, and delivering the incremental data to the target database after the processing is completed.

9. A machine-readable storage medium having stored thereon a machine-executable program which when executed by a processor implements the data synchronization method according to any one of claims 1 to 8.

10. A computer device comprising a memory, a processor and a machine executable program stored on the memory and running on the processor, and the processor implementing the data synchronization method according to any one of claims 1 to 8 when executing the machine executable program.