CN111797158A - Data synchronization system, method and computer-readable storage medium - Google Patents

Data synchronization system, method and computer-readable storage medium

Info

Publication number
CN111797158A
Authority
CN
China
Prior art keywords
module
data
data synchronization
snapshot information
sink
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910275139.XA
Other languages
Chinese (zh)
Other versions
CN111797158B (en)
Inventor
潘月鹏
于汝国
乔超
刘彦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201910275139.XA
Publication of CN111797158A
Application granted
Publication of CN111797158B
Status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data synchronization system, a data synchronization method, and a computer-readable storage medium, and relates to the field of data processing. The data synchronization system includes: a source module configured to acquire data from a source database one by one, preprocess the read data, and send part or all of the preprocessed data to an input buffer module; a parsing module configured to acquire and parse the data from the input buffer module one by one and send the parsed data to an output buffer module; and a sink module configured to acquire the data from the output buffer module one by one and write the acquired data into a target database. The data synchronization system of the embodiments of the invention improves the efficiency of data synchronization and can be applied to application scenarios with higher real-time and throughput requirements.

Description

Data synchronization system, method and computer-readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a data synchronization system, method, and computer-readable storage medium.
Background
Subscribing to and consuming real-time binary logs (binlogs) is used to handle and coordinate data synchronization across machine rooms, which involves a large amount of data computation and storage. Synchronizing table data between databases is a frequent and general requirement in this process. In the related art, a database binlog synchronization system built on micro-batch processing is generally adopted; with a micro-batch computing mechanism, such a system can compute and synchronize database logs in quasi-real time, at minute-level latency.
Disclosure of Invention
After analysis, the inventor found that when a database binlog synchronization system built on micro-batch processing is applied in an environment with higher real-time and throughput requirements, its data synchronization efficiency is low.
The technical problem to be solved by the embodiments of the invention is how to improve the data synchronization efficiency of a data synchronization system.
According to a first aspect of some embodiments of the present invention, there is provided a data synchronization system comprising: a source module configured to acquire data from a source database one by one, preprocess the read data, and send part or all of the preprocessed data to an input buffer module; a parsing module configured to acquire and parse the data from the input buffer module one by one and send the parsed data to an output buffer module; and a sink module configured to acquire the data from the output buffer module one by one and write the acquired data into a target database.
In some embodiments, the data synchronization system further comprises a parallelism configuration module configured to adjust the number of instances of at least one of the source module, the parsing module, and the sink module according to an acquired parallelism.
In some embodiments, the data synchronization system further comprises a load analysis module configured to determine the parallelism of at least one of the source module, the parsing module, and the sink module according to load information of the data synchronization system, so that the parallelism configuration module adjusts the number of instances of the corresponding module.
In some embodiments, the input buffer module includes a plurality of ring queues, and the load analysis module is further configured to increase the parallelism of the parsing module when the number of free positions in a ring queue of the input buffer module is smaller than a preset value.
In some embodiments, the data synchronization system further comprises: the input buffer module, configured to push data sent by the source module and pop data acquired by the parsing module; and the output buffer module, configured to push data sent by the parsing module and pop data acquired by the sink module.
In some embodiments, the input buffer module includes an input agent and a plurality of ring queues coupled to the input agent; the output buffer module includes an output agent and a plurality of ring queues coupled to the output agent.
In some embodiments, the data synchronization system further comprises a metadata cache; the parsing module is further configured to parse the acquired data according to the metadata stored in the metadata cache and, when the acquired data is determined to be data definition language (DDL) operation data, to initiate a metadata update request to the source database so as to update the metadata in the metadata cache.
In some embodiments, the data synchronization system further comprises a site recording module configured to receive snapshot information sent by the source module, the parsing module, and the sink module, where the snapshot information includes the sequence number of the data currently read by the module that sent it, and to determine the minimum sequence number in the latest snapshot information sent by each module as the valid site.
In some embodiments, the source module is further configured to assign a sequence number to each piece of read data, generate snapshot information of the source module according to the currently read data, and send the snapshot information of the source module to the parsing module and the site recording module; the parsing module is further configured to generate snapshot information of the parsing module according to the snapshot information of the source module and the currently read data, and send the snapshot information of the parsing module to the sink module and the site recording module; the sink module is further configured to generate snapshot information of the sink module according to the snapshot information of the parsing module and the currently read data, and send the snapshot information of the sink module to the site recording module.
According to a second aspect of some embodiments of the present invention, there is provided a data synchronization method, including: a source module acquires data from a source database one by one, preprocesses the read data, and sends part or all of the preprocessed data to an input buffer module; a parsing module acquires and parses the data from the input buffer module one by one and sends the parsed data to an output buffer module; and a sink module acquires the data from the output buffer module one by one and writes the acquired data into a target database.
In some embodiments, the data synchronization method further comprises: adjusting the number of instances of at least one of the source module, the parsing module, and the sink module according to an acquired parallelism.
In some embodiments, the data synchronization method further comprises: determining the parallelism of at least one of the source module, the parsing module, and the sink module according to load information of the data synchronization system.
In some embodiments, the input buffer module includes a plurality of ring queues; determining the parallelism of at least one of the source module, the parsing module, and the sink module according to the load information of the data synchronization system comprises: increasing the parallelism of the parsing module when the number of free positions in a ring queue of the input buffer module is smaller than a preset value.
In some embodiments, the parsing module parses the acquired data according to metadata stored in a metadata cache; the data synchronization method further comprises: when the acquired data is determined to be DDL operation data, initiating a metadata update request to the source database so as to update the metadata in the metadata cache.
In some embodiments, the data synchronization method further comprises: receiving snapshot information sent by the source module, the parsing module, and the sink module, where the snapshot information includes the sequence number of the data currently read by the module that sent it; and determining the minimum sequence number in the latest snapshot information sent by each module as the valid site.
In some embodiments, the data synchronization method further comprises: the source module assigns a sequence number to each piece of read data; the source module generates snapshot information of the source module according to the currently read data; the source module sends the snapshot information of the source module to the parsing module and the site recording module; the parsing module generates snapshot information of the parsing module according to the snapshot information of the source module and the currently read data; the parsing module sends the snapshot information of the parsing module to the sink module and the site recording module; the sink module generates snapshot information of the sink module according to the snapshot information of the parsing module and the currently read data; the sink module sends the snapshot information of the sink module to the site recording module.
According to a third aspect of some embodiments of the present invention, there is provided a data synchronization system comprising: a memory; and a processor coupled to the memory, the processor configured to perform any of the foregoing data synchronization methods based on instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any one of the aforementioned data synchronization methods.
Some embodiments of the invention described above have the following advantages or benefits: the three stages of reading, parsing, and storing data are decoupled, so that the source module, the parsing module, and the sink module can each adjust their resource configuration and processing progress independently. The data synchronization system of the embodiments of the invention therefore improves the efficiency of data synchronization and can be applied to application scenarios with higher real-time and throughput requirements.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a data synchronization system according to some embodiments of the invention.
FIG. 2 is a block diagram of a data synchronization system according to further embodiments of the present invention.
FIG. 3 is a block diagram of an input agent and an output agent according to some embodiments of the invention.
FIG. 4 is a block diagram of a data synchronization system in accordance with further embodiments of the present invention.
FIG. 5 is a block diagram of a data synchronization system according to further embodiments of the present invention.
FIG. 6 is a block diagram of a data synchronization system according to some embodiments of the invention.
FIG. 7 is a flow diagram illustrating a method for data synchronization according to some embodiments of the invention.
FIG. 8 is a flow chart illustrating a parallelism adjustment method according to some embodiments of the invention.
FIG. 9 is a schematic flow diagram of a metadata update and data parsing method according to some embodiments of the invention.
FIG. 10 is a schematic flow diagram of a site recording method according to some embodiments of the invention.
FIG. 11 is a block diagram of a data synchronization system in accordance with further embodiments of the present invention.
FIG. 12 is a block diagram of a data synchronization system in accordance with further embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
After further analysis, the inventor found that the data synchronization system in the related art cannot be applied to an environment with higher real-time and throughput requirements because it synchronizes data in units of batches. Moreover, although the different processing stages of synchronization require different computing resources and run at different efficiencies, these stages are tightly coupled. The inventor therefore proposes dividing the synchronization process into three stages, namely reading, parsing, and storing, and decoupling them. An embodiment of the data synchronization system of the present invention is described below with reference to FIG. 1.
FIG. 1 is a block diagram of a data synchronization system according to some embodiments of the invention. As shown in fig. 1, the data synchronization system 10 of this embodiment includes:
a source module 110 configured to acquire data from a source database one by one, preprocess the read data, and send part or all of the preprocessed data to an input buffer module;
a parsing module 120 configured to acquire and parse the data from the input buffer module one by one, and send the parsed data to an output buffer module; and
a sink module 130 configured to acquire the data from the output buffer module one by one and write the acquired data into a target database.
In some embodiments, after the preprocessing, the source module 110 may determine whether the acquired data is data required for synchronization according to a preset rule, and send the required data to the input buffer module, and discard the unnecessary data.
Embodiments of the present invention may be used for synchronization of binlog logs.
The data synchronization system of the above embodiment uses a stream-based data synchronization method, that is, each module reads data one by one. Moreover, the data preprocessed by the source module is not sent directly to the parsing module but is temporarily stored in the input buffer module; likewise, the data parsed by the parsing module is not sent directly to the sink module but is temporarily stored in the output buffer module. In this way, the three stages of reading, parsing, and storing data are decoupled, so that the source module, the parsing module, and the sink module can each adjust their resource configuration and processing progress independently. The data synchronization system of the embodiment of the invention therefore improves the efficiency of data synchronization and can be applied to application scenarios with higher real-time and throughput requirements.
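The decoupling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `queue.Queue` stands in for the input and output buffer modules, the `table:value` record format, the `wanted` rule, and the `SENTINEL` end-of-stream marker are all hypothetical, and each stage runs as a single thread.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker used only in this sketch


def source(records, input_buffer, wanted):
    """Read records one by one, preprocess them, and forward only the needed ones."""
    for record in records:
        record = record.strip()        # stand-in for preprocessing
        if wanted(record):             # preset rule: keep or discard
            input_buffer.put(record)   # blocks while the input buffer is full
    input_buffer.put(SENTINEL)


def parser(input_buffer, output_buffer):
    """Take records from the input buffer one by one and parse them."""
    while True:
        record = input_buffer.get()
        if record is SENTINEL:
            output_buffer.put(SENTINEL)
            return
        table, _, value = record.partition(":")  # assumed "table:value" format
        output_buffer.put({"table": table, "value": value})


def sink(output_buffer, target):
    """Take parsed rows from the output buffer one by one and write them out."""
    while True:
        row = output_buffer.get()
        if row is SENTINEL:
            return
        target.append(row)             # a list stands in for the target database


def run_pipeline(records, wanted, capacity=4):
    input_buffer = queue.Queue(maxsize=capacity)
    output_buffer = queue.Queue(maxsize=capacity)
    target = []
    stages = [
        threading.Thread(target=source, args=(records, input_buffer, wanted)),
        threading.Thread(target=parser, args=(input_buffer, output_buffer)),
        threading.Thread(target=sink, args=(output_buffer, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target
```

Because each stage only touches its neighbouring buffer, a slow sink stalls the parser only once the output buffer fills, and the source only once the input buffer fills.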
The data synchronization system of the embodiment of the invention can also comprise an input buffer module and an output buffer module. An embodiment of the data synchronization system of the present invention is described below with reference to fig. 2.
FIG. 2 is a block diagram of a data synchronization system according to further embodiments of the present invention. As shown in fig. 2, the data synchronization system 20 of this embodiment includes a source module 210, a parsing module 220, and a sink module 230. In addition, an input buffer module 240 and an output buffer module 250 are included. The input buffer module 240 is connected to the source module 210 and the parsing module 220, and is configured to push data sent by the source module 210 and pop data obtained by the parsing module 220; the output buffer module 250 is connected to the parsing module 220 and the sink module 230, and is configured to push data sent by the parsing module 220 and pop data obtained by the sink module 230.
With the data synchronization system of this embodiment, data can be temporarily stored in the input buffer module and the output buffer module, so that each downstream module reads data from the corresponding buffer module at its own processing pace, which decouples the source module, the parsing module, and the sink module.
In some embodiments, the input buffer module and the output buffer module may each include a plurality of ring queues (ring buffers). Data in a ring queue can be loaded into the cache, which speeds up reads and writes, and the memory for the queue can be allocated in advance, which improves the efficiency of data synchronization.
In some embodiments, the input buffer module and the output buffer module may further include an input agent and an output agent, respectively. An embodiment of the input agent and the output agent of the present invention is described below with reference to FIG. 3.
FIG. 3 is a block diagram of an input agent and an output agent according to some embodiments of the invention. As shown in FIG. 3, the input buffer module 340 includes an input agent 3401 and a plurality of ring queues 3402 connected to the input agent 3401; the output buffer module 350 includes an output agent 3501 and a plurality of ring queues 3502 connected to the output agent 3501. Input agent 3401 and output agent 3501 may be implemented based on a channel agent (ChannelProxy). When the input agent 3401 and the output agent 3501 perform input and output (I/O) on data, they operate on the corresponding ring queues through the ChannelProxy mechanism. The characteristics of the ring queue make it possible to handle backpressure between components while avoiding the system overhead of traditional multi-threaded programming, which increases the data processing speed.
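The two ring-queue properties mentioned above, memory allocated in advance and backpressure when the queue is full, can be illustrated with a small Python sketch. The class name `RingQueue` and its methods are hypothetical; the sketch is single-threaded and does not attempt to reproduce the ChannelProxy mechanism or any lock-free scheme.

```python
class RingQueue:
    """Fixed-capacity FIFO over a preallocated list (illustrative, not thread-safe)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # all slots are allocated up front
        self._capacity = capacity
        self._head = 0                   # index of the next item to pop
        self._size = 0                   # number of occupied slots

    def free_slots(self):
        return self._capacity - self._size

    def push(self, item):
        """Store an item; return False when full so the producer can back off."""
        if self._size == self._capacity:
            return False
        tail = (self._head + self._size) % self._capacity  # wrap around the end
        self._slots[tail] = item
        self._size += 1
        return True

    def pop(self):
        """Remove and return the oldest item, or None when the queue is empty."""
        if self._size == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None   # release the reference for reuse
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return item
```

A producer that receives `False` from `push` has to wait or retry, which is how a full queue slows the upstream stage. Returning `None` from `pop` is a simplification that assumes `None` is never stored as data.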
Because the source module, the parsing module, and the sink module are decoupled, the parallelism of each module can be adjusted independently. An embodiment of the data synchronization system of the present invention is described below with reference to FIG. 4.
FIG. 4 is a block diagram of a data synchronization system in accordance with further embodiments of the present invention. As shown in FIG. 4, the data synchronization system 40 of this embodiment includes a source module 410, a parsing module 420, a sink module 430, and a parallelism configuration module 460. The parallelism configuration module 460 is configured to adjust the number of instances of at least one of the source module 410, the parsing module 420, and the sink module 430 according to the obtained parallelism. The parallelism of each module can therefore be adjusted flexibly as required.
In some embodiments, the data synchronization system 40 may further include a load analysis module 470 configured to determine the parallelism of at least one of the source module 410, the parsing module 420, and the sink module 430 according to load information of the data synchronization system, so that the parallelism configuration module 460 adjusts the number of instances of the corresponding module.
The load analysis module 470 may determine the parallelism based on the collected operating conditions of the respective modules. In some embodiments, the load analysis module 470 may be further configured to increase the parallelism of the parsing module 420 when the number of free positions in a ring queue of the input buffer module is less than a preset value. For example, the load analysis module 470 may determine the number of free positions in a ring queue according to the amount of data in the queue's input channel (InputChannel) and output channel (OutputChannel).
In some embodiments, the parallelism of the source module may be fixed at 1, while the parallelism of the parsing module and the sink module may be adjustable. The data-reading step performed by the source module thus stays relatively stable, while the parallelism of the parsing module and the sink module, which carry the more time-consuming or computation-intensive operations, can be adjusted promptly according to the load.
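The free-position rule above can be written as a small decision function. The sketch below is only one possible reading: the function name, the step of one instance per adjustment, the `max_parallelism` cap, and the choice to react to the fullest of several ring queues are all assumptions that the text does not specify.

```python
def decide_parser_parallelism(free_slots_per_queue, current, threshold, max_parallelism):
    """Return the parsing-stage parallelism to apply next.

    free_slots_per_queue: free positions in each ring queue of the input buffer.
    threshold: the preset value below which the parsing stage is considered behind.
    """
    # Assumption: the fullest queue (fewest free positions) drives the decision.
    if min(free_slots_per_queue) < threshold and current < max_parallelism:
        return current + 1   # assumed step size of one extra parser instance
    return current


# The input buffer is nearly full, so one more parser instance is requested.
print(decide_parser_parallelism([10, 3], current=2, threshold=4, max_parallelism=8))
```

A real load analysis module would likely also scale down and dampen oscillation; the text describes only the scale-up condition.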
When a data definition language (DDL) operation is performed on a data table in a database, for example renaming the table, deleting a column, or adding a field, data parsing errors may occur during data synchronization. To address this, in some embodiments, the parsing module may be further configured to parse the acquired data according to the metadata stored in the metadata cache and, when the acquired data is determined to be DDL operation data, to initiate a metadata update request to the source database so as to update the metadata in the metadata cache. An embodiment of the data synchronization system of the present invention is described below with reference to FIG. 5.
FIG. 5 is a block diagram of a data synchronization system according to further embodiments of the present invention. As shown in FIG. 5, the data synchronization system 50 of this embodiment includes a source module 510, a parsing module 520, a sink module 530, and a metadata cache 5801. The metadata cache 5801 is configured to store metadata for the data synchronized by the data synchronization system. When the parsing module 520 determines that the obtained data is DDL operation data, it may initiate a metadata update request to the source database, for example through a metadata update module 5802. After obtaining the updated metadata, the metadata update module 5802 may update the content of the metadata cache 5801. With this embodiment, the stored metadata is updated promptly after a DDL operation occurs, so that the parsing module 520 can continue to parse data correctly, which reduces the probability of abnormal conditions and improves the accuracy of data synchronization.
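The interplay between the parsing step and the metadata cache can be sketched as follows. Everything here is illustrative: the `MetadataCache` class, the `fetch_schema` callable that stands in for a metadata request to the source database, and the dictionary-shaped events with `type`, `table`, and `values` keys are hypothetical and are not taken from the patent or from any binlog library.

```python
class MetadataCache:
    """Caches the column list per table (illustrative stand-in for cache 5801)."""

    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema  # callable: table name -> list of columns
        self._columns = {}

    def columns(self, table):
        if table not in self._columns:     # assumption: load lazily on first use
            self._columns[table] = self._fetch_schema(table)
        return self._columns[table]

    def refresh(self, table):
        """Re-request the metadata from the source after a DDL operation."""
        self._columns[table] = self._fetch_schema(table)


def parse_event(event, cache):
    """Parse one event; a DDL event refreshes the cache instead of yielding a row."""
    if event["type"] == "DDL":
        cache.refresh(event["table"])
        return None
    names = cache.columns(event["table"])
    return dict(zip(names, event["values"]))  # pair column names with row values


# Usage: a column is added between two row events.
schemas = {"users": ["id", "name"]}
cache = MetadataCache(lambda table: list(schemas[table]))
first = parse_event({"type": "DML", "table": "users", "values": [1, "ann"]}, cache)
schemas["users"] = ["id", "name", "email"]
parse_event({"type": "DDL", "table": "users"}, cache)
second = parse_event({"type": "DML", "table": "users", "values": [2, "bob", "b@x"]}, cache)
```

Without the refresh on the DDL event, the second row would be paired with the stale two-column schema and its third value would be silently dropped by `zip`.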
Some embodiments of the invention can also accurately record the data site, so that the processing progress before the restart can be accurately recovered after the system is restarted. An embodiment of the data synchronization system of the present invention is described below with reference to fig. 6.
FIG. 6 is a block diagram of a data synchronization system according to some embodiments of the invention. As shown in FIG. 6, the data synchronization system 60 of this embodiment includes a source module 610, a parsing module 620, a sink module 630, and a site recording module 690. The site recording module 690 is configured to receive snapshot information sent by the source module, the parsing module, and the sink module, where the snapshot information includes the sequence number of the data currently read by the module that sent it, and to determine the minimum sequence number in the latest snapshot information sent by each module as the valid site.
For example, suppose the data synchronization system of the embodiment of the present invention processes ten pieces of data with sequence numbers 1 to 10, the sequence number of the data currently read by the parsing module 620 is 9, and the sequence number of the data currently read by the sink module 630 is 5. In this case, the data with sequence number 5 is determined as the data corresponding to the valid site.
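The minimum-sequence-number rule in the example above can be expressed directly in Python. The `SiteRecorder` class and its method names are hypothetical, and the value reported for the source module (10) is an assumption, since the example only states the numbers for the parsing module and the sink module.

```python
class SiteRecorder:
    """Keeps the latest reported sequence number per module (illustrative only)."""

    def __init__(self):
        self._latest = {}

    def report(self, module, sequence_number):
        # Only the most recent snapshot from each module is retained.
        self._latest[module] = sequence_number

    def valid_site(self):
        # The smallest number is the furthest point every reporting module has reached.
        return min(self._latest.values())


recorder = SiteRecorder()
recorder.report("source", 10)  # assumed: the source has read all ten records
recorder.report("parser", 9)
recorder.report("sink", 5)
print(recorder.valid_site())   # prints 5
```

Calling `valid_site` before any module has reported raises `ValueError` in this sketch; a real recorder would need a defined starting position.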
In some embodiments, the snapshot information may further include the name of the file to which the data belongs and the position of the data within that file, to facilitate indexing of the data.
In some embodiments, the source module 610 is further configured to assign a serial number to each piece of read data, generate snapshot information of the source module according to the currently read data, and send the snapshot information of the source module to the parsing module 620 and the site recording module 690; the parsing module 620 is further configured to generate snapshot information of the parsing module according to the snapshot information of the source module and the currently read data, and send the snapshot information of the parsing module to the sink module 630 and the site recording module 690; the sink module 630 is further configured to generate snapshot information of the sink module according to the snapshot information of the parsing module and the currently read data, and send the snapshot information of the sink module to the site recording module 690.
In some embodiments, the source module 610, the parsing module 620, and the sink module 630 may send the snapshot information to the site recording module 690 via a heartbeat mechanism.
In some embodiments, the sink module 630 may determine whether a condition for reporting snapshot information is currently satisfied, and send the snapshot information of the sink module to the site recording module 690 when the condition is satisfied. For example, the condition for reporting the snapshot information may be that the time since the last reporting has reached a preset timeout time, the amount of data processed since the last reporting has reached a preset value, and the like.
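A reporting condition of the kind described above might look like the following sketch. The function and parameter names are hypothetical, and combining the two example conditions with a logical OR is an assumption; the text lists them only as examples.

```python
def should_report(now, last_report_time, processed_since_report, timeout_s, max_records):
    """Decide whether the sink should send its snapshot information now.

    Reports when either the preset timeout has elapsed since the last report
    or the preset number of records has been processed since then.
    """
    timed_out = (now - last_report_time) >= timeout_s
    enough_data = processed_since_report >= max_records
    return timed_out or enough_data


# Two seconds after the last report with only three records handled: no report yet.
print(should_report(now=12.0, last_report_time=10.0,
                    processed_since_report=3, timeout_s=5.0, max_records=100))
```

The timeout bounds how stale the recorded site can become on a quiet stream, while the record count bounds how much data would be reprocessed after a restart on a busy one.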
With the above embodiments, the source module maintains a sequence number for each piece of data and broadcasts the necessary snapshot information to downstream modules. Each downstream module uses the upstream information it receives and periodically broadcasts the necessary snapshot information further downstream. The sink module reports the valid site of the data it has processed; the site recording module aggregates the reported information to determine the valid site and persists it asynchronously. When the system restarts after an exception or is restarted deliberately, the valid site held by the site recording module can serve as the reference for the position from which data consumption resumes. This improves the accuracy of site recording and the efficiency of data synchronization, and reduces the likelihood of abnormal conditions.
An embodiment of the data synchronization method of the present invention is described below with reference to fig. 7.
FIG. 7 is a flow diagram illustrating a method for data synchronization according to some embodiments of the invention. As shown in fig. 7, the data synchronization method of this embodiment includes steps S702 to S706.
In step S702, the source module obtains data from the source database one by one, pre-processes the read data, and sends part or all of the pre-processed data to the input buffer module.
In step S704, the parsing module obtains and parses the data from the input buffer module one by one, and sends the parsed data to the output buffer module.
In step S706, the sink module acquires data from the output buffer module one by one, and writes the acquired data into the target database.
With the method of this embodiment, the three stages of reading, parsing, and storing data are decoupled, so that the source module, the parsing module, and the sink module can each adjust their resource configuration and processing progress independently. The data synchronization method of the embodiment of the invention therefore improves the efficiency of data synchronization and can be applied to application scenarios with higher real-time and throughput requirements.
An embodiment of the parallelism adjusting method of the present invention is described below with reference to fig. 8.
FIG. 8 is a flow chart illustrating a parallelism adjustment method according to some embodiments of the invention. As shown in FIG. 8, the parallelism adjusting method of this embodiment includes steps S802 to S804.
In step S802, the parallelism of at least one of the source module, the parsing module, and the sink module is determined according to the load information of the data synchronization system. Other ways of determining parallelism may also be used by those skilled in the art, as desired.
In some embodiments, the input buffer module includes a plurality of circular queues. When the number of free positions in a circular queue of the input buffer module is smaller than a preset value, the parallelism of the parsing module is increased.
In step S804, the number of at least one of the source module, the parsing module, and the sink module is adjusted according to the obtained parallelism.
Therefore, the parallelism of each module can be flexibly adjusted according to the requirement.
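For illustration only, the rule of step S802 for the parsing module might be sketched as follows. The names RingQueueStats and next_parser_parallelism, the step size of one, the upper bound, and the choice to react as soon as any single queue runs short of free positions are assumptions made for this sketch; the embodiments state only that the parallelism is increased when the free positions fall below a preset value.

```python
from dataclasses import dataclass


@dataclass
class RingQueueStats:
    """Occupancy of one circular queue in the input buffer module."""
    capacity: int
    used: int

    @property
    def free_slots(self) -> int:
        return self.capacity - self.used


def next_parser_parallelism(current, queues, min_free_slots, max_parallelism):
    """Return the parser parallelism to apply in the next adjustment round.

    If any circular queue has fewer free positions than `min_free_slots`,
    the parsing module is falling behind, so one more parser instance is
    requested, up to `max_parallelism`.
    """
    if any(q.free_slots < min_free_slots for q in queues):
        return min(current + 1, max_parallelism)
    return current
```

Step S804 would then start or stop parsing module instances until their number matches the returned value.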
An embodiment of the metadata update and data parsing method of the present invention is described below with reference to FIG. 9.
FIG. 9 is a schematic flow diagram of a metadata update and data parsing method according to some embodiments of the invention. As shown in fig. 9, the parsing method of this embodiment includes steps S902 to S904.
In step S902, when it is determined that the acquired data is DDL operation data, a metadata update request is initiated to the source database, so as to update the metadata in the metadata cache.
In step S904, the parsing module parses the acquired data according to the metadata stored in the metadata cache.
With the method of this embodiment, the stored metadata is updated promptly after a DDL operation occurs, so that the parsing module can continue to parse data correctly. This reduces the probability of abnormal conditions and improves the accuracy of data synchronization.
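For illustration only, steps S902 and S904 might be sketched as follows. The event dictionaries, the MetadataCache class, the `fetch_schema` callable standing in for the metadata update request to the source database, and the lazy load on a cache miss are assumptions made for this sketch.

```python
class MetadataCache:
    """Caches column names per table; refreshed when a DDL event is seen."""

    def __init__(self, fetch_schema):
        # Hypothetical callable: table name -> list of column names.
        self._fetch_schema = fetch_schema
        self._columns = {}

    def refresh(self, table):
        """Ask the source database for the current schema of `table`."""
        self._columns[table] = list(self._fetch_schema(table))

    def columns(self, table):
        if table not in self._columns:   # assumption: load lazily on a miss
            self.refresh(table)
        return self._columns[table]


def parse_event(event, cache):
    """Parse one change event using the cached metadata.

    DDL events trigger a metadata refresh (S902) and produce no row;
    all other events are mapped onto the cached column names (S904).
    """
    if event["type"] == "DDL":
        cache.refresh(event["table"])
        return None
    return dict(zip(cache.columns(event["table"]), event["values"]))
```

Without the refresh on DDL, a row written after a column was added would be paired with the stale column list and its last value would be dropped silently by `zip`.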
An embodiment of the site recording method of the present invention is described below with reference to fig. 10.
Fig. 10 is a schematic flow diagram of a site recording method according to some embodiments of the invention. As shown in fig. 10, the site recording method of this embodiment includes steps S1002 to S1018.
In step S1002, the source module assigns a sequence number to each piece of data read.
In step S1004, the source module generates snapshot information of the source module according to the currently read data.
In step S1006, the source module sends snapshot information of the source module to the parsing module and the site recording module.
In step S1008, the parsing module generates snapshot information of the parsing module according to the snapshot information of the source module and the currently read data.
In step S1010, the parsing module sends the snapshot information of the parsing module to the sink module and the site recording module.
In step S1012, the sink module generates snapshot information of the sink module according to the snapshot information of the parsing module and the currently read data.
In step S1014, the sink module sends the snapshot information of the sink module to the site recording module.
As required, those skilled in the art may also obtain the data currently read by each module in other manners, which is not described herein again.
In step S1016, the site recording module receives the snapshot information sent by the source module, the parsing module, and the sink module, where each piece of snapshot information includes the sequence number of the data currently read by the module that sent it.
In step S1018, the minimum sequence number in the latest snapshot information sent by each module is determined as the valid position.
This improves the accuracy of site recording and the efficiency of data synchronization, and reduces the likelihood of abnormal conditions.
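For illustration only, steps S1016 and S1018 might be sketched as follows. The class name SiteRecorder, the module labels, and the choice to return None until every module has reported at least once are assumptions made for this sketch.

```python
class SiteRecorder:
    """Keeps the latest snapshot sequence number reported by each module."""

    MODULES = ("source", "parser", "sink")

    def __init__(self):
        self._latest = {}

    def report(self, module, sequence_number):
        """S1016: record the newest snapshot received from one module."""
        if module not in self.MODULES:
            raise ValueError(f"unknown module: {module}")
        self._latest[module] = sequence_number

    def valid_position(self):
        """S1018: the minimum sequence number across the latest snapshots.

        Returns None until every module has reported (an assumption),
        because no position is known to be safe before then.
        """
        if len(self._latest) < len(self.MODULES):
            return None
        return min(self._latest.values())
```

Taking the minimum means that every record up to the valid position has been read by all three modules, so resuming from that position cannot skip data that one of them has not yet handled.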
FIG. 11 is a block diagram of a data synchronization system in accordance with further embodiments of the present invention. As shown in fig. 11, the data synchronization system 1100 of this embodiment includes: a memory 1110 and a processor 1120 coupled to the memory 1110, wherein the processor 1120 is configured to execute a data synchronization method according to any of the embodiments based on instructions stored in the memory 1110.
Memory 1110 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
FIG. 12 is a block diagram of a data synchronization system in accordance with further embodiments of the present invention. As shown in fig. 12, the data synchronization system 1200 of this embodiment includes a memory 1210 and a processor 1220, and may further include an input/output interface 1230, a network interface 1240, a storage interface 1250, and the like. These interfaces 1230, 1240, 1250, as well as the memory 1210 and the processor 1220, may be connected via a bus 1260, for example. The input/output interface 1230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 1240 provides a connection interface for a variety of networking devices. The storage interface 1250 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program is configured to implement any one of the aforementioned data synchronization methods when executed by a processor.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (18)

1. A data synchronization system, comprising:
a source module configured to acquire data from a source database one by one, preprocess the read data, and send part or all of the preprocessed data to an input buffer module;
a parsing module configured to acquire and parse the data from the input buffer module one by one and send the parsed data to an output buffer module; and
a sink module configured to acquire the data from the output buffer module one by one and write the acquired data into a target database.
2. The data synchronization system of claim 1, further comprising:
a parallelism configuration module configured to adjust the number of at least one of the source module, the parsing module and the sink module according to the acquired parallelism.
3. The data synchronization system of claim 2, further comprising:
a load analysis module configured to determine the parallelism of at least one of the source module, the parsing module and the sink module according to load information of the data synchronization system, so that the parallelism configuration module adjusts the number of the corresponding modules.
4. The data synchronization system of claim 3, wherein the input buffer module comprises a plurality of circular queues;
the load analysis module is further configured to increase the parallelism of the parsing module when the number of free positions in a circular queue of the input buffer module is smaller than a preset value.
5. The data synchronization system of claim 1, further comprising:
the input buffer module, configured to push data sent by the source module and pop data acquired by the parsing module; and
the output buffer module, configured to push data sent by the parsing module and pop data acquired by the sink module.
6. The data synchronization system of claim 5,
the input buffer module comprises an input agent and a plurality of circular queues connected to the input agent;
the output buffer module comprises an output agent and a plurality of circular queues connected to the output agent.
7. The data synchronization system of claim 1, further comprising:
the parsing module is further configured to parse the acquired data according to the metadata stored in the metadata cache and, when the acquired data is determined to be database schema definition language (DDL) operation data, to initiate a metadata update request to the source database so as to update the metadata in the metadata cache.
8. The data synchronization system of claim 1, further comprising:
a site recording module configured to receive snapshot information sent by the source module, the parsing module and the sink module, wherein the snapshot information comprises a sequence number of the data currently read by the module sending the snapshot information, and to determine the minimum sequence number in the latest snapshot information sent by each module as the valid position.
9. The data synchronization system of claim 8,
the source module is further configured to assign a sequence number to each piece of read data, generate snapshot information of the source module according to the currently read data, and send the snapshot information of the source module to the parsing module and the site recording module;
the parsing module is further configured to generate snapshot information of the parsing module according to the snapshot information of the source module and the currently read data, and send the snapshot information of the parsing module to the sink module and the site recording module;
the sink module is further configured to generate snapshot information of the sink module according to the snapshot information of the parsing module and the currently read data, and send the snapshot information of the sink module to the site recording module.
10. A method of data synchronization, comprising:
the source module acquires data from a source database one by one, preprocesses the read data, and sends part or all of the preprocessed data to the input buffer module;
the parsing module acquires and parses the data from the input buffer module one by one and sends the parsed data to the output buffer module;
the sink module acquires data from the output buffer module one by one and writes the acquired data into a target database.
11. The data synchronization method of claim 10, further comprising:
adjusting the number of at least one of the source module, the parsing module and the sink module according to the acquired parallelism.
12. The data synchronization method of claim 11, further comprising:
determining the parallelism of at least one of the source module, the parsing module and the sink module according to the load information of the data synchronization system.
13. The data synchronization method of claim 12, wherein the input buffer module comprises a plurality of circular queues;
the determining the parallelism of at least one of the source module, the parsing module and the sink module according to the load information of the data synchronization system includes:
increasing the parallelism of the parsing module when the number of free positions in a circular queue of the input buffer module is smaller than a preset value.
14. The data synchronization method according to claim 10, wherein the parsing module parses the acquired data according to metadata stored in the metadata cache;
the data synchronization method further comprises:
when the acquired data is determined to be DDL operation data, initiating a metadata update request to the source database so as to update the metadata in the metadata cache.
15. The data synchronization method of claim 10, further comprising:
receiving snapshot information sent by the source module, the parsing module and the sink module, wherein the snapshot information comprises a sequence number of the data currently read by the module sending the snapshot information; and
determining the minimum sequence number in the latest snapshot information sent by each module as a valid position.
16. The data synchronization method of claim 15, further comprising:
the source module assigns a sequence number to each piece of read data;
the source module generates snapshot information of the source module according to the currently read data;
the source module sends the snapshot information of the source module to the parsing module and a site recording module;
the parsing module generates snapshot information of the parsing module according to the snapshot information of the source module and the currently read data;
the parsing module sends the snapshot information of the parsing module to the sink module and the site recording module;
the sink module generates snapshot information of the sink module according to the snapshot information of the parsing module and the currently read data; and
the sink module sends the snapshot information of the sink module to the site recording module.
17. A data synchronization system, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the data synchronization method of any of claims 1-9 based on instructions stored in the memory.
18. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the data synchronization method of any one of claims 1 to 9.
CN201910275139.XA 2019-04-08 2019-04-08 Data synchronization system, method and computer readable storage medium Active CN111797158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910275139.XA CN111797158B (en) 2019-04-08 2019-04-08 Data synchronization system, method and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111797158A (en) 2020-10-20
CN111797158B (en) 2024-04-05

Family

ID=72805680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910275139.XA Active CN111797158B (en) 2019-04-08 2019-04-08 Data synchronization system, method and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111797158B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980475A (en) * 2023-07-31 2023-10-31 深圳市亲邻科技有限公司 Data pushing system based on binlog and double annular buffer areas

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004062473A (en) * 2002-07-29 2004-02-26 Hitachi Ltd Database management method and system
KR20040017008A (en) * 2002-08-20 2004-02-26 주식회사 케이랩 System and method for offering information using a search engine
CN102222071A (en) * 2010-04-16 2011-10-19 华为技术有限公司 Method, device and system for data synchronous processing
CN104239572A (en) * 2014-09-30 2014-12-24 普元信息技术股份有限公司 System and method for achieving metadata analysis based on distributed cache
CN106557492A (en) * 2015-09-25 2017-04-05 阿里巴巴集团控股有限公司 A kind of method of data synchronization and device
CN107317838A (en) * 2017-05-24 2017-11-03 重庆邮电大学 A kind of astronomical metadata archiving method and system based on stream data processing framework
CN107357526A (en) * 2017-07-03 2017-11-17 北京京东尚科信息技术有限公司 For the method and apparatus of network data, server and storage medium
CN107423303A (en) * 2016-05-24 2017-12-01 北京京东尚科信息技术有限公司 The method and system of data syn-chronization
CN107844506A (en) * 2016-09-21 2018-03-27 阿里巴巴集团控股有限公司 A kind of method and device for realizing database and the data syn-chronization of caching
CN107918621A (en) * 2016-10-10 2018-04-17 阿里巴巴集团控股有限公司 Daily record data processing method, device and operation system
CN108694199A (en) * 2017-04-10 2018-10-23 北京京东尚科信息技术有限公司 Data synchronization unit, method, storage medium and electronic equipment
CN109145060A (en) * 2018-07-20 2019-01-04 腾讯科技(深圳)有限公司 Data processing method and device
CN109359139A (en) * 2018-10-24 2019-02-19 拉扎斯网络科技(上海)有限公司 Method of data synchronization, system, electronic equipment and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
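XU ZI-JIAN et al.: "Data synchronization tool for distributed heterogeneous database", JOURNAL OF SOFTWARE *
张平, 赵荣彩, 李清宝: "Correlation-based synchronization optimization algorithm" (基于相关性的同步优化算法), 计算机工程 (Computer Engineering), no. 17 *
黄晓微; 陈玲; 魏玮; 徐世莲: "Data synchronization method based on snapshot log analysis" (基于快照日志分析的数据同步方法), 后勤工程学院学报 (Journal of Logistical Engineering University), no. 02 *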

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980475A (en) * 2023-07-31 2023-10-31 深圳市亲邻科技有限公司 Data pushing system based on binlog and double annular buffer areas
CN116980475B (en) * 2023-07-31 2024-06-04 深圳市亲邻科技有限公司 Data pushing system based on binlog and double annular buffer areas

Also Published As

Publication number Publication date
CN111797158B (en) 2024-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant