CN113342898B - Data synchronization method and device - Google Patents

Publication number
CN113342898B
Authority
CN
China
Prior art keywords
data synchronization
task
target data
target
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110729604.XA
Other languages
Chinese (zh)
Other versions
CN113342898A (en
Inventor
林鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202110729604.XA
Publication of CN113342898A
Application granted
Publication of CN113342898B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022 Mechanisms to release resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Abstract

The application provides a data synchronization method and device. The method may comprise the following steps: acquiring a notification message sent by a message queue, wherein the notification message indicates that newly added data exists in a target message queue topic in the message queue; determining a target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task; and, in the case that the target data synchronization task belongs to a target data synchronization thread, executing the target data synchronization task through the target data synchronization thread to extract the newly added data from the message queue and synchronize it to a destination data table defined by the output configuration information of the target data synchronization task. With this technical solution, one data synchronization thread can execute a plurality of data synchronization tasks, which improves the utilization of the data synchronization threads in the data synchronization tool and saves thread resources.

Description

Data synchronization method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data synchronization method and apparatus.
Background
In order to implement real-time data synchronization between databases, data in a source database is generally required to be sent to a message queue for caching, and a data synchronization tool acquires corresponding data from the message queue and loads the corresponding data to a corresponding destination data table.
In the related art, the data synchronization tool needs to set up a corresponding data synchronization thread for each message queue topic in the message queue, and each data synchronization thread can extract data from only one message queue topic. When the data to be synchronized is not a continuous data stream, the utilization of the data synchronization threads is low and thread resources are seriously wasted.
Disclosure of Invention
In view of this, the present application provides a data synchronization method and apparatus, which are used to implement data synchronization between two data ends.
Specifically, the method is realized through the following technical scheme:
According to a first aspect of the present application, a data synchronization method is provided. The method is applied to a data synchronization tool running one or more data synchronization threads, where the data synchronization threads are configured with data synchronization tasks, the input configuration information of each data synchronization task includes information of a corresponding message queue topic, and the output configuration information includes information of a corresponding destination data table. The method includes:
acquiring a notification message sent by a message queue, wherein the notification message indicates that newly added data exists in a target message queue topic in the message queue;
determining a target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task; and
in the case that the target data synchronization task belongs to a target data synchronization thread, executing the target data synchronization task through the target data synchronization thread to extract the newly added data from the message queue and synchronize the newly added data to a destination data table defined by the output configuration information of the target data synchronization task.
According to a second aspect of the present application, a data synchronization apparatus is provided. The apparatus is applied to a data synchronization tool running one or more data synchronization threads, where the data synchronization threads are configured with data synchronization tasks, the input configuration information of each data synchronization task includes information of a corresponding message queue topic, and the output configuration information includes information of a corresponding destination data table. The apparatus includes:
a message acquiring unit, configured to acquire a notification message sent by a message queue, wherein the notification message indicates that newly added data exists in a target message queue topic in the message queue;
a task determining unit, configured to determine a target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task; and
a task execution unit, configured to, in the case that the target data synchronization task belongs to a target data synchronization thread, execute the target data synchronization task through the target data synchronization thread to extract the newly added data from the message queue and synchronize the newly added data to a destination data table defined by the output configuration information of the target data synchronization task.
According to a third aspect of the present application, there is provided an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method as described in the embodiments of the first aspect above by executing the executable instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method as described in the embodiments of the first aspect above.
According to the above technical solution, configuration information of the data synchronization tasks is preset in the data synchronization threads run by the data synchronization tool, so that a plurality of data synchronization tasks can be executed in one data synchronization thread and newly added data of different message queue topics can be synchronized to the corresponding destination data tables. This improves the utilization of the data synchronization threads in the data synchronization tool and saves thread resources.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a network architecture of a data synchronization system to which embodiments of the present application are applied;
FIG. 2 is a flow chart illustrating a method of data synchronization according to an exemplary embodiment of the present application;
FIG. 3 is a multi-party interaction flow diagram illustrating a method of data synchronization in accordance with an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a data synchronization electronic device shown in accordance with an exemplary embodiment of the present application;
FIG. 5 is a block diagram illustrating a data synchronization apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if," as used herein, may be interpreted as "at the time of," "when," or "in response to determining," depending on the context.
Next, examples of the present application will be described in detail.
The application provides a data synchronization method for synchronizing data from a source end to a destination end, wherein the destination end can be a database, and the source end can be a database and/or an application program. FIG. 1 is a schematic diagram of a network architecture for data synchronization between databases, where the databases in the source database group 101 and the destination database group 104 may be of the same type or of different types. The database type may be SQL Server, MySQL, Oracle, SQLite, Access, and the like, which is not limited in the present application. As shown in FIG. 1, a database in the source database group 101 sends data to a message queue 102 for caching. A data synchronization tool 103 extracts newly added data from the message queue 102 according to the message queue topic, performs operations such as data format conversion and data cleaning on the extracted data, and then synchronizes the processed data to the corresponding destination data table in the destination database group 104.
In the related art, extracting newly added data from one message queue topic and synchronizing it to the corresponding destination data table can be regarded as one data synchronization task, and each data synchronization task corresponds to one data synchronization thread. Running a large number of data synchronization threads in the data synchronization tool occupies a large amount of CPU resources. Moreover, because data synchronization between the source end and the destination end usually happens only sporadically, the data synchronization threads are idle most of the time, their utilization is low, and a large amount of resources is wasted.
To solve the above problem, the present application provides a data synchronization method. FIG. 2 is a flowchart illustrating the data synchronization method according to an exemplary embodiment of the present application. As shown in FIG. 2, the method is applied to a data synchronization tool running one or more data synchronization threads configured with data synchronization tasks, where the input configuration information of each data synchronization task includes information of a corresponding message queue topic, and the output configuration information of each data synchronization task includes information of a corresponding destination data table. The method may include the following steps:
Step 202: acquire a notification message sent by a message queue, wherein the notification message indicates that newly added data exists in a target message queue topic in the message queue.
In the application, the data synchronization tool may be an Extract-Transform-Load (ETL) tool, which is configured to extract data from a source end, perform operations such as format conversion and data cleaning on the extracted data, and load the processed data to a destination end. The format conversion, data cleaning, and other operations may be completed in the subsequent step 206, after the newly added data is extracted and before it is synchronized to the destination data table. The data synchronization tool may include one or more data synchronization threads, and each data synchronization thread may include one or more data synchronization tasks. The data synchronization tool executes a data synchronization task through a data synchronization thread to synchronize data acquired from the message queue into the corresponding destination data table. The data synchronization thread may be a real-time data synchronization thread, so as to implement real-time synchronization of data between the source end and the destination end.
In the technical solution of the present application, the message delivery model adopted by the message queue may be a publish-subscribe model, as used by Kafka, RabbitMQ, and the like, which is not limited in this application. Under this messaging model, a message publisher may publish messages to a particular message queue topic, and one or more message subscribers that have subscribed to that message queue topic may receive all messages in it. In the application, a database or an application program at the source end is a message publisher, and the data synchronization tool is a message subscriber. The message queue comprises one or more message queue topics; the source end can publish data to the message queue, and the data synchronization tool can acquire newly added data from the message queue.
In one embodiment, a trigger is preset in the message queue. The trigger fires when a predefined condition is met and executes the statement set defined in the trigger. The predefined condition may be that any message queue topic in the message queue has newly added data, and the defined statement set may generate a notification message according to the message queue topic of the newly added data and send the notification message to the corresponding data synchronization tool. With the trigger arranged in the message queue, a notification message is automatically sent to the data synchronization tool whenever newly added data is received from the source end. No manual monitoring is needed, so the data synchronization tool can extract the newly added data from the message queue in time.
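The trigger behaviour described above can be illustrated with a minimal in-memory sketch. It is not tied to any real message queue product: the class `InMemoryQueue`, the `notify` callback, and the notification fields `topic` and `offset` are hypothetical names chosen for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryQueue:
    """Toy stand-in for a message queue with a publish trigger (hypothetical)."""

    def __init__(self, notify: Callable[[dict], None]) -> None:
        self._topics: Dict[str, List[str]] = defaultdict(list)
        # Callback standing in for "send the notification to the sync tool".
        self._notify = notify

    def publish(self, topic: str, record: str) -> None:
        log = self._topics[topic]
        log.append(record)
        # Trigger: the predefined condition (new data in any topic) is met,
        # so build a notification for that topic and send it.
        self._notify({"topic": topic, "offset": len(log) - 1})


received: List[dict] = []
queue = InMemoryQueue(received.append)
queue.publish("orders", "row-1")
queue.publish("orders", "row-2")
```

After the two `publish` calls, `received` holds one notification per newly added record, each naming the topic that changed.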
Step 204: determine the target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task.
In the technical solution of the application, a data synchronization task is established according to a message queue topic and a destination data table configured by a user, wherein the message queue topic serves as the input configuration information of the data synchronization task, and the destination data table serves as the output configuration information of the data synchronization task. After receiving the notification message, the data synchronization tool may match the target message queue topic included in the notification message against the input configuration information of each data synchronization task in the data synchronization tool, determine the corresponding target data synchronization task, and determine the target data synchronization thread to which the target data synchronization task belongs. In addition to the message queue topic, the input configuration information may also include information of a corresponding consumer group identifier. In that case, determining the target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task includes: determining the target data synchronization task corresponding to the notification message according to the message queue topic information and consumer group identifier information in the input configuration information of each data synchronization task, and the target message queue topic and target consumer group identifier contained in the notification message.
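As a rough illustration of the matching in step 204, the sketch below looks up a task by topic and consumer group identifier. The `SyncTask` fields and the function `find_target_task` are hypothetical; the source does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SyncTask:
    """Hypothetical task record: input config, output config, owning thread."""
    name: str
    topic: str               # input configuration: message queue topic
    consumer_group: str      # input configuration: consumer group identifier
    destination_table: str   # output configuration: destination data table
    thread_id: int           # data synchronization thread the task belongs to


def find_target_task(tasks: List[SyncTask], topic: str,
                     consumer_group: str) -> Optional[SyncTask]:
    """Return the task whose input configuration matches the notification."""
    for task in tasks:
        if task.topic == topic and task.consumer_group == consumer_group:
            return task
    return None


tasks = [
    SyncTask("orders-sync", "orders", "etl", "dw.orders", thread_id=1),
    SyncTask("users-sync", "users", "etl", "dw.users", thread_id=1),
]
target = find_target_task(tasks, topic="users", consumer_group="etl")
```

Both example tasks share `thread_id=1`, reflecting the idea that one data synchronization thread can own several tasks.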
Step 206: in the case that the target data synchronization task belongs to a target data synchronization thread, execute the target data synchronization task through the target data synchronization thread to extract the newly added data from the message queue and synchronize the newly added data to a destination data table defined by the output configuration information of the target data synchronization task.
In an embodiment, the notification message further includes the partition and the offset of the newly added data in the message queue, and each time a data synchronization task is executed, the data synchronization thread may update the local partition and local offset of that task according to the partition and offset of the newly added data just synchronized. When the data synchronization task has no local partition corresponding to the partition in the notification message, then after the newly added data is synchronized to the destination data table defined by the output configuration information of the data synchronization task, the partition in the notification message may be added to the local partitions of the data synchronization task, and the local offset corresponding to that partition is set to the offset in the notification message. For example, if the notification message indicates that the newly added data is in partition 2 at offset 10, and the only local partition of the corresponding data synchronization task is partition 1, then the data in partition 2 is newly added data; after the data synchronization is completed, partition 2 needs to be added to the local partitions of the data synchronization task, and the local offset corresponding to partition 2 is recorded as 10.
In the case that the offset in the notification message is greater than the local offset of the data synchronization task for the partition in the notification message, then after the newly added data is synchronized to the destination data table defined by the output configuration information of the data synchronization task, the local offset may be updated to the offset in the notification message. For example, if the notification message indicates partition 1 and offset 10, and the data synchronization task has local partition 1 with local offset 5, then the data between offset 5 and offset 10 in partition 1 is newly added data, and the local offset corresponding to local partition 1 of the data synchronization task needs to be updated to 10 after the data synchronization is completed. Therefore, the local partitions recorded by the data synchronization thread in the present application are the partitions in which historically synchronized data is located, and the local offset is the maximum offset of the historically synchronized data in each partition.
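The bookkeeping in the two examples above can be sketched as a small update function. The dictionary layout (partition number mapped to the maximum synchronized offset) and the name `record_progress` are assumptions made for illustration.

```python
from typing import Dict


def record_progress(local_offsets: Dict[int, int],
                    partition: int, offset: int) -> None:
    """Update a task's local partition/offset record after a successful sync.

    local_offsets maps each known partition to the maximum offset already
    synchronized from it (hypothetical layout).
    """
    known = local_offsets.get(partition)
    if known is None or offset > known:
        # Either a partition seen for the first time, or newer data in a
        # known partition: remember the highest synchronized offset.
        local_offsets[partition] = offset


# Task state before any new notification: partition 1 synced up to offset 5.
local = {1: 5}
record_progress(local, partition=2, offset=10)  # new partition is added
record_progress(local, partition=1, offset=10)  # known partition advances
```

After both calls, `local` records offset 10 for partition 1 and for partition 2, matching the two worked examples in the text.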
In one embodiment, executing the target data synchronization task to extract the newly added data comprises: in the case that the notification message indicates both the starting position and the ending position of the newly added data, extracting the newly added data according to the starting position and the ending position; and in the case that the notification message indicates the starting position of the newly added data but not the ending position, extracting all the newly added data from the starting position onward. For example, if the notification message indicates that the starting position of the newly added data is partition 1, offset 68, and the ending position is partition 1, offset 72, then the data synchronization tool may execute the target data synchronization task via the target data synchronization thread to extract the newly added data between offsets 68 and 72 in partition 1 from the target message queue topic of the message queue. If the notification message only indicates that the starting position of the newly added data is partition 1, offset 68, the data synchronization tool may execute the target data synchronization task via the target data synchronization thread to extract all newly added data starting at partition 1, offset 68, from the target message queue topic of the message queue.
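The two extraction cases can be sketched as follows, with a Python list standing in for one partition of a topic. The function name `extract_new_data` is hypothetical, and treating the ending offset as inclusive is an assumption; the source only says "between offsets 68-72".

```python
from typing import List, Optional, Sequence


def extract_new_data(partition_log: Sequence[str], start_offset: int,
                     end_offset: Optional[int] = None) -> List[str]:
    """Extract records from one partition of a topic.

    If end_offset is given, return records start_offset..end_offset
    (inclusive, by assumption). If it is None, return everything from
    start_offset to the current end of the partition.
    """
    if end_offset is None:
        return list(partition_log[start_offset:])
    return list(partition_log[start_offset:end_offset + 1])


# A toy partition holding records at offsets 0..74.
partition_1 = [f"record-{i}" for i in range(75)]
bounded = extract_new_data(partition_1, 68, 72)   # offsets 68 through 72
unbounded = extract_new_data(partition_1, 68)     # offsets 68 through 74
```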
In some cases, for example when a target message queue topic in the message queue receives newly added data several times within a short period, the message queue sends one notification message for each batch of newly added data. If a notification message only indicates the starting position of the newly added data and not its ending position, the partition and offset in a notification message received by the data synchronization thread may not exceed the local partition and local offset already recorded. If this occurs, the newly added data that triggered the notification message has already been synchronized, and the target data synchronization thread does not need to extract data according to that notification message. For example, data A is added to the target message queue topic, and the message queue generates a notification message A corresponding to data A. After receiving notification message A, the target data synchronization thread extracts the newly added data from the target message queue topic. If data B is added to the message queue after data A is added and before the target data synchronization thread executes the task, the target data synchronization thread extracts data A and data B together. When notification message B corresponding to data B is later received, it can be discarded directly, and the data synchronization task corresponding to notification message B is not executed. Therefore, after receiving a notification message, the data synchronization tool may compare the partition and offset of the newly added data recorded in the notification message with the local partition and local offset that the data synchronization tool has recorded for the corresponding data synchronization task.
If the partition in the notification message is the same as the local partition and the offset is greater than the local offset, the data in that partition of the message queue whose offset is greater than the local offset may be considered newly added data; if the partition in the notification message is greater than the local partition, the data in that partition may be considered newly added data. Therefore, the data synchronization tool executes the data synchronization task through the target data synchronization thread only when the partition in the notification message is greater than the local partition of the target data synchronization task or the offset in the notification message is greater than the local offset of the target data synchronization task, thereby avoiding duplicate data in the destination database.
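A minimal sketch of this duplicate check follows. The function `should_execute` and the dictionary of local offsets are hypothetical. As a simplifying assumption, "the partition in the notification is greater than the local partition" is modelled as "no local record exists for that partition", and an offset equal to the local offset is treated as already synchronized.

```python
from typing import Dict


def should_execute(local_offsets: Dict[int, int],
                   partition: int, offset: int) -> bool:
    """Decide whether a notification still points at unsynchronized data."""
    local = local_offsets.get(partition)
    if local is None:
        # Partition not seen before: everything in it is new (assumption).
        return True
    # Known partition: only offsets beyond the recorded maximum are new.
    return offset > local


# Data A and data B were pulled together, so partition 1 is synced to 72.
local = {1: 72}
stale = should_execute(local, partition=1, offset=70)       # late notification B
fresh = should_execute(local, partition=1, offset=73)       # genuinely new data
new_partition = should_execute(local, partition=2, offset=0)
```

Here `stale` is `False`, so the late notification is discarded, while `fresh` and `new_partition` are `True`.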
In an embodiment, when the data synchronization tool receives a plurality of notification messages that invoke the same data synchronization thread, the data synchronization thread needs to order the data synchronization tasks to be executed and execute them in sequence according to the ordering result. For example, the data synchronization thread may place the data synchronization tasks to be executed in a task scheduling queue. After a target data synchronization task is determined, the thread queries the task scheduling queue and inserts the target data synchronization task at the appropriate position according to the execution order among the data synchronization tasks already in the queue. The execution order of the task scheduling queue can be determined by the preset priority of each data synchronization task, with higher-priority tasks executed first; if several data synchronization tasks have the same priority, their execution order can be determined by the time point at which the corresponding notification message was acquired, with earlier tasks executed first. Alternatively, the data synchronization tasks in the task scheduling queue may be ordered first by the acquisition time point of the corresponding notification message; if several data synchronization tasks have the same acquisition time point, their execution order may be determined by a preset priority. The execution order of the data synchronization tasks in the task scheduling queue may be set by those skilled in the art as needed, and the present application is not limited thereto.
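The first ordering described above (priority first, then acquisition time) can be sketched with the standard-library `heapq` module. The class `TaskSchedulingQueue` is hypothetical, and the convention that a smaller number means a higher priority is an assumption.

```python
import heapq
import itertools
from typing import List, Tuple


class TaskSchedulingQueue:
    """Orders tasks by priority, then by notification acquisition time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, float, int, str]] = []
        # Monotonic counter so that full ties pop in insertion order.
        self._counter = itertools.count()

    def push(self, task_name: str, priority: int, acquired_at: float) -> None:
        # Smaller priority number runs first (assumed convention).
        entry = (priority, acquired_at, next(self._counter), task_name)
        heapq.heappush(self._heap, entry)

    def pop(self) -> str:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


queue = TaskSchedulingQueue()
queue.push("task-a", priority=2, acquired_at=1.0)
queue.push("task-b", priority=1, acquired_at=3.0)
queue.push("task-c", priority=1, acquired_at=2.0)
order = [queue.pop() for _ in range(len(queue))]
```

Swapping the first two tuple fields gives the alternative ordering in the text, where acquisition time is compared first and priority breaks ties.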
In the technical scheme of the application, if the data volume of the newly added data in the message queue is large, the time taken by the data synchronization thread to pull the newly added data when executing the data synchronization task is long, and the execution of other subsequent data synchronization tasks in the task scheduling queue is easily affected. Therefore, a data volume threshold can be preset as the maximum data volume that can be extracted during the execution of the data synchronization task, if the data volume of the new data corresponding to the target data synchronization task is greater than the preset data volume threshold, only the new data with the data volume of the preset data volume threshold is extracted during the execution of the target data synchronization thread, after the extracted new data is synchronized to the corresponding target data table, the local offset of the data synchronization task can be updated to the offset after the synchronization of the extracted new data, the acquisition time point of the data synchronization task is updated to the time point after the synchronization of the extracted new data is completed, and the data synchronization task is reordered in the task scheduling queue. 
For example, suppose the local offset of the target data synchronization task is 98, the termination position of the new data indicated in the notification message received by the target data synchronization thread is 1000, and the preset data amount threshold is 100 offsets. When executing the target data synchronization task, the target data synchronization thread extracts only 100 offsets of new data from the message queue; after synchronizing the extracted new data, it updates the local offset of the target data synchronization task to 198, updates the acquisition time point of the task to the time point at which synchronization of the 100 offsets completed, and reorders the target data synchronization task in the task scheduling queue. Limiting the amount of new data pulled by each execution of a data synchronization task prevents the data synchronization thread from executing the same task for a long time and congesting the other tasks, thereby guaranteeing the real-time performance of data synchronization in the data synchronization thread.
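The bounded extraction in this example can be sketched as follows. The names `SyncTask` and `run_once` are hypothetical, the actual pull from the message queue and the write to the destination data table are replaced by a comment, and the timestamp 12.0 is an arbitrary illustrative value.

```python
from dataclasses import dataclass


@dataclass
class SyncTask:
    name: str
    local_offset: int    # highest offset already synchronized
    acquired_at: float   # ordering timestamp in the task scheduling queue


def run_once(task, end_offset, max_offsets, now):
    """Synchronize at most max_offsets offsets; return True if work remains."""
    pending = max(0, end_offset - task.local_offset)
    batch = min(pending, max_offsets)
    # A real tool would pull the next `batch` offsets from the message
    # queue and write them to the destination data table at this point.
    task.local_offset += batch
    if task.local_offset < end_offset:
        # Unfinished: refresh the acquisition time point so that the task
        # is reordered behind equal-priority tasks that were waiting.
        task.acquired_at = now
        return True
    return False


task = SyncTask("Aa", local_offset=98, acquired_at=10.0)
more = run_once(task, end_offset=1000, max_offsets=100, now=12.0)
```

After this single call the task's local offset is 198 and work remains, so a scheduler built on this sketch would reinsert the task into the queue using its refreshed acquisition time point.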
In an embodiment, the data synchronization tool may obtain the load state of a data synchronization thread in real time, where the load state may be the number of unexecuted data synchronization tasks in the task scheduling queue, or the accumulated amount of newly added data still to be synchronized by the data synchronization tasks in the task scheduling queue. When the load state of the target data synchronization thread exceeds a preset load threshold, the data synchronization tool can share the load of the target data synchronization thread with other data synchronization threads. When a plurality of data synchronization threads run in the data synchronization tool: if one message queue corresponds to a plurality of data synchronization threads, other idle data synchronization threads that process the same message queue can be called to share the load of the target data synchronization thread; if one message queue corresponds to one data synchronization thread, the data synchronization threads of other message queues can be called to share the load, in which case the input configuration information of the data synchronization task should also include the message queue name. If the other data synchronization threads are idle, they can be used to share the load of the target data synchronization thread.
Alternatively, in addition to calling an existing data synchronization thread to share the load, if the data synchronization tool has idle CPU resources, a new data synchronization thread may be created using those resources, and the new data synchronization thread shares the load of the target data synchronization thread until the load state of the target data synchronization thread obtained by the data synchronization tool is no longer greater than the preset load threshold. In this embodiment, the data synchronization tasks taken over by the load-sharing thread may be real-time data synchronization tasks of the target data synchronization thread whose waiting duration exceeds a preset duration. When data synchronization tasks are configured, a first waiting duration threshold may be set for real-time data synchronization tasks and a second waiting duration threshold for non-real-time data synchronization tasks, the first being smaller than the second; that is, real-time tasks have a stricter timeliness requirement than non-real-time tasks. Load sharing temporarily adds data synchronization thread resources when data synchronization tasks congest the target data synchronization thread, thereby improving the real-time performance of data synchronization.
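The load-sharing decision described in the two preceding paragraphs might be sketched as below. The function names `choose_helper` and `tasks_to_hand_off`, the string return values, and the representation of idle threads as a list of names are assumptions for illustration only; thread creation itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class PendingTask:
    name: str
    real_time: bool
    waited_seconds: float


def choose_helper(load, load_threshold, idle_threads, cpu_available):
    """Decide how to relieve an overloaded data synchronization thread."""
    if load <= load_threshold:
        return "none"                      # not overloaded, nothing to do
    if idle_threads:
        return "reuse:" + idle_threads[0]  # prefer an existing idle thread
    if cpu_available:
        return "create"                    # otherwise create a new thread
    return "none"                          # no resources to share the load


def tasks_to_hand_off(pending, real_time_wait_limit):
    # Only real-time tasks that have waited longer than their limit move.
    return [t.name for t in pending
            if t.real_time and t.waited_seconds > real_time_wait_limit]


decision = choose_helper(load=12, load_threshold=10,
                         idle_threads=[], cpu_available=True)
moved = tasks_to_hand_off(
    [PendingTask("Aa", True, 8.0), PendingTask("bb", False, 30.0),
     PendingTask("Ab", True, 1.0)],
    real_time_wait_limit=5.0)
```

In this sketch an idle existing thread is preferred over creating a new one, which is one reasonable reading of the text rather than a requirement stated in it.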
In the technical scheme of the application, the data synchronization tool can exchange data with the message queue and with the destination data table over TCP connections. When the data adding frequency of the message queue is not greater than a preset frequency threshold, the data synchronization tool can establish TCP connections with the message queue and the corresponding destination data table each time data is added, and disconnect them after the data synchronization is completed; when the data adding frequency of the message queue is greater than the preset frequency threshold, the TCP connections between the data synchronization tool and the message queue and the corresponding destination data table can be kept open until the data adding frequency falls below the preset frequency threshold. For establishing and disconnecting the TCP connection between the message queue and the data synchronization tool, reference may be made to the prior art, and details are not described here. Keeping the TCP connections open avoids repeatedly establishing them during data synchronization and improves data synchronization efficiency.
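The connection policy above can be sketched with a small wrapper that keeps a connection open only while the data adding frequency exceeds the threshold. `ConnectionManager` and `FakeConnection` are hypothetical names; `FakeConnection` stands in for a real TCP connection (for example one obtained from Python's `socket.create_connection`) so that the sketch runs without a network.

```python
class FakeConnection:
    """Stand-in for a TCP connection; counts how many were opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.closed = False

    def close(self):
        self.closed = True


class ConnectionManager:
    def __init__(self, connect, frequency_threshold):
        self._connect = connect            # callable returning a connection
        self._threshold = frequency_threshold
        self._conn = None

    def run(self, frequency, work):
        """Run work(conn); keep the connection open only at high frequency."""
        if self._conn is None:
            self._conn = self._connect()
        try:
            return work(self._conn)
        finally:
            if frequency <= self._threshold:
                # Low frequency: disconnect once this synchronization is done.
                self._conn.close()
                self._conn = None


manager = ConnectionManager(FakeConnection, frequency_threshold=5)
for frequency in (10, 10, 2, 2):
    manager.run(frequency, lambda conn: None)
opened = FakeConnection.opened
```

With these four calls the first connection serves three synchronizations and is closed once the frequency drops, so only two connections are opened in total.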
According to the technical scheme provided by the application, the configuration information of the data synchronization tasks corresponding to each data synchronization thread is preset in the data synchronization tool, and when a plurality of data synchronization tasks are waiting to be executed they are ordered according to a preset rule, so that one data synchronization thread can execute a plurality of data synchronization tasks. This achieves real-time synchronization of message queue topic data, improves the utilization of the data synchronization threads in the data synchronization tool, and saves thread resources. The following description takes the network architecture shown in fig. 1 as an example and refers to fig. 3. FIG. 3 is a flow chart illustrating a multi-party interaction of a data synchronization method according to an exemplary embodiment of the present application. As shown in fig. 3, the interaction process between the source database group 101, the message queue 102, the data synchronization tool 103, and the destination database group 104 includes the following steps:
In step 301, the source database group 101 sends the new data to the message queue 102.
In step 302, the message queue 102 generates a notification message when new data is added.
The message queue 102 receives and buffers the new data sent by the source database group 101. On receiving the new data, the message queue 102 generates a notification message from parameters such as the target message queue topic of the new data and the partition and offset of the new data in the message queue 102.
In step 303, the message queue 102 sends a notification message to the corresponding data synchronization tool 103.
The message queue 102 and the data synchronization tool 103 have previously established a TCP connection, and the message queue 102 transmits the generated notification message to the data synchronization tool 103 through the TCP connection.
In step 304, the data synchronization tool 103 determines a target data synchronization thread corresponding to the received notification message.
Each data synchronization thread running in the data synchronization tool 103 is configured in advance with its data synchronization tasks and with the input information and output information of each data synchronization task. After receiving the notification message, the data synchronization tool 103 may parse the notification message, obtain the target message queue topic it contains, and match that topic against the input information of each data synchronization task configured for each running data synchronization thread, thereby determining the corresponding target data synchronization task and the target data synchronization thread to which it belongs.
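The matching in step 304 can be sketched as a lookup over the preconfigured tasks. The `TaskConfig` fields, the dictionary layout of the notification message, and all topic, table and thread names are hypothetical; the application only requires that the input configuration carry the message queue topic and the output configuration carry the destination data table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    name: str
    thread: str          # data synchronization thread the task belongs to
    input_topic: str     # input configuration: message queue topic
    output_table: str    # output configuration: destination data table


def find_target(configs, notification):
    """Return (task name, thread name) for the notified topic, or None."""
    for cfg in configs:
        if cfg.input_topic == notification["topic"]:
            return cfg.name, cfg.thread
    return None


configs = [
    TaskConfig("Aa", "thread-1", "orders", "dw.orders"),
    TaskConfig("bb", "thread-1", "users", "dw.users"),
]
notification = {"topic": "users", "partition": 0, "offset": 1000}
target = find_target(configs, notification)
```

A notification whose topic matches no configured task yields `None`, which a caller could treat as a message to ignore.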
In step 305, the target data synchronization task is inserted into the corresponding position in the task scheduling queue corresponding to the target data synchronization thread.
The target data synchronization thread determines the execution order between the target data synchronization task and each data synchronization task already in its task scheduling queue according to the preset priorities of these tasks and the acquisition time points of their corresponding notification messages, and inserts the target data synchronization task into the corresponding position in the task scheduling queue according to the determined execution order.
For example, table 1 is a task scheduling queue corresponding to the target data synchronization thread, and table 2 is each attribute of the target data synchronization task acquired by the data synchronization tool 103 analyzing the notification message.
TABLE 1
TABLE 2
The priority of the target data synchronization task Ab shown in table 2 is compared with the priority of each data synchronization task shown in table 1, and it is determined that the priority of the target data synchronization task Ab is higher than that of the data synchronization task bb and the same as that of the data synchronization task Aa. The acquisition time points of the target data synchronization task Ab and the data synchronization task Aa are then compared, and the acquisition time point of Ab is found to be later than that of Aa. The target data synchronization task Ab is therefore executed after the data synchronization task Aa and before the data synchronization task bb, so it is inserted between Aa and bb in the task scheduling queue of table 1, which produces the new task scheduling queue shown in table 3.
TABLE 3
In step 306, the target data synchronization thread sequentially executes each data synchronization task in the task scheduling queue to synchronize the newly added data corresponding to each data synchronization task to the destination data table, in the destination database group 104, defined by the output configuration information of that data synchronization task.
The target data synchronization thread first executes the data synchronization task Aa, following the execution order of the task scheduling queue shown in table 3.
If the target data synchronization thread is preset to extract at most 100 offsets of new data each time it executes a data synchronization task, then when executing the data synchronization task Aa it extracts 100 offsets of new data from the corresponding message queue topic of the message queue 102, updates the local offset of the data synchronization task to 198 after the extracted new data is synchronized, and updates the acquisition time point to the time point at which synchronization of the extracted new data completed, for example, 12.
TABLE 4
Corresponding to the method embodiments, the present specification also provides an embodiment of an apparatus.
Fig. 4 is a schematic structural diagram of a data synchronization electronic device according to an exemplary embodiment of the present application. Referring to fig. 4, at the hardware level, the electronic device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile memory 410, and may also include hardware required for other services. The processor 402 reads the corresponding computer program from the non-volatile memory 410 into the memory 408 and then runs it. Of course, besides the software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.
Fig. 5 is a block diagram illustrating a data synchronization apparatus according to an exemplary embodiment of the present application. Referring to fig. 5, the apparatus includes a message acquisition unit 502, a task determination unit 504, and a task execution unit 506, wherein:
the message acquiring unit 502 is configured to acquire a notification message sent by a message queue, where the notification message is used to indicate that a target message queue topic in the message queue has new data;
the task determining unit 504 is configured to determine a target data synchronization task matching the target message queue topic according to input configuration information of each data synchronization task;
the task execution unit 506 is configured to execute the target data synchronization task through the target data synchronization thread to extract the new data from the message queue and synchronize the new data to a destination data table defined by output configuration information of the target data synchronization task, in a case where the target data synchronization task belongs to the target data synchronization thread.
Optionally, the notification message includes a partition and an offset of the newly added data in the message queue, and the executing the target data synchronization task by the target data synchronization thread includes: executing, by the target data synchronization thread, the target data synchronization task if the target data synchronization task does not have a local partition corresponding to a partition in the notification message, or if an offset in the notification message is greater than a local offset of the target data synchronization task corresponding to a partition in the notification message; the local partition is a partition where the historical synchronous data is located, and the local offset is a maximum offset corresponding to the historical synchronous data in the partition where the historical synchronous data is located.
Optionally, the apparatus further comprises:
a local updating unit 508, configured to: in a case that the partition in the notification message is larger than the local partition of the target data synchronization task, after the newly added data is synchronized to the destination data table defined by the output configuration information of the target data synchronization task, update the local partition to the partition in the notification message and update the local offset to the offset in the notification message; and in a case that the offset in the notification message is larger than the local offset of the target data synchronization task, after the newly added data is synchronized to the destination data table defined by the output configuration information of the target data synchronization task, update the local offset to the offset in the notification message.
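The execution condition and the local update described above can be sketched together. This sketch assumes, for illustration, that the local state is kept as a dictionary mapping each partition to the highest offset already synchronized; the function names `should_execute` and `record_synced` are hypothetical.

```python
def should_execute(local_offsets, partition, offset):
    """Decide whether a notification refers to data not yet synchronized."""
    if partition not in local_offsets:
        return True                        # no local partition for this data
    return offset > local_offsets[partition]


def record_synced(local_offsets, partition, offset):
    # Called after the new data has reached the destination data table.
    # The max() guard keeps the local offset from ever moving backward.
    previous = local_offsets.get(partition, offset)
    local_offsets[partition] = max(offset, previous)


local = {0: 98}
stale = should_execute(local, partition=0, offset=98)
fresh = should_execute(local, partition=0, offset=1000)
new_partition = should_execute(local, partition=1, offset=5)
record_synced(local, partition=1, offset=5)
```

Comparing against the local state first means that a notification for data already synchronized is skipped instead of being synchronized a second time.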
Optionally, executing the target data synchronization task through a target data synchronization thread includes: determining an execution sequence between the target data synchronization task and each existing data synchronization task in a task scheduling queue corresponding to the target data synchronization thread; inserting the target data synchronization task into a corresponding position in the task scheduling queue according to the determined execution sequence; and sequentially executing each data synchronization task in the scheduling queue through the target data synchronization thread.
Optionally, determining an execution sequence between the target data synchronization task and each data synchronization task existing in a task scheduling queue corresponding to the target data synchronization thread includes: determining an execution sequence between the target data synchronization task and each existing data synchronization task according to the preset priority of the target data synchronization task and the preset priority of each existing data synchronization task in a task scheduling queue corresponding to the target data synchronization thread; if the preset priority of any existing data synchronization task is the same as the preset priority of the target data synchronization task, determining an execution sequence between the target data synchronization task and the any existing data synchronization task according to the acquisition time points of the notification messages corresponding to the target data synchronization task and the any existing data synchronization task.
Optionally, the apparatus further comprises:
a data extracting unit 510, configured to execute the target data synchronization task through the target data synchronization thread if a data amount of new data corresponding to the target data synchronization task is greater than a preset data amount threshold, so as to extract new data whose data amount is the preset data amount threshold from the message queue, and synchronize the extracted new data to the destination data table defined by output configuration information of the target data synchronization task;
a task ordering unit 512, configured to update the obtaining time point corresponding to the data synchronization task to a time point after the extracted new data synchronization is completed, and reorder the target data synchronization task in the scheduling queue.
Optionally, the input configuration information further includes information of a corresponding consumption group identifier; determining the target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task comprises: determining the target data synchronization task corresponding to the notification message according to the message queue topic information and consumption group identifier information in the input configuration information of each data synchronization task, and the target message queue topic and target consumption group identifier contained in the notification message.
Optionally, the apparatus further comprises:
a load obtaining unit 514 configured to obtain a load status of the target data synchronization thread;
a load sharing unit 516, configured to create a new data synchronization thread to perform load sharing on the data synchronization task processed by the target data synchronization thread when the load state is greater than a preset load threshold and no other data synchronization thread exists or no other idle data synchronization thread exists;
a thread releasing unit 518, configured to release the new data synchronization thread when the load state of the target data synchronization thread after load sharing is not greater than the preset load threshold.
Optionally, the new data synchronization thread is an idle thread in the data synchronization tool.
Optionally, the data synchronization task shared by the new data synchronization thread is a real-time data synchronization task whose waiting time processed by the target data synchronization thread exceeds a preset time.
Optionally, the extracting the new data includes: under the condition that the notification message indicates the starting position and the ending position of the newly added data, extracting the newly added data according to the starting position and the ending position; and in the case that the notification message indicates the starting position of the newly added data and does not indicate the ending position, extracting all the newly added data from the starting position.
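The two extraction cases can be sketched over an in-memory list standing in for one partition of the message queue. The function `extract`, the use of list indices as offsets, and the inclusive treatment of the ending position are assumptions made for illustration.

```python
def extract(records, start, end=None):
    """Return records from start to end (inclusive), or to the tail if no end."""
    if end is None:
        return records[start:]         # no ending position: take all new data
    return records[start:end + 1]      # assumed: the ending position is inclusive


records = list(range(10))              # offsets 0..9 of a hypothetical partition
bounded = extract(records, 3, 5)
unbounded = extract(records, 7)
```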
Optionally, the apparatus further comprises:
a connection holding unit 520, configured to maintain the TCP connections established between the data synchronization tool and, respectively, the message queue and the destination data table defined by the output configuration information, when the data adding frequency of the message queue is greater than a preset frequency threshold, where the TCP connections are used for data interaction between the data synchronization tool and, respectively, the message queue and the destination data table defined by the output configuration information.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
In an exemplary embodiment, there is also provided a non-transitory computer readable storage medium, e.g. a memory, comprising instructions executable by a processor of a data synchronization apparatus to implement a method as in any one of the above embodiments; for example, the method may comprise:
acquiring a notification message sent by a message queue, wherein the notification message is used for indicating that newly added data exists in a target message queue topic in the message queue; determining a target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task; and in a case that the target data synchronization task belongs to a target data synchronization thread, executing the target data synchronization task through the target data synchronization thread to extract the newly added data from the message queue and synchronize the newly added data to a destination data table defined by output configuration information of the target data synchronization task.
The non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc., which is not limited in this application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (15)

1. A data synchronization method, applied to a data synchronization tool running one or more data synchronization threads, wherein the data synchronization threads are configured with a plurality of data synchronization tasks, input configuration information of the data synchronization tasks comprises information of corresponding message queue topics, and output configuration information of the data synchronization tasks comprises information of corresponding destination data tables, the method comprising:
acquiring a notification message sent by a message queue, wherein the notification message is used for indicating that newly added data exists in a target message queue topic in the message queue;
determining a target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task;
and in a case that the target data synchronization task belongs to a target data synchronization thread, executing the target data synchronization task through the target data synchronization thread to extract the newly added data from the message queue and synchronize the newly added data to a destination data table defined by output configuration information of the target data synchronization task.
2. The method of claim 1, wherein the notification message includes a partition and an offset of the new data in the message queue, and wherein executing the target data synchronization task by a target data synchronization thread comprises:
executing, by the target data synchronization thread, the target data synchronization task if the target data synchronization task does not have a local partition corresponding to a partition in the notification message, or if an offset in the notification message is greater than a local offset of the target data synchronization task corresponding to a partition in the notification message; the local partition is a partition where the historical synchronous data is located, and the local offset is a maximum offset corresponding to the historical synchronous data in the partition where the historical synchronous data is located.
3. The method of claim 2, further comprising:
when the partition in the notification message is larger than the local partition of the target data synchronization task, after the newly added data is synchronized to a destination data table defined by output configuration information of the target data synchronization task, updating the local partition to the partition in the notification message and updating the local offset to the offset in the notification message;
and in a case that the offset in the notification message is larger than the local offset of the target data synchronization task, after the newly added data is synchronized to the destination data table defined by output configuration information of the target data synchronization task, updating the local offset to the offset in the notification message.
4. The method of claim 1, wherein executing the target data synchronization task via a target data synchronization thread comprises:
determining an execution sequence between the target data synchronization task and each existing data synchronization task in a task scheduling queue corresponding to the target data synchronization thread;
inserting the target data synchronization task into a corresponding position in the task scheduling queue according to the determined execution sequence;
and sequentially executing each data synchronization task in the scheduling queue through the target data synchronization thread.
5. The method of claim 4, wherein determining an execution order between the target data synchronization task and each data synchronization task already existing in a task scheduling queue corresponding to the target data synchronization thread comprises:
determining an execution sequence between the target data synchronization task and each existing data synchronization task according to the preset priority of the target data synchronization task and the preset priority of each existing data synchronization task in a task scheduling queue corresponding to the target data synchronization thread;
if the preset priority of any existing data synchronization task is the same as the preset priority of the target data synchronization task, determining the execution sequence between the target data synchronization task and any existing data synchronization task according to the acquisition time points of the notification messages corresponding to the target data synchronization task and any existing data synchronization task.
6. The method of claim 5, further comprising:
if the data volume of the newly added data corresponding to the target data synchronization task is larger than a preset data volume threshold, executing the target data synchronization task through the target data synchronization thread to extract newly added data whose data volume equals the preset data volume threshold from the message queue, and synchronizing the extracted newly added data to the destination data table defined by output configuration information of the target data synchronization task;
and updating the acquisition time point corresponding to the data synchronization task to the time point after the extracted newly added data are synchronized, and reordering the target data synchronization task in the scheduling queue.
7. The method of claim 1, wherein the input configuration information further comprises information of a corresponding consumption group identifier; determining the target data synchronization task matching the target message queue topic according to the input configuration information of each data synchronization task comprises:
determining the target data synchronization task corresponding to the notification message according to the message queue topic information and consumption group identifier information in the input configuration information of each data synchronization task, and the target message queue topic and target consumption group identifier contained in the notification message.
8. The method of claim 1, further comprising:
acquiring the load state of the target data synchronization thread;
when the load state is greater than a preset load threshold value and other data synchronization threads do not exist or other idle data synchronization threads do not exist, a new data synchronization thread is created to share the load of the data synchronization task processed by the target data synchronization thread;
and releasing the new data synchronization thread under the condition that the load state after the load sharing of the target data synchronization thread is not greater than a preset load threshold.
9. The method of claim 8, wherein the new data synchronization thread is an idle thread in the data synchronization tool.
10. The method as claimed in claim 8, wherein the data synchronization task shared by the new data synchronization thread is a real-time data synchronization task processed by the target data synchronization thread and having a waiting time longer than a preset time.
11. The method of claim 1, wherein the extracting the new data comprises:
under the condition that the notification message indicates the starting position and the ending position of the newly added data, extracting the newly added data according to the starting position and the ending position;
and in the case that the notification message indicates the starting position of the newly added data and does not indicate the ending position, extracting all the newly added data from the starting position.
12. The method of claim 1, further comprising:
and in a case that the data adding frequency of the message queue is greater than a preset frequency threshold, maintaining the TCP (Transmission Control Protocol) connections established between the data synchronization tool and, respectively, the message queue and the destination data table defined by the output configuration information, wherein the TCP connections are used for data interaction between the data synchronization tool and, respectively, the message queue and the destination data table defined by the output configuration information.
13. A data synchronization apparatus applied to a data synchronization tool running one or more data synchronization threads, wherein the data synchronization threads are configured with a plurality of data synchronization tasks, input configuration information of the data synchronization tasks includes information of corresponding message queue topics, and output configuration information of the data synchronization tasks includes information of corresponding destination data tables, the apparatus comprising:
a message acquisition unit, configured to acquire a notification message sent by a message queue, wherein the notification message is used for indicating that newly added data exists in a target message queue topic in the message queue;
a task determining unit, configured to determine a target data synchronization task matched with the target message queue topic according to the input configuration information of each data synchronization task;
and a task execution unit, configured to execute the target data synchronization task through the target data synchronization thread in the case that the target data synchronization task belongs to the target data synchronization thread, so as to extract the newly added data from the message queue and synchronize the newly added data to a target data table defined by the output configuration information of the target data synchronization task.
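The three units of claim 13 can be pictured as a small dispatch routine. The sketch below uses in-memory dictionaries in place of a real message queue and database. The names `SyncTask`, `determine_tasks` and `handle_notification`, as well as the layout of the notification dictionary, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SyncTask:
    name: str
    topic: str       # input configuration: message queue topic
    dest_table: str  # output configuration: destination data table
    thread: str      # name of the synchronization thread that owns the task


def determine_tasks(tasks: List[SyncTask], topic: str) -> List[SyncTask]:
    """Task determining unit: match tasks to the topic via input configuration."""
    return [t for t in tasks if t.topic == topic]


def handle_notification(thread_name: str, tasks: List[SyncTask],
                        notification: Dict[str, object],
                        queue: Dict[str, List[str]],
                        tables: Dict[str, List[str]]) -> List[str]:
    """Run on one thread: execute only the matching tasks that belong to it."""
    topic = str(notification["topic"])
    start = int(notification["start"])  # assumed zero-based start offset
    synced = []
    for task in determine_tasks(tasks, topic):
        if task.thread != thread_name:
            continue  # the task belongs to another synchronization thread
        new_rows = queue[topic][start:]  # extract the newly added data
        tables.setdefault(task.dest_table, []).extend(new_rows)
        synced.append(task.name)
    return synced
```

Keeping the topic match separate from the ownership check follows the split between the task determining unit and the task execution unit.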
14. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-12 by executing the executable instructions.
15. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1-12.
CN202110729604.XA 2021-06-29 2021-06-29 Data synchronization method and device Active CN113342898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110729604.XA CN113342898B (en) 2021-06-29 2021-06-29 Data synchronization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110729604.XA CN113342898B (en) 2021-06-29 2021-06-29 Data synchronization method and device

Publications (2)

Publication Number Publication Date
CN113342898A CN113342898A (en) 2021-09-03
CN113342898B CN113342898B (en) 2022-10-04

Family

ID=77481444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110729604.XA Active CN113342898B (en) 2021-06-29 2021-06-29 Data synchronization method and device

Country Status (1)

Country Link
CN (1) CN113342898B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893126A (en) * 2016-03-29 2016-08-24 华为技术有限公司 Task scheduling method and device
CN110535787A (en) * 2019-07-25 2019-12-03 北京奇艺世纪科技有限公司 Information consumption method, apparatus and readable storage medium
CN110569123A (en) * 2019-07-31 2019-12-13 苏宁云计算有限公司 Thread allocation method and device, computer equipment and storage medium
WO2020238365A1 (en) * 2019-05-31 2020-12-03 深圳前海微众银行股份有限公司 Message consumption method, apparatus and device, and computer storage medium
CN112698789A (en) * 2020-12-29 2021-04-23 广州鼎甲计算机科技有限公司 Data caching method, device, equipment and storage medium
CN112818022A (en) * 2021-02-25 2021-05-18 北京新致君阳信息技术有限公司 Data stream synchronization system, device and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190102223A1 (en) * 2017-09-29 2019-04-04 Niall Power System, Apparatus And Method For Real-Time Activated Scheduling In A Queue Management Device
JP7197794B2 (en) * 2019-03-28 2022-12-28 富士通株式会社 Information processing device and execution control program

Also Published As

Publication number Publication date
CN113342898A (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant