CN115794837B - Data table synchronization method, system, electronic equipment and storage medium

Data table synchronization method, system, electronic equipment and storage medium

Info

Publication number
CN115794837B
Authority
CN
China
Prior art keywords
data
weight
data table
target
capturers
Prior art date
Legal status
Active
Application number
CN202310050243.5A
Other languages
Chinese (zh)
Other versions
CN115794837A (en)
Inventor
邓祺
李振达
温文鎏
姬永飞
吕图
陈羽飞
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202310050243.5A
Publication of CN115794837A
Application granted
Publication of CN115794837B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention provides a data table synchronization method and a data table synchronization system, which are applied to a distributed incremental synchronization tool of a distributed fusion database cluster. The method comprises the following steps: acquiring the weight of each data table; arranging all the data tables in descending order according to the weights; traversing all the data tables, distributing each traversed target data table to a target capturer among a plurality of capturers of the distributed incremental synchronization tool, and determining new load data of the target capturer according to the current load data of the target capturer and the weight of the target data table; and reporting the load data of each capturer to an arrangement driving component of the distributed fusion database cluster, which issues the synchronization task for all the data tables to the plurality of capturers according to the load data. The embodiment of the invention improves the rationality of load balancing, avoids distributing the data tables to the capturers evenly by table count alone, reduces the synchronization pressure on some capturers, and makes full use of the capturers' resources.

Description

Data table synchronization method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data table synchronization method, a data table synchronization system, an electronic device, and a computer readable storage medium.
Background
TiDB is a distributed fusion database. TiDB Change Data Capture (TiCDC for short) is a distributed incremental synchronization tool for a TiDB cluster that performs real-time incremental synchronization of data through multi-node cooperation.
In the prior art, when incremental synchronization is performed on a plurality of data tables in a TiDB cluster, TiCDC performs load balancing according to the number of data tables, that is, all the data tables in one synchronization task are distributed equally among the nodes of the TiDB cluster. However, in practical applications, the size, read-write frequency and other characteristics of the data tables may differ, so allocating the same number of data tables to every node can leave some nodes under heavy incremental-synchronization pressure while the resources of other nodes sit idle.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention are directed to providing a method for synchronizing a data table and a corresponding system for synchronizing a data table that overcome or at least partially solve the foregoing problems.
In order to solve the above problems, an embodiment of the present invention discloses a data table synchronization method, which is applied to a distributed incremental synchronization tool of a distributed fusion database cluster, and the method includes: acquiring the weight of each data table to be synchronized; arranging all the data tables in descending order according to the weights; traversing all the data tables, distributing the traversed target data table to a target capturer among a plurality of capturers of the distributed incremental synchronization tool, and determining new load data of the target capturer according to the current load data of the target capturer and the weight of the target data table, wherein the load data represents the weight of the data tables to be synchronized by the corresponding capturer; and reporting the load data of each capturer to an arrangement driving component of the distributed fusion database cluster, so that the arrangement driving component issues the synchronization task for all the data tables to the plurality of capturers according to the load data.
Optionally, before the obtaining the weight of each data table to be synchronized, the method further includes: all of the data tables are equally distributed to a plurality of the traps.
Optionally, the acquiring the weight of each data table to be synchronized includes: acquiring the number of events of successful synchronization of each data table in the current synchronization period from each capturer; and taking the number of the events of each data table as the weight of each data table.
Optionally, the distributing the traversed target data table to a target capturer of the plurality of capturers of the distributed incremental synchronization tool includes: and distributing the target data table with the largest weight in all the traversed data tables to the target capturer with the least load data in the plurality of capturers.
Optionally, the determining new load data of the target capturer according to the current load data of the target capturer and the weight of the target data table includes: and adding the current load data of the target capturer with the weight of the target data table to obtain new load data of the target capturer.
Optionally, before said equally distributing all of said data tables to a plurality of said capturers, said method further comprises: and setting the weight of all the data tables as a preset initial weight value.
Optionally, before the reporting the load data of each of the capturers to the arrangement driving component of the distributed fusion database cluster, the method further includes: judging whether all the data tables are traversed; if all the data tables are traversed, executing the operation of reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster; and if not, continuing to execute the operation of traversing all the data tables.
The embodiment of the invention also discloses a data table synchronization system, which is applied to a distributed incremental synchronization tool of a distributed fusion database cluster, and the system comprises: the weight acquisition module, used for acquiring the weight of each data table to be synchronized; the data table ordering module, used for arranging all the data tables in descending order according to the weights; the load balancing module, used for traversing all the data tables, distributing the traversed target data table to a target capturer among a plurality of capturers of the distributed incremental synchronization tool, and determining new load data of the target capturer according to the current load data of the target capturer and the weight of the target data table, wherein the load data represents the weight of the data tables to be synchronized by the corresponding capturer; and the load reporting module, used for reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster, so that the arrangement driving component issues the synchronization task for all the data tables to the plurality of capturers according to the load data.
Optionally, the system further comprises: and the data table average distribution module is used for evenly distributing all the data tables to a plurality of capturers before the weight acquisition module acquires the weight of each data table to be synchronized.
Optionally, the weight acquisition module includes: the event number acquisition module is used for acquiring the number of events of successful synchronization of each data table in the current synchronization period from each capturer; and the data table weight determining module is used for taking the event number of each data table as the weight of each data table.
Optionally, the load balancing module includes: and the target data table distribution module is used for distributing the target data table with the largest weight in all the traversed data tables to the target capturer with the least load data in the plurality of capturers.
Optionally, the load balancing module includes: and the load weight adding module is used for adding the current load data of the target capturer and the weight of the target data table to obtain new load data of the target capturer.
Optionally, the system further comprises: and the weight initialization module is used for setting the weight of all the data tables to be a preset initial weight value before the data table average distribution module distributes all the data tables to a plurality of the capturers in an average way.
Optionally, the system further comprises: the traversal completion judging module is used for judging whether all the data tables are traversed or not before the load reporting module reports the load data of each capturer to the arrangement driving assembly of the distributed fusion database cluster; the load reporting module is used for reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster when all the data tables are traversed; and the load balancing module is used for traversing all the data tables when all the data tables are not traversed.
The embodiment of the invention also discloses an electronic device, which comprises: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the electronic device to perform the method of synchronizing a data table as described above.
The embodiment of the invention also discloses a computer readable storage medium, which stores a computer program for causing a processor to execute the data table synchronization method.
The embodiment of the invention has the following advantages:
the data table synchronization scheme provided by the embodiment of the invention is applied to the TiCDC of a TiDB cluster. The TiCDC can acquire the weight of each data table to be synchronized, arrange all the data tables in descending order according to the weights, traverse all the data tables, distribute each traversed target data table to a target Capture among the multiple capturers (Capture) of the TiCDC, and determine new load data of the target Capture according to the current load data of the target Capture and the weight of the target data table. The load data represents the weight of the data tables to be synchronized by the corresponding Capture. Then, the TiCDC reports the load data of each Capture to the Placement Driver (PD) component of the TiDB cluster, so that the PD component can issue the synchronization task for all the data tables to the multiple Captures according to the load data of each Capture.
According to the embodiment of the invention, all the data tables are arranged in descending order according to the weight of each data table, each traversed target data table is then assigned to a target Capture and new load data of the target Capture is determined, and the load data of each Capture is finally reported to the PD component, which issues the synchronization task to the Captures. Because the data tables are distributed according to their weights, the embodiment of the invention improves the rationality of load balancing, avoids distributing the data tables to the Captures evenly by table count alone, reduces the synchronization pressure on some Captures, and makes full use of Capture resources.
Drawings
FIG. 1 is a flow chart of steps of a method for synchronizing a data table according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a TiCDC-based data table synchronization scheme according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a TiCDC-based data table synchronization task generation scheme according to an embodiment of the present invention;
fig. 4 is a block diagram of a data table synchronization system according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1, a flowchart of steps of a method for synchronizing a data table according to an embodiment of the present invention is shown. The synchronization method of the data table can be applied to TiCDC. The synchronization method of the data table specifically comprises the following steps:
step 101, obtaining the weight of each data table to be synchronized.
In the embodiment of the invention, a weight is set for each data table, and the weights of different data tables may be the same or different. The weight of each data table is a quantized value used to calculate the load of a capturer (Capture) in the subsequent load balancing stage; the value itself has no other practical meaning. In practical applications, the number of events successfully synchronized for each data table in the previous synchronization period may be used as the weight of that data table, so the weight of a data table may be an integer.
Step 102, arranging all the data tables in descending order according to the weights.
In the embodiment of the present invention, all data tables may be arranged in descending order according to the size of the weights, that is, data tables with relatively large weights are arranged in front and data tables with relatively small weights are arranged in back.
Step 103, traversing all the data tables, distributing each traversed target data table to a target capturer among the plurality of capturers of the distributed incremental synchronization tool, and determining new load data of the target capturer according to the current load data of the target capturer and the weight of the target data table.
In an embodiment of the present invention, all data tables may be traversed in the order in which they are arranged. In practical applications, the data table with the largest weight, i.e. the target data table, is traversed first. The TiCDC may include a plurality of capturers (Capture). A Capture is a running process of TiCDC, and a TiCDC cluster is typically made up of multiple Captures. There are two Capture roles in a TiCDC cluster: the Owner and the Processor. One TiCDC cluster has one and only one Owner, which is responsible for the internal scheduling of the TiCDC cluster; if the Owner fails, the other Captures spontaneously elect a new Owner. The target Capture can be understood as the Capture with the smallest load data among the plurality of Captures. The load data represents the total weight of the data tables to be synchronized by the corresponding Capture. For example, if the data tables to be synchronized by Capture1 are data tables b01, b02, and b03, with weights q01, q02, and q03 respectively, then the load data of Capture1 is q01+q02+q03. In addition to assigning the target data table to the target Capture, the load data of the target Capture needs to be updated. Specifically, the new load data of the target Capture can be determined according to the current load data of the target Capture and the weight of the target data table.
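The assignment rule of step 103 can be illustrated with a small Go sketch. This is a minimal illustration under assumed names and types (Table, assignTables), not TiCDC's actual implementation: the tables are sorted by weight in descending order, each table is given to the Capture whose load data is currently smallest, and that Capture's load data is then increased by the table's weight.

```go
package main

import "sort"

// Table is a data table to be synchronized; Weight is the number of events
// successfully synchronized in the previous synchronization period.
type Table struct {
	Name   string
	Weight int
}

// assignTables sorts the tables by weight in descending order and greedily
// assigns each table to the capturer whose current load data is smallest,
// then adds the table's weight to that capturer's load data.
func assignTables(tables []Table, captureCount int) (map[int][]Table, []int) {
	sort.Slice(tables, func(i, j int) bool { return tables[i].Weight > tables[j].Weight })

	loads := make([]int, captureCount)          // current load data per capturer
	plan := make(map[int][]Table, captureCount) // capturer index -> assigned tables
	for _, t := range tables {
		target := 0 // index of the capturer with the least load data
		for i := 1; i < captureCount; i++ {
			if loads[i] < loads[target] {
				target = i
			}
		}
		plan[target] = append(plan[target], t)
		loads[target] += t.Weight // new load data = current load data + table weight
	}
	return plan, loads
}
```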
Step 104, reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster, so that the arrangement driving component issues the synchronization task for all the data tables to the plurality of capturers according to the load data.
In an embodiment of the present invention, after all the data tables have been traversed, each data table has been assigned to its corresponding target Capture and the load data of each Capture has been updated accordingly. The updated load data of each Capture is then reported to the PD component of the TiDB cluster. The PD component is the global central master node of TiDB and is responsible for scheduling the whole TiDB cluster. After the PD component obtains the load data of all the Captures, it may issue the synchronization task for all the data tables to each Capture according to that load data. In practical applications, the synchronization task may include, but is not limited to: each data table and the Capture to which it is assigned. For example, a synchronization task may include data tables b01, b02, b03, b04, and b05, with data tables b01 and b02 assigned to Capture1, data table b03 assigned to Capture2, and data tables b04 and b05 assigned to Capture3.
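As a sketch of the kind of mapping such a synchronization task could carry, the per-Capture plan can be flattened into a table-to-Capture mapping that each Capture later reads back. The SyncTask structure and its field names below are assumptions made for illustration, not the actual schema used by TiCDC or the PD component.

```go
// SyncTask is a hypothetical representation of a synchronization task:
// for every data table, the identifier of the Capture assigned to it.
type SyncTask struct {
	Assignments map[string]string // table name -> Capture ID, e.g. "b03" -> "capture-2"
}

// buildSyncTask flattens a per-Capture plan (Capture ID -> table names)
// into a SyncTask from which each Capture can look up its own tables.
func buildSyncTask(plan map[string][]string) SyncTask {
	task := SyncTask{Assignments: make(map[string]string)}
	for captureID, tables := range plan {
		for _, table := range tables {
			task.Assignments[table] = captureID
		}
	}
	return task
}
```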
The data table synchronization scheme provided by the embodiment of the invention is applied to the TiCDC of a TiDB cluster. The TiCDC can acquire the weight of each data table to be synchronized, arrange all the data tables in descending order according to the weights, traverse all the data tables, distribute each traversed target data table to a target Capture among the multiple capturers (Capture) of the TiCDC, and determine new load data of the target Capture according to the current load data of the target Capture and the weight of the target data table. The load data represents the weight of the data tables to be synchronized by the corresponding Capture. Then, the TiCDC reports the load data of each Capture to the Placement Driver (PD) component of the TiDB cluster, so that the PD component can issue the synchronization task for all the data tables to the multiple Captures according to the load data of each Capture.
According to the embodiment of the invention, all the data tables are arranged in descending order according to the weight of each data table, each traversed target data table is then assigned to a target Capture and new load data of the target Capture is determined, and the load data of each Capture is finally reported to the PD component, which issues the synchronization task to the Captures. Because the data tables are distributed according to their weights, the embodiment of the invention improves the rationality of load balancing, avoids distributing the data tables to the Captures evenly by table count alone, reduces the synchronization pressure on some Captures, and makes full use of Capture resources.
In one exemplary embodiment of the present invention, all data tables may be equally distributed to multiple Captures before the weight of each data table to be synchronized is acquired. At this time, the weight of each data table can be regarded as the same. If all the data tables cannot be equally distributed to the plurality of Capture, all the data tables may be distributed to the plurality of Capture as evenly as possible according to the actual situation. For example, all data tables include four data tables, data tables b01, b02, b03, and b04, respectively. The data tables b01, b02, b03, and b04 can be equally assigned to Capture1 and Capture2. Specifically, data tables b01 and b02 may be assigned to Capture1, and data tables b03 and b04 may be assigned to Capture2.
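A minimal sketch of such an even initial split, using contiguous near-equal chunks as in the b01 to b04 example above; the function name and the exact chunking rule are illustrative, and any split that is as even as possible would serve.

```go
// equalAssign splits the tables into near-equal contiguous chunks, one chunk
// per capturer. With tables b01..b04 and two capturers, capturer 0 receives
// b01 and b02 while capturer 1 receives b03 and b04.
func equalAssign(tables []string, captureCount int) [][]string {
	plan := make([][]string, captureCount)
	base := len(tables) / captureCount
	extra := len(tables) % captureCount // the first `extra` capturers get one more table
	idx := 0
	for c := 0; c < captureCount; c++ {
		n := base
		if c < extra {
			n++
		}
		plan[c] = tables[idx : idx+n]
		idx += n
	}
	return plan
}
```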
In an exemplary embodiment of the present invention, one implementation of obtaining the weight of each data table to be synchronized is to obtain, from each capturer, the number of events of each data table that were successfully synchronized in the current synchronization period, and use that number of events as the weight of the data table. In practical applications, in each synchronization period each Capture counts the number of events successfully synchronized for each data table allocated to it and reports the count to the PD component. The Owner acquires the number of successfully synchronized events of each data table from the PD component and takes it as the weight of that data table.
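A per-table counter for this step might look like the sketch below, assuming that event completion is observed through a callback; the type and method names are illustrative and are not TiCDC's internal API.

```go
package main

import "sync/atomic"

// tableEventCounter tracks how many events of one data table were
// successfully synchronized in the current synchronization period.
type tableEventCounter struct {
	synced atomic.Int64
}

// OnEventSynced is called once for each successfully synchronized event.
func (c *tableEventCounter) OnEventSynced() {
	c.synced.Add(1)
}

// ReportAndReset returns the count for the period that just ended, to be
// reported to the PD component and used as the table's weight for the next
// period, and resets the counter for the new period.
func (c *tableEventCounter) ReportAndReset() int64 {
	return c.synced.Swap(0)
}
```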
In an exemplary embodiment of the present invention, one implementation of assigning the traversed target data table to a target capturer among the plurality of capturers of the distributed incremental synchronization tool is to assign the target data table with the largest weight among all traversed data tables to the target capturer with the least load data among the plurality of capturers. For example, sorting all the data tables by weight from highest to lowest gives the order b02, b01, b03, b05, b04; that is, among the data tables b01 to b05, the data table b02 has the largest weight and the data table b04 the smallest. When all the data tables are traversed in this order, the data table b02 with the largest weight is reached first and is the target data table. If the load data of Capture1 is the smallest at this time, Capture1 is the target Capture, and the data table b02 is assigned to Capture1.
In an exemplary embodiment of the present invention, before the load data of each capturer is reported to the arrangement driving component of the distributed fusion database cluster, it is determined whether all the data tables have been traversed; if so, the operation of reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster is executed, and if not, the operation of traversing all the data tables continues. In the above example, after traversing to the data table b02, it is determined whether all the data tables have been traversed; since other data tables remain, not all the data tables have been traversed and the traversal continues. When the data table b01 is reached, it becomes the target data table. If the load data of Capture1 is still the smallest at this time, Capture1 is the target Capture, and the data table b01 is assigned to Capture1.
In an exemplary embodiment of the present invention, one implementation manner of determining the new load data of the target capturer according to the current load data of the target capturer and the weight of the target data table is to add the current load data of the target capturer and the weight of the target data table to obtain the new load data of the target capturer. For example, the current load data of the target Capture is q01+q02+q03, where q01 is the weight of the data table b01, q02 is the weight of the data table b02, and q03 is the weight of the data table b03. The weight of the target data table b04 is q04. The new load data of the target Capture is q01+q02+q03+q04.
In one exemplary embodiment of the present invention, the weights of all data tables may be set to a preset initial weight value before the all data tables are equally distributed to the plurality of capturers. In practical applications, the initial weight value may be any value, for example, the initial weight value is 1. I.e. the weights of all data tables are set to 1.
Based on the above description of an embodiment of a data table synchronization method, a TiCDC-based data table synchronization scheme is described below. The TiCDC-based data table synchronization scheme may involve TiCDC and TiDB.
Referring to fig. 2, a flow chart of a TiCDC-based data table synchronization scheme according to an embodiment of the present invention is shown.
In step 201, an initial synchronization task is created.
The initial synchronization tasks may include, but are not limited to: a plurality of data tables to be synchronized, a plurality of Capture for synchronizing the plurality of data tables, and each synchronization period.
Step 202, setting the weight of each data table to 1, and equally distributing all the data tables to a plurality of Capture.
In practical applications, all the data tables may be equally distributed to the multiple Captures by the Owner.
Step 203, acquiring the weight of each data table from the Captures, and performing load balancing according to the weights to obtain the synchronization task.
In each synchronization period, each Capture counts the number of events successfully synchronized for every data table allocated to it in the current period, and reports that count to the PD component as the weight of the data table. The Owner obtains the weights of the data tables from the PD component and performs load balancing according to the weights to obtain the synchronization task.
Step 204, upload the sync task to the PD component.
The Owner uploads the sync task to the PD component.
In step 205, each Capture obtains a synchronization task from the PD component.
Specifically, each Capture may obtain a data table from the PD component that each needs to be synchronized.
Step 206, starting synchronization and counting the weight of each data table.
In each synchronization period, each Capture counts the number of events successfully synchronized for every data table allocated to it in that period; the counted number of successfully synchronized events is used as the table's weight for the next synchronization period.
Step 207, determining whether the synchronous task scheduling event is triggered.
When the synchronous task scheduling event is triggered, step 208 is performed; if the sync task scheduling event is not triggered, step 206 is re-executed.
In step 208, each Capture reports the counted number of successfully synchronized events to the PD component.
In step 209, the Owner obtains the number of successfully synchronized events of each data table from the PD component and uses that number as the weight of the data table.
Step 209 is followed by a further execution of step 203.
Referring to fig. 3, a flow diagram of a TiCDC-based data table synchronization task generation scheme according to an embodiment of the present invention is shown.
Step 301, obtaining the weight of each data table.
Step 302, arranging all data tables in descending order according to the weight.
Step 303, traversing all the data tables.
Step 304, distributing the traversed data table to the Capture with the least current load data, and adding that Capture's current load data to the weight of the traversed data table to obtain its new load data.
Step 305, determining whether all the data tables are traversed.
If all the data tables are traversed, executing step 306; if all the data tables are not traversed, step 304 is performed.
Step 306, generating the synchronization task and reporting it to the PD component.
The synchronization tasks may include, but are not limited to: each Capture requires a synchronized data table.
In an exemplary embodiment of the invention, suppose there are 3 Captures in a TiDB cluster and a synchronization task requires incremental synchronization of 30 data tables. When the incremental synchronization task is created, the 30 data tables are first distributed evenly to the 3 Captures for synchronization. During synchronization, each Capture counts the number of events completed for each of its data tables; when one synchronization period has elapsed, each Capture reports the counted number of events to the PD component. The Owner obtains the number of events of each data table from the PD component: if the data table b1 has 5000 events, the data table b2 has 4800 events, and each of the data tables b3 to b30 has 10 events, then after load balancing the synchronization task is allocated as follows: Capture1 is responsible for synchronizing data table b1, Capture2 is responsible for synchronizing data table b2, and Capture3 is responsible for synchronizing data tables b3 to b30. The Owner reports this allocation scheme to the PD component, each Capture then obtains from the PD component the data tables it is responsible for synchronizing, and synchronization of each data table continues from its previous progress.
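Continuing the assignTables sketch given under step 103 (assumed to live in the same package; the table names and event counts below are taken from the example above), a short driver reproduces this allocation:

```go
package main

import "fmt"

func main() {
	// Weights are the previous period's counts of successfully synchronized events.
	tables := []Table{{Name: "b1", Weight: 5000}, {Name: "b2", Weight: 4800}}
	for i := 3; i <= 30; i++ {
		tables = append(tables, Table{Name: fmt.Sprintf("b%d", i), Weight: 10})
	}

	plan, loads := assignTables(tables, 3)
	for c := 0; c < 3; c++ {
		fmt.Printf("Capture%d: load=%d, tables=%d\n", c+1, loads[c], len(plan[c]))
	}
	// Prints: Capture1: load=5000, tables=1; Capture2: load=4800, tables=1;
	// Capture3: load=280, tables=28, matching the allocation described above.
}
```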
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 4, a block diagram of a data table synchronization system according to an embodiment of the present invention is shown, where the data table synchronization system is applied to a distributed incremental synchronization tool of a distributed fusion database cluster, and the data table synchronization system may specifically include the following modules.
A weight obtaining module 41, configured to obtain a weight of each data table to be synchronized;
a data table ordering module 42, configured to order all the data tables in descending order according to the weights;
the load balancing module 43 is configured to traverse all the data tables, distribute the traversed target data table to a target capturer of the multiple capturers of the distributed incremental synchronization tool, and determine new load data of the target capturer according to current load data of the target capturer and weights of the target data table, where the load data represents weights of the corresponding data tables to be synchronized by the capturer;
and the load reporting module 44 is configured to report the load data of each capturer to the arrangement driving component of the distributed fusion database cluster, so that the arrangement driving component issues the synchronization task of all the data tables to a plurality of capturers according to the load data.
In an exemplary embodiment of the invention, the system further comprises:
the data table average allocation module is configured to, before the weight obtaining module 41 obtains the weight of each data table to be synchronized, average allocate all the data tables to the plurality of capturers.
In an exemplary embodiment of the present invention, the weight obtaining module 41 includes:
the event number acquisition module is used for acquiring the number of events of successful synchronization of each data table in the current synchronization period from each capturer;
and the data table weight determining module is used for taking the event number of each data table as the weight of each data table.
In an exemplary embodiment of the present invention, the load balancing module 43 includes:
and the target data table distribution module is used for distributing the target data table with the largest weight in all the traversed data tables to the target capturer with the least load data in the plurality of capturers.
In an exemplary embodiment of the present invention, the load balancing module 43 includes:
and the load weight adding module is used for adding the current load data of the target capturer and the weight of the target data table to obtain new load data of the target capturer.
In an exemplary embodiment of the invention, the system further comprises:
and the weight initialization module is used for setting the weight of all the data tables to be a preset initial weight value before the data table average distribution module distributes all the data tables to a plurality of the capturers in an average way.
In an exemplary embodiment of the invention, the system further comprises:
the traversal completion judging module is configured to judge whether all the data tables are traversed before the load reporting module 44 reports the load data of each capturer to the arrangement driving component of the distributed fusion database cluster;
the load reporting module 44 is configured to report, when all the data tables are traversed, load data of each capturer to an arrangement driving component of the distributed fusion database cluster;
the load balancing module 43 is configured to traverse all the data tables when all the data tables are not traversed.
For system embodiments, the description is relatively simple as it is substantially similar to method embodiments, and reference is made to the description of method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing has described in detail a data table synchronization method and a data table synchronization system provided by the present invention, and specific examples have been applied to illustrate the principles and embodiments of the present invention, where the foregoing examples are only for aiding in understanding the method and core idea of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Claims (15)

1. A data table synchronization method, characterized in that the method is applied to a distributed incremental synchronization tool of a distributed fusion database cluster, the method comprising:
acquiring the weight of each data table to be synchronized, wherein the value of the number of successful synchronization events of each data table is used as the weight of the data table, and the weight of the data table is an integer;
arranging all the data tables in descending order according to the weights;
traversing all the data tables, distributing the traversed target data tables to target capturers in a plurality of capturers of the distributed incremental synchronization tool, and determining new load data of the target capturers according to the current load data of the target capturers and the weight of the target data tables, wherein the load data represents the weight of the corresponding data tables to be synchronized of the capturers;
and reporting the load data of each capturer to an arrangement driving component of the distributed fusion database cluster, so that the arrangement driving component can send the synchronous task of all the data tables to a plurality of capturers according to the load data.
2. The method of claim 1, wherein prior to the obtaining the weight of each data table to be synchronized, the method further comprises:
all of the data tables are equally distributed to the plurality of capturers.
3. The method of claim 2, wherein the obtaining the weight of each data table to be synchronized comprises:
acquiring the number of events of successful synchronization of each data table in the current synchronization period from each capturer;
and taking the number of the events of each data table as the weight of each data table.
4. The method of claim 1, wherein the assigning the traversed target data table to a target capturer of the plurality of capturers of the distributed incremental synchronization tool comprises:
and distributing the target data table with the largest weight in all the traversed data tables to the target capturer with the least load data in the plurality of capturers.
5. The method of claim 1, wherein the determining new load data for the target capturer based on the current load data for the target capturer and the weight of the target data table comprises:
and adding the current load data of the target capturer with the weight of the target data table to obtain new load data of the target capturer.
6. The method of claim 2, wherein prior to said equally distributing all of said data tables to a plurality of said capturers, said method further comprises:
and setting the weight of all the data tables as a preset initial weight value.
7. The method of claim 1, wherein prior to said reporting load data for each of said capturers to the placement driver component of the distributed fusion database cluster, the method further comprises:
judging whether all the data tables are traversed;
if all the data tables are traversed, executing the operation of reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster;
and if not, continuing to execute the operation of traversing all the data tables.
8. A data table synchronization system, characterized in that the system is applied to a distributed incremental synchronization tool of a distributed fusion database cluster, the system comprising:
the weight acquisition module is used for acquiring the weight of each data table to be synchronized, wherein the value of the number of successful synchronization events of each data table is used as the weight of the data table, and the weight of the data table is an integer;
the data table ordering module is used for ordering all the data tables in a descending order according to the weight;
the load balancing module is used for traversing all the data tables, distributing the traversed target data tables to target capturers in a plurality of capturers of the distributed incremental synchronization tool, and determining new load data of the target capturers according to the current load data of the target capturers and the weight of the target data tables, wherein the load data represents the weight of the corresponding data tables to be synchronized of the capturers;
and the load reporting module is used for reporting the load data of each capturer to the arrangement driving component of the distributed fusion database cluster, so that the arrangement driving component can send the synchronous tasks of all the data tables to a plurality of capturers according to the load data.
9. The system of claim 8, wherein the system further comprises:
and the data table average distribution module is used for evenly distributing all the data tables to a plurality of capturers before the weight acquisition module acquires the weight of each data table to be synchronized.
10. The system of claim 9, wherein the weight acquisition module comprises:
the event number acquisition module is used for acquiring the number of events of successful synchronization of each data table in the current synchronization period from each capturer;
and the data table weight determining module is used for taking the event number of each data table as the weight of each data table.
11. The system of claim 8, wherein the load balancing module comprises:
and the target data table distribution module is used for distributing the target data table with the largest weight in all the traversed data tables to the target capturer with the least load data in the plurality of capturers.
12. The system of claim 8, wherein the load balancing module comprises:
and the load weight adding module is used for adding the current load data of the target capturer and the weight of the target data table to obtain new load data of the target capturer.
13. The system of claim 9, wherein the system further comprises:
and the weight initialization module is used for setting the weight of all the data tables to be a preset initial weight value before the data table average distribution module distributes all the data tables to a plurality of the capturers in an average way.
14. An electronic device, comprising:
one or more processors; and
one or more machine readable media having instructions stored thereon, which when executed by the one or more processors, cause the electronic device to perform the method of synchronizing data tables of any of claims 1 to 7.
15. A computer readable storage medium storing a computer program for causing a processor to perform the method of synchronizing data tables according to any of claims 1 to 7.
CN202310050243.5A 2023-02-01 2023-02-01 Data table synchronization method, system, electronic equipment and storage medium Active CN115794837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310050243.5A CN115794837B (en) 2023-02-01 2023-02-01 Data table synchronization method, system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310050243.5A CN115794837B (en) 2023-02-01 2023-02-01 Data table synchronization method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115794837A CN115794837A (en) 2023-03-14
CN115794837B 2023-06-23

Family

ID=85429439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310050243.5A Active CN115794837B (en) 2023-02-01 2023-02-01 Data table synchronization method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115794837B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778225B (en) * 2015-03-27 2017-12-12 浙江大学 A kind of method of synchrodata in more storage systems of unstructured data
CN108712508A (en) * 2018-06-06 2018-10-26 亚信科技(中国)有限公司 A kind of load-balancing method and device
CN109032801B (en) * 2018-07-26 2022-02-18 郑州云海信息技术有限公司 Request scheduling method, system, electronic equipment and storage medium
CN109726191B (en) * 2018-12-12 2021-02-02 中国联合网络通信集团有限公司 Cross-cluster data processing method and system and storage medium
CN112506669B (en) * 2021-01-29 2021-06-18 浙江大华技术股份有限公司 Task allocation method and device, storage medium and electronic equipment
CN113259470B (en) * 2021-06-03 2021-09-24 长视科技股份有限公司 Data synchronization method and data synchronization system
CN113626516A (en) * 2021-06-21 2021-11-09 中科恒运股份有限公司 Data increment synchronization method and system

Also Published As

Publication number Publication date
CN115794837A (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN106817408B (en) Distributed server cluster scheduling method and device
CN104050042B (en) The resource allocation methods and device of ETL operations
CN106843745A (en) Capacity expansion method and device
CN102025753B (en) Load balancing method and equipment for data resources of servers
CN107515786A (en) Resource allocation methods, master device, from device and distributed computing system
CN112463375A (en) Data processing method and device
CN110941602B (en) Database configuration method and device, electronic equipment and storage medium
CN101719079A (en) Method and device for processing tasks
CN112579692B (en) Data synchronization method, device, system, equipment and storage medium
CN103810045A (en) Resource allocation method, resource manager, resource server and system
CN111324435A (en) Distributed task scheduling and registering method, device and distributed task scheduling system
CN112463395A (en) Resource allocation method, device, equipment and readable storage medium
CN110868435B (en) Bare metal server scheduling method and device and storage medium
CN104410511B (en) A kind of server management method and system
CN115951983A (en) Task scheduling method, device and system and electronic equipment
CN106156198A (en) Task executing method based on distributed data base and device
CN115794837B (en) Data table synchronization method, system, electronic equipment and storage medium
WO2017161820A1 (en) Server grouping management method, device, and electronic apparatus
CN102932389B (en) A kind of request processing method, device and server system
CN111064586B (en) Distributed parallel charging method
CN110868330B (en) Evaluation method, device and evaluation system for CPU resources which can be divided by cloud platform
CN109032779B (en) Task processing method and device, computer equipment and readable storage medium
CN103338246A (en) Method and system for selecting virtual machine in allocation process of infrastructure construction cloud resource
CN111008071A (en) Task scheduling system, method and server
CN108400999B (en) Load balancing method and device for mirror image nodes of database cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 100007 room 205-32, floor 2, building 2, No. 1 and No. 3, qinglonghutong a, Dongcheng District, Beijing

Patentee after: Tianyiyun Technology Co.,Ltd.

Address before: 100093 Floor 4, Block E, Xishan Yingfu Business Center, Haidian District, Beijing

Patentee before: Tianyiyun Technology Co.,Ltd.
