CN109388677B - Method, device and equipment for synchronizing data among clusters and storage medium thereof - Google Patents

Info

Publication number
CN109388677B
Authority
CN
China
Prior art keywords
partition
offset
target
data
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810978213.XA
Other languages
Chinese (zh)
Other versions
CN109388677A (en)
Inventor
陈文彪
林国峰
曾宪成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201810978213.XA
Publication of CN109388677A
Application granted
Publication of CN109388677B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Abstract

The application discloses a method, a device and equipment for synchronizing data among clusters, and a storage medium thereof. The method comprises the following steps: reading a target message offset of a first partition of a first topic of a target cluster; comparing the target message offset with the earliest message offset of a second partition of a second topic of the source cluster, wherein the second topic and the first topic have the same topic name, and the first partition and the second partition have the same partition number; if the target message offset is less than the earliest message offset, filling data into the first partition; and synchronizing the primary copy of the second partition to the first partition. According to the technical solution of the embodiments of the application, the data filling processing solves the prior-art problem that the topic partition data of the source cluster and the target cluster are unevenly distributed.

Description

Method, device and equipment for synchronizing data among clusters and storage medium thereof
Technical Field
The present application relates generally to the field of big data processing technologies, in particular to Kafka technology, and more particularly to a method, an apparatus, a device and a storage medium for synchronizing data among clusters.
Background
With the development of big data, technologies such as massively parallel processing databases, data mining, distributed file systems, distributed databases and cloud computing platforms are continuously evolving.
Kafka is a high-throughput distributed publish-subscribe messaging system in which messages are partitioned across Kafka servers and consumed by consumer clusters. When existing synchronization tools, such as the MirrorMaker tool, are used, the topic partition messages of the target cluster and the topic partition messages of the source cluster are sometimes distributed unevenly.
Disclosure of Invention
In view of the above-mentioned drawbacks and deficiencies of the prior art, it is desirable to provide a solution for data synchronization between clusters.
In a first aspect, an embodiment of the present application provides a method for synchronizing data between clusters, where the method includes:
reading a target message offset of a first partition of a first topic of a target cluster;
comparing the target message offset with the earliest message offset of a second partition of a second topic of the source cluster, wherein the second topic and the first topic have the same topic name, and the first partition and the second partition have the same partition number;
if the target message offset is less than the earliest message offset, filling data into the first partition; and
synchronizing the primary copy of the second partition to the first partition.
In a second aspect, an embodiment of the present application provides an apparatus for synchronizing data between clusters, where the apparatus includes:
a target offset reading unit, configured to read a target message offset of a first partition of a first topic of a target cluster;
an offset comparison unit, configured to compare the target message offset with the earliest message offset of a second partition of a second topic of the source cluster, wherein the second topic and the first topic have the same topic name, and the first partition and the second partition have the same partition number;
a data padding unit, configured to pad data into the first partition if the target message offset is less than the earliest message offset;
and a synchronization unit, configured to synchronize the primary copy of the second partition to the first partition.
In a third aspect, embodiments of the present application provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method as described in embodiments of the present application when executing the program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method described in the embodiments of the present application.
According to the data synchronization solution provided by the embodiments of the application, when addressing data synchronization between Kafka clusters, the data filling processing solves the prior-art problem that the offsets of the same partition of the same topic cannot be placed in one-to-one correspondence between the clusters.
Furthermore, the progress of data synchronization is monitored by obtaining the state of the primary copy being written into the partition, so that the user can conveniently check the progress, which improves the user experience. Data processing efficiency is improved by judging whether the partition offset has already been stored, and data security and consistency are ensured by synchronizing the data of the primary copy.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flowchart illustrating a method for synchronizing data between clusters according to an embodiment of the present application;
fig. 2 is a schematic flowchart illustrating a method for synchronizing data between clusters according to another embodiment of the present application;
fig. 3 shows a schematic structural block diagram of an inter-cluster data synchronization apparatus provided in the embodiment of the present application;
fig. 4 shows a schematic structural block diagram of an inter-cluster data synchronization apparatus according to yet another embodiment of the present application;
fig. 5 shows a schematic structural diagram of a computer system suitable for implementing the terminal device of the embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a data synchronization method between clusters according to an embodiment of the present application.
As shown in fig. 1, the method includes:
step 110, read the target message offset of the first partition of the first topic of the target cluster.
Data is divided into different topics according to type, and each topic stores data of the same type in different partitions. Each partition consists of a series of ordered, immutable messages that are continuously appended to the partition, and each message in the partition carries a consecutive sequence number, namely an offset. The offset determines the unique position of the message within the partition. For example, the messages of a Kafka partition are stored in the partition in increasing order, and each message has an offset that records how many messages the partition currently stores. Besides recording the number of stored messages, a retention time may also be set for the messages of a Kafka partition; when a message exceeds the retention time it is cleared, but the offset corresponding to the message is preserved. Periodically clearing or deleting consumed messages reduces disk usage: useless files are deleted promptly, which effectively improves disk utilization.
In the embodiment of the application, during data synchronization the processing device reads the target message offset of the first partition of the first topic of the target cluster. The target cluster is the target object to be synchronized. The first topic represents, for example, any one topic in the target cluster, and the first partition represents, for example, any one partition under the first topic. Note that the terms first, second and the like appearing in the embodiments of the present application are used only to distinguish like elements and should not be construed as limiting the order or timing of operations.
In this embodiment of the present application, the target message offset is, for example, the offset of the current message of the target cluster.
Step 120, comparing the target message offset with an earliest message offset of a second partition of a second topic of the source cluster, where the second topic and the first topic have the same topic name, and the first partition and the second partition have the same partition number.
In the embodiment of the application, after the earliest message offset of the second partition of the second topic of the source cluster is determined, the target message offset and the earliest message offset are compared. The source cluster is the source object of the synchronization process; that is, data is synchronized from the source object to the target object. The second topic has the same topic name as the first topic of the target cluster, which means that the first topic and the second topic represent the same topic; the second partition has the same partition number as the first partition, which means that they represent the same partition under that topic. For example, the messages corresponding to partition 0 of topic A of the source Kafka cluster are synchronized to partition 0 of topic A of the target Kafka cluster.
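As an illustration of steps 110 and 120, both offsets can be read with the Kafka consumer API. The following is a minimal sketch, assuming the standard Kafka Java client and treating the target message offset as the current end offset of the target partition; the bootstrap addresses, topic name and partition number are illustrative.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetReader {
    private static KafkaConsumer<byte[], byte[]> consumerFor(String bootstrap) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        return new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        // same topic name and same partition number on both clusters
        TopicPartition tp = new TopicPartition("topicA", 0);
        try (KafkaConsumer<byte[], byte[]> target = consumerFor("target-cluster:9092");
             KafkaConsumer<byte[], byte[]> source = consumerFor("source-cluster:9092")) {
            // target message offset: current end offset of the first partition on the target cluster
            long targetOffset = target.endOffsets(Collections.singletonList(tp)).get(tp);
            // earliest message offset: starting offset of the second partition on the source cluster
            long earliestOffset = source.beginningOffsets(Collections.singletonList(tp)).get(tp);
            if (targetOffset < earliestOffset) {
                System.out.println("padding needed: " + (earliestOffset - targetOffset) + " messages");
            }
        }
    }
}
```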
If the target message offset is less than the earliest message offset, data is filled into the first partition, step 130.
In the embodiment of the application, a consumer sequentially consumes the messages of partition 0 of topic A from the source cluster, and the messages of partition 0 of topic A obtained from the source cluster are then produced, one by one, to the target cluster. However, the messages stored in a topic partition are typically time-bounded; after messages are cleared from the source cluster (for example because they exceed the retention time), the offset of a partition of a topic of the source cluster no longer starts from its initial value but from a value advanced by this time-bounded clearing. For example, the message offsets of partition 2 of topic A of the source cluster may form the ordered interval [2,8], where 2 is the starting offset, also called the earliest offset. The starting offset is then compared with the target offset of the target cluster, where the target offset represents the current offset of the target cluster. If the starting offset is larger than the target offset, the target partition of the target topic in the target cluster is filled by means of a filling process. The filling data can, for example, be set manually or generated by calling a filling function, so that the offsets of the source cluster correspond one to one with the offsets of the same topic partition of the target cluster.
In this embodiment, the data padding may be invoked, for example, in the following manner:
When the target message offset is smaller than the earliest message offset, the program synchronously calls a filling message module. The filling message module reads a predefined String-type message from the configuration file, converts it into byte-type data (the message data may be compressed to increase the filling rate), and calls the producer API of Kafka to write the data into the partition of the corresponding topic of the target cluster. After the data has been written into the partition corresponding to the target topic of the target cluster, the program asynchronously returns a result indicating whether the data filling succeeded. Once the data has been written successfully, the data filling stops and the program exits.
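A minimal sketch of such a filling message module, assuming the standard Kafka Java producer; the filler text, compression codec, class name and method signature are illustrative, and compression is enabled through the producer's compression.type setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class PaddingModule {
    // Pads the target partition with predefined filler messages until its next
    // offset reaches the earliest offset of the corresponding source partition.
    public static void pad(String targetBootstrap, String topic, int partition,
                           long targetOffset, long earliestOffset, String fillerText) {
        if (targetOffset >= earliestOffset) {
            return; // nothing to pad
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", targetBootstrap);
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        props.put("compression.type", "lz4"); // compress filler data to speed up padding
        byte[] filler = fillerText.getBytes(StandardCharsets.UTF_8);
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (long offset = targetOffset; offset < earliestOffset; offset++) {
                ProducerRecord<byte[], byte[]> record =
                        new ProducerRecord<>(topic, partition, null, filler);
                // Asynchronous send: the callback reports whether each filler write succeeded.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        System.err.println("Padding write failed: " + exception.getMessage());
                    }
                });
            }
            producer.flush();
        }
    }
}
```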
Step 140, synchronize the primary copy of the second partition to the first partition.
In the embodiment of the application, after the one-to-one offset mapping relationship is established, the messages of the same partition of the same topic are synchronized from the node that stores the primary copy in the source cluster to the target partition.
By establishing the one-to-one correspondence of offsets, the embodiment of the application makes the topic partitions of the source cluster correspond to the topic partitions of the target cluster, which improves the accuracy of data synchronization.
Furthermore, the embodiment of the application also provides a technical solution for checking the data synchronization progress during data synchronization. Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a method for synchronizing data between clusters according to an embodiment of the present application.
As shown in fig. 2, the method includes:
step 210, read a target message offset of a first partition of a first topic of a target cluster.
Step 220, comparing the target message offset with the earliest message offset of a second partition of a second topic of the source cluster, where the second topic has the same topic name as the first topic, and the first partition and the second partition have the same partition sequence number.
If the target message offset is less than the earliest message offset, the data is populated into the first partition, step 230.
Step 240, determine the node where the primary copy of the second partition is stored.
Step 250, obtain a partition offset of the primary copy of the second partition, wherein the starting position of the partition offset is the earliest message offset.
Step 260, obtain, from the node, the primary copy corresponding to the partition offset.
Step 270, write the primary copy into the first partition.
Step 280, obtain the state of the primary copy being written into the first partition.
Steps 210-230 are the same as steps 110-130, and their implementation can be understood with reference to the description of steps 110-130.
In an embodiment of the application, after completing the data filling of the first partition, the processing device determines the node where the primary copy of the second partition of the second topic of the source cluster is stored, and determines the partition offset of the second partition. The partition offset can be understood as the total amount of messages of the second partition, i.e. the range from the starting position to the ending position. In general, the starting position is 0, and the sequence number of the message offset at the ending position directly represents the total amount of messages. However, because of the time limit for storing messages, the starting position may not be 0; in that case the partition offset starts at the earliest message offset, and the ending position may be the position of the last offset.
After the node where the primary copy is stored and the partition offset have been determined, the primary copy corresponding to the partition offset is obtained from that node in consumer mode (consumer), and is then pushed to the target partition in producer mode (producer), which completes the data synchronization process.
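A minimal sketch of this consumer/producer synchronization, assuming the standard Kafka Java client; the client resolves the node holding the primary copy (the partition leader) automatically, and the class name, method signature and offset bounds are illustrative.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class PartitionSynchronizer {
    public static void sync(String sourceBootstrap, String targetBootstrap,
                            String topic, int partition, long earliestOffset, long endOffset) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", sourceBootstrap);
        cProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
        cProps.put("value.deserializer", ByteArrayDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", targetBootstrap);
        pProps.put("key.serializer", ByteArraySerializer.class.getName());
        pProps.put("value.serializer", ByteArraySerializer.class.getName());

        TopicPartition tp = new TopicPartition(topic, partition);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, earliestOffset); // start from the earliest message offset
            long copied = earliestOffset;
            while (copied < endOffset) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    // push each message into the same partition of the same topic on the target cluster
                    producer.send(new ProducerRecord<>(topic, partition, r.key(), r.value()));
                    copied = r.offset() + 1;
                }
            }
            producer.flush();
        }
    }
}
```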
In Kafka, each topic partition has N replicas; Kafka implements failover through this multi-replica mechanism, thereby ensuring data security, and the replicas are stored on different nodes of the cluster. Data operations on a topic partition need to be applied to all nodes of that partition, so that data consistency is maintained.
After reading the messages of a partition from the Kafka server, the consumer stores the offsets of those messages in the partition. When it next reads the messages of the partition, the consumer decides from which offset of the partition to start reading according to whether a consumption action (commit) has been executed on the messages. If a commit has occurred, reading starts from the sequence number following the committed offset position; if no commit has occurred, reading starts from the offset position recorded after the previous consumption action.
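For context, this commit behaviour can be illustrated with the standard Kafka Java consumer; this is a sketch only, and the group id and explicit-commit configuration are illustrative assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class CommitExample {
    public static void consumeOnce(String bootstrap, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("group.id", "sync-progress");     // illustrative consumer group
        props.put("enable.auto.commit", "false");   // commit explicitly
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(topic));
            for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofSeconds(1))) {
                System.out.println("read offset " + r.offset());
            }
            // After commitSync, the next run of this consumer group resumes from the offset
            // following the committed position; without the commit it resumes from the
            // position committed after the previous consumption action.
            consumer.commitSync();
        }
    }
}
```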
After the consumer reads a message, the message and its offset are stored accordingly. When a producer publishes a message to a partition, the producer needs to find the node where the primary copy of the partition is located and publishes the message only to that node; the nodes holding the other copies keep their data consistent through the node holding the primary copy.
After the node where the primary copy is stored is determined, the content of the primary copy is obtained according to the partition offset of the primary copy, and the content of the primary copy is then written into the target partition (i.e., the target partition of the target cluster to be synchronized).
On the basis of the above embodiment, the state of the primary copy being written into the first partition may also be obtained. To improve the user experience, the status of the topic partition data being written into the target partition of the target topic is returned to the producer by calling a data push callback function, so that the progress of data synchronization can be conveniently checked.
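Such a data push callback can be sketched with the asynchronous send callback of the Kafka Java producer; the class name and the way progress is reported are illustrative assumptions.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProgressReportingWriter {
    private final KafkaProducer<byte[], byte[]> producer;

    public ProgressReportingWriter(String targetBootstrap) {
        Properties props = new Properties();
        props.put("bootstrap.servers", targetBootstrap);
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Writes one record of the primary copy into the target partition and reports,
    // via the callback, whether and where it was written.
    public void write(String topic, int partition, byte[] key, byte[] value) {
        producer.send(new ProducerRecord<>(topic, partition, key, value),
                (metadata, exception) -> {
                    if (exception != null) {
                        System.err.println("write to " + topic + "-" + partition + " failed: " + exception);
                    } else {
                        System.out.println("synchronized up to offset " + metadata.offset()
                                + " in " + metadata.topic() + "-" + metadata.partition());
                    }
                });
    }

    public void close() {
        producer.close();
    }
}
```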
On the basis of the above embodiments of the present application, data processing efficiency is further improved: after the partition offset of the primary copy of the second partition is obtained, whether the partition offset has already been stored is judged. For example, the method may further comprise:
step 250a, judging whether the partition offset is stored in a partition offset register unit;
step 250b, if so, reading the partition offset stored in the partition offset register unit;
step 250c, if not, invoking a partition offset acquisition unit to obtain the partition offset of the primary copy of the second partition through the partition offset acquisition unit, and storing the partition offset in the partition offset register unit.
By judging whether the partition offset is already stored in the corresponding storage device, data processing time is further saved and data processing efficiency is improved. The storage device is, for example, ZooKeeper, the partition offset register unit, or the like.
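A minimal sketch of such a partition offset register unit, assuming it is implemented as an in-process cache keyed by topic and partition; the class name and the loader callback are illustrative, and a ZooKeeper-backed store could be substituted for the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class PartitionOffsetRegister {
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // Returns the stored partition offset if present; otherwise obtains it from the
    // given offset acquisition function, stores it, and returns it.
    public long getOrLoad(String topic, int partition, Supplier<Long> offsetLoader) {
        return offsets.computeIfAbsent(key(topic, partition), k -> offsetLoader.get());
    }
}
```

Looking up the cached offset first avoids querying the cluster (or ZooKeeper) repeatedly for the same partition, which is the processing-time saving described above.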
It should be noted that while the operations of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, in order to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Further referring to fig. 3, a schematic block diagram of an inter-cluster data synchronization apparatus 300 according to an embodiment of the present application is shown.
As shown in fig. 3, the apparatus 300 includes:
a target offset reading unit 310, configured to read a target message offset of a first partition of a first topic of a target cluster.
Data is divided into different topics according to type, and each topic stores data of the same type in different partitions. Each partition consists of a series of ordered, immutable messages that are continuously appended to the partition, and each message in the partition carries a consecutive sequence number, namely an offset. The offset determines the unique position of the message within the partition. For example, the messages of a Kafka partition are stored in the partition in increasing order, and each message has an offset that records how many messages the partition currently stores. Besides recording the number of stored messages, a retention time may also be set for the messages of a Kafka partition; when a message exceeds the retention time it is cleared, but the offset corresponding to the message is preserved. Periodically clearing or deleting consumed messages reduces disk usage: useless files are deleted promptly, which effectively improves disk utilization.
In the embodiment of the application, during data synchronization the processing device reads the target message offset of the first partition of the first topic of the target cluster. The target cluster is the target object to be synchronized. The first topic represents, for example, any one topic in the target cluster, and the first partition represents, for example, any one partition under the first topic. Note that the terms first, second and the like appearing in the embodiments of the present application are used only to distinguish like elements and should not be construed as limiting the order or timing of operations.
In this embodiment of the present application, the target message offset is, for example, the offset of the current message of the target cluster.
An offset comparing unit 320, configured to compare the target message offset with the earliest message offset of a second partition of a second topic of the source cluster, where the second topic and the first topic have the same topic name, and the first partition and the second partition have the same partition number.
In the embodiment of the application, after the earliest message offset of the second partition of the second topic of the source cluster is determined, the target message offset and the earliest message offset are compared. The source cluster is the source object of the synchronization process; that is, data is synchronized from the source object to the target object. The second topic has the same topic name as the first topic of the target cluster, which means that the first topic and the second topic represent the same topic; the second partition has the same partition number as the first partition, which means that they represent the same partition under that topic. For example, the messages corresponding to partition 0 of topic A of the source Kafka cluster are synchronized to partition 0 of topic A of the target Kafka cluster.
A data padding unit 330, configured to pad data into the first partition if the target message offset is less than the earliest message offset.
In the embodiment of the application, a consumer sequentially consumes the messages of partition 0 of topic A from the source cluster, and the messages of partition 0 of topic A obtained from the source cluster are then produced, one by one, to the target cluster. However, the messages stored in a topic partition are typically time-bounded; after messages are cleared from the source cluster (for example because they exceed the retention time), the offset of a partition of a topic of the source cluster no longer starts from its initial value but from a value advanced by this time-bounded clearing. For example, the message offsets of partition 2 of topic A of the source cluster may form the ordered interval [2,8], where 2 is the starting offset, also called the earliest offset. The starting offset is then compared with the target offset of the target cluster, where the target offset represents the current offset of the target cluster. If the starting offset is larger than the target offset, the target partition of the target topic in the target cluster is filled by means of a filling process. The filling data can, for example, be set manually or generated by calling a filling function, so that the offsets of the source cluster correspond one to one with the offsets of the same topic partition of the target cluster.
In this embodiment of the present application, the data padding may be invoked, for example, in the following manner:
When the target message offset is smaller than the earliest message offset, the program synchronously calls a filling message module. The filling message module reads a predefined String-type message from the configuration file, converts it into byte-type data (the message data may be compressed to increase the filling rate), and calls the producer API of Kafka to write the data into the partition of the corresponding topic of the target cluster. After the data has been written into the partition corresponding to the target topic of the target cluster, the program asynchronously returns a result indicating whether the data filling succeeded. Once the data has been written successfully, the data filling stops and the program exits.
A synchronization unit 340, configured to synchronize the primary copy of the second partition to the first partition.
In the embodiment of the application, after the one-to-one offset mapping relationship is established, the messages of the same partition of the same topic are synchronized from the node that stores the primary copy in the source cluster to the target partition.
By establishing the one-to-one correspondence of offsets, the embodiment of the application makes the topic partitions of the source cluster correspond to the topic partitions of the target cluster, which improves the accuracy of data synchronization.
Furthermore, the embodiment of the application also provides a technical solution for checking the data synchronization progress during data synchronization. Referring to fig. 4, fig. 4 shows a schematic structural block diagram of an inter-cluster data synchronization apparatus 400 according to an embodiment of the present application.
As shown in fig. 4, the apparatus 400 includes:
a target offset reading unit 410, configured to read a target message offset of a first partition of a first topic of a target cluster.
An offset comparing unit 420, configured to compare the target message offset with an earliest message offset of a second partition of a second topic of the source cluster, where the second topic has the same topic name as the first topic, and the first partition and the second partition have the same partition sequence number.
A data padding unit 430 for padding data to the first partition if the target message offset is less than the earliest message offset.
A determining subunit 440, configured to determine the node where the primary copy of the second partition is stored.
A partition offset obtaining subunit 450, configured to obtain a partition offset of the primary copy of the second partition, where the starting position of the partition offset is the earliest message offset.
A primary copy obtaining subunit 460, configured to obtain, from the node, the primary copy corresponding to the partition offset.
A writing subunit 470, configured to write the primary copy into the first partition.
A data push callback unit 480, configured to obtain the state of the primary copy being written into the first partition.
The target offset reading unit 410 through the data padding unit 430 are the same as the target offset reading unit 310 through the data padding unit 330, and their implementation can be understood with reference to the description of those units.
In an embodiment of the application, after completing the data filling of the first partition, the processing device determines the node where the primary copy of the second partition of the second topic of the source cluster is stored, and determines the partition offset of the second partition. The partition offset can be understood as the total amount of messages of the second partition, i.e. the range from the starting position to the ending position. In general, the starting position is 0, and the sequence number of the message offset at the ending position directly represents the total amount of messages. However, because of the time limit for storing messages, the starting position may not be 0; in that case the partition offset starts at the earliest message offset, and the ending position may be the position of the last offset.
After the node where the primary copy is stored and the partition offset have been determined, the primary copy corresponding to the partition offset is obtained from that node in consumer mode (consumer), and is then pushed to the target partition in producer mode (producer), which completes the data synchronization process.
In Kafka, each topic partition has N replicas; Kafka implements failover through this multi-replica mechanism, thereby ensuring data security, and the replicas are stored on different nodes of the cluster. Data operations on a topic partition need to be applied to all nodes of that partition, so that data consistency is maintained.
After reading the messages of a partition from the Kafka server, the consumer stores the offsets of those messages in the partition. When it next reads the messages of the partition, the consumer decides from which offset of the partition to start reading according to whether a consumption action (commit) has been executed on the messages. If a commit has occurred, reading starts from the sequence number following the committed offset position; if no commit has occurred, reading starts from the offset position recorded after the previous consumption action.
After the consumer reads a message, the message and its offset are stored accordingly. When a producer publishes a message to a partition, the producer needs to find the node where the primary copy of the partition is located and publishes the message only to that node; the nodes holding the other copies keep their data consistent through the node holding the primary copy.
After the node where the primary copy is stored is determined, the content of the primary copy is obtained according to the partition offset of the primary copy, and the content of the primary copy is then written into the target partition (i.e., the target partition of the target cluster to be synchronized).
On the basis of the above embodiment, the state of the primary copy being written into the first partition may also be obtained. To improve the user experience, the status of the topic partition data being written into the target partition of the target topic is returned to the producer by calling a data push callback function, so that the progress of data synchronization can be conveniently checked.
On the basis of the above embodiments of the present application, data processing efficiency is further improved: after the partition offset of the primary copy of the second partition is obtained, whether the partition offset has already been stored is judged. For example, the apparatus may further include:
a judging subunit 450a, configured to judge whether the partition offset is stored in the partition offset register unit;
a reading subunit 450b, configured to read the partition offset stored in the partition offset register unit if the partition offset exists;
a calling subunit 450c, configured to, if the partition offset does not exist, call the partition offset obtaining unit to obtain the partition offset of the primary copy of the second partition through the partition offset obtaining unit, and store the partition offset in the partition offset register unit.
By judging whether the partition offset is already stored in the corresponding storage device, data processing time is further saved and data processing efficiency is improved. The storage device is, for example, ZooKeeper, the partition offset register unit, or the like.
In the embodiment of the present application, the determining subunit 440, the partition offset obtaining subunit 450, the primary copy obtaining subunit 460 and the writing subunit 470 may, for example, be implemented by the synchronization unit. Likewise, the judging subunit 450a, the reading subunit 450b, the calling subunit 450c and the other subunits are implemented by the synchronization unit as optional components, so that their functions are realized as a whole.
It should be understood that the units or modules recited in the apparatuses 300 and 400 correspond to the various steps of the methods described with reference to fig. 1 and fig. 2. Thus, the operations and features described above with respect to the methods are equally applicable to the apparatuses 300 and 400 and the units included therein and are not described again here. The apparatuses 300 and 400 may be implemented in advance in a browser or another security application of an electronic device, or may be loaded into the browser or another security application of the electronic device by downloading or the like. Corresponding units in the apparatuses 300 and 400 may cooperate with units in the electronic device to implement the solutions of the embodiments of the present application.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a terminal device or server of an embodiment of the present application is shown.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read out from it is installed into the storage portion 508 as necessary.
In particular, the processes described above with reference to fig. 1 and fig. 2 may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods of fig. 1 and fig. 2. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 509, and/or installed from the removable medium 511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a target offset reading unit, an offset comparing unit, a data padding unit, and a synchronizing unit. Where the names of these units or modules do not in some cases constitute a limitation of the unit or module itself, for example, a synchronization unit may also be described as a "unit for synchronizing the primary copy of the second partition to the first partition".
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the foregoing device in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the inter-cluster data synchronization methods described herein.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention as defined above. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for synchronizing data between clusters, the method comprising:
reading a target message offset of a first partition of a first topic of a target cluster, the target message offset representing a current offset of the target cluster;
comparing the target message offset to an earliest message offset for a second partition of a second topic of a source cluster, the earliest message offset representing a starting offset for the source cluster, the second topic having a same topic name as the first topic, the first partition having a same partition number as the second partition;
if the target message offset is smaller than the earliest message offset, filling data into the first partition, wherein the filling data is obtained by manual setting or by calling a filling function, so that the offsets of the source cluster correspond one to one with the offsets of the same topic partition of the target cluster; and
synchronizing a primary copy of the second partition to the first partition.
2. The method of claim 1, wherein synchronizing the primary copy of the second partition to the first partition comprises:
determining a node where a primary copy of the second partition is stored;
acquiring a partition offset of a primary copy of the second partition, wherein the starting position of the partition offset is the earliest message offset;
acquiring a primary copy corresponding to the partition offset from the node;
writing the primary copy to the first partition.
3. The method of claim 2, wherein after obtaining the partition offset for the primary replica of the second partition, the method further comprises:
judging whether the partition offset is stored in a partition offset register unit or not;
if the partition offset exists, the partition offset stored in the partition offset register unit is read;
if the partition offset does not exist, calling a partition offset acquisition unit to acquire the partition offset of the primary copy of the second partition through the partition offset acquisition unit, and storing the partition offset to the partition offset register unit.
4. The method of claim 2, wherein after writing the primary copy to the first partition, the method comprises:
obtaining a state in which the primary replica is written to the first partition.
5. The method of any of claims 2-4, wherein writing the primary replica to the first partition comprises:
pushing the primary replica to the first partition.
6. An apparatus for synchronizing data between clusters, the apparatus comprising:
a target offset reading unit, configured to read a target message offset of a first partition of a first topic of a target cluster, where the target message offset represents a current offset of the target cluster;
an offset comparison unit, configured to compare the target message offset with an earliest message offset of a second partition of a second topic of a source cluster, where the earliest message offset represents a starting offset of the source cluster, the second topic has a same topic name as the first topic, and the first partition and the second partition have a same partition number;
a data padding unit, configured to pad data into the first partition if the target message offset is smaller than the earliest message offset, wherein the pad data is obtained by manual setting or by calling a padding function, so that the offsets of the source cluster correspond one to one with the offsets of the same topic partition of the target cluster;
a synchronization unit to synchronize a primary replica of the second partition to the first partition.
7. The apparatus of claim 6, wherein the synchronization unit comprises:
a determining subunit, configured to determine a node where the primary copy of the second partition is stored;
a partition offset obtaining subunit, configured to obtain a partition offset of the primary copy of the second partition, where a starting position of the partition offset is the earliest message offset;
a primary copy obtaining subunit, configured to obtain, from the node, a primary copy corresponding to the partition offset;
and the writing subunit is used for writing the primary copy into the first partition.
8. The apparatus of claim 7, wherein after the partition offset obtaining subunit, the synchronizing unit further comprises:
a judgment subunit, configured to judge whether the partition offset is stored in the partition offset registering unit;
a reading subunit, configured to read, if the partition offset exists, the partition offset stored in the partition offset register unit;
and the calling subunit is used for calling the partition offset acquisition unit if the partition offset does not exist, so as to acquire the partition offset of the primary copy of the second partition through the partition offset acquisition unit.
9. The apparatus of claim 7, wherein after the writing subunit, the apparatus further comprises:
and the data push callback unit is used for acquiring the state that the primary copy is written into the first partition.
10. The apparatus of any of claims 7-9, wherein the write subunit comprises:
and the data pushing subunit is used for pushing the primary copy to the first partition.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-5 when executing the program.
12. A computer-readable storage medium having a computer program stored thereon, wherein:
the computer program, when executed by a processor, implements the method of any one of claims 1-5.
CN201810978213.XA 2018-08-23 2018-08-23 Method, device and equipment for synchronizing data among clusters and storage medium thereof Active CN109388677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810978213.XA CN109388677B (en) 2018-08-23 2018-08-23 Method, device and equipment for synchronizing data among clusters and storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810978213.XA CN109388677B (en) 2018-08-23 2018-08-23 Method, device and equipment for synchronizing data among clusters and storage medium thereof

Publications (2)

Publication Number Publication Date
CN109388677A (en) 2019-02-26
CN109388677B (en) 2022-10-11

Family

ID=65418392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810978213.XA Active CN109388677B (en) 2018-08-23 2018-08-23 Method, device and equipment for synchronizing data among clusters and storage medium thereof

Country Status (1)

Country Link
CN (1) CN109388677B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543472B (en) * 2019-08-09 2022-08-09 浙江大华技术股份有限公司 Data reconciliation method and related device
CN110688254B (en) * 2019-09-06 2022-06-03 北京达佳互联信息技术有限公司 Data synchronization method and device, electronic equipment and storage medium
CN111031135B (en) * 2019-12-17 2023-01-10 金瓜子科技发展(北京)有限公司 Message transmission method and device and electronic equipment
CN113055430A (en) * 2019-12-27 2021-06-29 华为技术有限公司 Data synchronization method and related equipment
CN111262915B (en) * 2020-01-10 2020-09-22 北京东方金信科技有限公司 Kafka cluster-crossing data conversion system and method
CN113297309B (en) * 2021-05-31 2023-11-10 平安证券股份有限公司 Stream data writing method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077008A (en) * 2013-01-30 2013-05-01 中国人民解放军国防科学技术大学 Address alignment SIMD (Single Instruction Multiple Data) acceleration method of array addition operation assembly library program

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7395404B2 (en) * 2004-12-16 2008-07-01 Sandisk Corporation Cluster auto-alignment for storing addressable data packets in a non-volatile memory array
US9917913B2 (en) * 2016-05-23 2018-03-13 Microsoft Technology Licensing, Llc Large message support for a publish-subscribe messaging system
CN106095589B (en) * 2016-06-30 2019-04-09 浪潮卓数大数据产业发展有限公司 A kind of method, apparatus and system for distributing subregion
US20180091588A1 (en) * 2016-09-26 2018-03-29 Linkedin Corporation Balancing workload across nodes in a message brokering cluster
CN107465735B (en) * 2017-07-31 2020-08-14 杭州多麦电子商务股份有限公司 Distributed messaging system
CN108205588B (en) * 2017-12-29 2021-04-09 北京奇虎科技有限公司 Data synchronization method and device based on master-slave structure

Also Published As

Publication number Publication date
CN109388677A (en) 2019-02-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant