WO2021129477A1

WO2021129477A1 - Data synchronization method and related device

Info

Publication number: WO2021129477A1
Application number: PCT/CN2020/136716
Authority: WO
Inventors: 汝佳; 赵东; 智伟
Original assignee: 华为技术有限公司
Priority date: 2019-12-27
Filing date: 2020-12-16
Publication date: 2021-07-01
Also published as: CN113055430A

Abstract

The present application provides a data synchronization method and a related device. The method comprises: a follower partition obtains an offset selection policy; the follower partition determines a target offset in a leader partition of a topic to be synchronized based on the offset selection policy; and the follower partition starts to synchronize data in the leader partition from the target offset determined in the leader partition to the follower partition. The method can improve the flexibility and efficiency of data synchronization and reduce the data synchronization time.

Description

A data synchronization method and related equipment

This application claims priority to Chinese patent application No. 201911423080.0, filed with the Chinese Patent Office on December 27, 2019 and entitled "a method of data synchronization and related equipment", the entire content of which is incorporated herein by reference.

Technical field

The present invention relates to the field of information technology, in particular to a data synchronization method and related equipment.

Background

Kafka, a distributed publish-subscribe messaging system, has good functional characteristics. It provides message persistence through its disk structure, maintains stable performance over long periods, has high throughput, can support millions of messages per second, and supports parallel data loading. Kafka is therefore generally used to process massive streams of data generated by application service systems or artificial intelligence (AI) platforms.

As a commonly used distributed message queuing system, Kafka usually caches message data on local disk. The unit of publishing and subscribing in Kafka is the topic. Each topic represents a type of data and is divided into multiple partitions, each of which is an ordered queue. The partitions of a topic are distributed across different disks or hosts. To ensure reliability, each partition has multiple replicas, and one primary replica is elected among them. The primary replica is called the leader partition, and the remaining replicas are slave replicas, also called follower partitions. A follower partition continuously synchronizes the latest message data from the leader partition. Consumers (clients that subscribe to messages from topics) and producers (clients that publish messages to topics) interact directly with the leader partition when they consume and produce. When the node hosting a leader partition suddenly fails or loses power, a new leader partition is elected from the follower partitions of that partition to ensure service reliability and data security.

When a node in the Kafka cluster fails and cannot be recovered, or when the storage capacity of the Kafka cluster needs to be expanded (that is, the number of nodes is increased), replicas in the Kafka cluster need to be migrated to ensure replica integrity. Currently, replica migration in a Kafka cluster is performed with the migration tool provided by Kafka itself. This method is not flexible enough, takes a long time, and is inefficient. It also continuously occupies central processing unit (CPU) resources and network bandwidth, which affects the normal operation of services.

Summary of the invention

The embodiment of the present invention discloses a data synchronization method and related equipment. A follower partition can flexibly select data to be synchronized from a leader partition, thereby improving data synchronization efficiency, shortening data synchronization time, and reducing impact on services.

In a first aspect, this application provides a data synchronization method, including: a follower partition obtains an offset selection strategy; the follower partition determines, based on the offset selection strategy, a target offset in the leader partition of the topic to be synchronized; and the follower partition synchronizes the data in the leader partition to the follower partition, starting from the determined target offset in the leader partition.

In the embodiment of this application, the follower partition can flexibly select the target offset from which to synchronize according to its needs. After determining the target offset, it synchronizes the data in the leader partition to the follower partition starting from that offset, which can improve data synchronization efficiency, shorten data synchronization time, and broaden the applicable scenarios.

In a possible implementation manner, the follower partition sends a query request to the leader partition, where the query request is used to query the offset in the leader partition at the current moment; the follower partition calculates the difference between the offset in the leader partition at the current moment and the offset in the follower partition, and when the difference is greater than a preset threshold, determines to synchronize the data in the leader partition to the follower partition.

In the embodiment of this application, before performing data synchronization, the follower partition queries the offset in the leader partition at the current moment, calculates the difference between the offset in the leader partition and the offset in the follower partition, and performs data synchronization only when the difference is greater than the preset threshold. This avoids frequent data synchronization and thereby improves system efficiency.

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the follower partition obtains the average data write speed of the leader partition; the follower partition calculates the target offset according to the average data write speed of the leader partition.

In the embodiment of the present application, when the follower partition selects the offset strategy as the offset adaptive determination strategy, it needs to obtain the average data writing speed of the leader partition, so as to calculate the target offset.

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the follower partition obtains the start offset of the leader partition and the average data write speed of the leader partition; the follower partition calculates the target offset according to the start offset of the leader partition and the average data write speed of the leader partition.

In the embodiment of this application, when the offset selection strategy is an offset adaptive determination strategy, the follower partition needs to obtain both the start offset and the average data write speed of the leader partition, so that the calculated target offset is more precise.

In a possible implementation manner, when the average data write speed of the leader partition is greater than the data synchronization transmission speed of the leader partition, the target offset is the maximum offset of the leader partition at the current moment; when the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, the target offset is the start offset of the leader partition.

In the embodiment of the present application, the target offset is determined by comparing the average data write speed of the leader partition with the data synchronization transmission speed, which makes the determined target offset more reasonable.

In a possible implementation manner, when the average data write speed of the leader partition is greater than the data synchronization transmission speed of the leader partition, the target offset is the maximum offset of the leader partition at the current moment; when the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, and the synchronization time required for the current total amount of data of the leader partition is less than the preset duration, the target offset is the start offset of the leader partition.

In the embodiment of this application, the target offset is determined by considering both the relationship between the average data write speed of the leader partition and the data synchronization transmission speed, and the relationship between the synchronization time required for the current total amount of data of the leader partition and the preset duration. This further improves the rationality and accuracy of the determined target offset.

In a possible implementation manner, the offset selection strategy is a selection strategy based on data value; the follower partition obtains the consumption offsets of multiple consumer groups and determines the smallest consumption offset as the target offset, where the consumer groups consume the data in the topic to be synchronized, each consumer group includes at least one consumer, and the consumption offset indicates the amount of data in the leader partition that has been consumed by the consumer group.

In the embodiment of this application, when the offset selection strategy is a data value-based selection strategy, the follower partition obtains the consumption offsets of the multiple consumer groups that consume the data in the topic to be synchronized, and determines the minimum consumption offset as the target offset.

In a second aspect, this application provides a data synchronization device, including: an acquiring unit, configured to acquire an offset selection strategy; a processing unit, configured to determine, based on the offset selection strategy, a target offset in the leader partition of the topic to be synchronized; and a synchronization unit, configured to synchronize the data in the leader partition starting from the determined target offset in the leader partition.

In a possible implementation manner, the data synchronization device further includes a sending unit, configured to send a query request to the leader partition, where the query request is used to query the offset in the leader partition at the current moment; the processing unit is further configured to calculate the difference between the offset in the leader partition at the current moment and the offset in the data synchronization device, and when the difference is greater than a preset threshold, determine to synchronize the data in the leader partition.

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the processing unit is specifically configured to: obtain the average data write speed of the leader partition; and calculate the target offset according to the average data write speed of the leader partition.

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the processing unit is specifically configured to: obtain the start offset of the leader partition and the average data write speed of the leader partition; and calculate the target offset according to the start offset of the leader partition and the average data write speed of the leader partition.

In a possible implementation manner, the offset selection strategy is a selection strategy based on data value, and the processing unit is specifically configured to: obtain the consumption offsets of multiple consumer groups, and determine the smallest consumption offset as the target offset, where the consumer groups consume the data in the topic to be synchronized, each consumer group includes at least one consumer, and the consumption offset indicates the amount of data in the leader partition that has been consumed by the consumer group.

In a third aspect, the present application provides a computing device. The computing device includes a processor and a memory. The processor executes computer instructions stored in the memory, so that the computing device performs the method of the first aspect or of any implementation manner of the first aspect.

In a fourth aspect, the present application provides a computer storage medium that stores a computer program which, when executed by a computing device, implements the method of the first aspect or of any implementation manner of the first aspect.

In a fifth aspect, the present application provides a computer program product. The computer program product includes computer instructions. When the computer instructions are executed by a computing device, the computing device performs the method of the first aspect or of any implementation manner of the first aspect.

Description of the drawings

FIG. 1A is a schematic diagram of an application scenario provided by an embodiment of the present application;

FIG. 1B is a schematic diagram of another application scenario provided by an embodiment of the present application;

FIG. 1C is a schematic diagram of another application scenario provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of data synchronization provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a data synchronization method provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a data synchronization device provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present application.

Detailed description

The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The reference to "embodiments" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor does it refer to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

First of all, some terms and related technologies involved in this application will be explained in conjunction with the drawings to facilitate the understanding of those skilled in the art.

Kafka is a distributed messaging system that supports partitions and multiple replicas and is coordinated through zookeeper. It can process large amounts of data in real time to meet various scenarios, such as hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark streaming engines, access logs, and message services. Kafka's characteristics include high throughput, low latency, strong scalability, good durability, high fault tolerance, and support for high concurrency. It is widely used for log collection and user activity tracking.

Zookeeper is a distributed, open source coordination service for distributed applications, and an important component of Hadoop and Hbase. It is software that provides consistency services for distributed applications. The functions it provides include configuration maintenance, domain name services, distributed synchronization, and group services. Zookeeper's goal is to encapsulate key services that are complex and error-prone, and to provide users with simple, easy-to-use interfaces and a system with high performance and stable functions.

Figure 1A shows a possible application scenario of an embodiment of the present application. In this application scenario, node 110, node 120, and node 130 are node devices in Kafka cluster 100, such as servers, and each node stores different partitions of different topics. For example, the node 110 stores the leader partition 111 of the first partition, a follower partition 112 of the first partition, and a follower partition 113 of the second partition; the node 120 stores a follower partition 121 of the first partition, a follower partition 122 of the second partition, and the leader partition 123 of the third partition; the node 130 stores the two follower partitions 131 and 132 of the third partition and the leader partition of the second partition. The first partition, the second partition, and the third partition may be different partitions under the same topic, or partitions under different topics, and each partition is stored as three replicas. At this time, the Kafka cluster 100 needs to be expanded, that is, a new node 140 is added. No data is stored on the newly added node 140, so the data distribution of the entire Kafka cluster 100 is unbalanced. A migration tool is needed to migrate some of the partitions stored on other nodes to the newly added node 140 to ensure a balanced data distribution throughout the cluster. For example, the replica migration tool kafka-reassign-partitions provided by kafka is used to migrate the follower partition 112 in the node 110 to the node 140, and the follower partition 131 in the node 130 to the node 140.

Figure 1B shows another possible application scenario. As shown in Figure 1B, the node 110 stores the leader partition 111 of the first partition, a follower partition 112 of the first partition, and a follower partition 113 of the second partition; the node 120 stores a follower partition 121 of the first partition, a follower partition 122 of the second partition, and the leader partition 123 of the third partition; the node 130 stores the two follower partitions 131 and 132 of the third partition and the leader partition of the second partition. At this time, the node 120 fails during operation, and the data stored in the node 120 needs to be migrated to the node 110 and the node 130 to balance the data distribution of the entire cluster 100. The migration tool is used to migrate the follower partition 121 and the leader partition 123 in the node 120 to the node 110, and the follower partition 122 in the node 120 to the node 130.

Figure 1C shows yet another possible application scenario. As shown in Figure 1C, the node 110 stores the leader partition 111 of the first partition, a follower partition 112 of the first partition, and a follower partition 113 of the second partition; the node 120 stores a follower partition 121 of the first partition and the leader partition 122 of the third partition; the node 130 stores the leader partition 131 of the second partition. At this time, the data distribution of the cluster 100 is unbalanced, and a migration tool needs to be used to migrate the follower partition 113 stored in the node 110 to the node 130 to achieve a more balanced data distribution in the cluster 100.

In the application scenarios shown in Figure 1A, Figure 1B, and Figure 1C, a migration tool is needed to migrate a follower partition from one node to another. Migration here does not mean transferring data directly from one node to another. Instead, the new follower partition pulls data from its corresponding leader partition by synchronization until its data is consistent with the data in the leader partition, which completes the migration. At present, the replica migration tool kafka-reassign-partitions provided by kafka is used to complete the migration of the follower partition (that is, data synchronization). When a new follower partition is created on a node, the replica synchronization thread pulls data from the leader partition corresponding to the follower partition until the data of the follower partition is synchronized with the data of the leader partition. However, in this scheme the newly created follower partition always synchronizes from the start offset of the leader partition. When the leader partition caches a large amount of data, data synchronization takes a long time, and throughout the process the CPU resources and network bandwidth of the nodes remain occupied, which affects the normal operation of services. In addition, in many scenarios the value of the data cached in Kafka gradually decreases over time, and the business or application is interested only in the latest data. For example, in an application scenario that counts the number of occurrences of a target within one minute, only the data from the most recent minute is valuable, while earlier data is of low value. Synchronizing a large amount of historical cached data is therefore meaningless. Nevertheless, the follower partition can only synchronize from the start offset of the leader partition and the offset cannot be flexibly specified. That is, synchronization of the historical cached data cannot be started from an arbitrary offset according to actual needs.

To solve the above problems, this application provides a data synchronization method and related equipment, which allow the data to be synchronized from the leader partition to be selected flexibly during data synchronization, improving data synchronization efficiency, shortening data synchronization time, and reducing the impact on services.

Refer to FIG. 2, which is a schematic diagram of data synchronization provided by the present application. As shown in Figure 2, the node 210 deploys the leader partition of a certain partition. The leader partition caches the data written by the producer, including historical data and the latest written data. The node 220 deploys the follower partition corresponding to the same partition, and the follower partition needs to synchronize data from the leader partition in the node 210. The follower partition can flexibly select the data that needs to be synchronized. For example, it can start data synchronization from the start offset of the leader partition, that is, offset 1, and synchronize all data; it can start synchronization from offset 7 and synchronize only the latest data written to the leader partition; or it can select any other offset (for example, offset 5) from which to start synchronization.

With reference to the application scenarios shown in FIG. 1A, FIG. 1B, and FIG. 1C, and the data synchronization schematic diagram shown in FIG. 2, refer to FIG. 3. FIG. 3 is a schematic flowchart of a data synchronization method provided by an embodiment of the present application. As shown in Figure 3, the method includes but is not limited to the following steps:

S301: The follower partition acquires an offset selection strategy.

For example, the user can access zookeeper through a World Wide Web (Web) browser or another interface (such as a RESTful interface) provided by the Kafka cluster and set the offset selection strategy. The strategy is set per topic, that is, one offset selection strategy applies to all partitions under the same topic. After the setting is complete, the offset selection strategy can be saved as a file in JavaScript object notation (JSON) format and recorded in a specified directory of zookeeper.
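As an illustration only, a per-topic strategy record of this kind might be serialized as in the sketch below. The field names (`topic`, `offset_selection_policy`, `speed_threshold_bytes_per_s`) and the strategy labels are hypothetical; the application does not specify the layout of the JSON file or the zookeeper directory in which it is stored.

```python
import json

# Hypothetical labels for the two strategies named in this application.
ADAPTIVE = "adaptive"
DATA_VALUE = "data_value"


def build_policy(topic, policy, speed_threshold=None):
    """Return a JSON string describing the offset selection strategy for one topic."""
    if policy not in (ADAPTIVE, DATA_VALUE):
        raise ValueError(f"unknown policy: {policy}")
    record = {"topic": topic, "offset_selection_policy": policy}
    if speed_threshold is not None:
        # Optional cap on the data synchronization transmission speed (assumed unit: bytes/s).
        record["speed_threshold_bytes_per_s"] = speed_threshold
    return json.dumps(record, sort_keys=True)


# Example: one record covering every partition of the (made-up) topic "clickstream".
policy_json = build_policy("clickstream", ADAPTIVE, speed_threshold=10_000_000)
```

Keeping one record per topic mirrors the statement above that the strategy is set in units of topics rather than per partition.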

In addition, the nodes in the kafka cluster elect a management node, which uses the watch mechanism to monitor zookeeper. When the offset selection strategy is recorded in the specified directory of zookeeper, the management node detects that the zookeeper directory has changed and sends a synchronization request to the relevant nodes, that is, the nodes hosting the new follower partitions of the topic corresponding to the offset selection strategy. After receiving the synchronization request sent by the management node, a relevant node starts the replica synchronization thread. After the replica synchronization thread is started, the relevant node reads the offset selection strategy of the corresponding topic from zookeeper.

In a possible implementation manner, the follower partition sends a query request to the leader partition, where the query request is used to query the offset in the leader partition at the current moment; the follower partition calculates the difference between the maximum offset of the leader partition at the current moment and the maximum offset of the follower partition, and when the difference is greater than the preset threshold, determines to synchronize the data in the leader partition to the follower partition.

Specifically, each message in the log has an offset that uniquely identifies it. The offset is an 8-byte number that indicates the position of the message in the partition, so each message (that is, each piece of data) stored in the leader partition and the follower partition corresponds to a unique offset. After the follower partition starts the replica synchronization thread, it sends a query request to the leader partition to obtain the maximum offset of the leader partition at the current moment, and then calculates the difference from its own offset. If the difference exceeds the preset threshold, a large amount of data in the leader partition has not yet been synchronized to the follower partition, and the follower partition needs to synchronize the data in the leader partition to its local storage. It should be understood that the preset threshold can be set as required, which is not limited in this application.
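The threshold check described above can be sketched as follows. The function and parameter names are illustrative, and the offsets are assumed to have already been obtained by the query request to the leader partition.

```python
def needs_sync(leader_max_offset: int, follower_max_offset: int, threshold: int) -> bool:
    """Return True when the follower lags the leader by more than `threshold` messages."""
    lag = leader_max_offset - follower_max_offset
    # Synchronization is triggered only when the lag strictly exceeds the preset threshold.
    return lag > threshold


# A follower at offset 100 facing a leader at offset 1000 with a threshold of 500
# lags by 900 messages, so synchronization is triggered.
trigger = needs_sync(1000, 100, 500)
```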

S302: The follower partition determines the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy.

Specifically, the follower partition needs to determine the offset selection strategy before data synchronization. After the offset selection strategy is determined, the follower partition further determines the target offset in the leader partition of the topic to be synchronized, that is, the offset from which data synchronization starts.

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the follower partition obtains the average data write speed of the leader partition; the follower partition calculates the target offset according to the average data write speed of the leader partition.

Specifically, the follower partition obtains the average data write speed of the leader partition through the interface provided by Java management extensions (JMX) on the node where the leader partition is located, for example, the average data write speed of the leader partition over the last hour. After the follower partition obtains the average data write speed of the leader partition, the target offset can be calculated.

Optionally, after the follower partition obtains the average data write speed of the leader partition, it can further send a query request to the leader partition to query the start offset of the leader partition, and then calculate the target offset according to the start offset obtained by the query and the average data write speed of the leader partition.

For example, after the follower partition obtains the average data write speed of the leader partition, it first determines whether a speed threshold is set. The speed threshold limits the data synchronization transmission speed of the follower partition, so that data synchronization does not occupy excessive CPU resources, network bandwidth, and other resources, which would affect services and prevent them from running normally. If the speed threshold is set, the speed threshold is used as the data synchronization transmission speed; if it is not set, the consumption speed of the follower partition is used as the data synchronization transmission speed.

When the follower partition performs data synchronization, it in essence starts a new consumer thread in the follower partition, and the follower partition acts as a consumer that consumes data from the leader partition to achieve data synchronization. Therefore, if the speed threshold is not set, the consumption speed of the follower partition is the data synchronization transmission speed.
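A minimal sketch of this selection rule is shown below. Representing an unset speed threshold as `None` is an assumption made for illustration; the application does not say how an unset threshold is encoded.

```python
from typing import Optional


def sync_transmission_speed(speed_threshold: Optional[float], consumption_speed: float) -> float:
    """Pick the data synchronization transmission speed used in later calculations."""
    if speed_threshold is not None:
        # A configured speed threshold is used as the transmission speed.
        return speed_threshold
    # Otherwise the follower's own consumption speed is the transmission speed.
    return consumption_speed


capped = sync_transmission_speed(10.0, 25.0)
uncapped = sync_transmission_speed(None, 25.0)
```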

Then, the follower partition compares the data synchronization transmission speed with the average data write speed of the leader partition. When the average data write speed of the leader partition is greater than the data synchronization transmission speed, the follower partition cannot fully catch up with the leader partition even if it synchronizes continuously, that is, the follower partition would always be synchronizing historical data of the leader partition. In this case, only the latest data in the leader partition needs to be synchronized: the maximum offset of the leader partition at the current moment is used as the target offset, and the follower partition starts data synchronization from that offset. When the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, the follower partition can catch up with the leader partition given sufficient time, that is, the follower partition can synchronize all data in the leader partition to its local storage. In this scenario, further judgment is required to determine the final target offset.

In a possible implementation, the follower partition calculates the time required to synchronize the current total amount of data of the leader partition. Optionally, the follower partition can obtain the current total amount of data at the same time as it obtains the average data write speed of the leader partition through the interface provided by JMX, or it can obtain the current total amount of data of the leader partition in another way, which is not limited in this application. After the follower partition obtains the current total amount of data, it divides the total amount of data by the data synchronization transmission speed to obtain the synchronization time. When the synchronization time is less than or equal to the preset time, the current total amount of data in the leader partition is not very large, and the synchronization process will not cause excessive resource consumption. The follower partition then uses the start offset of the leader partition as the target offset and synchronizes all data in the leader partition to its local storage. The preset time can be set as required, for example, to one hour, which is not limited in this application.
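The synchronization-time check can be sketched as follows. The names and the choice of bytes and seconds as units are illustrative assumptions; any consistent pair of units works.

```python
def full_sync_fits(total_bytes: float, sync_speed: float, preset_seconds: float) -> bool:
    """Return True when copying all current leader data takes no longer than the preset time."""
    if sync_speed <= 0:
        raise ValueError("sync_speed must be positive")
    # Synchronization time = current total amount of data / data synchronization transmission speed.
    sync_time = total_bytes / sync_speed
    return sync_time <= preset_seconds


# 36 GB at 20 MB/s needs 1800 s, which fits within a one-hour preset time,
# so the start offset of the leader partition would be used as the target offset.
fits = full_sync_fits(36e9, 20e6, 3600)
```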

When the synchronization time is greater than the preset time, the target offset can be calculated according to the following formula 1:

((y-y0)*n)/v1=((y1-y)*n)/(v2-v1) Formula 1

Here, y0 represents the start offset of the leader partition, y represents the target offset, y1 represents the maximum offset of the leader partition, n represents the average data volume of a message, v1 represents the average data write speed of the leader partition, and v2 represents the data synchronization transmission speed.

Simplifying the above formula 1 further, the following formula 2 can be obtained:

y=(y1*v1+(v2-v1)*y0)/v2 Formula 2
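The intermediate steps between Formula 1 and Formula 2 can be written out as follows. This is only a worked derivation, assuming n > 0, v1 > 0, and v2 > v1 so that both sides of Formula 1 are defined.

```latex
\begin{align*}
\frac{(y - y_0)\,n}{v_1} &= \frac{(y_1 - y)\,n}{v_2 - v_1} \\
(y - y_0)(v_2 - v_1) &= (y_1 - y)\,v_1 \\
y\,v_2 - y\,v_1 - (v_2 - v_1)\,y_0 &= y_1 v_1 - y\,v_1 \\
y\,v_2 &= y_1 v_1 + (v_2 - v_1)\,y_0 \\
y &= \frac{y_1 v_1 + (v_2 - v_1)\,y_0}{v_2}
\end{align*}
```

The average message size n cancels in the first step, which is why it does not appear in Formula 2.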

It can be seen that the target offset can be calculated once the start offset and maximum offset of the leader partition, the average data write speed of the leader partition, and the data synchronization transmission speed have been obtained. Synchronizing the data in the leader partition from the target offset avoids occupying resources such as CPU and network bandwidth for a long time during data synchronization, which would affect services, and avoids synchronizing outdated and meaningless historical data, thereby improving synchronization efficiency and shortening the synchronization time.
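As a hedged illustration of Formula 2, the sketch below evaluates it for one set of made-up numbers. The function name `target_offset_formula2` and the choice to round the result down to a whole offset are assumptions introduced here; the symbols y0, y1, v1, and v2 are those defined for Formula 1.

```python
import math


def target_offset_formula2(y0: int, y1: int, v1: float, v2: float) -> int:
    """Evaluate Formula 2: y = (y1*v1 + (v2 - v1)*y0) / v2.

    y0: start offset of the leader partition
    y1: maximum offset of the leader partition
    v1: average data write speed of the leader partition
    v2: data synchronization transmission speed
    """
    if v2 <= 0:
        raise ValueError("v2 must be positive")
    if not 0 <= v1 <= v2:
        raise ValueError("Formula 2 is only used when 0 <= v1 <= v2")
    y = (y1 * v1 + (v2 - v1) * y0) / v2
    # Assumption: round down so that the follower never starts past
    # the offset given by the formula.
    return math.floor(y)


# Made-up example values: offsets 1000..101000, v1 = 20, v2 = 100.
print(target_offset_formula2(1000, 101000, 20.0, 100.0))  # 21000
```

With these example values both sides of Formula 1 agree: (21000 - 1000) / 20 = 1000 and (101000 - 21000) / (100 - 20) = 1000. The result moves toward y0 as v1 approaches 0 and toward y1 as v1 approaches v2.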

In another possible implementation manner, the offset selection strategy is a selection strategy based on data value. The follower partition obtains the consumption offsets of multiple consumer groups and determines the smallest consumption offset as the target offset. Each consumer group is used to consume the data in the topic to be synchronized and includes at least one consumer, and the consumption offset indicates the amount of data in the leader partition that the consumer group has consumed.

Specifically, each consumer belongs to a specific group. Each consumer can be assigned a group name; if none is specified, the consumer belongs to the default group. Consumer groups consume data in units of topics, and different consumer groups can consume the same topic. Within a consumer group, different consumers cannot consume the same partition of the same topic, but one consumer can consume multiple partitions.

Therefore, for the topic to be synchronized, when multiple consumer groups consume data in the topic at the same time, the follower partition finds, in each consumer group, the consumer that consumes the leader partition to be synchronized, and obtains that consumer's consumption offset, that is, the amount of data in the leader partition that the consumer has consumed so far. The follower partition then compares the obtained consumption offsets and selects the smallest one as the target offset.

It is easy to understand that using the smallest consumption offset as the target offset ensures that, when the node of the leader partition suddenly fails and goes down, all consumer groups can still consume the data in the topic completely, which avoids affecting services.
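The selection based on data value can be sketched as follows. The function name `data_value_target_offset`, the mapping from group name to consumption offset, and the group names in the example are hypothetical; how the offsets are actually collected from the consumer groups is outside this sketch.

```python
from typing import Mapping


def data_value_target_offset(consumption_offsets: Mapping[str, int]) -> int:
    """Pick the smallest consumption offset among all consumer groups.

    consumption_offsets maps a consumer group name to the offset that
    group has consumed up to in the leader partition (hypothetical input).
    """
    if not consumption_offsets:
        raise ValueError("at least one consumer group is required")
    # Starting from the smallest offset keeps every record that some
    # group has not consumed yet.
    return min(consumption_offsets.values())


# Made-up consumer groups and offsets.
offsets = {"billing": 1200, "analytics": 450, "audit": 980}
print(data_value_target_offset(offsets))  # 450
```

Any offset larger than the minimum would skip data that the slowest group still needs, which is why the minimum is the safe choice here.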

In another possible implementation manner, the offset selection strategy is an offset specification strategy, and the follower partition uses the specified offset as the target offset.

Specifically, after the follower partition determines that the data in the leader partition needs to be synchronized to the follower partition, it directly uses the specified offset as the target offset, and starts data synchronization from the target offset. The specific value of the designated offset can be set according to actual needs, which is not limited in this application.
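One way to picture how a follower partition might choose among the offset selection strategies is a small dispatch table. This is a sketch only: the enum `OffsetPolicy`, the function `determine_target_offset`, and the idea of passing one callable per strategy are assumptions made for illustration and are not taken from this application.

```python
from enum import Enum
from typing import Callable, Dict


class OffsetPolicy(Enum):
    """Hypothetical names for the offset selection strategies."""
    ADAPTIVE = "adaptive"      # offset adaptive determination strategy
    DATA_VALUE = "data_value"  # selection strategy based on data value
    SPECIFIED = "specified"    # offset specification strategy


def determine_target_offset(
    policy: OffsetPolicy,
    resolvers: Dict[OffsetPolicy, Callable[[], int]],
) -> int:
    """Run the resolver registered for the given policy."""
    try:
        resolver = resolvers[policy]
    except KeyError:
        raise ValueError(f"no resolver registered for {policy}") from None
    return resolver()


# Example wiring with made-up values.
resolvers = {
    OffsetPolicy.SPECIFIED: lambda: 5000,
    OffsetPolicy.DATA_VALUE: lambda: min([1200, 450, 980]),
}
print(determine_target_offset(OffsetPolicy.SPECIFIED, resolvers))  # 5000
```

Keeping each strategy behind a callable means the specified offset, which needs no computation, and the other strategies share one entry point.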

S303: The follower partition starts to synchronize the data in the leader partition to the follower partition from the determined target offset of the leader partition.

Specifically, after the follower partition determines the target offset, it starts a consuming thread to consume the data in the leader partition from the target offset, that is, synchronizes the data in the leader partition to the local.
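The step of starting a consuming thread from the target offset can be illustrated with an in-memory stand-in. In this sketch, plain Python lists play the role of the leader and follower partitions, and the function name `sync_from` and its parameters are hypothetical; no real message queue client is involved.

```python
import threading
from typing import List


def sync_from(
    leader_log: List[bytes],
    leader_start_offset: int,
    target_offset: int,
    follower_log: List[bytes],
) -> threading.Thread:
    """Start a thread that copies leader records from target_offset onward.

    leader_log[0] is assumed to hold the record at leader_start_offset.
    """
    if target_offset < leader_start_offset:
        raise ValueError("target_offset is before the leader start offset")

    def consume() -> None:
        index = target_offset - leader_start_offset
        for record in leader_log[index:]:
            follower_log.append(record)

    worker = threading.Thread(target=consume, name="follower-sync")
    worker.start()
    return worker


# Made-up leader partition holding offsets 100..109.
leader = [f"m{i}".encode() for i in range(100, 110)]
follower: List[bytes] = []
sync_from(leader, 100, 105, follower).join()
print(len(follower))  # 5
```

Records before the target offset are never read, which is the source of the saving in synchronization time and resources.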

It should be understood that steps S301 to S303 involved in the foregoing method embodiments are only schematic descriptions and summaries, and should not constitute specific limitations. The involved steps can be added, reduced, or combined as needed.

The foregoing describes the methods of the embodiments of the present application in detail. In order to facilitate better implementation of the above solutions of the embodiments of the present application, correspondingly, the following also provides related equipment for cooperating with the implementation of the foregoing solutions.

Refer to FIG. 4, which is a schematic structural diagram of a data synchronization device provided by an embodiment of the present application. As shown in FIG. 4, the data synchronization device 400 includes an acquiring unit 410, a processing unit 420, and a synchronization unit 430, where:

The acquiring unit 410 is configured to obtain an offset selection strategy.

Specifically, the acquiring unit 410 is configured to perform the foregoing step S301, and optionally perform optional methods in the foregoing steps.

The processing unit 420 is configured to determine the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy.

Specifically, the processing unit 420 is configured to execute the aforementioned step S302, and optionally execute an optional method in the aforementioned step.

The synchronization unit 430 is configured to synchronize data in the leader partition from the determined target offset of the leader partition.

Specifically, the synchronization unit 430 is configured to perform the foregoing step S303, and optionally perform optional methods in the foregoing steps.

In a possible implementation manner, the data synchronization device 400 further includes a sending unit 440. The sending unit 440 is configured to send a query request to the leader partition, where the query request is used to query the offset in the leader partition at the current moment. The processing unit 420 is further configured to calculate the difference between the offset in the leader partition at the current moment and the offset in the data synchronization device, and to determine to synchronize the data in the leader partition when the difference is greater than a preset threshold.
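The threshold check performed by the processing unit can be sketched in a few lines. The function name `needs_sync` and the example numbers are hypothetical; the comparison itself follows the description above, where synchronization is triggered only when the difference is strictly greater than the preset threshold.

```python
def needs_sync(leader_offset: int, follower_offset: int, threshold: int) -> bool:
    """Decide whether the follower lags far enough behind to synchronize.

    leader_offset: offset in the leader partition at the current moment
    follower_offset: offset held locally by the data synchronization device
    threshold: preset threshold on the difference between the two
    """
    lag = leader_offset - follower_offset
    return lag > threshold


print(needs_sync(10_000, 9_000, 500))  # True, the lag of 1000 exceeds 500
print(needs_sync(10_000, 9_600, 500))  # False, the lag of 400 does not
```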

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the processing unit 420 is specifically configured to: obtain the average data write speed of the leader partition; and calculate the target offset according to the average data write speed of the leader partition.

In a possible implementation manner, the offset selection strategy is an offset adaptive determination strategy, and the processing unit 420 is specifically configured to: obtain the start offset of the leader partition and the average data write speed of the leader partition; and calculate the target offset according to the start offset of the leader partition and the average data write speed of the leader partition.

In a possible implementation manner, the offset selection strategy is a selection strategy based on data value, and the processing unit 420 is specifically configured to: obtain the consumption offsets of multiple consumer groups, and determine the smallest consumption offset as the target offset; where each consumer group is used to consume the data in the topic to be synchronized and includes at least one consumer, and the consumption offset indicates the amount of data in the leader partition that the consumer group has consumed.

It should be noted that the structure of the data synchronization device and the process of using the data synchronization device to achieve data synchronization are only examples and should not constitute a specific limitation. The units in the data synchronization device can be added, reduced, or combined as needed. In addition, the operation and/or function of each module in the data synchronization device is to realize the corresponding process of the method described in FIG. 3 above, and is not repeated here for brevity.

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present application. As shown in FIG. 5, the computing device 500 includes a processor 510, a communication interface 520, and a memory 530, and the processor 510, the communication interface 520, and the memory 530 are connected to each other through an internal bus 540. It should be understood that the computing device may be a library server.

The computing device 500 may be a server where a follower partition is deployed in FIGS. 1A-1C, 2 and 3. The functions executed by the servers in FIGS. 1A-1C, 2 and 3 are actually executed by the processor 510 of the server.

The processor 510 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The foregoing hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The bus 540 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus 540 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 5 to represent it, but it does not mean that there is only one bus or one type of bus.

The memory 530 may include a volatile memory, such as a random access memory (RAM); the memory 530 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 530 may also include a combination of the above types. The memory 530 may store program code, which may be used to implement the functional units shown in the data synchronization device 400, or to implement the method steps in the method embodiment shown in FIG. 3 with the follower partition as the execution subject.

The embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it can implement part or all of the steps of any one of the above method embodiments, and realize the function of any one of the functional units described in FIG. 4 above.

The embodiments of the present application also provide a computer program product, which when it runs on a computer or a processor, enables the computer or the processor to execute one or more steps in any of the foregoing methods. If each component unit of the aforementioned equipment is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer readable storage medium.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not indicate the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

A method for data synchronization, characterized in that the method includes:

A follower partition obtains an offset selection strategy;

The follower partition determines the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy;

The follower partition starts to synchronize the data in the leader partition to the follower partition from the determined target offset of the leader partition.
The method of claim 1, wherein the method further comprises:

The follower partition sends a query request to the leader partition, and the query request is used to query the offset in the leader partition at the current moment;

The follower partition calculates the difference between the offset in the leader partition at the current moment and the offset in the follower partition, and determines to synchronize the data in the leader partition to the follower partition when the difference is greater than a preset threshold.
The method according to claim 1 or 2, wherein the offset selection strategy is an offset adaptive determination strategy, and the follower partition determining the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy includes:

The follower partition obtains the average data writing speed of the leader partition;

The follower partition calculates the target offset according to the average data writing speed of the leader partition.
The method according to claim 1 or 2, wherein the offset selection strategy is an offset adaptive determination strategy, and the follower partition determining the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy includes:

The follower partition obtains the start offset of the leader partition and the average data write speed of the leader partition;

The follower partition calculates the target offset according to the start offset of the leader partition and the average data writing speed of the leader partition.
The method of claim 3 or 4, wherein:

When the average data writing speed of the leader partition is greater than the data synchronization transmission speed of the leader partition, the target offset is the maximum offset of the leader partition at the current moment;

When the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, the target offset is the start offset of the leader partition.
The method of claim 3 or 4, wherein:

When the average data writing speed of the leader partition is greater than the data synchronization transmission speed of the leader partition, the target offset is the maximum offset of the leader partition at the current moment;

When the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, and the synchronization time required by the current total amount of data of the leader partition is less than the preset duration, the target offset is the start offset of the leader partition.
The method of claim 1 or 2, wherein the offset selection strategy is a selection strategy based on data value, and the follower partition determining the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy includes:

The follower partition obtains consumption offsets of multiple consumer groups, and the follower partition determines that the smallest consumption offset is the target offset;

Wherein, the consumer group is used to consume data in the topic to be synchronized, each consumer group includes at least one consumer, and the consumption offset is used to indicate the amount of data in the leader partition consumed by the consumer group.
A data synchronization device, characterized in that it comprises:

The acquiring unit is used to acquire the offset selection strategy;

A processing unit, configured to determine the target offset in the leader partition of the topic to be synchronized based on the offset selection strategy;

The synchronization unit is configured to synchronize data in the leader partition from the determined target offset of the leader partition.
The data synchronization device according to claim 8, wherein the data synchronization device further comprises a sending unit,

The sending unit is configured to send a query request to the leader partition, and the query request is used to query the offset in the leader partition at the current moment;

The processing unit is further configured to calculate the difference between the offset in the leader partition at the current moment and the offset in the data synchronization device, and determine synchronization when the difference is greater than a preset threshold Data in the leader partition.
The data synchronization device according to claim 8 or 9, wherein the offset selection strategy is an offset adaptive determination strategy, and the processing unit is specifically configured to:

Obtain the average data writing speed of the leader partition;

Calculate the target offset according to the average data writing speed of the leader partition.
The data synchronization device according to claim 8 or 9, wherein the offset selection strategy is an offset adaptive determination strategy, and the processing unit is specifically configured to:

Obtain the start offset of the leader partition and the average data writing speed of the leader partition;

Calculate the target offset according to the start offset of the leader partition and the average data writing speed of the leader partition.
The data synchronization device according to claim 10 or 11, wherein:

When the average data writing speed of the leader partition is greater than the data synchronization transmission speed of the leader partition, the target offset is the maximum offset of the leader partition at the current moment;

When the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, the target offset is the start offset of the leader partition.
The data synchronization device according to claim 10 or 11, wherein:

When the average data writing speed of the leader partition is greater than the data synchronization transmission speed of the leader partition, the target offset is the maximum offset of the leader partition at the current moment;

When the average data write speed of the leader partition is less than or equal to the data synchronization transmission speed, and the synchronization time required by the current total amount of data of the leader partition is less than the preset duration, the target offset is the start offset of the leader partition.
The data synchronization device according to claim 8 or 9, wherein the offset selection strategy is a selection strategy based on data value, and the processing unit is specifically configured to:

Acquire consumption offsets of multiple consumer groups, and determine the smallest consumption offset as the target offset;

Wherein, the consumer group is used to consume data in the topic to be synchronized, each consumer group includes at least one consumer, and the consumption offset is used to indicate the amount of data in the leader partition consumed by the consumer group.
A computing device, wherein the computing device includes a processor and a memory, and the processor executes computer instructions stored in the memory, so that the computing device executes the method according to any one of claims 1-7.
A computer storage medium, wherein the computer storage medium stores a computer program, and the computer program implements the method according to any one of claims 1-7 when executed by a computing device.
A computer program product, the computer program product comprising computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute the method according to any one of claims 1-7.