CN110648178A

CN110648178A - Method for increasing kafka consumption capacity

Info

Publication number: CN110648178A
Application number: CN201910907527.5A
Authority: CN
Inventors: 任治州
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2020-01-03

Abstract

The invention discloses a method for increasing Kafka consumption capacity, applied to a server system and comprising the following steps: A. establishing a connection with the Kafka service; B. dividing each partition into data segments to obtain the number of segments of each partition; C. setting a starting offset and an ending offset for each segment of each partition; D. assigning a consumption interval to each consumer according to the number of segments of each partition and the starting and ending offsets of each segment; E. completing the consumption process to obtain the consumed data. The method allows one Partition to be consumed by multiple Consumers in the same group while ensuring data integrity.

Description

Method for increasing kafka consumption capacity

Technical Field

The invention relates to the technical field of software, in particular to a method for increasing kafka consumption capacity.

Background

Kafka is a high-throughput distributed publish-subscribe messaging system. Many companies now use it as a data pipeline and messaging system for many types of data.

For better illustration and understanding of the technical solution of the present invention, the basic concept of Kafka is introduced as follows:

1. Producer and Consumer

Kafka has two basic types of clients, as shown in fig. 1: the producer (Producer) and the consumer (Consumer). The producer (also known as the publisher) creates messages, and the consumer (also known as the subscriber) consumes them.

2. Topic (Topic) and Partition (Partition)

In Kafka, messages are categorized by topic (Topic). Each topic corresponds to a "message queue", similar to a table in a database. However, if all messages of the same type are put into one "central" queue, the system cannot scale: growth in the number of producers and consumers, or in the number of messages, may exhaust the performance or storage of the system. To address this problem, as shown in fig. 2, Kafka introduces the concept of the partition (Partition), which enables horizontal scaling.

3. Broker and Cluster (Cluster)

A Kafka server, also known as a Broker, accepts messages sent by producers and stores them on disk. The Broker also serves consumers' requests to pull partition messages, returning messages that have been committed. With suitable hardware, a single Broker can handle thousands of partitions and millions of messages per second. Several Brokers form a cluster (Cluster). One Broker in the cluster acts as the cluster controller (Cluster Controller), which manages the cluster, including assigning partitions to Brokers and monitoring Broker failures. Within a cluster, each partition is owned by one Broker, referred to as the Leader of that partition. A partition can also be replicated on multiple Brokers for redundancy, so that when a Broker fails its partitions can be reassigned to other Brokers.

In the official client API, different group IDs distinguish different consumer groups. The data consumed by all consumers in the same group (same group ID) adds up to the full data of the topic. If the number of consumers in a group is greater than the number of partitions of the topic, some consumers in the group cannot consume any data. If the number of consumers in the same group is less than the number of partitions, some consumers will consume data from multiple partitions. If the number of consumers exactly equals the number of partitions, consumers and partitions correspond one to one. Consequently, the consumption efficiency of the consumer side can only be improved by increasing the number of partitions of the topic, yet the number of partitions is limited by the server hardware environment. Therefore, to improve consumption efficiency with a limited number of topic partitions, a special consumption mode is needed that increases consumption parallelism indirectly.

Disclosure of Invention

The present invention aims to overcome the deficiencies described in the background art and to provide a method for increasing Kafka consumption capability that allows one Partition to be consumed by multiple Consumers in the same group while ensuring data integrity.

In order to achieve the technical effects, the invention adopts the following technical scheme:

A method for increasing Kafka consumption capacity, applied to a server system, comprises the following steps:

A. establishing a connection with the Kafka service;

B. dividing each partition into data segments to obtain the number of segments of each partition;

C. setting a starting offset and an ending offset for each segment of each partition;

D. assigning a consumption interval to each consumer according to the number of segments of each partition and the starting and ending offsets of each segment;

E. completing the consumption process to obtain the consumed data.

Further, in step A, the Kafka Consumer provided by the Kafka Client is used to establish the connection with Kafka.

Further, the specific method for dividing each partition into data segments and obtaining the number of segments in step B is as follows: set the step length of each segment to T; the number of segments is then N = (endOffset - beginOffset)/T, where endOffset is the ending offset of the partition and beginOffset is the starting offset of the partition, and the data of one segment of one partition is read by one independent thread. The number of segments N of each partition can be calculated in this way. One of the key technical points of the scheme is to divide each partition under each topic into data segments when a new round of consumption starts: the start offset beginOffset and the end offset endOffset of the partition at the current point in time are recorded, the data in the (endOffset - beginOffset) interval is divided into N parts with a fixed step length T, and each part is read by an independent thread. If there are M partitions, M × N threads are needed to complete the whole consumption.

Further, when the calculated N is not an integer, the final value of N is obtained by rounding up, that is, by taking the integer part of the calculated N and adding 1. For example, if the calculated N is 3.3, the final value of N is 4, and the partition is divided into 4 segments.
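The segment count described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `segment_count` and the choice to return 0 for an empty interval are not taken from the patent.

```python
def segment_count(begin_offset: int, end_offset: int, step: int) -> int:
    """Number of segments N for one partition, rounded up to an integer."""
    if step <= 0:
        raise ValueError("step must be positive")
    span = end_offset - begin_offset
    if span <= 0:
        return 0  # assumption: nothing to consume in this round
    # Ceiling division on integers, equivalent to ceil(span / step).
    return -(-span // step)


print(segment_count(0, 33, 10))  # 33 / 10 = 3.3, rounded up to 4
```

Integer ceiling division avoids floating-point rounding when offsets are large.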

Further, the specific calculation method for setting the start offset and the end offset of each segment in step C is as follows: the start offset of the first segment of each partition is S_1 = beginOffset; the start offset of the N-th segment is S_N = beginOffset + (N-1) × T + 1, where T is the step length of each segment and N is an integer greater than 1; the end offset of the last segment of each partition is E_N = endOffset - 1; the end offset of every segment other than the last is E_m = beginOffset + m × T, where T is the step length of each segment and m = 1, 2, …, N-1. Recording the start and end offsets of each segment and of each partition in this step prepares the data needed for retrying a failed segment and for determining the start offset of each partition in the next round of consumption.
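As a minimal sketch, the offset formulas of step C can be applied to one partition as follows. The function name `segment_bounds` and the loop index `k` are illustrative assumptions; the end offsets are treated as inclusive, which is how the formulas read.

```python
from typing import List, Tuple


def segment_bounds(begin_offset: int, end_offset: int, step: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) offsets of every segment of one partition."""
    span = end_offset - begin_offset
    n = -(-span // step) if span > 0 else 0  # segment count N, rounded up
    bounds = []
    for k in range(1, n + 1):
        # S_1 = beginOffset; S_k = beginOffset + (k-1)*T + 1 for k > 1.
        start = begin_offset if k == 1 else begin_offset + (k - 1) * step + 1
        # E_k = beginOffset + k*T for k < N; E_N = endOffset - 1.
        end = end_offset - 1 if k == n else begin_offset + k * step
        bounds.append((start, end))
    return bounds


print(segment_bounds(0, 33, 10))
# [(0, 10), (11, 20), (21, 30), (31, 32)]
```

Applied literally, the formulas give the first segment T + 1 offsets. When (endOffset - beginOffset) leaves a remainder of exactly 1 after division by T, the last segment has a start offset greater than its end offset and is therefore empty, so a caller may want to skip such a segment.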

Further, in step D, the API provided by Kafka is used to create a Kafka consumer; the assign method is used to specify the partition to consume, and the seek method is used to specify the consumption start offset.

Further, in step D, each consumer corresponds to one segment of one partition. When the next round of consumption starts, its start offset is the end offset of the previous round of consumption of the partition, and the consumption length of each consumer is equal to the step length T of its segment. Because Kafka consumers fetch data by polling, the consumption process must be stopped once the set end offset of the segment has been consumed, so as to avoid over-consumption and duplicated data.

Further, in step E, the consumption process is judged to be finished when all N_1 + N_2 + … + N_n threads have finished correctly, where N_i is the number of segments of the i-th partition and n is the total number of partitions.

Compared with the prior art, the invention has the following beneficial effects:

At present, the common way to increase Kafka consumption capacity is to increase the number of partitions and then deploy multiple consumers to increase parallel consumption and thereby improve consumption efficiency. In practice, however, the number of partitions cannot be increased indefinitely, and in most cases it is the business processing that is slow. To address these problems, the method of the invention provides a special consumption mode that indirectly increases parallel consumption capability: data is consumed by multiple threads and the consumption status of each thread is recorded, which improves consumption parallelism and increases the utilization of the application machine. The technical scheme is simple and effective, has a wide application range, reduces cost and operation and maintenance complexity, and achieves performance at a high level for the industry.

Drawings

FIG. 1 is a schematic diagram of the producer and consumer relationship in Kafka.

FIG. 2 is a schematic diagram of the topic and partition relationship in Kafka.

FIG. 3 is a multi-threaded parallel consumption diagram of the present invention.

Detailed Description

The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.

Embodiments:

Embodiment 1:

This embodiment provides a method for increasing Kafka consumption capability that allows one Partition to be consumed by multiple Consumers in the same group, ensures data integrity, and effectively increases the utilization of the application machine.

Specifically, this embodiment assumes that the Kafka cluster is already deployed and that the network and the basic environment work normally. It is also assumed that the Topic consumed this time is known to have three partitions, Partition 1, Partition 2, and Partition 3, whose start offsets are beginOffset1, beginOffset2, and beginOffset3 and whose end offsets are endOffset1, endOffset2, and endOffset3.

The method for increasing kafka consumption capability of the embodiment specifically includes the following steps:

Step 1: establish a connection with the Kafka service. In this embodiment, the Kafka Consumer provided by the Kafka Client is used to establish the connection with Kafka.

Step 2: divide each partition (Partition) into data segments.

As shown in fig. 3, one of the key technical points of the present solution is to divide each partition under each topic into data segments when a new consumption is started, that is, a start offset S and an end offset E of the partition at the current time point need to be recorded, then the data in the (E-S) interval is divided into N parts by a fixed step length T, each part of data is read by an independent thread, and if there are M partitions, M × N threads are needed to complete the whole consumption.

Specifically, in this embodiment, the step length of a segment is set to T, and the number N of data segments of each partition is calculated with the formula N = (endOffset - beginOffset)/T; when N is not an integer, it is rounded up by adding one to its integer part. In this embodiment, the number of data segments N1 of Partition 1 is N1 = (endOffset1 - beginOffset1)/T.

Specifically, when the above steps are implemented by using a software program, the corresponding program core code is as follows:

By analogy, the segment counts N2 and N3 of Partition 2 and Partition 3 are calculated in the same way.

Step 3: set a starting offset and an ending offset for each segment of each partition.

Knowing the number of segments, the starting offset and the ending offset of each partition, and the set step size T, the starting offset of the first segment of each partition is S_1 = beginOffset, and the starting offset of each remaining segment is given by the formula S_N = beginOffset + (N-1) × T + 1, where T is the step size of each segment. The starting offset of each segment of Partition 1 can be calculated in this way: the first segment starts at S_1 = beginOffset, and the second segment starts at S_2 = beginOffset + (2-1) × T + 1.

At the same time, the end offset of the last segment of each partition is E_N = endOffset - 1, and the end offset of every segment other than the last is E_m = beginOffset + m × T, m = 1, 2, …, N-1. The end offset of the first segment is therefore E_1 = beginOffset + 1 × T, and the end offset of the last segment is E_N = endOffset - 1. The end offset of each segment of Partition 1 can be calculated in this way.

where _beginOffset is the start offset of each segment of Partition 1 and _endOffset is the end offset of each segment of Partition 1.

In this step, the start and end offsets of each segment and of each partition are recorded. This prepares the data needed for retrying a segment whose consumption failed and for determining the start offset of each partition in the next round of consumption.
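One possible way to keep such records is sketched below. The names `SegmentRecord`, `segments_to_retry`, and `next_begin_offsets` are hypothetical and chosen only for illustration; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SegmentRecord:
    partition: int
    start: int          # inclusive start offset of the segment
    end: int            # inclusive end offset of the segment
    done: bool = False  # set to True once the segment was consumed correctly


def segments_to_retry(records: List[SegmentRecord]) -> List[SegmentRecord]:
    """Segments whose consumption has not finished correctly."""
    return [r for r in records if not r.done]


def next_begin_offsets(partition_end_offsets: Dict[int, int]) -> Dict[int, int]:
    """The next round starts at the end offset recorded for each partition."""
    return dict(partition_end_offsets)


log = [SegmentRecord(1, 0, 10, done=True), SegmentRecord(1, 11, 20)]
print([(r.start, r.end) for r in segments_to_retry(log)])  # [(11, 20)]
print(next_begin_offsets({1: 33}))  # {1: 33}
```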

Step 4: assign a consumption interval to each consumer according to the number of segments of each partition and the starting and ending offsets of each segment.

In this embodiment, the API provided by Kafka is used to create a Kafka consumer; the assign method specifies the partition to consume and the seek method specifies the consumption start offset. For a given segment of Partition 1, for example:

consumer.assign(Partition 1)

consumer.seek(Partition 1, _beginOffset)

Since Kafka consumers fetch data by polling, the consumption process must be stopped once the set endOffset has been consumed, so as to avoid over-consumption and duplicated data.

The pseudo code for the specific implementation is as follows:

where record represents each datum in kafka.
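For illustration only, and not as the patent's own listing, the stop condition can be sketched with an in-memory stand-in for a consumer. `InMemoryConsumer` and `consume_segment` are hypothetical names; the stand-in merely mimics the assign, seek, and poll pattern and is not a real Kafka client class.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryConsumer:
    """Hypothetical stand-in for a Kafka consumer; not a real client class."""

    def __init__(self, log: Dict[int, List[str]]):
        self._log = log  # partition -> values; the list index is the offset
        self._partition: Optional[int] = None
        self._position = 0

    def assign(self, partition: int) -> None:
        self._partition = partition

    def seek(self, partition: int, offset: int) -> None:
        self._position = offset

    def poll(self, max_records: int = 4) -> List[Tuple[int, str]]:
        values = self._log[self._partition]
        stop = min(self._position + max_records, len(values))
        batch = [(o, values[o]) for o in range(self._position, stop)]
        self._position += len(batch)
        return batch


def consume_segment(consumer, partition: int, begin_offset: int, end_offset: int):
    """Read offsets begin_offset..end_offset (inclusive), then stop."""
    consumer.assign(partition)
    consumer.seek(partition, begin_offset)
    consumed = []
    while True:
        batch = consumer.poll()
        if not batch:
            return consumed  # no more data available
        for offset, value in batch:
            if offset > end_offset:
                return consumed  # past the segment: stop to avoid duplicates
            consumed.append((offset, value))
            if offset == end_offset:
                return consumed  # the set end offset has been consumed
```

A poll may return records beyond the segment's end offset, so the check is made per record rather than per batch.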

Step 5: complete the consumption process to obtain the consumed data.

When all N1 + N2 + N3 threads have finished correctly, the whole consumption process has completed successfully, and the required consumption data can be obtained.
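A minimal sketch of this completion check, assuming one worker thread per segment from a thread pool, is shown below. The names `run_all_segments` and `consumption_finished` and the `consume` callback are hypothetical; treating a raised exception as an incorrect finish is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, int, int]  # (partition, start offset, end offset)


def run_all_segments(segments: List[Segment],
                     consume: Callable[[int, int, int], None]) -> Dict[Segment, bool]:
    """Run one task per segment; map each segment to whether it ended correctly."""
    results: Dict[Segment, bool] = {}
    if not segments:
        return results
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        futures = {pool.submit(consume, *seg): seg for seg in segments}
        for future, seg in futures.items():
            # exception() blocks until the task ends and returns None on success.
            results[seg] = future.exception() is None
    return results


def consumption_finished(results: Dict[Segment, bool], expected_threads: int) -> bool:
    """True only when all N1 + N2 + ... segment threads ended correctly."""
    return len(results) == expected_threads and all(results.values())
```

Segments mapped to False in the result are the candidates for a retry.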

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A method for increasing kafka consumption capacity, which is applied to a server system, is characterized by comprising the following steps:

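A. establishing a connection with the Kafka service;

B. dividing each partition into data segments to obtain the number of segments of each partition;

C. setting a starting offset and an ending offset for each segment of each partition;

D. assigning a consumption interval to each consumer according to the number of segments of each partition and the starting and ending offsets of each segment;

E. completing the consumption process to obtain the consumed data.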

2. The method for increasing Kafka consumption ability of claim 1, wherein in step A the Kafka Consumer provided by the Kafka Client is used to establish the connection with Kafka.

3. The method for increasing kafka consumption capability of claim 1, wherein the specific method for dividing the data segments of each partition in step B to obtain the number of the segments is as follows:

setting the step length of each segment as T, the number of segments is N = (endOffset - beginOffset)/T, wherein endOffset is the ending offset of the partition and beginOffset is the starting offset of the partition; and the data of one segment of one partition is read by one independent thread.

4. The method of claim 3, wherein when the calculated N is a non-integer, the final value of N is obtained by taking the integer part of the calculated N and adding 1.

5. The method for increasing kafka consumption capability of claim 4, wherein the specific calculation method for setting the start offset and the end offset of each segment in the step C is as follows: the starting offset of the first segment of each partition is S_1 = beginOffset; the starting offset of the N-th segment is S_N = beginOffset + (N-1) × T + 1, where T is the step size of each segment and N is an integer greater than 1;

the end offset of the last segment of each partition is E_N = endOffset - 1; the end offset of each segment except the last segment is E_m = beginOffset + m × T, where T is the step size of each segment and m = 1, 2, …, N-1.

6. The method for increasing Kafka consumption capability of claim 5, wherein in step D, the API provided by Kafka is used to create Kafka consumer, assign the consumed partition using the assign method, and assign the consumption start offset using the seek method.

7. The method of claim 6, wherein in step D, a consumer corresponds to a segment of a partition, and when the consumer starts the next consumption, the consumption start offset is the ending offset of the last consumption of the partition, and each consumption length of each consumer is equal to the step size T of the corresponding segment.

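8. The method for increasing kafka consumption ability of claim 7, wherein in step E, the consumption process is judged to be finished when all N_1 + N_2 + … + N_n threads have finished correctly, where N_i is the number of segments of the i-th partition and n is the total number of partitions.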