CN110648178A - Method for increasing kafka consumption capacity - Google Patents

Method for increasing kafka consumption capacity

Info

Publication number: CN110648178A
Authority: CN (China)
Prior art keywords: partition, segment, consumption, offset, kafka
Legal status: Pending
Application number: CN201910907527.5A
Other languages: Chinese (zh)
Inventor: 任治州
Current Assignee: Sichuan Changhong Electric Co Ltd
Original Assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2020-01-03
Application filed by Sichuan Changhong Electric Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for increasing kafka consumption capacity, which is applied to a server system and comprises the following steps: A. establishing a connection with the Kafka service; B. dividing each partition into data segments to obtain the number of segments of each partition; C. setting a start offset and an end offset for each segment of each partition; D. specifying a consumption interval for each consumer according to the obtained number of segments of each partition and the start offset and end offset of each segment; E. completing the consumption process to obtain the consumed data. The method allows one Partition to be consumed by a plurality of Consumers in the same group while ensuring the integrity of the data.

Description

Method for increasing kafka consumption capacity
Technical Field
The invention relates to the technical field of software, in particular to a method for increasing kafka consumption capacity.
Background
Kafka is a high-throughput distributed publish-subscribe messaging system. Many companies now use it as a data pipeline and as a messaging system for many kinds of workloads.
For better illustration and understanding of the technical solution of the present invention, the basic concept of Kafka is introduced as follows:
1. producer and consumer
There are two basic types of Kafka clients, as shown in fig. 1: the producer (Producer) and the consumer (Consumer). The producer (also known as the publisher) creates messages, and the consumer (also known as the subscriber) is responsible for consuming them.
2. Topic (Topic) and Partition (Partition)
In Kafka, messages are categorized by topic (Topic), each topic corresponding to a "message queue", similar to a table in a database. However, if all messages of the same type are put into one "central" queue, scalability suffers: growth in the number of producers or consumers, or in the number of messages, may exhaust the performance or storage of the system. To address this problem, as shown in fig. 2, the concept of the partition (Partition) is introduced to provide horizontal scaling.
3. Broker and Cluster (Cluster)
A Kafka server, also known as a Broker, accepts messages sent by producers and stores them on disk. The Broker also serves consumers' requests to pull partition messages, returning messages that have been committed. With suitable machine hardware, a Broker can handle thousands of partitions and millions of messages per second. Several Brokers form a Cluster (Cluster), in which one Broker becomes the Cluster Controller (Cluster Controller), responsible for managing the cluster, including assigning partitions to Brokers and monitoring Broker failures. Within a cluster, each partition is the responsibility of one Broker, which is called the Leader of that partition. A partition can also be replicated on multiple Brokers for redundancy, so that responsibility for it can be reassigned to another Broker when a Broker fails.
In the official client API, different group IDs distinguish different consumer groups. The data consumed by all consumers in the same group (same group ID) adds up to the data in the topic. If the number of consumers in a group is greater than the number of partitions of the topic, some consumers in the group cannot consume any data. If the number of consumers in a group is less than the number of partitions, some of them consume data from multiple partitions. If the number of consumers in a group exactly equals the number of partitions, consumers and partitions correspond one to one. Consumption efficiency on the consumer side can therefore only be improved by increasing the number of partitions of the topic, but the number of partitions is limited by the server hardware environment. To improve consumption efficiency on the consumer side with a limited number of topic partitions, a special consumption mode is needed that increases consumption parallelism indirectly.
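The relationship between group size and partition count described above can be illustrated with a small sketch. It is a toy model and not Kafka's actual assignment strategy; the round-robin rule and the name `assign_round_robin` are assumptions chosen only to show that consumers beyond the partition count receive nothing.

```python
def assign_round_robin(num_partitions, num_consumers):
    """Toy model: deal partitions out to the consumers of one group in turn.

    Each partition goes to exactly one consumer, mirroring the rule that a
    partition is read by at most one consumer within a group.
    """
    assignment = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment


# 3 partitions and 5 consumers: two consumers receive no partition.
idle = [c for c, parts in assign_round_robin(3, 5).items() if not parts]
print(idle)  # prints [3, 4]
```

With fewer consumers than partitions, for example `assign_round_robin(4, 2)`, each consumer is given several partitions instead.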
Disclosure of Invention
The present invention aims to overcome the above deficiencies of the background art by providing a method for increasing kafka consumption capability that allows one Partition to be consumed by multiple Consumers in the same group while ensuring the integrity of the data.
To achieve this technical effect, the invention adopts the following technical scheme:
A method for increasing kafka consumption capacity, applied to a server system, comprises the following steps:
A. establishing a connection with the Kafka service;
B. dividing each partition into data segments to obtain the number of segments of each partition;
C. setting a start offset and an end offset for each segment of each partition;
D. specifying a consumption interval for each consumer according to the obtained number of segments of each partition and the start offset and end offset of each segment;
E. completing the consumption process to obtain the consumed data.
Further, in step A, the Kafka Consumer provided by the Kafka Client is used to establish the connection to Kafka.
Further, the number of segments in step B is obtained by dividing the data of each partition as follows: the step length of each segment is set to T, and the number of segments is N = (endOffset - beginOffset)/T, where endOffset is the end offset of the partition and beginOffset is the start offset of the partition; the data of each segment of a partition is read by an independent thread. The number N of segments of each partition can be calculated in this way. One of the key technical points of the scheme is to divide each partition under each topic into data segments when a new consumption is started: the start offset beginOffset and the end offset endOffset of the partition at the current time point are recorded, the data in the (endOffset - beginOffset) interval is divided into N parts with a fixed step length T, and each part is read by an independent thread. If there are M partitions, M × N threads are needed to complete the whole consumption.
Further, when the calculated N is not an integer, it is rounded up to the next integer; for example, if the calculated N is 3.3, the final value of N is 4, that is, the partition is divided into 4 segments.
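The rounding rule above amounts to a ceiling division. A minimal sketch follows; the function name `segment_count` and the treatment of an empty range (returning 0) are assumptions, not part of the described method.

```python
def segment_count(begin_offset, end_offset, step):
    """Return N = (end_offset - begin_offset) / step, rounded up, using integer math."""
    if step <= 0:
        raise ValueError("step must be positive")
    span = end_offset - begin_offset
    if span <= 0:
        return 0  # assumption: an empty partition range needs no segments
    return (span + step - 1) // step


print(segment_count(0, 33, 10))  # 3.3 rounds up: prints 4
print(segment_count(0, 30, 10))  # exact division: prints 3
```

Integer arithmetic is used here so that very large offsets are not affected by floating-point rounding.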
Further, the start offset and the end offset of each segment in step C are calculated as follows: the start offset of the first segment of each partition is S1 = beginOffset; the start offset of the N-th segment is SN = beginOffset + (N-1) × T + 1, where T is the step size of each segment and N is an integer greater than 1; the end offset of the last segment of each partition is EN = endOffset - 1; the end offset of each segment other than the last is Em = beginOffset + m × T, where T is the step size of each segment and m = 1, 2, …, N-1. Recording the start offset and end offset of each segment and of each partition in this step prepares the data needed to retry a segment whose consumption fails and to determine each partition's starting offset for the next consumption.
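The offset formulas of step C can be sketched as follows. The name `segment_bounds` is hypothetical, and the bounds are treated as inclusive on both ends, which is an interpretation of the formulas rather than something the text states outright. One observation from working the formulas through: when (endOffset - beginOffset - 1) is an exact multiple of T, the last segment comes out with a start greater than its end, so this sketch drops such an empty segment.

```python
def segment_bounds(begin_offset, end_offset, step):
    """Inclusive (start, end) offsets of each segment, following the step C formulas."""
    span = end_offset - begin_offset
    n = -(-span // step) if span > 0 else 0  # ceiling division gives N
    bounds = []
    for k in range(1, n + 1):
        # S1 = beginOffset; Sk = beginOffset + (k-1)*T + 1 for k > 1
        start = begin_offset if k == 1 else begin_offset + (k - 1) * step + 1
        # EN = endOffset - 1 for the last segment; Em = beginOffset + m*T otherwise
        end = end_offset - 1 if k == n else begin_offset + k * step
        if start <= end:  # skip a last segment that the formulas leave empty
            bounds.append((start, end))
    return bounds


print(segment_bounds(0, 33, 10))
# prints [(0, 10), (11, 20), (21, 30), (31, 32)]
```

Under this reading the first segment holds T + 1 records and the middle segments hold T records each, and together the segments cover every offset from beginOffset to endOffset - 1 exactly once.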
Further, in step D, the API provided by Kafka is used to create a Kafka consumer; the assign method specifies the partition to be consumed and the seek method specifies the consumption start offset.
Further, in step D, each consumer corresponds to one segment of one partition. When consumption is next started, the consumption start offset is the end offset of the previous consumption of that partition, and the consumption length of each consumer equals the step length T of its segment. Kafka data is consumed by polling, so the consumption process must be stopped once the set end offset (endOffset) has been consumed, to avoid over-consumption and duplicate data.
Further, in step E, the consumption process is judged complete after all N1 + N2 + … + Nn threads have finished correctly, where Ni is the number of segments of the i-th partition and n is the total number of partitions.
Compared with the prior art, the invention has the following beneficial effects:
At present, the common way to increase kafka consumption capacity is to increase the number of partitions and then deploy multiple consumers to raise parallel consumption capacity and thereby consumption efficiency. In practice, however, the number of partitions cannot be increased indefinitely, and in most cases it is the business processing that executes slowly. To address these problems, the method of the invention provides a special consumption mode that increases parallel consumption capacity indirectly: data is consumed by multiple threads and the consumption status of each thread is recorded, which improves consumption parallelism and increases the utilization of the application machine. The technical scheme is simple and effective, widely applicable, reduces cost and operation and maintenance complexity, and its performance indexes reach a high industrial technical level.
Drawings
FIG. 1 is a schematic diagram of producer and consumer relationships in kafka.
FIG. 2 is a schematic diagram of the topic and partition relationships in kafka.
FIG. 3 is a multi-threaded parallel consumption diagram of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Embodiment 1:
This embodiment provides a method for increasing kafka consumption capability that allows one Partition to be consumed by a plurality of Consumers in the same group, ensures data integrity, and effectively increases the utilization of the application machine.
Specifically, this embodiment assumes that the Kafka cluster is already deployed and that the network and basic environment work normally. It also assumes that the Topic to be consumed has three partitions, Partition 1, Partition 2 and Partition 3, whose start offsets are beginOffsets1, beginOffsets2 and beginOffsets3 and whose end offsets are endOffsets1, endOffsets2 and endOffsets3.
The method for increasing kafka consumption capability of this embodiment includes the following steps:
Step 1: Establish a connection with the Kafka service. In this embodiment, the Kafka Consumer provided by the Kafka Client is used to establish the connection to Kafka.
Step 2: Divide each partition (Partition) into data segments.
As shown in fig. 3, one of the key technical points of the present solution is to divide each partition under each topic into data segments when a new consumption is started: the start offset S and the end offset E of the partition at the current time point are recorded, the data in the (E-S) interval is divided into N parts with a fixed step length T, and each part is read by an independent thread. If there are M partitions, M × N threads are needed to complete the whole consumption.
Specifically, in this embodiment the step size of a segment is set to T, and the number N of data segments of each partition is calculated with the formula N = (endOffset - beginOffset)/T; when N is not an integer, it is rounded up to the next integer. In this embodiment, the number of data segments N1 of Partition 1 is N1 = (endOffsets1 - beginOffsets1)/T.
Specifically, when the above steps are implemented in software, the corresponding core program code appears in the original publication as figures BDA0002213719950000061 and BDA0002213719950000071, which are not reproduced in this text.
By analogy, the segment counts N2 and N3 of Partition 2 and Partition 3 are calculated in the same way.
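As a worked illustration of this step for the three partitions of the embodiment, the sketch below uses made-up numbers. The step length `STEP` and all offset values are hypothetical and are not taken from the original code figures.

```python
import math

STEP = 1000  # hypothetical step length T

# Hypothetical (beginOffset, endOffset) pairs for Partition 1, 2 and 3.
partitions = {1: (0, 3300), 2: (500, 2500), 3: (120, 121)}

# N_i = (endOffset_i - beginOffset_i) / T, rounded up to an integer.
counts = {
    pid: math.ceil((end - begin) / STEP)
    for pid, (begin, end) in partitions.items()
}
total_threads = sum(counts.values())  # N1 + N2 + N3

print(counts)         # prints {1: 4, 2: 2, 3: 1}
print(total_threads)  # prints 7
```

Because each partition gets its own count, the total number of threads is the sum of the per-partition counts; it equals M × N only when every partition yields the same N.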
Step 3: Set a start offset and an end offset for each segment of each partition.
Given the number of segments, the start offset and end offset of each partition, and the set step size T, the start offset of the first segment of each partition is S1 = beginOffset, and the start offset of each remaining segment is obtained from the formula SN = beginOffset + (N-1) × T + 1, where T is the step size of each segment. The start offset of each segment of Partition 1 can be calculated in this way: the start offset of the first segment is S1 = beginOffset, and the start offset of the second segment is S2 = beginOffset + (2-1) × T + 1.
At the same time, the end offset of the last segment of each partition is set to EN = endOffset - 1, and the end offset of each segment other than the last is Em = beginOffset + m × T, m = 1, 2, …, N-1. The end offset of the first segment is then E1 = beginOffset + 1 × T, and the end offset of the last segment is EN = endOffset - 1. The end offset of each segment of Partition 1 can be calculated in this way.
Specifically, when the above steps are implemented in software, the corresponding core program code appears in the original publication as figures BDA0002213719950000072 and BDA0002213719950000081, which are not reproduced in this text.
In that code, _beginOffset is the start offset of each segment of Partition 1 and _endOffset is the end offset of each segment of Partition 1.
In this step, the start offset and end offset of each segment and of each partition are recorded; this prepares the data needed to retry a segment whose consumption fails and to determine each partition's starting offset for the next consumption.
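One possible shape for the recorded data is sketched below. The text does not say how the offsets are stored, so the `SegmentRecord` structure, its status values, the retry limit and both helper functions are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SegmentRecord:
    """Bookkeeping for one segment of one partition (hypothetical structure)."""
    partition: int
    begin_offset: int        # inclusive start offset of the segment
    end_offset: int          # inclusive end offset of the segment
    status: str = "pending"  # "pending", "done" or "failed"
    attempts: int = 0


def segments_to_retry(records, max_attempts=3):
    """Return failed segments that still have retry attempts left."""
    return [r for r in records if r.status == "failed" and r.attempts < max_attempts]


def next_start_offset(records, partition):
    """Start of the next consumption: one past the highest recorded end offset."""
    ends = [r.end_offset for r in records if r.partition == partition]
    return max(ends) + 1 if ends else None


records = [
    SegmentRecord(1, 0, 10, status="done", attempts=1),
    SegmentRecord(1, 11, 20, status="failed", attempts=1),
    SegmentRecord(1, 21, 25, status="done", attempts=1),
]
print([r.begin_offset for r in segments_to_retry(records)])  # prints [11]
print(next_start_offset(records, 1))                         # prints 26
```

Since the last segment ends at endOffset - 1, the value returned by `next_start_offset` equals the partition's endOffset at the time the segments were planned, which matches the rule that the next consumption starts where the previous one ended.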
Step 4: Specify a consumption interval for each consumer according to the obtained number of segments of each partition and the start offset and end offset of each segment.
In this embodiment, the API provided by Kafka is used to create a Kafka consumer; the assign method specifies the partition to be consumed and the seek method specifies the consumption start offset. For a given segment of Partition 1, for example:
consumer.assign(Partition 1)
consumer.seek(Partition 1, _beginOffset)
Since Kafka data is consumed by polling, the consumption process must be stopped once the set endOffset has been consumed, to avoid over-consumption and duplicate data.
The pseudocode of the specific implementation appears in the original publication as figure BDA0002213719950000091, which is not reproduced in this text; in it, record represents each data item in kafka.
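Because the pseudocode figure is not available in this text, the following is only a sketch of the stop-at-end-offset loop that Step 4 describes. `FakeConsumer` is a hypothetical in-memory stand-in, loosely modelled on the assign/seek/poll shape of Kafka client libraries, so that the example runs without a broker; it is not the Kafka API, and the function name `consume_segment` is likewise an assumption.

```python
from collections import namedtuple

Record = namedtuple("Record", "offset value")


class FakeConsumer:
    """In-memory stand-in exposing assign/seek/poll, for illustration only."""

    def __init__(self, log, batch_size=4):
        self._log = log          # list of values; list index == offset
        self._batch = batch_size
        self._tp = None
        self._pos = 0

    def assign(self, partitions):
        self._tp = partitions[0]

    def seek(self, partition, offset):
        self._pos = offset

    def poll(self):
        chunk = self._log[self._pos:self._pos + self._batch]
        records = [Record(self._pos + i, v) for i, v in enumerate(chunk)]
        self._pos += len(chunk)
        return {self._tp: records} if records else {}


def consume_segment(consumer, partition, begin_offset, end_offset):
    """Read offsets begin_offset..end_offset (inclusive), then stop polling."""
    consumer.assign([partition])
    consumer.seek(partition, begin_offset)
    out = []
    while True:
        batch = consumer.poll().get(partition, [])
        if not batch:
            return out  # in this fake, an empty poll means the log is exhausted
        for record in batch:
            if record.offset > end_offset:
                return out  # past the segment: stop to avoid duplicate data
            out.append(record.value)
            if record.offset == end_offset:
                return out  # segment end reached: stop consuming


log = [f"msg-{i}" for i in range(33)]
print(consume_segment(FakeConsumer(log), "topic-0", 11, 20))
```

A polled batch can run past the segment boundary, which is why the end offset is checked per record rather than per batch. With a real client an empty poll does not necessarily mean the log is exhausted, so a production loop would need a timeout or retry policy instead of returning immediately.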
Step 5: Complete the consumption process to obtain the consumed data.
When all N1+N2+N3 threads have finished correctly, the whole consumption process has completed successfully and the required consumption data can be obtained.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (8)

1. A method for increasing kafka consumption capacity, applied to a server system, characterized by comprising the following steps:
A. establishing a connection with the Kafka service;
B. dividing each partition into data segments to obtain the number of segments of each partition;
C. setting a start offset and an end offset for each segment of each partition;
D. specifying a consumption interval for each consumer according to the obtained number of segments of each partition and the start offset and end offset of each segment;
E. completing the consumption process to obtain the consumed data.
2. The method for increasing kafka consumption capability of claim 1, wherein in step A the Kafka Consumer provided by the Kafka Client is used to establish the connection to Kafka.
3. The method for increasing kafka consumption capability of claim 1, wherein the number of segments in step B is obtained by dividing the data of each partition as follows:
the step length of each segment is set to T, and the number of segments is N = (endOffset - beginOffset)/T, where endOffset is the end offset of the partition and beginOffset is the start offset of the partition; and the data of one segment of one partition is read by one independent thread.
4. The method of claim 3, wherein when the calculated N is not an integer, the final value of N is the calculated N rounded up to the next integer.
5. The method for increasing kafka consumption capability of claim 4, wherein the start offset and the end offset of each segment in step C are calculated as follows: the start offset of the first segment of each partition is S1 = beginOffset; the start offset of the N-th segment is SN = beginOffset + (N-1) × T + 1, where T is the step size of each segment and N is an integer greater than 1;
the end offset of the last segment of each partition is EN = endOffset - 1; the end offset of each segment other than the last is Em = beginOffset + m × T, where T is the step size of each segment and m = 1, 2, …, N-1.
6. The method for increasing kafka consumption capability of claim 5, wherein in step D the API provided by Kafka is used to create a Kafka consumer, the assign method is used to specify the partition to be consumed, and the seek method is used to specify the consumption start offset.
7. The method of claim 6, wherein in step D each consumer corresponds to one segment of one partition, and when consumption is next started, the consumption start offset is the end offset of the previous consumption of the partition, and the consumption length of each consumer equals the step size T of the corresponding segment.
8. The method for increasing kafka consumption capability of claim 7, wherein in step E the consumption process is judged complete after all N1 + N2 + … + Nn threads have finished correctly, where Ni is the number of segments of the i-th partition and n is the total number of partitions.
CN201910907527.5A 2019-09-24 2019-09-24 Method for increasing kafka consumption capacity Pending CN110648178A (en)

Publications (1)

Publication Number Publication Date
CN110648178A (en) 2020-01-03

Family

ID=69011169


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200103