CN110648178A - Method for increasing kafka consumption capacity - Google Patents
- Publication number
- CN110648178A (application number CN201910907527.5A)
- Authority
- CN
- China
- Prior art keywords
- partition
- segment
- consumption
- offset
- kafka
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for increasing Kafka consumption capacity, which is applied to a server system and comprises the following steps: A. establishing a connection with the Kafka service; B. dividing each partition into data segments to obtain the number of segments of each partition; C. setting a starting offset and an ending offset for each segment of each partition; D. assigning a consumption interval to each consumer according to the number of segments of each partition and the starting offset and ending offset of each segment; E. completing the consumption process to obtain the consumed data. The method allows one Partition to be consumed by multiple Consumers in the same group while ensuring the integrity of the data.
Description
Technical Field
The invention relates to the technical field of software, in particular to a method for increasing kafka consumption capacity.
Background
Kafka is a high-throughput distributed publish-subscribe messaging system. It is now used by many companies as a data pipeline and as a messaging system of various kinds.
For better illustration and understanding of the technical solution of the present invention, the basic concept of Kafka is introduced as follows:
1. Producer and Consumer
Kafka has two basic types of clients, as shown in fig. 1: the Producer and the Consumer. The Producer (also known as the publisher) creates messages, and the Consumer (also known as the subscriber) is responsible for consuming messages.
2. Topic (Topic) and Partition (Partition)
In Kafka, messages are categorized by Topic, each Topic corresponding to a "message queue", similar to a table in a database. However, if all messages of the same type are put into one "central" queue, the system lacks scalability: growth in the number of producers or consumers, or in the number of messages, may exhaust the performance or storage of the system. To address this problem, as shown in fig. 2, Kafka introduces the concept of a Partition to achieve horizontal scaling.
3. Broker and Cluster (Cluster)
A Kafka server, also known as a Broker, accepts messages sent by producers and stores them on disk. The Broker also serves consumers' requests to pull partition messages, returning messages that have been committed. Depending on the machine hardware, a Broker can handle thousands of partitions and millions of messages per second. Several Brokers form a Cluster, in which one Broker becomes the Cluster Controller, responsible for managing the cluster, including assigning partitions to Brokers and monitoring Broker failures. Within a cluster, each partition is the responsibility of one Broker, referred to as the Leader of that partition. A partition can also be replicated on multiple Brokers for redundancy, so that when a Broker fails, responsibility for its partitions can be reassigned to other Brokers.
In the official client API, different group IDs distinguish different consumer groups. The data consumed by all consumers in the same group (consumers with the same group ID) together equals the data in the topic. If the number of consumers in a group is greater than the number of partitions of the topic, some consumers in the group cannot consume any data. If the number of consumers in a group is less than the number of partitions of the topic, some consumers will consume data from multiple partitions. If the number of consumers in a group is exactly equal to the number of partitions of the topic, consumers and partitions correspond one to one. Consumption efficiency on the consumer side can therefore be improved only by increasing the number of partitions of the topic, but the number of partitions is limited by the server hardware environment. To improve consumer-side efficiency with a limited number of topic partitions, a special consumption mode is needed that increases consumption parallelism indirectly.
Disclosure of Invention
The present invention aims to overcome the above deficiencies of the background art and to provide a method for increasing Kafka consumption capacity that allows one Partition to be consumed by multiple Consumers in the same group while ensuring the integrity of the data.
In order to achieve the technical effects, the invention adopts the following technical scheme:
A method for increasing Kafka consumption capacity, applied to a server system, comprises the following steps:
A. establishing a connection with the Kafka service;
B. dividing each partition into data segments to obtain the number of segments of each partition;
C. setting a starting offset and an ending offset for each segment of each partition;
D. assigning a consumption interval to each consumer according to the number of segments of each partition and the starting offset and ending offset of each segment;
E. completing the consumption process to obtain the consumed data.
Further, in step A, the connection to Kafka is established using the Kafka Consumer provided by the Kafka Client.
Further, the specific method for dividing each partition into data segments in step B to obtain the number of segments is as follows: set the step length of each segment to T; the number of segments is then $N = (\text{endOffset} - \text{beginOffset}) / T$, where endOffset is the ending offset of the partition and beginOffset is the starting offset of the partition; the data of each segment of a partition is read by an independent thread. The number of segments N of each partition can be calculated in this way. One of the key technical points of the scheme is to divide each partition under each topic into data segments when a new round of consumption starts: the start offset beginOffset and the end offset endOffset of the partition at the current time point are recorded, the data in the interval (endOffset - beginOffset) is divided into N parts by the fixed step length T, and each part is read by an independent thread. If there are M partitions, M × N threads are needed to complete the whole consumption.
Further, when the computed N is not an integer, it is rounded up: the final value of N is the integer part of the computed value plus 1. For example, if the computed N is 3.3, the final value of N is 4, that is, the partition is divided into 4 segments.
Further, the specific calculation method for setting the start offset and the end offset of each segment in step C is as follows: the start offset of the first segment of each partition is $S_1 = \text{beginOffset}$; the start offset of the N-th segment is $S_N = \text{beginOffset} + (N-1) \times T + 1$, where T is the step length of each segment and N is an integer greater than 1; the end offset of the last segment of each partition is $E_N = \text{endOffset} - 1$; the end offset of every segment other than the last is $E_m = \text{beginOffset} + m \times T$, where m = 1, 2, …, N-1. Recording the start and end offsets of each segment and of each partition in this step prepares the data needed to retry a segment whose consumption fails and to determine each partition's starting offset for the next round of consumption.
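As a worked instance of the formulas above, take the hypothetical values beginOffset = 0, endOffset = 12 and T = 5 (chosen only for illustration; they do not come from the patent):

```latex
\begin{aligned}
N   &= \lceil (12 - 0)/5 \rceil = \lceil 2.4 \rceil = 3 \\
S_1 &= 0,                           & E_1 &= 0 + 1 \times 5 = 5 \\
S_2 &= 0 + (2-1) \times 5 + 1 = 6,  & E_2 &= 0 + 2 \times 5 = 10 \\
S_3 &= 0 + (3-1) \times 5 + 1 = 11, & E_3 &= 12 - 1 = 11
\end{aligned}
```

With inclusive bounds, the three segments [0, 5], [6, 10] and [11, 11] cover offsets 0 through 11 exactly once. The first segment holds T + 1 records because its start is not shifted by one, while the middle segments hold T records each.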
Further, in step D, the API provided by Kafka is used to create a Kafka consumer; the assign method specifies the partition to consume and the seek method specifies the start offset of consumption.
Further, in step D, each consumer corresponds to one segment of one partition. When consumption next starts, the consumption start offset is the end offset of the previous consumption of the partition, and the length consumed by each consumer equals the step length T of its segment. Because Kafka consumers obtain data by polling, the consumption process must be stopped once the configured end offset endOffset has been consumed, so as to avoid over-consumption and duplicate data.
Further, in step E, when all $\sum_{i=1}^{n} N_i$ threads have finished correctly, the consumption process is judged to be complete, where $N_i$ is the number of segments of the i-th partition and n is the total number of partitions.
Compared with the prior art, the invention has the following beneficial effects:
at present, the common way to increase Kafka consumption capacity is to increase the number of partitions and then deploy multiple consumers to raise parallel consumption capacity and thus consumption efficiency. In practice, however, the number of partitions cannot be increased indefinitely, and in most cases the business logic executes slowly. To address these problems, the method of the invention provides a special consumption mode that increases parallel consumption capacity indirectly: data is consumed by multiple threads and the consumption status of each thread is recorded, which improves consumption parallelism and the utilization of the application machine. The technical scheme is simple and effective, is widely applicable, reduces cost and operation and maintenance complexity, and achieves performance indicators at a high level for the industry.
Drawings
FIG. 1 is a schematic diagram of producer and consumer relationships in kafka.
FIG. 2 is a schematic diagram of the topic and partition relationships in kafka.
FIG. 3 is a multi-threaded parallel consumption diagram of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Embodiment 1:
A method for increasing Kafka consumption capacity allows one Partition to be consumed by multiple Consumers in the same group, ensures data integrity, and effectively increases the utilization of the application machine.
Specifically, this embodiment assumes that the Kafka cluster is already deployed and that the network and the basic environment work normally. It is also assumed that the Topic consumed this time is known to have three partitions, namely Partition 1, Partition 2 and Partition 3, whose start offsets are beginOffset1, beginOffset2 and beginOffset3 and whose end offsets are endOffset1, endOffset2 and endOffset3.
The method for increasing kafka consumption capability of the embodiment specifically includes the following steps:
Step 1: establish a connection with the Kafka service. In this embodiment, the connection to Kafka is established using the Kafka Consumer provided by the Kafka Client.
Step 2: divide each Partition into data segments.
As shown in fig. 3, one of the key technical points of the present solution is to divide each partition under each topic into data segments when a new consumption is started, that is, a start offset S and an end offset E of the partition at the current time point need to be recorded, then the data in the (E-S) interval is divided into N parts by a fixed step length T, each part of data is read by an independent thread, and if there are M partitions, M × N threads are needed to complete the whole consumption.
Specifically, in this embodiment, the step size of a segment is set to T, and the number N of data segments of each partition is calculated with the formula $N = (\text{endOffset} - \text{beginOffset}) / T$; when N is not an integer, it is rounded up to the next integer. In this embodiment, the number of data segments N1 of Partition 1 is $N_1 = (\text{endOffset1} - \text{beginOffset1}) / T$.
When these steps are implemented in software, the corresponding core code (the listing is not reproduced in this text) performs this calculation. By analogy, the segment numbers N2 and N3 of Partition 2 and Partition 3 are calculated in the same way.
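The segment-count calculation described in step 2 can be sketched as follows. This is an illustrative sketch, not the patent's own listing: the function name `segment_count`, the sample offsets and the choice to return 0 for an empty partition are all assumptions.

```python
def segment_count(begin_offset: int, end_offset: int, step: int) -> int:
    """Number of segments N for one partition: (end - begin) / T, rounded up."""
    if step <= 0:
        raise ValueError("step T must be positive")
    span = end_offset - begin_offset
    if span <= 0:
        return 0  # assumption: an empty partition needs no segments
    # Integer ceiling division avoids floating-point rounding on large offsets.
    return -(-span // step)


# Hypothetical (beginOffset, endOffset) pairs for Partition 1, 2 and 3.
partitions = {1: (0, 33), 2: (100, 160), 3: (7, 7)}
T = 10
counts = {p: segment_count(b, e, T) for p, (b, e) in partitions.items()}
print(counts)  # {1: 4, 2: 6, 3: 0}
```

Partition 1 in this example reproduces the case mentioned in the text: 33 / 10 = 3.3, which is rounded up to 4 segments.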
Step 3: set a starting offset and an ending offset for each segment of each partition.
Given the number of segments, the start and end offsets of each partition, and the step length T, the starting offset of the first segment of each partition is $S_1 = \text{beginOffset}$, and the starting offset of each remaining segment is obtained from $S_N = \text{beginOffset} + (N-1) \times T + 1$, where T is the step length of each segment. The starting offset of every segment of Partition 1 can be calculated in this way: the starting offset of the first segment is $S_1 = \text{beginOffset}$ and that of the second segment is $S_2 = \text{beginOffset} + (2-1) \times T + 1$.
At the same time, the end offset of the last segment of each partition is set to $E_N = \text{endOffset} - 1$, and the end offset of every other segment is $E_m = \text{beginOffset} + m \times T$, with m = 1, 2, …, N-1. The end offset of the first segment is therefore $E_1 = \text{beginOffset} + 1 \times T$ and that of the last segment is $E_N = \text{endOffset} - 1$; the end offset of every segment of Partition 1 can be calculated in this way.
When these steps are implemented in software, the corresponding core code (the listing is not reproduced in this text) uses _beginOffset for the start offset of each segment of Partition 1 and _endOffset for the end offset of each segment of Partition 1.
Recording the start and end offsets of each segment and of each partition in this step prepares the data needed to retry a segment whose consumption fails and to determine each partition's starting offset for the next round of consumption.
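The offset formulas of step 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's own listing: `segment_ranges` is a hypothetical name, the bounds are treated as inclusive, and the sketch drops a final segment whose start would exceed its end, which the formulas produce when (endOffset - beginOffset) leaves a remainder of 1 when divided by T.

```python
from typing import List, Tuple


def segment_ranges(begin_offset: int, end_offset: int, step: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) offsets per segment, following the formulas in the text."""
    span = end_offset - begin_offset
    if step <= 0 or span <= 0:
        return []
    n = -(-span // step)  # number of segments N, rounded up
    ranges = []
    for k in range(1, n + 1):
        # S_1 = beginOffset; S_k = beginOffset + (k-1)*T + 1 for k > 1
        start = begin_offset if k == 1 else begin_offset + (k - 1) * step + 1
        # E_k = beginOffset + k*T for k < N; E_N = endOffset - 1
        end = end_offset - 1 if k == n else begin_offset + k * step
        if start <= end:  # assumption: skip a last segment that would be empty
            ranges.append((start, end))
    return ranges


print(segment_ranges(0, 12, 5))  # [(0, 5), (6, 10), (11, 11)]
print(segment_ranges(0, 11, 5))  # [(0, 5), (6, 10)]
```

Keeping the recorded (start, end) pairs in a list like this also gives a retry of a failed segment the exact interval it needs to re-read.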
Step 4: assign a consumption interval to each consumer according to the number of segments of each partition and the starting offset and ending offset of each segment.
In this embodiment, the API provided by Kafka is used to create a Kafka consumer; the assign method specifies the partition to consume and the seek method specifies the consumption start offset. For a given segment of Partition 1, for example:
consumer.assign(Partition 1)
consumer.seek(Partition 1, _beginOffset)
Since Kafka consumers obtain data by polling, the consumption process must be stopped once the configured _endOffset has been consumed, so as to avoid over-consumption and duplicate data.
In the pseudo code for the specific implementation (the listing is not reproduced in this text), record represents each record in Kafka.
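The bounded polling loop of step 4 can be sketched as follows. This is an illustrative sketch, not the patent's pseudo code: `InMemoryConsumer` is a hypothetical stand-in that mimics only the assign/seek/poll sequence so the example runs without a broker, and a real Kafka client's `poll` takes different arguments and returns a different structure.

```python
from collections import namedtuple

Record = namedtuple("Record", ["offset", "value"])  # stand-in for a Kafka record


class InMemoryConsumer:
    """Hypothetical stand-in exposing assign/seek/poll over a list of values."""

    def __init__(self, log, batch_size=3):
        self._log, self._batch, self._pos = log, batch_size, 0

    def assign(self, partition):
        self._partition = partition

    def seek(self, partition, offset):
        self._pos = offset

    def poll(self):
        stop = min(self._pos + self._batch, len(self._log))
        batch = [Record(i, self._log[i]) for i in range(self._pos, stop)]
        self._pos += len(batch)
        return batch


def consume_segment(consumer, partition, begin_offset, end_offset):
    """Read records with begin_offset <= offset <= end_offset, then stop."""
    consumer.assign(partition)
    consumer.seek(partition, begin_offset)
    consumed = []
    while True:
        batch = consumer.poll()
        if not batch:
            return consumed  # log exhausted before the segment end
        for record in batch:
            if record.offset > end_offset:
                return consumed  # past the segment: stop to avoid duplicates
            consumed.append(record)
            if record.offset == end_offset:
                return consumed  # segment fully consumed


log = [f"msg-{i}" for i in range(12)]
records = consume_segment(InMemoryConsumer(log), "Partition 1", 6, 10)
print([r.offset for r in records])  # [6, 7, 8, 9, 10]
```

The check against `end_offset` sits inside the batch loop because a single poll may return records that extend beyond the segment; those extra records belong to the next segment's consumer and are discarded here.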
Step 5: complete the consumption process to obtain the consumed data.
When the N1+ N2+ N3 threads are all correctly finished, the whole consumption process is smoothly finished, and the required consumption data can be acquired.
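The completion check of step 5 can be sketched as follows. This is an illustrative sketch under stated assumptions: `read_segment` is a placeholder worker that only reports how many offsets its segment covers, the segment list is hypothetical, and a thread pool sized to the number of segments stands in for "one independent thread per segment".

```python
from concurrent.futures import ThreadPoolExecutor


def read_segment(partition, start, end):
    """Placeholder worker: a real worker would consume offsets start..end."""
    return partition, start, end, end - start + 1  # records covered (inclusive)


def run_all(segments):
    """Run one task per segment; succeed only if every task finishes without error."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        futures = [pool.submit(read_segment, p, s, e) for p, s, e in segments]
        # result() re-raises any worker exception, so returning normally means
        # all N1 + N2 + N3 tasks completed correctly.
        return [f.result() for f in futures]


# Hypothetical segments for three partitions (N1 = 2, N2 = 1, N3 = 2).
segments = [(1, 0, 5), (1, 6, 9), (2, 0, 4), (3, 0, 5), (3, 6, 7)]
results = run_all(segments)
print(len(results), sum(r[3] for r in results))  # 5 23
```

If any worker raises, `run_all` propagates the exception instead of returning, which is the point at which a retry of the recorded segment interval could be triggered.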
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Claims (8)
1. A method for increasing kafka consumption capacity, which is applied to a server system, is characterized by comprising the following steps:
A. establishing a connection with the Kafka service;
B. dividing the data segments for each partition to obtain the number of the segments of each partition;
C. setting a starting offset and an ending offset for each segment of each partition;
D. assigning a consumption interval to each consumer according to the number of segments of each partition and the starting offset and ending offset of each segment;
E. completing the consumption process to obtain the consumed data.
2. The method for increasing Kafka consumption capability of claim 1, wherein in step A the connection to Kafka is established using the Kafka Consumer provided by the Kafka Client.
3. The method for increasing kafka consumption capability of claim 1, wherein the specific method for dividing each partition into data segments in step B to obtain the number of segments is as follows:
setting the step length of each segment as T, the number of segments being $N = (\text{endOffset} - \text{beginOffset}) / T$, wherein endOffset is the ending offset of the partition and beginOffset is the starting offset of the partition; and the data of one segment of one partition is read by one independent thread.
4. The method of claim 3, wherein when the calculated N is a non-integer, the final value of N is the integer part of the calculated N plus 1.
5. The method for increasing kafka consumption capability of claim 4, wherein the specific calculation method for setting the start offset and the end offset of each segment in step C is as follows: the starting offset of the first segment of each partition is $S_1 = \text{beginOffset}$; the starting offset of the N-th segment is $S_N = \text{beginOffset} + (N-1) \times T + 1$, where T is the step size of each segment and N is an integer greater than 1;
the end offset of the last segment of each partition is $E_N = \text{endOffset} - 1$; the end offset of each segment except the last is $E_m = \text{beginOffset} + m \times T$, where T is the step size of each segment and m = 1, 2, …, N-1.
6. The method for increasing Kafka consumption capability of claim 5, wherein in step D, the API provided by Kafka is used to create Kafka consumer, assign the consumed partition using the assign method, and assign the consumption start offset using the seek method.
7. The method of claim 6, wherein in step D, a consumer corresponds to a segment of a partition, and when the consumer starts the next consumption, the consumption start offset is the ending offset of the last consumption of the partition, and each consumption length of each consumer is equal to the step size T of the corresponding segment.
8. The method for increasing kafka consumption ability of claim 7, wherein in step E, when all $\sum_{i=1}^{n} N_i$ threads have finished correctly, the consumption process is judged to be complete, where $N_i$ is the number of segments of the i-th partition and n is the total number of partitions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910907527.5A CN110648178A (en) | 2019-09-24 | 2019-09-24 | Method for increasing kafka consumption capacity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110648178A (en) | 2020-01-03 |
Family
ID=69011169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910907527.5A Pending CN110648178A (en) | 2019-09-24 | 2019-09-24 | Method for increasing kafka consumption capacity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110648178A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112199202A (en) * | 2020-09-16 | 2021-01-08 | 河北航天信息技术有限公司 | Development method for expanding Kafka consumption capacity |
CN112506992A (en) * | 2020-12-04 | 2021-03-16 | 中国人寿保险股份有限公司 | Fuzzy query method and device for Kafka data, electronic equipment and storage medium |
CN112905109A (en) * | 2021-01-28 | 2021-06-04 | 平安普惠企业管理有限公司 | Message processing method, device, equipment and storage medium |
CN114385081A (en) * | 2021-12-27 | 2022-04-22 | 联通智网科技股份有限公司 | Disk protection method for kafka cluster and related equipment |
CN114401269A (en) * | 2021-12-08 | 2022-04-26 | 国电南瑞科技股份有限公司 | Business data distribution method and system and Internet of things management platform |
CN114827049A (en) * | 2022-03-02 | 2022-07-29 | 厦门服云信息科技有限公司 | Accumulated data consumption method based on kafka, terminal equipment and storage medium |
CN115150471A (en) * | 2022-06-27 | 2022-10-04 | 北京百度网讯科技有限公司 | Data processing method, device, equipment, storage medium and program product |
CN116132395A (en) * | 2022-11-15 | 2023-05-16 | 马上消费金融股份有限公司 | Message processing method, electronic device and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140344953A1 (en) * | 2013-05-16 | 2014-11-20 | Nfluence Media, Inc. | Privacy sensitive persona management tools |
CN106302385A (en) * | 2016-07-26 | 2017-01-04 | 努比亚技术有限公司 | A kind of message distribution device and method |
CN106776855A (en) * | 2016-11-29 | 2017-05-31 | 上海轻维软件有限公司 | The processing method of Kafka data is read based on Spark Streaming |
CN109002484A (en) * | 2018-06-25 | 2018-12-14 | 北京明朝万达科技股份有限公司 | A kind of method and system for sequence consumption data |
CN109493076A (en) * | 2018-11-09 | 2019-03-19 | 武汉斗鱼网络科技有限公司 | A kind of unique consuming method of Kafka message, system, server and storage medium |
-
2019
- 2019-09-24 CN CN201910907527.5A patent/CN110648178A/en active Pending
Non-Patent Citations (1)
Title |
---|
高宗宝 等: "Spark平台中Kafka偏移量的读取管理与设计", 《软件》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112199202B (en) * | 2020-09-16 | 2023-04-07 | 河北航天信息技术有限公司 | Development method for expanding Kafka consumption capacity |
CN112199202A (en) * | 2020-09-16 | 2021-01-08 | 河北航天信息技术有限公司 | Development method for expanding Kafka consumption capacity |
CN112506992A (en) * | 2020-12-04 | 2021-03-16 | 中国人寿保险股份有限公司 | Fuzzy query method and device for Kafka data, electronic equipment and storage medium |
CN112506992B (en) * | 2020-12-04 | 2024-04-16 | 中国人寿保险股份有限公司 | Fuzzy query method and device for Kafka data, electronic equipment and storage medium |
CN112905109A (en) * | 2021-01-28 | 2021-06-04 | 平安普惠企业管理有限公司 | Message processing method, device, equipment and storage medium |
CN112905109B (en) * | 2021-01-28 | 2023-02-03 | 平安普惠企业管理有限公司 | Message processing method, device, equipment and storage medium |
CN114401269A (en) * | 2021-12-08 | 2022-04-26 | 国电南瑞科技股份有限公司 | Business data distribution method and system and Internet of things management platform |
CN114385081A (en) * | 2021-12-27 | 2022-04-22 | 联通智网科技股份有限公司 | Disk protection method for kafka cluster and related equipment |
CN114827049B (en) * | 2022-03-02 | 2023-05-09 | 厦门服云信息科技有限公司 | Pile-up data consumption method based on kafka, terminal equipment and storage medium |
CN114827049A (en) * | 2022-03-02 | 2022-07-29 | 厦门服云信息科技有限公司 | Accumulated data consumption method based on kafka, terminal equipment and storage medium |
CN115150471A (en) * | 2022-06-27 | 2022-10-04 | 北京百度网讯科技有限公司 | Data processing method, device, equipment, storage medium and program product |
CN115150471B (en) * | 2022-06-27 | 2024-03-29 | 北京百度网讯科技有限公司 | Data processing method, apparatus, device, storage medium, and program product |
CN116132395A (en) * | 2022-11-15 | 2023-05-16 | 马上消费金融股份有限公司 | Message processing method, electronic device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110648178A (en) | Method for increasing kafka consumption capacity | |
CN111723160B (en) | Multi-source heterogeneous incremental data synchronization method and system | |
US9336069B2 (en) | Attributing causality to program execution capacity modifications | |
CN105335251B (en) | A kind of fault recovery method and system | |
US9201747B2 (en) | Real time database system | |
US20200169616A1 (en) | Method and system for synchronizing publication and subscription of message queues | |
CN110795503A (en) | Multi-cluster data synchronization method and related device of distributed storage system | |
JP2007503628A (en) | Fast application notification in clustered computing systems | |
CN105095364A (en) | Data synchronizing system and method | |
US20130031221A1 (en) | Distributed data storage system and method | |
US20120278817A1 (en) | Event distribution pattern for use with a distributed data grid | |
US10826812B2 (en) | Multiple quorum witness | |
CN114064211B (en) | Video stream analysis system and method based on end-side-cloud computing architecture | |
CN113055430A (en) | Data synchronization method and related equipment | |
JP2016529629A (en) | System and method for supporting partition level journaling to synchronize data in a distributed data grid | |
CN112217847A (en) | Micro service platform, implementation method thereof, electronic device and storage medium | |
CN111552701B (en) | Method for determining data consistency in distributed cluster and distributed data system | |
CN111930538A (en) | Production and consumption method based on kafka cluster | |
CN111163118B (en) | Message transmission method and device in Kafka cluster | |
CN115292414A (en) | Method for synchronizing service data to data bins | |
CN111240901A (en) | Node dynamic expansion system, method and equipment of distributed block storage system | |
CN115495265A (en) | Method for improving kafka consumption capacity based on hadoop | |
CN108111630B (en) | Zookeeper cluster system and connection method and system thereof | |
WO2024051454A1 (en) | Method and apparatus for processing transaction log | |
CN112052104A (en) | Message queue management method based on multi-computer-room realization and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200103 |