CN113703982A - Data consumption method, apparatus, terminal device and medium using KAFKA

Info

Publication number: CN113703982A
Application number: CN202111017211.2A
Authority: CN
Inventors: 李勇; 卢道和; 罗锶; 黄叶飞; 边元乔; 陈晓峰; 常亮; 姬岑晨; 胡思文; 郑喜生
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2021-11-26

Abstract

The invention relates to the field of financial technology and discloses a data consumption method and apparatus using KAFKA, a terminal device, and a computer storage medium. In the method, a KAFKA consumption side registers a data consumption instance according to an instance registration rule, wherein the instance registration rule includes: a single partition of a single KAFKA topic corresponds to one consumption thread of one data consumption instance; performs a load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance; and maintains the load balancing state for the data consumption instance to consume data. The invention provides a lightweight, highly available data consumption scheme using KAFKA.

Description

Data consumption method, apparatus, terminal device and medium using KAFKA

Technical Field

The present invention relates to the field of financial technology (Fintech), and in particular to a data consumption method, apparatus, terminal device, and computer storage medium using KAFKA.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). However, the financial industry's requirements for security, real-time performance, and stability also place higher demands on these technologies.

At present, real-time or near-real-time data consumption from Kafka (an open-source stream processing platform and high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of consumers on a website) is widely used in the big data field. It is mainly implemented with real-time stream computing frameworks built on big data components, such as Spark Streaming (a stream processing system for high-throughput, fault-tolerant processing of real-time data streams) or Apache Flink (an open-source stream processing framework whose core is a distributed streaming dataflow engine written in Java and Scala).

However, a real-time stream computing framework based on big data components has a high barrier to entry and also consumes machine resources and considerable manpower for operation and maintenance, which makes data consumption with Kafka costly.

Disclosure of Invention

The main object of the invention is to provide a data consumption method, apparatus, terminal device, and computer storage medium using KAFKA, aiming to solve the technical problem in the prior art that data consumption with KAFKA is costly because it relies on a real-time stream computing framework based on big data components.

To achieve the above object, the present invention provides a data consuming method using KAFKA, which is applied to a KAFKA consuming side, the data consuming method using KAFKA including the steps of:

registering a data consumption instance according to an instance registration rule, wherein the instance registration rule comprises: a single partition of a single KAFKA topic corresponds to one consumption thread of one data consumption instance;

performing a load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance;

and maintaining a load balancing state for the data consumption instance to consume the data.

Further, to achieve the above object, the present invention also provides a data consuming apparatus using KAFKA, which is applied to a KAFKA consuming side, the data consuming apparatus using KAFKA including:

an instance registration module configured to register a data consumption instance according to an instance registration rule, wherein the instance registration rule includes: a single partition of a single KAFKA topic corresponds to one consumption thread of one data consumption instance;

a load balancing module configured to perform a load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance;

and a data consumption module configured to maintain the load balancing state for the data consumption instance to consume data.

At run time, the functional modules of the data consuming apparatus using KAFKA implement the steps of the data consuming method using KAFKA as described above.

In addition, to achieve the above object, the present invention also provides a terminal device, including: a memory, a processor and a data consuming program using KAFKA stored on said memory and executable on said processor, said data consuming program using KAFKA when executed by said processor implementing the steps of the data consuming method using KAFKA as described above.

Further, to achieve the above object, the present invention also provides a computer storage medium having stored thereon a data consuming program using KAFKA, which when executed by a processor implements the steps of the data consuming method using KAFKA as described above.

Furthermore, to achieve the above object, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the data consumption method using KAFKA as described above.

The invention provides a data consumption method, apparatus, terminal device, computer storage medium, and computer program product using KAFKA, in which a KAFKA consumption side registers a data consumption instance according to an instance registration rule, wherein the instance registration rule comprises: a single partition of a single KAFKA topic corresponds to one consumption thread of one data consumption instance; performs a load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance; and maintains the load balancing state for the data consumption instance to consume data.

When using KAFKA to consume data, the KAFKA consumption side first registers data consumption instances on the current KAFKA consumption side according to the instance registration rule, under which a single partition of a single KAFKA topic can correspond to only one consumption thread of one data consumption instance. The KAFKA consumption side then performs a load balancing operation according to the weights of all registered data consumption instances and the partitions of a given KAFKA topic on the KAFKA consumption side that correspond to each data consumption instance. Finally, the KAFKA consumption side maintains the load balancing state so that all registered data consumption instances consume data through their corresponding partitions.

Compared with the traditional approach of relying on big data components to consume data with KAFKA, the invention only registers data consumption instances through the KAFKA consumption side and performs the load balancing operation based on the weight of each data consumption instance and the partitions of the corresponding KAFKA topic, maintaining the load balancing state so that the registered data consumption instances can consume data. This yields a lightweight, highly available data consumption scheme using KAFKA that does not require a real-time stream computing framework built on big data components such as Spark Streaming or Apache Flink, and thus reduces the cost of data consumption.

Drawings

Fig. 1 is a schematic device structure diagram of a hardware operating environment of a terminal device according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating an embodiment of a data consumption method using KAFKA of the present invention;

FIG. 3 is a functional block diagram of an embodiment of a data consuming apparatus using KAFKA according to the invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, fig. 1 is a schematic device structure diagram of a terminal device hardware operating environment according to an embodiment of the present invention.

The terminal device of the embodiment of the present invention may be a terminal device of a KAFKA consumption end configured to perform data consumption for a data consumption instance, and the terminal device may be a server, a smart phone, a PC (Personal Computer), a tablet Computer, a portable Computer, or the like.

As shown in fig. 1, the terminal device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the terminal device configuration shown in fig. 1 is not intended to be limiting of the terminal device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a data consuming program using KAFKA.

In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client and performing data communication with the client; and the processor 1001 may be configured to call the data consuming program using KAFKA stored in the memory 1005 and perform the following embodiments of the data consuming method using KAFKA.

Based on the above hardware structure, embodiments of the data consumption method using KAFKA of the present invention are proposed.

At present, real-time or near-real-time data consumption from Kafka is widely used in the big data field and is mainly implemented with real-time stream computing frameworks built on big data components such as Spark Streaming or Apache Flink.

In view of the above, the present invention provides a data consumption method using KAFKA. Referring to fig. 2, fig. 2 is a flow chart of a first embodiment of the data consumption method using KAFKA according to the present invention. In this embodiment, the method is applied to the terminal device configured as the KAFKA consumption side, and the terminal device acts as message middleware between the data consumption instances and the data to be consumed. The data consumption method using KAFKA of the present invention includes:

step S10, registering the data consumption instance according to an instance registration rule, wherein the instance registration rule includes: a single partition of a single KAFKA topic corresponds to one consumption thread of one data consumption instance;

The terminal device configured as the KAFKA consumption side registers the data consumption instance in the registry corresponding to the current KAFKA consumption side according to the instance registration rule: a single partition of a single KAFKA topic can correspond to only one consumption thread of one data consumption instance, while a single data consumption instance can correspond to multiple partitions at the same time by creating multiple consumption threads.

It should be noted that, in this embodiment, the partition is a data transfer interface for interfacing with the data consumption instance, so as to provide the data to be consumed for the data consumption instance. The instance registration rule is preset to limit the association relationship between each data consumption instance and the partition of the KAFKA topic configured on the KAFKA consumption side when the data consumption instance is registered on the KAFKA consumption side, namely, a single partition of a KAFKA topic on the KAFKA consumption side can only correspond to the consumption thread of one data consumption instance, and one data consumption instance can simultaneously correspond to a plurality of partitions by creating a plurality of consumption threads, wherein the plurality of partitions can belong to the same KAFKA topic, or the plurality of partitions can belong to a plurality of KAFKA topics respectively.
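The registration rule can be illustrated with a minimal sketch. The `Registry` class, its method names, and the in-memory dictionary are hypothetical stand-ins chosen for illustration; the embodiment does not specify how the registry stores the association.

```python
from dataclasses import dataclass, field

# A partition is identified here by (topic name, partition index); this
# representation is an assumption made for the sketch.
Partition = tuple[str, int]


@dataclass
class Registry:
    """Hypothetical in-memory registry mapping each partition to its single owner."""

    owners: dict[Partition, tuple[str, str]] = field(default_factory=dict)

    def register(self, instance_id: str, thread_id: str, partition: Partition) -> bool:
        """Bind one consumption thread of one instance to a partition.

        Returns False when the partition already has an owner, which is how
        the sketch enforces "one partition, one consumption thread".
        """
        if partition in self.owners:
            return False
        self.owners[partition] = (instance_id, thread_id)
        return True

    def partitions_of(self, instance_id: str) -> list[Partition]:
        """List every partition currently bound to the given instance."""
        return sorted(p for p, (inst, _) in self.owners.items() if inst == instance_id)
```

In this sketch one instance may call `register` several times with different threads, covering partitions of the same topic or of several topics, while a second registration for an already bound partition is refused.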

Specifically, for example, the terminal device configured as the KAFKA consumption side obtains the preset instance registration rule and, according to it, registers one or more data consumption instances in the registry corresponding to the current KAFKA consumption side. A registered data consumption instance may create a single consumption thread corresponding to one partition of one KAFKA topic on the current KAFKA consumption side to obtain and process the data to be consumed. Alternatively, the data consumption instance may create multiple consumption threads at the same time, corresponding respectively to all partitions of one KAFKA topic (or all partitions of multiple KAFKA topics) on the current KAFKA consumption side. Alternatively, multiple registered data consumption instances may each create one or more consumption threads corresponding to the partitions of multiple KAFKA topics on the current KAFKA consumption side to obtain and process the data to be consumed.

It should be noted that, in this embodiment, the data to be consumed provided by one partition of a KAFKA topic configured on the KAFKA consumption side must not be consumed by two data consumption instances, whether simultaneously or at different times, because this would result in duplicate consumption. It is therefore necessary to ensure that one partition of a KAFKA topic corresponds to only one data consumption instance.

Specifically, for example, in this embodiment the terminal device configured as the KAFKA consumption side uses a distributed lock to ensure that a single partition of a single KAFKA topic configured on the current KAFKA consumption side can be consumed by only one application instance. A single data consumption instance can hold the distributed locks of multiple partitions at the same time for data consumption.

Further, in a possible embodiment, when a data consumption instance registered by the terminal device configured as the KAFKA consumption side becomes abnormal or needs to go offline, the terminal device releases the distributed locks on the one or more partitions that the data consumption instance had acquired, so that other data consumption instances can compete for the data of those partitions.
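The lock behavior described above can be sketched as follows. The embodiment does not name a lock service, so the `PartitionLocks` class below is a hypothetical in-process stand-in that only mimics the acquire and release semantics; a real deployment would use an actual distributed lock.

```python
import threading


class PartitionLocks:
    """In-process stand-in for a distributed lock service (an assumption)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        # Maps (topic, partition index) to the id of the instance holding it.
        self._holder: dict[tuple[str, int], str] = {}

    def try_acquire(self, partition: tuple[str, int], instance_id: str) -> bool:
        """Grant the partition to the instance only if nobody holds it."""
        with self._guard:
            if partition in self._holder:
                return False
            self._holder[partition] = instance_id
            return True

    def release_all(self, instance_id: str) -> list[tuple[str, int]]:
        """Release every lock held by an instance that is abnormal or going offline."""
        with self._guard:
            released = [p for p, owner in self._holder.items() if owner == instance_id]
            for partition in released:
                del self._holder[partition]
            return sorted(released)
```

After `release_all` returns, any other instance can call `try_acquire` on the freed partitions, which corresponds to the other instances competing for them.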

Step S20, performing a load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance;

and step S30, keeping the load balance state for the data consumption instance to perform data consumption.

After the terminal device configured as the KAFKA consumption side completes the registration of the data consumption instances, it performs a load balancing operation according to the weights of all currently online data consumption instances and the partitions of a given KAFKA topic on the KAFKA consumption side that correspond to each data consumption instance, thereby ensuring that every data consumption instance consumes the data of the partitions attached to it in a load-balanced state.

It should be noted that, in this embodiment, a registry is built into or externally connected to the terminal device configured as the KAFKA consumption side, and each data consumption instance registered according to the instance registration rule must register with the registry when it starts, indicating that the data consumption instance has started and can begin consuming Kafka data. The registry then notifies every other data consumption instance that is currently online and consuming data that a new instance has come online and can share the processing of Kafka data. At this point, the terminal device starts a load balancing operation across all data consumption instances according to the weights of all currently online data consumption instances and the partitions of a given KAFKA topic on the KAFKA consumption side that correspond to each data consumption instance.
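The start-up notification can be sketched with a simple callback scheme. The `NotifyingRegistry` class and the callback signature are hypothetical; the embodiment only states that the registry notifies the instances already online.

```python
from typing import Callable


class NotifyingRegistry:
    """Hypothetical registry that tells online instances about each newcomer."""

    def __init__(self) -> None:
        # Maps instance id to the callback invoked when a peer comes online.
        self._listeners: dict[str, Callable[[str], None]] = {}

    def register(self, instance_id: str, on_peer_online: Callable[[str], None]) -> None:
        """Register a starting instance and notify every instance already online."""
        for callback in self._listeners.values():
            callback(instance_id)
        self._listeners[instance_id] = on_peer_online

    def online(self) -> list[str]:
        """Return the ids of all instances currently registered."""
        return sorted(self._listeners)
```

In a fuller implementation each callback would be the point at which an instance re-runs the load balancing operation; here it only records the newcomer's id.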

Further, in a possible embodiment, the step S20 may include:

step S201, detecting all data consumption instances that are currently online, and calculating a total weight value according to the weight of each data consumption instance;

it should be noted that, in this embodiment, the weight of the data consumption instance is used to identify the performance of the data consumption instance in performing data consumption processing on the partition, and a higher weight indicates that the performance of the data consumption instance in performing data consumption processing is higher, that is, more partitions can be subjected to data consumption processing.

While the registered data consumption instances are consuming data, the terminal device configured as the KAFKA consumption side detects all data consumption instances that are currently started and online, and obtains the weight of each of them in order to calculate the total weight value.

Further, in a possible embodiment, the terminal device configured as the KAFKA consumption side may assign the same weight to every data consumption instance at registration, or the weights may differ according to the machine performance of each registered data consumption instance. Accordingly, the step of "calculating a total weight value according to the weight of each data consumption instance" may include:

detecting whether the respective weights of each data consumption instance are the same;

if yes, calculating the total weight value by multiplying the number of data consumption instances by the weight;

if not, calculating the total weight value by summing the weights of all the data consumption instances.

After the terminal device configured as the KAFKA consuming side obtains the respective weights of all the data consumption instances starting to go online for data consumption at the current time, the terminal device further detects whether the respective weights are the same in magnitude. Therefore, when the terminal device detects that the respective weights of all the data consumption instances are the same, the terminal device multiplies the same weight by the detected number of all the data consumption instances to calculate a total weight value. Or when the terminal device detects that any one of the weights of each data consumption instance is different from other weights in all the data consumption instances, the terminal device superposes the detected weights of all the data consumption instances one by one to calculate a total weight value.
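The two branches of the total weight calculation can be sketched as below. The function name `total_weight` is hypothetical, and weights are treated as floats on the assumption that they may later be halved.

```python
def total_weight(weights: list[float]) -> float:
    """Total weight Q of all online data consumption instances.

    If every instance has the same weight, Q is the instance count times
    that weight; otherwise Q is the sum of the individual weights.
    """
    if not weights:
        return 0.0
    first = weights[0]
    if all(w == first for w in weights):
        return float(len(weights) * first)
    return float(sum(weights))
```

Both branches give the same result as a plain sum; the sketch keeps them separate only to mirror the two cases in the text.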

Specifically, for example, assume that the terminal device configured as the KAFKA consumption side sets the weight of every data consumption instance to O (O = 1) at registration. When the terminal device determines through the registry that the total number of registered data consumption instances is A (each time the terminal device registers a data consumption instance, it records the relevant information of that instance in the registry), it calculates the total weight value as Q = A × O.

Alternatively, assume that the terminal device configured as the KAFKA consumption side detects that the weights of the registered data consumption instances differ from one another. The terminal device then obtains the weights O_1, O_2, ..., O_A of the A data consumption instances and calculates the total weight value as Q = O_1 + O_2 + ... + O_A.

Step S202, comparing the total weight value, the number of partitions of the KAFKA topic corresponding to the data consumption instance, and the number of threads with which the data consumption instance processes the partitions of the KAFKA topic, to obtain a data comparison result;

step S203, carrying out load balancing operation according to the data comparison result.

It should be noted that, in this embodiment, the load balancing operation adjusts the threads that the data consumption instances use to consume partition data, so that the partitions attached to each data consumption instance match that instance's capacity for consuming partition data.

After the terminal device configured as the KAFKA consumption side has calculated the total weight value of all data consumption instances that are currently online and able to consume data, it divides, for each data consumption instance in turn, the number of partitions of the corresponding KAFKA topic by the total weight value and rounds the result to an integer. It then compares the rounded value with the number of threads that the data consumption instance is currently using to consume the partitions of that KAFKA topic to obtain a comparison result. When the comparison result shows that the rounded value is larger than the number of threads, the terminal device suspends, one by one, the threads that the data consumption instance is currently using to consume partition data until the rounded value equals the number of threads. When the comparison result shows that the rounded value is smaller than the number of threads, the terminal device controls the data consumption instance to keep creating new threads to consume the partitions of the KAFKA topic until the rounded value equals the number of threads.

Specifically, for example, assume that the number of partitions of a KAFKA topic configured on the terminal device configured as the KAFKA consumption side is B, and the number of threads that the current data consumption instance has created to process the partition data of that KAFKA topic is C. After calculating the total weight value Q, the terminal device compares E, the integer value of B/Q, with C. If E > C, the terminal device suspends threads that the data consumption instance created to process the partition data of the KAFKA topic; each time a thread is suspended, its distributed lock must be released and the locally recorded thread count C is decreased by 1, and this continues until C = E. If E < C, the terminal device instructs the data consumption instance to create new threads to consume the partition data of the KAFKA topic, increasing the locally recorded thread count C by 1 each time, until C = E.

In this embodiment, if comparing E, the integer value of B/Q, with C gives E = C, no processing is needed because the load balancing state has already been reached. In addition, when the terminal device instructs the data consumption instance to create a new thread to consume the partition data of the KAFKA topic, the new thread must compete for one of the distributed locks: if it acquires a lock it goes on to consume data, and if it fails to acquire a lock it is destroyed.
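The comparison and adjustment can be sketched as follows, using B, Q, E, and C as defined in the text. Two points are assumptions of the sketch rather than statements of the embodiment: the rounding of B/Q is taken to be truncation, and the adjustment is assumed to always move the thread count C one step toward the target E (a thread is released when C exceeds E, and one is created when C is below E), since only that direction lets the loop end at C = E. The function names and the `try_lock` callback are hypothetical.

```python
from typing import Callable


def target_threads(partitions: int, total_weight: float) -> int:
    """E: partition count B divided by total weight Q, truncated (assumed rounding)."""
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return int(partitions // total_weight)


def rebalance(threads: int, target: int, try_lock: Callable[[], bool]) -> int:
    """Move the thread count C toward the target E one thread at a time.

    try_lock stands in for a new thread competing for a distributed lock;
    it returns True when a lock was obtained. Returns the final C.
    """
    while threads > target:
        # Suspend one thread; its distributed lock would be released here.
        threads -= 1
    while threads < target:
        if not try_lock():
            # The new thread found no free lock and is destroyed.
            break
        threads += 1
    return threads
```

With B = 12 and Q = 3 the target is 4 threads; an instance holding 6 threads drops to 4, and one holding 2 threads grows to 4 if locks are available.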

Further, in a possible embodiment, the terminal device configured as the KAFKA consumption side triggers the load balancing operation whenever a new data consumption instance comes online, an already online data consumption instance stops, or a data consumption instance hits a performance bottleneck and actively modifies its weight in the registry. It should be understood that, depending on the design requirements of the actual application, the rules that trigger the load balancing operation may differ from those listed here in other possible embodiments, and the data consumption method using KAFKA of the present invention does not limit the specific content of these rules.

Furthermore, the terminal device configured as the KAFKA consumption side may perform the load balancing operation on the configured data consumption instances continuously. In that case, the step S203 may include:

step S2031, according to a preset first period, carrying out load balancing operation for a preset number of times according to the comparison result, and detecting whether a load balancing state is reached after each load balancing operation;

step S2032, if not, repeating the load balancing operation until the number of times of the load balancing operation reaches the preset number;

step S2033, if it is detected that the load balancing state is not reached after the preset number of load balancing operations are performed, performing the load balancing operations according to a preset second period, where the preset second period is equal to the preset first period multiplied by the preset number.

Specifically, for example, in this embodiment assume that the preset first period is 2 minutes and the preset number of times is 5, so that the preset second period is 2 × 5 = 10 minutes. While performing the load balancing operation on the configured data consumption instance, the terminal device configured as the KAFKA consumption side first compares E, the integer value of B/Q, with C and obtains E > C. The terminal device therefore suspends one of the threads created by the data consumption instance (decreasing C by 1) and compares E with C again after an interval of 2 minutes; if E > C still holds, it suspends another thread created by the data consumption instance, and this cycle is repeated up to 5 times.

If E > C still holds after 5 cycles, the terminal device checks whether the load is balanced through the load balancing operation scheduled every 10 minutes. When E > C is found at such a check (for example because state was lost or a new KAFKA topic came online), the terminal device again compares E with C every 2 minutes within the current 10-minute period, suspending one of the threads created by the data consumption instance each time, and repeats this up to 5 times.

It should be noted that, in this embodiment, as soon as the terminal device configured as the KAFKA consumption side obtains E = C in any cycle, it stops cycling the load balancing operation.
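The two-period schedule of steps S2031 to S2033 can be sketched as follows. The function names, the `step` callback (one load balancing operation that returns True once the balanced state is reached), and the injected `sleep` function are hypothetical; injecting `sleep` simply keeps the sketch testable without real waiting.

```python
from typing import Callable


def second_period(first_period_s: float, attempts: int) -> float:
    """Preset second period: the first period multiplied by the preset number of times."""
    return first_period_s * attempts


def balancing_round(
    step: Callable[[], bool],
    first_period_s: float,
    attempts: int,
    sleep: Callable[[float], None],
) -> bool:
    """Run up to `attempts` balancing operations, one per first period.

    Stops early and returns True as soon as step() reports the balanced
    state; returns False if the state is still unbalanced after all attempts,
    in which case the caller would retry after the second period.
    """
    for attempt in range(attempts):
        if step():
            return True
        if attempt < attempts - 1:
            sleep(first_period_s)
    return False
```

With a first period of 120 seconds and 5 attempts, `second_period(120, 5)` gives 600 seconds, matching the 2-minute and 10-minute figures in the example above. In production the `sleep` argument could be `time.sleep`.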

The embodiment of the invention provides a data consumption method using KAFKA. The terminal device configured as the KAFKA consumption side registers data consumption instances on the current KAFKA consumption side according to the instance registration rule, under which a single partition of a single KAFKA topic can correspond to only one data consumption instance while a single data consumption instance can correspond to multiple partitions at the same time. After completing the registration of the data consumption instances, the terminal device performs a load balancing operation according to the weights of all online data consumption instances and the partitions of a given KAFKA topic on the KAFKA consumption side that correspond to each data consumption instance, thereby ensuring that every data consumption instance consumes the data of the partitions attached to it in a load-balanced state.

Further, based on the first embodiment above, a second embodiment of the data consumption method using KAFKA of the present invention is proposed. The main difference between this embodiment and the first embodiment is that the data consumption method using KAFKA may further include:

step S40, modifying the weight of the data consumption instance according to the data consumption state of the data consumption instance, and then executing the step of performing the load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance.

It should be noted that, in the present embodiment, the data consumption states of the data consumption instances include, but are not limited to: CPU utilization, memory consumption, data accumulation, and consumption thread status.

While maintaining the load balancing state for the data consumption instances to consume data, the terminal device configured as the KAFKA consumption side can collect the data consumption states locally and report them to the registry. After receiving the data consumption states, the registry instructs the data consumption instances to reset their weights. The terminal device then performs the load balancing operation again according to the weight of each data consumption instance and the partitions of a given KAFKA topic on the KAFKA consumption side that correspond to each data consumption instance, thereby ensuring that every data consumption instance consumes the data of the partitions attached to it in a load-balanced state.

Specifically, for example, every data consumption instance registered by the terminal device configured as the KAFKA consumption side starts a resident thread. While the data consumption instance is consuming the data of the partitions attached to it, the resident thread monitors the instance's data consumption state, such as CPU utilization, memory consumption, data accumulation, and consumption thread status, and reports the monitored state to the registry built into or externally connected to the terminal device. After receiving the data consumption state, the registry distributes it to the other data consumption instances that are currently consuming data, so that the data consumption instances reset their respective weights according to machine load. Each data consumption instance sends its reset weight to the registry, the registry distributes the weights to every data consumption instance, and the instances then consume partition data using the new set of weights. While the data consumption instances are consuming partition data with the new set of weights, the terminal device performs the operation of step S20 again to ensure that the data consumption instances consume data in a load-balanced state.

It should be noted that, in this embodiment, when the terminal device configured with the KAFKA consumption end instructs, through the built-in or external registry, each data consumption instance to reset its weight, the following rules may be applied. If the data consumption instance observes that the CPU utilization rate has stayed below 50% for 1 day, that memory occupancy has stayed below 50% for 1 day, and that the consumption thread state has shown no accumulated data on any thread for 1 day, the data consumption instance may double its weight, that is, reset the weight as O = O × 2. If the data consumption instance observes that the CPU utilization rate has exceeded 90% for 5 min, or that memory occupancy has exceeded 90% for 30 min, or that the consumption thread state shows a thread accumulating data for more than 5 min and the data accumulation amount exceeds 10000, the data consumption instance may halve its weight, that is, reset the weight as O = O × 1/2.
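The weight reset rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LoadState` fields and the `adjust_weight` function are hypothetical names, the grouping of the last overload condition (accumulation time together with accumulation amount) is one reading of the text, and checking the overload condition before the idle condition is an assumption.

```python
from dataclasses import dataclass

DAY_S = 24 * 3600  # one day, in seconds


@dataclass
class LoadState:
    """Sustained observations from the resident thread (hypothetical shape)."""
    cpu_below_50_for_s: float    # seconds CPU utilization has stayed below 50%
    mem_below_50_for_s: float    # seconds memory occupancy has stayed below 50%
    no_backlog_for_s: float      # seconds with no accumulated data on any thread
    cpu_above_90_for_s: float    # seconds CPU utilization has stayed above 90%
    mem_above_90_for_s: float    # seconds memory occupancy has stayed above 90%
    thread_backlog_for_s: float  # seconds a thread has been accumulating data
    backlog_size: int            # current data accumulation amount


def adjust_weight(weight: float, state: LoadState) -> float:
    """Return the reset weight O for one data consumption instance."""
    overloaded = (
        state.cpu_above_90_for_s >= 5 * 60
        or state.mem_above_90_for_s >= 30 * 60
        or (state.thread_backlog_for_s >= 5 * 60 and state.backlog_size > 10000)
    )
    if overloaded:
        return weight / 2  # O = O x 1/2
    idle = (
        state.cpu_below_50_for_s >= DAY_S
        and state.mem_below_50_for_s >= DAY_S
        and state.no_backlog_for_s >= DAY_S
    )
    if idle:
        return weight * 2  # O = O x 2
    return weight  # neither rule applies: keep the current weight
```

An instance that satisfies neither rule keeps its weight, so repeated reports do not cause the weight to drift.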

In this embodiment, while maintaining the load balancing state so that the data consumption instances can consume data, the terminal device configured with the KAFKA consumption end may report the data consumption states collected locally to the registry. After receiving the data consumption states, the registry instructs the data consumption instances to reset their weights. The terminal device then performs the load balancing operation again according to the weight of each data consumption instance and the partitions of the KAFKA topic on the KAFKA consumption end corresponding to each data consumption instance, thereby ensuring that each data consumption instance consumes the partition data assigned to it in the load balancing state.

Thus, the data consumption method using KAFKA of the present invention automatically adjusts the weights according to the data consumption states of the data consumption instances, so that data is consumed under load balancing, which ensures the high availability of data consumption using KAFKA.

Further, based on the first embodiment and the second embodiment, a third embodiment of the data consumption method using KAFKA of the present invention is proposed, and the main difference between the present embodiment and the first embodiment and the second embodiment is that, after the load balancing operation is performed according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance at the step S20, the data consumption method using KAFKA of the present invention may further include:

step S50, detecting whether all the data consumption instances reach a load balancing state;

step S60, if not, performing a corresponding capacity expansion operation on the partitions of the KAFKA topic and/or the data consumption instances, so that all the data consumption instances reach the load balancing state after the capacity expansion.

It should be noted that, in this embodiment, if a data consumption instance still has a performance bottleneck after multiple load balancing operations, capacity expansion needs to be evaluated. In addition, if an abnormality is found in a data consumption instance, the terminal device needs to trigger an alarm immediately (such as an IMS (IP Multimedia System) alarm) to notify operation and development staff to evaluate whether capacity expansion is needed.

After each load balancing operation performed according to the weights of all the currently online data consumption instances and the partitions of the KAFKA topic on the KAFKA consumption end corresponding to each data consumption instance, the terminal device configured with the KAFKA consumption end detects whether all the data consumption instances have reached a load balancing state. If the terminal device detects that the data consumption instances have still not reached the load balancing state after multiple load balancing operations, it performs a corresponding capacity expansion operation on the partitions of the KAFKA topic and/or on the data consumption instances, so that all the data consumption instances can reach the load balancing state and consume data after the capacity expansion.
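The control flow of steps S50 and S60 can be sketched as below. The names `balance_or_expand`, `rebalance`, and `expand` are hypothetical, and the number of attempts before expansion is left as a parameter because the text only says "multiple" load balancing operations.

```python
from typing import Callable


def balance_or_expand(
    rebalance: Callable[[], bool],
    max_attempts: int,
    expand: Callable[[], None],
) -> bool:
    """Try load balancing up to max_attempts times; expand capacity if it never succeeds.

    rebalance() performs one load balancing operation and returns True when
    all data consumption instances have reached the load balancing state.
    expand() stands in for expanding topic partitions and/or instances.
    """
    for _ in range(max_attempts):
        if rebalance():
            return True  # balanced, no capacity expansion needed
    expand()  # still unbalanced after repeated attempts
    return False
```

Returning `False` after calling `expand` lets the caller schedule a fresh round of load balancing against the expanded partitions or instances.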

It should be noted that, in this embodiment, when the terminal device configured with the KAFKA consumption end performs fast capacity expansion, it may first increase the number of partitions of the KAFKA topic. Once partitions are added, the consumption threads corresponding to the KAFKA topic can also be increased, and the overall data consumption capability of the terminal device can then be improved by adding a corresponding number of data consumption instances. Alternatively, one or more partitions of the KAFKA topic can be split; after the number of attachable threads increases, a corresponding number of data consumption instances are added at the same time to take over the added threads for data consumption.

In addition, the weight of some data consumption instances with higher machine performance can be manually increased, so that the data consumption instances can take over more threads and the pressure of the data consumption instances with performance bottlenecks is reduced.

Furthermore, when KAFKA is shared by a plurality of service products, a large-volume service product can be split out, that is, moved from the plurality of KAFKA topics originally configured to a single KAFKA topic, to improve the overall performance.

In this embodiment, after each load balancing operation performed according to the weights of all the currently online data consumption instances and the partitions of the KAFKA topic on the KAFKA consumption end corresponding to each data consumption instance, the terminal device configured with the KAFKA consumption end detects whether all the data consumption instances have reached the load balancing state. If the data consumption instances have still not reached the load balancing state after multiple load balancing operations, the terminal device performs a corresponding capacity expansion operation on the partitions of the KAFKA topic and/or on the data consumption instances, so that all the data consumption instances can reach the load balancing state and consume data after the capacity expansion, which further improves data consumption efficiency.

Further, a fourth embodiment of the data consumption method using KAFKA of the present invention is proposed based on the above-described first, second, and third embodiments, and the main difference between this embodiment and the above-described first, second, and third embodiments is that the data consumption method using KAFKA of the present invention may further include:

step S70, recording consumed data identifiers of the data consumption instances for performing data consumption in the load balancing state, and modifying the consumed data identifiers according to actual data consumption requirements, so that the data consumption instances continue to perform data consumption in the load balancing state according to the modified consumed data identifiers.

While the terminal device configured with the KAFKA consumption end keeps the currently online data consumption instances consuming data in the load balancing state, a consumed data identifier is recorded in the registry built into or externally connected to the terminal device each time a data consumption instance successfully completes a round of data consumption. The terminal device can then modify the recorded consumed data identifier according to the actual data consumption requirement, so that the next time the data consumption instance consumes data, it continues in the load balancing state from the modified consumed data identifier.

It should be noted that, in the present embodiment, the actual data consumption requirements include, but are not limited to: losing part of the data, reprocessing part of the data, transaction rollback, skipping part of the data without processing it, and data backtracking (also known as replay).

Specifically, for example, after the data consumption instance acquires the distributed lock corresponding to a partition, and once all the currently online data consumption instances have reached a load balancing state, the consumption thread starts to consume data from the partition: it reads 1M of data (or all of the current partition data when the partition holds less than 1M at that moment), and then processes, distributes, and stores the data that was read. After each successful round of data consumption, the data consumption instance records in the registry the consumed data identifier of KAFKA that the consumption has reached, namely the offset, in the format "topic, partition, offset", which indicates that the given partition of the given KAFKA topic has been successfully processed up to that offset. When the data consumption instance fails to process the data, the corresponding offset is not recorded in the registry, so that after the failure is recovered the data consumption instance can resume data consumption from the last recorded offset, thereby ensuring that no data is lost.
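The record-on-success behaviour described above can be sketched with an in-memory dictionary standing in for the registry. The names `Registry`, `consume_batch`, and `handle` are hypothetical, and the distributed lock and the actual read from KAFKA are deliberately left out.

```python
from typing import Callable, Dict, List, Tuple

# (topic, partition) -> last successfully processed offset; stands in for the registry
Registry = Dict[Tuple[str, int], int]


def consume_batch(
    registry: Registry,
    topic: str,
    partition: int,
    batch: List[Tuple[int, bytes]],
    handle: Callable[[bytes], None],
) -> bool:
    """Process one batch of (offset, payload) records from a single partition.

    The offset is recorded only when every record in the batch is handled
    successfully, so a failed batch is read again after recovery.
    """
    if not batch:
        return True  # nothing to process, nothing to record
    try:
        for _, payload in batch:
            handle(payload)
    except Exception:
        return False  # offset is NOT recorded on failure
    registry[(topic, partition)] = batch[-1][0]  # "topic, partition, offset"
    return True
```

Because the registry is only updated after the whole batch succeeds, a restart resumes from the last recorded offset rather than from a partially processed position.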

If the data consumption instance processes part of the partition data successfully and part of it unsuccessfully, the terminal device can choose among the following according to the actual data consumption requirement. If the requirement is to lose part of the data, the offset is registered to the registry, so that the data consumption instance no longer processes the failed part of the data. If the requirement is to reprocess part of the data, the offset is not registered to the registry, so that the data consumption instance processes the successfully processed part of the data again. If the requirement is transaction rollback, a transaction rollback is performed, so that the data consumption instance rolls back the part of the data that was successfully processed, thereby ensuring data consistency.
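The three choices for a partially failed batch can be sketched as a small policy switch. The enum `PartialFailurePolicy`, the function `resolve_partial_failure`, and the `rollback` callback are hypothetical names; leaving the offset unregistered in the rollback case is an assumption, since the text does not say what happens to the offset there.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

Registry = Dict[Tuple[str, int], int]  # (topic, partition) -> recorded offset


class PartialFailurePolicy(Enum):
    LOSE_FAILED = "lose part of the data"
    REPROCESS = "reprocess part of the data"
    ROLLBACK = "transaction rollback"


def resolve_partial_failure(
    policy: PartialFailurePolicy,
    registry: Registry,
    key: Tuple[str, int],
    batch_end_offset: int,
    rollback: Callable[[], None],
) -> None:
    """Apply the chosen handling to a batch that partly succeeded and partly failed."""
    if policy is PartialFailurePolicy.LOSE_FAILED:
        # Register the offset: the failed records are never processed again.
        registry[key] = batch_end_offset
    elif policy is PartialFailurePolicy.REPROCESS:
        # Do not register the offset: the whole batch, including the records
        # that already succeeded, is processed again.
        pass
    else:
        # Undo the effects of the records that succeeded (offset left as-is, by assumption).
        rollback()
```

With `REPROCESS`, downstream handling has to tolerate duplicates, because records that already succeeded are delivered a second time.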

When the data consumption instance fails, or partially fails, to process the partition data, and the actual data consumption requirement is to skip part of the data without processing it (for example, data that must be skipped because of compatibility issues), the terminal device modifies the offset already recorded in the registry. That is, it finds the corresponding record "topic, partition, offset" in the registry and replaces "offset" with "offset + N", so that the record becomes "topic, partition, offset + N" and the N pieces of data in that partition of the current KAFKA topic are skipped and not processed.

When the data consumption instance fails, or partially fails, to process the partition data, and the actual data consumption requirement is data backtracking, that is, some data needs to be processed again, the terminal device modifies the offset of the corresponding record in the registry. That is, it finds the corresponding record "topic, partition, offset" in the registry and replaces "offset" with "offset - N", so that the record becomes "topic, partition, offset - N" and the partition of the KAFKA topic goes back N pieces of data for re-consumption (the maximum value of N is the amount of data of that partition of the KAFKA topic cached on the current KAFKA consumption end).
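Skipping and backtracking both amount to shifting the recorded offset by a signed N. The following sketch assumes an in-memory dictionary as the registry; `shift_offset` and `max_backtrack` are hypothetical names, and clamping the result at 0 is an added safeguard rather than something stated in the text.

```python
from typing import Dict, Optional, Tuple

Registry = Dict[Tuple[str, int], int]  # (topic, partition) -> recorded offset


def shift_offset(
    registry: Registry,
    topic: str,
    partition: int,
    n: int,
    max_backtrack: Optional[int] = None,
) -> int:
    """Shift the recorded offset by n and return the new value.

    n > 0 skips n records ("offset + N"); n < 0 backtracks ("offset - N").
    max_backtrack, if given, caps how far back the offset may move, mirroring
    the limit set by the amount of partition data still cached.
    """
    key = (topic, partition)
    if n < 0 and max_backtrack is not None:
        n = max(n, -max_backtrack)
    registry[key] = max(0, registry[key] + n)  # never go below offset 0
    return registry[key]
```

For example, with a recorded offset of 100, `shift_offset(reg, "t", 0, 5)` moves the record to 105 so that five records are skipped, and a following `shift_offset(reg, "t", 0, -20)` moves it back to 85 for re-consumption.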

Further, in a possible embodiment, the terminal device configured with the KAFKA consumption end may obtain from the registry the recorded offset value currently consumed by any data consumption instance, and calculate the difference between this offset value and the offset value most recently recorded by the terminal device for that data consumption instance, so as to determine the data accumulation of the partition of the KAFKA topic currently processed by the data consumption instance.

In addition, the data accumulation of the partition of the KAFKA topic currently processed by the data consumption instance can also be used to configure different production alarm levels, so that operation and maintenance staff are notified to intervene in time. For example, when the accumulation amount is less than or equal to 5000, the corresponding IMS alarm notifies that the data processing speed of the data consumption instance is relatively slow; when the accumulation amount is less than or equal to 10000, the corresponding IMS alarm notifies that the weight of the data consumption instance needs to be adjusted; and when the accumulation amount is less than or equal to 20000, the corresponding IMS alarm notifies that the data consumption instance is abnormal and that operation and maintenance staff need to intervene in time.
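The accumulation calculation and the tiered alarm levels can be sketched as below. The thresholds are the ones given in the example, read as upper bounds of successive bands. Everything else is an assumption for illustration: the names `accumulation`, `alarm_level`, and `ALARM_TIERS`, raising no alarm when nothing has accumulated, and treating amounts above 20000 as the highest level.

```python
from bisect import bisect_left
from typing import Optional

# (upper bound of accumulation amount, alarm meaning), in ascending order
ALARM_TIERS = [
    (5000, "processing is relatively slow"),
    (10000, "weight needs to be adjusted"),
    (20000, "instance abnormal, intervene in time"),
]


def accumulation(latest_offset: int, consumed_offset: int) -> int:
    """Difference between the latest known offset and the consumed offset."""
    return max(0, latest_offset - consumed_offset)


def alarm_level(amount: int) -> Optional[str]:
    """Map an accumulation amount to an alarm meaning, or None if there is no backlog."""
    if amount <= 0:
        return None  # assumption: no accumulation means no alarm
    bounds = [bound for bound, _ in ALARM_TIERS]
    # First tier whose upper bound is >= amount; amounts beyond the last
    # bound fall into the highest tier (assumption).
    index = min(bisect_left(bounds, amount), len(ALARM_TIERS) - 1)
    return ALARM_TIERS[index][1]
```

Keeping the tiers in a table makes it straightforward to configure different thresholds per topic or per environment.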

In this embodiment, while the terminal device configured with the KAFKA consumption end keeps the currently online data consumption instances consuming data in a load balancing state, a consumed data identifier is recorded in the registry built into or externally connected to the terminal device each time a data consumption instance successfully completes a round of data consumption. The terminal device may then modify the recorded consumed data identifier according to the actual data consumption requirement, so that the next time the data consumption instance consumes data, it continues in the load balancing state from the modified consumed data identifier. In this way, by flexibly controlling the consumed offset, the present invention supports skipping and backtracking during data consumption, which further improves the high availability of data consumption using KAFKA.

Further, the present invention also provides a data consumption apparatus using KAFKA, the data consumption apparatus using KAFKA being applied to a KAFKA consumption end. Referring to Fig. 3, Fig. 3 is a functional block diagram of an embodiment of the data consumption apparatus using KAFKA according to the present invention. As shown in Fig. 3, the data consumption apparatus using KAFKA of the present invention includes:

an instance registration module 10, configured to register a data consumption instance according to an instance registration rule, where the instance registration rule includes: a single partition of a single KAFKA topic corresponds to one consumption thread of one data consumption instance;

a load balancing module 20, configured to perform load balancing operations according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance;

and the data consumption module 30 is configured to maintain a load balancing state for the data consumption instance to perform data consumption.

Further, the load balancing module 20 includes:

the detection unit is used for detecting all the data consumption instances that are currently online and calculating a total weight value according to the weight of each data consumption instance;

the comparison unit is used for comparing the total weight value, the number of partitions of the KAFKA topic corresponding to the data consumption instance, and the number of threads with which the data consumption instance processes the partitions of the KAFKA topic, to obtain a data comparison result;

and the load balancing unit is used for carrying out load balancing operation according to the data comparison result.

Further, the load balancing unit includes:

the first balancing subunit is used for carrying out load balancing operation for preset times according to a preset first period and the comparison result, and detecting whether a load balancing state is reached after each load balancing operation;

a second balancing subunit, configured to repeat the load balancing operation until the number of times of performing the load balancing operation reaches the preset number of times;

and the third balancing subunit is configured to, if it is detected that the load balancing state is not reached after the preset number of load balancing operations are performed, perform the load balancing operation according to a preset second period, where the preset second period is equal to a product of the preset first period and the preset number.
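The relationship between the two periods can be illustrated with a small helper. The function name `rebalance_delay` and its parameters are hypothetical; the only rule taken from the text is that the preset second period equals the preset first period multiplied by the preset number of times.

```python
def rebalance_delay(attempts_done: int, first_period_s: float, preset_times: int) -> float:
    """Return the wait before the next load balancing operation, in seconds.

    The first preset_times operations run at the preset first period. If the
    load balancing state has still not been reached after that, later
    operations run at the preset second period, which equals
    first_period_s * preset_times.
    """
    if attempts_done < preset_times:
        return first_period_s
    return first_period_s * preset_times
```

For instance, with a first period of 10 seconds and a preset number of 3, the first three operations are 10 seconds apart and any further operations are 30 seconds apart.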

Further, the detection unit is further configured to detect whether the weights of all the data consumption instances are the same; if so, to calculate the total weight value by multiplying the number of data consumption instances by the weight; and if not, to calculate the total weight value by adding up the weights of all the data consumption instances.
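The two ways of computing the total weight value can be sketched as follows; `total_weight` is a hypothetical name, and returning 0 for an empty list of instances is an assumption.

```python
from typing import Sequence


def total_weight(weights: Sequence[float]) -> float:
    """Compute the total weight value of all online data consumption instances."""
    if not weights:
        return 0  # assumption: no online instances means a total weight of 0
    first = weights[0]
    if all(w == first for w in weights):
        # All weights are the same: number of instances multiplied by the weight.
        return len(weights) * first
    # Weights differ: add up the weight of every instance.
    return sum(weights)
```

Both branches produce the same number when the weights are equal; the multiplication is simply a shortcut for that case.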

Further, the load balancing module 20 of the data consumption apparatus using KAFKA of the present invention is further configured to modify the weight of the data consumption instance according to the data consumption state of the data consumption instance, and to execute the step of performing the load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance.

Further, the data consumption apparatus using KAFKA of the present invention further includes:

the capacity expansion module is used for detecting whether all the data consumption instances reach a load balancing state, and if not, performing a corresponding capacity expansion operation on the partitions of the KAFKA topic and/or the data consumption instances, so that all the data consumption instances reach the load balancing state after the capacity expansion;

and the recording module is used for recording the consumed data identifier of the data consumption instance for performing data consumption in the load balancing state, and modifying the consumed data identifier according to the actual data consumption requirement so that the data consumption instance continues to perform data consumption in the load balancing state according to the modified consumed data identifier.

The function implementation of each module in the data consuming apparatus using KAFKA corresponds to each step in the data consuming method using KAFKA, and the function and implementation process are not described in detail herein.

The present invention also provides a computer storage medium having stored thereon a data consuming program using KAFKA, which when executed by a processor implements the steps of the data consuming method using KAFKA as described in any one of the embodiments above.

The specific embodiment of the computer storage medium of the present invention is substantially the same as the embodiments of the data consuming method using KAFKA described above, and will not be described herein again.

The invention also provides a computer program product comprising a computer program which when executed by a processor implements the steps of the data consumption method using KAFKA as described in any one of the embodiments above.

The embodiment of the computer program product of the present invention is substantially the same as the embodiments of the data consuming method using KAFKA described above, and will not be described herein again.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A data consumption method using KAFKA, wherein the data consumption method using KAFKA is applied to a KAFKA consumption side, the data consumption method using KAFKA comprising the steps of:

2. The method of claim 1, wherein the step of performing a load balancing operation based on the weight of the data consumption instance and the partition of the KAFKA topic to which the data consumption instance corresponds comprises:

detecting all the data consumption instances that are currently online, and calculating a total weight value according to the weight of each data consumption instance;

comparing the total weight value with the number of partitions of the KAFKA topic corresponding to the data consumption instance and the number of threads with which the data consumption instance processes the partitions of the KAFKA topic, to obtain a data comparison result;

and carrying out load balancing operation according to the data comparison result.

3. A method of data consumption using KAFKA of claim 2, wherein the step of performing a load balancing operation based on the data comparison comprises:

according to a preset first period, carrying out load balancing operation for a preset number of times according to the comparison result, and detecting whether a load balancing state is reached after each load balancing operation;

if not, repeating the load balancing operation until the number of times of the load balancing operation reaches the preset number;

and if the load balancing state is not detected after the preset times of load balancing operation, performing the load balancing operation according to a preset second period, wherein the preset second period is equal to the preset first period multiplied by the preset times.

4. A method of data consumption using KAFKA as claimed in claim 2, wherein said step of calculating a total value of weights based on the respective weight of each of said instances of data consumption comprises:

5. A method of data consumption using KAFKA of claim 1, further comprising:

and modifying the weight of the data consumption instance according to the data consumption state of the data consumption instance, and executing the step of carrying out load balancing operation according to the weight of the data consumption instance and the partition of the KAFKA topic corresponding to the data consumption instance.

6. The method of data consumption using KAFKA of any one of claims 1 to 5, further comprising, after the step of performing a load balancing operation based on the weight of the data consumption instance and the partition of the KAFKA topic to which the data consumption instance corresponds:

detecting whether all the data consumption instances reach a load balancing state;

and if not, carrying out corresponding capacity expansion operation on the partition of the KAFKA topic and/or the data consumption instance so as to enable all the data consumption instances subjected to capacity expansion to reach the load balancing state.

7. A method of data consumption using KAFKA as defined in claim 6, wherein the method further comprises:

and recording the consumed data identifier of the data consumption instance for performing data consumption in the load balancing state, and modifying the consumed data identifier according to the actual data consumption requirement so that the data consumption instance continues to perform data consumption in the load balancing state according to the modified consumed data identifier.

8. A data consuming apparatus using KAFKA, wherein the data consuming apparatus using KAFKA is applied to a KAFKA consuming side, the data consuming apparatus using KAFKA comprising:

9. A terminal device, characterized in that the terminal device comprises: memory, processor and a data consuming program using KAFKA stored on the memory and executable on the processor, the data consuming program using KAFKA when executed by the processor implementing the steps of the data consuming method using KAFKA as claimed in any one of claims 1 to 7.

10. A computer storage medium having stored thereon a data consuming program using KAFKA, the data consuming program using KAFKA implementing the steps of the data consuming method using KAFKA as claimed in any one of claims 1 to 7 when executed by a processor.