Detailed Description
In practical applications, when a distributed cluster subscribes to one or more message topics in a message system and a subscribed message topic includes a plurality of message queues, then in pull mode each node device usually allocates the message queues among the node devices of the cluster on its own, based on a uniform allocation policy.
In the existing message queue allocation mechanism, to keep the node devices from producing conflicting or duplicate allocations, each node device loads into memory the same list of node devices to be allocated and the same list of message topics subscribed by the cluster. It then allocates the message queues under each message topic to the node devices in sequence, following the order of the message topics in the message topic list and the order of the node devices in the node device list.
For example, assume that the message system contains 4 message topics, topic1 to topic4: topic1 has 3 queues, tp1.q1, tp1.q2 and tp1.q3; topic2 has 2 queues, tp2.q1 and tp2.q2; topic3 has 1 queue, tp3.q1; and topic4 has 2 queues, tp4.q1 and tp4.q2, for a total of 8 queues. The distributed cluster, which subscribes to all 4 topics at the same time, includes a total of 4 node devices, client1 to client4, acting as consumers.
Based on the existing message queue allocation mechanism, the final allocation result can be as shown in table 1 below:
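TABLE 1
Node device: allocated message queues
client1: tp1.q1, tp2.q1, tp3.q1, tp4.q1
client2: tp1.q2, tp2.q2, tp4.q2
client3: tp1.q3
client4: (none)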
As shown in Table 1, the existing allocation mechanism allocates purely mechanically according to list order. The first-ranked queue of each message topic in the message topic list (tp1.q1, tp2.q1, tp3.q1 and tp4.q1) is allocated to client1, which is ranked first in the list of node devices to be allocated; the second-ranked queues (tp1.q2, tp2.q2 and tp4.q2) are allocated to the second-ranked client2; and so on.
As can be seen from Table 1, client1 is eventually allocated 4 message queues, client2 is allocated 3 message queues, client3 is allocated 1 message queue, and client4 is not allocated any message queue.
Therefore, with the existing message queue allocation mode, node devices near the front of the list of node devices to be allocated are allocated message queues preferentially, while node devices near the back of the list may not be allocated any message queue. In addition, the number of message queues finally allocated to each node device in the same cluster is unbalanced, so the message processing resources in the cluster cannot be utilized to the maximum extent.
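The sequential allocation described above can be reproduced with a minimal Python sketch. The function name allocate_sequentially and the wrap-around used when a topic has more queues than there are node devices are assumptions made for illustration only; the topics, queues and clients are those of the example.

```python
def allocate_sequentially(topics, clients):
    """For every topic, hand out its queues starting from the first client."""
    assignment = {client: [] for client in clients}
    for queues in topics.values():
        for index, queue in enumerate(queues):
            # The i-th queue of every topic always lands on the i-th client.
            # Wrapping around with % is an assumption for topics that have
            # more queues than there are clients.
            assignment[clients[index % len(clients)]].append(queue)
    return assignment


topics = {
    "topic1": ["tp1.q1", "tp1.q2", "tp1.q3"],
    "topic2": ["tp2.q1", "tp2.q2"],
    "topic3": ["tp3.q1"],
    "topic4": ["tp4.q1", "tp4.q2"],
}
clients = ["client1", "client2", "client3", "client4"]

result = allocate_sequentially(topics, clients)
counts = {client: len(queues) for client, queues in result.items()}
print(counts)  # {'client1': 4, 'client2': 3, 'client3': 1, 'client4': 0}
```

The counts add up to the 8 subscribed queues, and client4 stays idle because no topic has a fourth queue.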
In view of this, the present application provides a load balancing method for message consumption by the node devices of a distributed cluster in pull mode. When a node device in the distributed cluster allocates the message queues under a message topic to be allocated, taken from the message topic list subscribed by the cluster, to the node devices in the cluster, it determines a starting allocation position for that message topic in the list of node devices to be allocated from a hash value corresponding to the message topic. Starting from the node device at the starting allocation position, it then allocates message queues evenly to the node devices according to the average number of message queues per node device, which is derived from the total number of message queues subscribed by the cluster. During allocation, the data volumes of the message queues allocated to each node device can also be combined and matched based on a preset load balancing policy. After the message queues under all subscribed message topics have been allocated, each node device obtains message data in pull mode from the target message queues allocated to it and performs message processing. In this way, the number and the data volume of the message queues allocated to each node device are balanced to the greatest extent, so that when the node devices obtain and process message data in pull mode, their message processing loads are approximately balanced and the overall processing efficiency of the distributed cluster is optimized.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to fig. 1, fig. 1 shows a load balancing method based on pull mode data consumption according to an embodiment of the present application. The method is applied to any node device in a distributed cluster interfacing with a message system, where the distributed cluster subscribes to message data of at least one message topic in the message system, and the message topic includes a plurality of message queues; the node device obtains message data corresponding to the message topic from the message system in pull mode to perform message processing. The method includes the following steps:
Step 101: calculate a hash value corresponding to each message topic to be allocated that is subscribed by the cluster, and determine the mapping position of the hash value in the list of node devices to be allocated as the starting allocation position corresponding to that message topic.
The message system may be a distributed data center built on a server or a server cluster. The distributed cluster refers to a distributed device cluster that is connected to the message system and is composed of a plurality of node devices.
In practical applications, the distributed cluster may subscribe to one or more message topics in the message system. Each node device in the distributed cluster can autonomously complete the distribution of the message queue based on the completely same list of node devices to be distributed loaded in the respective memory and the message topic list subscribed by the cluster.
The list of node devices to be allocated may include node devices in the cluster that can currently perform message processing on message data under a message topic subscribed by the cluster. The message topic list may include message topics subscribed by the cluster and message queues under the message topics.
In this example, after starting to allocate the message queues under the message topics in the message topic list, each node device may select each message topic in turn as the message topic to be allocated, following the order of the message topics in the message topic list, and then allocate the message queues under that message topic to the node devices.
Under the existing message queue allocation mechanism, when the message queues under a message topic to be allocated are allocated to the node devices, allocation usually starts from the first node device in the list of node devices to be allocated and proceeds through the list in order.
Although this allocation mode avoids allocating the same message queue more than once, the number of message queues under each message topic may differ. Allocating message queues to the node devices in strict list order may therefore leave the node devices with unequal numbers of message queues, and node devices near the back of the list may not be allocated any message queue.
In this example, to avoid the above problem, when a node device allocates the message queues under a message topic to be allocated, allocation need not start from the node device ranked first in the list of node devices to be allocated.
Specifically, after each node device selects a message topic to be allocated, it may first calculate a hash value of that message topic.
When the hash value of the message topic to be allocated is calculated, information that distinguishes the message topics from one another can be used as the input, so that different message topics yield different hash values as far as possible; for example, the hash value may be calculated from the name of each message topic.
After the hash value of the message topic to be allocated is calculated, the hash value may be mapped onto the list of node devices to be allocated, and the mapping position of the hash value in that list is then determined as the starting allocation position for the message topic to be allocated.
The manner of mapping the calculated hash value onto the list of node devices to be allocated is not particularly limited in the present application.
In an illustrated embodiment, when the calculated hash value is mapped onto the list of node devices to be allocated, a remainder (mod) operation may be performed on the hash value and the total number of node devices in the list; the position in the list that corresponds to the result of the remainder operation is then looked up and determined as the mapping position, that is, the starting allocation position.
For example, assuming that the node device list contains 10 node devices acting as consumers and the calculated hash value is 1234567, the mod operation on 1234567 and 10 yields 7, so position 7 is taken as the starting allocation position and message queues are allocated to the node devices starting from the 7th node device in the node device list.
In this way, because different message topics generally yield different hash values, the starting allocation positions in the node device list generally differ when a node device selects different message topics as the message topic to be allocated, so node devices near the back of the node device list also have an opportunity to be allocated message queues.
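The hash-and-remainder mapping can be sketched in Python as follows. The function name start_position and the choice of SHA-256 over the topic name are assumptions, since the application does not prescribe a hash function. A digest from hashlib is used because Python's built-in hash() of a string is salted per process by default, so different node devices would not obtain the same value from it. The result is treated here as a zero-based list index, which is also an assumption.

```python
import hashlib


def start_position(topic_name, node_count):
    """Map a topic name to a starting index in the node device list."""
    # A fixed digest gives every node device the same hash value for the
    # same topic name, unlike the per-process salted built-in hash().
    digest = hashlib.sha256(topic_name.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # Remainder operation against the number of node devices in the list.
    return hash_value % node_count


# The worked example from the text: hash value 1234567, 10 node devices.
print(1234567 % 10)  # 7

# Starting positions for the four example topics in a 4-node cluster.
for name in ["topic1", "topic2", "topic3", "topic4"]:
    print(name, start_position(name, 4))
```

Two topics can still map to the same position after the remainder operation, so the mapping spreads starting positions across the list rather than guaranteeing distinct ones.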
Step 102: based on the average number of message queues per node device in the list of node devices to be allocated, obtained from the total number of message queues subscribed by the cluster, allocate message queues evenly to the node devices in the list, starting from the node device at the starting allocation position, and combine and match the data volumes of the message queues allocated to each node device based on a preset load balancing policy.
Determining a starting allocation position for each message topic to be allocated from its hash value ensures, to some extent, that node devices near the back of the node device list also have an opportunity to be allocated message queues. However, this alone does not take the total number of message queues currently subscribed by the cluster into account, so the number of message queues finally allocated to each node device is still random.
In this example, each node device may locally store a list of the message queues under each message topic subscribed by the cluster, so that each node device acting as a consumer holds a global list of all message queues subscribed by the cluster.
For example, in the scenario shown in Table 1, each node device may locally maintain a global message queue list composed of tp1.q1, tp1.q2, tp1.q3, tp2.q1, tp2.q2, tp3.q1, tp4.q1 and tp4.q2.
In this case, after a node device has determined, from the hash value of a message topic to be allocated, the starting allocation position of that message topic in the list of node devices to be allocated, it may count the total number of message queues subscribed by the cluster based on the global list and calculate the average number of message queues per node device in the list of node devices to be allocated. It then allocates message queues evenly to the node devices in the list according to the calculated average number, starting from the node device at the determined starting allocation position.
For example, in the scenario shown in Table 1, the cluster subscribes to a total of 8 message queues; allocated evenly among 4 consumers, this gives each consumer 2 message queues.
In this way, message queues are no longer allocated to the node devices strictly in the order of the node device list, and the total number of message queues subscribed by the cluster is fully taken into account during allocation, so that the message queues are allocated evenly to the node devices to the greatest extent.
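A minimal Python sketch of allocation from a hashed starting position under an average-number limit is given below. The names start_position and allocate_evenly, the SHA-256 hash, rounding the average up when the total is not evenly divisible, and skipping a node device that has already reached the average are all assumptions; the application does not fix these details.

```python
import hashlib
import math


def start_position(topic_name, node_count):
    """Hash the topic name and take the remainder (assumed hash function)."""
    digest = hashlib.sha256(topic_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def allocate_evenly(topics, clients):
    """Allocate each topic's queues from its hashed start, capped per node."""
    total_queues = sum(len(queues) for queues in topics.values())
    # Average number of queues per node device, rounded up (assumption).
    quota = math.ceil(total_queues / len(clients))
    assignment = {client: [] for client in clients}
    for topic_name, queues in topics.items():
        position = start_position(topic_name, len(clients))
        for queue in queues:
            # Skip node devices that already hold the average number.
            while len(assignment[clients[position]]) >= quota:
                position = (position + 1) % len(clients)
            assignment[clients[position]].append(queue)
            position = (position + 1) % len(clients)
    return assignment


topics = {
    "topic1": ["tp1.q1", "tp1.q2", "tp1.q3"],
    "topic2": ["tp2.q1", "tp2.q2"],
    "topic3": ["tp3.q1"],
    "topic4": ["tp4.q1", "tp4.q2"],
}
clients = ["client1", "client2", "client3", "client4"]

even = allocate_evenly(topics, clients)
print({client: len(queues) for client, queues in even.items()})
# {'client1': 2, 'client2': 2, 'client3': 2, 'client4': 2}
```

Because every node device runs the same deterministic procedure over the same lists, each one arrives at the same allocation without coordinating with the others.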
In addition, in practical applications, the data volumes of the message queues under the message topics to be allocated may differ. To ensure that both the number of message queues allocated to each node device and the total data volume of those message queues are balanced to the greatest extent, each node device may, while allocating message queues evenly based on the calculated average number, further combine and match the data volumes of the message queues allocated to each node device based on a preset load balancing policy.
In an illustrated embodiment, when allocating message queues evenly to the node devices according to the calculated average number, a node device may further determine the data volume of each message queue under the message topic to be allocated, sort the message queues by data volume, and generate a sequence in the sorted order.
When message queues are allocated to the node devices, message queues may be selected for each node device from both the head end and the tail end of the sorted sequence, so that the message queues allocated to each node device combine large and small data volumes.
For example, assume that a cluster subscribes to two message topics, topic1 and topic2; there are two message queues, tp1.q1 and tp1.q2, under topic1, and two message queues, tp2.q1 and tp2.q2, under topic2; and the cluster has 2 node devices acting as consumers, client1 and client2. Assuming that the data volume of tp1.q1 is the largest and that of tp2.q2 is the smallest, sorting the message queues by data volume yields the sequence tp1.q1 > tp1.q2 > tp2.q1 > tp2.q2. Since the average number of message queues per node device is 2, one message queue can be selected from each of the head end and the tail end of the sequence when allocating to client1, so that tp1.q1 and tp2.q2 are allocated to client1 and tp1.q2 and tp2.q1 are allocated to client2.
In this manner, both the number of message queues allocated to each node device and the total data volume of those message queues are balanced to the greatest extent, so that the loads of the node devices can be balanced.
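The head-and-tail selection can be sketched in Python as follows. The function name pair_head_and_tail, the concrete data volumes (40, 30, 20 and 10) and the rule that each node device takes one head/tail pair per turn are assumptions chosen to match the example, in which the average number is 2.

```python
def pair_head_and_tail(queue_sizes, clients):
    """Give each client the largest and the smallest remaining queue in turn."""
    # Sort queue names by data volume, largest first.
    ordered = sorted(queue_sizes, key=queue_sizes.get, reverse=True)
    assignment = {client: [] for client in clients}
    head, tail, turn = 0, len(ordered) - 1, 0
    while head <= tail:
        client = clients[turn % len(clients)]
        assignment[client].append(ordered[head])
        if head != tail:
            # Pair the large queue with a small one from the tail end.
            assignment[client].append(ordered[tail])
        head += 1
        tail -= 1
        turn += 1
    return assignment


# Hypothetical data volumes consistent with tp1.q1 > tp1.q2 > tp2.q1 > tp2.q2.
queue_sizes = {"tp1.q1": 40, "tp1.q2": 30, "tp2.q1": 20, "tp2.q2": 10}
paired = pair_head_and_tail(queue_sizes, ["client1", "client2"])
print(paired)
# {'client1': ['tp1.q1', 'tp2.q2'], 'client2': ['tp1.q2', 'tp2.q1']}
```

With these assumed volumes both clients end up with a total data volume of 50, which is the balancing effect the pairing aims for.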
In addition, allocating message queues evenly from the determined starting allocation position and combining and matching their data volumes based on a preset load balancing policy balances, to some extent, the number and the total data volume of the message queues allocated to each node device. However, the types of message data in the message queues allocated to different node devices may differ, and the processing overhead incurred when a node device processes the message data in a message queue may differ accordingly. Therefore, when the node devices obtain data in pull mode from the message queues allocated to them and perform message processing, their loads may still be unbalanced.
In this case, after the message queues have been allocated evenly from the determined starting allocation position, with their data volumes combined and matched based on the preset load balancing policy, each node device may further adjust the allocation result based on the load weight value of each message queue.
In an embodiment shown, each node device may further calculate a corresponding load weight value for each message queue based on a processing overhead value of message data in each message queue and a data size of each message queue.
The processing overhead value may be a system overhead parameter when each node device processes or calculates message data in each message queue; for example, the overhead value may specifically be a time duration that the node device needs to process or calculate message data in each message queue, or other types of overhead parameters.
The load weight value may be a weight value that is calculated according to a certain weighting algorithm and in combination with a processing overhead value and a total data amount of message data in each message queue, and that can represent a load size of each message queue. However, the weighting algorithm is not particularly limited in this example, and those skilled in the art can flexibly select the weighting algorithm with reference to the description in the related art when implementing the technical solution described in the present application.
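Since the weighting algorithm is left open, the following Python sketch shows only one possible choice: a weighted sum of the processing overhead value and the data volume. The function name load_weight and the two coefficients are hypothetical and are not taken from the application.

```python
def load_weight(processing_overhead, data_volume,
                overhead_factor=0.5, volume_factor=0.5):
    """Combine overhead and data volume into one load weight value.

    The weighted sum and its default factors are illustrative assumptions;
    any other weighting algorithm could be substituted.
    """
    return overhead_factor * processing_overhead + volume_factor * data_volume


# A queue with overhead 4.0 and data volume 2.0 under the default factors.
print(load_weight(4.0, 2.0))  # 3.0
```

In practice the two inputs would need to be normalized to comparable scales before being combined, which this sketch leaves out.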
When each node device calculates a corresponding load weight value for each message queue, the data amount of the message queue allocated to each node device may be adjusted based on the calculated load weight value, so as to balance the load of the message queue allocated to each node device.
When the message queues allocated to each node device are adjusted, the load weight values of the message queues allocated to each node device can be summed, the sums of the node devices compared, and the message queues allocated to the node devices then recombined based on the comparison result, so that after adjustment the sums of the load weight values of the message queues allocated to the node devices are substantially balanced.
For example, assume that a cluster subscribes to two message topics, topic1 and topic2; there are two message queues, tp1.q1 and tp1.q2, under topic1, and two message queues, tp2.q1 and tp2.q2, under topic2; and the cluster has 2 node devices acting as consumers, client1 and client2. The message queues allocated to client1 by the preceding steps are tp1.q1 and tp2.q2, and the message queues allocated to client2 are tp1.q2 and tp2.q1.
Assuming that the calculated load weight values of tp1.q1 and tp2.q2 are both 3 and the load weight values of tp1.q2 and tp2.q1 are both 2, then under the above allocation result the load of client1 is 6 and the load of client2 is 4. It can be seen that although the number and the data volume of the message queues allocated to client1 and client2 are approximately balanced, their actual loads differ.
Therefore, in this case, the above allocation result may be recombined based on the load weight values of the message queues allocated to client1 and client2: the message queues allocated to client1 are adjusted to tp1.q1 and tp2.q1, and the message queues allocated to client2 are adjusted to tp1.q2 and tp2.q2. After the adjustment the loads of client1 and client2 are both 5, so a load-balanced state is reached.
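One way to perform such an adjustment is to swap message queues between the most and the least loaded node device, which keeps the number of queues per node device unchanged. The Python sketch below assumes this swap-based strategy; the names node_load and rebalance_by_swapping and the max_rounds limit are hypothetical, and the application does not prescribe a particular adjustment procedure.

```python
def node_load(queues, weights):
    """Sum of the load weight values of the queues held by one node."""
    return sum(weights[queue] for queue in queues)


def rebalance_by_swapping(assignment, weights, max_rounds=100):
    """Swap queues between the heaviest and lightest node to narrow the gap."""
    assignment = {node: list(queues) for node, queues in assignment.items()}
    for _ in range(max_rounds):
        heavy = max(assignment, key=lambda n: node_load(assignment[n], weights))
        light = min(assignment, key=lambda n: node_load(assignment[n], weights))
        gap = node_load(assignment[heavy], weights) - node_load(assignment[light], weights)
        best = None
        for a in assignment[heavy]:
            for b in assignment[light]:
                # Swapping a and b moves (w_a - w_b) of load in each direction.
                new_gap = abs(gap - 2 * (weights[a] - weights[b]))
                if new_gap < gap and (best is None or new_gap < best[0]):
                    best = (new_gap, a, b)
        if best is None:
            break  # No swap improves the balance any further.
        _, a, b = best
        assignment[heavy].remove(a)
        assignment[light].remove(b)
        assignment[heavy].append(b)
        assignment[light].append(a)
    return assignment


weights = {"tp1.q1": 3, "tp2.q2": 3, "tp1.q2": 2, "tp2.q1": 2}
initial = {"client1": ["tp1.q1", "tp2.q2"], "client2": ["tp1.q2", "tp2.q1"]}
adjusted = rebalance_by_swapping(initial, weights)
loads = {node: node_load(queues, weights) for node, queues in adjusted.items()}
print(loads)  # {'client1': 5, 'client2': 5}
```

Several different swaps reach the same 5/5 balance in this example, so the exact queues held by each client after adjustment depend on tie-breaking and may differ from the combination named in the text.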
Of course, in practical applications, besides that each node device may further calculate a corresponding load weight value for each message queue based on the processing overhead value of the message data in each message queue and the data size of each message queue, the load weight value of each message queue may also be manually configured by an administrator.
In this case, each node device may obtain a load weight value manually configured by the administrator for each message queue, and then adjust the data amount of the message queue allocated to each node device based on the load weight value manually configured by the administrator for each message queue, so as to balance the load of the message queue allocated to each node device, and a specific implementation process is not described again.
Therefore, in the above manner, each node device allocates message queues evenly to the node devices from the determined starting allocation position, combines and matches the data volumes of the allocated message queues based on the preset load balancing policy during allocation, and, after the message queues have been allocated, further adjusts the allocation result based on the load weight value of each message queue, so that the loads of the node devices are balanced as far as possible.
Step 103: when all the message queues subscribed by the cluster have been allocated, look up the target message queues allocated to the present device, and obtain message data from the target message queues in pull mode to perform message processing.
In this example, after each node device has allocated all the message queues subscribed by the cluster in the manner described above, it may look up the target message queues allocated to itself, pull the message data from those target message queues in the message system based on the pull mode, and then perform message processing locally. The specific manner of obtaining data in pull mode is not described in detail in the present application; those skilled in the art can refer to the related art when implementing the technical solution of the present application.
Because the balance of the number, data volume and load of the allocated message queues is considered comprehensively when message queues are allocated to the node devices, the loads of the node devices in the cluster are approximately equal when each node device locally processes the message data in the message queues allocated to it, and the node devices are in a load-balanced state.
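The look-up-and-pull step can be illustrated with a small Python sketch. The class InMemoryMessageSystem is a hypothetical in-memory stand-in for a real message system, and the names pull, consume and batch_size are assumptions; no real messaging client API is implied.

```python
from collections import deque


class InMemoryMessageSystem:
    """Hypothetical stand-in for a message system that serves pull requests."""

    def __init__(self, queues):
        self._queues = {name: deque(messages) for name, messages in queues.items()}

    def pull(self, queue_name, max_messages):
        """Return up to max_messages from the named queue, oldest first."""
        queue = self._queues[queue_name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


def consume(node, assignment, system, handle, batch_size=2):
    """Pull and process messages from the queues allocated to this node only."""
    processed = 0
    for queue_name in assignment[node]:  # target message queues of this node
        while True:
            batch = system.pull(queue_name, batch_size)
            if not batch:
                break  # Queue drained; a real consumer would poll again later.
            for message in batch:
                handle(message)
                processed += 1
    return processed


system = InMemoryMessageSystem({
    "tp1.q1": ["a1", "a2", "a3"],
    "tp2.q1": ["b1"],
    "tp1.q2": ["c1", "c2"],
})
assignment = {"client1": ["tp1.q1", "tp2.q1"], "client2": ["tp1.q2"]}
handled = []
count = consume("client1", assignment, system, handled.append)
print(count, handled)  # 4 ['a1', 'a2', 'a3', 'b1']
```

The consumer initiates every request itself and touches only its own target queues, so the queues allocated to client2 are left untouched.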
Corresponding to the method embodiment, the application also provides an embodiment of the device.
Referring to fig. 2, the present application provides a load balancing apparatus 20 based on pull mode data consumption, which is applied to any node device in a distributed cluster interfacing with a message system. Referring to fig. 3, the hardware architecture of the node device carrying the load balancing apparatus 20 generally includes a CPU, a memory, a nonvolatile memory, a network interface, an internal bus, and the like. Taking a software implementation as an example, the load balancing apparatus 20 may generally be understood as a logic apparatus combining software and hardware, formed after a computer program loaded in the memory is executed by the CPU. The apparatus 20 includes:
the calculation module 201, configured to calculate a hash value corresponding to each message topic to be allocated that is subscribed by the cluster, and to determine the mapping position of the hash value in the list of node devices to be allocated as the starting allocation position corresponding to that message topic;
the allocation module 202, configured to allocate message queues evenly to the node devices in the list of node devices to be allocated, starting from the node device at the starting allocation position and based on the average number of message queues per node device obtained from the total number of message queues subscribed by the cluster, and to combine and match the data volumes of the message queues allocated to each node device based on a preset load balancing policy;
the obtaining module 203, configured to look up the target message queues allocated to the present device when all the message queues subscribed by the cluster have been allocated, and to obtain message data from the target message queues in pull mode to perform message processing.
In this example, the calculation module 201 is configured to:
perform a remainder operation on the hash value and the total number of node devices in the node device list;
look up the position in the node device list that corresponds to the result of the remainder operation;
and determine the position found as the starting allocation position corresponding to each message topic to be allocated.
In this example, the allocation module 202 is configured to:
determine the data volume of each message queue of the message topic to be allocated;
sort the message queues of the message topic to be allocated by data volume, and generate a sequence in the sorted order;
and, when allocating message queues to the node devices, select message queues for each node device from the head end and the tail end of the sequence, so as to combine and match the data volumes of the message queues allocated to each node device.
In this example, the allocation module 202 is further configured to:
calculate a corresponding load weight value for each message queue based on the processing overhead value of the message data in each message queue and the data volume of each message queue;
and adjust the message queues allocated to each node device based on the calculated load weight values, so as to balance the loads corresponding to the message queues allocated to the node devices.
In this example, the allocation module 202 is further configured to:
obtain a load weight value preconfigured for each message queue;
and adjust the message queues allocated to each node device based on the load weight values preconfigured for the message queues, so as to balance the loads corresponding to the message queues allocated to the node devices.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.