CN114595052A - Distributed communication load balancing method based on graph partitioning algorithm

Info

Publication number: CN114595052A
Application number: CN202110638133.1A
Authority: CN
Inventors: 阮利; 杨洋; 詹子豪
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2022-06-07

Abstract

A distributed communication load balancing method based on a graph partitioning algorithm is used to balance communication loads. The algorithm runs on every node. Each node first counts the communication traffic between itself and the other nodes; when the proportion of this traffic in the total communication traffic exceeds a certain threshold, balancing starts. The node sorts the external nodes by communication traffic and, in descending order of traffic, selects them one at a time as the target node and sends it a detection request. If the target node has not already responded to another node, it responds to the current node and replies with the communication information of the computing tasks on the target node. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks on the two nodes according to the communication information of the tasks on both nodes, thereby reducing the communication load between the two nodes. The method effectively reduces the communication traffic in the system and offers better scalability.

Description

Distributed communication load balancing method based on graph partitioning algorithm

The technical field is as follows:

The invention discloses a distributed communication load balancing method based on a graph partitioning algorithm. It addresses challenges faced by wide-area high-performance computing and belongs to the field of computer technology.

Background art:

In a runtime system with distributed memory, computing tasks running on different nodes communicate by sending messages to each other over a network. The communication load of a task, as discussed here, is the number of messages exchanged between that task and other nodes. If two computing tasks that communicate with each other are placed on the same node, they can communicate through shared memory, which hides the communication latency. The purpose of communication load balancing is therefore to place tasks with heavy mutual communication on the same node wherever possible, so that the traffic between nodes is reduced. The computation load and communication pattern of a computing task are usually continuous in time; that is, the load information of a task in the coming period resembles its load information in the preceding period. The load information of the preceding period is therefore commonly used to guide the load balancing algorithm.

A typical load balancing algorithm does not consider the communication between tasks when balancing the computation load, so the communication load between some nodes may be high after balancing. Heavy communication overhead can degrade the performance of the program.

One approach is to treat the balancing of computation and communication loads as a graph partitioning problem: divide n interconnected vertices into k parts so that the total weight of the edges between parts is minimized while the sums of the vertex weights of the k parts remain balanced. Runtime systems usually adopt a centralized graph partitioning algorithm, for example:

Jeannet points out that the problems in high-performance computers are load imbalance in applications and poor management of data placement. As the number of cores increases and the amount of memory per core decreases drastically, special attention must be paid to load balancing while preserving data locality as far as possible. The authors use LibTopoMap to analyze the topology among multiple nodes and propose a topology-based load balancing method. The method first balances the load among all nodes and then redistributes each group of computing tasks with the METIS algorithm provided by LibTopoMap, according to the affinity among the computing tasks, thereby reducing communication overhead.

Cam presents a dynamic load balancing method based on hypergraph partitioning. Because the load of an iterative application varies slowly, load imbalance appears at intervals. The method models the tasks as a hypergraph and then partitions it. The vertices of the hypergraph represent computing tasks, and the edges represent communication between computing tasks. Hypergraph partitioning divides the tasks into different regions and thereby balances the computing tasks. The goal of partitioning is to minimize the edge cut, i.e., the traffic, while keeping the load distribution balanced. Because this problem is NP-hard, the authors adopt the existing Zoltan approach to solve it.

However, this problem is NP-hard, and as the system scale increases, computing a suitable solution takes a long time. Centralized algorithms therefore have scalability problems.
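The graph partitioning objective described above can be made concrete with a short sketch. It is only an illustration: the task names, weights, and the helper functions `edge_cut` and `part_weights` are hypothetical and do not come from the patent. It evaluates the two quantities a partitioner trades off, namely the weight of edges crossing between parts and the load assigned to each part.

```python
from collections import defaultdict

def edge_cut(edges, part):
    """Sum the weights of edges whose endpoints lie in different parts."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])

def part_weights(vertex_weight, part):
    """Total vertex weight (computation load) assigned to each part."""
    totals = defaultdict(float)
    for v, w in vertex_weight.items():
        totals[part[v]] += w
    return dict(totals)

# Hypothetical example: 4 tasks; edge weights are message counts.
edges = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 8, ("a", "d"): 2}
vertex_weight = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}

good = {"a": 0, "b": 0, "c": 1, "d": 1}  # heavy communicators co-located
bad = {"a": 0, "c": 0, "b": 1, "d": 1}   # heavy communicators separated

print(edge_cut(edges, good), edge_cut(edges, bad))
print(part_weights(vertex_weight, good))
```

Both placements are equally balanced in computation load, but the first keeps the heavily communicating pairs together and therefore has a much smaller edge cut.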

To address the scalability problem of centralized graph partitioning algorithms, a distributed graph partitioning algorithm is proposed here to balance communication loads. The algorithm runs on every node. Each node first counts the communication traffic between itself and the other nodes; when the proportion of this traffic in the total communication traffic exceeds a certain threshold, balancing starts. The node sorts the external nodes by communication traffic and, in descending order of traffic, selects them one at a time as the target node and sends it a detection request. If the target node has not already responded to another node, it responds to the current node and replies with the communication information of the computing tasks on the target node. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks on the two nodes according to the communication information of the tasks on both nodes, thereby reducing the communication load between the two nodes.

The invention content is as follows:

The invention provides a distributed communication load balancing method based on a graph partitioning algorithm, which aims to solve the scalability problem of centralized graph partitioning algorithms and to balance communication loads.

The technical scheme of the invention is as follows:

A distributed communication load balancing method based on a graph partitioning algorithm, characterized in that the algorithm runs on every node. Each node first counts the communication traffic between itself and the other nodes; when the proportion of this traffic in the total communication traffic exceeds a certain threshold, balancing starts. The node sorts the external nodes by communication traffic and, in descending order of traffic, selects them one at a time as the target node and sends it a detection request. If the target node has not already responded to another node, it responds to the current node and replies with the communication information of the computing tasks on the target node. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks on the two nodes according to the communication information of the tasks on both nodes, thereby reducing the communication load between the two nodes.

The method comprises the following steps:

1) The state of a server is divided into four states {INIT, LOOKING, PEER, NONEED}. When the load balancing algorithm is entered, the server is in the INIT state. The node calculates the amount of non-local traffic from the load information and communication information recorded by the load collection of the local node, and decides whether to initiate load balancing according to whether the proportion of non-local traffic exceeds a threshold threshold_1.

2) If not, the state changes to NONEED, indicating that the communication load of the node does not need to be adjusted; the node may still be selected as the target node by other nodes.

3) Otherwise, the state changes to LOOKING and a suitable target node is selected: the communication traffic of all computing tasks is counted to obtain the traffic between this node and each of the other nodes.

4) To select a suitable target node, the other nodes are sorted in descending order of traffic, and load collection requests are sent to them in that order.

5) To prevent multiple nodes from selecting the same target node, a node that has already responded to the load collection request of another node changes its state to PEER, indicating that it has formed a node pair with that node, and thereafter rejects further load collection requests.

6) A node in the LOOKING state sends load collection requests to the target nodes in order until either a target node responds successfully to the request, in which case the state changes to PEER, or the traffic with the current target node is lower than a threshold threshold_2, in which case load balancing is considered no longer necessary and the state changes to NONEED.

7) In particular, a node in the LOOKING state does not respond to load collection requests from other nodes unless its current target happens to be the node that sent the request.
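The initiating side of steps 1) to 6) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and the sketch assumes that the total traffic is the sum of local and non-local traffic, which the text does not spell out.

```python
from enum import Enum

class State(Enum):
    INIT = "INIT"
    LOOKING = "LOOKING"
    PEER = "PEER"
    NONEED = "NONEED"

def initial_decision(local_traffic, traffic_to_node, threshold_1):
    """Steps 1-3: leave INIT for LOOKING or NONEED.

    traffic_to_node maps each remote node id to the traffic exchanged
    with it. Assumption: total traffic = local + non-local traffic.
    """
    non_local = sum(traffic_to_node.values())
    total = local_traffic + non_local
    ratio = non_local / total if total > 0 else 0.0
    return State.LOOKING if ratio > threshold_1 else State.NONEED

def candidate_targets(traffic_to_node, threshold_2):
    """Steps 4 and 6: remote nodes in descending traffic order,
    cut off at the first node whose traffic is below threshold_2."""
    ordered = sorted(traffic_to_node.items(),
                     key=lambda item: item[1], reverse=True)
    targets = []
    for node, traffic in ordered:
        if traffic < threshold_2:
            break  # remaining nodes carry even less traffic
        targets.append(node)
    return targets

# Hypothetical traffic counts towards remote nodes 1, 2 and 3.
traffic = {1: 50, 2: 5, 3: 30}
print(initial_decision(20, traffic, 0.5))
print(candidate_targets(traffic, 10))
```

A node in LOOKING would send load collection requests to the returned candidates one at a time; if none of them accepts, it would end in NONEED.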

Description of the drawings:

Fig. 1 is a flow diagram of the algorithm.

Fig. 2 is a diagram showing a change in server status.

Fig. 3 is a graph comparing the speed-up ratio relative to NoLB (no load balancing) as the number of messages increases.

Fig. 4 is a comparison graph of the total amount of communication between nodes in the system.

Fig. 5 is a graph comparing execution times of different algorithms.

The specific implementation mode is as follows:

A typical load balancing algorithm does not consider the communication between tasks when balancing the computation load, so the communication load between some nodes may be high after balancing. Heavy communication overhead can degrade the performance of the program.

To address the scalability problem of centralized graph partitioning algorithms, a distributed graph partitioning algorithm is proposed here to balance communication loads. The algorithm runs on every node. Each node first counts the communication traffic between itself and the other nodes; when the proportion of this traffic in the total communication traffic exceeds a certain threshold, balancing starts. The node sorts the external nodes by communication traffic and, in descending order of traffic, selects them one at a time as the target node and sends it a detection request. If the target node has not already responded to another node, it responds to the current node and replies with the communication information of the computing tasks on the target node. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks on the two nodes according to the communication information of the tasks on both nodes, thereby reducing the communication load between the two nodes.

The method is implemented on Charm++ 6.0.1. The DistCommLB class is implemented by inheriting from DistBaseLB and provides functions such as load information statistics, load information collection, and load balancing scheme computation.

The state of a server is divided into four states {INIT, LOOKING, PEER, NONEED}. When the load balancing algorithm is entered, the server is in the INIT state. The node first calculates the amount of non-local traffic from the load information and communication information recorded by the load collection of the local node, and decides whether to initiate load balancing according to whether the proportion of non-local traffic exceeds a threshold threshold_1. If it does not, the state changes to NONEED, indicating that the communication load of the node does not need to be adjusted, although the node may still be selected as the target node by other nodes. Otherwise, the state changes to LOOKING and a suitable target node is selected: the communication traffic of all computing tasks is counted to obtain the traffic between this node and each of the other nodes. To select a suitable target node, the other nodes are sorted in descending order of traffic, and load collection requests are sent to them in that order. To prevent multiple nodes from selecting the same target node, a node that has already responded to the load collection request of another node changes its state to PEER, indicating that it has formed a node pair with that node, and thereafter rejects further load collection requests. A node in the LOOKING state sends load collection requests to the target nodes in order until either a target node responds successfully to the request, in which case the state changes to PEER, or the traffic with the current target node is lower than a threshold threshold_2, in which case load balancing is considered no longer necessary and the state changes to NONEED. In particular, a node in the LOOKING state does not respond to load collection requests from other nodes unless its current target happens to be the node that sent the request. The change of server state is shown in Fig. 2.
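The receiving side of the protocol, i.e. how a node reacts to an incoming load collection request, might look like the sketch below. The function `handle_request` and its return convention are hypothetical. The text does not say how a node still in INIT should react, so the sketch assumes it accepts the request just as a NONEED node does.

```python
def handle_request(state, current_target, sender):
    """Decide how a node answers a load collection request.

    state          -- "INIT", "LOOKING", "PEER" or "NONEED"
    current_target -- node this node is currently probing (None if none)
    sender         -- node that sent the request
    Returns (accepted, new_state).
    """
    if state == "PEER":
        # Already paired with another node: reject.
        return False, state
    if state == "LOOKING" and current_target != sender:
        # Busy pursuing a different target: do not respond.
        return False, state
    # NONEED, LOOKING at the sender itself, or (assumption) INIT:
    # accept and form a node pair with the sender.
    return True, "PEER"

print(handle_request("NONEED", None, 2))
print(handle_request("LOOKING", 3, 2))
print(handle_request("LOOKING", 2, 2))
```

Accepting a request only when the sender is also the current target lets two nodes that probe each other pair up instead of rejecting each other.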

The graph partitioning algorithm adopted here is the METIS algorithm. METIS is a recursive graph partitioning algorithm: it first coarsens the graph into one with a small number of vertices, partitions that graph, and then recursively refines each part to obtain the final partitioning scheme. The partitioning is performed with the METIS_PartGraphRecursive() function; the main parameters passed in are the computation load of all tasks, the communication load between tasks, and the tolerance for imbalance.
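To illustrate what the pairwise repartitioning step takes in and gives back, the sketch below redistributes the tasks of two nodes from the same three kinds of input: task loads, inter-task traffic, and an imbalance tolerance. It is not METIS. It is a simple greedy swap refinement used here as a stand-in, and the names `cut_weight` and `refine_pair` as well as the sample data are hypothetical.

```python
def cut_weight(comm, side):
    """Traffic crossing between the two nodes under assignment `side`."""
    return sum(w for (u, v), w in comm.items() if side[u] != side[v])

def refine_pair(load, comm, side, tolerance=1.2):
    """Greedy stand-in for a two-way partitioner (not METIS).

    Repeatedly swaps one task from node 0 with one task from node 1
    whenever the swap lowers the cut and keeps both node loads within
    tolerance * (average node load).
    """
    side = dict(side)
    limit = tolerance * sum(load.values()) / 2
    improved = True
    while improved:
        improved = False
        best = cut_weight(comm, side)
        zeros = sorted(t for t in side if side[t] == 0)
        ones = sorted(t for t in side if side[t] == 1)
        for a in zeros:
            for b in ones:
                trial = dict(side)
                trial[a], trial[b] = 1, 0
                loads = [sum(load[t] for t in trial if trial[t] == n)
                         for n in (0, 1)]
                if max(loads) <= limit and cut_weight(comm, trial) < best:
                    side, improved = trial, True
                    break
            if improved:
                break
    return side

# Hypothetical tasks on two nodes; heavy communicators start separated.
load = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
comm = {("a", "b"): 10, ("c", "d"): 8, ("b", "c"): 1}
start = {"a": 0, "c": 0, "b": 1, "d": 1}
result = refine_pair(load, comm, start)
print(cut_weight(comm, start), cut_weight(comm, result))
```

Each accepted swap strictly lowers the cut, so the loop terminates. A real partitioner such as METIS would generally find better solutions on larger graphs.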

The test program is derived from the kNeighbor program written in Charm++, a benchmark with a neighbor communication pattern. In this benchmark, each object exchanges fixed-size messages with a fixed-size group of objects in every iteration. Each object is assigned a random computation load. In the test program, communication between the computing objects is the most critical factor affecting execution time.
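A workload of this shape can be approximated with a small generator. The sketch below is an assumption-laden stand-in, not the kNeighbor source: it assumes the objects form a ring and that each object talks to its k nearest ring neighbors (k/2 on each side), and the function name `k_neighbor_graph` is hypothetical.

```python
import random

def k_neighbor_graph(num_objects, k, message_bytes, seed=0):
    """Build a neighbor-communication workload (assumed ring layout).

    Returns (load, comm): a random computation load per object and the
    per-iteration traffic for each communicating pair of objects.
    Requires num_objects > k so that all neighbors are distinct.
    """
    rng = random.Random(seed)
    load = {i: rng.random() for i in range(num_objects)}
    comm = {}
    for i in range(num_objects):
        for step in range(1, k // 2 + 1):
            j = (i + step) % num_objects
            # Store each undirected pair once, smaller id first.
            comm[(min(i, j), max(i, j))] = message_bytes
    return load, comm

load, comm = k_neighbor_graph(num_objects=8, k=4, message_bytes=1024)
print(len(load), len(comm))
```

The resulting `load` and `comm` dictionaries have the same form as the task loads and inter-task traffic that a partitioner consumes.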

In experimental comparison, the following algorithms are compared:

1) CommLB: a greedy centralized algorithm. For each task, the k tasks that communicate with it are placed on the same node according to their weights, which reduces the communication overhead between nodes.

2) MetisLB: a centralized balancing algorithm that uses the METIS graph partitioning algorithm.

3) ZoltanLB: a centralized balancing algorithm that uses the Zoltan graph partitioning algorithm.

4) GreedyLB: a centralized computation load balancing algorithm that does not take the distribution of communication load into account.

First, the execution time of the kNeighbor test program is compared as the size of the messages sent by each computing task increases, with the execution time without load balancing as the reference. As shown in Fig. 3, the execution time of the kNeighbor test program comes mainly from communication between computing tasks, so the execution time increases as the message size increases. Centralized balancing algorithms such as MetisLB, ZoltanLB, and CommLB take the communication overhead between nodes into account and effectively reduce the number of messages between nodes, so the program runs faster than without load balancing. GreedyLB, by contrast, does not consider communication between computing tasks; it brings no significant improvement over no balancing and can even be slower because of the overhead of load balancing itself. The DistCommLB proposed here is not as effective as the centralized graph partitioning algorithms, but it is comparable to the greedy CommLB algorithm and is also relatively effective in reducing the traffic in the system.

In addition, the total amount of network communication in the whole system is measured; the result is shown in Fig. 4, again relative to the case without balancing. For GreedyLB, which does not consider the communication load, the extra traffic caused by load balancing can make the total higher than without balancing, but the proportion of this extra traffic decreases as the size of the messages sent by the computing tasks increases. Although DistCommLB does not match the centralized graph partitioning algorithms, it still effectively reduces the traffic between nodes compared with performing no balancing.

Finally, to demonstrate the scalability of DistCommLB, the execution times of the different algorithms are compared as the number of computing tasks in the system increases; the results are shown in Fig. 5, in seconds. The execution time of the centralized algorithms increases significantly with the number of computing tasks, and the centralized algorithms based on graph partitioning take especially long. Because DistCommLB is a distributed algorithm, each run of the graph partitioning algorithm involves only two nodes. Although the total number of tasks on the two nodes grows as the number of tasks grows, and more time is consumed, the execution time of DistCommLB is clearly shorter than that of the centralized algorithms, and it therefore scales better.

Claims

1. A distributed communication load balancing method based on a graph partitioning algorithm, characterized in that the algorithm runs on each node: first, the communication traffic between the node and other nodes is counted; when the proportion of this traffic in the total communication traffic exceeds a certain threshold, balancing is started; the external nodes are sorted by communication traffic and, in descending order of traffic, are selected in turn as target nodes to which detection requests are sent; if the target node has not responded to another node, the target node responds to the current node and replies with the communication information of the computing tasks on the target node; after the current node receives the data, the tasks on the two nodes are redistributed using a graph partitioning algorithm according to the communication information of the tasks on the current node and the target node, reducing the communication load between the two nodes.

2. The method of claim 1, comprising the steps of:

1) the state of a server is divided into four states {INIT, LOOKING, PEER, NONEED}; when the load balancing algorithm is entered, the server is in the INIT state; the first step is to calculate the amount of non-local traffic from the load information and communication information recorded by the load collection of the local node, and to decide whether to initiate load balancing according to whether the proportion of non-local traffic exceeds a threshold threshold_1;

2) if not, the state is changed to NONEED, which indicates that the communication load of the node is not required to be adjusted, but the node is still possibly selected as a target node by other nodes;

3) otherwise, the state is changed to LOOKING, a proper target node is selected, the communication traffic of all calculation tasks is counted, and the communication traffic between the node and other nodes is obtained;

4) in order to select a proper target node, sequencing other nodes according to the descending order of the communication traffic, and sequentially sending load collection requests to each node;

5) in order to prevent a plurality of nodes from selecting the same target node, a node that has already responded to the load collection request of another node changes its state to PEER, indicating that it has formed a node pair with that node, and thereafter rejects further load collection requests;

6) a node in the LOOKING state sends load collection requests to the target nodes in order until either a target node responds successfully to the load collection request, in which case the state changes to PEER, or the traffic with the current target node is lower than a threshold threshold_2, in which case load balancing is considered no longer necessary and the state changes to NONEED;

7) in particular, if a node is in the LOOKING state, it will not respond to other nodes' load collection requests unless the node is currently targeted to the node that sent the request.