CN114595052A - Distributed communication load balancing method based on graph partitioning algorithm - Google Patents

Publication number
CN114595052A
Authority
CN
China
Prior art keywords
node
nodes
communication
load
target node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110638133.1A
Other languages
Chinese (zh)
Inventor
阮利
杨洋
詹子豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202110638133.1A
Publication of CN114595052A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A distributed communication load balancing method based on a graph partitioning algorithm is disclosed. The algorithm runs on every node. Each node first counts the traffic between itself and the other nodes; when the proportion of this traffic in its total traffic exceeds a threshold, balancing starts. The node sorts the other nodes by traffic and, from highest to lowest, selects them one at a time as the target node and sends a probe request. If the target node has not already responded to another node, it responds to the current node and returns the communication information of the computing tasks it hosts. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks of the two nodes according to the communication information of the tasks on both, which reduces the communication load between the two nodes. The method effectively reduces traffic in the system and scales better.

Description

Distributed communication load balancing method based on graph partitioning algorithm
The technical field is as follows:
The invention discloses a distributed communication load balancing method based on a graph partitioning algorithm. It addresses challenges faced by wide-area high-performance computing and belongs to the technical field of computers.
Background art:
In a runtime system with distributed memory, computing tasks running on different nodes communicate by sending messages to each other over the network. Here, the communication load of a task means the number of messages exchanged between that task and other nodes. If two communicating tasks are placed on the same node, they can communicate through shared memory, which hides the communication latency. The purpose of communication load balancing is therefore to place tasks that communicate heavily with each other on the same node wherever possible, reducing traffic between nodes. The computation load and communication pattern of a task are usually continuous in time: the load in the near future resembles the load in the recent past. Load information from the preceding period is therefore commonly used to guide the load balancing algorithm.
Because a general load balancing algorithm does not consider communication between tasks when balancing computation load, the communication load between some nodes may be high after balancing, and the resulting communication overhead can degrade program performance.
One approach is to treat the balancing of computation and communication loads as a graph partitioning problem: divide n interconnected vertices into k parts so that the total weight of the edges between parts is minimized while the sums of the vertex weights of the k parts are balanced. Runtime systems usually adopt a centralized graph partitioning algorithm, for example:
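The two quantities in this formulation, the weight of the edges crossing between parts and the vertex weight held by each part, can be computed directly. The following Python is a minimal sketch for illustration only; the function names, the task names, and the weights are assumptions and do not come from the patent.

```python
from collections import defaultdict

def edge_cut(edges, part):
    """Sum the weights of edges whose endpoints lie in different parts."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])

def part_weights(vertex_weight, part):
    """Total vertex weight (computation load) assigned to each part."""
    totals = defaultdict(float)
    for v, w in vertex_weight.items():
        totals[part[v]] += w
    return dict(totals)

# Four tasks with unit computation load; edge weights are message counts.
vertex_weight = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
edges = {("a", "b"): 10, ("c", "d"): 8, ("b", "c"): 1}

good = {"a": 0, "b": 0, "c": 1, "d": 1}  # heavy communicators co-located
bad = {"a": 0, "c": 0, "b": 1, "d": 1}   # heavy communicators separated

print(edge_cut(edges, good), part_weights(vertex_weight, good))
print(edge_cut(edges, bad), part_weights(vertex_weight, bad))
```

Both placements are equally balanced in computation load, but the first cuts only the light edge while the second cuts every edge, which is the difference a communication-aware partitioner tries to exploit.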
Jeannet identifies load imbalance in applications and poor management of data locality as problems in high-performance computers. As the number of cores increases and the amount of memory per core decreases drastically, special attention must be paid to load balancing, and data locality must be considered as far as possible. The authors use LibTopoMap to analyze the topology between multiple nodes and propose a topology-based load balancing method. The method first balances the load among all nodes and then redistributes each group of computing tasks with the METIS algorithm provided by LibTopoMap according to the affinity among the computing tasks, reducing communication overhead.
Cam describes a dynamic load balancing method based on hypergraph partitioning. Because the load of an iterative application varies slowly, load imbalance arises at intervals. The method models the tasks as a hypergraph and partitions it. The vertices of the hypergraph represent computing tasks, and the edges represent communication between them. Hypergraph partitioning divides the tasks into different regions, balancing the computation. The goal of the partitioning is to minimize the edge cut, i.e. the traffic, while keeping the load distribution balanced. Because this problem is NP-hard, the authors adopt the existing Zoltan approach to solve it.
However, because the problem is NP-hard, computing a suitable solution takes a long time as the system scale increases. Centralized algorithms therefore have scalability problems.
To address the scalability problem of centralized graph partitioning algorithms, a distributed graph partitioning algorithm is proposed here to balance communication loads. The algorithm runs on every node. Each node first counts the traffic between itself and the other nodes; when the proportion of this traffic in its total traffic exceeds a threshold, balancing starts. The node sorts the other nodes by traffic and, from highest to lowest, selects them one at a time as the target node and sends a probe request. If the target node has not already responded to another node, it responds to the current node and returns the communication information of the computing tasks it hosts. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks of the two nodes according to the communication information of the tasks on both, reducing the communication load between the two nodes.
The invention content is as follows:
The invention provides a distributed communication load balancing method based on a graph partitioning algorithm. It aims to solve the scalability problem of centralized graph partitioning algorithms and to balance communication loads.
The technical scheme of the invention is as follows:
A distributed communication load balancing method based on a graph partitioning algorithm, characterized in that the algorithm runs on every node. Each node first counts the traffic between itself and the other nodes; when the proportion of this traffic in its total traffic exceeds a threshold, balancing starts. The node sorts the other nodes by traffic and, from highest to lowest, selects them one at a time as the target node and sends a probe request. If the target node has not already responded to another node, it responds to the current node and returns the communication information of the computing tasks it hosts. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks of the two nodes according to the communication information of the tasks on both, reducing the communication load between the two nodes.
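The trigger condition and the ordering of candidate target nodes can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the dictionary layout (task to node, task pair to message count), and the example threshold of 0.5 are all hypothetical.

```python
def non_local_ratio(task_node, traffic, me):
    """Fraction of the traffic touching node `me` that crosses to other nodes."""
    total = 0
    remote = 0
    for (src, dst), msgs in traffic.items():
        if task_node[src] != me and task_node[dst] != me:
            continue  # neither endpoint runs on this node
        total += msgs
        if task_node[src] != task_node[dst]:
            remote += msgs
    return remote / total if total else 0.0

def candidate_targets(task_node, traffic, me):
    """Other nodes ordered by traffic exchanged with `me`, highest first."""
    per_node = {}
    for (src, dst), msgs in traffic.items():
        a, b = task_node[src], task_node[dst]
        if a == b:
            continue  # local traffic does not involve another node
        if a == me:
            per_node[b] = per_node.get(b, 0) + msgs
        elif b == me:
            per_node[a] = per_node.get(a, 0) + msgs
    return sorted(per_node, key=lambda n: (-per_node[n], n))

# Hypothetical placement: tasks t1 and t2 on node 0, t3 on node 1, t4 on node 2.
task_node = {"t1": 0, "t2": 0, "t3": 1, "t4": 2}
traffic = {("t1", "t2"): 5, ("t1", "t3"): 20, ("t2", "t4"): 15, ("t3", "t4"): 7}

THRESHOLD_1 = 0.5  # illustrative value only
ratio = non_local_ratio(task_node, traffic, me=0)
if ratio > THRESHOLD_1:
    print(ratio, candidate_targets(task_node, traffic, me=0))
```

Ties in traffic are broken by node id here so that the ordering is deterministic; the source does not specify a tie-breaking rule.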
The method comprises the following steps:
1) The state of a server is one of four states {INIT, LOOKING, PEER, NONEED}. On entering the load balancing algorithm, the server is in the INIT state. The first step is to compute the amount of non-local traffic from the load information and communication information recorded together during load collection on the local node, and to decide whether to initiate load balancing, which happens only if the proportion of non-local traffic exceeds the threshold threshold_1.
2) If not, the state changes to NONEED, indicating that the communication load of the node does not need to be adjusted, but the node may still be selected as the target node by other nodes.
3) Otherwise, the state is changed to LOOKING, a suitable target node is selected, the communication traffic of all calculation tasks is counted, and the communication traffic between the node and other nodes is obtained.
4) In order to select a suitable target node, the other nodes are sorted in descending order of traffic and load collection requests are sent to the nodes in sequence.
5) In order to avoid multiple nodes selecting the same target node, if a node has responded to the load collection requests of other nodes, its state will become PEER, indicating that a node pair has been formed with other nodes, and then it will reject the current load collection request.
6) A node in the LOOKING state sends load collection requests to the target nodes in order until either a target node responds successfully to the request, in which case the state changes to PEER, or the traffic with the current target node falls below the threshold threshold_2, in which case load balancing is considered no longer necessary and the state changes to NONEED.
7) In particular, if a node is in the LOOKING state, it will not respond to load collection requests from other nodes unless the current target of the node happens to be the node that sent the request.
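The steps above can be sketched from the point of view of the node that initiates balancing. The following Python is a minimal sketch, not the patented implementation: the function name run_initiator, the accepts callback (standing in for sending a load collection request and waiting for the reply), and the threshold values are illustrative assumptions. It also assumes that the requesting node itself moves to PEER when a target accepts and to NONEED when every candidate refuses, which the text does not state explicitly.

```python
from enum import Enum

class State(Enum):
    INIT = "INIT"
    LOOKING = "LOOKING"
    PEER = "PEER"
    NONEED = "NONEED"

def run_initiator(non_local_ratio, traffic_by_node, accepts,
                  threshold_1, threshold_2):
    """Return (final_state, chosen_target) for one balancing round.

    traffic_by_node: {node_id: traffic exchanged with that node}
    accepts: callable(node_id) -> bool, True if the target answers the request
    """
    # Steps 1-2: leave INIT for NONEED unless non-local traffic dominates.
    if non_local_ratio <= threshold_1:
        return State.NONEED, None
    # Steps 3-4: now LOOKING; visit candidates in descending order of traffic.
    ordered = sorted(traffic_by_node, key=lambda n: (-traffic_by_node[n], n))
    for node in ordered:
        # Step 6: give up once the remaining traffic is below threshold_2.
        if traffic_by_node[node] < threshold_2:
            return State.NONEED, None
        if accepts(node):
            return State.PEER, node  # a node pair has been formed
    # Assumption: if every candidate refuses, the node stops looking.
    return State.NONEED, None

# Node 1 refuses (already paired, say); node 2 accepts.
state, target = run_initiator(0.875, {1: 20, 2: 15, 3: 2},
                              lambda n: n == 2, threshold_1=0.5, threshold_2=5)
print(state.name, target)  # prints "PEER 2"
```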
The distributed communication load balancing method based on a graph partitioning algorithm balances communication loads. The algorithm runs on every node. Each node first counts the traffic between itself and the other nodes; when the proportion of this traffic in its total traffic exceeds a threshold, balancing starts. The node sorts the other nodes by traffic and, from highest to lowest, selects them one at a time as the target node and sends a probe request. If the target node has not already responded to another node, it responds to the current node and returns the communication information of the computing tasks it hosts. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks of the two nodes according to the communication information of the tasks on both, which reduces the communication load between the two nodes. The method effectively reduces traffic in the system and scales better.
Description of the drawings:
Fig. 1 is a process diagram of the algorithm.
Fig. 2 is a diagram showing a change in server status.
Fig. 3 is a graph comparing the change in speed-up ratio with increasing number of messages compared to NoLB.
Fig. 4 is a comparison graph of the total amount of communication between nodes in the system.
Fig. 5 is a graph comparing execution times of different algorithms.
The specific implementation mode is as follows:
Because a general load balancing algorithm does not consider communication between tasks when balancing computation load, the communication load between some nodes may be high after balancing, and the resulting communication overhead can degrade program performance.
To address the scalability problem of centralized graph partitioning algorithms, a distributed graph partitioning algorithm is proposed here to balance communication loads. The algorithm runs on every node. Each node first counts the traffic between itself and the other nodes; when the proportion of this traffic in its total traffic exceeds a threshold, balancing starts. The node sorts the other nodes by traffic and, from highest to lowest, selects them one at a time as the target node and sends a probe request. If the target node has not already responded to another node, it responds to the current node and returns the communication information of the computing tasks it hosts. After receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks of the two nodes according to the communication information of the tasks on both, reducing the communication load between the two nodes.
The method is implemented on Charm++ 6.0.1. The DistCommLB class inherits from DistBaseLB and implements load information statistics, load information collection, load scheme calculation, and related functions.
The state of a server is one of four states {INIT, LOOKING, PEER, NONEED}. On entering the load balancing algorithm, the server is in the INIT state. The first step is to compute the amount of non-local traffic from the load information and communication information recorded together during load collection on the local node, and to decide whether to initiate load balancing, which happens only if the proportion of non-local traffic exceeds the threshold threshold_1. If not, the state changes to NONEED, indicating that the communication load of the node does not need to be adjusted, although the node may still be selected as the target node by other nodes. Otherwise, the state changes to LOOKING and a suitable target node is selected: the traffic of all computing tasks is counted to obtain the traffic between this node and each other node. To select a suitable target node, the other nodes are sorted in descending order of traffic and load collection requests are sent to them in sequence. To prevent several nodes from selecting the same target node, a node that has already responded to the load collection request of another node changes its state to PEER, indicating that it has formed a node pair, and then rejects the current load collection request. A node in the LOOKING state sends load collection requests to the target nodes in order until either a target node responds successfully to the request, in which case the state changes to PEER, or the traffic with the current target node falls below the threshold threshold_2, in which case load balancing is considered no longer necessary and the state changes to NONEED.
In particular, a node in the LOOKING state does not respond to load collection requests from other nodes unless its current target happens to be the node that sent the request. The change of the server state is shown in Fig. 2.
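The rules a node applies when it receives a load collection request can be condensed into one small decision function. This Python sketch is illustrative only: the function name and the string encoding of states are assumptions, and the behaviour of a node still in INIT is not stated in the text, so accepting in that state is a guess that is marked as such in the code.

```python
def handle_request(state, current_target, sender):
    """Decide how a node answers a load collection request.

    Returns (accept, new_state). `current_target` is the node this node is
    currently probing, or None if it is not probing anyone.
    """
    if state == "PEER":
        # Already paired with another node: reject to avoid a shared target.
        return False, state
    if state == "LOOKING" and current_target != sender:
        # A looking node only answers the node it is itself targeting.
        return False, state
    # NONEED nodes may still be chosen as targets. Accepting in INIT is an
    # assumption; the source does not describe that case.
    return True, "PEER"

print(handle_request("NONEED", None, 3))
print(handle_request("LOOKING", 2, 3))
```

The exception for a LOOKING node whose target is the sender lets two nodes that probe each other at the same time still form a pair instead of both refusing.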
The graph partitioning algorithm adopted in this section is METIS. METIS is a recursive graph partitioning algorithm: it first coarsens the graph into one with a small number of vertices, partitions that graph, and then recursively refines each part to obtain the final partitioning scheme. In this section, the partitioning is performed with the METIS_PartGraphRecursive() function (the PMETIS routine); the main parameters passed in are the computation load of all tasks, the communication load between each pair of tasks, and the tolerance for imbalance.
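To make the role of the three inputs concrete (task computation loads, pairwise communication loads, imbalance tolerance), the following Python sketch repartitions the tasks of two nodes by exhaustive search. It is a stand-in for illustration and does not call METIS or reproduce its multilevel algorithm; exhaustive search is only feasible for a handful of tasks. The function name, the meaning given to tolerance (maximum part load as a multiple of the ideal half), and the example data are assumptions.

```python
from itertools import combinations

def repartition_pair(load, comm, tolerance=1.1):
    """Find the 2-way split of tasks with minimum cut within the tolerance.

    load: {task: computation load}
    comm: {(task_a, task_b): communication load}
    Returns (cut, left_tasks, right_tasks), or None if no split is feasible.
    """
    tasks = sorted(load)
    total = sum(load.values())
    limit = tolerance * total / 2  # heaviest part may not exceed this
    best = None
    for size in range(1, len(tasks)):
        for chosen in combinations(tasks, size):
            left = set(chosen)
            if tasks[0] not in left:
                continue  # skip mirror images of splits already examined
            left_load = sum(load[t] for t in left)
            if max(left_load, total - left_load) > limit:
                continue
            cut = sum(w for (a, b), w in comm.items()
                      if (a in left) != (b in left))
            if best is None or cut < best[0]:
                best = (cut, left, set(tasks) - left)
    return best

# Tasks t1, t2 start on one node and t3, t4 on the other (cut = 35).
load = {"t1": 1.0, "t2": 1.0, "t3": 1.0, "t4": 1.0}
comm = {("t1", "t3"): 20, ("t2", "t4"): 15, ("t1", "t2"): 1, ("t3", "t4"): 1}
cut, left, right = repartition_pair(load, comm)
print(cut, sorted(left), sorted(right))
```

In this example, pairing t1 with t3 and t2 with t4 moves the two heavy edges inside the nodes and leaves only the two light edges crossing between them.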
The test program in this section is derived from the kNeighbor program written in Charm++, a benchmark with a neighbor communication pattern. In each iteration of the benchmark, each object exchanges fixed-size messages with a fixed-size group of other objects. Each object is assigned a random computation load. In the test program, communication between the computing objects is the most critical factor affecting execution time.
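A neighbor communication pattern of this kind can be generated synthetically. The Python sketch below is not the Charm++ benchmark itself: it assumes the objects sit on a ring and that each object sends to the k objects that follow it, which may differ from how the real program chooses neighbors. The function name, parameters, and seed are hypothetical.

```python
import random

def k_neighbor_traffic(num_objects, k, msg_bytes, seed=0):
    """Build per-iteration traffic and random loads for objects on a ring.

    Returns (traffic, loads): traffic maps (sender, receiver) to bytes per
    iteration, loads maps each object to a random computation load in [0, 1).
    """
    if not 0 < k < num_objects:
        raise ValueError("k must be between 1 and num_objects - 1")
    rng = random.Random(seed)
    traffic = {}
    for i in range(num_objects):
        for step in range(1, k + 1):
            # Each object sends one fixed-size message to its next k neighbors.
            traffic[(i, (i + step) % num_objects)] = msg_bytes
    loads = {i: rng.random() for i in range(num_objects)}
    return traffic, loads

traffic, loads = k_neighbor_traffic(num_objects=8, k=2, msg_bytes=1024)
print(len(traffic), len(loads))
```

Every object both sends k messages and receives k messages per iteration, so the traffic graph is regular and the message size is the only knob that scales the communication cost.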
In experimental comparison, the following algorithms are compared:
1) CommLB: a greedy centralized algorithm that, for each task, places the k tasks communicating with it on the same node according to their weights, reducing communication overhead between nodes.
2) MetisLB: a centralized balancing algorithm that uses the METIS graph partitioning algorithm.
3) ZoltanLB: a centralized balancing algorithm that uses the Zoltan graph partitioning algorithm.
4) GreedyLB: a centralized computation load balancing algorithm that does not take the distribution of communication load into account.
First, this section compares the execution time of the kNeighbor test program as the size of the messages sent by each computing task increases, using the execution time without load balancing as the reference. As shown in Fig. 3, the execution time of the kNeighbor test program comes mainly from communication between computing tasks, so it increases with the message size. Centralized balancing algorithms such as MetisLB, ZoltanLB, and CommLB take the communication overhead between nodes into account and effectively reduce the number of messages between nodes, so the program runs faster than without load balancing. GreedyLB, on the other hand, does not consider communication between computing tasks; it brings no significant improvement over no balancing and can even be slower because of the load balancing overhead. The DistCommLB proposed in this section is not as effective as the centralized graph partitioning algorithms, but it is comparable to the greedy CommLB algorithm and also reduces traffic in the system relatively effectively.
In addition, this section measures the total amount of network communication in the whole system, again relative to the case without balancing; the result is shown in Fig. 4. For GreedyLB, which does not consider communication load, load balancing may produce more traffic than no balancing, but the proportion of this extra traffic decreases as the size of the messages sent by the computing tasks increases. Although DistCommLB does not match the centralized graph partitioning algorithms, it still effectively reduces traffic between nodes compared with performing no balancing.
Finally, to demonstrate the scalability of DistCommLB, this section compares how the execution time of the different algorithms changes as the number of computing tasks in the system increases. The results are shown in Fig. 5, in seconds. The execution time of the centralized algorithms increases significantly as the number of computing tasks grows, and the centralized algorithms based on graph partitioning take especially long. DistCommLB is a distributed algorithm, so each run of the graph partitioning algorithm involves only two nodes. Although the total number of tasks on the two nodes grows with the number of tasks and therefore costs more time, the execution time of DistCommLB is clearly better than that of the centralized algorithms, so it scales better.

Claims (2)

1. A distributed communication load balancing method based on a graph partitioning algorithm, characterized in that the algorithm runs on every node; each node first counts the traffic between itself and the other nodes, and when the proportion of this traffic in its total traffic exceeds a threshold, balancing starts; the node sorts the other nodes by traffic and, from highest to lowest, selects them one at a time as the target node and sends a probe request; if the target node has not already responded to another node, it responds to the current node and returns the communication information of the computing tasks it hosts; after receiving this data, the current node uses a graph partitioning algorithm to redistribute the tasks of the two nodes according to the communication information of the tasks on both, reducing the communication load between the two nodes.
2. The method of claim 1, comprising the steps of:
1) the state of a server is one of four states {INIT, LOOKING, PEER, NONEED}; on entering the load balancing algorithm, the server is in the INIT state; the first step is to compute the amount of non-local traffic from the load information and communication information recorded together during load collection on the local node, and to decide whether to initiate load balancing, which happens only if the proportion of non-local traffic exceeds the threshold threshold_1;
2) if not, the state is changed to NONEED, which indicates that the communication load of the node is not required to be adjusted, but the node is still possibly selected as a target node by other nodes;
3) otherwise, the state is changed to LOOKING, a proper target node is selected, the communication traffic of all calculation tasks is counted, and the communication traffic between the node and other nodes is obtained;
4) in order to select a proper target node, sequencing other nodes according to the descending order of the communication traffic, and sequentially sending load collection requests to each node;
5) to prevent several nodes from selecting the same target node, a node that has already responded to the load collection request of another node changes its state to PEER, indicating that it has formed a node pair, and then rejects the current load collection request;
6) a node in the LOOKING state sends load collection requests to the target nodes in order until either a target node responds successfully to the request, in which case the state changes to PEER, or the traffic with the current target node falls below the threshold threshold_2, in which case load balancing is considered no longer necessary and the state changes to NONEED;
7) in particular, if a node is in the LOOKING state, it will not respond to other nodes' load collection requests unless the node is currently targeted to the node that sent the request.
CN202110638133.1A 2021-06-08 2021-06-08 Distributed communication load balancing method based on graph partitioning algorithm Pending CN114595052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110638133.1A CN114595052A (en) 2021-06-08 2021-06-08 Distributed communication load balancing method based on graph partitioning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110638133.1A CN114595052A (en) 2021-06-08 2021-06-08 Distributed communication load balancing method based on graph partitioning algorithm

Publications (1)

Publication Number Publication Date
CN114595052A true CN114595052A (en) 2022-06-07

Family

ID=81803869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110638133.1A Pending CN114595052A (en) 2021-06-08 2021-06-08 Distributed communication load balancing method based on graph partitioning algorithm

Country Status (1)

Country Link
CN (1) CN114595052A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115016947A (en) * 2022-08-05 2022-09-06 中国空气动力研究与发展中心计算空气动力研究所 Load distribution method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN103649916B (en) The distribution of virtual machine in the data center
CN110445866B (en) Task migration and cooperative load balancing method in mobile edge computing environment
Xu et al. Cnn partitioning and offloading for vehicular edge networks in web3
Mayer et al. Graph: Heterogeneity-aware graph computation with adaptive partitioning
CN110798517A (en) Decentralized cluster load balancing method and system, mobile terminal and storage medium
CN108089918B (en) Graph computation load balancing method for heterogeneous server structure
CN112148492A (en) Service deployment and resource allocation method considering multi-user mobility
CN113033800A (en) Distributed deep learning method and device, parameter server and main working node
Yapicioglu et al. A traffic-aware virtual machine placement method for cloud data centers
Chai et al. A parallel placement approach for service function chain using deep reinforcement learning
CN111324429B (en) Micro-service combination scheduling method based on multi-generation ancestry reference distance
Fernández et al. Distributed slicing in dynamic systems
CN114595052A (en) Distributed communication load balancing method based on graph partitioning algorithm
US20130067113A1 (en) Method of optimizing routing in a cluster comprising static communication links and computer program implementing that method
CN105635285B (en) A kind of VM migration scheduling method based on state aware
Fan et al. Node essentiality assessment and distributed collaborative virtual network embedding in datacenters
CN113806017A (en) Method, system, equipment and storage medium for initial deployment of virtual machine in cloud computing
CN109981794B (en) Processing method and device based on block chain node point network and electronic equipment
CN105138391B (en) The multitasking virtual machine distribution method of cloud system justice is distributed towards wide area
CN108055321B (en) High-reliability cluster construction method based on localization platform
Ding et al. A task scheduling algorithm for heterogeneous systems using aco
Taheri et al. Achieving Performability and Reliability of Data Storage in the Internet of Things
Wang et al. Virtual network embedding with pre‐transformation and incentive convergence mechanism
US9203733B2 (en) Method of pseudo-dynamic routing in a cluster comprising static communication links and computer program implementing that method
Khodayarseresht et al. A multi-objective cloud energy optimizer algorithm for federated environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination