CN114217944A - Dynamic load balancing method for neural network aiming at model parallelism - Google Patents

Dynamic load balancing method for neural network aiming at model parallelism

Info

Publication number
CN114217944A
CN114217944A
Authority
CN
China
Prior art keywords
node
operator
model
current
training
Prior art date
Legal status
Pending
Application number
CN202110453555.1A
Other languages
Chinese (zh)
Inventor
漆锋滨
刘鑫
高捷
陈德训
刘沙
彭超
黄则强
王宜鹏
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN202110453555.1A
Publication of CN114217944A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic load balancing method for model-parallel neural networks, which produces a partitioning strategy from the parameters of a given model and system and then iteratively updates it during training. The invention can automatically derive a good partitioning strategy from the parameters of different models and systems, requires no manual adjustment of the model, keeps the load balanced across computing nodes, and greatly improves optimization efficiency.

Description

Dynamic load balancing method for neural network aiming at model parallelism
Technical Field
The invention relates to a dynamic load balancing method for model-parallel neural networks and belongs to the technical field of deep learning model parallelism.
Background
Current distributed scaling approaches for deep learning mainly comprise data parallelism, model parallelism, and hybrid parallelism.
Although data parallelism is the most widely used, when the model parameters are too large for a single node to hold, model parallelism must be prioritized. In model-parallel distributed training, a single model is split across different nodes; by granularity, this means either assigning different network layers to corresponding nodes or splitting the parameters of the same layer across different nodes, with intermediate outputs transmitted between nodes accordingly.
For a linear model, the model parameters corresponding to different data dimensions can be placed on different nodes. For a highly nonlinear neural network, however, no working node can update its share of the parameters independently; it must cooperate with the other working nodes, so a suitable partitioning method must be found to minimize the cost.
Current mainstream model-parallel partitioning methods are coarse-grained: the model is divided across the nodes only along network-layer boundaries, which often fails to keep the load on the nodes balanced. In addition, because model structures and system architectures differ, applying model parallelism usually requires an algorithm engineer to determine the partition manually, which means repeated tuning to find the lowest-cost, load-balanced split. Each new model or system must be mastered anew, which greatly reduces optimization efficiency.
Disclosure of Invention
The invention aims to provide a dynamic load balancing method for model-parallel neural networks, so as to solve the problem of partitioning a neural network model for model parallelism.
In order to achieve this aim, the invention adopts the following technical scheme: a dynamic load balancing method for model-parallel neural networks, which produces a partitioning strategy from the parameters of a given model and system and then iteratively updates it during training;
the partitioning strategy for the model network is derived from the model and system parameters through the following specific steps:
S1, construct a cost model based on the model type, parameter count, network cluster topology bandwidth, and node count; the cost model is used to estimate the computation time required by each operator's input, output, and operation, and the communication time between adjacent operators;
S2, allocate the operators to be computed to the nodes according to the cost model obtained in S1, in the following specific steps:
S21, the cost model simulates the state of all available computing nodes in the current system, then traverses the whole computation graph in order, obtaining for each operator at least one available node on which the current operator can be completed, to serve as its computing node;
S22, for an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node and selects the node expected to finish the current operator soonest as the mapped computing node;
S23, repeat S22, continuing to allocate computing nodes to the remaining operators until every operator in the computation graph has been assigned a computing node;
the iterative updating during training comprises the following specific steps:
S3, before training, assign each computing node a weight parameter representing its allocated load; the larger the weight, the more load is allocated, and the weights of all nodes are initially equal;
S4, in each round of training, first derive the partitioning strategy for all computing nodes through the cost model according to the node weights obtained in the previous step, start training, and record the waiting time of each computing node after its computation finishes;
S5, after one round of training, judge from the maximum and average waiting times obtained in S4 whether the current load balance is optimal; if so, keep the current partitioning strategy and continue training; if not, adjust the node weights in proportion to the waiting times so as to change the load each computing node should receive, then recompute the partitioning strategy through the cost model and run the next round of training;
and S6, repeat S4-S5 until the current partitioning strategy no longer changes over several rounds of training, which shows that the strategy is dynamically optimal for this training run.
Further improvements of the above technical scheme are as follows:
1. In the above scheme, in S21, for each traversed operator its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator.
2. In the above scheme, in S22, during evaluation the greedy heuristic considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication time needed to transmit the operator's inputs from other nodes if the operator is placed on the current node.
Owing to the application of the above technical scheme, the invention has the following advantages over the prior art:
The invention realizes general dynamic load balancing for parallel neural network models: when model parallelism is applied, a good partitioning strategy is derived automatically from the parameters of the model and system, the model needs no manual adjustment, the load across computing nodes stays balanced, and optimization efficiency is greatly improved.
Drawings
FIG. 1 is a schematic diagram of a model parallel method in the prior art.
FIG. 2 is a schematic diagram of the dynamic load balancing method for model parallelism according to the present invention.
Detailed Description
Embodiment: the invention provides a dynamic load balancing method for model-parallel neural networks, which produces a partitioning strategy from the parameters of a given model and system and then iteratively updates it during training;
the partitioning strategy for the model network is derived from the model and system parameters through the following specific steps:
S1, construct a cost model based on the model type, parameter count, network cluster topology bandwidth, and node count; the cost model is used to estimate the computation time required by each operator's input, output, and operation, and the communication time between adjacent operators;
S2, allocate the operators to be computed to the nodes according to the cost model obtained in S1, in the following specific steps:
S21, the cost model simulates the state of all available computing nodes in the current system, then traverses the whole computation graph in order, obtaining for each operator at least one available node on which the current operator can be completed, to serve as its computing node;
S22, for an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node and selects the node expected to finish the current operator soonest as the mapped computing node;
S23, repeat S22, continuing to allocate computing nodes to the remaining operators until every operator in the computation graph has been assigned a computing node;
the iterative updating during training comprises the following specific steps:
S3, before training, assign each computing node a weight parameter representing its allocated load; the larger the weight, the more load is allocated, and the weights of all nodes are initially equal;
S4, in each round of training, first derive the partitioning strategy for all computing nodes (i.e., the allocation of operators in the computation graph, S2) through the cost model according to the node weights obtained in the previous step, start training, and record the waiting time of each computing node after its computation finishes;
S5, after one round of training, judge from the maximum and average waiting times obtained in S4 whether the current load balance is optimal; if so, keep the current partitioning strategy and continue training; if not, adjust the node weights in proportion to the waiting times so as to change the load each computing node should receive, then recompute the partitioning strategy through the cost model and run the next round of training;
and S6, repeat S4-S5 until the current partitioning strategy no longer changes over several rounds of training, which shows that the strategy is dynamically optimal for this training run.
In S21, for each traversed operator its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator.
In S22, during evaluation the greedy heuristic considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication time needed to transmit the operator's inputs from other nodes if the operator is placed on the current node.
The above embodiment is further explained as follows:
According to the invention, a better model partitioning strategy can be derived automatically during training from the waiting time of each node after its computation completes, combined with parameters such as the network model type, parameter volume, available system memory, and node count, so that node waiting time decreases in the next round of training until the load on all nodes is balanced.
The invention provides dynamic load balancing based on model parallelism. First, a cost model is constructed from information such as the model type, parameter count, network cluster topology bandwidth, and node count, and time is modeled from the computation overhead and the memory and communication overheads. Operators are then allocated to the nodes by a greedy heuristic, the approximate training time of the model on the different nodes of the system is computed under different partitioning strategies, and the degree of load balance of the current strategy is evaluated.
Through the cost model, the computation time required by each operator's input, output, and operation can be read; the whole computation graph is then walked, and finally the greedy heuristic allocates the operators to the nodes. This allocation can serve as the final allocation scheme in actual execution.
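As an illustration only, the following Python sketch shows one way the cost model described above could be organized. The class names, fields, and the throughput- and bandwidth-based timing formulas are assumptions made for exposition; the patent does not prescribe a concrete implementation.

```python
# Illustrative sketch of the cost model; all names and formulas are assumptions.
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    flops: float                 # estimated floating-point operations
    input_bytes: float           # total size of the operator's inputs
    output_bytes: float          # size of the output it produces
    inputs: list = field(default_factory=list)   # upstream Operator objects

@dataclass
class Node:
    name: str
    flops_per_sec: float         # sustained compute throughput
    mem_bytes_per_sec: float     # local memory bandwidth
    busy_until: float = 0.0      # simulated time at which the node is free
    kernels: set = field(default_factory=set)    # operator names it implements

class CostModel:
    """Estimates per-operator compute time and inter-node communication time."""

    def __init__(self, link_bytes_per_sec: float):
        self.link_bw = link_bytes_per_sec        # cluster topology bandwidth

    def compute_time(self, op: Operator, node: Node) -> float:
        # operation time plus time to read inputs and write outputs locally
        arith = op.flops / node.flops_per_sec
        memory = (op.input_bytes + op.output_bytes) / node.mem_bytes_per_sec
        return arith + memory

    def comm_time(self, nbytes: float) -> float:
        # transfer time between adjacent operators placed on different nodes
        return nbytes / self.link_bw
```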
The cost model simulates the state of all available computing nodes in the current system and then traverses the whole computation graph in order. For each traversed operator, its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator. For an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node, and selects the node expected to finish the current operator soonest as the mapped computing node.
In this process, the algorithm considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication cost of transmitting the operator's inputs from other nodes if it is placed on the current node. The mapping process is then repeated for the remaining operators until every operator in the graph has been assigned a computing node.
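Continuing the sketch above, a greedy placement pass along the lines of S21-S23 might look as follows. The topologically sorted traversal and the simple ready-time bookkeeping are assumptions; the description only requires that, counting both queued work and input transfers, the node expected to finish the operator soonest is chosen.

```python
def place_operators(graph: list, nodes: list, cost: CostModel) -> dict:
    """Greedily map each Operator in `graph` (topologically sorted) to a Node."""
    placement = {}    # operator name -> Node
    finish = {}       # operator name -> simulated completion time
    for op in graph:
        # S21: a node without a kernel for this operator is unavailable
        available = [n for n in nodes if op.name in n.kernels]
        if not available:
            raise RuntimeError(f"no node implements operator {op.name}")
        best_node, best_done = None, float("inf")
        for n in available:
            # communication time for inputs produced on other nodes
            comm = sum(cost.comm_time(p.output_bytes)
                       for p in op.inputs if placement[p.name] is not n)
            # inputs must be finished, and the node must have drained its queue
            ready = max([finish[p.name] for p in op.inputs] or [0.0])
            start = max(ready + comm, n.busy_until)
            done = start + cost.compute_time(op, n)
            if done < best_done:
                best_node, best_done = n, done
        # S22: pick the node expected to finish this operator soonest
        placement[op.name] = best_node
        best_node.busy_until = best_done
        finish[op.name] = best_done
        # S23: the loop continues until every operator is assigned
    return placement
```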
Before training, each node is assigned a weight parameter representing its allocated load; the larger the weight, the more load it receives, and all node weights are initially equal.
In each round of training, the partitioning strategy for the nodes is first derived from the current node weights through the cost model and training starts; after each node's computation completes, its waiting time is recorded.
After one round of training, whether the current load balance is optimal is judged from the maximum and average waiting times of the nodes. If so, the current partitioning strategy is kept and training continues; if not, the node weights are adjusted in proportion to the waiting times, changing the load allocated to each node, after which the partitioning strategy is recomputed through the cost model and the next round of training is executed.
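The following sketch illustrates this per-round adjustment, under the assumption (consistent with the description, though not spelled out in it) that in synchronous training a node with a longer waiting time finished early, was therefore underloaded, and should receive more load in the next round. The tolerance threshold and the normalization are illustrative choices.

```python
def update_weights(weights: dict, wait: dict, tolerance: float = 0.05):
    """One round of the S5 adjustment; thresholds are illustrative assumptions.

    weights: node name -> current load weight (all equal before round 1, S3)
    wait:    node name -> waiting time measured in this round (S4)
    Returns (new_weights, balanced).
    """
    avg = sum(wait.values()) / len(wait)
    # S5: treat the load as balanced when the maximum wait is near the average
    if avg == 0 or max(wait.values()) <= (1 + tolerance) * avg:
        return dict(weights), True
    # A node that waited longer finished earlier and was underloaded, so its
    # weight grows in proportion to its waiting time; renormalize afterwards
    # so the average weight stays at 1.
    raw = {n: w * (1 + wait[n] / avg) for n, w in weights.items()}
    scale = len(raw) / sum(raw.values())
    return {n: w * scale for n, w in raw.items()}, False
```

In a full training loop, update_weights would run after every round, and per S6 the adjustment stops once the partitioning strategy produced by the cost model stays unchanged for several consecutive rounds.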
With this dynamic load balancing method for model-parallel neural networks, general dynamic load balancing for parallel neural network models is realized: when model parallelism is applied, a good partitioning strategy is derived automatically from the parameters of the model and system, the model needs no manual adjustment, the load across computing nodes stays balanced, and optimization efficiency is greatly improved.
To facilitate a better understanding of the invention, the terms used herein are briefly explained as follows:
Model parallelism: different computing nodes are responsible for different parts of the network model and jointly train the same batch of data; intermediate data produced during computation must be transmitted between the computing nodes.
Load balancing: keeping the amount of computation on each node balanced according to parameters such as the model's computation cost and computation time and the system's node count.
batch_size: the number of samples selected for one training pass of a deep learning model.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (3)

1. A dynamic load balancing method for model-parallel neural networks, characterized in that: a partitioning strategy is produced from the parameters of a given model and system and then iteratively updated during training;
the partitioning strategy for the model network is derived from the model and system parameters through the following specific steps:
S1, construct a cost model based on the model type, parameter count, network cluster topology bandwidth, and node count; the cost model is used to estimate the computation time required by each operator's input, output, and operation, and the communication time between adjacent operators;
S2, allocate the operators to be computed to the nodes according to the cost model obtained in S1, in the following specific steps:
S21, the cost model simulates the state of all available computing nodes in the current system, then traverses the whole computation graph in order, obtaining for each operator at least one available node on which the current operator can be completed, to serve as its computing node;
S22, for an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node and selects the node expected to finish the current operator soonest as the mapped computing node;
S23, repeat S22, continuing to allocate computing nodes to the remaining operators until every operator in the computation graph has been assigned a computing node;
the iterative updating during training comprises the following specific steps:
S3, before training, assign each computing node a weight parameter representing its allocated load; the larger the weight, the more load is allocated, and the weights of all nodes are initially equal;
S4, in each round of training, first derive the partitioning strategy for all computing nodes through the cost model according to the node weights obtained in the previous step, start training, and record the waiting time of each computing node after its computation finishes;
S5, after one round of training, judge from the maximum and average waiting times obtained in S4 whether the current load balance is optimal; if so, keep the current partitioning strategy and continue training; if not, adjust the node weights in proportion to the waiting times so as to change the load each computing node should receive, then recompute the partitioning strategy through the cost model and run the next round of training;
and S6, repeat S4-S5 until the current partitioning strategy no longer changes over several rounds of training, which shows that the strategy is dynamically optimal for this training run.
2. The dynamic load balancing method for model-parallel neural networks according to claim 1, characterized in that: in S21, for each traversed operator its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator.
3. The dynamic load balancing method for model-parallel neural networks according to claim 1, characterized in that: in S22, during evaluation the greedy heuristic considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication time needed to transmit the operator's inputs from other nodes if the operator is placed on the current node.
CN202110453555.1A 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism Pending CN114217944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110453555.1A CN114217944A (en) 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110453555.1A CN114217944A (en) 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism

Publications (1)

Publication Number Publication Date
CN114217944A 2022-03-22

Family

ID=80695843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110453555.1A Pending CN114217944A (en) 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism

Country Status (1)

Country Link
CN (1) CN114217944A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954721A (en) * 2023-09-20 2023-10-27 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multi-modal operator of actuator
CN116954721B (en) * 2023-09-20 2023-12-15 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multi-modal operator of actuator
CN117032938A (en) * 2023-10-08 2023-11-10 北京燧原智能科技有限公司 Operator parallel scheduling method and device, electronic equipment and storage medium
CN117032938B (en) * 2023-10-08 2024-01-09 北京燧原智能科技有限公司 Operator parallel scheduling method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109491790B (en) Container-based industrial Internet of things edge computing resource allocation method and system
Gai et al. Fusion of cognitive wireless networks and edge computing
CN109388484B (en) Multi-resource cloud job scheduling method based on Deep Q-network algorithm
CN111064633B (en) Cloud-edge cooperative power information communication equipment automated testing resource allocation method
CN102724220B (en) Method and apparatus for task cooperation, and system for internet of things
CN111106999A (en) IP-optical network communication service joint distribution method and device
CN109118097B (en) Reliability maintainability guarantee assessment method and device
CN112541584B (en) Deep neural network model parallel mode selection method
CN114217944A (en) Dynamic load balancing method for neural network aiming at model parallelism
CN113821332B (en) Method, device, equipment and medium for optimizing efficiency of automatic machine learning system
CN110198280A (en) A kind of SDN link allocation method based on BP neural network
CN113806018A (en) Kubernetes cluster resource hybrid scheduling method based on neural network and distributed cache
CN106325976A (en) Rendering task scheduling processing method and server
CN108055701A (en) A kind of resource regulating method and base station
CN113179175A (en) Real-time bandwidth prediction method and device for power communication network service
CN102745192B (en) Task allocation system for distributed control system of hybrid vehicle
CN115543626A (en) Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling
CN115421885A (en) Distributed multi-target cloud task scheduling method and device and cloud service system
CN109978241B (en) Method and device for determining charging load of electric automobile
CN112417748B (en) Method, system, equipment and medium for scheduling automatic driving simulation task
CN114629767A (en) Power dispatching network simulation method and device, computer equipment and storage medium
CN113205128A (en) Distributed deep learning performance guarantee method based on serverless computing
CN102774376B (en) Task allocation method of distributed control system of hybrid power vehicle
CN117557016A (en) Whole vehicle manufacturing stamping resource scheduling method based on deep reinforcement learning
CN112213956A (en) Automatic driving simulation task scheduling method, device, equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination