CN114217944A - Dynamic load balancing method for neural network aiming at model parallelism - Google Patents

Dynamic load balancing method for neural network aiming at model parallelism

Info

Publication number
CN114217944A
CN114217944A
Authority
CN
China
Prior art keywords
node
operator
model
current
training
Prior art date
Legal status
Pending
Application number
CN202110453555.1A
Other languages
Chinese (zh)
Inventor
漆锋滨
刘鑫
高捷
陈德训
刘沙
彭超
黄则强
王宜鹏
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN202110453555.1A
Publication of CN114217944A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic load balancing method for model-parallel neural networks, which produces a partitioning strategy from the parameters of a given model and system and then iteratively updates it during training. The invention can automatically derive a good partitioning strategy from the parameters of different models and systems, requires no manual adjustment of the model, keeps the load balanced across computing nodes, and greatly improves optimization efficiency.

Description

Dynamic load balancing method for neural network aiming at model parallelism
Technical Field
The invention relates to a dynamic load balancing method for model-parallel neural networks and belongs to the technical field of deep learning model parallelism.
Background
Current distributed scaling approaches for deep learning mainly comprise data parallelism, model parallelism, and hybrid parallelism.
Although data parallelism is the most widely used, when the model parameters are too large for a single node to hold, model parallelism must be prioritized. In model-parallel distributed training, a single model is split across different nodes; by granularity, this means either assigning different network layers to corresponding nodes or splitting the parameters of the same layer across different nodes, with intermediate outputs transmitted between nodes accordingly.
For a linear model, the model parameters corresponding to different data dimensions can be placed on different nodes. For a highly nonlinear neural network, however, no working node can update its share of the parameters independently; it must cooperate with the other working nodes, so a suitable partitioning method must be found to minimize the cost.
Current mainstream model-parallel partitioning methods are coarse-grained: the model is divided across the nodes only along network-layer boundaries, which often fails to keep the load on the nodes balanced. In addition, because model structures and system architectures differ, applying model parallelism usually requires an algorithm engineer to determine the partition manually, which means repeated tuning to find the lowest-cost, load-balanced split. Each new model or system must be mastered anew, which greatly reduces optimization efficiency.
Disclosure of Invention
The invention aims to provide a dynamic load balancing method for model-parallel neural networks, so as to solve the problem of partitioning a neural network model for model parallelism.
In order to achieve this aim, the invention adopts the following technical scheme: a dynamic load balancing method for model-parallel neural networks, which produces a partitioning strategy from the parameters of a given model and system and then iteratively updates it during training;
the partitioning strategy for the model network is derived from the model and system parameters through the following specific steps:
S1, construct a cost model based on the model type, parameter count, network cluster topology bandwidth, and node count; the cost model is used to estimate the computation time required by each operator's input, output, and operation, and the communication time between adjacent operators;
S2, allocate the operators to be computed to the nodes according to the cost model obtained in S1, in the following specific steps:
S21, the cost model simulates the state of all available computing nodes in the current system, then traverses the whole computation graph in order, obtaining for each operator at least one available node on which the current operator can be completed, to serve as its computing node;
S22, for an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node and selects the node expected to finish the current operator soonest as the mapped computing node;
S23, repeat S22, continuing to allocate computing nodes to the remaining operators until every operator in the computation graph has been assigned a computing node;
the iterative updating during training comprises the following specific steps:
S3, before training, assign each computing node a weight parameter representing its allocated load; the larger the weight, the more load is allocated, and the weights of all nodes are initially equal;
S4, in each round of training, first derive the partitioning strategy for all computing nodes through the cost model according to the node weights obtained in the previous step, start training, and record the waiting time of each computing node after its computation finishes;
S5, after one round of training, judge from the maximum and average waiting times obtained in S4 whether the current load balance is optimal; if so, keep the current partitioning strategy and continue training; if not, adjust the node weights in proportion to the waiting times so as to change the load each computing node should receive, then recompute the partitioning strategy through the cost model and run the next round of training;
and S6, repeat S4-S5 until the current partitioning strategy no longer changes over several rounds of training, which shows that the strategy is dynamically optimal for this training run.
Further improvements of the above technical scheme are as follows:
1. In the above scheme, in S21, for each traversed operator its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator.
2. In the above scheme, in S22, during evaluation the greedy heuristic considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication time needed to transmit the operator's inputs from other nodes if the operator is placed on the current node.
Owing to the application of the above technical scheme, the invention has the following advantages over the prior art:
The invention realizes general dynamic load balancing for parallel neural network models: when model parallelism is applied, a good partitioning strategy is derived automatically from the parameters of the model and system, the model needs no manual adjustment, the load across computing nodes stays balanced, and optimization efficiency is greatly improved.
Drawings
FIG. 1 is a schematic diagram of a model parallel method in the prior art.
FIG. 2 is a schematic diagram of the dynamic load balancing method for model parallelism according to the present invention.
Detailed Description
Embodiment: the invention provides a dynamic load balancing method for model-parallel neural networks, which produces a partitioning strategy from the parameters of a given model and system and then iteratively updates it during training;
the partitioning strategy for the model network is derived from the model and system parameters through the following specific steps:
S1, construct a cost model based on the model type, parameter count, network cluster topology bandwidth, and node count; the cost model is used to estimate the computation time required by each operator's input, output, and operation, and the communication time between adjacent operators;
S2, allocate the operators to be computed to the nodes according to the cost model obtained in S1, in the following specific steps:
S21, the cost model simulates the state of all available computing nodes in the current system, then traverses the whole computation graph in order, obtaining for each operator at least one available node on which the current operator can be completed, to serve as its computing node;
S22, for an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node and selects the node expected to finish the current operator soonest as the mapped computing node;
S23, repeat S22, continuing to allocate computing nodes to the remaining operators until every operator in the computation graph has been assigned a computing node;
the iterative updating during training comprises the following specific steps:
S3, before training, assign each computing node a weight parameter representing its allocated load; the larger the weight, the more load is allocated, and the weights of all nodes are initially equal;
S4, in each round of training, first derive the partitioning strategy for all computing nodes (i.e., the allocation of operators in the computation graph, S2) through the cost model according to the node weights obtained in the previous step, start training, and record the waiting time of each computing node after its computation finishes;
S5, after one round of training, judge from the maximum and average waiting times obtained in S4 whether the current load balance is optimal; if so, keep the current partitioning strategy and continue training; if not, adjust the node weights in proportion to the waiting times so as to change the load each computing node should receive, then recompute the partitioning strategy through the cost model and run the next round of training;
and S6, repeat S4-S5 until the current partitioning strategy no longer changes over several rounds of training, which shows that the strategy is dynamically optimal for this training run.
In S21, for each traversed operator its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator.
In S22, during evaluation the greedy heuristic considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication time needed to transmit the operator's inputs from other nodes if the operator is placed on the current node.
The above embodiment is further explained as follows:
According to the invention, a better model partitioning strategy can be derived automatically during training from the waiting time of each node after its computation completes, combined with parameters such as the network model type, parameter volume, available system memory, and node count, so that node waiting time decreases in the next round of training until the load on all nodes is balanced.
The invention provides dynamic load balancing based on model parallelism. First, a cost model is constructed from information such as the model type, parameter count, network cluster topology bandwidth, and node count, and time is modeled from the computation overhead and the memory and communication overheads. Operators are then allocated to the nodes by a greedy heuristic, the approximate training time of the model on the different nodes of the system is computed under different partitioning strategies, and the degree of load balance of the current strategy is evaluated.
Through the cost model, the computation time required by each operator's input, output, and operation can be read; the whole computation graph is then walked, and finally the greedy heuristic allocates the operators to the nodes. This allocation can serve as the final allocation scheme in actual execution.
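As an illustration only, the following Python sketch shows one way the cost model described above could be organized. The class names, fields, and the throughput- and bandwidth-based timing formulas are assumptions made for exposition; the patent does not prescribe a concrete implementation.

```python
# Illustrative sketch of the cost model; all names and formulas are assumptions.
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    flops: float                 # estimated floating-point operations
    input_bytes: float           # total size of the operator's inputs
    output_bytes: float          # size of the output it produces
    inputs: list = field(default_factory=list)   # upstream Operator objects

@dataclass
class Node:
    name: str
    flops_per_sec: float         # sustained compute throughput
    mem_bytes_per_sec: float     # local memory bandwidth
    busy_until: float = 0.0      # simulated time at which the node is free
    kernels: set = field(default_factory=set)    # operator names it implements

class CostModel:
    """Estimates per-operator compute time and inter-node communication time."""

    def __init__(self, link_bytes_per_sec: float):
        self.link_bw = link_bytes_per_sec        # cluster topology bandwidth

    def compute_time(self, op: Operator, node: Node) -> float:
        # operation time plus time to read inputs and write outputs locally
        arith = op.flops / node.flops_per_sec
        memory = (op.input_bytes + op.output_bytes) / node.mem_bytes_per_sec
        return arith + memory

    def comm_time(self, nbytes: float) -> float:
        # transfer time between adjacent operators placed on different nodes
        return nbytes / self.link_bw
```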
The cost model simulates the state of all available computing nodes in the current system and then traverses the whole computation graph in order. For each traversed operator, its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator. For an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node, and selects the node expected to finish the current operator soonest as the mapped computing node.
In this process, the algorithm considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication cost of transmitting the operator's inputs from other nodes if it is placed on the current node. The mapping process is then repeated for the remaining operators until every operator in the graph has been assigned a computing node.
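Continuing the sketch above, a greedy placement pass along the lines of S21-S23 might look as follows. The topologically sorted traversal and the simple ready-time bookkeeping are assumptions; the description only requires that, counting both queued work and input transfers, the node expected to finish the operator soonest is chosen.

```python
def place_operators(graph: list, nodes: list, cost: CostModel) -> dict:
    """Greedily map each Operator in `graph` (topologically sorted) to a Node."""
    placement = {}    # operator name -> Node
    finish = {}       # operator name -> simulated completion time
    for op in graph:
        # S21: a node without a kernel for this operator is unavailable
        available = [n for n in nodes if op.name in n.kernels]
        if not available:
            raise RuntimeError(f"no node implements operator {op.name}")
        best_node, best_done = None, float("inf")
        for n in available:
            # communication time for inputs produced on other nodes
            comm = sum(cost.comm_time(p.output_bytes)
                       for p in op.inputs if placement[p.name] is not n)
            # inputs must be finished, and the node must have drained its queue
            ready = max([finish[p.name] for p in op.inputs] or [0.0])
            start = max(ready + comm, n.busy_until)
            done = start + cost.compute_time(op, n)
            if done < best_done:
                best_node, best_done = n, done
        # S22: pick the node expected to finish this operator soonest
        placement[op.name] = best_node
        best_node.busy_until = best_done
        finish[op.name] = best_done
        # S23: the loop continues until every operator is assigned
    return placement
```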
Before training, each node is assigned a weight parameter representing its allocated load; the larger the weight, the more load it receives, and all node weights are initially equal.
In each round of training, the partitioning strategy for the nodes is first derived from the current node weights through the cost model and training starts; after each node's computation completes, its waiting time is recorded.
After one round of training, whether the current load balance is optimal is judged from the maximum and average waiting times of the nodes. If so, the current partitioning strategy is kept and training continues; if not, the node weights are adjusted in proportion to the waiting times, changing the load allocated to each node, after which the partitioning strategy is recomputed through the cost model and the next round of training is executed.
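The following sketch illustrates this per-round adjustment, under the assumption (consistent with the description, though not spelled out in it) that in synchronous training a node with a longer waiting time finished early, was therefore underloaded, and should receive more load in the next round. The tolerance threshold and the normalization are illustrative choices.

```python
def update_weights(weights: dict, wait: dict, tolerance: float = 0.05):
    """One round of the S5 adjustment; thresholds are illustrative assumptions.

    weights: node name -> current load weight (all equal before round 1, S3)
    wait:    node name -> waiting time measured in this round (S4)
    Returns (new_weights, balanced).
    """
    avg = sum(wait.values()) / len(wait)
    # S5: treat the load as balanced when the maximum wait is near the average
    if avg == 0 or max(wait.values()) <= (1 + tolerance) * avg:
        return dict(weights), True
    # A node that waited longer finished earlier and was underloaded, so its
    # weight grows in proportion to its waiting time; renormalize afterwards
    # so the average weight stays at 1.
    raw = {n: w * (1 + wait[n] / avg) for n, w in weights.items()}
    scale = len(raw) / sum(raw.values())
    return {n: w * scale for n, w in raw.items()}, False
```

In a full training loop, update_weights would run after every round, and per S6 the adjustment stops once the partitioning strategy produced by the cost model stays unchanged for several consecutive rounds.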
With this dynamic load balancing method for model-parallel neural networks, general dynamic load balancing for parallel neural network models is realized: when model parallelism is applied, a good partitioning strategy is derived automatically from the parameters of the model and system, the model needs no manual adjustment, the load across computing nodes stays balanced, and optimization efficiency is greatly improved.
To facilitate a better understanding of the invention, the terms used herein are briefly explained as follows:
Model parallelism: different computing nodes are responsible for different parts of the network model and jointly train the same batch of data; intermediate data produced during computation must be transmitted between the computing nodes.
Load balancing: keeping the amount of computation on each node balanced according to parameters such as the model's computation cost and computation time and the system's node count.
batch_size: the number of samples selected for one training pass of a deep learning model.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (3)

1. A dynamic load balancing method for model-parallel neural networks, characterized in that: a partitioning strategy is produced from the parameters of a given model and system and then iteratively updated during training;
the partitioning strategy for the model network is derived from the model and system parameters through the following specific steps:
S1, construct a cost model based on the model type, parameter count, network cluster topology bandwidth, and node count; the cost model is used to estimate the computation time required by each operator's input, output, and operation, and the communication time between adjacent operators;
S2, allocate the operators to be computed to the nodes according to the cost model obtained in S1, in the following specific steps:
S21, the cost model simulates the state of all available computing nodes in the current system, then traverses the whole computation graph in order, obtaining for each operator at least one available node on which the current operator can be completed, to serve as its computing node;
S22, for an operator with several available nodes, the node allocation algorithm uses a greedy heuristic to estimate the operator's expected completion time on each available node and selects the node expected to finish the current operator soonest as the mapped computing node;
S23, repeat S22, continuing to allocate computing nodes to the remaining operators until every operator in the computation graph has been assigned a computing node;
the iterative updating during training comprises the following specific steps:
S3, before training, assign each computing node a weight parameter representing its allocated load; the larger the weight, the more load is allocated, and the weights of all nodes are initially equal;
S4, in each round of training, first derive the partitioning strategy for all computing nodes through the cost model according to the node weights obtained in the previous step, start training, and record the waiting time of each computing node after its computation finishes;
S5, after one round of training, judge from the maximum and average waiting times obtained in S4 whether the current load balance is optimal; if so, keep the current partitioning strategy and continue training; if not, adjust the node weights in proportion to the waiting times so as to change the load each computing node should receive, then recompute the partitioning strategy through the cost model and run the next round of training;
and S6, repeat S4-S5 until the current partitioning strategy no longer changes over several rounds of training, which shows that the strategy is dynamically optimal for this training run.
2. The dynamic load balancing method for model-parallel neural networks according to claim 1, characterized in that: in S21, for each traversed operator its list of available nodes is considered first; if a node does not provide a kernel implementation of the current operator, that device is unavailable for the operator.
3. The dynamic load balancing method for model-parallel neural networks according to claim 1, characterized in that: in S22, during evaluation the greedy heuristic considers not only the expected completion time of the operators already waiting to execute on each available node, in order to estimate the expected completion time of the current operator, but also the communication time needed to transmit the operator's inputs from other nodes if the operator is placed on the current node.
CN202110453555.1A 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism Pending CN114217944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110453555.1A CN114217944A (en) 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110453555.1A CN114217944A (en) 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism

Publications (1)

Publication Number Publication Date
CN114217944A 2022-03-22

Family

ID=80695843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110453555.1A Pending CN114217944A (en) 2021-04-26 2021-04-26 Dynamic load balancing method for neural network aiming at model parallelism

Country Status (1)

Country Link
CN (1) CN114217944A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954721A (en) * 2023-09-20 2023-10-27 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multi-modal operator of actuator
CN116954721B (en) * 2023-09-20 2023-12-15 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multi-modal operator of actuator
CN117032938A (en) * 2023-10-08 2023-11-10 北京燧原智能科技有限公司 Operator parallel scheduling method and device, electronic equipment and storage medium
CN117032938B (en) * 2023-10-08 2024-01-09 北京燧原智能科技有限公司 Operator parallel scheduling method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109491790B (en) Container-based industrial Internet of things edge computing resource allocation method and system
Gai et al. Fusion of cognitive wireless networks and edge computing
CN109388484B (en) Multi-resource cloud job scheduling method based on Deep Q-network algorithm
CN111064633B (en) Cloud-edge cooperative power information communication equipment automated testing resource allocation method
CN102724220B (en) Method and apparatus for task cooperation, and system for internet of things
CN111106999A (en) IP-optical network communication service joint distribution method and device
CN109118097B (en) Reliability maintainability guarantee assessment method and device
CN112541584B (en) Deep neural network model parallel mode selection method
CN114217944A (en) Dynamic load balancing method for neural network aiming at model parallelism
CN113821332B (en) Method, device, equipment and medium for optimizing efficiency of automatic machine learning system
CN110198280A (en) A kind of SDN link allocation method based on BP neural network
CN113806018A (en) Kubernetes cluster resource hybrid scheduling method based on neural network and distributed cache
CN106325976A (en) Rendering task scheduling processing method and server
CN108055701A (en) A kind of resource regulating method and base station
CN113179175A (en) Real-time bandwidth prediction method and device for power communication network service
CN102745192B (en) Task allocation system for distributed control system of hybrid vehicle
CN115543626A (en) Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling
CN115421885A (en) Distributed multi-target cloud task scheduling method and device and cloud service system
CN109978241B (en) Method and device for determining charging load of electric automobile
CN112417748B (en) Method, system, equipment and medium for scheduling automatic driving simulation task
CN114629767A (en) Power dispatching network simulation method and device, computer equipment and storage medium
CN113205128A (en) Distributed deep learning performance guarantee method based on serverless computing
CN102774376B (en) Task allocation method of distributed control system of hybrid power vehicle
CN117557016A (en) Whole vehicle manufacturing stamping resource scheduling method based on deep reinforcement learning
CN112213956A (en) Automatic driving simulation task scheduling method, device, equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination