CN111858058A - SGD load balancing method and device based on parallel computing and storage medium

SGD load balancing method and device based on parallel computing and storage medium

Info

Publication number
CN111858058A
Authority
CN
China
Prior art keywords
nodes
node
load balancing
sub
parallel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010723846.3A
Other languages
Chinese (zh)
Inventor
王彪
王亚强
刘魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Cheng Xin High Tech Information Technology Co ltd
Chengdu University of Information Technology
Original Assignee
Chengdu Cheng Xin High Tech Information Technology Co ltd
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Cheng Xin High Tech Information Technology Co ltd, Chengdu University of Information Technology filed Critical Chengdu Cheng Xin High Tech Information Technology Co ltd
Priority to CN202010723846.3A
Publication of CN111858058A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 - Techniques for rebalancing the load in a distributed system
    • G06F 9/54 - Interprogram communication
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Abstract

The invention discloses an SGD load balancing method based on parallel computing, which comprises the following steps: distributed parallel GPU computation is realized with a design that combines model parallelism and data parallelism; a semaphore mechanism is adopted to realize synchronous communication between the main node and the sub-nodes, and the optimizer in each sub-container updates the weights with a stochastic gradient descent algorithm. The main node constructs a minimum spanning tree using the errors in the control tables of the child nodes as weights, finds the key nodes among the graph nodes, sequentially eliminates the non-key nodes, and reallocates their hardware resources. The method allows multiple model copies to process different subsets of the training samples at the same time, periodically merges the model copies, and optimizes the distributed algorithm. The invention provides a new architectural approach to a load-balanced computing strategy, improves model development efficiency, and reduces development cost; the algorithm has better adaptability to the data scale while realizing asynchronous communication among the dynamically managed sub-containers.

Description

SGD load balancing method and device based on parallel computing and storage medium
Technical Field
The invention relates to the field of machine learning, and in particular to an SGD load balancing method, device, and storage medium based on parallel computing.
Background
At present, artificial intelligence has shown great advantages in many fields. Machine learning is an important part of artificial intelligence; by modeling and training on massive data, it helps people make decisions.
However, with the rise of big data, data sets have grown ever larger, and the storage and computing capacity of a single machine can no longer meet the demands of massive data. Distributed machine learning emerged in response, and using it to accelerate model convergence has become the mainstream approach in industry. At present there are two common methods of distributed machine learning: model parallelism and data parallelism.
However, current parallel computation is limited by the bucket effect: the next round of computation can begin only after the slowest node has finished. Processing different subsets of the training samples on multiple model copies at the same time and periodically merging the results of the model copies improves computing efficiency on large-scale data, but it is technically demanding.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an SGD load balancing method, device, and storage medium based on parallel computing, adopting a design that combines model parallelism and data parallelism. Compared with the prior art, the method effectively allows multiple model copies to process different subsets of the training samples at the same time, periodically merges the results of the model copies, and optimizes the distributed algorithm.
The purpose of the invention is realized by the following technical scheme:
The SGD load balancing method based on parallel computing comprises the following steps:
Step 1: constructing a parallel GPU computing architecture: building a unidirectional communication graph using a design that combines model parallelism and data parallelism, periodically circulating the models among the graph nodes so that each model covers the whole data set, and preferentially allocating hardware devices to the graph nodes;
Step 2: dynamically managing node hardware resources: adopting a semaphore mechanism to realize synchronous communication between the main node and the sub-nodes, and updating the weights with a stochastic gradient descent algorithm in the optimizer of each sub-container.
Specifically, building the parallel GPU computing architecture in step 1 comprises the following sub-steps:
s101, configuring a management Node Manager, creating N containers to be deployed on different machines, marking as Node nodes, creating a Node control table on a child Node, and recording a Node ID, a Node data set and a current batch error;
s102, establishing connection among the sub-nodes to form a one-way connection graph, building a neural network in the sub-nodes, and setting a time slice T of one period;
s103, evenly dividing the data samples into N parts, sending the N parts into nodes in sequence, training the nodes on different nodes by using an SGD algorithm, obtaining a local gradient value by each part of the data samples through forward propagation and backward propagation, and updating the gradient;
and S104, traversing according to the hierarchy of the graph in each training period, recording the unbiased estimation quantity of the model error, and recording the error value in the node control table.
Specifically, the traversal process of the graph in sub-step S104 is as follows: parameters such as the weights and biases output by an upper-layer node are packed into an NN object for transmission; after the current node receives the NN object transmitted by the upper-layer node, the NN object is used as a hidden layer for training; and if the current node has several upper-layer nodes, the NN objects transmitted from those upper-layer nodes are merged and their mean is used as the hidden layer for training.
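By way of illustration only, the merge-and-average operation described above can be sketched as follows. The class NNObject and the function merge_upper_outputs are hypothetical names, and the assumption that the object simply carries per-layer weight and bias arrays is not fixed by the disclosure:

    import numpy as np

    class NNObject:
        # Hypothetical container for the parameters passed between graph nodes.
        def __init__(self, weights, biases):
            self.weights = weights   # list of np.ndarray, one per layer
            self.biases = biases     # list of np.ndarray, one per layer

    def merge_upper_outputs(nn_objects):
        # Element-wise average of the NN objects received from all upper-layer
        # nodes; the averaged parameters are then used as the hidden-layer
        # initialization for training on the current node.
        n = len(nn_objects)
        avg_weights = [sum(w) / n for w in zip(*(o.weights for o in nn_objects))]
        avg_biases = [sum(b) / n for b in zip(*(o.biases for o in nn_objects))]
        return NNObject(avg_weights, avg_biases)

When a node has a single upper-layer node, merge_upper_outputs simply passes that node's parameters through unchanged, which matches the behavior described for the single-predecessor case.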
Specifically, dynamically managing node hardware resources in step 2 comprises the following sub-steps:
s201, in each period, inquiring a node control table through a main node, constructing a minimum spanning tree by taking an error in the node control table as a weight, and sequencing the weights in the minimum spanning tree;
s202, when the training model is to be converged, the main node sorts the nodes according to the minimum spanning tree of each period in the node control table and the weight, and sends a synchronization signal to the key node;
and S203, the main node sequentially recovers the tasks of the nodes which do not receive the synchronous signals in the unidirectional communication graph, distributes the hardware resources of the nodes to the adjacent key nodes, and accelerates the calculation speed of the adjacent key nodes until all the nodes finish the training tasks.
A computing device comprises a memory in which computer-executable instructions are stored, and a processor for implementing the steps of the above load balancing method when executing the instructions.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the above load balancing method.
The invention has the following beneficial effects: the invention provides a new architectural approach to realize a load-balanced computing strategy, improves model development efficiency, and reduces development cost, so that the algorithm has better adaptability to the data scale, and realizes asynchronous communication among the dynamically managed sub-containers.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of a parallel computing architecture of the present invention.
FIG. 3 is a diagram of the present invention implementing dynamic management of node hardware resources using a semaphore mechanism.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in FIG. 1, the SGD load balancing method based on parallel computing mainly includes the following steps:
Step 1: constructing a parallel GPU computing architecture: building a unidirectional communication graph using a design that combines model parallelism and data parallelism, periodically circulating the models among the graph nodes so that each model covers the whole data set, and preferentially allocating hardware devices to the graph nodes;
Step 2: dynamically managing node hardware resources: adopting a semaphore mechanism to realize synchronous communication between the main node and the sub-nodes, and updating the weights with a stochastic gradient descent algorithm in the optimizer of each sub-container.
In this embodiment, as shown in FIG. 2, which is a schematic structural diagram of the SGD load balancing method based on parallel computing, the specific implementation process is as follows. First, a Node Manager is configured, N containers are created and deployed on different machines (denoted Node nodes), and a node control table is created on each child node to record the node ID, the node data set, and the current batch error. Connections are then established among the sub-nodes to form a unidirectional connection graph (the graph nodes are GPU hardware devices), a neural network is built inside each sub-node, and the time slice T of one cycle is set. The data samples are evenly divided into N parts and sent to the nodes in turn; the nodes train with the SGD algorithm, and each part of the data samples obtains a local gradient value through forward propagation and backward propagation and updates the gradient. In each training cycle, the graph is traversed by level, an unbiased estimate of the model error is recorded, and the error value is written into the node control table. During the graph traversal, weights and biases need to be transmitted between adjacent nodes; because the neural network is complex and has many parameters, the parameters are packed into an NN object for transmission, and after a node receives the NN object from its upper-layer node, the NN object is used as a hidden layer for training. If a node has several upper-layer nodes, the NN objects transmitted from those nodes are merged and their mean is used as the hidden layer for training. Model circulation is performed periodically so that every model runs over all of the data.
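For concreteness, the local training step each node performs on its share of the data can be sketched as follows. This is a minimal NumPy illustration with an assumed linear model, squared-error loss, and learning rate; the actual networks, loss function, and optimizer hyper-parameters are not fixed at this level of the disclosure:

    import numpy as np

    def sgd_step(weights, bias, x_batch, y_batch, lr=0.01):
        # Forward propagation on the node's local mini-batch.
        pred = x_batch @ weights + bias
        err = pred - y_batch
        # Current batch error, i.e. the value written into the node control table.
        loss = float(np.mean(err ** 2))
        # Backward propagation: local gradient of the squared-error loss.
        grad_w = 2.0 * x_batch.T @ err / len(x_batch)
        grad_b = 2.0 * float(np.mean(err))
        # Stochastic gradient descent update of the local weights.
        weights = weights - lr * grad_w
        bias = bias - lr * grad_b
        return weights, bias, loss

Each node would repeat a step of this kind over its assigned data partition within the time slice T, and the recorded loss is what the main node later uses as the weight when it builds the minimum spanning tree.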
Under the architecture of step 1, after training for some time the errors of some nodes decrease very slowly, and an extremely long training time is needed to reach convergence, which greatly affects training efficiency; at the same time a large amount of invalid computation is generated, wasting hardware resources. The invention therefore introduces a semaphore mechanism to realize synchronous communication between the main node and the sub-nodes and to dynamically manage the hardware resources of the nodes.
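Before turning to the concrete process, a toy sketch of the semaphore handshake may help. It uses Python's threading primitives in place of whatever inter-container signaling a real deployment would use, and the class and method names are illustrative only, not taken from the disclosure:

    import threading

    class MasterNode:
        # One semaphore per child node; released only for the key nodes when the
        # model is close to convergence.
        def __init__(self, node_ids):
            self.sync = {nid: threading.Semaphore(0) for nid in node_ids}

        def signal_key_nodes(self, key_nodes):
            # Send the synchronization signal to each key node.
            for nid in key_nodes:
                self.sync[nid].release()

        def child_wait(self, nid, timeout=1.0):
            # A child that times out never received the signal: the master may
            # reclaim its task and hand its hardware to an adjacent key node.
            return self.sync[nid].acquire(timeout=timeout)

In a real cluster the threading.Semaphore would be replaced by a distributed primitive (for example a message over the container network), but the control flow, signal the key nodes and reclaim the rest, stays the same.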
In this embodiment, FIG. 3 is a schematic diagram of how the invention uses a semaphore mechanism to dynamically manage node hardware resources. The specific implementation process is as follows. In each cycle, the main node queries the node control table, constructs a minimum spanning tree using the errors in the node control table as weights, and sorts the weights in the minimum spanning tree. After training for a certain number of cycles (when the model is about to converge), the main node ranks the nodes according to the per-cycle minimum spanning trees recorded in the node control table and their weights, and sends a synchronization signal to the key nodes. The main node then sequentially reclaims the tasks of the nodes that did not receive the synchronization signal and allocates their hardware resources to the adjacent key nodes, accelerating the computation of those nodes and improving the efficiency of the whole model.
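As a rough sketch only, this selection logic might be prototyped as below. It assumes the networkx library, takes the mean of the two endpoint errors as the edge weight, and treats the higher-error nodes as the key nodes that keep training; none of these choices is fixed by the disclosure, which leaves the edge weighting and the key-node criterion open:

    import networkx as nx

    def select_key_nodes(control_table, topology, keep_ratio=0.5):
        # control_table: node id -> current batch error (from the node control table)
        # topology: iterable of (u, v) edges of the unidirectional communication graph
        # keep_ratio: assumed fraction of nodes kept as key nodes
        g = nx.Graph()
        for u, v in topology:
            g.add_edge(u, v, weight=(control_table[u] + control_table[v]) / 2.0)
        # Error-weighted minimum spanning tree, as in S201.
        mst = nx.minimum_spanning_tree(g, weight="weight")
        # Rank nodes by their recorded error; here the highest-error nodes are
        # treated as the key nodes on the assumption that they still have the
        # most to learn, and they receive the synchronization signal (S202).
        ranked = sorted(mst.nodes, key=lambda node: control_table[node], reverse=True)
        n_keep = max(1, int(len(ranked) * keep_ratio))
        key_nodes = set(ranked[:n_keep])
        # The remaining nodes never receive the signal; the main node may reclaim
        # their tasks and hand their hardware to adjacent key nodes (S203).
        idle_nodes = set(ranked[n_keep:])
        return key_nodes, idle_nodes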
The architecture adopted by the invention can effectively reduce the loss value, improve the development efficiency of the model, reduce the development cost, and adapt well to the data scale.
In addition, the invention also provides a computing device and a computer-readable storage medium. The computing device comprises a memory in which computer-executable instructions are stored, and a processor which, when executing the instructions, implements all the processes and steps of the load balancing method in the above embodiment. The computer-readable storage medium stores a computer program which, when executed by a processor, carries out all the steps of the above load balancing method.
The foregoing shows and describes the general principles and main features of the present invention and its advantages. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. An SGD load balancing method based on parallel computing, characterized by comprising the following steps:
Step 1: constructing a parallel GPU computing architecture: building a unidirectional communication graph using a design that combines model parallelism and data parallelism, periodically circulating the models among the graph nodes so that each model covers the whole data set, and preferentially allocating hardware devices to the graph nodes;
Step 2: dynamically managing node hardware resources: adopting a semaphore mechanism to realize synchronous communication between the main node and the sub-nodes, and updating the weights with a stochastic gradient descent algorithm in the optimizer of each sub-container.
2. The SGD load balancing method based on parallel computing according to claim 1, wherein building the parallel GPU computing architecture in step 1 specifically comprises the following sub-steps:
s101, configuring a management Node Manager, creating N containers to be deployed on different machines, marking as Node nodes, creating a Node control table on a child Node, and recording a Node ID, a Node data set and a current batch error;
s102, establishing connection among the sub-nodes to form a one-way connection graph, building a neural network in the sub-nodes, and setting a time slice T of one period;
s103, evenly dividing the data samples into N parts, sending the N parts into nodes in sequence, training the nodes on different nodes by using an SGD algorithm, obtaining a local gradient value by each part of the data samples through forward propagation and backward propagation, and updating the gradient; and S104, traversing according to the hierarchy of the graph in each training period, recording the unbiased estimation quantity of the model error, and recording the error value in the node control table.
3. The SGD load balancing method according to claim 2, wherein the traversal process of the graph in sub-step S104 specifically includes: packing parameters such as the weights and biases output by an upper-layer node into an NN object for transmission; after the current node receives the NN object transmitted by the upper-layer node, using the NN object as a hidden layer for training; and if the current node has several upper-layer nodes, merging the NN objects transmitted from those upper-layer nodes and using their mean as the hidden layer for training.
4. The SGD load balancing method based on parallel computing according to claim 1, wherein dynamically managing node hardware resources in step 2 specifically comprises the following sub-steps:
S201: in each cycle, the main node queries the node control table, constructs a minimum spanning tree using the errors in the node control table as weights, and sorts the weights in the minimum spanning tree;
S202: when the training model is about to converge, the main node ranks the nodes according to the per-cycle minimum spanning trees recorded in the node control table and their weights, and sends a synchronization signal to the key nodes;
S203: the main node sequentially reclaims the tasks of the nodes in the unidirectional communication graph that did not receive the synchronization signal, and allocates their hardware resources to the adjacent key nodes to accelerate the computation of those key nodes, until all nodes have completed their training tasks.
5. A computing device, comprising
a memory having computer-executable instructions stored therein; and
a processor for implementing the steps of the load balancing method according to any one of claims 1 to 4 when executing the computer-executable instructions.
6. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the load balancing method according to any one of claims 1 to 4.
CN202010723846.3A 2020-07-24 2020-07-24 SGD load balancing method and device based on parallel computing and storage medium Pending CN111858058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723846.3A CN111858058A (en) 2020-07-24 2020-07-24 SGD load balancing method and device based on parallel computing and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723846.3A CN111858058A (en) 2020-07-24 2020-07-24 SGD load balancing method and device based on parallel computing and storage medium

Publications (1)

Publication Number Publication Date
CN111858058A true CN111858058A (en) 2020-10-30

Family

ID=72950115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723846.3A Pending CN111858058A (en) 2020-07-24 2020-07-24 SGD load balancing method and device based on parallel computing and storage medium

Country Status (1)

Country Link
CN (1) CN111858058A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598118A (en) * 2021-03-03 2021-04-02 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN114167828A (en) * 2021-12-03 2022-03-11 润电能源科学技术有限公司 External hanging control method of DCS controller and related device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339351A (en) * 2016-08-30 2017-01-18 浪潮(北京)电子信息产业有限公司 SGD (Stochastic Gradient Descent) algorithm optimization system and method
CN108304918A (en) * 2018-01-18 2018-07-20 中兴飞流信息科技有限公司 A kind of the parameter exchange method and system of the deep learning of data parallel
CN108921196A (en) * 2018-06-01 2018-11-30 南京邮电大学 A kind of semantic segmentation method for improving full convolutional neural networks
CN110678843A (en) * 2017-04-17 2020-01-10 微软技术许可有限责任公司 Dynamically partitioning workloads in deep neural network modules to reduce power consumption
CN110795228A (en) * 2018-08-03 2020-02-14 伊姆西Ip控股有限责任公司 Adaptive batch dataset partitioning for distributed deep learning using accelerator mixture sets
CN111178486A (en) * 2019-11-27 2020-05-19 湖州师范学院 Hyper-parameter asynchronous parallel search method based on population evolution
WO2020102526A1 (en) * 2018-11-14 2020-05-22 North Carolina State University Deep neural network with compositional grammatical architectures
US20200175422A1 (en) * 2018-11-29 2020-06-04 International Business Machines Corporation Asynchronous gradient weight compression

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339351A (en) * 2016-08-30 2017-01-18 浪潮(北京)电子信息产业有限公司 SGD (Stochastic Gradient Descent) algorithm optimization system and method
CN110678843A (en) * 2017-04-17 2020-01-10 微软技术许可有限责任公司 Dynamically partitioning workloads in deep neural network modules to reduce power consumption
CN108304918A (en) * 2018-01-18 2018-07-20 中兴飞流信息科技有限公司 A kind of the parameter exchange method and system of the deep learning of data parallel
CN108921196A (en) * 2018-06-01 2018-11-30 南京邮电大学 A kind of semantic segmentation method for improving full convolutional neural networks
CN110795228A (en) * 2018-08-03 2020-02-14 伊姆西Ip控股有限责任公司 Adaptive batch dataset partitioning for distributed deep learning using accelerator mixture sets
WO2020102526A1 (en) * 2018-11-14 2020-05-22 North Carolina State University Deep neural network with compositional grammatical architectures
US20200175422A1 (en) * 2018-11-29 2020-06-04 International Business Machines Corporation Asynchronous gradient weight compression
CN111178486A (en) * 2019-11-27 2020-05-19 湖州师范学院 Hyper-parameter asynchronous parallel search method based on population evolution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG DANING et al.: "Weighted parallel SGD for distributed unbalanced-workload training system", Journal of Parallel and Distributed Computing *
LU SHUXIA et al.: "Weighted zeroth-order stochastic gradient descent algorithm with variance reduction", Journal of Hebei University (Natural Science Edition) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598118A (en) * 2021-03-03 2021-04-02 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN112598118B (en) * 2021-03-03 2021-06-25 成都晓多科技有限公司 Method, device, storage medium and equipment for processing abnormal labeling in supervised learning
CN114167828A (en) * 2021-12-03 2022-03-11 润电能源科学技术有限公司 External hanging control method of DCS controller and related device

Similar Documents

Publication Publication Date Title
CN110533183B (en) Task placement method for heterogeneous network perception in pipeline distributed deep learning
CN109032671B (en) Distributed deep learning method and system based on data parallel strategy
Sun et al. Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes
US10171284B2 (en) Reachability-based coordination for cyclic dataflow
CN114756383B (en) Distributed computing method, system, equipment and storage medium
CN103970580B (en) A kind of data flow towards multinuclear cluster compiles optimization method
CN109754060A (en) A kind of training method and device of neural network machine learning model
CN107330516A (en) Model parameter training method, apparatus and system
CN110222005A (en) Data processing system and its method for isomery framework
CN111858058A (en) SGD load balancing method and device based on parallel computing and storage medium
Van Tendeloo et al. PythonPDEVS: a distributed parallel DEVS simulator.
CN108564164A (en) A kind of parallelization deep learning method based on SPARK platforms
Zhan et al. Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking
Sun et al. Gradientflow: Optimizing network performance for large-scale distributed dnn training
CN111274036A (en) Deep learning task scheduling method based on speed prediction
CN111241301A (en) Knowledge graph representation learning-oriented distributed framework construction method
Sun et al. Gssp: eliminating stragglers through grouping synchronous for distributed deep learning in heterogeneous cluster
Osawa et al. Pipefisher: Efficient training of large language models using pipelining and fisher information matrices
Wu et al. Rethinking memory and communication cost for efficient large language model training
CN110135067B (en) Helicopter flow field overlapping mixed grid parallel method under double time step method
CN116755876A (en) Large model hybrid parallel training acceleration method and system
CN112446484A (en) Multitask training cluster intelligent network system and cluster network optimization method
CN116400963A (en) Model automatic parallel method, device and storage medium based on load balancing
Wang et al. A coordinated two-stages virtual network embedding algorithm based on reinforcement learning
CN115600673A (en) Method and system for parallel training DNN model for multi-machine multi-card computing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned (effective date of abandoning: 2022-12-09)