CN112541584A - Deep neural network model parallel mode selection method

Deep neural network model parallel mode selection method

Info

Publication number
CN112541584A
Authority
CN
China
Prior art keywords
neural network
data
network model
model
computing nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910897718.8A
Other languages
Chinese (zh)
Other versions
CN112541584B (en)
Inventor
刘鑫
刘沙
彭超
朱传家
陈德训
黄则强
陆旭峰
裴阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910897718.8A
Publication of CN112541584A
Application granted
Publication of CN112541584B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention discloses a deep neural network model parallel mode selection method, which comprises the following steps: S1, calculating the total data volume of the whole neural network model; S2, judging whether the total data volume of the neural network model obtained in S1 exceeds the total available memory of a single computing node used for training; if not, executing S3, and if so, executing S4; S3, selecting a data parallel mode; S4, segmenting the network layers of the neural network model and obtaining, from the segmentation result, the number of computing nodes over which the neural network model needs to be distributed; if the number of computing nodes in the input parameters is less than twice the number of nodes required by the model segmentation, executing S5, otherwise executing S6; S5, selecting a model parallel mode; S6, selecting a hybrid parallel mode comprising data parallelism and model parallelism. The invention realizes automatic selection of the distributed parallel mode by collecting and analyzing information on model parameters, hyper-parameters and data volume, and thereby ensures high parallel performance.

Description

Deep neural network model parallel mode selection method
Technical Field
The invention relates to a deep neural network model parallel mode selection method, and belongs to the technical field of deep learning.
Background
Distributed training in data parallel mode stores a copy of the model on each computing node and processes a different part of the data set on each node; this training method requires combining the results of each worker node and synchronizing model parameters between nodes. Distributed training in model parallel mode distributes different network layers of the neural network model, or different parameters within the same layer, to different computing nodes, so that different nodes are responsible for training different parts of the network model. The hybrid parallel mode uses both model parallelism and data parallelism within the batch of computing nodes performing distributed training; for example, model parallelism may be used within a group of nodes while data parallelism is used across node groups.
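As an illustrative aid only, the following minimal Python sketch shows one possible way of arranging compute nodes for hybrid parallelism, with hypothetical node counts and function names not taken from the patent: model parallelism inside each group, data parallelism across groups.

```python
# Illustrative sketch only: grouping compute node ranks for hybrid parallelism.
# All names and node counts are hypothetical; the patent does not prescribe this code.

def build_hybrid_groups(total_nodes, nodes_per_model):
    """Model parallelism is used inside each returned group of node ranks,
    data parallelism across the groups."""
    if total_nodes % nodes_per_model != 0:
        raise ValueError("node count must be a multiple of the model-parallel group size")
    return [list(range(start, start + nodes_per_model))
            for start in range(0, total_nodes, nodes_per_model)]

# Example: 8 nodes with each model copy split across 2 nodes -> 4 data-parallel groups.
print(build_hybrid_groups(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```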
In recent years, with the development of deep learning technology, a wide variety of deep neural network models has emerged. As the variety of network models grows, network depth has also expanded from a few layers to hundreds of layers. Although deeper networks greatly improve accuracy, the number of model parameters keeps growing and training times keep lengthening, which has become a major obstacle to the rapid development and wide application of deep learning technology. To train very large neural networks on very large data sets, a single computing node is no longer sufficient, and support for distributed parallel scaling is required. At present, the main distributed parallel scaling modes for deep learning are data parallelism, model parallelism and hybrid parallelism; how to select a suitable parallel mode has therefore become a focus of effort for those skilled in the art.
Disclosure of Invention
The invention aims to provide a deep neural network model parallel mode selection method that realizes automatic selection of the distributed parallel mode by collecting and analyzing information on model parameters, hyper-parameters and data volume, and thereby ensures high parallel performance.
In order to achieve this purpose, the invention adopts the following technical scheme: a deep neural network model parallel mode selection method in which the input parameters of an artificial intelligence training task comprise a neural network model file, the number of computing nodes and the size of a single training sample, wherein the neural network model file contains the batch_size, the number of model parameters and the data type;
the parallel mode selection method comprises the following steps:
s1, calculating parameter data volume of the whole neural network model by a distributed expansion component in the artificial intelligence framework according to the parameter number and the data type of the neural network model, and calculating the data volume of input data according to the size of single training sample data in input parameters and the size of batch _ size in a neural network model file, wherein the sum of the parameter data volume and the data volume of the input data is the total data volume of the neural network model;
s2, the distributed extension module judges whether the total data volume of the neural network model obtained in the S1 exceeds the total available memory of a single calculation node for training, if not, S3 is executed, and if so, S4 is executed;
s3, selecting a data parallel mode, dividing training samples into a plurality of parts with the same number as the number of the computing nodes by the distributed extension component according to the number of the computing nodes in the input parameters, training each computing node by using respective sample data, transmitting gradient data among the computing nodes, and completing training together;
s4, segmenting the network layer of the neural network model, dividing the network layer into a plurality of parts, distributing the model parameters of each part on a computing node, obtaining the number of the computing nodes required to be distributed by the neural network model according to the segmentation result, executing S5 if the number of the computing nodes in the input parameters is less than two times of the number of the nodes required by the model segmentation, otherwise executing S6, wherein the concrete method of segmentation is as follows: selecting a plurality of continuous layers with the maximum number from a starting layer of a network layer as one part, so that the sum of the data quantity of the plurality of layers does not exceed the total available memory of the computing node, and if the data quantity of a certain single network layer exceeds the total available memory of the computing node, dividing the network layer into a plurality of parts according to the total available memory of the computing node;
s5, selecting a model parallel mode, segmenting the neural network model, and distributing the segmented neural network model parameters of each partial network layer to different computing nodes by the distributed expansion component;
s6, selecting a mixed parallel mode comprising data parallel and model parallel, grouping all computing nodes according to the number of the computing nodes required to be distributed by the neural network model, wherein the number of the computing nodes contained in each group is the same as the number of the computing nodes divided by the neural network model, model parallel is adopted in each group of computing nodes, intermediate data is transmitted among the computing nodes in the group, data parallel is adopted among the node groups formed by each group of computing nodes, and gradient data is transmitted by the computing nodes among the groups.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
the method for selecting the parallel mode of the deep neural network model can solve the problem of automatically selecting the parallel mode when different types of neural network models are subjected to distributed expansion, can also solve the problem of network layer segmentation when the models are parallel, does not need manual intervention of a user, realizes automatic selection of the distributed expansion parallel mode by acquiring and analyzing information of model parameters, hyper-parameters and data quantity, and ensures higher parallel performance.
Drawings
FIG. 1 is a schematic diagram of a data parallel mode;
FIG. 2 is a schematic diagram of a model parallel mode;
FIG. 3 is a schematic diagram of a hybrid parallel mode;
FIG. 4 is a flow chart of the deep neural network model parallel mode selection method of the present invention.
Detailed Description
Embodiment: a deep neural network model parallel mode selection method in which the input parameters of an artificial intelligence training task comprise a neural network model file, the number of computing nodes and the size of a single training sample, wherein the neural network model file contains the batch_size, the number of model parameters and the data type;
the parallel mode selection method comprises the following steps:
s1, calculating parameter data volume of the whole neural network model by a distributed expansion component in the artificial intelligence framework according to the parameter number and the data type of the neural network model, and calculating the data volume of input data according to the size of single training sample data in input parameters and the size of batch _ size in a neural network model file, wherein the sum of the parameter data volume and the data volume of the input data is the total data volume of the neural network model;
s2, the distributed extension module judges whether the total data volume of the neural network model obtained in the S1 exceeds the total available memory volume of a single calculation node for training, the total available memory volume of the calculation node can be obtained through a system interface, if not, S3 is executed, and if so, S4 is executed;
s3, selecting a data parallel mode, dividing training samples into a plurality of parts with the same number as the number of the computing nodes by the distributed extension component according to the number of the computing nodes in the input parameters, training each computing node by using respective sample data, transmitting gradient data among the computing nodes, and completing training together;
s4, segmenting the network layer of the neural network model, dividing the network layer into a plurality of parts, distributing the model parameters of each part on a computing node, obtaining the number of the computing nodes required to be distributed by the neural network model according to the segmentation result, executing S5 if the number of the computing nodes in the input parameters is less than two times of the number of the nodes required by the model segmentation, otherwise executing S6, wherein the concrete method of segmentation is as follows: selecting a plurality of continuous layers with the maximum number from a starting layer of a network layer as one part, so that the sum of the data quantity of the plurality of layers does not exceed the total available memory of the computing node, and if the data quantity of a certain single network layer exceeds the total available memory of the computing node, dividing the network layer into a plurality of parts according to the total available memory of the computing node;
s5, selecting a model parallel mode, segmenting the neural network model, and distributing the segmented neural network model parameters of each partial network layer to different computing nodes by the distributed expansion component;
s6, selecting a mixed parallel mode comprising data parallel and model parallel, grouping all computing nodes according to the number of the computing nodes required to be distributed by the neural network model, wherein the number of the computing nodes contained in each group is the same as the number of the computing nodes divided by the neural network model, model parallel is adopted in each group of computing nodes, intermediate data is transmitted among the computing nodes in the group, data parallel is adopted among the node groups formed by each group of computing nodes, and gradient data is transmitted by the computing nodes among the groups.
The embodiment is further explained below:
the adaptive deep neural network parallel mode selection method provided by the invention solves the problem of parallel mode selection when the deep neural network model is subjected to distributed parallel expansion, can adaptively select a proper parallel mode from data parallel, model parallel and mixed parallel modes according to the type of the network model, the size of parameters, the size of training data volume and the size of batch _ size to obtain better acceleration performance, and also provides a model network layer segmentation method aiming at the model parallel and mixed parallel to distribute model parameters to different nodes.
First, the parameter quantity of the whole neural network model is measured. If the parameter data volume of the network model exceeds the total available memory of a single computing node, a model parallel mode is selected and the model is segmented by network layer; in this segmentation, the network layers are divided into several parts of approximately equal execution time according to the execution time of each layer.
secondly, aiming at the condition that the quantity of the model parameters does not exceed the total quantity of the available memory of a single computing node, firstly selecting a data parallel mode, further computing the size of the data space needing to be distributed according to the size of batch _ size, if the sum of the quantity of the data and the model parameters exceeds the available memory of the single computing node, dividing the parameters of the network layer with the largest quantity in the model into two parts, distributing the two parts to two computing nodes, mixing and paralleling the two computing nodes by adopting data paralleling and model paralleling, adopting data paralleling between node groups consisting of every two nodes, and if the quantity of the parameters and the quantity of the data still exceed the memory of the nodes after the layer with the largest quantity of the parameters is segmented, continuously segmenting the layer with the larger quantity of the parameters.
By adopting the deep neural network model parallel mode selection method described above, the problem of automatically selecting a parallel mode when different types of neural network models are scaled out in a distributed manner is solved, as is the problem of network-layer segmentation for model parallelism, without manual intervention by the user; automatic selection of the distributed parallel mode is realized by collecting and analyzing information on model parameters, hyper-parameters and data volume, and high parallel performance is ensured.
To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:
data parallel: different computing nodes have multiple copies of the same model, each computing node is assigned to different data, and then the computing results of all computing nodes are combined in a certain manner.
Model parallel: different computing nodes are responsible for different parts of the network model and jointly train the same batch of data; intermediate data produced during computation must be transmitted between the different computing nodes.
batch_size: the number of samples selected for one training pass when training a deep learning model.
The above embodiment merely illustrates the technical ideas and features of the present invention; its purpose is to enable those skilled in the art to understand and implement the invention, not to limit its scope of protection. All equivalent changes and modifications made according to the spirit of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. A deep neural network model parallel mode selection method, characterized in that: the input parameters of the artificial intelligence training task comprise a neural network model file, the number of computing nodes and the size of a single training sample, wherein the neural network model file contains the batch_size, the number of model parameters and the data type;
the parallel mode selection method comprises the following steps:
s1, calculating parameter data volume of the whole neural network model by a distributed expansion component in the artificial intelligence framework according to the parameter number and the data type of the neural network model, and calculating the data volume of input data according to the size of single training sample data in input parameters and the size of batch _ size in a neural network model file, wherein the sum of the parameter data volume and the data volume of the input data is the total data volume of the neural network model;
s2, the distributed extension module judges whether the total data volume of the neural network model obtained in the S1 exceeds the total available memory of a single calculation node for training, if not, S3 is executed, and if so, S4 is executed;
s3, selecting a data parallel mode, dividing training samples into a plurality of parts with the same number as the number of the computing nodes by the distributed extension component according to the number of the computing nodes in the input parameters, training each computing node by using respective sample data, transmitting gradient data among the computing nodes, and completing training together;
s4, segmenting the network layer of the neural network model, dividing the network layer into a plurality of parts, distributing the model parameters of each part on a computing node, obtaining the number of the computing nodes required to be distributed by the neural network model according to the segmentation result, executing S5 if the number of the computing nodes in the input parameters is less than two times of the number of the nodes required by the model segmentation, otherwise executing S6, wherein the concrete method of segmentation is as follows: selecting a plurality of continuous layers with the maximum number from a starting layer of a network layer as one part, so that the sum of the data quantity of the plurality of layers does not exceed the total available memory of the computing node, and if the data quantity of a certain single network layer exceeds the total available memory of the computing node, dividing the network layer into a plurality of parts according to the total available memory of the computing node;
s5, selecting a model parallel mode, segmenting the neural network model, and distributing the segmented neural network model parameters of each partial network layer to different computing nodes by the distributed expansion component;
s6, selecting a mixed parallel mode comprising data parallel and model parallel, grouping all computing nodes according to the number of the computing nodes required to be distributed by the neural network model, wherein the number of the computing nodes contained in each group is the same as the number of the computing nodes divided by the neural network model, model parallel is adopted in each group of computing nodes, intermediate data is transmitted among the computing nodes in the group, data parallel is adopted among the node groups formed by each group of computing nodes, and gradient data is transmitted by the computing nodes among the groups.
CN201910897718.8A (priority date 2019-09-23, filing date 2019-09-23) Deep neural network model parallel mode selection method. Status: Active; granted as CN112541584B.

Priority Applications (1)

Application CN201910897718.8A (granted as CN112541584B); priority date 2019-09-23; filing date 2019-09-23; title: Deep neural network model parallel mode selection method

Applications Claiming Priority (1)

Application CN201910897718.8A (granted as CN112541584B); priority date 2019-09-23; filing date 2019-09-23; title: Deep neural network model parallel mode selection method

Publications (2)

Publication Number Publication Date
CN112541584A 2021-03-23
CN112541584B CN112541584B (en) 2022-10-04

Family

ID=75012944

Family Applications (1)

Application CN201910897718.8A (granted as CN112541584B, Active); title: Deep neural network model parallel mode selection method; priority date 2019-09-23; filing date 2019-09-23

Country Status (1)

Country Link
CN (1) CN112541584B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177632A (en) * 2021-04-13 2021-07-27 支付宝(杭州)信息技术有限公司 Model training method, device and equipment based on pipeline parallelism
CN114565105A (en) * 2022-03-02 2022-05-31 北京百度网讯科技有限公司 Data processing method and deep learning model training method and device
CN115061825A (en) * 2022-08-09 2022-09-16 深圳致星科技有限公司 Heterogeneous computing system and method for private computing, private data and federal learning
CN116991560A (en) * 2023-09-25 2023-11-03 粤港澳大湾区数字经济研究院(福田) Parallel scheduling method, device, equipment and storage medium for language model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109032671A (en) * 2018-06-25 2018-12-18 电子科技大学 A kind of distributed deep learning method and system based on data parallel strategy
US20190188570A1 (en) * 2017-12-20 2019-06-20 Fujitsu Limited Methods and apparatus for model parallelism in artificial neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188570A1 (en) * 2017-12-20 2019-06-20 Fujitsu Limited Methods and apparatus for model parallelism in artificial neural networks
CN109032671A (en) * 2018-06-25 2018-12-18 电子科技大学 A kind of distributed deep learning method and system based on data parallel strategy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HISAO ISHIBUCHI et al.: "Parallel Distributed Hybrid Fuzzy GBML Models With Rule Set Migration and Training Data Rotation", IEEE TRANSACTIONS ON FUZZY SYSTEMS *
杨远飞 et al.: "基于并行和切片的深度卷积网络设计研究" (Research on deep convolutional network design based on parallelism and slicing), 《微电子学与计算机》 (Microelectronics & Computer) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177632A (en) * 2021-04-13 2021-07-27 支付宝(杭州)信息技术有限公司 Model training method, device and equipment based on pipeline parallelism
CN113177632B (en) * 2021-04-13 2022-10-14 支付宝(杭州)信息技术有限公司 Model training method, device and equipment based on pipeline parallelism
CN114565105A (en) * 2022-03-02 2022-05-31 北京百度网讯科技有限公司 Data processing method and deep learning model training method and device
CN115061825A (en) * 2022-08-09 2022-09-16 深圳致星科技有限公司 Heterogeneous computing system and method for private computing, private data and federal learning
CN115061825B (en) * 2022-08-09 2022-11-18 深圳致星科技有限公司 Heterogeneous computing system and method for private computing, private data and federal learning
CN116991560A (en) * 2023-09-25 2023-11-03 粤港澳大湾区数字经济研究院(福田) Parallel scheduling method, device, equipment and storage medium for language model
CN116991560B (en) * 2023-09-25 2024-04-16 粤港澳大湾区数字经济研究院(福田) Parallel scheduling method, device, equipment and storage medium for language model

Also Published As

Publication number Publication date
CN112541584B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN112541584B (en) Deep neural network model parallel mode selection method
CN111242282B (en) Deep learning model training acceleration method based on end edge cloud cooperation
CN105550323B (en) Load balance prediction method and prediction analyzer for distributed database
CN108122027A (en) A kind of training method of neural network model, device and chip
CN111064633B (en) Cloud-edge cooperative power information communication equipment automated testing resource allocation method
CN108122032A (en) A kind of neural network model training method, device, chip and system
CN107368891A (en) A kind of compression method and device of deep learning model
CN110516325A (en) A kind of CAE automation simulation analysis method and system
CN106095812A (en) Intelligent test paper generation method based on similarity measurement
CN110705029A (en) Flow field prediction method of oscillating flapping wing energy acquisition system based on transfer learning
CN109657794B (en) Instruction queue-based distributed deep neural network performance modeling method
CN109818792B (en) Controller based on second-order linear system time-varying coupling complex dynamic network model
CN109815855B (en) Electronic equipment automatic test method and system based on machine learning
CN106250933A (en) Method, system and the FPGA processor of data clusters based on FPGA
CN106204597A (en) A kind of based on from the VS dividing method walking the Weakly supervised study of formula
CN113449878B (en) Data distributed incremental learning method, system, equipment and storage medium
CN101399708A (en) Method and device for establishing network performance model
CN112948123B (en) Spark-based grid hydrological model distributed computing method
CN111695701B (en) System for realizing data set construction processing based on federal learning and construction generation method thereof
CN117201308A (en) Network resource allocation method, system, storage medium and electronic equipment
CN110610140A (en) Training method, device and equipment of face recognition model and readable storage medium
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN115293367A (en) Mixed federal learning method of scheduling model under small sample unbalanced data constraint
CN114238106A (en) Test time prediction method and device, electronic device and storage medium
CN108074240A (en) Recognition methods, identification device, computer readable storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant