CN112541584A - Deep neural network model parallel mode selection method - Google Patents
- Publication number
- CN112541584A (application CN201910897718.8A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- data
- network model
- model
- computing nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a deep neural network model parallel mode selection method comprising the following steps: S1, calculate the total data volume of the whole neural network model; S2, judge whether the total data volume obtained in S1 exceeds the total available memory of a single computing node used for training; if not, execute S3; if so, execute S4; S3, select the data parallel mode; S4, segment the network layers of the neural network model and obtain, from the segmentation result, the number of computing nodes the model requires; if the number of computing nodes in the input parameters is less than twice the number of nodes required by the model segmentation, execute S5; otherwise execute S6; S5, select the model parallel mode; S6, select a hybrid parallel mode combining data parallelism and model parallelism. By collecting and analyzing information on the model parameters, hyper-parameters and data volume, the invention automatically selects the distributed parallel mode and ensures high parallel performance.
Description
Technical Field
The invention relates to a method for selecting the parallel mode of a deep neural network model, and belongs to the technical field of deep learning.
Background
In distributed training under the data parallel mode, each computing node stores a replica of the model and processes a different part of the data set; this training method requires combining the results of the worker nodes and synchronizing model parameters between nodes. In distributed training under the model parallel mode, different network layers of the neural network model, or different parameters within the same layer, are assigned to different computing nodes, each of which is responsible for training a different part of the model. In the hybrid parallel mode, both model parallelism and data parallelism are present within the set of computing nodes performing distributed training; for example, model parallelism may be used within a group of nodes and data parallelism across the groups.
In recent years, with the development of deep learning, a wide variety of deep neural network models has emerged, and network depth has grown from a few layers to hundreds of layers. Although deeper networks greatly improve accuracy, they contain ever more parameters and take ever longer to train, which has become a major obstacle to the rapid development and wide application of deep learning. Training very large neural networks on very large data sets is infeasible on a single computing node and requires distributed parallel scaling. The main distributed scaling modes for deep learning are data parallelism, model parallelism and hybrid parallelism; selecting a suitable mode among them, however, remains an open problem for those skilled in the art.
Disclosure of Invention
The invention aims to provide a deep neural network model parallel mode selection method that automatically selects the distributed parallel mode by collecting and analyzing information on model parameters, hyper-parameters and data volume, ensuring high parallel performance.
In order to achieve this purpose, the invention adopts the following technical scheme: a deep neural network model parallel mode selection method in which the input parameters of the artificial intelligence training task comprise a neural network model file, the number of computing nodes and the size of a single training sample, the neural network model file containing the batch_size, the number of model parameters and the data type;
the parallel mode selection method comprises the following steps:
S1, a distributed scaling component in the artificial intelligence framework calculates the parameter data volume of the whole neural network model from the number of parameters and the data type, and calculates the input data volume from the single-sample size in the input parameters and the batch_size in the model file; the sum of the parameter data volume and the input data volume is the total data volume of the neural network model;
S2, the distributed scaling component judges whether the total data volume obtained in S1 exceeds the total available memory of a single computing node used for training; if not, S3 is executed; if so, S4 is executed;
S3, the data parallel mode is selected: according to the number of computing nodes in the input parameters, the distributed scaling component divides the training samples into as many parts as there are computing nodes; each node trains on its own sample data, gradient data is exchanged between nodes, and training is completed jointly;
S4, the network layers of the neural network model are segmented into several parts, the model parameters of each part being placed on one computing node, and the number of computing nodes the model requires is obtained from the segmentation result; if the number of computing nodes in the input parameters is less than twice the number of nodes required by the segmentation, S5 is executed, otherwise S6. The segmentation proceeds as follows: starting from the first network layer, the largest possible run of consecutive layers whose combined data volume does not exceed a node's total available memory is taken as one part; if the data volume of a single network layer exceeds a node's total available memory, that layer itself is divided into several parts according to the available memory;
S5, the model parallel mode is selected: the distributed scaling component assigns the parameters of each segmented part of the network layers to a different computing node;
S6, a hybrid parallel mode combining data parallelism and model parallelism is selected: all computing nodes are grouped, each group containing as many nodes as the model segmentation requires; model parallelism is used within each group, with intermediate data transmitted between the nodes of a group, and data parallelism is used across the node groups, with gradient data transmitted between groups.
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The deep neural network model parallel mode selection method solves the problem of automatically choosing a parallel mode when different types of neural network models are scaled out for distributed training, as well as the problem of network-layer segmentation under model parallelism, without manual intervention by the user; by collecting and analyzing information on model parameters, hyper-parameters and data volume, it automatically selects the distributed parallel mode and ensures high parallel performance.
Drawings
FIG. 1 is a schematic diagram of a data parallel mode;
FIG. 2 is a schematic diagram of the model parallel mode;
FIG. 3 is a schematic diagram of a hybrid parallel mode;
FIG. 4 is a flow chart of the deep neural network model parallel mode selection method of the present invention.
Detailed Description
Embodiment: a deep neural network model parallel mode selection method in which the input parameters of the artificial intelligence training task comprise a neural network model file, the number of computing nodes and the size of a single training sample, the neural network model file containing the batch_size, the number of model parameters and the data type;
the parallel mode selection method comprises the following steps:
S1, a distributed scaling component in the artificial intelligence framework calculates the parameter data volume of the whole neural network model from the number of parameters and the data type, and calculates the input data volume from the single-sample size in the input parameters and the batch_size in the model file; the sum of the parameter data volume and the input data volume is the total data volume of the neural network model;
S2, the distributed scaling component judges whether the total data volume obtained in S1 exceeds the total available memory of a single computing node used for training (the node's total available memory can be obtained through a system interface); if not, S3 is executed; if so, S4 is executed;
S3, the data parallel mode is selected: according to the number of computing nodes in the input parameters, the distributed scaling component divides the training samples into as many parts as there are computing nodes; each node trains on its own sample data, gradient data is exchanged between nodes, and training is completed jointly;
S4, the network layers of the neural network model are segmented into several parts, the model parameters of each part being placed on one computing node, and the number of computing nodes the model requires is obtained from the segmentation result; if the number of computing nodes in the input parameters is less than twice the number of nodes required by the segmentation, S5 is executed, otherwise S6. The segmentation proceeds as follows: starting from the first network layer, the largest possible run of consecutive layers whose combined data volume does not exceed a node's total available memory is taken as one part; if the data volume of a single network layer exceeds a node's total available memory, that layer itself is divided into several parts according to the available memory;
S5, the model parallel mode is selected: the distributed scaling component assigns the parameters of each segmented part of the network layers to a different computing node;
S6, a hybrid parallel mode combining data parallelism and model parallelism is selected: all computing nodes are grouped, each group containing as many nodes as the model segmentation requires; model parallelism is used within each group, with intermediate data transmitted between the nodes of a group, and data parallelism is used across the node groups, with gradient data transmitted between groups.
The embodiment is explained further below:
The adaptive parallel mode selection method provided by the invention solves the problem of choosing a parallel mode when a deep neural network model is scaled out for distributed training: according to the type of the network model, the parameter size, the training data volume and the batch_size, it adaptively selects a suitable mode among data parallelism, model parallelism and hybrid parallelism to obtain better acceleration, and it also provides a network-layer segmentation method for the model parallel and hybrid parallel cases that distributes the model parameters over different nodes.
First, the parameter volume of the whole neural network model is measured. If the parameter data volume exceeds the total available memory of a single computing node, the model parallel mode is selected and the model is segmented by network layer; in this embodiment, the layers are divided into several parts of approximately equal execution time, according to the execution time of each layer.
Second, if the model parameters do not exceed the available memory of a single computing node, the data parallel mode is selected first, and the data volume to be distributed is then computed from the batch_size. If the sum of the data volume and the model parameters exceeds the available memory of a single node, the parameters of the layer with the most parameters are divided into two parts assigned to two computing nodes; those two nodes use hybrid parallelism combining data parallel and model parallel, and data parallelism is used between the node groups formed by each pair of nodes. If the parameter and data volumes still exceed node memory after the layer with the most parameters has been split, the remaining layers with larger parameter counts continue to be split.
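The fallback just described, repeatedly splitting the layer with the most parameters until every part fits, might be sketched as below. This is a simplified reading of the embodiment: it assumes each part must hold one (possibly split) layer plus its share of the batch data, and all names are hypothetical.

```python
# Hypothetical sketch of the iterative largest-layer split; a simplified
# interpretation of the embodiment, with assumed names and byte units.

def split_until_fit(layer_bytes, data_bytes_per_part, node_memory):
    """Split the largest layer in two until each part fits on a node."""
    # Assumed precondition: the per-part batch data alone fits in memory.
    assert data_bytes_per_part < node_memory
    layers = list(layer_bytes)
    while max(layers) + data_bytes_per_part > node_memory:
        i = layers.index(max(layers))
        half = layers[i] // 2
        layers[i:i + 1] = [layers[i] - half, half]  # split largest in two
    return layers
```

With layers of 10 and 4 units, 3 units of batch data per part, and 8 units of node memory, the 10-unit layer is split once into two 5-unit halves, after which every part fits.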
The method for selecting the parallel mode of a deep neural network model described above solves the problem of automatically choosing a parallel mode when different types of neural network models are scaled out, as well as the problem of network-layer segmentation under model parallelism, without manual intervention by the user; by collecting and analyzing information on model parameters, hyper-parameters and data volume, it automatically selects the distributed parallel mode and ensures high parallel performance.
To facilitate a better understanding of the invention, the terms used herein are briefly explained as follows:
Data parallel: different computing nodes hold copies of the same model, each node is assigned different data, and the computing results of all nodes are then combined in some manner.
Model parallel: different computing nodes are responsible for different parts of the network model and jointly train the same batch of data; intermediate data produced during computation must be transmitted between nodes.
batch_size: the number of samples selected for one training step of a deep learning model.
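As a quick illustration of the total-data-volume calculation in step S1, consider a 25-million-parameter float32 model trained with batch_size 32 and 0.5 MiB samples. The numbers are chosen here for illustration only and do not come from the patent.

```python
# Illustrative S1 arithmetic; the figures are made up for this example.

num_params = 25_000_000
param_bytes = num_params * 4             # float32 = 4 bytes -> 100 MB
input_bytes = 32 * 512 * 1024            # batch_size * sample size -> ~16.8 MB
total_bytes = param_bytes + input_bytes  # compared against node memory in S2
```

If a computing node has, say, 64 MB of available memory, this total exceeds it and the method proceeds to the segmentation of step S4; with 256 MB available, the data parallel mode of step S3 is chosen.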
The above embodiment merely illustrates the technical ideas and features of the present invention; its purpose is to enable those skilled in the art to understand and implement the invention, not to limit its scope of protection. All equivalent changes and modifications made according to the spirit of the present invention shall fall within its scope of protection.
Claims (1)
1. A deep neural network model parallel mode selection method, characterized in that: the input parameters of the artificial intelligence training task comprise a neural network model file, the number of computing nodes and the size of a single training sample, the neural network model file containing the batch_size, the number of model parameters and the data type;
the parallel mode selection method comprises the following steps:
S1, a distributed scaling component in the artificial intelligence framework calculates the parameter data volume of the whole neural network model from the number of parameters and the data type, and calculates the input data volume from the single-sample size in the input parameters and the batch_size in the model file; the sum of the parameter data volume and the input data volume is the total data volume of the neural network model;
S2, the distributed scaling component judges whether the total data volume obtained in S1 exceeds the total available memory of a single computing node used for training; if not, S3 is executed; if so, S4 is executed;
S3, the data parallel mode is selected: according to the number of computing nodes in the input parameters, the distributed scaling component divides the training samples into as many parts as there are computing nodes; each node trains on its own sample data, gradient data is exchanged between nodes, and training is completed jointly;
S4, the network layers of the neural network model are segmented into several parts, the model parameters of each part being placed on one computing node, and the number of computing nodes the model requires is obtained from the segmentation result; if the number of computing nodes in the input parameters is less than twice the number of nodes required by the segmentation, S5 is executed, otherwise S6. The segmentation proceeds as follows: starting from the first network layer, the largest possible run of consecutive layers whose combined data volume does not exceed a node's total available memory is taken as one part; if the data volume of a single network layer exceeds a node's total available memory, that layer itself is divided into several parts according to the available memory;
S5, the model parallel mode is selected: the distributed scaling component assigns the parameters of each segmented part of the network layers to a different computing node;
S6, a hybrid parallel mode combining data parallelism and model parallelism is selected: all computing nodes are grouped, each group containing as many nodes as the model segmentation requires; model parallelism is used within each group, with intermediate data transmitted between the nodes of a group, and data parallelism is used across the node groups, with gradient data transmitted between groups.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910897718.8A CN112541584B (en) | 2019-09-23 | 2019-09-23 | Deep neural network model parallel mode selection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910897718.8A CN112541584B (en) | 2019-09-23 | 2019-09-23 | Deep neural network model parallel mode selection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112541584A true CN112541584A (en) | 2021-03-23 |
CN112541584B CN112541584B (en) | 2022-10-04 |
Family
ID=75012944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910897718.8A Active CN112541584B (en) | 2019-09-23 | 2019-09-23 | Deep neural network model parallel mode selection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112541584B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113177632A (en) * | 2021-04-13 | 2021-07-27 | 支付宝(杭州)信息技术有限公司 | Model training method, device and equipment based on pipeline parallelism |
CN114565105A (en) * | 2022-03-02 | 2022-05-31 | 北京百度网讯科技有限公司 | Data processing method and deep learning model training method and device |
CN115061825A (en) * | 2022-08-09 | 2022-09-16 | 深圳致星科技有限公司 | Heterogeneous computing system and method for private computing, private data and federal learning |
CN116991560A (en) * | 2023-09-25 | 2023-11-03 | 粤港澳大湾区数字经济研究院(福田) | Parallel scheduling method, device, equipment and storage medium for language model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109032671A (en) * | 2018-06-25 | 2018-12-18 | 电子科技大学 | A kind of distributed deep learning method and system based on data parallel strategy |
US20190188570A1 (en) * | 2017-12-20 | 2019-06-20 | Fujitsu Limited | Methods and apparatus for model parallelism in artificial neural networks |
- 2019-09-23: application CN201910897718.8A filed (CN); granted as CN112541584B, status active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190188570A1 (en) * | 2017-12-20 | 2019-06-20 | Fujitsu Limited | Methods and apparatus for model parallelism in artificial neural networks |
CN109032671A (en) * | 2018-06-25 | 2018-12-18 | 电子科技大学 | A kind of distributed deep learning method and system based on data parallel strategy |
Non-Patent Citations (2)
Title |
---|
HISAO ISHIBUCHI et al.: "Parallel Distributed Hybrid Fuzzy GBML Models With Rule Set Migration and Training Data Rotation", IEEE Transactions on Fuzzy Systems |
YANG Yuanfei et al.: "Research on Deep Convolutional Network Design Based on Parallelism and Slicing", Microelectronics & Computer |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113177632A (en) * | 2021-04-13 | 2021-07-27 | 支付宝(杭州)信息技术有限公司 | Model training method, device and equipment based on pipeline parallelism |
CN113177632B (en) * | 2021-04-13 | 2022-10-14 | 支付宝(杭州)信息技术有限公司 | Model training method, device and equipment based on pipeline parallelism |
CN114565105A (en) * | 2022-03-02 | 2022-05-31 | 北京百度网讯科技有限公司 | Data processing method and deep learning model training method and device |
CN115061825A (en) * | 2022-08-09 | 2022-09-16 | 深圳致星科技有限公司 | Heterogeneous computing system and method for private computing, private data and federal learning |
CN115061825B (en) * | 2022-08-09 | 2022-11-18 | 深圳致星科技有限公司 | Heterogeneous computing system and method for private computing, private data and federal learning |
CN116991560A (en) * | 2023-09-25 | 2023-11-03 | 粤港澳大湾区数字经济研究院(福田) | Parallel scheduling method, device, equipment and storage medium for language model |
CN116991560B (en) * | 2023-09-25 | 2024-04-16 | 粤港澳大湾区数字经济研究院(福田) | Parallel scheduling method, device, equipment and storage medium for language model |
Also Published As
Publication number | Publication date |
---|---|
CN112541584B (en) | 2022-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112541584B (en) | Deep neural network model parallel mode selection method | |
CN111242282B (en) | Deep learning model training acceleration method based on end edge cloud cooperation | |
CN105550323B (en) | Load balance prediction method and prediction analyzer for distributed database | |
CN108122027A (en) | A kind of training method of neural network model, device and chip | |
CN111064633B (en) | Cloud-edge cooperative power information communication equipment automated testing resource allocation method | |
CN108122032A (en) | A kind of neural network model training method, device, chip and system | |
CN107368891A (en) | A kind of compression method and device of deep learning model | |
CN110516325A (en) | A kind of CAE automation simulation analysis method and system | |
CN106095812A (en) | Intelligent test paper generation method based on similarity measurement | |
CN110705029A (en) | Flow field prediction method of oscillating flapping wing energy acquisition system based on transfer learning | |
CN109657794B (en) | Instruction queue-based distributed deep neural network performance modeling method | |
CN109818792B (en) | Controller based on second-order linear system time-varying coupling complex dynamic network model | |
CN109815855B (en) | Electronic equipment automatic test method and system based on machine learning | |
CN106250933A (en) | Method, system and the FPGA processor of data clusters based on FPGA | |
CN106204597A (en) | A kind of based on from the VS dividing method walking the Weakly supervised study of formula | |
CN113449878B (en) | Data distributed incremental learning method, system, equipment and storage medium | |
CN101399708A (en) | Method and device for establishing network performance model | |
CN112948123B (en) | Spark-based grid hydrological model distributed computing method | |
CN111695701B (en) | System for realizing data set construction processing based on federal learning and construction generation method thereof | |
CN117201308A (en) | Network resource allocation method, system, storage medium and electronic equipment | |
CN110610140A (en) | Training method, device and equipment of face recognition model and readable storage medium | |
CN113516163B (en) | Vehicle classification model compression method, device and storage medium based on network pruning | |
CN115293367A (en) | Mixed federal learning method of scheduling model under small sample unbalanced data constraint | |
CN114238106A (en) | Test time prediction method and device, electronic device and storage medium | |
CN108074240A (en) | Recognition methods, identification device, computer readable storage medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||