CN110378472A - Data parallel training method, apparatus and device for a deep neural network model - Google Patents
Data parallel training method, apparatus and device for a deep neural network model
- Publication number
- CN110378472A (application CN201910672272.9A)
- Authority
- CN
- China
- Prior art keywords
- current
- processor
- data
- training
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model. The method comprises: a first processor sending current model parameters and corresponding current training data to each second processor, so that each second processor trains a preset deep neural network model according to the current model parameters and its corresponding current training data. The invention updates the current model parameters according to a preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients; moreover, there is no need to wait for all second processors to finish a round of training: as soon as a current second processor finishes training, it can immediately be given the training data of the next batch and start the next round of training. The overall training efficiency of data parallel training of the DNN model is thereby improved, the total training time required for DNN model training is reduced, and the user experience is improved.
Description
Technical field
The present invention relates to the field of deep learning model training, and in particular to a data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model.
Background Art
With the development of science and technology in modern society, deep neural networks (Deep Neural Network, DNN) have found wide application, including image and video classification, speech recognition and language translation. However, as DNNs are developed and used ever more widely, model sizes keep growing, for example to hundreds of layers and a total of ten to twenty million parameters. This growth makes efficient model training all the more important. Given fixed hardware resources, how to train a model to convergence in a shorter time while reaching higher accuracy has always been a widely studied problem.
A DNN model is composed of a series of layers of different types (such as convolutional layers, fully connected layers, etc.) and is usually trained on a dataset of labeled images. Training consists of multiple epochs, where an epoch is one pass over all images in the dataset. The goal of DNN model training is to obtain a high-accuracy model in the shortest possible time, and the total training time required for a DNN model to reach the required accuracy depends on both hardware efficiency and statistical efficiency: hardware efficiency corresponds to the time required to complete a single training epoch, while statistical efficiency corresponds to the number of epochs required to reach the desired accuracy. At present, DNN models are often trained with data parallel methods. A data parallel method partitions the input data to be trained; multiple GPUs train on multiple batches of data simultaneously, the model running on each GPU is based on the same neural network with the same network structure, and the GPUs share the model parameters.
In the prior art, data parallelism is further divided into two methods: synchronous data parallelism and asynchronous data parallelism. In the synchronous data parallel method, after all GPUs have computed the gradients of their batch of data, the multiple gradients are combined statistically to update the shared model parameters (weight parameters), which is similar to using a larger batch, as shown in Fig. 1. Although this method reduces the staleness of the weight parameters used to compute the gradients, so that the model can finally reach a higher convergence accuracy and the statistical efficiency is good, when the GPUs train the model at inconsistent speeds the faster GPUs have to wait for the slower ones, and the weight parameters are only updated once all GPUs have finished computing. This significantly lowers the hardware efficiency of training, i.e., the time to train a complete epoch becomes longer. The asynchronous data parallel method does not wait for all GPUs to complete a round of training: whichever GPU finishes training immediately applies its gradient to update the shared model parameters, as shown in Fig. 2, which reduces the GPU idle waiting time and thus improves the hardware efficiency of training; however, because the weight parameters used during asynchronous parallel training are stale, its statistical efficiency is lower.
Therefore, how to improve the overall training efficiency of data parallel training of DNN models, reduce the total training time required for DNN model training and improve the user experience is a problem that urgently needs to be solved.
Summary of the invention
The object of the present invention is to provide a data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model, so as to improve the overall training efficiency of data parallel training of DNN models, reduce the total training time required for DNN model training and improve the user experience.
In order to solve the above technical problem, the present invention provides a data parallel training method for a deep neural network model, comprising:
a first processor acquiring current model parameters and training data of a preset deep neural network model;
obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
when a preset termination condition has not been reached, receiving and storing gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
if not, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
if so, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and executing the step of taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
when the preset termination condition is reached, determining the parallel training result according to the current model parameters.
Optionally, the preset value is less than or equal to the number of second processors.
Optionally, before sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
Optionally, updating the current model parameters according to the preset number of stored gradient data comprises:
calculating the mean of the preset number of stored gradient data;
updating the current model parameters according to the mean.
Optionally, after sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
the current second processor performing forward propagation of the preset deep neural network model according to the received current model parameters and current training data to obtain a loss value;
performing back propagation of the preset deep neural network model according to the loss value to obtain the gradient data, and sending the gradient data to the first processor.
The present invention also provides a data parallel training apparatus for a deep neural network model, comprising:
an acquisition module, for acquiring current model parameters and training data of a preset deep neural network model;
a first sending module, for obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
a storage and judgment module, for receiving and storing, when a preset termination condition has not been reached, gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
a second sending module, for, if the preset value has not been reached, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
an update module, for, if the preset value has been reached, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and sending a start signal to the second sending module;
a determination module, for determining, when the preset termination condition is reached, the parallel training result according to the current model parameters.
Optionally, the apparatus further comprises:
a storage allocation module, for allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
Optionally, the update module comprises:
a mean calculation unit, for calculating the mean of the preset number of stored gradient data;
an update unit, for updating the current model parameters according to the mean.
The present invention also provides a data parallel training device for a deep neural network model, comprising:
a memory, for storing a computer program;
a processor, for implementing, when executing the computer program, the steps of the data parallel training method for a deep neural network model as described above.
In addition, the present invention also provides a computer readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the steps of the data parallel training method for a deep neural network model as described above are implemented.
In the data parallel training method for a deep neural network model provided by the present invention, the current model parameters are updated according to a preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients. Moreover, there is no need to wait for all second processors to complete a round of training: as soon as a current second processor finishes training, it can immediately be given the training data of the next batch for the next round of training, which reduces the idle waiting time of the second processors and improves the hardware efficiency of training. The overall training efficiency of data parallel training of the DNN model is thereby improved, the total training time required for DNN model training is reduced, and the user experience is improved. In addition, the present invention also provides a data parallel training apparatus, a device and a computer readable storage medium for a deep neural network model, which have the same beneficial effects.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart of synchronous data parallel training of a DNN model in the prior art;
Fig. 2 is a schematic flow chart of asynchronous data parallel training of a DNN model in the prior art;
Fig. 3 is a flow chart of a data parallel training method for a deep neural network model provided by an embodiment of the present invention;
Fig. 4 is a schematic flow chart of another data parallel training method for a deep neural network model provided by an embodiment of the present invention;
Fig. 5 is a structural block diagram of a data parallel training apparatus for a deep neural network model provided by an embodiment of the present invention.
Detailed Description of the Embodiments
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Fig. 3, Fig. 3 is a flow chart of a data parallel training method for a deep neural network model provided by an embodiment of the present invention. The method may comprise:
Step 101: a first processor acquires current model parameters and training data of a preset deep neural network model.
The first processor in this step may be a processor, such as a CPU (central processing unit), that controls the second processors in the training cluster. The preset deep neural network model in this step may be the deep neural network model that needs to be trained; the specific content of the preset deep neural network model may be set by the designer or the user according to practical scenarios and user requirements, and this embodiment imposes no restriction on it.
Correspondingly, the training data in this step may be all of the training data required to train the preset deep neural network model, such as a set of training pictures. The specific manner in which the first processor obtains the training data in this step may be the same as or similar to the training data acquisition manners in the prior art, and this embodiment imposes no restriction on it.
It can be understood that the current model parameters in this embodiment may be the model parameters (weight parameters) currently used to train the preset deep neural network model as obtained by the first processor, i.e., the latest model parameters. That is to say, the current model parameters in this step may be the original model parameters obtained in the initial state, before the first processor has used the second processors to train the preset deep neural network model; for example, in this step the first processor may perform model parameter initialization to obtain the original model parameters (current model parameters).
Step 102: obtain, from the training data, the current training data corresponding to each second processor, and send the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2.
The second processors in this step may be the processors in the training cluster, such as GPUs (graphics processing units), that train the preset deep neural network model. The specific number of second processors in this step may be set by the designer, as long as the number of second processors is greater than or equal to 2; this embodiment imposes no restriction on it.
It can be understood that the purpose of this step may be that, in the initial state, the first processor sends the original model parameters (current model parameters) and the corresponding current training data to each second processor, so that multiple second processors can train the preset deep neural network model simultaneously, each using the original model parameters and its own current training data.
Correspondingly, the current training data corresponding to each second processor in this step may be the current round of training data (a part of the training data) to be processed by that second processor, obtained by the first processor from the whole training data, i.e., the training data of one batch. That is, this step may also include the first processor obtaining the current training data corresponding to each second processor. The specific manner in which the first processor obtains the current training data corresponding to each second processor may be set by the designer or the user, for example in the same or a similar manner as training data selection in the prior art; this embodiment imposes no restriction on it.
Specifically, after this step the present embodiment may further include the step of each second processor training the preset deep neural network model according to the current model parameters and its corresponding current training data to obtain corresponding gradient data, i.e., each second processor starts its model training iterations. The specific manner in which each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data may be set by the designer, for example in a manner similar to deep neural network model training in the prior art. As shown in Fig. 4, the computation on each GPU (second processor) may include: ① checking for and obtaining the updated model parameters (current model parameters); ② obtaining the training data of one batch (current training data) from the CPU (first processor) as the input of the network model; ③ performing forward propagation to compute the predicted values and obtaining the loss (loss value) from the predicted values and the labels in the training data; ④ performing back propagation to compute the gradients, obtaining the computation result (gradient data) used to update the parameters (current model parameters); the computation result is sent to the CPU memory for storage, and the GPU immediately returns to ① for the next round of computation, instead of returning to ① only after all GPUs have completed the current round of computation, as synchronous data parallelism requires.
That is, the current second processor performs forward propagation of the preset deep neural network model according to the received current model parameters and current training data to obtain a loss value; according to the loss value, it performs back propagation of the preset deep neural network model to obtain the gradient data, and sends the gradient data to the first processor. The current second processor may be any one of the multiple second processors.
It should be noted that, in order to guarantee that in the initial state all second processors receive the current model parameters and the corresponding current training data and start training the preset deep neural network model, in this step the current model parameters and the corresponding current training data are sent to each second processor. Alternatively, in this step the current model parameters and the corresponding current training data may be sent to only some of the second processors; for example, when the number (preset value) of gradient data used to update the current model parameters is smaller than the number of second processors, i.e., when m in Fig. 4 is smaller than n, the current model parameters and the corresponding current training data may first be sent to a subset of second processors whose size is equal to or greater than the preset value. This embodiment imposes no restriction on it.
Step 103: when a preset termination condition has not been reached, receive and store the gradient data returned by the current second processor, and judge whether the number of stored gradient data has reached the preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors.
It can be understood that steps 103 to 105 may be the training control steps performed by the first processor when it determines that the preset termination condition has not been reached, i.e., when training of the model parameters of the preset deep neural network model is not yet complete. The specific setting and determination method of the preset termination condition in this step may be chosen by the designer or the user according to practical scenarios and user requirements, for example in the same or a similar manner as termination conditions are set for deep neural network model training in the prior art; this embodiment imposes no restriction on it.
Correspondingly, the purpose of this step may be that the first processor receives and stores the gradient data returned by the current second processor, i.e., stores and counts the gradient data returned by each second processor, so that when the number of stored gradient data reaches the preset value (parallelism parameter), the model parameters are updated in step 105. That is, the present embodiment adopts a first-come-first-stored principle and does not require the preset number of stored gradient data to come from different second processors.
Specifically, the specific value of the preset value in this step may be set by the designer or the user; for example, the preset value may be set to be less than or equal to the number of second processors, as long as the preset value is greater than or equal to 2. This embodiment imposes no restriction on it.
Further, for the convenience of the user, in the present embodiment the first processor may set the preset value automatically, i.e., the first processor may set the preset value according to the number of second processors, for example by directly setting the preset value to the number of second processors.
It should be noted that the specific manner in which the first processor stores the gradient data returned by the current second processor in this step, i.e., the storage location of the gradient data returned by each second processor, may be set by the designer. For example, the first processor may cache the gradient data returned by the current second processor in its own memory, i.e., the first processor may allocate in memory a preset storage space for storing the preset number of gradient data; the gradient data returned by the current second processor may also be stored in another memory. This embodiment imposes no restriction on it.
Correspondingly, if the first processor caches the gradient data returned by the current second processor in memory, the method provided by the present embodiment may also include the step of allocating storage space in the memory of the first processor; for example, the first processor allocates a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model, where the preset storage space is used to store the preset number of gradient data. As shown in Fig. 4, before initializing the model parameters, the CPU (first processor) may first compute, from the previously set parallelism parameter m (preset value) and the size of the model parameters, the size of the storage space (cache space) that needs to be allocated in CPU memory, and then allocate that storage space for caching m gradient data.
Step 104: take the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and send the current model parameters and the current training data corresponding to the current second processor to the current second processor.
It can be understood that the purpose of this step may be that, when the number of stored gradient data has not reached the preset value, the first processor sends the training data to be processed by the current second processor in the next round (the next training data) together with the current model parameters to the current second processor, so that the current second processor can continue with the next round of training, avoiding the waiting that occurs in existing synchronous data parallel training.
The next training data in this step may be the training data (a part of the training data) to be processed by the current second processor in the next round, obtained by the first processor from the whole training data. Correspondingly, this step may also include the step of the first processor obtaining the next training data corresponding to the current second processor.
Specifically, after this step, if the preset termination condition has not been reached, the method may return to step 103 to wait for and receive the gradient data returned by the next current second processor.
Step 105: update the current model parameters according to the preset number of stored gradient data, delete the stored gradient data, and proceed to step 104.
It can be understood that the purpose of this step may be that, when the number of stored gradient data reaches the preset value, the first processor updates the current model parameters using the preset number of stored gradient data and then proceeds to step 104, so that the current second processor can continue the next round of training with the updated current model parameters and the next training data.
Specifically, the specific manner in which the first processor updates the current model parameters using the preset number of stored gradient data in this step may be set by the designer, for example in the same or a similar manner as model parameter update methods in the prior art; for instance, the first processor may first calculate the mean of the preset number of stored gradient data and then update the current model parameters according to the mean. As long as the current model parameters can be updated using the preset number of gradient data, this embodiment imposes no restriction on it.
It should be noted that, after the first processor has updated the current model parameters in this step, the stored gradient data may be deleted directly; for example, the CPU in Fig. 4 clears the cache space used to cache the m (preset value) gradient data, so that the gradient data needed for the next update of the current model parameters can subsequently be stored. The stored gradient data may also be moved to another memory; for example, when the training process of the preset deep neural network model needs to be analyzed, the CPU in Fig. 4 may first back up the m cached gradient data to another memory and then delete the m gradient data cached in the cache space. This embodiment imposes no restriction on it.
Step 106: when the preset termination condition is reached, determine the parallel training result according to the current model parameters.
It can be understood that the purpose of this step may be that, when the first processor determines that training of the preset deep neural network model has reached the preset termination condition, it uses the current model parameters to determine the model parameters finally obtained by the training (the training result).
Specifically, the specific manner of determining the parallel training result according to the current model parameters in this step may be set by the designer or the user, for example in correspondence with the setting of the preset termination condition. For instance, the first processor may directly take the current model parameters as the parallel training result; the first processor may also update the current model parameters according to the stored gradient data, whose number may be equal to or less than the preset value, and take the updated current model parameters as the parallel training result. This embodiment imposes no restriction on it.
In the present embodiment, the current model parameters are updated according to the preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients. Moreover, there is no need to wait for all second processors to complete a round of training: as soon as the current second processor finishes training, it can immediately be given the training data of the next batch for the next round of training, which reduces the idle waiting time of the second processors and improves the hardware efficiency of training. The overall training efficiency of data parallel training of the DNN model is thereby improved, the total training time required for DNN model training is reduced, and the user experience is improved.
Referring to Fig. 5, Fig. 5 is a structural block diagram of a data parallel training apparatus for a deep neural network model provided by an embodiment of the present invention. The apparatus may comprise:
an acquisition module 10, for acquiring current model parameters and training data of a preset deep neural network model;
a first sending module 20, for obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
a storage and judgment module 30, for receiving and storing, when a preset termination condition has not been reached, the gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
a second sending module 40, for, if the preset value has not been reached, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
an update module 50, for, if the preset value has been reached, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and sending a start signal to the second sending module;
a determination module 60, for determining, when the preset termination condition is reached, the parallel training result according to the current model parameters.
Optionally, the apparatus may further comprise:
a storage allocation module, for allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
Optionally, the update module 50 may comprise:
a mean calculation unit, for calculating the mean of the preset number of stored gradient data;
an update unit, for updating the current model parameters according to the mean.
In the present embodiment, the update module 50 updates the current model parameters according to the preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients. Moreover, there is no need to wait for all second processors to complete a round of training: as soon as the current second processor finishes training, it can immediately be given the training data of the next batch for the next round of training, which reduces the idle waiting time of the second processors and improves the hardware efficiency of training, thereby improving the overall training efficiency of data parallel training of the DNN model, reducing the total training time required for DNN model training and improving the user experience.
The embodiment of the present invention also provides a data parallel training device for a deep neural network model, comprising: a memory, for storing a computer program; and a processor, for implementing, when executing the computer program, the steps of the data parallel training method for a deep neural network model provided by the above embodiments.
In addition, the embodiment of the present invention also provides a computer readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the steps of the data parallel training method for a deep neural network model provided by the above embodiments are implemented.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. For the apparatus, device and computer readable storage medium disclosed in the embodiments, since they correspond to the method disclosed in the embodiments, the description is relatively brief; for relevant points, reference may be made to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
1. A data parallel training method for a deep neural network model, characterized by comprising:
a first processor acquiring current model parameters and training data of a preset deep neural network model;
obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
when a preset termination condition has not been reached, receiving and storing gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
if not, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
if so, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and executing the step of taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
when the preset termination condition is reached, determining the parallel training result according to the current model parameters.
2. The data parallel training method for a deep neural network model according to claim 1, characterized in that the preset value is less than or equal to the number of second processors.
3. The data parallel training method for a deep neural network model according to claim 1, characterized in that, before sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
4. The data parallel training method for a deep neural network model according to claim 1, characterized in that updating the current model parameters according to the preset number of stored gradient data comprises:
calculating the mean of the preset number of stored gradient data;
updating the current model parameters according to the mean.
5. The data parallel training method for a deep neural network model according to any one of claims 1 to 4, characterized in that, after sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
the current second processor performing forward propagation of the preset deep neural network model according to the received current model parameters and current training data to obtain a loss value;
performing back propagation of the preset deep neural network model according to the loss value to obtain the gradient data, and sending the gradient data to the first processor.
6. A data parallel training apparatus for a deep neural network model, characterized by comprising:
an acquisition module, for acquiring current model parameters and training data of a preset deep neural network model;
a first sending module, for obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
a storage and judgment module, for receiving and storing, when a preset termination condition has not been reached, the gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
a second sending module, for, if the preset value has not been reached, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
an update module, for, if the preset value has been reached, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and sending a start signal to the second sending module;
a determination module, for determining, when the preset termination condition is reached, the parallel training result according to the current model parameters.
7. The data parallel training apparatus for a deep neural network model according to claim 6, characterized by further comprising:
a storage allocation module, for allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
8. The data parallel training apparatus for a deep neural network model according to claim 6, characterized in that the update module comprises:
a mean calculation unit, for calculating the mean of the preset number of stored gradient data;
an update unit, for updating the current model parameters according to the mean.
9. A data parallel training device for a deep neural network model, characterized by comprising:
a memory, for storing a computer program;
a processor, for implementing, when executing the computer program, the steps of the data parallel training method for a deep neural network model according to any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the data parallel training method for a deep neural network model according to any one of claims 1 to 4 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910672272.9A CN110378472A (en) | 2019-07-24 | 2019-07-24 | A kind of data parallel training method, device and the equipment of deep neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910672272.9A CN110378472A (en) | 2019-07-24 | 2019-07-24 | A kind of data parallel training method, device and the equipment of deep neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110378472A true CN110378472A (en) | 2019-10-25 |
Family
ID=68255519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910672272.9A Pending CN110378472A (en) | 2019-07-24 | 2019-07-24 | A kind of data parallel training method, device and the equipment of deep neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110378472A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942138A (en) * | 2019-11-13 | 2020-03-31 | 华中科技大学 | Deep neural network training method and system in hybrid memory environment |
CN111210022A (en) * | 2020-01-09 | 2020-05-29 | 深圳前海微众银行股份有限公司 | Backward model selection method, device and readable storage medium |
CN111461293A (en) * | 2020-03-17 | 2020-07-28 | 湖南大学 | Deep neural network model training method and device based on GPU and computer equipment |
CN111626434A (en) * | 2020-05-15 | 2020-09-04 | 浪潮电子信息产业股份有限公司 | Distributed training parameter updating method, device, equipment and storage medium |
CN111860828A (en) * | 2020-06-15 | 2020-10-30 | 北京仿真中心 | Neural network training method, storage medium and equipment |
CN111898424A (en) * | 2020-06-19 | 2020-11-06 | 贝壳技术有限公司 | Character recognition model training method and device, electronic equipment and storage medium |
CN112631775A (en) * | 2020-12-24 | 2021-04-09 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and computer readable storage medium |
CN113065666A (en) * | 2021-05-11 | 2021-07-02 | 海南善沙网络科技有限公司 | Distributed computing method for training neural network machine learning model |
WO2021136065A1 (en) * | 2019-12-30 | 2021-07-08 | 中兴通讯股份有限公司 | Deep learning method and apparatus, network device, and readable storage medium |
CN113706390A (en) * | 2021-10-29 | 2021-11-26 | 苏州浪潮智能科技有限公司 | Image conversion model training method, image conversion method, device and medium |
WO2021244354A1 (en) * | 2020-06-03 | 2021-12-09 | 上海商汤智能科技有限公司 | Training method for neural network model, and related product |
CN114327399A (en) * | 2021-11-25 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Distributed training method, apparatus, computer device, storage medium and product |
WO2022228060A1 (en) * | 2021-04-29 | 2022-11-03 | 华为技术有限公司 | Data processing method, apparatus, and system |
CN115345285A (en) * | 2022-10-18 | 2022-11-15 | 北京白海科技有限公司 | GPU-based timing chart neural network training method and system and electronic equipment |
CN116663639A (en) * | 2023-07-31 | 2023-08-29 | 浪潮电子信息产业股份有限公司 | Gradient data synchronization method, system, device and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106297774A (en) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | The distributed parallel training method of a kind of neutral net acoustic model and system |
CN106779093A (en) * | 2017-01-06 | 2017-05-31 | 中国科学院上海高等研究院 | Distributed machines learning training method and its system based on sliding window sampling |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
CN108122027A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | A kind of training method of neural network model, device and chip |
CN108182469A (en) * | 2017-12-27 | 2018-06-19 | 郑州云海信息技术有限公司 | A kind of neural network model training method, system, device and storage medium |
CN108805292A (en) * | 2017-05-05 | 2018-11-13 | 英特尔公司 | For the instant deep learning in the machine learning of autonomous machine |
CN109600255A (en) * | 2018-12-04 | 2019-04-09 | 中山大学 | A kind of parameter server optimization algorithm of decentralization |
CN109902818A (en) * | 2019-01-15 | 2019-06-18 | 中国科学院信息工程研究所 | A kind of distributed accelerated method and system towards deep learning training mission |
- 2019-07-24: CN CN201910672272.9A patent/CN110378472A/en, status: Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106297774A (en) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | The distributed parallel training method of a kind of neutral net acoustic model and system |
CN108122027A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | A kind of training method of neural network model, device and chip |
CN106779093A (en) * | 2017-01-06 | 2017-05-31 | 中国科学院上海高等研究院 | Distributed machines learning training method and its system based on sliding window sampling |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
CN108805292A (en) * | 2017-05-05 | 2018-11-13 | 英特尔公司 | For the instant deep learning in the machine learning of autonomous machine |
CN108182469A (en) * | 2017-12-27 | 2018-06-19 | 郑州云海信息技术有限公司 | A kind of neural network model training method, system, device and storage medium |
CN109600255A (en) * | 2018-12-04 | 2019-04-09 | 中山大学 | A kind of parameter server optimization algorithm of decentralization |
CN109902818A (en) * | 2019-01-15 | 2019-06-18 | 中国科学院信息工程研究所 | A kind of distributed accelerated method and system towards deep learning training mission |
Non-Patent Citations (2)
Title |
---|
WEI ZHANG ET AL.: "Staleness-aware Async-SGD for Distributed Deep Learning", arXiv *
CHEN Mengqiang et al. (陈孟强等): "Deep Learning Parallel Optimization Based on an HPC Environment", Computer Engineering & Science (计算机工程与科学) *
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942138A (en) * | 2019-11-13 | 2020-03-31 | 华中科技大学 | Deep neural network training method and system in hybrid memory environment |
CN110942138B (en) * | 2019-11-13 | 2022-02-15 | 华中科技大学 | Deep neural network training method and system in hybrid memory environment |
WO2021136065A1 (en) * | 2019-12-30 | 2021-07-08 | 中兴通讯股份有限公司 | Deep learning method and apparatus, network device, and readable storage medium |
CN111210022A (en) * | 2020-01-09 | 2020-05-29 | 深圳前海微众银行股份有限公司 | Backward model selection method, device and readable storage medium |
CN111210022B (en) * | 2020-01-09 | 2024-05-17 | 深圳前海微众银行股份有限公司 | Backward model selecting method, apparatus and readable storage medium |
CN111461293A (en) * | 2020-03-17 | 2020-07-28 | 湖南大学 | Deep neural network model training method and device based on GPU and computer equipment |
CN111461293B (en) * | 2020-03-17 | 2023-06-06 | 湖南大学 | Deep neural network model training method and device based on GPU and computer equipment |
CN111626434B (en) * | 2020-05-15 | 2022-06-07 | 浪潮电子信息产业股份有限公司 | Distributed training parameter updating method, device, equipment and storage medium |
CN111626434A (en) * | 2020-05-15 | 2020-09-04 | 浪潮电子信息产业股份有限公司 | Distributed training parameter updating method, device, equipment and storage medium |
WO2021244354A1 (en) * | 2020-06-03 | 2021-12-09 | 上海商汤智能科技有限公司 | Training method for neural network model, and related product |
CN111860828A (en) * | 2020-06-15 | 2020-10-30 | 北京仿真中心 | Neural network training method, storage medium and equipment |
CN111860828B (en) * | 2020-06-15 | 2023-11-28 | 北京仿真中心 | Neural network training method, storage medium and equipment |
CN111898424A (en) * | 2020-06-19 | 2020-11-06 | 贝壳技术有限公司 | Character recognition model training method and device, electronic equipment and storage medium |
CN112631775A (en) * | 2020-12-24 | 2021-04-09 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and computer readable storage medium |
WO2022228060A1 (en) * | 2021-04-29 | 2022-11-03 | 华为技术有限公司 | Data processing method, apparatus, and system |
CN113065666A (en) * | 2021-05-11 | 2021-07-02 | 海南善沙网络科技有限公司 | Distributed computing method for training neural network machine learning model |
CN113706390A (en) * | 2021-10-29 | 2021-11-26 | 苏州浪潮智能科技有限公司 | Image conversion model training method, image conversion method, device and medium |
CN114327399A (en) * | 2021-11-25 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Distributed training method, apparatus, computer device, storage medium and product |
CN115345285A (en) * | 2022-10-18 | 2022-11-15 | 北京白海科技有限公司 | GPU-based timing chart neural network training method and system and electronic equipment |
CN116663639A (en) * | 2023-07-31 | 2023-08-29 | 浪潮电子信息产业股份有限公司 | Gradient data synchronization method, system, device and medium |
CN116663639B (en) * | 2023-07-31 | 2023-11-03 | 浪潮电子信息产业股份有限公司 | Gradient data synchronization method, system, device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110378472A (en) | A kind of data parallel training method, device and the equipment of deep neural network model | |
CN108280514B (en) | FPGA-based sparse neural network acceleration system and design method | |
CN106295799B (en) | A kind of implementation method of deep learning multilayer neural network | |
WO2021164250A1 (en) | Turbulence field update method and apparatus, and related device | |
CN108932548A (en) | A kind of degree of rarefication neural network acceleration system based on FPGA | |
CN110059798A (en) | Develop the sparsity in neural network | |
CN109086867A (en) | A kind of convolutional neural networks acceleration system based on FPGA | |
JP2022130363A (en) | Locality improvement through improvement of machine learning model | |
CN108268638A (en) | A kind of generation confrontation network distribution type implementation method based on Spark frames | |
CN104765589B (en) | Grid parallel computation preprocess method based on MPI | |
CN106155814B (en) | A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method | |
CN113158608A (en) | Processing method, device and equipment for determining parameters of analog circuit and storage medium | |
CN108986063A (en) | The method, apparatus and computer readable storage medium of gradient fusion | |
CN108881254A (en) | Intruding detection system neural network based | |
CN115951989B (en) | Collaborative flow scheduling numerical simulation method and system based on strict priority | |
CN108509723A (en) | LRU Cache based on artificial neural network prefetch mechanism performance income evaluation method | |
CN112131206A (en) | Multi-model database OrientDB parameter configuration automatic tuning method | |
CN109739646A (en) | A kind of data processing method and device | |
CN106681830B (en) | A kind of task buffer space monitoring method and apparatus | |
CN116578593A (en) | Data caching method, system, device, computer equipment and storage medium | |
CN113064907B (en) | Content updating method based on deep reinforcement learning | |
CN107305486A (en) | A kind of neutral net maxout layers of computing device | |
CN108898648A (en) | A kind of K line chart building method, system and relevant device | |
CN113132454A (en) | Intelligent network interface controller for caching distributed data | |
CN114912041A (en) | Information processing method, electronic device, and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20191025 |
RJ01 | Rejection of invention patent application after publication |