CN110378472A - Data parallel training method, apparatus and device for a deep neural network model - Google Patents
Data parallel training method, apparatus and device for a deep neural network model
- Publication number
- CN110378472A (application CN201910672272.9A)
- Authority
- CN
- China
- Prior art keywords
- current
- processor
- data
- training
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model. The method comprises: a first processor sending current model parameters and corresponding current training data to each second processor, so that each second processor trains a preset deep neural network model according to the current model parameters and its corresponding current training data. The invention updates the current model parameters according to a preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients; moreover, there is no need to wait for all second processors to finish a round of training: as soon as a current second processor finishes training, it can immediately be given the training data of the next batch and start the next round of training. The overall training efficiency of data parallel training of the DNN model is thereby improved, the total training time required for DNN model training is reduced, and the user experience is improved.
Description
Technical field
The present invention relates to the field of deep learning model training, and in particular to a data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model.
Background Art
With the development of science and technology in modern society, deep neural networks (Deep Neural Network, DNN) have found wide application, including image and video classification, speech recognition and language translation. However, as DNNs are developed and used ever more widely, model sizes keep growing, for example to hundreds of layers and a total of ten to twenty million parameters. This growth makes efficient model training all the more important. Given fixed hardware resources, how to train a model to convergence in a shorter time while reaching higher accuracy has always been a widely studied problem.
A DNN model is composed of a series of layers of different types (such as convolutional layers, fully connected layers, etc.) and is usually trained on a dataset of labeled images. Training consists of multiple epochs, where an epoch is one pass over all images in the dataset. The goal of DNN model training is to obtain a high-accuracy model in the shortest possible time, and the total training time required for a DNN model to reach the required accuracy depends on both hardware efficiency and statistical efficiency: hardware efficiency corresponds to the time required to complete a single training epoch, while statistical efficiency corresponds to the number of epochs required to reach the desired accuracy. At present, DNN models are often trained with data parallel methods. A data parallel method partitions the input data to be trained; multiple GPUs train on multiple batches of data simultaneously, the model running on each GPU is based on the same neural network with the same network structure, and the GPUs share the model parameters.
In the prior art, data parallelism is further divided into two methods: synchronous data parallelism and asynchronous data parallelism. In the synchronous data parallel method, after all GPUs have computed the gradients of their batch of data, the multiple gradients are combined statistically to update the shared model parameters (weight parameters), which is similar to using a larger batch, as shown in Fig. 1. Although this method reduces the staleness of the weight parameters used to compute the gradients, so that the model can finally reach a higher convergence accuracy and the statistical efficiency is good, when the GPUs train the model at inconsistent speeds the faster GPUs have to wait for the slower ones, and the weight parameters are only updated once all GPUs have finished computing. This significantly lowers the hardware efficiency of training, i.e., the time to train a complete epoch becomes longer. The asynchronous data parallel method does not wait for all GPUs to complete a round of training: whichever GPU finishes training immediately applies its gradient to update the shared model parameters, as shown in Fig. 2, which reduces the GPU idle waiting time and thus improves the hardware efficiency of training; however, because the weight parameters used during asynchronous parallel training are stale, its statistical efficiency is lower.
Therefore, how to improve the overall training efficiency of data parallel training of DNN models, reduce the total training time required for DNN model training and improve the user experience is a problem that urgently needs to be solved.
Summary of the invention
The object of the present invention is to provide a data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model, so as to improve the overall training efficiency of data parallel training of DNN models, reduce the total training time required for DNN model training and improve the user experience.
In order to solve the above technical problem, the present invention provides a data parallel training method for a deep neural network model, comprising:
a first processor acquiring current model parameters and training data of a preset deep neural network model;
obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
when a preset termination condition has not been reached, receiving and storing gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
if not, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
if so, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and executing the step of taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
when the preset termination condition is reached, determining the parallel training result according to the current model parameters.
Optionally, the preset value is less than or equal to the number of second processors.
Optionally, before sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
Optionally, updating the current model parameters according to the preset number of stored gradient data comprises:
calculating the mean of the preset number of stored gradient data;
updating the current model parameters according to the mean.
Optionally, after sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
the current second processor performing forward propagation of the preset deep neural network model according to the received current model parameters and current training data to obtain a loss value;
performing back propagation of the preset deep neural network model according to the loss value to obtain the gradient data, and sending the gradient data to the first processor.
The present invention also provides a data parallel training apparatus for a deep neural network model, comprising:
an acquisition module, for acquiring current model parameters and training data of a preset deep neural network model;
a first sending module, for obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
a storage and judgment module, for receiving and storing, when a preset termination condition has not been reached, gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
a second sending module, for, if the preset value has not been reached, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
an update module, for, if the preset value has been reached, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and sending a start signal to the second sending module;
a determination module, for determining, when the preset termination condition is reached, the parallel training result according to the current model parameters.
Optionally, the apparatus further comprises:
a storage allocation module, for allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
Optionally, the update module comprises:
a mean calculation unit, for calculating the mean of the preset number of stored gradient data;
an update unit, for updating the current model parameters according to the mean.
The present invention also provides a data parallel training device for a deep neural network model, comprising:
a memory, for storing a computer program;
a processor, for implementing, when executing the computer program, the steps of the data parallel training method for a deep neural network model as described above.
In addition, the present invention also provides a computer readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the steps of the data parallel training method for a deep neural network model as described above are implemented.
In the data parallel training method for a deep neural network model provided by the present invention, the current model parameters are updated according to a preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients. Moreover, there is no need to wait for all second processors to complete a round of training: as soon as a current second processor finishes training, it can immediately be given the training data of the next batch for the next round of training, which reduces the idle waiting time of the second processors and improves the hardware efficiency of training. The overall training efficiency of data parallel training of the DNN model is thereby improved, the total training time required for DNN model training is reduced, and the user experience is improved. In addition, the present invention also provides a data parallel training apparatus, a device and a computer readable storage medium for a deep neural network model, which have the same beneficial effects.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart of synchronous data parallel training of a DNN model in the prior art;
Fig. 2 is a schematic flow chart of asynchronous data parallel training of a DNN model in the prior art;
Fig. 3 is a flow chart of a data parallel training method for a deep neural network model provided by an embodiment of the present invention;
Fig. 4 is a schematic flow chart of another data parallel training method for a deep neural network model provided by an embodiment of the present invention;
Fig. 5 is a structural block diagram of a data parallel training apparatus for a deep neural network model provided by an embodiment of the present invention.
Detailed Description of the Embodiments
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Fig. 3, Fig. 3 is a flow chart of a data parallel training method for a deep neural network model provided by an embodiment of the present invention. The method may comprise:
Step 101: a first processor acquires current model parameters and training data of a preset deep neural network model.
The first processor in this step may be a processor, such as a CPU (central processing unit), that controls the second processors in the training cluster. The preset deep neural network model in this step may be the deep neural network model that needs to be trained; the specific content of the preset deep neural network model may be set by the designer or the user according to practical scenarios and user requirements, and this embodiment imposes no restriction on it.
Correspondingly, the training data in this step may be all of the training data required to train the preset deep neural network model, such as a set of training pictures. The specific manner in which the first processor obtains the training data in this step may be the same as or similar to the training data acquisition manners in the prior art, and this embodiment imposes no restriction on it.
It can be understood that the current model parameters in this embodiment may be the model parameters (weight parameters) currently used to train the preset deep neural network model as obtained by the first processor, i.e., the latest model parameters. That is to say, the current model parameters in this step may be the original model parameters obtained in the initial state, before the first processor has used the second processors to train the preset deep neural network model; for example, in this step the first processor may perform model parameter initialization to obtain the original model parameters (current model parameters).
Step 102: obtain, from the training data, the current training data corresponding to each second processor, and send the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2.
The second processors in this step may be the processors in the training cluster, such as GPUs (graphics processing units), that train the preset deep neural network model. The specific number of second processors in this step may be set by the designer, as long as the number of second processors is greater than or equal to 2; this embodiment imposes no restriction on it.
It can be understood that the purpose of this step may be that, in the initial state, the first processor sends the original model parameters (current model parameters) and the corresponding current training data to each second processor, so that multiple second processors can train the preset deep neural network model simultaneously, each using the original model parameters and its own current training data.
Correspondingly, the current training data corresponding to each second processor in this step may be the current round of training data (a part of the training data) to be processed by that second processor, obtained by the first processor from the whole training data, i.e., the training data of one batch. That is, this step may also include the first processor obtaining the current training data corresponding to each second processor. The specific manner in which the first processor obtains the current training data corresponding to each second processor may be set by the designer or the user, for example in the same or a similar manner as training data selection in the prior art; this embodiment imposes no restriction on it.
Specifically, after this step the present embodiment may further include the step of each second processor training the preset deep neural network model according to the current model parameters and its corresponding current training data to obtain corresponding gradient data, i.e., each second processor starts its model training iterations. The specific manner in which each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data may be set by the designer, for example in a manner similar to deep neural network model training in the prior art. As shown in Fig. 4, the computation on each GPU (second processor) may include: ① checking for and obtaining the updated model parameters (current model parameters); ② obtaining the training data of one batch (current training data) from the CPU (first processor) as the input of the network model; ③ performing forward propagation to compute the predicted values and obtaining the loss (loss value) from the predicted values and the labels in the training data; ④ performing back propagation to compute the gradients, obtaining the computation result (gradient data) used to update the parameters (current model parameters); the computation result is sent to the CPU memory for storage, and the GPU immediately returns to ① for the next round of computation, instead of returning to ① only after all GPUs have completed the current round of computation, as synchronous data parallelism requires.
That is, the current second processor performs forward propagation of the preset deep neural network model according to the received current model parameters and current training data to obtain a loss value; according to the loss value, it performs back propagation of the preset deep neural network model to obtain the gradient data, and sends the gradient data to the first processor. The current second processor may be any one of the multiple second processors.
It should be noted that, in order to guarantee that in the initial state all second processors receive the current model parameters and the corresponding current training data and start training the preset deep neural network model, in this step the current model parameters and the corresponding current training data are sent to each second processor. Alternatively, in this step the current model parameters and the corresponding current training data may be sent to only some of the second processors; for example, when the number (preset value) of gradient data used to update the current model parameters is smaller than the number of second processors, i.e., when m in Fig. 4 is smaller than n, the current model parameters and the corresponding current training data may first be sent to a subset of second processors whose size is equal to or greater than the preset value. This embodiment imposes no restriction on it.
Step 103: when a preset termination condition has not been reached, receive and store the gradient data returned by the current second processor, and judge whether the number of stored gradient data has reached the preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors.
It can be understood that steps 103 to 105 may be the training control steps performed by the first processor when it determines that the preset termination condition has not been reached, i.e., when training of the model parameters of the preset deep neural network model is not yet complete. The specific setting and determination method of the preset termination condition in this step may be chosen by the designer or the user according to practical scenarios and user requirements, for example in the same or a similar manner as termination conditions are set for deep neural network model training in the prior art; this embodiment imposes no restriction on it.
Correspondingly, the purpose of this step may be that the first processor receives and stores the gradient data returned by the current second processor, i.e., stores and counts the gradient data returned by each second processor, so that when the number of stored gradient data reaches the preset value (parallelism parameter), the model parameters are updated in step 105. That is, the present embodiment adopts a first-come-first-stored principle and does not require the preset number of stored gradient data to come from different second processors.
Specifically, the specific value of the preset value in this step may be set by the designer or the user; for example, the preset value may be set to be less than or equal to the number of second processors, as long as the preset value is greater than or equal to 2. This embodiment imposes no restriction on it.
Further, for the convenience of the user, in the present embodiment the first processor may set the preset value automatically, i.e., the first processor may set the preset value according to the number of second processors, for example by directly setting the preset value to the number of second processors.
It should be noted that the specific manner in which the first processor stores the gradient data returned by the current second processor in this step, i.e., the storage location of the gradient data returned by each second processor, may be set by the designer. For example, the first processor may cache the gradient data returned by the current second processor in its own memory, i.e., the first processor may allocate in memory a preset storage space for storing the preset number of gradient data; the gradient data returned by the current second processor may also be stored in another memory. This embodiment imposes no restriction on it.
Correspondingly, if the first processor caches the gradient data returned by the current second processor in memory, the method provided by the present embodiment may also include the step of allocating storage space in the memory of the first processor; for example, the first processor allocates a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model, where the preset storage space is used to store the preset number of gradient data. As shown in Fig. 4, before initializing the model parameters, the CPU (first processor) may first compute, from the previously set parallelism parameter m (preset value) and the size of the model parameters, the size of the storage space (cache space) that needs to be allocated in CPU memory, and then allocate that storage space for caching m gradient data.
Step 104: take the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and send the current model parameters and the current training data corresponding to the current second processor to the current second processor.
It can be understood that the purpose of this step may be that, when the number of stored gradient data has not reached the preset value, the first processor sends the training data to be processed by the current second processor in the next round (the next training data) together with the current model parameters to the current second processor, so that the current second processor can continue with the next round of training, avoiding the waiting that occurs in existing synchronous data parallel training.
The next training data in this step may be the training data (a part of the training data) to be processed by the current second processor in the next round, obtained by the first processor from the whole training data. Correspondingly, this step may also include the step of the first processor obtaining the next training data corresponding to the current second processor.
Specifically, after this step, if the preset termination condition has not been reached, the method may return to step 103 to wait for and receive the gradient data returned by the next current second processor.
Step 105: update the current model parameters according to the preset number of stored gradient data, delete the stored gradient data, and proceed to step 104.
It can be understood that the purpose of this step may be that, when the number of stored gradient data reaches the preset value, the first processor updates the current model parameters using the preset number of stored gradient data and then proceeds to step 104, so that the current second processor can continue the next round of training with the updated current model parameters and the next training data.
Specifically, the specific manner in which the first processor updates the current model parameters using the preset number of stored gradient data in this step may be set by the designer, for example in the same or a similar manner as model parameter update methods in the prior art; for instance, the first processor may first calculate the mean of the preset number of stored gradient data and then update the current model parameters according to the mean. As long as the current model parameters can be updated using the preset number of gradient data, this embodiment imposes no restriction on it.
It should be noted that, after the first processor has updated the current model parameters in this step, the stored gradient data may be deleted directly; for example, the CPU in Fig. 4 clears the cache space used to cache the m (preset value) gradient data, so that the gradient data needed for the next update of the current model parameters can subsequently be stored. The stored gradient data may also be moved to another memory; for example, when the training process of the preset deep neural network model needs to be analyzed, the CPU in Fig. 4 may first back up the m cached gradient data to another memory and then delete the m gradient data cached in the cache space. This embodiment imposes no restriction on it.
Step 106: when the preset termination condition is reached, determine the parallel training result according to the current model parameters.
It can be understood that the purpose of this step may be that, when the first processor determines that training of the preset deep neural network model has reached the preset termination condition, it uses the current model parameters to determine the model parameters finally obtained by the training (the training result).
Specifically, the specific manner of determining the parallel training result according to the current model parameters in this step may be set by the designer or the user, for example in correspondence with the setting of the preset termination condition. For instance, the first processor may directly take the current model parameters as the parallel training result; the first processor may also update the current model parameters according to the stored gradient data, whose number may be equal to or less than the preset value, and take the updated current model parameters as the parallel training result. This embodiment imposes no restriction on it.
In the present embodiment, the current model parameters are updated according to the preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients. Moreover, there is no need to wait for all second processors to complete a round of training: as soon as the current second processor finishes training, it can immediately be given the training data of the next batch for the next round of training, which reduces the idle waiting time of the second processors and improves the hardware efficiency of training. The overall training efficiency of data parallel training of the DNN model is thereby improved, the total training time required for DNN model training is reduced, and the user experience is improved.
Referring to Fig. 5, Fig. 5 is a structural block diagram of a data parallel training apparatus for a deep neural network model provided by an embodiment of the present invention. The apparatus may comprise:
an acquisition module 10, for acquiring current model parameters and training data of a preset deep neural network model;
a first sending module 20, for obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
a storage and judgment module 30, for receiving and storing, when a preset termination condition has not been reached, the gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
a second sending module 40, for, if the preset value has not been reached, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
an update module 50, for, if the preset value has been reached, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and sending a start signal to the second sending module;
a determination module 60, for determining, when the preset termination condition is reached, the parallel training result according to the current model parameters.
Optionally, the apparatus may further comprise:
a storage allocation module, for allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
Optionally, the update module 50 may comprise:
a mean calculation unit, for calculating the mean of the preset number of stored gradient data;
an update unit, for updating the current model parameters according to the mean.
In the present embodiment, the update module 50 updates the current model parameters according to the preset number of stored gradient data, which reduces the staleness of the weight parameters used to compute the gradients. Moreover, there is no need to wait for all second processors to complete a round of training: as soon as the current second processor finishes training, it can immediately be given the training data of the next batch for the next round of training, which reduces the idle waiting time of the second processors and improves the hardware efficiency of training, thereby improving the overall training efficiency of data parallel training of the DNN model, reducing the total training time required for DNN model training and improving the user experience.
The embodiment of the present invention also provides a data parallel training device for a deep neural network model, comprising: a memory, for storing a computer program; and a processor, for implementing, when executing the computer program, the steps of the data parallel training method for a deep neural network model provided by the above embodiments.
In addition, the embodiment of the present invention also provides a computer readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the steps of the data parallel training method for a deep neural network model provided by the above embodiments are implemented.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. For the apparatus, device and computer readable storage medium disclosed in the embodiments, since they correspond to the method disclosed in the embodiments, the description is relatively brief; for relevant points, reference may be made to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The data parallel training method, apparatus, device and computer readable storage medium for a deep neural network model provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
1. A data parallel training method for a deep neural network model, characterized by comprising:
a first processor acquiring current model parameters and training data of a preset deep neural network model;
obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
when a preset termination condition has not been reached, receiving and storing gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
if not, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
if so, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and executing the step of taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
when the preset termination condition is reached, determining the parallel training result according to the current model parameters.
2. The data parallel training method for a deep neural network model according to claim 1, characterized in that the preset value is less than or equal to the number of second processors.
3. The data parallel training method for a deep neural network model according to claim 1, characterized in that, before sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
4. The data parallel training method for a deep neural network model according to claim 1, characterized in that updating the current model parameters according to the preset number of stored gradient data comprises:
calculating the mean of the preset number of stored gradient data;
updating the current model parameters according to the mean.
5. The data parallel training method for a deep neural network model according to any one of claims 1 to 4, characterized in that, after sending the current model parameters and the corresponding current training data to each second processor, the method further comprises:
the current second processor performing forward propagation of the preset deep neural network model according to the received current model parameters and current training data to obtain a loss value;
performing back propagation of the preset deep neural network model according to the loss value to obtain the gradient data, and sending the gradient data to the first processor.
6. A data parallel training apparatus for a deep neural network model, characterized by comprising:
an acquisition module, for acquiring current model parameters and training data of a preset deep neural network model;
a first sending module, for obtaining, from the training data, the current training data corresponding to each second processor, and sending the current model parameters and the corresponding current training data to each second processor, so that each second processor trains the preset deep neural network model according to the current model parameters and its corresponding current training data; wherein the number of second processors is greater than or equal to 2;
a storage and judgment module, for receiving and storing, when a preset termination condition has not been reached, the gradient data returned by a current second processor, and judging whether the number of stored gradient data has reached a preset value; wherein the preset value is greater than or equal to 2, and the current second processor is any one of the second processors;
a second sending module, for, if the preset value has not been reached, taking the next training data corresponding to the current second processor, obtained from the training data, as the current training data corresponding to the current second processor, and sending the current model parameters and the current training data corresponding to the current second processor to the current second processor;
an update module, for, if the preset value has been reached, updating the current model parameters according to the preset number of stored gradient data, deleting the stored gradient data, and sending a start signal to the second sending module;
a determination module, for determining, when the preset termination condition is reached, the parallel training result according to the current model parameters.
7. The data parallel training apparatus for a deep neural network model according to claim 6, characterized by further comprising:
a storage allocation module, for allocating a preset storage space in memory according to the preset value and the size of the model parameters of the preset deep neural network model; wherein the preset storage space is used to store the preset number of gradient data.
8. The data parallel training apparatus for a deep neural network model according to claim 6, characterized in that the update module comprises:
a mean calculation unit, for calculating the mean of the preset number of stored gradient data;
an update unit, for updating the current model parameters according to the mean.
9. A data parallel training device for a deep neural network model, characterized by comprising:
a memory, for storing a computer program;
a processor, for implementing, when executing the computer program, the steps of the data parallel training method for a deep neural network model according to any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the data parallel training method for a deep neural network model according to any one of claims 1 to 4 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910672272.9A CN110378472A (en) | 2019-07-24 | 2019-07-24 | A kind of data parallel training method, device and the equipment of deep neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910672272.9A CN110378472A (en) | 2019-07-24 | 2019-07-24 | A kind of data parallel training method, device and the equipment of deep neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110378472A true CN110378472A (en) | 2019-10-25 |
Family
ID=68255519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910672272.9A Pending CN110378472A (en) | 2019-07-24 | 2019-07-24 | A kind of data parallel training method, device and the equipment of deep neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110378472A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942138A (en) * | 2019-11-13 | 2020-03-31 | 华中科技大学 | Deep neural network training method and system in hybrid memory environment |
CN111210022A (en) * | 2020-01-09 | 2020-05-29 | 深圳前海微众银行股份有限公司 | Backward model selection method, device and readable storage medium |
CN111461293A (en) * | 2020-03-17 | 2020-07-28 | 湖南大学 | Deep neural network model training method and device based on GPU and computer equipment |
CN111626434A (en) * | 2020-05-15 | 2020-09-04 | 浪潮电子信息产业股份有限公司 | Distributed training parameter updating method, device, equipment and storage medium |
CN111860828A (en) * | 2020-06-15 | 2020-10-30 | 北京仿真中心 | Neural network training method, storage medium and equipment |
CN111898424A (en) * | 2020-06-19 | 2020-11-06 | 贝壳技术有限公司 | Character recognition model training method and device, electronic equipment and storage medium |
CN112631775A (en) * | 2020-12-24 | 2021-04-09 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and computer readable storage medium |
CN113065666A (en) * | 2021-05-11 | 2021-07-02 | 海南善沙网络科技有限公司 | Distributed computing method for training neural network machine learning model |
WO2021136065A1 (en) * | 2019-12-30 | 2021-07-08 | 中兴通讯股份有限公司 | Deep learning method and apparatus, network device, and readable storage medium |
CN113706390A (en) * | 2021-10-29 | 2021-11-26 | 苏州浪潮智能科技有限公司 | Image conversion model training method, image conversion method, device and medium |
WO2021244354A1 (en) * | 2020-06-03 | 2021-12-09 | 上海商汤智能科技有限公司 | Training method for neural network model, and related product |
CN114327399A (en) * | 2021-11-25 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Distributed training method, apparatus, computer device, storage medium and product |
WO2022228060A1 (en) * | 2021-04-29 | 2022-11-03 | 华为技术有限公司 | Data processing method, apparatus, and system |
CN115345285A (en) * | 2022-10-18 | 2022-11-15 | 北京白海科技有限公司 | GPU-based timing chart neural network training method and system and electronic equipment |
CN116663639A (en) * | 2023-07-31 | 2023-08-29 | 浪潮电子信息产业股份有限公司 | Gradient data synchronization method, system, device and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106297774A (en) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | The distributed parallel training method of a kind of neutral net acoustic model and system |
CN106779093A (en) * | 2017-01-06 | 2017-05-31 | 中国科学院上海高等研究院 | Distributed machines learning training method and its system based on sliding window sampling |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
CN108122027A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | A kind of training method of neural network model, device and chip |
CN108182469A (en) * | 2017-12-27 | 2018-06-19 | 郑州云海信息技术有限公司 | A kind of neural network model training method, system, device and storage medium |
CN108805292A (en) * | 2017-05-05 | 2018-11-13 | 英特尔公司 | For the instant deep learning in the machine learning of autonomous machine |
CN109600255A (en) * | 2018-12-04 | 2019-04-09 | 中山大学 | A kind of parameter server optimization algorithm of decentralization |
CN109902818A (en) * | 2019-01-15 | 2019-06-18 | 中国科学院信息工程研究所 | A kind of distributed accelerated method and system towards deep learning training mission |
- 2019-07-24: CN CN201910672272.9A patent/CN110378472A/en, status: Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106297774A (en) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | The distributed parallel training method of a kind of neutral net acoustic model and system |
CN108122027A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | A kind of training method of neural network model, device and chip |
CN106779093A (en) * | 2017-01-06 | 2017-05-31 | 中国科学院上海高等研究院 | Distributed machines learning training method and its system based on sliding window sampling |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
CN108805292A (en) * | 2017-05-05 | 2018-11-13 | 英特尔公司 | For the instant deep learning in the machine learning of autonomous machine |
CN108182469A (en) * | 2017-12-27 | 2018-06-19 | 郑州云海信息技术有限公司 | A kind of neural network model training method, system, device and storage medium |
CN109600255A (en) * | 2018-12-04 | 2019-04-09 | 中山大学 | A kind of parameter server optimization algorithm of decentralization |
CN109902818A (en) * | 2019-01-15 | 2019-06-18 | 中国科学院信息工程研究所 | A kind of distributed accelerated method and system towards deep learning training mission |
Non-Patent Citations (2)
Title |
---|
WEI ZHANG ET AL.: "Staleness-aware Async-SGD for Distributed Deep Learning", arXiv *
CHEN Mengqiang et al. (陈孟强等): "Deep Learning Parallel Optimization Based on an HPC Environment", Computer Engineering & Science (计算机工程与科学) *
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942138A (en) * | 2019-11-13 | 2020-03-31 | 华中科技大学 | Deep neural network training method and system in hybrid memory environment |
CN110942138B (en) * | 2019-11-13 | 2022-02-15 | 华中科技大学 | Deep neural network training method and system in hybrid memory environment |
WO2021136065A1 (en) * | 2019-12-30 | 2021-07-08 | 中兴通讯股份有限公司 | Deep learning method and apparatus, network device, and readable storage medium |
CN111210022A (en) * | 2020-01-09 | 2020-05-29 | 深圳前海微众银行股份有限公司 | Backward model selection method, device and readable storage medium |
CN111210022B (en) * | 2020-01-09 | 2024-05-17 | 深圳前海微众银行股份有限公司 | Backward model selecting method, apparatus and readable storage medium |
CN111461293A (en) * | 2020-03-17 | 2020-07-28 | 湖南大学 | Deep neural network model training method and device based on GPU and computer equipment |
CN111461293B (en) * | 2020-03-17 | 2023-06-06 | 湖南大学 | Deep neural network model training method and device based on GPU and computer equipment |
CN111626434B (en) * | 2020-05-15 | 2022-06-07 | 浪潮电子信息产业股份有限公司 | Distributed training parameter updating method, device, equipment and storage medium |
CN111626434A (en) * | 2020-05-15 | 2020-09-04 | 浪潮电子信息产业股份有限公司 | Distributed training parameter updating method, device, equipment and storage medium |
WO2021244354A1 (en) * | 2020-06-03 | 2021-12-09 | 上海商汤智能科技有限公司 | Training method for neural network model, and related product |
CN111860828A (en) * | 2020-06-15 | 2020-10-30 | 北京仿真中心 | Neural network training method, storage medium and equipment |
CN111860828B (en) * | 2020-06-15 | 2023-11-28 | 北京仿真中心 | Neural network training method, storage medium and equipment |
CN111898424A (en) * | 2020-06-19 | 2020-11-06 | 贝壳技术有限公司 | Character recognition model training method and device, electronic equipment and storage medium |
CN112631775A (en) * | 2020-12-24 | 2021-04-09 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and computer readable storage medium |
WO2022228060A1 (en) * | 2021-04-29 | 2022-11-03 | 华为技术有限公司 | Data processing method, apparatus, and system |
CN113065666A (en) * | 2021-05-11 | 2021-07-02 | 海南善沙网络科技有限公司 | Distributed computing method for training neural network machine learning model |
CN113706390A (en) * | 2021-10-29 | 2021-11-26 | 苏州浪潮智能科技有限公司 | Image conversion model training method, image conversion method, device and medium |
CN114327399A (en) * | 2021-11-25 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Distributed training method, apparatus, computer device, storage medium and product |
CN115345285A (en) * | 2022-10-18 | 2022-11-15 | 北京白海科技有限公司 | GPU-based timing chart neural network training method and system and electronic equipment |
CN116663639A (en) * | 2023-07-31 | 2023-08-29 | 浪潮电子信息产业股份有限公司 | Gradient data synchronization method, system, device and medium |
CN116663639B (en) * | 2023-07-31 | 2023-11-03 | 浪潮电子信息产业股份有限公司 | Gradient data synchronization method, system, device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110378472A (en) | A kind of data parallel training method, device and the equipment of deep neural network model | |
CN108280514B (en) | FPGA-based sparse neural network acceleration system and design method | |
CN106295799B (en) | A kind of implementation method of deep learning multilayer neural network | |
WO2021164250A1 (en) | Turbulence field update method and apparatus, and related device | |
CN108932548A (en) | A kind of degree of rarefication neural network acceleration system based on FPGA | |
CN110059798A (en) | Develop the sparsity in neural network | |
CN109086867A (en) | A kind of convolutional neural networks acceleration system based on FPGA | |
JP2022130363A (en) | Locality improvement through improvement of machine learning model | |
CN108268638A (en) | A kind of generation confrontation network distribution type implementation method based on Spark frames | |
CN104765589B (en) | Grid parallel computation preprocess method based on MPI | |
CN106155814B (en) | A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method | |
CN113158608A (en) | Processing method, device and equipment for determining parameters of analog circuit and storage medium | |
CN108986063A (en) | The method, apparatus and computer readable storage medium of gradient fusion | |
CN108881254A (en) | Intruding detection system neural network based | |
CN115951989B (en) | Collaborative flow scheduling numerical simulation method and system based on strict priority | |
CN108509723A (en) | LRU Cache based on artificial neural network prefetch mechanism performance income evaluation method | |
CN112131206A (en) | Multi-model database OrientDB parameter configuration automatic tuning method | |
CN109739646A (en) | A kind of data processing method and device | |
CN106681830B (en) | A kind of task buffer space monitoring method and apparatus | |
CN116578593A (en) | Data caching method, system, device, computer equipment and storage medium | |
CN113064907B (en) | Content updating method based on deep reinforcement learning | |
CN107305486A (en) | A kind of neutral net maxout layers of computing device | |
CN108898648A (en) | A kind of K line chart building method, system and relevant device | |
CN113132454A (en) | Intelligent network interface controller for caching distributed data | |
CN114912041A (en) | Information processing method, electronic device, and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20191025 |
RJ01 | Rejection of invention patent application after publication |