CN109255439A

CN109255439A - Multi-GPU parallel DNN model training method and device

Info

Publication number: CN109255439A
Application number: CN201710564223.4A
Authority: CN
Inventors: 龚轶凡; 靳江明; 苏磊
Original assignee: Beijing Tusimple Future Technology Co Ltd
Current assignee: Beijing Tusimple Technology Co Ltd
Priority date: 2017-07-12
Filing date: 2017-07-12
Publication date: 2019-01-22
Anticipated expiration: 2037-07-12
Also published as: CN109255439B

Abstract

The present invention discloses a multi-GPU parallel DNN model training method and device, which address the low training precision that arises in the prior art when multiple GPUs train a DNN model in parallel. The method comprises: during forward propagation, receiving a forward BN input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset. During backward propagation, the method comprises: receiving a backward BN input data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.

Description

Multi-GPU parallel DNN model training method and device

Technical field

The present invention relates to the field of information processing, and in particular to a method and device for training a deep neural network (Deep Neural Network, DNN) model on multiple graphics processing units (Graphics Processing Unit, GPU) in parallel.

Background art

At present, DNN model training is carried out in deep learning for image classification and segmentation. The prior art includes a method in which multiple GPUs train in parallel: the data of one or more images (the global data) is divided into multiple data subsets according to the number of GPUs, each data subset is assigned to a corresponding GPU, and each GPU trains the DNN model on its assigned data subset, which improves training efficiency. In practice, within one training cycle the system divides the acquired batch of training data (data batch), for example several images, into as many data subsets (sub batches) as there are GPU cards and distributes each data subset to its GPU card. During training, each GPU card preloads a complete copy of the DNN model to be trained and then trains that model on the data subset assigned to it.
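The splitting of a data batch into sub batches can be sketched as follows. This is a minimal illustration in plain Python; the function name `split_batch` and the choice of contiguous, near-equal slices are assumptions made here, since the source only states that the batch is divided according to the number of GPU cards.

```python
def split_batch(batch, num_gpus):
    """Split one data batch into num_gpus contiguous sub batches of near-equal size."""
    base, extra = divmod(len(batch), num_gpus)
    sub_batches, start = [], 0
    for i in range(num_gpus):
        # The first `extra` GPUs receive one additional item (an assumed policy).
        size = base + (1 if i < extra else 0)
        sub_batches.append(batch[start:start + size])
        start += size
    return sub_batches

# Ten placeholder items standing in for images, divided among three GPU cards.
sub_batches = split_batch(list(range(10)), 3)
```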

Because each GPU receives different data, the gradients of the DNN model weights computed on different GPU cards differ.

In this case a model synchronization operation is performed: the gradients computed on the different GPUs are reduced and merged into a single common gradient, and the model weights on every GPU are then updated with this merged gradient.
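The synchronization step can be sketched as follows. This is a minimal illustration in plain Python that assumes the reduction is an element-wise average and that the update is plain gradient descent; the function names, the learning rate and the numbers are hypothetical and are not taken from the source.

```python
def reduce_gradients(per_gpu_grads):
    """Element-wise average of the gradient vectors computed on each GPU (assumed reduction)."""
    n = len(per_gpu_grads)
    width = len(per_gpu_grads[0])
    return [sum(g[k] for g in per_gpu_grads) / n for k in range(width)]

def update_weights(weights, grad, lr=0.1):
    """Apply the same merged gradient to one GPU's copy of the weights."""
    return [w - lr * g for w, g in zip(weights, grad)]

# Three simulated GPUs hold identical weights but computed different gradients.
weights = [[1.0, 2.0] for _ in range(3)]
grads = [[0.3, -0.6], [0.6, 0.0], [0.0, 0.3]]

merged = reduce_gradients(grads)
weights = [update_weights(w, merged) for w in weights]
```

Because every GPU applies the same merged gradient, the weight copies remain identical after the update.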

This scheme improves the efficiency of multi-GPU parallel training of a DNN model, but the overall training precision drops, and the drop becomes more pronounced as the number of GPUs increases.

Summary of the invention

In view of the above problem, the present invention provides a multi-GPU parallel DNN model training method and device, to solve the low training precision that arises in the prior art when multiple GPUs train a DNN model in parallel.

According to one aspect of the present application, some embodiments provide a multi-GPU parallel DNN model training method, comprising: when one GPU among multiple GPUs performs DNN model training on its assigned data subset, during forward propagation, receiving a forward batch normalization (BN) input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset;

during backward propagation, receiving a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.

According to one aspect of the present application, some embodiments provide a multi-GPU parallel DNN model training device, the device being arranged in each GPU of multiple GPUs and comprising: a forward batch normalization (BN) processing unit, configured to, during forward propagation, receive a forward BN input data subset, determine a global forward BN input data mean set, and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset; and a backward BN processing unit, configured to, during backward propagation, receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset, determine a global backward BN input data mean set, and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.

With the method and device provided by the embodiments of the present application, when multiple GPUs train a DNN model in parallel, the mean set of the global forward BN input data is introduced into forward BN processing, and the mean set of the global backward BN input data is introduced into backward BN processing. This compensates for the fact that no single GPU has access to all of the data when training the DNN model: forward and backward BN processing can be based on the means of the global data, yielding a global gradient similar to the one a single GPU would obtain when training on the global data. Training precision is thereby improved, which solves the prior-art problem of low training precision in multi-GPU parallel training.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it.

Fig. 1a is a schematic diagram of multi-GPU parallel DNN model training in the prior art;

Fig. 1b is a plot of training precision and test accuracy for multi-GPU parallel DNN model training in the prior art;

Fig. 2 is a flowchart of the multi-GPU parallel DNN model training method provided by an embodiment of the present application;

Fig. 3a is a processing flowchart of step 201 in Fig. 2;

Fig. 3b is another processing flowchart of step 201 in Fig. 2;

Fig. 3c is a further processing flowchart of step 201 in Fig. 2;

Fig. 4a is a processing flowchart of step 202 in Fig. 2;

Fig. 4b is another processing flowchart of step 202 in Fig. 2;

Fig. 4c is a further processing flowchart of step 202 in Fig. 2;

Fig. 5 is a structural block diagram of the multi-GPU parallel DNN model training device provided by an embodiment of the present application;

Fig. 6 is a plot of model training precision when the method shown in Fig. 2 is implemented;

Fig. 7 is a plot of test accuracy when the method shown in Fig. 2 is implemented.

Detailed description

To enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiments of the invention is described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

In the prior art, when multiple GPUs train a DNN model in parallel, the data subset assigned to each GPU is only a part of the global data, so the gradients of the model weights obtained by training the DNN on these data subsets differ from GPU to GPU. The gradients computed on the different GPUs are then reduced and merged into a single gradient, which is used to update the model weights on each GPU. As a result, the precision of a model trained on multiple GPUs in parallel is lower than that of a model trained on a single GPU, and the drop becomes more pronounced as the number of GPUs increases.

While working on the above technical problem, the inventors of the present application found that, in the multi-GPU parallel DNN model training method, the batch normalization (Batch Normalization, BN) layer operates across data items: on each GPU it computes the mean and variance over the data subset entering the BN layer (i.e., the BN layer's input data subset) and then uses that mean and variance to normalize each data item in the subset. As shown in Fig. 1a, the multiple GPUs include GPU 0, GPU 1 and GPU 2, which use their assigned data subsets Sub Batch 0, Sub Batch 1 and Sub Batch 2, respectively, to train the DNN model preloaded on each of them. During forward processing, BN processing is applied to the data subset entering the BN layer; after backward processing, the gradients computed on the different GPUs are reduced and merged into a single common gradient, which is then used to update the model weights on every GPU.

During forward processing, however, the data subset on each GPU is only a part of the global data and the data differ between GPUs, so the mean and variance computed in the BN layer differ from GPU to GPU. Normalizing each data item with these per-GPU means and variances further amplifies the locality of the data on each GPU, so the gradient direction computed by each GPU is not the global descent direction and training precision is low. As shown in Fig. 1b, the training precision of DNN model training on 3 GPUs in parallel (thin solid line in Fig. 1b) is about 7% lower than the precision obtained when a single GPU trains the DNN model on the global data (thick solid line in Fig. 1b), and the test accuracy with 3 GPUs in parallel (thin dotted line in Fig. 1b) is about 15% lower than the test accuracy with a single GPU (thick solid line in Fig. 1b). As the number of GPUs increases further, precision drops further.
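The mismatch between per-GPU and global statistics can be seen on a toy example. The sketch below is plain Python with made-up numbers; it only illustrates that the mean and variance of each sub batch generally differ from those of the whole batch, and it is not taken from the source.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance, as used by batch normalization."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# One global batch split across three simulated GPUs (values are made up).
sub_batches = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 22.0, 24.0]]
global_batch = [x for sb in sub_batches for x in sb]

local_stats = [(mean(sb), variance(sb)) for sb in sub_batches]
global_stats = (mean(global_batch), variance(global_batch))

for i, (m, v) in enumerate(local_stats):
    print(f"GPU {i}: mean={m:.2f} var={v:.2f}")
print(f"global: mean={global_stats[0]:.2f} var={global_stats[1]:.2f}")
```

Each simulated GPU would normalize its data with a different mean and variance, which is the locality effect described above.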

To address this problem, in the method provided by the embodiments of the present application, when multiple GPUs train a DNN model in parallel, a global forward BN input data mean set is determined from the global forward BN input data during forward propagation, and forward BN processing is performed on the forward BN input data subset entering the BN layer according to this global forward BN input data mean set. In backward BN processing, the gradient set of the data subset output by forward BN processing serves as the backward BN input; a global backward BN input data mean set is determined from the global backward BN input data, and backward BN processing is performed on the forward BN input data subset according to the global backward BN input data mean set to obtain the gradients of the forward BN input data. Because the mean set of the global forward BN input data is introduced into forward BN processing and the mean set of the global backward BN input data is introduced into backward BN processing, the method compensates for the fact that no single GPU has access to all of the data when training the DNN model. Forward and backward BN processing can be based on the means of the global data, yielding a global gradient similar to the one a single GPU would obtain when training on the global data and improving training precision, which solves the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.

The method and device provided by the embodiments of the present application are described in detail below.

Embodiment one

Referring to Fig. 2, an embodiment of the present application provides a multi-GPU parallel DNN model training method whose processing flow includes:

Step 201: when one GPU among multiple GPUs performs DNN model training on its assigned data subset, during forward propagation, receive a forward batch normalization (BN) input data subset; determine a global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset;

Step 202: during backward propagation, receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.

In the method provided by the present application, during forward propagation, forward BN processing is performed on the forward BN input data subset according to the determined global forward BN input data mean set; during backward propagation, backward BN processing is performed on the forward BN input data subset according to the determined global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set. This compensates for the fact that no single GPU has access to all of the data when training the DNN model: forward and backward BN processing can be based on the means of the global data, yielding a global gradient similar to the one a single GPU would obtain when training on the global data and improving training precision, which solves the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.

Forward BN processing during forward propagation and backward BN processing during backward propagation are described in detail below.

In an embodiment of the present invention, in step 201 above, the global forward BN input data mean set may be determined in, but not limited to, the following two modes:

Mode 1: among the multiple GPUs, one GPU is selected as the master GPU and the other GPUs are slave GPUs. The master GPU determines the global forward BN input data mean set and sends it to each slave GPU, so the slave GPUs do not need to compute the global forward BN input data mean set themselves.

Mode 2: the multiple GPUs are not divided into master and slave; each GPU independently determines the global forward BN input data mean set.

In mode 1, when the GPU is the master GPU, the global forward BN input data mean set may be determined through the following steps A1 to A4:

Step A1: the master GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;

Step A2: receive the forward BN input data subset mean sets from each of the slave GPUs;

Step A3: determine the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and those of the slave GPUs; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data;

Step A4: send the global forward BN input data mean set to each of the slave GPUs.

In mode 1, when the GPU is a slave GPU, the global forward BN input data mean set may be determined through the following steps B1 to B3:

Step B1: the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;

Step B2: send the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;

Step B3: receive the global forward BN input data mean set from the master GPU; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.

In mode 2, the global forward BN input data mean set may be determined through the following steps C1 to C4:

Step C1: determine the forward BN input data subset mean set of the GPU according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;

Step C2: send the forward BN input data subset mean set of the GPU to the other GPUs;

Step C3: receive the forward BN input data subset mean sets from each of the other GPUs;

Step C4: determine the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and those of the other GPUs; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.

Steps A1, B1 and C1 are implemented in the same way. In mode 1 and mode 2, a GPU also determines the global forward BN input data mean set from the input data subset mean sets of the multiple GPUs in the same way.
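The two modes can be compared with a small simulation. This is a sketch in plain Python, not the patented implementation: GPUs are simulated as list entries, the combination is assumed to be a count-weighted average (so each GPU is assumed to share its data count along with its two means), and the message counts assume point-to-point sends. All names are hypothetical.

```python
def subset_stats(xs):
    """Steps A1/B1/C1: count, mean and mean of squares of one GPU's subset."""
    m = len(xs)
    return m, sum(xs) / m, sum(x * x for x in xs) / m

def combine(all_stats):
    """Steps A3/C4: count-weighted combination into (global mean, global mean of squares)."""
    total = sum(m for m, _, _ in all_stats)
    mu = sum(m * mu_i for m, mu_i, _ in all_stats) / total
    v = sum(m * v_i for m, _, v_i in all_stats) / total
    return mu, v

subsets = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]  # made-up data, one list per GPU
stats = [subset_stats(xs) for xs in subsets]
n = len(subsets)

# Mode 1: slaves send to GPU 0 (B2/A2), GPU 0 combines (A3) and sends back (A4/B3).
master_result = combine(stats)
mode1 = [master_result for _ in range(n)]
mode1_messages = 2 * (n - 1)

# Mode 2: every GPU sends to every other GPU (C2/C3) and combines locally (C4).
mode2 = [combine(stats) for _ in range(n)]
mode2_messages = n * (n - 1)
```

Both modes leave every GPU with the same global mean set. Under the point-to-point assumption, mode 1 exchanges fewer messages, while mode 2 lets each GPU compute the result without waiting for a master.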

The specific implementation of step 201 is described in detail below for the master GPU in mode 1, a slave GPU in mode 1, and each GPU in mode 2, with reference to Fig. 3a, Fig. 3b and Fig. 3c respectively.

Fig. 3a shows the detailed flow of step 201 in Fig. 2, which includes the following processing:

Step 2011: receive a forward BN input data subset, which is the data subset input to the BN layer during forward propagation. It can be written as $B_i = \{x_{i,j}\}$ $(j = 1, 2, \dots, m_i)$, where $B_i$ is the forward BN input data subset when the GPU is the $i$-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, and $m_i$ is the number of data items in the forward BN input data subset;

Step 2012: when the GPU is the master GPU among the multiple GPUs, the master GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;

In some embodiments of the present application, the mean of the forward BN input data subset may be determined according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, and the mean of squares of the forward BN input data subset according to the formula $v_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$;

where $\mu_i$ is the mean of the forward BN input data subset when the GPU is the $i$-th GPU, and $v_i$ is the mean of squares of the forward BN input data subset when the GPU is the $i$-th GPU;

In other embodiments of the present application, the mean and the mean of squares of the forward BN input data subset may also be determined by other methods, which are well known to those of ordinary skill in the art and are not described here;

Step 2013: receive the forward BN input data subset mean sets from each of the slave GPUs;

Step 2014: determine the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and those of the slave GPUs, the global forward BN input data mean set including the mean and the mean of squares of the global forward BN input data; and send the global forward BN input data mean set to each of the slave GPUs;

In some embodiments of the present application, the mean of the global forward BN input data may be determined according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, and the mean of squares of the global forward BN input data according to the formula $v = \frac{\sum_{i=1}^{n} m_i v_i}{\sum_{i=1}^{n} m_i}$;

where $n$ is the number of GPUs, $m_i$ is the number of data items in the forward BN input data subset of the $i$-th GPU, $\mu_i$ is the mean of the forward BN input data subset of the $i$-th GPU, $\mu$ is the mean of the global forward BN input data, $v_i$ is the mean of squares of the forward BN input data subset of the $i$-th GPU, and $v$ is the mean of squares of the global forward BN input data;
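Assuming the global mean and global mean of squares are count-weighted averages of the per-GPU values, a short derivation shows that they equal the mean and mean of squares taken over the undivided batch. The symbol $M$ for the total number of data items is introduced here for convenience and does not appear in the source:

$$
M = \sum_{i=1}^{n} m_i, \qquad
\mu = \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j} = \frac{1}{M}\sum_{i=1}^{n} m_i \mu_i, \qquad
v = \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}^{2} = \frac{1}{M}\sum_{i=1}^{n} m_i v_i .
$$

Each inner sum over $j$ equals $m_i \mu_i$ (respectively $m_i v_i$) by the definition of the per-GPU mean and mean of squares, so a GPU only needs to exchange these two scalars and its count rather than the data items themselves.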

In other embodiments of the present application, the mean and the mean of squares of the global forward BN input data may also be determined by other methods; those of ordinary skill in the art can choose a specific algorithm according to the application scenario, and the methods are not listed one by one here;

Step 2015: determine the variance of the global forward BN input data according to the mean and the mean of squares of the global forward BN input data;

In some embodiments of the present application, the variance of the global forward BN input data may be determined according to the formula $\sigma^2 = v - \mu^2$, where $\sigma^2$ is the variance of the global forward BN input data, and $v$ and $\mu$ are the mean of squares and the mean of the global forward BN input data as described in step 2014;

In other embodiments of the present application, the variance of the global forward BN input data may also be determined by other methods; those skilled in the art can choose a specific algorithm according to the application scenario, and the methods are not listed one by one here;
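Steps 2012 to 2015 can be checked numerically. The following plain-Python sketch uses made-up data, assumes the global values are count-weighted averages of the per-GPU values, and compares the resulting variance with the variance computed directly on the undivided batch. Variable names are chosen here for illustration.

```python
subsets = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]  # one list per simulated GPU

# Step 2012: per-GPU count m_i, mean mu_i and mean of squares v_i.
stats = [(len(b), sum(b) / len(b), sum(x * x for x in b) / len(b)) for b in subsets]

# Step 2014: global mean and mean of squares (assumed count-weighted average).
total = sum(m for m, _, _ in stats)
mu = sum(m * mu_i for m, mu_i, _ in stats) / total
v = sum(m * v_i for m, _, v_i in stats) / total

# Step 2015: global variance from the two global means.
sigma2 = v - mu * mu

# Reference: mean and variance computed directly on the undivided batch.
full = [x for b in subsets for x in b]
ref_mu = sum(full) / len(full)
ref_sigma2 = sum((x - ref_mu) ** 2 for x in full) / len(full)
```

The identity used in step 2015 is the usual one: the variance equals the mean of squares minus the square of the mean. In floating point this subtraction can lose precision when the mean is large relative to the spread, which may be one reason the source allows other methods.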

Step 2016: perform a forward BN operation on each data item in the forward BN input data subset according to the variance of the global forward BN input data, to obtain a forward post-BN data subset;

In some embodiments of the present application, the forward BN operation may be performed on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$ $(j = 1, 2, \dots, m_i)$, where, as described above, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value that prevents division by zero, and $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset;

Step 2017: perform an offset operation on each data item in the forward post-BN data subset to obtain the forward BN output data subset.

In some embodiments of the present application, the offset operation may be performed on each data item in the forward post-BN data subset according to the formula $y_{i,j} = \gamma \hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
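Steps 2016 and 2017 can be sketched as follows. This is a minimal plain-Python illustration, assuming the usual batch-normalization form (subtract the global mean, divide by the square root of the global variance plus a small constant, then scale by γ and shift by β). The function name, the value of `eps` and the data are assumptions made here.

```python
import math

def forward_bn(b_i, mu, sigma2, gamma=1.0, beta=0.0, eps=1e-5):
    """Steps 2016-2017 on one GPU: normalize with global statistics, then scale and shift."""
    inv_std = 1.0 / math.sqrt(sigma2 + eps)
    x_hat = [(x - mu) * inv_std for x in b_i]   # forward post-BN data subset
    return [gamma * xh + beta for xh in x_hat]  # forward BN output data subset

subsets = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]  # made-up data, one list per GPU
full = [x for b in subsets for x in b]

# Global statistics, computed directly here for brevity; in the method they would
# be obtained from the exchanged per-GPU means (steps 2012 to 2015).
mu = sum(full) / len(full)
sigma2 = sum(x * x for x in full) / len(full) - mu * mu

per_gpu_out = [forward_bn(b, mu, sigma2, gamma=2.0, beta=0.5) for b in subsets]
single_gpu_out = forward_bn(full, mu, sigma2, gamma=2.0, beta=0.5)
```

Because every GPU normalizes with the same global mean and variance, concatenating the per-GPU outputs reproduces what a single GPU would compute on the whole batch.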

In the forward propagation process above, the GPU determines the mean and the mean of squares of the global forward BN input data and performs forward BN processing on the forward BN input data subset based on them. This compensates for the fact that the GPU does not have all of the data for forward BN processing, so that forward BN processing can be based on the means of the global data.

The process shown in Fig. 3a describes how forward BN processing works on the master GPU among the multiple GPUs. Forward BN processing on a slave GPU differs from that on the master GPU only in steps 2012 to 2014 above; the remaining processing is the same as steps 2011 and 2015 to 2017 shown in Fig. 3a. Forward BN processing on a slave GPU is described below with reference to Fig. 3b; processing steps in Fig. 3b that are identical to those in Fig. 3a are not described again.

Step 2011: receive a forward BN input data subset;

Step 2012': when the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the method of determining the forward BN input data subset mean set is the same as in step 2012 and is not described again here;

Step 2013': send the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;

Step 2014': receive the global forward BN input data mean set from the master GPU; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data;

When the multiple GPUs are divided into master and slave GPUs, the master GPU determines the global forward BN input data mean set and the slave GPUs receive the set determined by the master GPU, which saves processing resources on the slave GPUs.

In some other embodiments of the present application, the GPUs need not be divided into master and slave; each GPU determines the global forward BN input data mean set independently. Forward BN processing on each independent GPU differs from that on the master GPU only in steps 2012 to 2014 above; the remaining processing is the same as steps 2011 and 2015 to 2017 shown in Fig. 3a. The processing on each GPU is described below with reference to Fig. 3c; processing steps in Fig. 3c that are identical to those in Fig. 3a are not described again.

Step 2011: receive a forward BN input data subset;

Step 2012'': determine the forward BN input data subset mean set of the GPU according to the forward BN input data subset; the method of determining the forward BN input data subset mean set is the same as in step 2012 and is not described again here;

Step 2013'': send the determined forward BN input data subset mean set to each of the other GPUs, and receive the forward BN input data subset mean sets from each of the other GPUs;

Step 2014'': determine the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and those of the other GPUs; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data; the method of determining the global forward BN input data mean set is the same as in step 2014 and is not described again here;

When each of the multiple GPUs is an independent GPU, each GPU determines the global forward BN input data mean set by itself, so the GPUs operate with a high degree of independence and do not rely on the processing results of other GPUs.

BN processing during backward propagation is described below.

In an embodiment of the present invention, in step 202 above, the global backward BN input data mean set may be determined in, but not limited to, the following two modes:

Mode 1: among the multiple GPUs, one GPU is selected as the master GPU and the other GPUs are slave GPUs. The master GPU determines the global backward BN input data mean set and sends it to each slave GPU, so the slave GPUs do not need to compute the global backward BN input data mean set themselves.

Mode 2: the multiple GPUs are not divided into master and slave; each GPU independently determines the global backward BN input data mean set.

In mode 1, the master GPU may determine the global backward BN input data mean set through the following steps D1 to D4:

Step D1: the master GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset; the backward BN input data subset mean set includes the backward BN input data subset mean and the forward BN gradient calibration data mean;

Step D2: receive the backward BN input data subset mean sets from each of the slave GPUs;

Step D3: determine the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and those of the slave GPUs; the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient calibration data mean;

Step D4: send the global backward BN input data mean set to each of the slave GPUs.

In mode 1, a slave GPU can determine the global backward BN input data mean set through the following steps E1 to E3:

Step E1: the slave GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the slave GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;

Step E2: send the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;

Step E3: receive the global backward BN input data mean set from the master GPU, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.
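The master/slave exchange of steps D1 to D4 and E1 to E3 can be sketched as a small simulation. Everything here is illustrative: the helper names are hypothetical, lists stand in for GPU memory, and the forward BN gradient correction data mean is assumed to be the average of each gradient multiplied by its forward BN input, which the text does not confirm.

```python
def backward_subset_mean_set(dys, xs):
    """Return (m_i, mean of dl/dy, mean of dl/dy * x) for one GPU.

    The third entry is an ASSUMED definition of the gradient correction mean.
    """
    m = len(dys)
    g_i = sum(dys) / m
    phi_i = sum(dy * x for dy, x in zip(dys, xs)) / m
    return m, g_i, phi_i

def combine(mean_sets):
    """Count-weighted combination of subset mean sets into global means."""
    total = sum(m for m, _, _ in mean_sets)
    g = sum(m * g_i for m, g_i, _ in mean_sets) / total
    phi = sum(m * phi_i for m, _, phi_i in mean_sets) / total
    return g, phi

# Hypothetical data: GPU 0 acts as master, GPUs 1 and 2 as slaves.
xs = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
dys = [[0.5, -0.5], [1.0, 0.0, 2.0], [-1.0]]

# D1 / E1: every GPU computes its own backward subset mean set.
local = [backward_subset_mean_set(d, x) for d, x in zip(dys, xs)]
# E2 / D2: the slaves send their mean sets, the master receives them.
inbox = local[1:]
# D3: the master combines its own mean set with the received ones.
global_set = combine([local[0]] + inbox)
# D4 / E3: the master sends the result; each slave stores the received copy.
received_by_slave = {gpu: global_set for gpu in (1, 2)}
```

Only the master performs the combination, which is the saving in slave-side processing that mode 1 aims for.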

In mode 2, each GPU can determine the global backward BN input data mean set through the following steps F1 to F4:

Step F1: the GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;

Step F2: send the backward BN input data subset mean set of the GPU to each of the other GPUs;

Step F3: receive the backward BN input data subset mean sets from each of the other GPUs;

Step F4: determine the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.

Steps D1, E1 and F1 are implemented in the same way. In mode 1 and mode 2, the way a GPU determines the global backward BN input data mean set from the backward BN input data subset mean sets of the multiple GPUs is also the same.

The specific implementation of the aforementioned step 202 is described in detail below for the master GPU in mode 1, the slave GPU in mode 1, and each GPU in mode 2, with reference to Fig. 4a, Fig. 4b and Fig. 4c respectively.

Fig. 4a shows the detailed flow of step 202 in Fig. 2, which includes the following processing:

Step 2021: receive the backward BN input data subset. The backward BN input data subset is the gradient set of the forward BN output data subset determined in the above step 2017, and can be expressed as $G_i = \{\partial\ell/\partial y_{i,j}\}\ (j = 1, 2, \ldots, m_i)$, where $G_i$ is the backward BN input data subset when the GPU is the i-th GPU, $\ell$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, and $\partial\ell/\partial y_{i,j}$ is the gradient of $y_{i,j}$, that is, a data item in the backward BN input data subset;

Step 2022: when the GPU is the master GPU among the multiple GPUs, the master GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set includes the backward BN input data subset mean and the forward BN gradient correction data mean;

In some embodiments of the present application, the backward BN input data subset mean can be determined according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial\ell}{\partial y_{i,j}}$, where $\bar{g}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;

In some embodiments of the present application, the forward BN gradient correction data mean can be determined by formula from the backward BN input data subset and the forward BN input data subset, where $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU;
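One plausible reading of the two subset means is given below as a sketch. It assumes that the forward BN gradient correction data mean is the average of each gradient multiplied by its corresponding forward BN input $x_{i,j}$; the symbol $\bar{g}_i$ for the backward BN input data subset mean is introduced here for illustration only.

```latex
\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial \ell}{\partial y_{i,j}},
\qquad
\varphi_i = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial \ell}{\partial y_{i,j}}\,x_{i,j}
```

Under this assumption, $\sum_{j}\frac{\partial \ell}{\partial y_{i,j}}(x_{i,j}-\mu) = m_i(\varphi_i - \mu\,\bar{g}_i)$, which is the centred sum that a standard batch-normalization backward pass requires, so the two means are enough to reconstruct it once the global mean $\mu$ is known.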

Step 2023: receive the backward BN input data subset mean sets from each of the slave GPUs;

Step 2024: determine the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean; and send the global backward BN input data mean set to each of the slave GPUs;

In some embodiments of the present application, the global backward BN input data mean can be determined according to the formula $\bar{g} = \frac{\sum_{i=1}^{n} m_i\,\bar{g}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the forward BN output data subset gradient mean of the i-th GPU, and $\bar{g}$ is the global backward BN input data mean;

In some embodiments of the present application, the global forward BN gradient correction data mean can be determined according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i\,\varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient correction data mean of the i-th GPU and $\varphi$ is the global forward BN gradient correction data mean;
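The role of the counts $m_i$ in the global means can be illustrated with a short sketch. The data below are hypothetical; the point is only that a count-weighted average of subset means reproduces the mean over all data, whereas a plain average of subset means does not when the subsets differ in size.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical gradient subsets of unequal size on two GPUs.
grad_subsets = [[1.0, 1.0, 1.0, 1.0], [5.0]]
counts = [len(g) for g in grad_subsets]
subset_means = [mean(g) for g in grad_subsets]

# Count-weighted combination of the subset means.
weighted = sum(m * g for m, g in zip(counts, subset_means)) / sum(counts)
# Naive average of the subset means, ignoring subset sizes.
unweighted = mean(subset_means)
# Reference: the mean over the concatenated data.
full_batch = mean([g for subset in grad_subsets for g in subset])
```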

Step 2025: determine the gradient of each data item in the data subset after forward BN according to the backward BN input data subset;

In some embodiments of the present application, the gradient of each data item in the data subset after forward BN can be determined according to the formula $\frac{\partial\ell}{\partial\hat{x}_{i,j}} = \frac{\partial\ell}{\partial y_{i,j}}\cdot\gamma$, where $\ell$ is the predetermined loss function, $\gamma$ is an offset parameter, and $\frac{\partial\ell}{\partial\hat{x}_{i,j}}$ is the gradient of the data item $\hat{x}_{i,j}$ in the data subset after forward BN;

Step 2026: determine the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;

In some embodiments of the present application, the gradient of the variance of the global forward BN input data is determined by formula, where $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\bar{g}$ is the global forward BN output data gradient mean, $\gamma$ is an offset parameter, and $\frac{\partial\ell}{\partial\sigma^2}$ is the gradient of the variance of the global forward BN input data;
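As a hedged sketch, the standard batch-normalization backward pass gives the following expression for the variance gradient when rewritten in terms of global means. It assumes that $\varphi$ is the count-weighted mean of $\frac{\partial\ell}{\partial y_{i,j}}x_{i,j}$, that $\bar{g}$ is the global mean of $\frac{\partial\ell}{\partial y_{i,j}}$, and that $M=\sum_i m_i$ is the total number of data items ($M$ and $\bar{g}$ are symbols introduced here). The source formula may be normalized differently, for example without the factor $M$.

```latex
\frac{\partial \ell}{\partial \sigma^2}
= \sum_{i=1}^{n}\sum_{j=1}^{m_i}
  \frac{\partial \ell}{\partial \hat{x}_{i,j}}\,(x_{i,j}-\mu)\cdot
  \left(-\tfrac{1}{2}\right)(\sigma^2+\varepsilon)^{-3/2}
= -\frac{\gamma M}{2}\,(\varphi-\mu\,\bar{g})\,(\sigma^2+\varepsilon)^{-3/2}
```

The second equality uses $\frac{\partial\ell}{\partial\hat{x}_{i,j}}=\gamma\frac{\partial\ell}{\partial y_{i,j}}$ and $\sum_{i,j}\frac{\partial\ell}{\partial y_{i,j}}(x_{i,j}-\mu)=M(\varphi-\mu\,\bar{g})$, so no per-item data from other GPUs is needed.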

Step 2027: determine the gradient of the global forward BN input data mean according to the global forward BN input data mean set and the global forward BN gradient correction data mean;

In some embodiments of the present application, the gradient of the global forward BN input data mean is determined by formula, where $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\gamma$ is an offset parameter, and $\frac{\partial\ell}{\partial\mu}$ is the gradient of the global forward BN input data mean;
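For comparison, the textbook batch-normalization derivation of the mean gradient is sketched below, under the same assumptions as before: $\bar{g}$ denotes the global mean of $\frac{\partial\ell}{\partial y_{i,j}}$ and $M=\sum_i m_i$ (both symbols introduced here). This is not necessarily the source formula; the text lists $\varphi$ among its symbols, so the source may group the terms differently.

```latex
\frac{\partial \ell}{\partial \mu}
= -\frac{1}{\sqrt{\sigma^2+\varepsilon}}\sum_{i,j}\frac{\partial \ell}{\partial \hat{x}_{i,j}}
  \;-\; \frac{2}{M}\,\frac{\partial \ell}{\partial \sigma^2}\sum_{i,j}(x_{i,j}-\mu)
= -\frac{\gamma M\,\bar{g}}{\sqrt{\sigma^2+\varepsilon}}
```

The second term vanishes because $\sum_{i,j}(x_{i,j}-\mu)=0$ by the definition of the global mean $\mu$.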

Step 2028: determine the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the data subset after forward BN, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global forward BN input data;

In some embodiments of the present application, the gradient of each data item in the forward BN input data subset is determined by formula, where $\frac{\partial\ell}{\partial\hat{x}_{i,j}}$ is the gradient of each data item in the data subset after forward BN determined in the above step 2025, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\frac{\partial\ell}{\partial\sigma^2}$ is the gradient of the variance of the global forward BN input data determined in the above step 2026, $\frac{\partial\ell}{\partial\mu}$ is the gradient of the global forward BN input data mean determined in the above step 2027, and $\frac{\partial\ell}{\partial x_{i,j}}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.

In the above back-propagation process, the gradient set of the forward BN output data subset, output after forward BN processing, serves as the input of backward BN processing. The GPU determines the global backward BN input data mean set and, based on the global forward BN input data mean set and the global backward BN input data mean set, performs backward BN processing on the forward BN input data subset. This compensates for the fact that no single GPU obtains the full data for DNN model training: forward and backward BN processing are carried out on the means of the global data, yielding global gradients similar to those a single GPU would obtain when training on the global data, so the training precision can approach that of single-GPU training on the global data. The multi-GPU parallel DNN model training method proposed in the present application can therefore solve the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.
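The claim that per-GPU backward BN processing with global means matches single-GPU training can be checked numerically. The sketch below is an illustration under stated assumptions, not the patented formulas: it uses the textbook batch-normalization backward pass, assumes the gradient correction mean `phi` is the mean of gradient times input, and compares against finite differences of a single-batch loss. All names are hypothetical.

```python
import math

EPS = 1e-5  # stands in for the fixed, very small non-zero value epsilon

def bn_forward(all_x, gamma, beta):
    """Reference single-batch BN forward over the concatenated data."""
    M = len(all_x)
    mu = sum(all_x) / M
    var = sum(x * x for x in all_x) / M - mu * mu
    return [gamma * (x - mu) / math.sqrt(var + EPS) + beta for x in all_x]

def loss(all_x, w, gamma, beta):
    """Hypothetical loss: weighted sum of BN outputs, so dl/dy_k = w_k."""
    return sum(wk * yk for wk, yk in zip(w, bn_forward(all_x, gamma, beta)))

def input_grads_on_gpu(xs, dys, mu, var, g_bar, phi, M, gamma):
    """Gradients for one GPU's inputs from its own subset plus global means."""
    inv_std = 1.0 / math.sqrt(var + EPS)
    d_var = -0.5 * gamma * M * (phi - mu * g_bar) * inv_std ** 3
    d_mu = -gamma * M * g_bar * inv_std
    return [gamma * dy * inv_std + d_var * 2.0 * (x - mu) / M + d_mu / M
            for x, dy in zip(xs, dys)]

subsets = [[0.5, -1.0, 2.0], [1.5, 3.0]]      # forward BN input subsets
weights = [[1.0, -2.0, 0.5], [3.0, 0.25]]     # dl/dy for each data item
gamma, beta = 1.5, 0.1

# Global means, computed here from the flattened data for brevity; in the
# multi-GPU setting they would come from count-weighted subset means.
flat_x = [x for s in subsets for x in s]
flat_w = [w for s in weights for w in s]
M = len(flat_x)
mu = sum(flat_x) / M
var = sum(x * x for x in flat_x) / M - mu * mu
g_bar = sum(flat_w) / M
phi = sum(w * x for w, x in zip(flat_w, flat_x)) / M

multi_gpu = [g for xs, dys in zip(subsets, weights)
             for g in input_grads_on_gpu(xs, dys, mu, var, g_bar, phi, M, gamma)]

def numeric_grad(k, h=1e-6):
    """Central finite difference of the single-batch loss w.r.t. input k."""
    up = flat_x[:k] + [flat_x[k] + h] + flat_x[k + 1:]
    down = flat_x[:k] + [flat_x[k] - h] + flat_x[k + 1:]
    return (loss(up, flat_w, gamma, beta) - loss(down, flat_w, gamma, beta)) / (2 * h)

single_gpu = [numeric_grad(k) for k in range(M)]
```

Each call to `input_grads_on_gpu` touches only one subset, yet the concatenated result agrees with the single-batch gradient to within finite-difference error.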

The process shown in Fig. 4a describes the backward BN processing of the master GPU among the multiple GPUs. The backward BN processing of a slave GPU differs from that of the master GPU only in the above steps 2022 to 2024; the other processing is the same as steps 2021 and 2025 to 2028 shown in Fig. 4a. The backward BN processing of a slave GPU is described below with reference to Fig. 4b; processing steps in Fig. 4b that are identical to those in Fig. 4a are not repeated.

Step 2021: receive the backward BN input data subset;

Step 2022': when the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the slave GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;

Step 2023': send the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;

Step 2024': receive the global backward BN input data mean set from the master GPU, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean;

Step 2028: determine the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the data subset after forward BN, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global forward BN input data.

When the multiple GPUs are divided into a master GPU and slave GPUs, the master GPU determines the global backward BN input data mean set and the slave GPUs receive it from the master GPU, which saves processing resources on the slave GPUs.

In some other embodiments of the present application, master and slave GPUs need not be distinguished, and each GPU independently determines the global backward BN input data mean set. The backward BN processing of each independent GPU differs from that of the master GPU only in the above steps 2022 to 2024; the other processing is the same as steps 2021 and 2025 to 2028 shown in Fig. 4a. The processing of each GPU is described below with reference to Fig. 4c; processing steps in Fig. 4c that are identical to those in Fig. 4a are not repeated.

Step 2021: receive the backward BN input data subset;

Step 2022": the GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;

Step 2023": send the backward BN input data subset mean set of the GPU to each of the other GPUs, and receive the backward BN input data subset mean sets from each of the other GPUs;

Step 2024": determine the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean;

When each of the multiple GPUs is an independent GPU, each GPU determines the global backward BN input data mean set by itself. The GPUs operate with a high degree of independence, and none relies on the processing result of another GPU.

On the basis of the processing method shown in Fig. 2 to Fig. 4 c, more GPU provided by the embodiments of the present application parallel DNN model Training method further comprises following processing: according to defeated to the equal value set of BN input data and global backward BN before the overall situation Enter data mean value set, determine the gradient of BN layers of training parameter, the training parameter includes above-mentioned offset parameter γ and β.

It, can be according to formula in some embodiments of the present applicationDetermine the ladder of offset parameter γ Degree, according to public affairsDetermine the gradient of offset parameter β, whereinOffset when for the GPU being i-th of GPU The gradient of parameter γ,The gradient of offset parameter β when for the GPU being i-th of GPU.

After the gradient for determining offset parameter γ and β, gradient and the gradient descent algorithm of the determination can use to update The value of γ and β achievees the purpose that optimize DNN model.
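How the parameter gradients could follow from the global means, and how a gradient descent update would then use them, is sketched below. Several points are assumptions rather than the source method: the gradients are the textbook full-batch BN expressions, `phi` is taken as the mean of gradient times input, the per-GPU scaling used in the source may differ, and plain gradient descent with a hypothetical learning rate `lr` stands in for the unspecified gradient descent algorithm.

```python
import math

EPS = 1e-5  # stands in for the fixed, very small non-zero value epsilon

def param_grads(mu, nu, g_bar, phi, M):
    """Full-batch gradients of gamma and beta from global means only."""
    var = nu - mu * mu
    inv_std = 1.0 / math.sqrt(var + EPS)
    d_gamma = M * (phi - mu * g_bar) * inv_std   # equals sum of dl/dy * x_hat
    d_beta = M * g_bar                           # equals sum of dl/dy
    return d_gamma, d_beta

def sgd_step(gamma, beta, d_gamma, d_beta, lr):
    """One plain gradient descent update (assumed optimizer)."""
    return gamma - lr * d_gamma, beta - lr * d_beta

# Hypothetical global data and upstream gradients dl/dy.
xs = [0.5, -1.0, 2.0, 1.5, 3.0]
dys = [1.0, -2.0, 0.5, 3.0, 0.25]
M = len(xs)
mu = sum(xs) / M
nu = sum(x * x for x in xs) / M
g_bar = sum(dys) / M
phi = sum(dy * x for dy, x in zip(dys, xs)) / M

d_gamma, d_beta = param_grads(mu, nu, g_bar, phi, M)
new_gamma, new_beta = sgd_step(1.0, 0.0, d_gamma, d_beta, lr=0.1)
```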

The multi-GPU parallel DNN model training apparatus provided by the embodiments of the present application is described below. The apparatus is arranged in each of the multiple GPUs and performs DNN model training on the data subset assigned to that GPU. Fig. 5 shows a structural block diagram of the apparatus, which includes a forward BN processing unit 51 and a backward BN processing unit 52.

The forward BN processing unit 51 is configured to, in the forward-propagation process: receive the forward BN input data subset; determine the global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain the forward BN output data subset.

In some embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: when the GPU is the master GPU among the multiple GPUs, determining the forward BN input data subset mean set of the master GPU according to the forward BN input data subset, where the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset; receiving the forward BN input data subset mean sets from each of the slave GPUs; determining the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the slave GPUs, where the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data; and sending the global forward BN input data mean set to each of the slave GPUs.

In other embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: when the GPU is a slave GPU among the multiple GPUs, determining the forward BN input data subset mean set of the slave GPU according to the forward BN input data subset, where the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset; sending the determined forward BN input data subset mean set to the master GPU among the multiple GPUs; and receiving the global forward BN input data mean set from the master GPU, where the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.

In still other embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: determining the forward BN input data subset mean set of the GPU according to the forward BN input data subset, where the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset; sending the forward BN input data subset mean set of the GPU to each of the other GPUs; receiving the forward BN input data subset mean sets from each of the other GPUs; and determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, where the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.

The forward BN processing unit 51 determining the forward BN input data subset mean set of the GPU includes: determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}\ (j = 1, 2, \ldots, m_i)$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU; and determining the mean of squares of the forward BN input data subset according to the formula $\nu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $\nu_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.

The forward BN processing unit 51 determining the global forward BN input data mean set includes: determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i\,\mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, and $\mu$ is the mean of the global forward BN input data; and determining the mean of squares of the global forward BN input data according to the formula $\nu = \frac{\sum_{i=1}^{n} m_i\,\nu_i}{\sum_{i=1}^{n} m_i}$, where $\nu_i$ is the mean of squares of the forward BN input data subset of the i-th GPU and $\nu$ is the mean of squares of the global forward BN input data.

The forward BN processing unit 51 performing forward BN processing on the forward BN input data subset includes: performing a forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data to obtain the data subset after forward BN; and performing an offset operation on each data item in the data subset after forward BN to obtain the forward BN output data subset.

The forward BN processing unit 51 performing the forward BN operation on each data item in the forward BN input data subset includes:

determining the variance of the global forward BN input data according to the formula $\sigma^2 = \nu - \mu^2$, where $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;
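The variance formula follows from expanding the definition of variance over all data, which is why exchanging only the mean and the mean of squares is sufficient. In the short derivation below, $M=\sum_{i} m_i$ denotes the total number of data items (a symbol introduced here for illustration):

```latex
\sigma^2
= \frac{1}{M}\sum_{i,j}(x_{i,j}-\mu)^2
= \frac{1}{M}\sum_{i,j}x_{i,j}^2 \;-\; 2\mu\cdot\frac{1}{M}\sum_{i,j}x_{i,j} \;+\; \mu^2
= \nu - 2\mu^2 + \mu^2
= \nu - \mu^2
```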

performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j}-\mu}{\sqrt{\sigma^2+\varepsilon}}$, where $B_i = \{x_{i,j}\}\ (j = 1, 2, \ldots, m_i)$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, and $\hat{x}_{i,j}$ is a data item in the data subset after forward BN.

The forward BN processing unit 51 performing the offset operation on each data item in the data subset after forward BN includes: performing the offset operation according to the formula $y_{i,j} = \gamma\,\hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the data subset after forward BN, and $y_{i,j}$ is a data item in the forward BN output data subset.
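The forward BN operation and the offset operation on one GPU can be sketched as follows. This is a minimal illustration with hypothetical names; the global mean and mean of squares are computed from the flattened data for brevity, whereas in the multi-GPU setting they would be obtained from the exchanged subset mean sets.

```python
import math

EPS = 1e-5  # stands in for the fixed, very small non-zero value epsilon

def forward_bn_on_gpu(xs, mu, nu, gamma, beta):
    """Normalize one GPU's subset with global statistics, then scale and shift."""
    var = nu - mu * mu                        # global variance from nu and mu
    inv_std = 1.0 / math.sqrt(var + EPS)
    x_hat = [(x - mu) * inv_std for x in xs]  # data subset after forward BN
    y = [gamma * v + beta for v in x_hat]     # forward BN output data subset
    return x_hat, y

subsets = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
flat = [x for s in subsets for x in s]
mu = sum(flat) / len(flat)
nu = sum(x * x for x in flat) / len(flat)

outputs = [forward_bn_on_gpu(s, mu, nu, gamma=2.0, beta=0.5) for s in subsets]
all_x_hat = [v for x_hat, _ in outputs for v in x_hat]
all_y = [v for _, y in outputs for v in y]
```

Taken together, the normalized values from all GPUs have approximately zero mean and unit variance, as they would if a single GPU had normalized the whole batch.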

The forward BN processing unit 51 is further configured to: send the global forward BN input data mean set to each of the other GPUs; or send the forward BN input data subset mean set of the GPU to each of the other GPUs.

The backward BN processing unit 52 is configured to, in the back-propagation process: receive the backward BN input data subset, which is the gradient set of the forward BN output data subset obtained after the forward BN processing unit 51 performs forward BN processing; determine the global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN input data mean set, to obtain the gradient of each data item in the forward BN input data subset.

The backward BN processing unit 52 determining the global backward BN input data mean set includes: when the GPU is the master GPU among the multiple GPUs, determining the backward BN input data subset mean set of the master GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set includes the backward BN input data subset mean and the forward BN gradient correction data mean; receiving the backward BN input data subset mean sets from each of the slave GPUs; determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean; and sending the global backward BN input data mean set to each of the slave GPUs.

In some embodiments of the present application, the backward BN processing unit 52 determining the global backward BN input data mean set includes: when the GPU is a slave GPU among the multiple GPUs, determining the backward BN input data subset mean set of the slave GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the slave GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean; sending the determined backward BN input data subset mean set to the master GPU among the multiple GPUs; and receiving the global backward BN input data mean set from the master GPU, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.

In other embodiments of the present application, the backward BN processing unit 52 determining the global backward BN input data mean set includes: determining the backward BN input data subset mean set of the GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean; sending the backward BN input data subset mean set of the GPU to each of the other GPUs; receiving the backward BN input data subset mean sets from each of the other GPUs; and determining the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.

The backward BN processing unit 52 determining the backward BN input data subset mean set of the GPU includes: determining the backward BN input data subset mean according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial\ell}{\partial y_{i,j}}$, where $G_i = \{\partial\ell/\partial y_{i,j}\}\ (j = 1, 2, \ldots, m_i)$ is the backward BN input data subset when the GPU is the i-th GPU, $\ell$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial\ell/\partial y_{i,j}$ is the gradient of $y_{i,j}$, and $\bar{g}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU; and determining the forward BN gradient correction data mean by formula, where $B_i = \{x_{i,j}\}\ (j = 1, 2, \ldots, m_i)$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU.

The backward BN processing unit 52 determining the global backward BN input data mean set includes: determining the global backward BN input data mean according to the formula $\bar{g} = \frac{\sum_{i=1}^{n} m_i\,\bar{g}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the forward BN output data subset gradient mean of the i-th GPU, and $\bar{g}$ is the global backward BN input data mean; and determining the global forward BN gradient correction data mean according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i\,\varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient correction data mean of the i-th GPU and $\varphi$ is the global forward BN gradient correction data mean.

The backward BN processing unit 52 performing backward BN processing on the forward BN input data subset includes: determining the gradient of each data item in the data subset after forward BN according to the backward BN input data subset; determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean; determining the gradient of the global forward BN input data mean according to the global forward BN input data mean set and the global forward BN gradient correction data mean; and determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the data subset after forward BN, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global forward BN input data.

The backward BN processing unit 52 determining the gradient of each data item in the data subset after forward BN includes: determining the gradient according to the formula $\frac{\partial\ell}{\partial\hat{x}_{i,j}} = \frac{\partial\ell}{\partial y_{i,j}}\cdot\gamma$, where $G_i = \{\partial\ell/\partial y_{i,j}\}\ (j = 1, 2, \ldots, m_i)$ is the backward BN input data subset when the GPU is the i-th GPU, $\ell$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial\ell/\partial y_{i,j}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\frac{\partial\ell}{\partial\hat{x}_{i,j}}$ is the gradient of the data item $\hat{x}_{i,j}$ in the data subset after forward BN.

The backward BN processing unit 52 determines the gradient of the global preceding variance to BN input data, comprising: according to formulaDetermine the gradient of the global preceding variance to BN input data, In, σ²For before the overall situation to the variance of BN input data, σ²=ν-μ², ν is the global preceding mean value of square to BN input data, μ For the global preceding mean value to BN input data, ε is fixed minimum nonzero value, and φ is described global preceding to BN gradient calibration Data mean value,For, to BN output data gradient mean value, γ is offset parameter before the overall situation,It is described global preceding to BN The gradient of the variance of input data.

The backward BN processing unit 52 determines the gradient of the global forward BN input data mean according to a formula [formula not reproduced in the source text], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient calibration data mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean.

The backward BN processing unit 52 determines the gradient of each data item in the forward BN input data subset according to a formula [formula not reproduced in the source text], where $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of each data item in the post-forward-BN data subset, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data, $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean, and $\frac{\partial l}{\partial x_{i,j}}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
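The following Python sketch illustrates the idea of the backward BN processing described above: one GPU computes the input gradients of its own subset from global means only. It uses the textbook closed form of the batch normalization backward pass rather than the formulas of this application, which are not reproduced in this text. The function names, the toy loss, and the definition of `phi` as the global mean of x times dl/dy are assumptions made for illustration. The result is compared with finite differences of a full-batch loss.

```python
import math

EPS = 1e-5  # stands in for the fixed, very small non-zero value epsilon


def bn_forward(xs, gamma, beta):
    """Full-batch BN forward pass, using sigma^2 = nu - mu^2."""
    n = len(xs)
    mu = sum(xs) / n
    nu = sum(x * x for x in xs) / n
    inv_std = 1.0 / math.sqrt(nu - mu * mu + EPS)
    return [gamma * (x - mu) * inv_std + beta for x in xs]


def backward_inputs(xs_i, gs_i, mu, nu, g_bar, phi, gamma):
    """Input gradients for one GPU's subset, from global means only.

    g_bar: global mean of dl/dy; phi: global mean of x * dl/dy
    (assumed definition of the gradient calibration data mean).
    """
    inv_std = 1.0 / math.sqrt(nu - mu * mu + EPS)
    corr = (phi - mu * g_bar) * inv_std * inv_std
    return [gamma * inv_std * (g - g_bar - (x - mu) * corr)
            for x, g in zip(xs_i, gs_i)]


def compare_with_finite_differences():
    """Return (per-GPU analytic gradients, full-batch numeric gradients)."""
    subsets = [[0.3, -1.2, 2.0], [1.5, 0.7], [-0.4, 0.9, 1.1, -2.3]]
    flat = [x for s in subsets for x in s]
    n = len(flat)
    gamma, beta = 1.7, -0.2
    weights = [0.5 + 0.1 * k for k in range(n)]  # toy loss weights

    def loss(xs):
        ys = bn_forward(xs, gamma, beta)
        return sum(w * y * y for w, y in zip(weights, ys))

    ys = bn_forward(flat, gamma, beta)
    gs = [2.0 * w * y for w, y in zip(weights, ys)]  # dl/dy
    # Global means; in a multi-GPU run these would come from the exchange step.
    mu = sum(flat) / n
    nu = sum(x * x for x in flat) / n
    g_bar = sum(gs) / n
    phi = sum(x * g for x, g in zip(flat, gs)) / n

    analytic, start = [], 0
    for s in subsets:  # each iteration plays the role of one GPU
        analytic += backward_inputs(s, gs[start:start + len(s)],
                                    mu, nu, g_bar, phi, gamma)
        start += len(s)

    h = 1e-6
    numeric = []
    for k in range(n):  # central differences on the full-batch loss
        up, down = flat[:], flat[:]
        up[k] += h
        down[k] -= h
        numeric.append((loss(up) - loss(down)) / (2 * h))
    return analytic, numeric
```

Calling `compare_with_finite_differences()` returns two lists that should agree up to finite-difference error, which is the sense in which per-GPU gradients computed from global means match single-GPU training on the whole batch.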

The backward BN processing unit 52 is further configured to determine the gradients of the BN layer training parameters according to the global forward BN input data mean set and the global backward BN input data mean set, the BN layer training parameters comprising the offset parameters $\gamma$ and $\beta$.

The backward BN processing unit 52 can determine the gradient of the offset parameter $\gamma$ according to a formula [formula not reproduced in the source text], where $\varphi$ is the global forward BN gradient calibration data mean, $\mu$ is the mean of the global forward BN input data, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and the result is the gradient of the offset parameter $\gamma$ when the GPU is the i-th GPU.

The backward BN processing unit 52 can determine the gradient of the offset parameter $\beta$ according to a formula [formula not reproduced in the source text], where $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and the result is the gradient of the offset parameter $\beta$ when the GPU is the i-th GPU.
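A minimal Python sketch of parameter gradients that depend only on the quantities listed above follows. The two expressions in `param_grad_share` are an assumption consistent with those listed quantities, not the formulas of this application, which are not reproduced in this text; `phi` is again assumed to be the global mean of x times dl/dy, and all names are hypothetical. The check shows that the per-GPU shares, summed over the GPUs, equal the usual full-batch gradients.

```python
import math

EPS = 1e-5  # stands in for the fixed, very small non-zero value epsilon


def param_grad_share(m_i, mu, nu, g_bar, phi):
    """Assumed per-GPU share of the gradients of gamma and beta.

    m_i: number of data items on this GPU; g_bar: global mean of dl/dy;
    phi: global mean of x * dl/dy (assumed definition).
    """
    inv_std = 1.0 / math.sqrt(nu - mu * mu + EPS)
    d_gamma_i = m_i * (phi - mu * g_bar) * inv_std
    d_beta_i = m_i * g_bar
    return d_gamma_i, d_beta_i


def check_against_full_batch():
    """Return (summed gamma share, summed beta share, reference gamma, reference beta)."""
    subsets = [[0.3, -1.2, 2.0], [1.5, 0.7], [-0.4, 0.9, 1.1, -2.3]]
    grads = [[0.2, -0.5, 1.0], [0.4, -0.1], [0.3, 0.8, -0.6, 0.05]]  # made-up dl/dy
    xs = [x for s in subsets for x in s]
    gs = [g for s in grads for g in s]
    n = len(xs)
    mu = sum(xs) / n
    nu = sum(x * x for x in xs) / n
    g_bar = sum(gs) / n
    phi = sum(x * g for x, g in zip(xs, gs)) / n
    inv_std = 1.0 / math.sqrt(nu - mu * mu + EPS)

    # Full-batch reference: dl/dgamma = sum(dl/dy * x_hat), dl/dbeta = sum(dl/dy).
    ref_gamma = sum(g * (x - mu) * inv_std for x, g in zip(xs, gs))
    ref_beta = sum(gs)

    shares = [param_grad_share(len(s), mu, nu, g_bar, phi) for s in subsets]
    return (sum(dg for dg, _ in shares), sum(db for _, db in shares),
            ref_gamma, ref_beta)
```

Weighting each share by `m_i` means that a plain sum over GPUs reproduces the full-batch parameter gradients even when the subsets differ in size.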

With the device shown in Fig. 5, in the backward propagation processing, backward BN processing is performed on the forward BN input data subset according to the determined global backward BN input data mean set, the backward BN input data subset, and the global forward BN data mean set. This further compensates for the widening of differences between data that results from each GPU seeing only part of the data. The gradients obtained are therefore similar to those obtained when a single GPU trains on the global data, the data gradients are consistent across the GPUs, and training accuracy improves. This solves the prior-art problem that, when multiple GPUs train a DNN model in parallel, training accuracy is low because the data gradients are inconsistent between the GPUs.
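The consistency with single-GPU training described above rests on one observation: per-GPU means, weighted by the number of data items on each GPU, reproduce the statistics of the whole batch. The Python sketch below illustrates this for the forward statistics; the function names and the sample values are hypothetical, and the code is an illustration under these assumptions rather than the implementation of this application.

```python
import math


def local_stats(xs):
    """Count, mean, and mean of squares of one GPU's input data subset."""
    m = len(xs)
    return m, sum(xs) / m, sum(x * x for x in xs) / m


def global_stats(per_gpu):
    """Combine (m_i, mu_i, nu_i) triples into the global mean and mean of squares."""
    total = sum(m for m, _, _ in per_gpu)
    mu = sum(m * mu_i for m, mu_i, _ in per_gpu) / total
    nu = sum(m * nu_i for m, _, nu_i in per_gpu) / total
    return mu, nu


def forward_bn(xs, mu, nu, gamma, beta, eps=1e-5):
    """Normalize with sigma^2 = nu - mu^2, then apply the offset operation."""
    inv_std = 1.0 / math.sqrt(nu - mu * mu + eps)
    return [gamma * (x - mu) * inv_std + beta for x in xs]


def parallel_vs_single(subsets, gamma=2.0, beta=0.1):
    """Return BN outputs computed per GPU with global stats, and on one GPU."""
    mu, nu = global_stats([local_stats(s) for s in subsets])
    parallel = [y for s in subsets for y in forward_bn(s, mu, nu, gamma, beta)]

    flat = [x for s in subsets for x in s]
    _, mu_1, nu_1 = local_stats(flat)
    single = forward_bn(flat, mu_1, nu_1, gamma, beta)
    return parallel, single
```

For example, `parallel_vs_single([[1.0, 2.0, 3.0], [4.0, 8.0], [0.5, 1.5, 2.5, 3.5]])` returns two lists that agree up to floating-point rounding. Only three numbers per GPU (count, mean, mean of squares) need to be exchanged to achieve this.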

The application of the multi-GPU parallel DNN model training method provided by the embodiments of the present application in practice is described below.

In a concrete application, the processes shown in Fig. 2, Fig. 3a and Fig. 4a can be integrated into the deep learning training framework MXNet to obtain a fully executable technical solution. The system design of MXNet can be divided into a C++ layer and a Python layer. The C++ layer is mainly responsible for system-level functions such as task scheduling, memory optimization, and computation graph optimization. The Python layer mainly encapsulates the complete training process and provides the interface for interacting with the user. In MXNet, the conventional training process of the Python layer is as follows: [listing not reproduced in the source text]

In an actual implementation, both the C++ layer and the Python layer are modified, and the Python interface can still be called normally after the modification. After the multi-GPU parallel DNN model training method provided by the embodiments of the present application is applied, the training process of the Python layer is as follows: [listing not reproduced in the source text]

After the above processing is implemented, training accuracy and test accuracy improve significantly. Fig. 6 compares the training accuracy of 3-GPU parallel training with that of single-GPU training. The training accuracy obtained with the multi-GPU parallel training method provided by the present application (the 3-GPU parallel global-data training accuracy, solid line in Fig. 6) is close to that of single-GPU global-data training (the single-GPU training accuracy, thick dashed line in Fig. 6), whereas the training accuracy of prior-art multi-GPU parallel training (the 3-GPU parallel local-data training accuracy, thin dashed line in Fig. 6) is clearly lower than that of single-GPU global-data training. Fig. 7 compares the test accuracy of 3-GPU parallel training with that of single-GPU training. The test accuracy obtained with the multi-GPU parallel training method provided by the present application (the 3-GPU parallel global-data test accuracy, dashed line in Fig. 7) is close to that of single-GPU global-data training (the single-GPU test accuracy, thick solid line in Fig. 7), whereas the test accuracy of prior-art multi-GPU parallel training (the 3-GPU parallel local-data test accuracy, thin solid line in Fig. 7) is clearly lower than that of single-GPU global-data training. As the figures show, a model trained with the method provided by the embodiments of the present application reaches an accuracy similar to that of single-GPU training, and its accuracy is about 15% higher than that of a model trained with the prior-art multi-GPU parallel training method.

The above is the core idea of the present invention. To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features, and advantages of the embodiments more apparent, the technical solutions in the embodiments are described in further detail with reference to the accompanying drawings.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A deep neural network (DNN) model training method in which multiple graphics processing units (GPUs) operate in parallel, characterized by comprising:

when one GPU among the multiple GPUs performs DNN model training on the data subset assigned to it, in the forward propagation processing, receiving a forward batch normalization (BN) input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset;

in the backward propagation processing, receiving a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset, and the global forward BN input data mean set, to obtain the gradient of each data item in the forward BN input data subset.

2. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:

in the case where the GPU is the master GPU among the multiple GPUs, determining, by the master GPU according to the forward BN input data subset, the forward BN input data subset mean set of the GPU, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;

receiving the forward BN input data subset mean set from each of the other, slave GPUs;

determining the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the slave GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data;

sending the global forward BN input data mean set to each of the slave GPUs.

3. The method according to claim 2, characterized in that determining the forward BN input data subset mean set comprises:

determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}$ $(j = 1, 2, \ldots, m_i)$, $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU;

determining the mean of squares of the forward BN input data subset according to the formula $\nu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $\nu_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.

4. The method according to claim 2, characterized in that determining the global forward BN input data mean set comprises:

determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, and $\mu$ is the mean of the global forward BN input data;

determining the mean of squares of the global forward BN input data according to the formula $\nu = \frac{\sum_{i=1}^{n} m_i \nu_i}{\sum_{i=1}^{n} m_i}$, where $\nu_i$ is the mean of squares of the forward BN input data subset of the i-th GPU, and $\nu$ is the mean of squares of the global forward BN input data.

5. The method according to claim 2, characterized in that performing forward BN processing on the forward BN input data subset comprises:

performing a forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data, to obtain a post-forward-BN data subset;

performing an offset operation on each data item in the post-forward-BN data subset to obtain the forward BN output data subset.

6. The method according to claim 5, characterized in that performing the forward BN operation on each data item in the forward BN input data subset comprises:

determining the variance of the global forward BN input data according to the formula $\sigma^2 = \nu - \mu^2$, where $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;

performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where $B_i = \{x_{i,j}\}$ $(j = 1, 2, \ldots, m_i)$, $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, and $\hat{x}_{i,j}$ is a data item in the post-forward-BN data subset.

7. The method according to claim 5, characterized in that performing the offset operation on each data item in the post-forward-BN data subset comprises:

performing the offset operation on each data item in the post-forward-BN data subset according to the formula $y_{i,j} = \gamma \hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the post-forward-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.

8. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:

in the case where the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU according to the forward BN input data subset, the forward BN input data subset mean set of the slave GPU, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;

sending the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;

receiving the global forward BN input data mean set from the master GPU, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data.

9. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:

determining, by the GPU according to the forward BN input data subset, the forward BN input data subset mean set of the GPU, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;

sending the forward BN input data subset mean set of the GPU to the other GPUs;

receiving the forward BN input data subset mean set from each of the other GPUs;

determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data.

10. The method according to claim 5, characterized in that determining the global backward BN input data mean set comprises:

in the case where the GPU is the master GPU among the multiple GPUs, determining, by the master GPU according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the master GPU, the backward BN input data subset mean set comprising the backward BN input data subset mean and the forward BN gradient calibration data mean;

receiving the backward BN input data subset mean set from each of the other, slave GPUs;

determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, the global backward BN input data mean set comprising the global backward BN input data mean and the global forward BN gradient calibration data mean;

sending the global backward BN input data mean set to each of the slave GPUs.

11. The method according to claim 10, characterized in that determining the backward BN input data subset mean set of the GPU comprises:

determining the backward BN input data subset mean according to the formula $\overline{\frac{\partial l}{\partial y}}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial l}{\partial y_{i,j}}$, where $\{\frac{\partial l}{\partial y_{i,j}}\}$ $(j = 1, 2, \ldots, m_i)$ is the backward BN input data subset when the GPU is the i-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, and $\overline{\frac{\partial l}{\partial y}}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;

determining the forward BN gradient calibration data mean according to a formula [formula not reproduced in the source text], where $B_i = \{x_{i,j}\}$ $(j = 1, 2, \ldots, m_i)$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient calibration data mean when the GPU is the i-th GPU.

12. The method according to claim 10, characterized in that determining the global backward BN input data mean set comprises:

determining the global backward BN input data mean according to the formula $\overline{\frac{\partial l}{\partial y}} = \frac{\sum_{i=1}^{n} m_i \overline{\frac{\partial l}{\partial y}}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\overline{\frac{\partial l}{\partial y}}_i$ is the gradient mean of the forward BN output data subset of the i-th GPU, and $\overline{\frac{\partial l}{\partial y}}$ is the global backward BN input data mean;

determining the global forward BN gradient calibration data mean according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i \varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient calibration data mean of the i-th GPU, and $\varphi$ is the global forward BN gradient calibration data mean.

13. The method according to claim 10, characterized in that performing backward BN processing on the forward BN input data subset comprises:

determining the gradient of each data item in the post-forward-BN data subset according to the backward BN input data subset;

determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean, and the global forward BN gradient calibration data mean;

determining the gradient of the global forward BN input data mean according to the global forward BN input data mean set and the global forward BN gradient calibration data mean;

determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the post-forward-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set, and the mean of the global forward BN input data.

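14. The method according to claim 13, characterized in that determining the gradient of each data item in the post-forward-BN data subset comprises:

determining the gradient of each data item in the post-forward-BN data subset according to the formula $\frac{\partial l}{\partial \hat{x}_{i,j}} = \frac{\partial l}{\partial y_{i,j}} \cdot \gamma$, where $\{\frac{\partial l}{\partial y_{i,j}}\}$ $(j = 1, 2, \ldots, m_i)$ is the backward BN input data subset when the GPU is the i-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of the data item $\hat{x}_{i,j}$ in the post-forward-BN data subset.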

15. The method according to claim 13, characterized in that determining the gradient of the variance of the global forward BN input data comprises:

determining the gradient of the variance of the global forward BN input data according to a formula [formula not reproduced in the source text], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient calibration data mean, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data.

16. The method according to claim 13, characterized in that determining the gradient of the global forward BN input data mean comprises:

determining the gradient of the global forward BN input data mean according to a formula [formula not reproduced in the source text], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient calibration data mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean.

17. The method according to claim 13, characterized in that determining the gradient of each data item in the forward BN input data subset comprises:

determining the gradient of each data item in the forward BN input data subset according to a formula [formula not reproduced in the source text], where $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of each data item in the post-forward-BN data subset, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data, $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean, and $\frac{\partial l}{\partial x_{i,j}}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.

18. The method according to claim 10, characterized in that the method further comprises:

determining the gradients of the BN layer training parameters according to the global forward BN input data mean set and the global backward BN input data mean set, the BN layer training parameters comprising the offset parameters $\gamma$ and $\beta$.

19. The method according to claim 18, characterized in that the gradient of the offset parameter $\gamma$ is determined according to a formula [formula not reproduced in the source text], where $\varphi$ is the global forward BN gradient calibration data mean, $\mu$ is the mean of the global forward BN input data, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and the result is the gradient of the offset parameter $\gamma$ when the GPU is the i-th GPU.

20. The method according to claim 18, characterized in that the gradient of the offset parameter $\beta$ is determined according to a formula [formula not reproduced in the source text], where $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and the result is the gradient of the offset parameter $\beta$ when the GPU is the i-th GPU.

21. The method according to claim 5, characterized in that determining the global backward BN input data mean set comprises:

in the case where the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the slave GPU, the backward BN input data subset mean set of the slave GPU comprising the backward BN input data subset mean and the forward BN gradient calibration data mean;

sending the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;

receiving the global backward BN input data mean set from the master GPU, the global backward BN input data mean set comprising the global backward BN input data mean and the global forward BN gradient calibration data mean.

22. The method according to claim 5, characterized in that determining the global backward BN input data mean set comprises:

determining, by the GPU according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the GPU, the backward BN input data subset mean set of the GPU comprising the backward BN input data subset mean and the forward BN gradient calibration data mean;

sending the backward BN input data subset mean set of the GPU to each of the other GPUs;

receiving the backward BN input data subset mean set from each of the other GPUs;

determining the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, the global backward BN input data mean set comprising the global backward BN input data mean and the global forward BN gradient calibration data mean.

23. A deep neural network model training device in which multiple graphics processing units operate in parallel, characterized in that the device is arranged in each GPU of the multiple GPUs, and the device comprises:

a forward batch normalization (BN) processing unit, configured to, in the forward propagation processing, receive a forward BN input data subset; determine a global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset;

a backward BN processing unit, configured to, in the backward propagation processing, receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset, and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.

24. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:

25. The device according to claim 24, characterized in that the forward BN processing unit determining the forward BN input data subset mean set of the GPU comprises:

26. The device according to claim 24, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:

27. The device according to claim 24, characterized in that the forward BN processing unit performing forward BN processing on the forward BN input data subset comprises:

performing an offset operation on each data item in the post-forward-BN data subset to obtain the forward BN output data subset.

28. The device according to claim 27, characterized in that the forward BN processing unit performing the forward BN operation on each data item in the forward BN input data subset comprises:

determining the variance of the global forward BN input data according to the formula $\sigma^2 = \nu - \mu^2$, where $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;

29. The device according to claim 27, characterized in that the forward BN processing unit performing the offset operation on each data item in the post-forward-BN data subset comprises:

30. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:

31. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:

32. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises:

in the case where the GPU is the master GPU among the multiple GPUs, determining, according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the master GPU, the backward BN input data subset mean set comprising the backward BN input data subset mean and the forward BN gradient calibration data mean;

determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, the global backward BN input data mean set comprising the global backward BN input data mean and the global forward BN gradient calibration data mean;

33. The device according to claim 32, characterized in that the backward BN processing unit determining the backward BN input data subset mean set of the GPU comprises:

determining the backward BN input data subset mean according to the formula $\overline{\frac{\partial l}{\partial y}}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial l}{\partial y_{i,j}}$, where $\{\frac{\partial l}{\partial y_{i,j}}\}$ $(j = 1, 2, \ldots, m_i)$ is the backward BN input data subset when the GPU is the i-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, and $\overline{\frac{\partial l}{\partial y}}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;

34. device according to claim 32, which is characterized in that the backward BN processing unit determines described global backward The equal value set of BN input data subset, comprising:

35. device according to claim 32, which is characterized in that the backward BN processing unit inputs the forward direction BN Data subset is handled after carrying out to BN, comprising:

36. device according to claim 35, which is characterized in that after the backward BN processing unit determines the forward direction BN The gradient of each data in data subset, comprising:

37. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of the variance of the global forward BN input data comprises:

38. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of the global forward BN input data mean comprises:

39. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of each datum in the forward BN input data subset comprises:

40. The device according to claim 32, characterized in that the backward BN processing unit is further configured to:

41. The device according to claim 40, characterized in that the backward BN processing unit determines the gradient of the offset parameter γ according to the formula [formula missing in source], where φ is the global forward BN gradient correction data mean, μ is the mean of the global forward BN input data, [symbol missing] is the global forward BN output data gradient mean, σ² is the variance of the global forward BN input data, σ² = v − μ², v is the mean of squares of the global forward BN input data, ε is a fixed small nonzero value, m_i is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and [symbol missing] is the gradient of the offset parameter γ when the GPU is the i-th GPU.

42. The device according to claim 40, characterized in that the backward BN processing unit determines the gradient of the offset parameter β according to the formula [formula missing in source], where [symbol missing] is the global forward BN output data gradient mean, m_i is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and [symbol missing] is the gradient of the offset parameter β when the GPU is the i-th GPU.
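The formulas of claims 41 and 42 are not reproduced in this text, so the following is only a sketch of how parameter gradients could be obtained from the global means those claims list. It assumes the standard BN form y = γ·(x − μ)/√(σ² + ε) + β, and it assumes that φ is the batch mean of x_j·∂l/∂y_j; neither assumption is confirmed by the claims. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def param_grads_from_global_means(
    m_i: int, phi: float, grad_mean: float, mu: float, v: float, eps: float = 1e-5
) -> Tuple[float, float]:
    """Return the i-th GPU's share of (dl/dgamma, dl/dbeta).

    Assumed inputs: phi is the global mean of x_j * dl/dy_j, grad_mean is the
    global mean of dl/dy_j, mu and v are the global mean and mean of squares
    of the forward BN input data.
    """
    var = v - mu * mu  # sigma^2 = v - mu^2, as stated in claim 41
    d_gamma = m_i * (phi - mu * grad_mean) / math.sqrt(var + eps)
    d_beta = m_i * grad_mean
    return d_gamma, d_beta


def direct_param_grads(
    x: List[float], g: List[float], eps: float = 1e-5
) -> Tuple[float, float]:
    """Reference: gradients computed directly over the full batch on one device."""
    m = len(x)
    mu = sum(x) / m
    var = sum(a * a for a in x) / m - mu * mu
    d_gamma = sum(gj * (xj - mu) / math.sqrt(var + eps) for xj, gj in zip(x, g))
    d_beta = sum(g)
    return d_gamma, d_beta
```

Under these assumptions, summing the per-GPU shares over all GPUs gives the same values as `direct_param_grads` on the undivided batch, which is why exchanging a handful of means between GPUs is enough and the raw data never needs to be gathered.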

43. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises:

Receiving the global backward BN input data mean set from the master GPU, the global backward BN input data mean set comprising: a global backward BN input data mean and a global forward BN gradient correction data mean.

44. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises: