CN109255439A - A multi-GPU parallel DNN model training method and device - Google Patents
A multi-GPU parallel DNN model training method and device
- Publication number
- CN109255439A (application CN201710564223.4A)
- Authority
- CN
- China
- Prior art keywords
- input data
- gpu
- forward direction
- mean value
- global
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The present invention discloses a multi-GPU parallel DNN model training method and device, to solve the prior-art problem of low training accuracy when multiple GPUs train a DNN model in parallel. The method comprises: during forward propagation processing, receiving a forward BN input data subset; determining a global forward BN input data mean set; performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset; during backward propagation processing, receiving a backward BN input data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a deep neural network (Deep Neural Network, DNN) model training method and device in which multiple graphics processing units (Graphics Processing Unit, GPU) operate in parallel.
Background
At present, DNN model training is performed in deep learning for image classification and segmentation. The prior art includes a method of training on multiple GPUs in parallel: the data of one or more pictures (referred to as the global data) is divided into multiple data subsets according to the number of GPUs, the data subsets are assigned to the GPUs one to one, and each GPU trains the DNN model with its assigned data subset, which improves training efficiency. Specifically, in actual processing, within one training cycle the system divides an acquired batch of training data (data batch), for example multiple pictures, into a corresponding number of data subsets (sub batch) according to the number of available GPU cards, and distributes the data subsets to the corresponding GPU cards. During training, each GPU card preloads a complete copy of the DNN model to be trained and then trains that DNN model with the data subset assigned to it.
Because each GPU obtains different data, the gradients of the DNN model weights trained on different GPU cards differ.
In this case a model synchronization operation is performed: the gradients trained on the different GPUs are reduced and merged into one identical gradient, and the merged gradient is then used to update the model weights on every GPU.
With the above scheme, the efficiency of multi-GPU parallel DNN model training improves, but the overall training accuracy declines, and the decline becomes more pronounced as the number of GPUs grows.
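The prior-art scheme described above can be sketched in a few lines. This is a minimal single-process illustration, not the patent's implementation: the one-parameter model `y = w * x`, the squared-error loss, the learning rate, and the choice of a size-weighted average as the "reduce and merge" operation are all assumptions made for the example, and the function names are hypothetical.

```python
def split_batch(batch, num_gpus):
    """Split a batch into num_gpus contiguous sub-batches of near-equal size."""
    base, extra = divmod(len(batch), num_gpus)
    subsets, start = [], 0
    for i in range(num_gpus):
        size = base + (1 if i < extra else 0)
        subsets.append(batch[start:start + size])
        start += size
    return subsets


def local_gradient(w, subset):
    """Gradient of the mean squared error of y = w * x on one sub-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in subset) / len(subset)


def reduce_gradients(grads, sizes):
    """Merge per-GPU gradients into one gradient.

    A size-weighted average is assumed here; the source only says the
    gradients are reduced and merged into one identical gradient.
    """
    total = sum(sizes)
    return sum(g * m for g, m in zip(grads, sizes)) / total


# One training cycle: split, compute local gradients, merge, update.
batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]
w = 0.0
subsets = split_batch(batch, 3)
grads = [local_gradient(w, s) for s in subsets]   # differ from GPU to GPU
merged = reduce_gradients(grads, [len(s) for s in subsets])
w_new = w - 0.01 * merged                         # same update on every GPU
```

For this model, which has no BN layer, the merged gradient equals the single-GPU gradient on the whole batch; the accuracy loss discussed in this document arises once per-GPU BN statistics enter the computation.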
Summary of the invention
In view of the above problem, the present invention provides a multi-GPU parallel DNN model training method and device, to solve the prior-art problem of low training accuracy when multiple GPUs train a DNN model in parallel.
According to one aspect of the application, some embodiments provide a multi-GPU parallel DNN model training method, comprising: when one GPU of multiple GPUs performs DNN model training on its assigned data subset, during forward propagation processing, receiving a forward batch normalization (BN) input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset;
during backward propagation processing, receiving a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
According to one aspect of the application, some embodiments provide a multi-GPU parallel DNN model training device, the device being arranged in each GPU of multiple GPUs and comprising: a forward batch normalization (BN) processing unit, configured to, during forward propagation processing, receive a forward BN input data subset; determine a global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset; and a backward BN processing unit, configured to, during backward propagation processing, receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
With the method and device provided by the embodiments of the present application, when multiple GPUs train a DNN model in parallel, the mean set of the global forward BN input data is introduced into forward BN processing and the mean set of the global backward BN input data is introduced into backward BN processing. This compensates for the fact that a single GPU does not hold all of the data for DNN model training: forward BN processing and backward BN processing can be performed on the basis of the means of the global data, yielding a global gradient similar to the one obtained when a single GPU trains on the global data. Training accuracy improves, which solves the prior-art problem of low training accuracy in multi-GPU parallel training.
Brief description of the drawings
The drawings provide a further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the invention they serve to explain the invention, and they do not limit the invention.
Fig. 1a is a schematic diagram of multi-GPU parallel DNN model training in the prior art;
Fig. 1b is a chart of the training accuracy and test accuracy of multi-GPU parallel DNN model training in the prior art;
Fig. 2 is a flowchart of the multi-GPU parallel DNN model training method provided by the embodiments of the present application;
Fig. 3a is a processing flowchart of step 201 in Fig. 2;
Fig. 3b is a processing flowchart of step 201 in Fig. 2;
Fig. 3c is a processing flowchart of step 201 in Fig. 2;
Fig. 4a is a processing flowchart of step 202 in Fig. 2;
Fig. 4b is a processing flowchart of step 202 in Fig. 2;
Fig. 4c is a processing flowchart of step 202 in Fig. 2;
Fig. 5 is a structural block diagram of the multi-GPU parallel DNN model training device provided by the embodiments of the present application;
Fig. 6 is a chart of the model training accuracy obtained by implementing the method shown in Fig. 2;
Fig. 7 is a chart of the test accuracy obtained by implementing the method shown in Fig. 2.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiments of the invention is described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
In the prior art, when multiple GPUs train a DNN model in parallel, the data subset assigned to each GPU is only a part of the global data, so training the DNN with data subsets causes the gradients of the model weights trained on different GPUs to differ. The gradients trained on the different GPUs are then reduced and merged into one identical gradient, and that gradient is used to update the model weights on each GPU. As a result, the accuracy of a model trained on multiple GPUs in parallel is lower than the accuracy of training on a single GPU, and the decline becomes more pronounced as the number of GPUs grows.
In the course of solving the above technical problem, the inventors of the present application found that, in multi-GPU parallel DNN model training methods, the batch normalization (Batch Normalization, BN) layer operates across data items: on each GPU it computes the mean and variance over the whole data subset entering the BN layer (i.e. the BN layer input data subset), and then uses that mean and variance to normalize each data item in the data subset. Specifically, as shown in Fig. 1a, the multiple GPUs include GPU 0, GPU 1 and GPU 2, which use their assigned data subsets Sub Batch 0, Sub Batch 1 and Sub Batch 2, respectively, to train the DNN model preloaded on each of them. During forward processing, BN processing is performed on the data subset entering the BN layer; after backward processing, the gradients trained on the different GPUs are reduced and merged into one identical gradient, and that gradient is then used to update the model weights on every GPU.
However, during forward processing, because the data subset on each GPU is only a part of the global data, the data differ between GPUs, and the mean and variance of the data subset computed in the BN layer differ from GPU to GPU. Normalizing each data item with these local means and variances further amplifies the locality of the data on each GPU, so that the gradient direction computed by each GPU is not the global descent direction, which lowers the training accuracy. As shown in Fig. 1b, the training accuracy of DNN model training with 3 GPUs in parallel (thin solid line in Fig. 1b) is about 7% lower than the accuracy of a single GPU training the DNN model on the global data (thick solid line in Fig. 1b), and the test accuracy with 3 GPUs in parallel (thin dashed line in Fig. 1b) is about 15% lower than the test accuracy with a single GPU (thick solid line in Fig. 1b). As the number of GPUs increases further, the accuracy declines further.
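The effect of local statistics can be seen on a toy example. The sketch below is illustrative only: the three sub-batches, the `eps` value and the helper names are invented for the example, and scalars stand in for BN layer inputs. It normalizes the same value once with the statistics of its own sub-batch and once with the statistics of the whole batch.

```python
def mean_and_variance(values):
    """Mean and (biased) variance of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((x - m) ** 2 for x in values) / len(values)
    return m, var


def normalize(x, mean, var, eps=1e-5):
    """Normalize one value with the given statistics."""
    return (x - mean) / (var + eps) ** 0.5


# Hypothetical sub-batches held by GPU 0, GPU 1 and GPU 2.
sub_batches = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 22.0, 24.0]]

local_stats = [mean_and_variance(s) for s in sub_batches]
global_stats = mean_and_variance([x for s in sub_batches for x in s])

# The value 20.0 on GPU 2, normalized with local and with global statistics.
x = sub_batches[2][0]
local_norm = normalize(x, *local_stats[2])    # below its sub-batch mean
global_norm = normalize(x, *global_stats)     # above the global mean
```

With local statistics the value 20.0 is normalized to a negative number, while with global statistics it is positive, so the two schemes feed different activations (and hence different gradients) into the rest of the network.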
To address this problem, in the method provided by the embodiments of the present application, when multiple GPUs train a DNN model in parallel, during forward propagation processing a global forward BN input data mean set is determined for the global forward BN input data, and forward BN processing is performed on the forward BN input data subset entering the BN layer according to that global forward BN input data mean set. During backward BN processing, the gradient set of the data subset output by forward BN processing is taken as the backward BN input, a global backward BN input data mean set is determined for the global backward BN input data, and backward BN processing is performed on the forward BN input data subset according to the global backward BN input data mean set, to obtain the gradients of the forward BN input data. Because the mean set of the global forward BN input data is introduced into forward BN processing and the mean set of the global backward BN input data is introduced into backward BN processing, the method compensates for the fact that a single GPU does not hold all of the data for DNN model training. Forward BN processing and backward BN processing can be performed on the basis of the means of the global data, yielding a global gradient similar to the one obtained when a single GPU trains on the global data and improving training accuracy, which solves the prior-art problem of low training accuracy when multiple GPUs train a DNN model in parallel.
Method and apparatus provided by the embodiments of the present application are described in detail below.
Embodiment 1
Referring to Fig. 2, the embodiments of the present application provide a multi-GPU parallel DNN model training method, whose processing flow includes:
Step 201: when one GPU of the multiple GPUs performs DNN model training on its assigned data subset, during forward propagation processing, receive a forward batch normalization (BN) input data subset; determine a global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset;
Step 202: during backward propagation processing, receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
In the method provided by the present application, during forward propagation processing, forward BN processing is performed on the forward BN input data subset according to the determined global forward BN input data mean set; during backward propagation processing, backward BN processing is performed on the forward BN input data subset according to the determined global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set. This compensates for the fact that a single GPU does not hold all of the data for DNN model training: forward BN processing and backward BN processing can be performed on the basis of the means of the global data, yielding a global gradient similar to the one obtained when a single GPU trains on the global data and improving training accuracy, which solves the prior-art problem of low training accuracy when multiple GPUs train a DNN model in parallel.
The forward BN processing in forward propagation processing and the backward BN processing in backward propagation processing are described in detail below.
In the embodiments of the present invention, determining the global forward BN input data mean set in step 201 can be implemented in, but is not limited to, the following two modes:
Mode 1: among the multiple GPUs, one GPU is chosen as the master GPU and the other GPUs are slave GPUs. The master GPU determines the global forward BN input data mean set and sends it to each of the slave GPUs, so the slave GPUs no longer need to compute the global forward BN input data mean set themselves.
Mode 2: the multiple GPUs are not divided into master and slaves, and each GPU independently determines the global forward BN input data mean set.
Based on mode 1, when the GPU is the master GPU, determining the global forward BN input data mean set can be implemented through the following steps A1 to A4:
Step A1: the master GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;
Step A2: receive the forward BN input data subset mean sets from each of the slave GPUs;
Step A3: determine the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and those of the slave GPUs; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data;
Step A4: send the global forward BN input data mean set to each of the slave GPUs.
Based on mode 1, when the GPU is a slave GPU, determining the global forward BN input data mean set can be implemented through the following steps B1 to B3:
Step B1: the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;
Step B2: send the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;
Step B3: receive the global forward BN input data mean set from the master GPU; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.
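Steps A1 to A4 and B1 to B3 can be sketched as a single-process simulation. This is a hedged illustration rather than the patent's implementation: real inter-GPU communication is replaced by list passing, the function names are hypothetical, each mean set carries the subset size as a third element, and weighting the combination by subset size is an assumption of the sketch.

```python
def subset_mean_set(subset):
    """Steps A1/B1: mean, mean of squares and size of one data subset."""
    m = len(subset)
    return sum(subset) / m, sum(x * x for x in subset) / m, m


def master_aggregate(mean_sets):
    """Step A3: combine per-GPU mean sets into the global mean set.

    Weighting by the subset size is an assumption of this sketch.
    """
    total = sum(m for _, _, m in mean_sets)
    mu = sum(mu_i * m for mu_i, _, m in mean_sets) / total
    v = sum(v_i * m for _, v_i, m in mean_sets) / total
    return mu, v


def run_master_slave(subsets, master=0):
    """Simulate mode 1 and return the global mean set seen by each GPU."""
    local = [subset_mean_set(s) for s in subsets]                    # A1 / B1
    inbox = [local[i] for i in range(len(subsets)) if i != master]   # B2 -> A2
    global_set = master_aggregate([local[master]] + inbox)           # A3
    return [global_set] * len(subsets)                               # A4 -> B3


result = run_master_slave([[1.0, 2.0], [3.0, 4.0, 5.0]])
```

In this simulation only the master calls `master_aggregate`; every slave simply receives the finished pair, which mirrors the point that slave GPUs do not repeat the global computation.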
Based on mode 2, determining the global forward BN input data mean set can be implemented through the following steps C1 to C4:
Step C1: determine the forward BN input data subset mean set of the GPU according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;
Step C2: send the forward BN input data subset mean set of the GPU to the other GPUs;
Step C3: receive the forward BN input data subset mean sets from each of the other GPUs;
Step C4: determine the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and those of the other GPUs; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.
Steps A1, B1 and C1 are implemented in the same way. In both mode 1 and mode 2, the way a GPU determines the global forward BN input data mean set from the forward BN input data subset mean sets of the multiple GPUs is also the same.
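Why exchanging only a mean and a mean of squares per GPU is sufficient can be made explicit. The derivation below is a sketch under the assumption that the global quantities are formed as averages weighted by subset size. Here $x_{i,j}$ denotes the j-th data item on the i-th of $n$ GPUs, $m_i$ the size of that GPU's subset, $\mu_i$ and $v_i$ its mean and mean of squares, and $M$ is an auxiliary symbol introduced only for this derivation.

```latex
\begin{aligned}
\mu &= \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}
     = \frac{1}{M}\sum_{i=1}^{n} m_i\,\mu_i,
     \qquad M = \sum_{i=1}^{n} m_i,\\[4pt]
v   &= \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}^{2}
     = \frac{1}{M}\sum_{i=1}^{n} m_i\,v_i,\\[4pt]
\sigma^{2} &= \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(x_{i,j}-\mu\right)^{2}
     = v - 2\mu\cdot\mu + \mu^{2}
     = v - \mu^{2}.
\end{aligned}
```

Under this assumption the global mean and variance of all data follow exactly from the per-GPU pairs $(\mu_i, v_i)$ and the subset sizes, so the data items themselves never need to leave their GPU.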
The specific implementation of step 201 is described in detail below for the master GPU in mode 1, a slave GPU in mode 1, and each GPU in mode 2, with reference to Fig. 3a, Fig. 3b and Fig. 3c, respectively.
Fig. 3 a shows the detailed process of step 201 in Fig. 2, including following process flow:
Step 2011, to BN input data subset before receiving, this it is preceding to BN input data subset be in propagated forward processing
It is input to BN layers of data subset in the process, is specifically represented by Bi={ xi,j(j=1,2 ... mi), BiIt is i-th for the GPU
Forward direction BN input data subset when a GPU, xi,jFor the data in the forward direction BN input data subset, miFor the forward direction
The quantity of data in BN input data subset;
Step 2012: in the case where the GPU is the master GPU among the multiple GPUs, the master GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset;
In some embodiments of the present application, the mean of the forward BN input data subset can be determined according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, and the mean of squares of the forward BN input data subset according to the formula $v_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^{2}$;
where $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU, and $v_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU;
In other embodiments of the application, the mean and the mean of squares of the forward BN input data subset can also be determined by other methods, which are well known to a person of ordinary skill in the art and are not described here;
Step 2013: receive the forward BN input data subset mean sets from each of the slave GPUs;
Step 2014: determine the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and those of the slave GPUs, the global forward BN input data mean set including the mean and the mean of squares of the global forward BN input data; and send the global forward BN input data mean set to each of the slave GPUs;
In some embodiments of the present application, the mean of the global forward BN input data can be determined according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, and the mean of squares of the global forward BN input data according to the formula $v = \frac{\sum_{i=1}^{n} m_i v_i}{\sum_{i=1}^{n} m_i}$;
where $n$ is the number of GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $\mu$ is the mean of the global forward BN input data, $v_i$ is the mean of squares of the forward BN input data subset of the i-th GPU, and $v$ is the mean of squares of the global forward BN input data;
In other embodiments of the application, the mean and the mean of squares of the global forward BN input data can also be determined by other methods; a person of ordinary skill in the art can select a specific algorithm according to the specific application scenario, and the methods are not listed one by one here;
Step 2015: determine the variance of the global forward BN input data according to the mean and the mean of squares of the global forward BN input data;
In some embodiments of the present application, the variance of the global forward BN input data can be determined according to the formula $\sigma^2 = v - \mu^2$, where $\sigma^2$ is the variance of the global forward BN input data, and $v$ and $\mu$ are, as given in step 2014, the mean of squares and the mean of the global forward BN input data, respectively;
In other embodiments of the application, the variance of the global forward BN input data can also be determined by other methods; those skilled in the art can select a specific algorithm according to the specific application scenario, and the methods are not listed one by one here;
Step 2016: perform the forward BN operation on each data item in the forward BN input data subset according to the variance of the global forward BN input data, to obtain the forward post-BN data subset;
In some embodiments of the present application, the forward BN operation can be performed on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where, as described above, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value that prevents division by zero, and $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset;
Step 2017: perform an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data subset.
In some embodiments of the present application, the offset operation can be performed on each data item in the forward post-BN data subset according to the formula $y_{i,j} = \gamma \hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are the offset parameters, $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
In the above forward propagation processing, the GPU determines the mean and the mean of squares of the global forward BN input data and performs BN processing on the forward BN input data subset on the basis of these global values. This compensates for the fact that a single GPU does not hold all of the data for forward BN processing, so that forward BN processing can be performed on the basis of the means of the global data.
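The forward BN flow of steps 2011 to 2017 can be put together in a short sketch. It is a scalar, single-process illustration under stated assumptions: the global mean set is taken to be a size-weighted average of the per-GPU mean sets, the data values, `gamma`, `beta` and `eps` defaults are invented for the example, the clamp of the variance at zero is an added numerical guard, and all function names are hypothetical.

```python
import math


def local_mean_set(subset):
    """Step 2012: mean, mean of squares and size of one input data subset."""
    m = len(subset)
    return sum(subset) / m, sum(x * x for x in subset) / m, m


def global_mean_set(local_sets):
    """Step 2014: size-weighted combination (an assumption of this sketch)."""
    total = sum(m for _, _, m in local_sets)
    mu = sum(mu_i * m for mu_i, _, m in local_sets) / total
    v = sum(v_i * m for _, v_i, m in local_sets) / total
    return mu, v


def forward_bn(subset, mu, v, gamma=1.0, beta=0.0, eps=1e-5):
    """Steps 2015 to 2017 on one GPU's subset, using global statistics."""
    var = max(v - mu * mu, 0.0)                    # step 2015, clamped guard
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(x - mu) * inv_std for x in subset]   # step 2016
    return [gamma * xh + beta for xh in x_hat]     # step 2017


# Three simulated GPUs, each holding one subset of the global batch.
subsets = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 22.0, 24.0]]
mu, v = global_mean_set([local_mean_set(s) for s in subsets])
outputs = [forward_bn(s, mu, v) for s in subsets]

# Reference: one GPU running the same BN over the whole batch.
whole = [x for s in subsets for x in s]
reference = forward_bn(whole, *global_mean_set([local_mean_set(whole)]))
```

Concatenating `outputs` reproduces `reference` up to rounding, which is the property the method relies on: each GPU's forward BN output matches what a single GPU would produce on the global data.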
The processing shown in Fig. 3a describes the forward BN processing of the master GPU among the multiple GPUs. The forward BN processing of a slave GPU differs from that of the master GPU only in steps 2012 to 2014 above; the other processing is the same as steps 2011 and 2015 to 2017 shown in Fig. 3a. The forward BN processing of a slave GPU is described below with reference to Fig. 3b; the processing steps in Fig. 3b that are identical to those in Fig. 3a are not described again.
Step 2011: receive the forward BN input data subset;
Step 2012': in the case where the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the method of determining the forward BN input data subset mean set is the same as in step 2012 and is not described again here;
Step 2013': send the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;
Step 2014': receive the global forward BN input data mean set from the master GPU; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data;
Step 2015: determine the variance of the global forward BN input data according to the mean and the mean of squares of the global forward BN input data;
Step 2016: perform the forward BN operation on each data item in the forward BN input data subset according to the variance of the global forward BN input data, to obtain the forward post-BN data subset;
Step 2017: perform an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data subset.
In the case where the multiple GPUs are divided into a master GPU and slave GPUs, the master GPU determines the global forward BN input data mean set and the slave GPUs receive the set determined by the master GPU, which saves processing resources on the slave GPUs.
In some other embodiments of the application, the GPUs need not be divided into master and slaves, and each GPU independently determines the global forward BN input data mean set. The forward BN processing of each independent GPU differs from that of the master GPU only in steps 2012 to 2014 above; the other processing is the same as steps 2011 and 2015 to 2017 shown in Fig. 3a. The processing of each GPU is described below with reference to Fig. 3c; the processing steps in Fig. 3c that are identical to those in Fig. 3a are not described again.
Step 2011: receive the forward BN input data subset;
Step 2012": determine the forward BN input data subset mean set of the GPU according to the forward BN input data subset; the method of determining the forward BN input data subset mean set is the same as in step 2012 and is not described again here;
Step 2013": send the determined forward BN input data subset mean set to each of the other GPUs, and receive the forward BN input data subset mean sets from each of the other GPUs;
Step 2014": determine the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and those of the other GPUs; the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data; the method of determining the global forward BN input data mean set is the same as in step 2014 and is not described again here;
Step 2015: determine the variance of the global forward BN input data according to the mean and the mean of squares of the global forward BN input data;
Step 2016: perform the forward BN operation on each data item in the forward BN input data subset according to the variance of the global forward BN input data, to obtain the forward post-BN data subset;
Step 2017: perform an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data subset.
In the case that each GPU in multiple GPU is independent GPU, each GPU is inputted before respectively determining the overall situation respectively to BN
Data mean value set, the operation independence between each GPU is high, the processing result independent of other GPU.
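The exchange in steps 2012'' to 2014'' can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the GPUs are simulated as list entries, the all-to-all exchange is modeled as a shared list, and the function names `local_stats` and `global_stats`, together with the count-weighted form of the global statistics, are assumptions made for the example.

```python
def local_stats(subset):
    """Return (count, mean, mean of squares) for one GPU's forward BN input subset."""
    m = len(subset)
    return m, sum(subset) / m, sum(x * x for x in subset) / m


def global_stats(all_stats):
    """Combine per-GPU (m_i, mu_i, nu_i) triples into the global (mu, nu)."""
    total = sum(m for m, _, _ in all_stats)
    mu = sum(m * mu_i for m, mu_i, _ in all_stats) / total
    nu = sum(m * nu_i for m, _, nu_i in all_stats) / total
    return mu, nu


# Hypothetical data: three simulated GPUs with subsets of different sizes.
subsets = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]

# Steps 2012'' and 2013'': every GPU computes its triple and sends it to all
# other GPUs, so after the exchange each GPU holds the full list of triples.
exchanged = [local_stats(s) for s in subsets]

# Step 2014'': each GPU combines the same list on its own.
per_gpu_result = [global_stats(exchanged) for _ in subsets]
```

Because every simulated GPU combines the same triples, all entries of `per_gpu_result` are identical, which is the property that lets each GPU normalize its own subset without waiting for a master.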
The BN processing in the back-propagation process is described below.
In the embodiments of the present invention, determining the global backward BN input data mean set in step 202 above can be implemented in, but is not limited to, the following two modes:
Mode 1: among the multiple GPUs, one GPU is chosen as the master GPU and the other GPUs are slave GPUs; the master GPU determines the global backward BN input data mean set and sends it to each slave GPU, so that the slave GPUs no longer need to compute the global backward BN input data mean set themselves.
Mode 2: the multiple GPUs are not divided into master and slave, and each GPU independently determines the global backward BN input data mean set.
In mode 1, the master GPU can determine the global backward BN input data mean set through the following steps D1 to D4:
Step D1: the master GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set including the backward BN input data subset mean and the forward BN gradient correction data mean;
Step D2: receive the backward BN input data subset mean set from each slave GPU;
Step D3: determine the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean;
Step D4: send the global backward BN input data mean set to each slave GPU.
In mode 1, a slave GPU can obtain the global backward BN input data mean set through the following steps E1 to E3:
Step E1: the slave GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the slave GPU including the backward BN input data subset mean and the forward BN gradient correction data mean;
Step E2: send the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;
Step E3: receive the global backward BN input data mean set from the master GPU, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean.
In mode 2, each GPU can determine the global backward BN input data mean set through the following steps F1 to F4:
Step F1: the GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the GPU including the backward BN input data subset mean and the forward BN gradient correction data mean;
Step F2: send the backward BN input data subset mean set of the GPU to each of the other GPUs;
Step F3: receive the backward BN input data subset mean set from each of the other GPUs;
Step F4: determine the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean.
Steps D1, E1 and F1 are implemented in the same way. In mode 1 and mode 2, a GPU also determines the global backward BN input data mean set from the backward BN input data subset mean sets of the multiple GPUs in the same way.
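The two modes can be compared with a small simulation. The sketch below is illustrative only: the triples `(m_i, gbar_i, phi_i)` stand for a GPU's sample count, backward BN input data subset mean and forward BN gradient correction data mean, the count-weighted combination is an assumption, and the message counts are a simple tally of point-to-point sends, not a statement about any real interconnect.

```python
def combine(stats):
    """Count-weighted average of per-GPU (m_i, gbar_i, phi_i) triples."""
    total = sum(m for m, _, _ in stats)
    gbar = sum(m * g for m, g, _ in stats) / total
    phi = sum(m * p for m, _, p in stats) / total
    return gbar, phi


def master_slave(stats):
    """Mode 1: slaves send to the master, which combines once and sends back."""
    n = len(stats)
    result = combine(stats)            # computed once, on the master (steps D1-D3)
    messages = (n - 1) + (n - 1)       # steps E2/D2, then steps D4/E3
    return [result] * n, messages


def peer_to_peer(stats):
    """Mode 2: every GPU sends its triple to every other GPU and combines locally."""
    n = len(stats)
    results = [combine(stats) for _ in range(n)]   # computed n times (step F4)
    messages = n * (n - 1)                         # steps F2/F3
    return results, messages


# Hypothetical per-GPU triples for three simulated GPUs.
stats = [(2, 0.5, 1.0), (3, -0.2, 0.4), (1, 0.1, -0.3)]
ms_results, ms_msgs = master_slave(stats)
pp_results, pp_msgs = peer_to_peer(stats)
```

Both modes yield the same global mean set on every GPU; in this tally mode 1 needs 2(n-1) messages and one combination, while mode 2 needs n(n-1) messages and n combinations but no broadcast step.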
The specific implementation of step 202 is described in detail below for the master GPU in mode 1, the slave GPU in mode 1 and each GPU in mode 2, with reference to Fig. 4a, Fig. 4b and Fig. 4c respectively.
Fig. 4a shows the detailed flow of step 202 in Fig. 2, which includes the following processing:
Step 2021: receive the backward BN input data subset, which is the gradient set of the forward BN output data subset obtained in step 2017 above and can be expressed as $G_i = \{\partial\ell/\partial y_{i,j}\}$ ($j = 1, 2, \ldots, m_i$), where $G_i$ is the backward BN input data subset when the GPU is the i-th GPU, $\ell$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, and $\partial\ell/\partial y_{i,j}$ is the gradient of $y_{i,j}$, that is, a data item in the backward BN input data subset;
Step 2022: when the GPU is the master GPU among the multiple GPUs, the master GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set including the backward BN input data subset mean and the forward BN gradient correction data mean;
In some embodiments of the present application, the backward BN input data subset mean can be determined according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial\ell}{\partial y_{i,j}}$, where $\bar{g}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;
In some embodiments of the present application, the forward BN gradient correction data mean can be determined according to a formula [formula not reproduced], where $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU;
Step 2023: receive the backward BN input data subset mean set from each slave GPU;
Step 2024: determine the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean; and send the global backward BN input data mean set to each slave GPU;
In some embodiments of the present application, the global backward BN input data mean can be determined according to the formula $\bar{g} = \frac{\sum_{i=1}^{n} m_i \bar{g}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the forward BN output data subset gradient mean of the i-th GPU, and $\bar{g}$ is the global backward BN input data mean;
In some embodiments of the present application, the global forward BN gradient correction data mean can be determined according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i \varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient correction data mean of the i-th GPU and $\varphi$ is the global forward BN gradient correction data mean;
Step 2025: determine the gradient of each data item in the forward post-BN data subset according to the backward BN input data subset;
In some embodiments of the present application, the gradient of each data item in the forward post-BN data subset can be determined according to the formula $\frac{\partial\ell}{\partial\hat{x}_{i,j}} = \frac{\partial\ell}{\partial y_{i,j}}\cdot\gamma$, where $\ell$ is the predetermined loss function, $\gamma$ is the offset parameter, and $\partial\ell/\partial\hat{x}_{i,j}$ is the gradient of the data item $\hat{x}_{i,j}$ in the forward post-BN data subset;
Step 2026: determine the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
In some embodiments of the present application, the gradient of the variance of the global forward BN input data is determined according to a formula [formula not reproduced], where $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\bar{g}$ is the global forward BN output data gradient mean, $\gamma$ is the offset parameter, and $\partial\ell/\partial\sigma^2$ is the gradient of the variance of the global forward BN input data;
Step 2027: determine the gradient of the mean of the global forward BN input data according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
In some embodiments of the present application, the gradient of the mean of the global forward BN input data is determined according to a formula [formula not reproduced], where $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\gamma$ is the offset parameter, and $\partial\ell/\partial\mu$ is the gradient of the mean of the global forward BN input data;
Step 2028: determine the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the forward post-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the mean of the global forward BN input data, the global forward BN input data mean set and the mean of the global forward BN input data;
In some embodiments of the present application, the gradient of each data item in the forward BN input data subset is determined according to a formula [formula not reproduced], where $\partial\ell/\partial\hat{x}_{i,j}$ is the gradient of each data item in the forward post-BN data subset determined in step 2025 above, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\partial\ell/\partial\sigma^2$ is the gradient of the variance of the global forward BN input data determined in step 2026 above, $\partial\ell/\partial\mu$ is the gradient of the mean of the global forward BN input data determined in step 2027 above, and $\partial\ell/\partial x_{i,j}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
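Steps 2025 to 2028 can be illustrated with the textbook batch-normalization chain rule written in terms of global means. The sketch below is not the patent's formulas, which are not reproduced in this text; it assumes that the gradient correction mean `phi` is the global mean of the product of each output gradient and its input, that `gbar` is the global mean of the output gradients, and that every GPU knows the total sample count `total`. The function name `backward_bn` is hypothetical.

```python
import math


def backward_bn(x_sub, g_sub, gamma, mu, var, gbar, phi, total, eps=1e-5):
    """Gradients of one GPU's forward BN inputs, using only global statistics.

    x_sub, g_sub: this GPU's forward BN inputs and the gradients of its outputs.
    mu, var: global mean and variance of the forward BN input data.
    gbar, phi: global means of g and of g * x (assumed definitions).
    total: number of samples over all GPUs (assumed known to every GPU).
    """
    inv_std = 1.0 / math.sqrt(var + eps)
    # Step 2026 (assumed form): gradient of the loss w.r.t. the global variance.
    d_var = -0.5 * gamma * total * (phi - mu * gbar) * inv_std ** 3
    # Step 2027 (assumed form): gradient of the loss w.r.t. the global mean.
    d_mu = -gamma * total * gbar * inv_std
    # Steps 2025 and 2028: per-sample gradient (gamma * g is the x-hat gradient).
    return [gamma * g * inv_std + d_var * 2.0 * (x - mu) / total + d_mu / total
            for x, g in zip(x_sub, g_sub)]


# Hypothetical two-GPU example.
xs = [[1.0, 2.0, 4.0], [0.5, 3.0]]
gs = [[0.1, -0.2, 0.3], [0.4, -0.5]]
total = sum(len(s) for s in xs)
flat_x = [x for s in xs for x in s]
flat_g = [g for s in gs for g in s]
mu = sum(flat_x) / total
var = sum(x * x for x in flat_x) / total - mu * mu
gbar = sum(flat_g) / total
phi = sum(g * x for g, x in zip(flat_g, flat_x)) / total
grads = [backward_bn(x, g, 1.5, mu, var, gbar, phi, total) for x, g in zip(xs, gs)]
```

Each simulated GPU needs only its own subset plus four global scalars, which is why exchanging two means per GPU in the backward pass is sufficient under these assumptions.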
In the back-propagation process above, the gradient set of the forward BN output data subset produced by the forward BN processing is used as the input of the backward BN processing. The GPU determines the global backward BN input data mean set and, based on the global forward BN input data mean set and the global backward BN input data mean set, performs backward BN processing on the forward BN input data subset. This compensates for the fact that no single GPU obtains all of the data for DNN model training: because both the forward BN processing and the backward BN processing are based on means of the global data, the resulting gradients are close to those obtained when a single GPU trains on the global data, and the precision of model training can approach that of single-GPU training on the global data. The multi-GPU parallel DNN model training method proposed in the present application can therefore solve the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.
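The precision argument can be checked numerically for the forward pass. The following sketch is an illustration under stated assumptions, not the patented implementation: the data values, the split into three subsets and the function names `bn_single` and `bn_multi` are invented for the example. It compares normalization with aggregated statistics against single-device normalization, and against the case where every GPU normalizes with its own local statistics.

```python
import math


def bn_single(data, gamma, beta, eps=1e-5):
    """Reference: batch normalization over the whole batch on one device."""
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in data]


def bn_multi(subsets, gamma, beta, eps=1e-5):
    """Each simulated GPU normalizes its subset with aggregated statistics."""
    stats = [(len(s), sum(s) / len(s), sum(x * x for x in s) / len(s))
             for s in subsets]
    total = sum(m for m, _, _ in stats)
    mu = sum(m * mu_i for m, mu_i, _ in stats) / total
    nu = sum(m * nu_i for m, _, nu_i in stats) / total
    var = nu - mu * mu
    return [[gamma * (x - mu) / math.sqrt(var + eps) + beta for x in s]
            for s in subsets]


data = [0.3, -1.2, 2.5, 0.8, 1.9, -0.4]
subsets = [data[:2], data[2:5], data[5:]]

single = bn_single(data, gamma=2.0, beta=0.5)
synced = [y for s in bn_multi(subsets, gamma=2.0, beta=0.5) for y in s]
local = [y for s in subsets for y in bn_single(s, gamma=2.0, beta=0.5)]

sync_diff = max(abs(a - b) for a, b in zip(single, synced))
local_diff = max(abs(a - b) for a, b in zip(single, local))
```

Here `sync_diff` is at the level of floating-point rounding, whereas `local_diff` is large, because a subset normalized with its own mean and variance sees different statistics from the full batch.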
The processing shown in Fig. 4a describes the backward BN processing of the master GPU among the multiple GPUs. The backward BN processing of a slave GPU differs from that of the master GPU only in steps 2022 to 2024 above; the remaining processing is the same as step 2021 and steps 2025 to 2028 shown in Fig. 4a. The backward BN processing of a slave GPU is described below with reference to Fig. 4b, and the processing steps in Fig. 4b that are identical to those in Fig. 4a are not described again.
Step 2021: receive the backward BN input data subset;
Step 2022': when the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the slave GPU including the backward BN input data subset mean and the forward BN gradient correction data mean;
Step 2023': send the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;
Step 2024': receive the global backward BN input data mean set from the master GPU, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2025: determine the gradient of each data item in the forward post-BN data subset according to the backward BN input data subset;
Step 2026: determine the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2027: determine the gradient of the mean of the global forward BN input data according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
Step 2028: determine the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the forward post-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the mean of the global forward BN input data, the global forward BN input data mean set and the mean of the global forward BN input data.
When the multiple GPUs are divided into a master GPU and slave GPUs, the master GPU determines the global backward BN input data mean set and each slave GPU receives the set determined by the master GPU, which saves processing resources on the slave GPUs.
In some other embodiments of the present application, master and slave GPUs need not be distinguished, and each GPU independently determines the global backward BN input data mean set. The backward BN processing of each independent GPU differs from that of the master GPU only in steps 2022 to 2024 above; the remaining processing is the same as step 2021 and steps 2025 to 2028 shown in Fig. 4a. The processing of each GPU is described below with reference to Fig. 4c, and the processing steps in Fig. 4c that are identical to those in Fig. 4a are not described again.
Step 2021: receive the backward BN input data subset;
Step 2022'': the GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the GPU including the backward BN input data subset mean and the forward BN gradient correction data mean;
Step 2023'': send the backward BN input data subset mean set of the GPU to each of the other GPUs, and receive the backward BN input data subset mean set from each of the other GPUs;
Step 2024'': determine the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2025: determine the gradient of each data item in the forward post-BN data subset according to the backward BN input data subset;
Step 2026: determine the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2027: determine the gradient of the mean of the global forward BN input data according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
Step 2028: determine the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the forward post-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the mean of the global forward BN input data, the global forward BN input data mean set and the mean of the global forward BN input data.
When each of the multiple GPUs is an independent GPU, every GPU determines the global backward BN input data mean set by itself, so the GPUs operate with a high degree of independence and no GPU depends on a global result computed by another GPU.
On the basis of the processing methods shown in Fig. 2 to Fig. 4c, the multi-GPU parallel DNN model training method provided by the embodiments of the present application further includes the following processing: determining the gradients of the training parameters of the BN layer according to the global forward BN input data mean set and the global backward BN input data mean set, the training parameters including the offset parameters γ and β described above.
In some embodiments of the present application, the gradient of the offset parameter γ can be determined according to a formula [formula not reproduced], and the gradient of the offset parameter β can be determined according to a formula [formula not reproduced]; these give, respectively, the gradient of the offset parameter γ and the gradient of the offset parameter β when the GPU is the i-th GPU.
After the gradients of the offset parameters γ and β are determined, the values of γ and β can be updated using the determined gradients and a gradient descent algorithm, so as to optimize the DNN model.
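A possible shape of the parameter update is sketched below. The patent's own expressions for the gradients of γ and β are not reproduced in this text, so the sketch falls back on the standard batch-normalization expressions; summing the per-GPU gradients before the update, the learning rate `lr` and the function names `param_grads` and `sgd_step` are all assumptions made for illustration.

```python
import math


def param_grads(x_sub, g_sub, mu, var, eps=1e-5):
    """Per-GPU gradients of gamma and beta (standard BN expressions, assumed)."""
    inv_std = 1.0 / math.sqrt(var + eps)
    d_gamma = sum(g * (x - mu) * inv_std for x, g in zip(x_sub, g_sub))
    d_beta = sum(g_sub)
    return d_gamma, d_beta


def sgd_step(gamma, beta, grads, lr=0.1):
    """Sum the per-GPU gradients and take one plain gradient-descent step."""
    d_gamma = sum(dg for dg, _ in grads)
    d_beta = sum(db for _, db in grads)
    return gamma - lr * d_gamma, beta - lr * d_beta


# Hypothetical two-GPU example with global mean 2 and global variance 2/3.
xs = [[1.0, 3.0], [2.0]]
gs = [[1.0, 1.0], [1.0]]
mu, var = 2.0, 2.0 / 3.0
grads = [param_grads(x, g, mu, var) for x, g in zip(xs, gs)]
new_gamma, new_beta = sgd_step(1.0, 0.0, grads)
```

With all output gradients equal, the γ gradient cancels and only β moves, which matches the intuition that a uniform gradient shifts the output rather than rescaling it.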
The multi-GPU parallel DNN model training device provided by the embodiments of the present application is described below. The device is provided in each of the multiple GPUs, and each GPU performs DNN model training on the data subset assigned to it. Fig. 5 shows a structural block diagram of the device, which includes a forward BN processing unit 51 and a backward BN processing unit 52.
The forward BN processing unit 51 is configured to, in the forward-propagation process: receive the forward BN input data subset; determine the global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain the forward BN output data subset.
In some embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: when the GPU is the master GPU among the multiple GPUs, the master GPU determines its forward BN input data subset mean set according to the forward BN input data subset, the forward BN input data subset mean set including the mean and the mean of squares of the forward BN input data subset; receives the forward BN input data subset mean set from each slave GPU; determines the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the slave GPUs, the global forward BN input data mean set including the mean and the mean of squares of the global forward BN input data; and sends the global forward BN input data mean set to each slave GPU.
In other embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: when the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset, the forward BN input data subset mean set including the mean and the mean of squares of the forward BN input data subset; sends the determined forward BN input data subset mean set to the master GPU among the multiple GPUs; and receives the global forward BN input data mean set from the master GPU, the global forward BN input data mean set including the mean and the mean of squares of the global forward BN input data.
In still other embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: determining the forward BN input data subset mean set of the GPU according to the forward BN input data subset, the forward BN input data subset mean set including the mean and the mean of squares of the forward BN input data subset; sending the forward BN input data subset mean set of the GPU to the other GPUs; receiving the forward BN input data subset mean set from each of the other GPUs; and determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, the global forward BN input data mean set including the mean and the mean of squares of the global forward BN input data.
The forward BN processing unit 51 determining the forward BN input data subset mean set of the GPU includes: determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU; and determining the mean of squares of the forward BN input data subset according to the formula $\nu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $\nu_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.
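Because the subset mean set consists of a mean and a mean of squares, both values can be accumulated in a single pass over the subset. The sketch below shows one way to do this; the function name `subset_mean_set` and the use of a Python iterator to stand in for the data arriving at the GPU are assumptions for illustration.

```python
def subset_mean_set(stream):
    """Accumulate count, sum and sum of squares in one pass over the subset."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for x in stream:
        count += 1
        total += x
        total_sq += x * x
    # Returns (m_i, mu_i, nu_i) for this GPU's forward BN input data subset.
    return count, total / count, total_sq / count


# Hypothetical subset consumed as a one-shot iterator.
m_i, mu_i, nu_i = subset_mean_set(iter([2.0, 4.0, 6.0]))
```

A two-pass variance would need the subset mean first, whereas the pair of running sums is available as soon as the last data item has been read.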
The forward BN processing unit 51 determining the global forward BN input data mean set includes: determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, and $\mu$ is the mean of the global forward BN input data; and determining the mean of squares of the global forward BN input data according to the formula $\nu = \frac{\sum_{i=1}^{n} m_i \nu_i}{\sum_{i=1}^{n} m_i}$, where $\nu_i$ is the mean of squares of the forward BN input data subset of the i-th GPU and $\nu$ is the mean of squares of the global forward BN input data.
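The subset sizes matter when the per-GPU means are combined. The short example below, with invented numbers and a hypothetical helper `weighted_mean`, contrasts a count-weighted combination with a plain average of the per-GPU means when the subsets have different sizes.

```python
def weighted_mean(values, counts):
    """Combine per-GPU means using the subset sizes m_i as weights."""
    return sum(m * v for v, m in zip(values, counts)) / sum(counts)


counts = [1, 3]        # m_1, m_2: hypothetical subset sizes
means = [10.0, 2.0]    # mu_1, mu_2: hypothetical per-GPU means

mu_weighted = weighted_mean(means, counts)   # mean of the pooled data
mu_naive = sum(means) / len(means)           # ignores the subset sizes
```

The weighted result equals the mean of the four pooled data items, while the plain average over-represents the GPU that holds a single item; the two agree only when all subsets have the same size.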
The forward BN processing unit 51 performing forward BN processing on the forward BN input data subset includes: performing the forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data to obtain the forward post-BN data subset; and performing the offset operation on each data item in the forward post-BN data subset to obtain the forward BN output data subset.
The forward BN processing unit 51 performing the forward BN operation on each data item in the forward BN input data subset includes:
determining the variance of the global forward BN input data according to the formula $\sigma^2 = \nu - \mu^2$, where $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;
performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, and $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset.
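The variance formula follows from expanding the definition of the variance over the pooled data. The derivation below is a worked check, where $M = \sum_{i=1}^{n} m_i$ denotes the total number of data items over all GPUs (a symbol introduced here for brevity):

```latex
\begin{aligned}
\sigma^2
  &= \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(x_{i,j}-\mu\right)^2 \\
  &= \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}^2
     \;-\; 2\mu\cdot\frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}
     \;+\; \mu^2 \\
  &= \nu - 2\mu^2 + \mu^2 \\
  &= \nu - \mu^2 .
\end{aligned}
```

This is why exchanging only the mean and the mean of squares is enough to recover the variance of the global data without any GPU seeing another GPU's data items.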
The forward BN processing unit 51 performing the offset operation on each data item in the forward post-BN data subset includes: performing the offset operation on each data item in the forward post-BN data subset according to the formula $y_{i,j} = \gamma\hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are the offset parameters, $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
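The forward BN operation and the offset operation can be sketched together for one GPU as follows. The function name `forward_bn` and the example values are hypothetical, and clamping the variance at zero is an added numerical safeguard against rounding in the subtraction, not something stated in the source.

```python
import math


def forward_bn(subset, mu, nu, gamma, beta, eps=1e-5):
    """Normalize one GPU's subset with global statistics, then scale and shift."""
    # Variance from the global mean and mean of squares; the clamp is an
    # added safeguard for floating-point cancellation (assumption).
    var = max(nu - mu * mu, 0.0)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Forward BN operation: the forward post-BN data subset.
    x_hat = [(x - mu) * inv_std for x in subset]
    # Offset operation: the forward BN output data subset.
    y = [gamma * xh + beta for xh in x_hat]
    return x_hat, y


# Hypothetical values: global mean 2 and global mean of squares 5, so variance 1.
x_hat, y = forward_bn([1.0, 3.0], mu=2.0, nu=5.0, gamma=2.0, beta=0.5, eps=0.0)
```

Setting `eps=0.0` here only keeps the example values round; in practice a small positive value protects against division by zero when the variance vanishes.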
The forward BN processing unit 51 is further configured to: send the global forward BN input data mean set to each of the other GPUs; or send the forward BN input data subset mean set of the GPU to each of the other GPUs.
The backward BN processing unit 52 is configured to, in the back-propagation process: receive the backward BN input data subset, which is the gradient set of the forward BN output data subset obtained after the forward BN processing unit 51 performs the forward BN processing; determine the global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN input data mean set, to obtain the gradient of each data item in the forward BN input data subset.
The backward BN processing unit 52 determining the global backward BN input data mean set includes: when the GPU is the master GPU among the multiple GPUs, determining the backward BN input data subset mean set of the master GPU according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set including the backward BN input data subset mean and the forward BN gradient correction data mean; receiving the backward BN input data subset mean set from each slave GPU; determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean; and sending the global backward BN input data mean set to each slave GPU.
In some embodiments of the present application, the backward BN processing unit 52 determining the global backward BN input data mean set includes: when the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the slave GPU including the backward BN input data subset mean and the forward BN gradient correction data mean; sends the determined backward BN input data subset mean set to the master GPU among the multiple GPUs; and receives the global backward BN input data mean set from the master GPU, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean.
In other embodiments of the present application, the backward BN processing unit 52 determining the global backward BN input data mean set includes: the GPU determines its backward BN input data subset mean set according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the GPU including the backward BN input data subset mean and the forward BN gradient correction data mean; sends the backward BN input data subset mean set of the GPU to each of the other GPUs; receives the backward BN input data subset mean set from each of the other GPUs; and determines the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, the global backward BN input data mean set including the global backward BN input data mean and the global forward BN gradient correction data mean.
The backward BN processing unit 52 determining the backward BN input data subset mean set of the GPU includes: determining the backward BN input data subset mean according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial\ell}{\partial y_{i,j}}$, where $G_i = \{\partial\ell/\partial y_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the i-th GPU, $\ell$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial\ell/\partial y_{i,j}$ is the gradient of $y_{i,j}$, and $\bar{g}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU; and determining the forward BN gradient correction data mean according to a formula [formula not reproduced], where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU.
The backward BN processing unit 52 determines the global backward BN input data mean set as follows. The
global backward BN input data mean is determined according to the formula
$\bar{g} = \frac{\sum_{i=1}^{n} m_i\,\bar{g}_i}{\sum_{i=1}^{n} m_i}$, where n is the number of GPUs, $m_i$ is
the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the forward BN
output data subset gradient mean of the i-th GPU, and $\bar{g}$ is the global backward BN input data mean. The
global forward BN gradient correction data mean is determined according to the formula
$\phi = \frac{\sum_{i=1}^{n} m_i\,\phi_i}{\sum_{i=1}^{n} m_i}$, where $\phi_i$ is the forward BN gradient
correction data mean of the i-th GPU and $\phi$ is the global forward BN gradient correction data mean.
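As an illustration of the per-GPU and global backward statistics described above, the sketch below computes each GPU's backward BN input data subset mean and forward BN gradient correction data mean, then combines them with a size-weighted average. It assumes that the gradient correction data are the products of each input with its output gradient, and the function names and sample values are hypothetical.

```python
def local_backward_stats(x, dy):
    """Return (m_i, mean of dl/dy, mean of x * dl/dy) for one GPU's subset."""
    m = len(x)
    g_mean = sum(dy) / m
    phi = sum(xi * di for xi, di in zip(x, dy)) / m
    return m, g_mean, phi

def global_backward_stats(per_gpu):
    """Size-weighted average of the per-GPU means."""
    total = sum(m for m, _, _ in per_gpu)
    g = sum(m * g_i for m, g_i, _ in per_gpu) / total
    phi = sum(m * p_i for m, _, p_i in per_gpu) / total
    return g, phi

# Two GPUs holding subsets of different sizes (illustrative values).
shards_x = [[1.0, 2.0, 3.0], [4.0, 6.0]]
shards_dy = [[0.1, -0.2, 0.3], [0.5, -0.4]]

per_gpu = [local_backward_stats(x, dy) for x, dy in zip(shards_x, shards_dy)]
g, phi = global_backward_stats(per_gpu)
```

Weighting by the subset size means the result equals the mean over all data, even when the GPUs hold subsets of unequal size.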
The backward BN processing unit 52 performs backward BN processing on the forward BN input data subset as
follows: it determines the gradient of each data item in the forward post-BN data subset according to the
backward BN input data subset; determines the gradient of the variance of the global forward BN input data
according to the global forward BN input data mean set, the global backward BN input data mean and the
global forward BN gradient correction data mean; determines the gradient of the global forward BN input
data mean according to the global forward BN input data mean set and the global forward BN gradient
correction data mean; and determines the gradient of each data item in the forward BN input data subset
according to the gradient of each data item in the forward post-BN data subset, the gradient of the
variance of the global forward BN input data, the gradient of the global forward BN input data mean, the
global forward BN input data mean set and the global forward BN input data mean.
The backward BN processing unit 52 determines the gradient of each data item in the forward post-BN data
subset according to the formula
$\frac{\partial l}{\partial \hat{x}_{i,j}} = \frac{\partial l}{\partial y_{i,j}}\cdot\gamma$, where
$\{\partial l/\partial y_{i,j}\}$ is the backward BN input data subset when the GPU is the i-th GPU, $l$ is the
predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset,
$\partial l/\partial y_{i,j}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and
$\partial l/\partial \hat{x}_{i,j}$ is the gradient of the data item $\hat{x}_{i,j}$ in the forward post-BN data subset.
The backward BN processing unit 52 determines the gradient of the variance of the global forward BN input
data according to a formula in which $\sigma^2$ is the variance of the global forward BN input data,
$\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of
the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, $\phi$ is the global forward
BN gradient correction data mean, $\bar{g}$ is the global forward BN output data gradient mean, $\gamma$ is an
offset parameter, and $\partial l/\partial \sigma^2$ is the gradient of the variance of the global forward BN
input data.
The backward BN processing unit 52 determines the gradient of the global forward BN input data mean
according to a formula in which $\sigma^2$ is the variance of the global forward BN input data,
$\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of
the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, $\phi$ is the global forward
BN gradient correction data mean, $\gamma$ is an offset parameter, and $\partial l/\partial \mu$ is the gradient
of the global forward BN input data mean.
The backward BN processing unit 52 determines the gradient of each data item in the forward BN input data
subset according to a formula in which $\partial l/\partial \hat{x}_{i,j}$ is the gradient of each data item in
the forward post-BN data subset, $\sigma^2$ is the variance of the global forward BN input data,
$\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of
the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value,
$\partial l/\partial \sigma^2$ is the gradient of the variance of the global forward BN input data,
$\partial l/\partial \mu$ is the gradient of the global forward BN input data mean, and
$\partial l/\partial x_{i,j}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
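The backward steps above can be sketched with the standard batch-normalization chain rule, expressed only in terms of the four global means ($\mu$, $\nu$, the global output gradient mean and $\phi$). This is a sketch under stated assumptions, not the formulas of the application itself: the variance and mean gradients are written in per-sample-normalized form, the correction data are assumed to be the products $x_{i,j}\,\partial l/\partial y_{i,j}$, and all function names and values are hypothetical. The `loss` function is a single-GPU reference used only to check the result.

```python
import math

EPS = 1e-5  # small constant added to the variance

def global_stats(shards_x, shards_dy):
    """Combine what each GPU would contribute into the global means."""
    total = sum(len(s) for s in shards_x)
    mu = sum(sum(s) for s in shards_x) / total
    nu = sum(sum(v * v for v in s) for s in shards_x) / total
    g = sum(sum(d) for d in shards_dy) / total
    phi = sum(sum(v * d for v, d in zip(s, ds))
              for s, ds in zip(shards_x, shards_dy)) / total
    return mu, nu - mu * mu, g, phi

def input_grads(x, dy, mu, var, g, phi, gamma):
    """Gradient of the loss w.r.t. one GPU's inputs, using global means only."""
    inv_std = 1.0 / math.sqrt(var + EPS)
    dvar = -0.5 * gamma * (phi - mu * g) * inv_std ** 3  # normalized by total count
    dmu = -gamma * g * inv_std                           # normalized by total count
    return [gamma * d * inv_std + 2.0 * (v - mu) * dvar + dmu
            for v, d in zip(x, dy)]

def loss(flat_x, weights, gamma, beta):
    """Reference: single-GPU BN over all data followed by a fixed linear loss."""
    m = len(flat_x)
    mu = sum(flat_x) / m
    var = sum(v * v for v in flat_x) / m - mu * mu
    s = math.sqrt(var + EPS)
    return sum(w * (gamma * (v - mu) / s + beta) for v, w in zip(flat_x, weights))

shards_x = [[1.0, 2.0, 3.0], [4.0, 6.0]]
shards_dy = [[0.1, -0.2, 0.3], [0.5, -0.4]]  # dl/dy of the linear loss above
gamma, beta = 1.5, 0.2

mu, var, g, phi = global_stats(shards_x, shards_dy)
grads = [input_grads(x, dy, mu, var, g, phi, gamma)
         for x, dy in zip(shards_x, shards_dy)]
```

Because each GPU needs only its own subset plus the shared global means, the per-GPU gradients can be compared against a finite-difference gradient of `loss` over the concatenated data to confirm that they match single-GPU training.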
The backward BN processing unit 52 is further configured to determine the gradients of the BN layer
training parameters according to the global forward BN input data mean set and the global backward BN
input data mean set, where the BN layer training parameters include the offset parameters γ and β.
The backward BN processing unit 52 may determine the gradient of the offset parameter γ according to the
formula $\frac{\partial l}{\partial \gamma_i} = m_i\,\frac{\phi - \mu\,\bar{g}}{\sqrt{\sigma^2 + \varepsilon}}$,
where $\phi$ is the global forward BN gradient correction data mean, $\mu$ is the mean of the global forward BN
input data, $\bar{g}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the
global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN
input data, $\varepsilon$ is a fixed, very small nonzero value, $m_i$ is the number of data items in the forward
BN input data subset when the GPU is the i-th GPU, and $\partial l/\partial \gamma_i$ is the gradient of the
offset parameter γ when the GPU is the i-th GPU.
The backward BN processing unit 52 may determine the gradient of the offset parameter β according to the
formula $\frac{\partial l}{\partial \beta_i} = m_i\,\bar{g}$, where $\bar{g}$ is the global forward BN output
data gradient mean, $m_i$ is the number of data items in the forward BN input data subset when the GPU is
the i-th GPU, and $\partial l/\partial \beta_i$ is the gradient of the offset parameter β when the GPU is the
i-th GPU.
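The parameter gradients can be illustrated in the same way. The sketch below assumes that each GPU's share of the γ and β gradients is its subset size multiplied by a quantity built from the global means, so that the shares sum to the gradients a single GPU would compute over all the data. The function name `param_grads`, the summing of shares and the sample values are assumptions for illustration.

```python
import math

EPS = 1e-5

def param_grads(m_i, mu, var, g, phi):
    """Per-GPU gradients of gamma and beta from the global means."""
    dgamma = m_i * (phi - mu * g) / math.sqrt(var + EPS)
    dbeta = m_i * g
    return dgamma, dbeta

shards_x = [[1.0, 2.0, 3.0], [4.0, 6.0]]
shards_dy = [[0.1, -0.2, 0.3], [0.5, -0.4]]
flat_x = [v for s in shards_x for v in s]
flat_dy = [d for s in shards_dy for d in s]

# Global means over all data (what the exchange between GPUs would produce).
total = len(flat_x)
mu = sum(flat_x) / total
var = sum(v * v for v in flat_x) / total - mu * mu
g = sum(flat_dy) / total
phi = sum(v * d for v, d in zip(flat_x, flat_dy)) / total

per_gpu = [param_grads(len(s), mu, var, g, phi) for s in shards_x]
dgamma_total = sum(d for d, _ in per_gpu)
dbeta_total = sum(b for _, b in per_gpu)

# Direct single-GPU computation for comparison.
s = math.sqrt(var + EPS)
dgamma_direct = sum(d * (v - mu) / s for v, d in zip(flat_x, flat_dy))
dbeta_direct = sum(flat_dy)
```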
With the device shown in Fig. 5, during back-propagation processing, backward BN processing is performed
on the forward BN input data subset according to the determined global backward BN input data mean set,
the backward BN input data subset and the global forward BN data mean set. This further compensates for
the widening of data differences caused by each GPU seeing only part of the data. Each GPU thus obtains
gradients close to those obtained when a single GPU trains on the global data, and the data gradients are
consistent across the multiple GPUs. Training accuracy is improved as a result, which solves the prior-art
problem of low training accuracy when multiple GPUs train a DNN model in parallel, caused by inconsistent
data gradients among the GPUs.
The following describes how the multi-GPU parallel DNN model training method provided by the embodiments
of the present application is used in practice.
In a concrete application, the processing shown in Fig. 2, Fig. 3a and Fig. 4a can be integrated into the
deep learning training framework MXNet to obtain a fully executable technical solution. The system design
of MXNet can be divided into a C++ layer and a Python layer. The C++ layer is mainly responsible for
system-level functions such as task scheduling, memory optimization and computation graph optimization.
The main function of the Python layer is to encapsulate the complete training process and to provide the
interface through which the user interacts. In MXNet, the traditional Python-layer training process is as
follows:
In an actual implementation, both the C++ layer and the Python layer are modified, and the Python
interface can still be called normally after the modification. After the multi-GPU parallel DNN model
training method provided by the embodiments of the present application is applied, the Python-layer
training process is as follows:
After the above processing is implemented, both training accuracy and test accuracy improve
significantly. Fig. 6 compares the training accuracy of parallel training on 3 GPUs with that of training
on a single GPU. The training accuracy of the multi-GPU parallel training method provided by the present
application (the 3-GPU parallel global data training accuracy, shown by the solid line in Fig. 6) is close
to the training accuracy of single-GPU global data training (the single-GPU training accuracy, shown by
the thick dashed line in Fig. 6), whereas the training accuracy of prior-art multi-GPU parallel training
(the 3-GPU parallel local data training accuracy, shown by the thin dashed line in Fig. 6) is clearly
lower than that of single-GPU global data training. Fig. 7 compares the test accuracy of parallel training
on 3 GPUs with that of training on a single GPU. The test accuracy of the multi-GPU parallel training
method provided by the present application (the 3-GPU parallel global data test accuracy, shown by the
dashed line in Fig. 7) is close to the test accuracy of single-GPU global data training (the single-GPU
test accuracy, shown by the thick solid line in Fig. 7), whereas the test accuracy of prior-art multi-GPU
parallel training (the 3-GPU parallel local data test accuracy, shown by the thin solid line in Fig. 7) is
clearly lower than that of single-GPU global data training. The figures show that a model trained with the
method provided by the embodiments of the present application reaches an accuracy similar to that of
single-GPU training, and about 15% higher than that of a model trained with the prior-art multi-GPU
parallel training method.
The above is the core idea of the present invention. To help those skilled in the art better understand
the technical solutions in the embodiments of the present invention, and to make the above objects,
features and advantages of the embodiments more apparent, the technical solutions in the embodiments are
described in further detail with reference to the accompanying drawings.
Obviously, those skilled in the art can make various changes and modifications to the present invention
without departing from its spirit and scope. If such changes and modifications fall within the scope of
the claims of the present invention and their technical equivalents, the present invention is intended to
include them as well.
Claims (44)
1. A deep neural network (DNN) model training method using multiple graphics processing units (GPUs) in parallel, characterized by comprising:
when one GPU among the multiple GPUs performs DNN model training on the data subset assigned to it, in
forward propagation processing, receiving a forward batch normalization (BN) input data subset; determining a global forward BN input data mean set;
and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain
a forward BN output data subset;
in back-propagation processing, receiving a backward BN input data subset, the backward BN input data subset being
the gradient set of the forward BN output data subset; determining a global backward BN input data mean set; and, according to the global backward
BN input data mean set, the backward BN input data subset and the global forward BN input data mean set,
performing backward BN processing on the forward BN input data subset, to obtain the gradient of each data item in the forward BN input data
subset.
2. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:
when the GPU is the main GPU among the multiple GPUs, determining, by the main GPU according to the forward BN input data subset,
the forward BN input data subset mean set of the GPU, the forward BN input data subset mean set including
the mean and the mean of squares of the forward BN input data subset;
receiving the forward BN input data subset mean sets from each of the other, slave GPUs;
determining the global forward BN input data mean set according to the forward BN input data subset mean set of the main GPU and the forward BN input data
subset mean sets of the slave GPUs, the global forward BN input data mean set including
the mean and the mean of squares of the global forward BN input data;
sending the global forward BN input data mean set to each of the slave GPUs.
3. The method according to claim 2, characterized in that determining the forward BN input data subset mean set comprises:
determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}$ (j =
1, 2, …, $m_i$), $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data
subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th
GPU;
determining the mean of squares of the forward BN input data subset according to the formula $v_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $v_i$ is
the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.
4. The method according to claim 2, characterized in that determining the global forward BN input data mean set comprises:
determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i\,\mu_i}{\sum_{i=1}^{n} m_i}$, where n is the number of
GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the i-th
GPU, and $\mu$ is the mean of the global forward BN input data;
determining the mean of squares of the global forward BN input data according to the formula $v = \frac{\sum_{i=1}^{n} m_i\,v_i}{\sum_{i=1}^{n} m_i}$, where $v_i$ is the
mean of squares of the forward BN input data subset of the i-th GPU, and $v$ is the mean of squares of the global forward BN input data.
5. The method according to claim 2, characterized in that performing forward BN processing on the forward BN input data subset
comprises:
performing a forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data,
to obtain a forward post-BN data subset;
performing an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data
subset.
6. The method according to claim 5, characterized in that performing the forward BN operation on each data item in the forward BN input data subset
comprises:
determining the variance of the global forward BN input data according to the formula $\sigma^2 = v - \mu^2$, where $v$ is the mean of squares of the global forward BN input
data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input
data;
performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$,
where $B_i = \{x_{i,j}\}$ (j = 1, 2, …, $m_i$), $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is
a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is
the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero
value, and $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset.
7. The method according to claim 5, characterized in that performing the offset operation on each data item in the forward post-BN data subset
comprises:
performing the offset operation on each data item in the forward post-BN data subset according to the formula $y_{i,j} = \gamma\,\hat{x}_{i,j} + \beta$, where
$\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data
subset.
8. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:
when the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU according to the forward BN input data subset,
the forward BN input data subset mean set of the slave GPU, the forward BN input data subset mean set including
the mean and the mean of squares of the forward BN input data subset;
sending the determined forward BN input data subset mean set to the main GPU among the multiple GPUs;
receiving the global forward BN input data mean set from the main GPU, the global forward BN input data
mean set including the mean and the mean of squares of the global forward BN input data.
9. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:
determining, by the GPU according to the forward BN input data subset, the forward BN input data subset mean set of the GPU, the forward
BN input data subset mean set including the mean and the mean of squares of the forward BN input data subset;
sending the forward BN input data subset mean set of the GPU to each of the other GPUs;
receiving the forward BN input data subset mean sets from each of the other GPUs;
determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean
sets of the other GPUs, the global forward BN input data mean set including
the mean and the mean of squares of the global forward BN input data.
10. The method according to claim 5, characterized in that determining the global backward BN input data mean set
comprises:
when the GPU is the main GPU among the multiple GPUs, determining, by the main GPU according to the backward BN input data subset and
the forward BN input data subset, the backward BN input data subset mean set of the main GPU, the backward BN
input data subset mean set including a backward BN input data subset mean and a forward BN gradient correction data mean;
receiving the backward BN input data subset mean sets from each of the other, slave GPUs;
determining the global backward BN input data mean set according to the backward BN input data subset mean set of the main GPU and the backward BN input data
subset mean sets of the slave GPUs, the global backward BN input data mean set
including a global backward BN input data mean and a global forward BN gradient correction data mean;
sending the global backward BN input data mean set to each of the slave GPUs.
11. The method according to claim 10, characterized in that determining the backward BN input data subset mean
set of the GPU comprises:
determining the backward BN input data subset mean according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial l}{\partial y_{i,j}}$, where $\{\partial l/\partial y_{i,j}\}$ (j = 1, 2, …, $m_i$) is the backward BN input data subset when the GPU is the i-th GPU, $l$ is the
predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial l/\partial y_{i,j}$ is the gradient of $y_{i,j}$, and $\bar{g}_i$ is
the backward BN input data subset mean when the GPU is the i-th GPU;
determining the forward BN gradient correction data mean according to the formula $\phi_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}\,\frac{\partial l}{\partial y_{i,j}}$, where $B_i = \{x_{i,j}\}$
(j = 1, 2, …, $m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input
data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\phi_i$ is the forward BN gradient correction data mean
when the GPU is the i-th GPU.
12. The method according to claim 10, characterized in that determining the global backward BN input data mean
set comprises:
determining the global backward BN input data mean according to the formula $\bar{g} = \frac{\sum_{i=1}^{n} m_i\,\bar{g}_i}{\sum_{i=1}^{n} m_i}$, where n is the number of
GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the forward BN output data
subset gradient mean of the i-th GPU, and $\bar{g}$ is the global backward BN input data mean;
determining the global forward BN gradient correction data mean according to the formula $\phi = \frac{\sum_{i=1}^{n} m_i\,\phi_i}{\sum_{i=1}^{n} m_i}$, where $\phi_i$ is the
forward BN gradient correction data mean of the i-th GPU, and $\phi$ is the global forward BN gradient correction data mean.
13. The method according to claim 10, characterized in that performing backward BN processing on the forward BN input data subset
comprises:
determining the gradient of each data item in the forward post-BN data subset according to the backward BN input data subset;
determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient
correction data mean;
determining the gradient of the global forward BN input data mean according to the global forward BN input data mean set and the global forward BN gradient correction data
mean;
determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the forward post-BN data subset, the gradient of the variance of the global forward BN input
data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global
forward BN input data.
14. The method according to claim 13, characterized in that determining the gradient of each data item in the forward post-BN data subset
comprises:
determining the gradient of each data item in the forward post-BN data subset according to the formula $\frac{\partial l}{\partial \hat{x}_{i,j}} = \frac{\partial l}{\partial y_{i,j}}\cdot\gamma$, where $\{\partial l/\partial y_{i,j}\}$ is the backward BN input data subset when the GPU is the i-th GPU, $l$ is the predetermined
loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial l/\partial y_{i,j}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\partial l/\partial \hat{x}_{i,j}$ is the gradient of the data item $\hat{x}_{i,j}$ in the forward post-BN data subset.
15. according to the method for claim 13, which is characterized in that determine the ladder of the global preceding variance to BN input data
Degree, comprising:
According to formulaDetermine the global preceding variance to BN input data
Gradient, wherein σ2For the global preceding variance to BN input data, σ2=v- μ2, v is described global preceding to BN input number
According to mean value of square, μ be it is described it is global before to BN input data mean value, ε is fixed minimum nonzero value, and φ is the overall situation
Forward direction BN gradient calibration data mean value,For, to BN output data gradient mean value, γ is offset parameter before the overall situation,For
The gradient of the global preceding variance to BN input data.
16. according to the method for claim 13, which is characterized in that determine the global preceding gradient to BN input data mean value,
Include:
According to formulaDetermine the global preceding gradient to BN input data mean value, wherein σ2For institute
State the variance before the overall situation to BN input data, σ2=v- μ2, v is the global preceding mean value of square to BN input data, and μ is institute
State the mean value before the overall situation to BN input data, ε is fixed minimum nonzero value, φ be it is described it is global before to BN gradient calibration data
Mean value, γ are offset parameter,It is the global preceding gradient to BN input data mean value.
17. according to the method for claim 13, which is characterized in that determine every number in the forward direction BN input data subset
According to gradient, comprising:
According to formulaDetermine the forward direction BN input data subset
In each data gradient, whereinFor the gradient of each data in data subset after the forward direction BN, σ2For the overall situation
The variance of forward direction BN input data, σ2=v- μ2, v is the global preceding mean value of square to BN input data, and μ is the overall situation
The mean value of forward direction BN input data, ε are fixed minimum nonzero value,For the global preceding variance to BN input data
Gradient,It is the global preceding gradient to BN input data mean value,It is the number in the forward direction BN input data subset
According to xI, jGradient.
18. The method according to claim 10, characterized in that the method further comprises:
determining the gradients of BN layer training parameters according to the global forward BN input data mean set and the global backward BN input data mean set,
the BN layer training parameters including the offset parameters γ and β.
19. according to the method for claim 18, which is characterized in that according to formulaDetermine offset ginseng
The gradient of number γ, wherein for φ to be described global preceding to BN gradient calibration data mean value, μ is described global preceding to BN input data
Mean value,It is described global preceding to BN output data gradient mean value, σ2For the global preceding variance to BN input data, σ2
=v- μ2, v is the global preceding mean value of square to BN input data, and μ is the global preceding mean value to BN input data, ε
For fixed minimum nonzero value, miThe quantity of the data of forward direction BN input data subset when for the GPU being i-th of GPU,The gradient of offset parameter γ when for the GPU being i-th of GPU.
20. The method according to claim 18, characterized in that the gradient of the offset parameter β is determined according to the formula $\frac{\partial l}{\partial \beta_i} = m_i\,\bar{g}$,
where $\bar{g}$ is the global forward BN output data gradient mean, $m_i$ is the number of data items in the forward
BN input data subset when the GPU is the i-th GPU, and $\partial l/\partial \beta_i$ is the gradient of the offset parameter β when the GPU is the i-th GPU.
21. The method according to claim 5, characterized in that determining the global backward BN input data mean set
comprises:
when the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU according to the backward BN input data subset and
the forward BN input data subset, the backward BN input data subset mean set of the slave GPU, the backward BN input data subset mean set of the slave GPU
including a backward BN input data subset mean and a forward BN gradient correction data mean;
sending the determined backward BN input data subset mean set to the main GPU among the multiple GPUs;
receiving the global backward BN input data mean set from the main GPU, the global backward BN input data mean
set including a global backward BN input data mean and a global forward BN gradient correction data mean.
22. The method according to claim 5, characterized in that determining the global backward BN input data mean set
comprises:
determining, by the GPU according to the backward BN input data subset and the forward BN input data subset, the backward
BN input data subset mean set of the GPU, the backward BN input data subset mean set of the GPU including a backward BN input
data subset mean and a forward BN gradient correction data mean;
sending the backward BN input data subset mean set of the GPU to each of the other GPUs;
receiving the backward BN input data subset mean sets from each of the other GPUs;
determining the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean
sets of the other GPUs, the global backward BN input data mean set including
a global backward BN input data mean and a global forward BN gradient correction data mean.
23. A deep neural network (DNN) model training device using multiple graphics processing units (GPUs) in parallel, characterized in that the device
is arranged in each GPU of the multiple GPUs and comprises:
a forward batch normalization (BN) processing unit, configured to, in forward propagation processing, receive a forward BN input data subset; determine
a global forward BN input data mean set; and perform forward BN processing on the forward BN
input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset;
a backward BN processing unit, configured to, in back-propagation processing, receive a backward BN input data subset, the backward BN
input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean
set; and, according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN
data mean set, perform backward BN processing on the forward BN input data subset, to obtain the gradient of each data item in the forward BN input data
subset.
24. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN
input data mean set comprises:
when the GPU is the main GPU among the multiple GPUs, determining, by the main GPU according to the forward BN input data subset,
the forward BN input data subset mean set of the GPU, the forward BN input data subset mean set including
the mean and the mean of squares of the forward BN input data subset;
receiving the forward BN input data subset mean sets from each of the other, slave GPUs;
determining the global forward BN input data mean set according to the forward BN input data subset mean set of the main GPU and the forward BN input data
subset mean sets of the slave GPUs, the global forward BN input data mean set including
the mean and the mean of squares of the global forward BN input data;
sending the global forward BN input data mean set to each of the slave GPUs.
25. The device according to claim 24, characterized in that the forward BN processing unit determining the forward BN input data subset mean set of the GPU comprises:
determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$), $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU;
determining the mean of squares of the forward BN input data subset according to the formula $v_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $v_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.
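The per-GPU statistics of claim 25 can be sketched as follows. This is a minimal illustration, assuming the mean and mean of squares are the usual arithmetic averages over the subset; the function name `local_forward_stats` is hypothetical, and the input is treated as a flat array of scalars, whereas a real BN layer would normally keep one pair of statistics per channel.

```python
import numpy as np

def local_forward_stats(x):
    """Return (mean, mean of squares, count) for one GPU's forward BN input subset.

    Hypothetical helper: the subset B_i is treated as a flat array of scalars.
    """
    x = np.asarray(x, dtype=np.float64)
    m_i = x.size                    # number of data items in the subset
    mu_i = x.sum() / m_i            # subset mean
    v_i = (x ** 2).sum() / m_i      # subset mean of squares
    return mu_i, v_i, m_i

mu_i, v_i, m_i = local_forward_stats([1.0, 2.0, 3.0])
```

Only these three scalars, not the subset itself, would need to leave the GPU.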
26. The device according to claim 24, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:
determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, and $\mu$ is the mean of the global forward BN input data;
determining the mean of squares of the global forward BN input data according to the formula $v = \frac{\sum_{i=1}^{n} m_i v_i}{\sum_{i=1}^{n} m_i}$, where $v_i$ is the mean of squares of the forward BN input data subset of the i-th GPU, and $v$ is the mean of squares of the global forward BN input data.
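The combination step of claim 26 can be sketched as follows. It assumes the global values are count-weighted averages of the per-GPU values, which is the reading suggested by the symbols the claim defines ($n$, $m_i$, $\mu_i$, $v_i$). The name `combine_stats` and the sample data are hypothetical.

```python
import numpy as np

def combine_stats(per_gpu):
    """Combine per-GPU (mu_i, v_i, m_i) triples into global (mu, v).

    Assumes a count-weighted average, so GPUs with larger subsets weigh more.
    """
    total = sum(m for _, _, m in per_gpu)
    mu = sum(m * mu_i for mu_i, _, m in per_gpu) / total
    v = sum(m * v_i for _, v_i, m in per_gpu) / total
    return mu, v

# Two simulated GPUs with subsets of different sizes.
subsets = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0])]
per_gpu = [(s.mean(), (s ** 2).mean(), s.size) for s in subsets]
mu, v = combine_stats(per_gpu)

# Reference: statistics of the whole batch computed in one place.
full = np.concatenate(subsets)
```

Under this weighting, `mu` and `v` match the statistics of the concatenated batch, so `v - mu**2` equals the batch variance.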
27. The device according to claim 24, characterized in that the forward BN processing unit performing forward BN processing on the forward BN input data subset comprises:
performing a forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data, to obtain a forward post-BN data subset;
performing an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data subset.
28. The device according to claim 27, characterized in that the forward BN processing unit performing the forward BN operation on each data item in the forward BN input data subset comprises:
determining the variance of the global forward BN input data according to the formula $\sigma^2 = v - \mu^2$, where $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;
performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$), $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed small non-zero value, and $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset.
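The normalization of claim 28 can be sketched as follows, assuming the usual BN form in which each input is centered by the global mean and divided by $\sqrt{\sigma^2 + \varepsilon}$. The function name, the default `eps` and the clamp to zero are assumptions of this sketch, not part of the claim; the clamp guards against $v - \mu^2$ turning slightly negative through floating-point cancellation.

```python
import numpy as np

def forward_bn_normalize(x_i, mu, v, eps=1e-5):
    """Normalize one GPU's subset with the global mean and mean of squares."""
    var = np.maximum(v - mu ** 2, 0.0)   # sigma^2 = v - mu^2 (clamp is an added safeguard)
    return (x_i - mu) / np.sqrt(var + eps)

subsets = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0])]
full = np.concatenate(subsets)
mu, v = full.mean(), (full ** 2).mean()   # global statistics

# Each simulated GPU normalizes only its own subset.
x_hat = np.concatenate([forward_bn_normalize(s, mu, v) for s in subsets])
```

Because every subset is normalized with the same global statistics, the concatenated result has approximately zero mean and unit variance, as it would if the whole batch were processed on one device.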
29. The device according to claim 27, characterized in that the forward BN processing unit performing the offset operation on each data item in the forward post-BN data subset comprises:
performing the offset operation on each data item in the forward post-BN data subset according to the formula $y_{i,j} = \gamma \hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
30. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:
in the case where the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU and according to the forward BN input data subset, the forward BN input data subset mean set of the slave GPU, the forward BN input data subset mean set comprising: the mean and the mean of squares of the forward BN input data subset;
sending the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;
receiving the global forward BN input data mean set from the master GPU, the global forward BN input data mean set comprising: the mean and the mean of squares of the global forward BN input data.
31. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:
determining the forward BN input data subset mean set of the GPU according to the forward BN input data subset, the forward BN input data subset mean set comprising: the mean and the mean of squares of the forward BN input data subset;
sending the forward BN input data subset mean set of the GPU to each of the other GPUs;
receiving the forward BN input data subset mean sets from each of the other GPUs;
determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, the global forward BN input data mean set comprising: the mean and the mean of squares of the global forward BN input data.
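The peer-to-peer variant of claim 31, in which every GPU sends its statistics to all others and computes the global values itself, can be simulated in a single process as follows. This is only a sketch: the list `local` stands in for real inter-GPU communication, the count-weighted average is an assumption, and `all_to_all_global_stats` is a hypothetical name.

```python
import numpy as np

def all_to_all_global_stats(subsets):
    """Simulate n peers that exchange (mean, mean of squares, count) with each other."""
    local = [(s.mean(), (s ** 2).mean(), s.size) for s in subsets]
    results = []
    for rank in range(len(subsets)):
        # Own statistics plus the statistics "received" from every other peer.
        gathered = [local[rank]] + [local[j] for j in range(len(subsets)) if j != rank]
        total = sum(m for _, _, m in gathered)
        mu = sum(m * mu_j for mu_j, _, m in gathered) / total
        v = sum(m * v_j for _, v_j, m in gathered) / total
        results.append((mu, v))
    return results

subsets = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0]), np.array([0.5])]
results = all_to_all_global_stats(subsets)
```

Every peer ends up with the same global statistics up to floating-point summation order. Compared with the master/slave scheme of claims 24 and 30, this saves the return broadcast at the cost of more messages.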
32. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises:
in the case where the GPU is the master GPU among the multiple GPUs, determining the backward BN input data subset mean set of the master GPU according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set comprising a backward BN input data subset mean and a forward BN gradient correction data mean;
receiving the backward BN input data subset mean sets from each of the other slave GPUs;
determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the other slave GPUs, the global backward BN input data mean set comprising: a global backward BN input data mean and a global forward BN gradient correction data mean;
sending the global backward BN input data mean set to each of the other slave GPUs.
33. The device according to claim 32, characterized in that the backward BN processing unit determining the backward BN input data subset mean set of the GPU comprises:
determining the backward BN input data subset mean according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial l}{\partial y_{i,j}}$, where $\{\partial l / \partial y_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the i-th GPU, $l$ is a predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial l / \partial y_{i,j}$ is the gradient of $y_{i,j}$, and $\bar{g}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;
determining the forward BN gradient correction data mean according to the formula $\varphi_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial l}{\partial y_{i,j}} \, x_{i,j}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU.
34. The device according to claim 32, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises:
determining the global backward BN input data mean according to the formula $\bar{g} = \frac{\sum_{i=1}^{n} m_i \bar{g}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the mean of the gradients of the forward BN output data subset of the i-th GPU, and $\bar{g}$ is the global backward BN input data mean;
determining the global forward BN gradient correction data mean according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i \varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient correction data mean of the i-th GPU, and $\varphi$ is the global forward BN gradient correction data mean.
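The backward statistics of claims 33 and 34 can be sketched as follows. Two assumptions are made here and should be checked against the original formulas: the forward BN gradient correction data mean is taken to be the average of the product of each output gradient with its forward input, and the global values are count-weighted averages. All function and variable names are hypothetical.

```python
import numpy as np

def local_backward_stats(dy_i, x_i):
    """Per-GPU backward statistics: gradient mean and gradient correction mean."""
    m_i = dy_i.size
    g_i = dy_i.sum() / m_i             # mean of the output gradients
    phi_i = (dy_i * x_i).sum() / m_i   # assumed: mean of gradient times forward input
    return g_i, phi_i, m_i

def combine_backward_stats(per_gpu):
    """Count-weighted combination of per-GPU (g_i, phi_i, m_i) triples."""
    total = sum(m for _, _, m in per_gpu)
    g = sum(m * g_i for g_i, _, m in per_gpu) / total
    phi = sum(m * phi_i for _, phi_i, m in per_gpu) / total
    return g, phi

xs = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0])]
dys = [np.array([0.1, -0.2, 0.3]), np.array([0.5, -0.4])]
g, phi = combine_backward_stats([local_backward_stats(d, x) for d, x in zip(dys, xs)])
```

As in the forward pass, only two scalars and a count per GPU need to be exchanged.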
35. The device according to claim 32, characterized in that the backward BN processing unit performing backward BN processing on the forward BN input data subset comprises:
determining the gradient of each data item in the forward post-BN data subset according to the backward BN input data subset;
determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
determining the gradient of the global forward BN input data mean according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the forward post-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global forward BN input data.
36. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of each data item in the forward post-BN data subset comprises:
determining the gradient of each data item in the forward post-BN data subset according to the formula $\frac{\partial l}{\partial \hat{x}_{i,j}} = \frac{\partial l}{\partial y_{i,j}} \cdot \gamma$, where $\{\partial l / \partial y_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the i-th GPU, $l$ is a predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\partial l / \partial y_{i,j}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\partial l / \partial \hat{x}_{i,j}$ is the gradient of the data item $\hat{x}_{i,j}$ in the forward post-BN data subset.
37. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of the variance of the global forward BN input data comprises:
determining the gradient of the variance of the global forward BN input data according to the formula [formula not reproduced in the source text], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\bar{g}$ is the global forward BN output data gradient mean, $\gamma$ is an offset parameter, and $\partial l / \partial \sigma^2$ is the gradient of the variance of the global forward BN input data.
38. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of the global forward BN input data mean comprises:
determining the gradient of the global forward BN input data mean according to the formula [formula not reproduced in the source text], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\gamma$ is an offset parameter, and $\partial l / \partial \mu$ is the gradient of the global forward BN input data mean.
39. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of each data item in the forward BN input data subset comprises:
determining the gradient of each data item in the forward BN input data subset according to the formula [formula not reproduced in the source text], where $\partial l / \partial \hat{x}_{i,j}$ is the gradient of each data item in the forward post-BN data subset, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small non-zero value, $\partial l / \partial \sigma^2$ is the gradient of the variance of the global forward BN input data, $\partial l / \partial \mu$ is the gradient of the global forward BN input data mean, and $\partial l / \partial x_{i,j}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
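The formulas of claims 37 to 39 are not legible in this text, so the sketch below does not reproduce them. It shows the standard batch-normalization input gradient rewritten so that a GPU needs only its own subset plus four global scalars: the mean $\mu$, the mean of squares $v$, the gradient mean (called `g` here) and the gradient correction mean (called `phi`, assumed to be the average of gradient times forward input). All names are hypothetical.

```python
import numpy as np

def bn_forward(x, gamma, beta, eps):
    """Reference forward pass over a whole batch (single-device view)."""
    mu, v = x.mean(), (x ** 2).mean()
    return gamma * (x - mu) / np.sqrt(v - mu ** 2 + eps) + beta

def input_grad(x_i, dy_i, mu, v, g, phi, gamma, eps):
    """Gradient w.r.t. one GPU's inputs, using only global scalars."""
    s2 = v - mu ** 2 + eps     # sigma^2 + eps
    cov = phi - mu * g         # global mean of dy * (x - mu)
    return gamma / np.sqrt(s2) * (dy_i - g - (x_i - mu) * cov / s2)

gamma, beta, eps = 1.5, 0.2, 1e-5
xs = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0])]
dys = [np.array([0.1, -0.2, 0.3]), np.array([0.5, -0.4])]
full_x, full_dy = np.concatenate(xs), np.concatenate(dys)
mu, v = full_x.mean(), (full_x ** 2).mean()           # global forward statistics
g, phi = full_dy.mean(), (full_dy * full_x).mean()    # global backward statistics

# Each simulated GPU computes gradients for its own subset only.
dx = np.concatenate([input_grad(x, d, mu, v, g, phi, gamma, eps)
                     for x, d in zip(xs, dys)])
```

The result can be checked against a finite-difference gradient of `bn_forward` over the full batch, which is a useful sanity test for any multi-GPU BN implementation.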
40. The device according to claim 32, characterized in that the backward BN processing unit is further configured to:
determine the gradients of the BN layer training parameters according to the global forward BN input data mean set and the global backward BN input data mean set, the BN layer training parameters comprising the offset parameters $\gamma$ and $\beta$.
41. The device according to claim 40, characterized in that the backward BN processing unit determines the gradient of the offset parameter $\gamma$ according to the formula [formula not reproduced in the source text], where $\varphi$ is the global forward BN gradient correction data mean, $\mu$ is the mean of the global forward BN input data, $\bar{g}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\varepsilon$ is a fixed small non-zero value, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and $\partial l / \partial \gamma_i$ is the gradient of the offset parameter $\gamma$ when the GPU is the i-th GPU.
42. The device according to claim 40, characterized in that the backward BN processing unit determines the gradient of the offset parameter $\beta$ according to the formula [formula not reproduced in the source text], where $\bar{g}$ is the global forward BN output data gradient mean, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the i-th GPU, and $\partial l / \partial \beta_i$ is the gradient of the offset parameter $\beta$ when the GPU is the i-th GPU.
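The formulas of claims 41 and 42 are not legible in this text. One reading that is consistent with the symbols the claims list is that each GPU scales a global per-sample quantity by its own count $m_i$. The sketch below follows that assumption and should not be taken as the patent's exact formula; `param_grads_share`, `g` and `phi` are hypothetical names.

```python
import numpy as np

def param_grads_share(m_i, mu, v, g, phi, eps=1e-5):
    """Assumed per-GPU share of the gamma and beta gradients."""
    s = np.sqrt(v - mu ** 2 + eps)
    dgamma_i = m_i * (phi - mu * g) / s   # m_i times the global mean of dy * x_hat
    dbeta_i = m_i * g                     # m_i times the global mean of dy
    return dgamma_i, dbeta_i

eps = 1e-5
xs = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0])]
dys = [np.array([0.1, -0.2, 0.3]), np.array([0.5, -0.4])]
full_x, full_dy = np.concatenate(xs), np.concatenate(dys)
mu, v = full_x.mean(), (full_x ** 2).mean()
g, phi = full_dy.mean(), (full_dy * full_x).mean()

shares = [param_grads_share(x.size, mu, v, g, phi, eps) for x in xs]
```

Under this reading, the shares summed over all GPUs equal the usual single-device gradients: the sum of gradient times normalized input for $\gamma$, and the sum of gradients for $\beta$.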
43. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises:
in the case where the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU and according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the slave GPU, the backward BN input data subset mean set of the slave GPU comprising a backward BN input data subset mean and a forward BN gradient correction data mean;
sending the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;
receiving the global backward BN input data mean set from the master GPU, the global backward BN input data mean set comprising: a global backward BN input data mean and a global forward BN gradient correction data mean.
44. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean set comprises:
determining, by the GPU and according to the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the GPU, the backward BN input data subset mean set of the GPU comprising a backward BN input data subset mean and a forward BN gradient correction data mean;
sending the backward BN input data subset mean set of the GPU to each of the other GPUs;
receiving the backward BN input data subset mean sets from each of the other GPUs;
determining the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, the global backward BN input data mean set comprising: a global backward BN input data mean and a global forward BN gradient correction data mean.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710564223.4A CN109255439B (en) | 2017-07-12 | 2017-07-12 | DNN model training method and device with multiple GPUs in parallel |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109255439A (en) | 2019-01-22 |
CN109255439B (en) | 2021-04-02 |
Family
ID=65050560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710564223.4A Active CN109255439B (en) | 2017-07-12 | 2017-07-12 | DNN model training method and device with multiple GPUs in parallel |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109255439B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308233A (en) * | 2019-08-02 | 2021-02-02 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for processing data |
CN112328532A (en) * | 2020-11-02 | 2021-02-05 | 长沙景嘉微电子股份有限公司 | Multi-GPU communication method and device, storage medium and electronic device |
US20210089887A1 (en) * | 2019-09-24 | 2021-03-25 | Apple Inc. | Variance-Based Learning Rate Control For Training Machine-Learning Models |
CN113011563A (en) * | 2021-03-19 | 2021-06-22 | 北京大学 | Convolutional neural network batch normalization processing method based on GPU |
CN117952815A (en) * | 2023-12-26 | 2024-04-30 | 深圳市腾进达信息技术有限公司 | Method and system for supporting multiple GPU (graphics processing Unit) to work simultaneously by single system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488662A (en) * | 2013-04-01 | 2014-01-01 | 哈尔滨工业大学深圳研究生院 | Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | 深圳市腾讯计算机系统有限公司 | Graphics processing unit based parallel data processing method and device |
CN104036451A (en) * | 2014-06-20 | 2014-09-10 | 深圳市腾讯计算机系统有限公司 | Parallel model processing method and device based on multiple graphics processing units |
US20160259037A1 (en) * | 2015-03-03 | 2016-09-08 | Nvidia Corporation | Radar based user interface |
CN106096605A (en) * | 2016-06-02 | 2016-11-09 | 史方 | A kind of image obscuring area detection method based on degree of depth study and device |
KR20170012019A (en) * | 2015-07-24 | 2017-02-02 | 삼성전자주식회사 | Method for optimizing parallel matrix multiplication in a system supporting multiple CPU and multiple GPU |
Non-Patent Citations (3)
Title |
---|
M NIENIEWSKI: "Real-Time US Image Enhancement by Forward-Backward Diffusion Using GPU", 《IMAGE PROCESSING AND COMMUNICATIONS CHALLENGES 7》 * |
SERGEY IOFFE ET AL: "Batch Normalization: Accelerating Deep Network Training by Reducing", 《ARXIV》 * |
韩丹: "基于CPU-GPU的条件随机场并行化研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308233A (en) * | 2019-08-02 | 2021-02-02 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for processing data |
CN112308233B (en) * | 2019-08-02 | 2024-07-19 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for processing data |
US20210089887A1 (en) * | 2019-09-24 | 2021-03-25 | Apple Inc. | Variance-Based Learning Rate Control For Training Machine-Learning Models |
CN112328532A (en) * | 2020-11-02 | 2021-02-05 | 长沙景嘉微电子股份有限公司 | Multi-GPU communication method and device, storage medium and electronic device |
CN112328532B (en) * | 2020-11-02 | 2024-02-09 | 长沙景嘉微电子股份有限公司 | Method and device for multi-GPU communication, storage medium and electronic device |
CN113011563A (en) * | 2021-03-19 | 2021-06-22 | 北京大学 | Convolutional neural network batch normalization processing method based on GPU |
CN117952815A (en) * | 2023-12-26 | 2024-04-30 | 深圳市腾进达信息技术有限公司 | Method and system for supporting multiple GPU (graphics processing Unit) to work simultaneously by single system |
Also Published As
Publication number | Publication date |
---|---|
CN109255439B (en) | 2021-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20200326. Address after: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District. Applicant after: BEIJING TUSENZHITU TECHNOLOGY Co.,Ltd. Address before: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District. Applicant before: TuSimple |
GR01 | Patent grant | |