CN109255439A - Multi-GPU parallel DNN model training method and device - Google Patents

Multi-GPU parallel DNN model training method and device

Info

Publication number
CN109255439A
Authority
CN
China
Prior art keywords
input data
gpu
forward direction
mean value
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710564223.4A
Other languages
Chinese (zh)
Other versions
CN109255439B (en)
Inventor
龚轶凡
靳江明
苏磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Technology Co Ltd
Original Assignee
Beijing Tusimple Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Future Technology Co Ltd
Priority to CN201710564223.4A
Publication of CN109255439A
Application granted
Publication of CN109255439B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a multi-GPU parallel DNN model training method and device, which address the low training precision that arises in the prior art when multiple GPUs train a DNN model in parallel. The method comprises: during forward-propagation processing, receiving a forward BN input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset; and, during back-propagation processing, receiving a backward BN input data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.

Description

Multi-GPU parallel DNN model training method and device
Technical field
The present invention relates to the field of information processing, and in particular to a deep neural network (Deep Neural Network, DNN) model training method and device in which multiple graphics processing units (Graphics Processing Unit, GPU) operate in parallel.
Background
At present, DNN model training is performed in deep learning for image classification and segmentation. The prior art includes a method of training in parallel on multiple GPUs: the data of one or more images (the global data) is divided into multiple data subsets according to the number of GPUs, the subsets are assigned one-to-one to the GPUs, and each GPU trains the DNN model with its assigned subset, which improves training efficiency. In practice, within one training cycle the system divides an acquired batch of training data (data batch, for example several images) into as many data subsets (sub batches) as there are GPU cards, and distributes the subsets to the corresponding GPU cards. During training, each GPU card first loads a complete copy of the DNN model to be trained and then trains that model with the data subset it was assigned.
Because each GPU receives different data, the gradients of the DNN model weights computed on different GPU cards differ.
A model synchronization operation is therefore performed: the gradients computed on the different GPUs are reduced and merged into a single identical gradient, and that merged gradient is used to update the model weights on every GPU.
This scheme makes multi-GPU parallel training of DNN models more efficient, but the overall training precision drops, and the drop becomes more pronounced as the number of GPUs grows.
Summary of the invention
In view of the above problem, the present invention provides a multi-GPU parallel DNN model training method and device, to solve the low training precision that arises in the prior art when multiple GPUs train a DNN model in parallel.
According to one aspect of the application, some embodiments provide a multi-GPU parallel DNN model training method, comprising: when a GPU among the multiple GPUs performs DNN model training on its assigned data subset, during forward-propagation processing, receiving a forward normalization (BN) input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset;
during back-propagation processing, receiving a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
According to one aspect of the application, some embodiments provide a multi-GPU parallel DNN model training device, the device being arranged in each GPU of the multiple GPUs and comprising: a forward normalization (BN) processing unit, configured to, during forward-propagation processing, receive a forward BN input data subset; determine a global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset; and a backward BN processing unit, configured to, during back-propagation processing, receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
With the method and device provided by the embodiments of the application, when multiple GPUs train a DNN model in parallel, the mean set of the global forward BN input data is introduced into forward BN processing and the mean set of the global backward BN input data is introduced into backward BN processing. This compensates for the fact that no single GPU has the complete data for DNN model training: forward and backward BN processing can be carried out on the basis of the means of the global data, yielding a global gradient similar to the one obtained when a single GPU trains on the global data. Training precision improves, which solves the prior-art problem of low training precision in multi-GPU parallel training.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the embodiments of the invention they serve to explain the invention and do not limit it.
Fig. 1a is a schematic diagram of multi-GPU parallel DNN model training in the prior art;
Fig. 1b is a chart of the training precision and test accuracy of multi-GPU parallel DNN model training in the prior art;
Fig. 2 is a flowchart of the multi-GPU parallel DNN model training method provided by an embodiment of the application;
Fig. 3a is a processing flowchart of one implementation of step 201 in Fig. 2;
Fig. 3b is a processing flowchart of one implementation of step 201 in Fig. 2;
Fig. 3c is a processing flowchart of one implementation of step 201 in Fig. 2;
Fig. 4a is a processing flowchart of one implementation of step 202 in Fig. 2;
Fig. 4b is a processing flowchart of one implementation of step 202 in Fig. 2;
Fig. 4c is a processing flowchart of one implementation of step 202 in Fig. 2;
Fig. 5 is a structural block diagram of the multi-GPU parallel DNN model training device provided by an embodiment of the application;
Fig. 6 is a chart of the model training precision obtained by implementing the method shown in Fig. 2;
Fig. 7 is a chart of the test accuracy obtained by implementing the method shown in Fig. 2.
Detailed description
To help those skilled in the art better understand the technical solution of the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings of those embodiments. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the scope of protection of the invention.
In the prior art, when multiple GPUs train a DNN model in parallel, the data subset assigned to each GPU is only a part of the global data, so the gradients of the model weights computed on different GPUs differ. The gradients computed on the different GPUs are then reduced and merged into a single identical gradient, which is used to update the model weights on each GPU. As a result, a model trained in parallel on multiple GPUs is less precise than one trained on a single GPU, and the loss of precision becomes more pronounced as the number of GPUs grows.
While solving the above technical problem, the inventors found that, in the multi-GPU parallel DNN model training method, the batch normalization (Batch Normalization, BN) layer operation mainly computes the mean and variance over the data subset entering the BN layer on a given GPU (the BN-layer input data subset) and then uses that mean and variance to normalize each data item in the subset. As shown in Fig. 1a, the multiple GPUs comprise GPU 0, GPU 1 and GPU 2, which use their assigned data subsets Sub Batch 0, Sub Batch 1 and Sub Batch 2 respectively to train the DNN model preloaded on each of them. During forward processing, BN processing is applied to the data subset entering the BN layer; after backward processing, the gradients computed on the different GPUs are reduced and merged into a single identical gradient, which is then used to update the model weights on every GPU.
During forward processing, however, the data subset on each GPU is only a part of the global data and the data differ between GPUs, so the mean and variance that each BN layer computes for its data subset also differ. Normalizing each data item with these local means and variances further amplifies the locality of the data on each GPU, so the gradient direction computed by each GPU is not the global descent direction and training precision is low. As shown in Fig. 1b, the training precision of DNN model training on 3 GPUs in parallel (thin solid line in Fig. 1b) is about 7% lower than that of a single GPU training on the global data (thick solid line in Fig. 1b), and the test accuracy of the 3 parallel GPUs (thin dashed line in Fig. 1b) is about 15% lower than that of a single GPU (thick solid line in Fig. 1b). As the number of GPUs increases further, precision drops further.
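The locality effect described above can be illustrated with a small sketch. The numbers are hypothetical and chosen only for illustration; the helper `mean_and_variance` is not part of the patent. The sketch compares the statistics each GPU would compute from its own sub-batch with the statistics of the whole batch.

```python
def mean_and_variance(values):
    """Return the mean and the (biased) variance of a list of numbers."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((x - mean) ** 2 for x in values) / m
    return mean, variance

# Hypothetical BN-layer inputs for three GPUs (made-up numbers).
sub_batches = [
    [1.0, 2.0, 3.0],  # Sub Batch 0 on GPU 0
    [4.0, 5.0, 6.0],  # Sub Batch 1 on GPU 1
    [7.0, 8.0, 9.0],  # Sub Batch 2 on GPU 2
]
global_batch = [x for sub in sub_batches for x in sub]

# Statistics each GPU sees locally versus statistics of the full batch.
local_stats = [mean_and_variance(sub) for sub in sub_batches]
global_stats = mean_and_variance(global_batch)

for i, (mu, var) in enumerate(local_stats):
    print(f"GPU {i}: mean={mu:.3f}, variance={var:.3f}")
print(f"global: mean={global_stats[0]:.3f}, variance={global_stats[1]:.3f}")
```

In this toy case every sub-batch has a different mean and a variance of 2/3, while the full batch has a variance of 20/3, so normalizing with local statistics gives each GPU a noticeably different view of the same data.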
To address this problem, in the method provided by the embodiments of the application, when multiple GPUs train a DNN model in parallel, a global forward BN input data mean set is determined for the global forward BN input data during forward-propagation processing, and forward BN processing is performed on the forward BN input data subset entering the BN layer according to that global mean set. During backward BN processing, the gradient set of the data subset output by forward BN processing is used as the input of backward BN, a global backward BN input data mean set is determined for the global backward BN input data, and backward BN processing is performed on the forward BN input data subset according to the global backward BN input data mean set, to obtain the gradient of the forward BN input data. Because the mean set of the global forward BN input data is introduced into forward BN processing and the mean set of the global backward BN input data is introduced into backward BN processing, the method compensates for the fact that no single GPU has the complete data for DNN model training. Forward and backward BN processing can be carried out on the basis of the means of the global data, yielding a global gradient similar to the one obtained when a single GPU trains on the global data and improving training precision, which solves the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.
The method and device provided by the embodiments of the application are described in detail below.
Embodiment 1
Referring to Fig. 2, an embodiment of the application provides a multi-GPU parallel DNN model training method whose processing flow comprises:
Step 201, when a GPU among the multiple GPUs performs DNN model training on its assigned data subset, during forward-propagation processing it receives a forward normalization (BN) input data subset; determines a global forward BN input data mean set; and performs forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain a forward BN output data subset;
Step 202, during back-propagation processing, the GPU receives a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determines a global backward BN input data mean set; and performs backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set, to obtain the gradient of each data item in the forward BN input data subset.
In the method provided by the application, forward BN processing is performed on the forward BN input data subset during forward-propagation processing according to the determined global forward BN input data mean set, and backward BN processing is performed on the forward BN input data subset during back-propagation processing according to the determined global backward BN input data mean set, the backward BN input data subset and the global forward BN data mean set. This compensates for the fact that no single GPU has the complete data for DNN model training: forward and backward BN processing can be carried out on the basis of the means of the global data, yielding a global gradient similar to the one obtained when a single GPU trains on the global data and improving training precision, which solves the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.
The forward BN processing in forward propagation and the backward BN processing in back-propagation are described in detail below.
In embodiments of the invention, the determination of the global forward BN input data mean set in step 201 can be implemented in, but is not limited to, the following two modes:
Mode 1: among the multiple GPUs, one GPU is chosen as the master GPU and the other GPUs are slave GPUs. The master GPU determines the global forward BN input data mean set and sends it to each slave GPU, so the slave GPUs do not need to compute the global forward BN input data mean set themselves.
Mode 2: the multiple GPUs are not divided into master and slave, and each GPU independently determines the global forward BN input data mean set.
In mode 1, when the GPU is the master GPU, determining the global forward BN input data mean set can be implemented through the following steps A1 to A4:
Step A1, the master GPU determines its forward BN input data subset mean set according to the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
Step A2, receiving the forward BN input data subset mean set from each slave GPU;
Step A3, determining the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the slave GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data;
Step A4, sending the global forward BN input data mean set to each slave GPU.
In mode 1, when the GPU is a slave GPU, determining the global forward BN input data mean set can be implemented through the following steps B1 to B3:
Step B1, the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
Step B2, sending the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;
Step B3, receiving the global forward BN input data mean set from the master GPU, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data.
In mode 2, determining the global forward BN input data mean set can be implemented through the following steps C1 to C4:
Step C1, determining the forward BN input data subset mean set of the GPU according to the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
Step C2, sending the forward BN input data subset mean set of the GPU to the other GPUs;
Step C3, receiving the forward BN input data subset mean sets from each of the other GPUs;
Step C4, determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data.
Steps A1, B1 and C1 are implemented in the same way. In mode 1 and mode 2, the way in which a GPU determines the global forward BN input data mean set from the input data subset mean sets of the multiple GPUs is also the same.
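The two modes can be sketched in a single process, with plain lists standing in for GPUs. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the "send" and "receive" steps are simulated by passing values, and the count-weighted average in `combine` is an assumed way of merging the per-GPU mean sets.

```python
def local_mean_set(data):
    """Steps A1/B1/C1: count, mean and mean of squares of one GPU's subset."""
    m = len(data)
    return m, sum(data) / m, sum(x * x for x in data) / m

def combine(mean_sets):
    """Steps A3/C4: merge per-GPU sets (assumed count-weighted average)."""
    total = sum(m for m, _, _ in mean_sets)
    mu = sum(m * mu_i for m, mu_i, _ in mean_sets) / total
    v = sum(m * v_i for m, _, v_i in mean_sets) / total
    return mu, v

def mode1_master_slave(subsets):
    """Mode 1: only the master combines; every slave receives the result."""
    sets = [local_mean_set(s) for s in subsets]  # steps A1 and B1
    global_set = combine(sets)                   # steps A2-A3 on the master
    return [global_set for _ in subsets]         # steps A4 and B3

def mode2_independent(subsets):
    """Mode 2: every GPU receives all sets and combines them itself."""
    sets = [local_mean_set(s) for s in subsets]  # steps C1-C3
    return [combine(sets) for _ in subsets]      # step C4 on each GPU

subsets = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
print(mode1_master_slave(subsets) == mode2_independent(subsets))
```

Because both modes apply the same combination to the same per-GPU sets, every GPU ends up with the same global mean and mean of squares; the modes differ only in where the combination is computed and how the values travel.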
The specific implementation of step 201 is described in detail below for the master GPU in mode 1, the slave GPU in mode 1 and each GPU in mode 2, with reference to Fig. 3a, Fig. 3b and Fig. 3c respectively.
Fig. 3 a shows the detailed process of step 201 in Fig. 2, including following process flow:
Step 2011, to BN input data subset before receiving, this it is preceding to BN input data subset be in propagated forward processing It is input to BN layers of data subset in the process, is specifically represented by Bi={ xi,j(j=1,2 ... mi), BiIt is i-th for the GPU Forward direction BN input data subset when a GPU, xi,jFor the data in the forward direction BN input data subset, miFor the forward direction The quantity of data in BN input data subset;
Step 2012, in the case where the GPU is the main GPU in multiple GPU, the main GPU is according to the forward direction BN Input data subset determines the equal value set of forward direction BN input data subset of the main GPU, the forward direction BN input data subset Equal value set includes: the mean value and mean value of square of the forward direction BN input data subset;
In some embodiments of the application, the mean of the forward BN input data subset can be determined according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, and the mean of squares of the forward BN input data subset according to the formula $v_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$;
where $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU, and $v_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU;
In other embodiments of the application, the mean and the mean of squares of the forward BN input data subset can also be determined by other methods, which are well known to those of ordinary skill in the art and are not described here;
Step 2013, receiving the forward BN input data subset mean set from each slave GPU;
Step 2014, determining the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the slave GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data; and sending the global forward BN input data mean set to each slave GPU;
In some embodiments of the application, the mean of the global forward BN input data can be determined according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, and the mean of squares of the global forward BN input data according to the formula $v = \frac{\sum_{i=1}^{n} m_i v_i}{\sum_{i=1}^{n} m_i}$;
where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $\mu$ is the mean of the global forward BN input data, $v_i$ is the mean of squares of the forward BN input data subset of the i-th GPU, and $v$ is the mean of squares of the global forward BN input data;
In other embodiments of the application, the mean and the mean of squares of the global forward BN input data can also be determined by other methods; those of ordinary skill in the art can select a specific algorithm according to the application scenario, and the methods are not listed one by one here;
Step 2015, determining the variance of the global forward BN input data according to the mean and the mean of squares of the global forward BN input data;
In some embodiments of the application, the variance of the global forward BN input data can be determined according to the formula $\sigma^2 = v - \mu^2$, where $\sigma^2$ is the variance of the global forward BN input data, and $v$ and $\mu$ are respectively the mean of squares and the mean of the global forward BN input data as described in step 2014;
In other embodiments of the application, the variance of the global forward BN input data can also be determined by other methods; those skilled in the art can select a specific algorithm according to the application scenario, and the methods are not listed one by one here;
Step 2016, performing a forward BN operation on each data item in the forward BN input data subset according to the variance of the global forward BN input data, to obtain a forward post-BN data subset;
In some embodiments of the application, the forward BN operation can be performed on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$ ($j = 1, 2, \ldots, m_i$), where, as described above, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value that prevents division by zero, and $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset;
Step 2017, performing an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data subset.
In some embodiments of the application, the offset operation can be performed on each data item in the forward post-BN data subset according to the formula $y_{i,j} = \gamma \hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the forward post-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
In the forward-propagation processing above, the GPU determines the mean and the mean of squares of the global forward BN input data and performs BN processing on its forward BN input data subset based on them. This compensates for the fact that the GPU does not have the complete data for forward BN processing, and allows forward BN processing to be based on the means of the global data.
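The claim that the per-GPU mean sets are enough to recover the statistics of the whole batch can be checked with a short derivation. It assumes that the global mean set is formed as the count-weighted average of the per-GPU sets, with $M$ (a symbol introduced here only for convenience) denoting the total number of data items across all $n$ GPUs.

```latex
\begin{aligned}
M &= \sum_{i=1}^{n} m_i, \\
\mu &= \frac{1}{M}\sum_{i=1}^{n} m_i \mu_i
     = \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}, \\
v &= \frac{1}{M}\sum_{i=1}^{n} m_i v_i
   = \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}^{2}, \\
\frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i} \left(x_{i,j}-\mu\right)^{2}
  &= v - 2\mu\cdot\mu + \mu^{2} = v - \mu^{2} = \sigma^{2}.
\end{aligned}
```

Under that assumption, $\mu$ and $\sigma^2$ are exactly the mean and (biased) variance a single GPU would compute over the entire batch, so each GPU only needs to exchange three numbers per BN channel ($m_i$, $\mu_i$, $v_i$) rather than its data.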
Treatment process shown in Fig. 3 a describes the forward direction BN processing working principle of the main GPU in multiple GPU, from GPU's The difference of forward direction BN processing working principle and the forward direction BN processing working principle of main GPU is above-mentioned steps 2012-2014, other Treatment process is identical as step 2011 shown in Fig. 3 a and 2015-2017, below with reference to Fig. 3 b to forward direction BN processing place from GPU Reason process is illustrated, and is repeated no more in Fig. 3 b with identical processing step in Fig. 3 a.
Step 2011, receiving a forward BN input data subset;
Step 2012', in the case where the GPU is a slave GPU among the multiple GPUs, the slave GPU determines its forward BN input data subset mean set according to the forward BN input data subset; the method of determining the forward BN input data subset mean set is the same as in step 2012 and is not described again here;
Step 2013', sending the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;
Step 2014', receiving the global forward BN input data mean set from the master GPU, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data;
Step 2015, determining the variance of the global forward BN input data according to the mean and the mean of squares of the global forward BN input data;
Step 2016, performing a forward BN operation on each data item in the forward BN input data subset according to the variance of the global forward BN input data, to obtain a forward post-BN data subset;
Step 2017, performing an offset operation on each data item in the forward post-BN data subset, to obtain the forward BN output data subset.
When the multiple GPUs are divided into master and slave GPUs, the master GPU determines the global forward BN input data mean set and the slave GPUs receive it from the master GPU, which saves processing resources on the slave GPUs.
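The master/slave forward pass of Fig. 3a and Fig. 3b can be sketched end to end as follows. This is a single-process simulation under stated assumptions: lists stand in for GPUs, "sending" is modelled by passing values, the global mean set is assumed to be the count-weighted average of the per-GPU sets, and all function names and the value of `EPS` are hypothetical.

```python
import math

EPS = 1e-5  # small constant guarding against division by zero (assumed value)

def local_mean_set(data):
    """Steps 2012 / 2012': count, mean and mean of squares of one subset."""
    m = len(data)
    return m, sum(data) / m, sum(x * x for x in data) / m

def global_mean_set(mean_sets):
    """Step 2014 on the master: assumed count-weighted combination."""
    total = sum(m for m, _, _ in mean_sets)
    mu = sum(m * mu_i for m, mu_i, _ in mean_sets) / total
    v = sum(m * v_i for m, _, v_i in mean_sets) / total
    return mu, v

def forward_bn(data, mu, v, gamma=1.0, beta=0.0):
    """Steps 2015-2017 on one GPU, using the global mean set."""
    var = v - mu * mu                                         # step 2015
    x_hat = [(x - mu) / math.sqrt(var + EPS) for x in data]   # step 2016
    return [gamma * xh + beta for xh in x_hat]                # step 2017

def master_slave_forward(subsets, gamma=1.0, beta=0.0):
    """Simulate one forward BN pass over all GPUs (index 0 is the master)."""
    sets = [local_mean_set(s) for s in subsets]  # every GPU: local mean set
    mu, v = global_mean_set(sets)                # slaves send, master combines
    # The master "broadcasts" (mu, v); each GPU normalizes its own subset.
    return [forward_bn(s, mu, v, gamma, beta) for s in subsets]

subsets = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
for i, out in enumerate(master_slave_forward(subsets)):
    print(f"GPU {i}:", [round(y, 4) for y in out])
```

Concatenating the per-GPU outputs gives the same values as normalizing the whole batch on one device, which is the property the description relies on. In floating point, `v - mu * mu` can come out slightly negative for nearly constant data, so a practical implementation might clamp it at zero.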
In some other embodiments of the application, the GPUs may not be divided into master and slave, and each GPU independently determines the global forward BN input data mean set. The forward BN processing of each independent GPU differs from that of the master GPU only in steps 2012 to 2014; the remaining processing is the same as steps 2011 and 2015 to 2017 shown in Fig. 3a. The processing of each GPU is described below with reference to Fig. 3c; processing steps in Fig. 3c that are identical to those in Fig. 3a are not described again.
Step 2011, to BN input data subset before receiving;
Step 2012 " determines that the forward direction BN input data subset of the GPU is equal according to the forward direction BN input data subset Value set;Identical as step 2012 to the method for the equal value set of BN input data subset before determining, which is not described herein again;
The equal value set of identified forward direction BN input data subset is sent to other each GPU by step 2013 ";It receives and From the equal value set of forward direction BN input data subset of other each GPU;
Step 2014 ", according to the forward direction BN of forward direction BN input data the subset equal value set and other each GPU of the GPU The equal value set of input data subset determines that the overall situation is preceding to the equal value set of BN input data, described global preceding to BN input number According to the mean value and mean value of square that equal value set includes: before the overall situation to BN input data;It determines global preceding to BN input data mean value The method of set is identical as step 2014, and which is not described herein again;
Step 2015, it according to the global preceding mean value and mean value of square to BN input data, determines global preceding defeated to BN Enter the variance of data;
Step 2016, according to the global preceding variance to BN input data in the forward direction BN input data subset Each data operate before carrying out to BN, to data subset after BN before obtaining;
Step 2017, offset operation is carried out to data each in data subset after the forward direction BN, obtains the forward direction BN Output data subset.
In the case that each GPU in multiple GPU is independent GPU, each GPU is inputted before respectively determining the overall situation respectively to BN Data mean value set, the operation independence between each GPU is high, the processing result independent of other GPU.
BN processing in the back-propagation process is described below.
In the embodiments of the present invention, determining the global backward BN input data mean set in the aforementioned step 202 may be implemented in, but is not limited to, the following two modes:
Mode 1: among the multiple GPUs, one GPU is chosen as the master GPU and the other GPUs are slave GPUs; the master GPU determines the global backward BN input data mean set and sends it to each slave GPU, so the slave GPUs no longer need to compute the global backward BN input data mean set themselves.
Mode 2: the multiple GPUs are not divided into master and slave, and each GPU independently determines the global backward BN input data mean set.
In Mode 1, the master GPU may determine the global backward BN input data mean set through the following steps D1 to D4:
Step D1: the master GPU determines the backward BN input data subset mean set of the master GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set includes the backward BN input data subset mean and the forward BN gradient correction data mean;
Step D2: receiving the backward BN input data subset mean sets from each of the slave GPUs;
Step D3: determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean;
Step D4: sending the global backward BN input data mean set to each of the slave GPUs.
In Mode 1, a slave GPU may obtain the global backward BN input data mean set through the following steps E1 to E3:
Step E1: the slave GPU determines the backward BN input data subset mean set of the slave GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the slave GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;
Step E2: sending the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;
Step E3: receiving the global backward BN input data mean set from the master GPU, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.
In Mode 2, each GPU may determine the global backward BN input data mean set through the following steps F1 to F4:
Step F1: the GPU determines the backward BN input data subset mean set of the GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;
Step F2: sending the backward BN input data subset mean set of the GPU to each of the other GPUs;
Step F3: receiving the backward BN input data subset mean sets from each of the other GPUs;
Step F4: determining the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.
Steps D1, E1 and F1 are implemented in the same way. In Mode 1 and Mode 2, a GPU also determines the global backward BN input data mean set from the backward BN input data subset mean sets of the multiple GPUs in the same way.
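The two modes differ only in who aggregates and who receives, which a short simulation can make concrete. The sketch below is an illustration under assumptions: the per-GPU statistics are made-up numbers, the aggregation is assumed to be a size-weighted average, and `aggregate`, `mode_1` and `mode_2` are hypothetical names.

```python
# Hypothetical per-GPU backward statistics:
# (backward input subset mean, gradient correction mean, subset size m_i).
local_sets = [(0.20, -0.10, 3), (0.50, 0.40, 2), (-0.30, 0.05, 1)]

def aggregate(sets):
    """Assumed size-weighted average of the per-GPU means (steps D3 and F4)."""
    total = sum(m for _, _, m in sets)
    g = sum(g_i * m for g_i, _, m in sets) / total
    phi = sum(phi_i * m for _, phi_i, m in sets) / total
    return g, phi

def mode_1(sets, master=0):
    """The master aggregates (D1-D3) and sends the result to the slaves (D4/E3)."""
    inbox = [sets[master]] + [s for i, s in enumerate(sets) if i != master]
    result = aggregate(inbox)
    return [result for _ in sets]          # every GPU ends up with the same pair

def mode_2(sets):
    """Every GPU receives all other sets (F2-F3) and aggregates itself (F4)."""
    results = []
    for i in range(len(sets)):
        inbox = [sets[i]] + [s for k, s in enumerate(sets) if k != i]
        results.append(aggregate(inbox))
    return results

results_mode_1 = mode_1(local_sets)
results_mode_2 = mode_2(local_sets)
print(results_mode_1[0], results_mode_2[0])
```

Both modes leave every GPU with the same global pair up to floating-point rounding. With naive point-to-point messages, Mode 1 would need about 2(n-1) transfers and Mode 2 about n(n-1), which is one way to read the trade-off between saving slave-side computation and avoiding a central GPU.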
The specific implementation of the aforementioned step 202 is described in detail below for the master GPU in Mode 1, a slave GPU in Mode 1 and each GPU in Mode 2, with reference to Fig. 4a, Fig. 4b and Fig. 4c respectively.
Fig. 4a shows the detailed flow of step 202 in Fig. 2, which includes the following processing:
Step 2021: receiving the backward BN input data subset, where the backward BN input data subset is the gradient set of the forward BN output data subset obtained in step 2017 above and may be expressed as $G_i = \{\partial\ell/\partial y_{i,j}\}\ (j = 1, 2, \ldots, m_i)$, where $G_i$ is the backward BN input data subset when the GPU is the i-th GPU, $\ell$ is a predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, and $\partial\ell/\partial y_{i,j}$ is the gradient of $y_{i,j}$, that is, a data item in the backward BN input data subset;
Step 2022: in the case where the GPU is the master GPU among the multiple GPUs, the master GPU determines the backward BN input data subset mean set of the master GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set includes the backward BN input data subset mean and the forward BN gradient correction data mean;
In some embodiments of the present application, the backward BN input data subset mean may be determined according to the formula $\bar{g}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial\ell}{\partial y_{i,j}}$, where $\bar{g}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;
In some embodiments of the present application, the forward BN gradient correction data mean may be determined according to the formula [formula missing in the source], where $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU;
Step 2023: receiving the backward BN input data subset mean sets from each of the slave GPUs;
Step 2024: determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean; and sending the global backward BN input data mean set to each of the slave GPUs;
In some embodiments of the present application, the global backward BN input data mean may be determined according to the formula $\bar{g} = \frac{\sum_{i=1}^{n} m_i\,\bar{g}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, $\bar{g}_i$ is the mean of the forward BN output data subset gradients of the i-th GPU (that is, its backward BN input data subset mean), and $\bar{g}$ is the global backward BN input data mean;
In some embodiments of the present application, the global forward BN gradient correction data mean may be determined according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i\,\varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient correction data mean of the i-th GPU and $\varphi$ is the global forward BN gradient correction data mean;
Step 2025: determining the gradient of each data item in the post-forward-BN data subset according to the backward BN input data subset;
In some embodiments of the present application, the gradient of each data item in the post-forward-BN data subset may be determined according to the formula $\frac{\partial\ell}{\partial\hat{x}_{i,j}} = \frac{\partial\ell}{\partial y_{i,j}} \cdot \gamma$, where $\ell$ is the predetermined loss function, $\gamma$ is an offset parameter, and $\partial\ell/\partial\hat{x}_{i,j}$ is the gradient of the data item $\hat{x}_{i,j}$ in the post-forward-BN data subset;
Step 2026: determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
In some embodiments of the present application, the gradient of the variance of the global forward BN input data is determined according to the formula [formula missing in the source], where $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, the global forward BN output data gradient mean (that is, the global backward BN input data mean) also appears in the formula, $\gamma$ is an offset parameter, and $\partial\ell/\partial\sigma^2$ is the gradient of the variance of the global forward BN input data;
Step 2027: determining the gradient of the mean of the global forward BN input data according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
In some embodiments of the present application, the gradient of the mean of the global forward BN input data is determined according to the formula [formula missing in the source], where $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\varphi$ is the global forward BN gradient correction data mean, $\gamma$ is an offset parameter, and $\partial\ell/\partial\mu$ is the gradient of the mean of the global forward BN input data;
Step 2028: determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the post-forward-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the mean of the global forward BN input data, the global forward BN input data mean set and the mean of the global forward BN input data;
In some embodiments of the present application, the gradient of each data item in the forward BN input data subset is determined according to the formula [formula missing in the source], where $\partial\ell/\partial\hat{x}_{i,j}$ is the gradient of each data item in the post-forward-BN data subset determined in step 2025 above, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, $\partial\ell/\partial\sigma^2$ is the gradient of the variance of the global forward BN input data determined in step 2026 above, $\partial\ell/\partial\mu$ is the gradient of the mean of the global forward BN input data determined in step 2027 above, and $\partial\ell/\partial x_{i,j}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
In the back-propagation process above, the gradient set of the forward BN output data subset produced by the forward BN processing serves as the input of the backward BN processing. The GPU determines the global backward BN input data mean set and, based on the global forward BN input data mean set and the global backward BN input data mean set, performs backward BN processing on the forward BN input data subset. This compensates for the fact that no single GPU holds the full data during DNN model training: forward and backward BN processing are both based on the means of the global data, so the resulting gradients are similar to those obtained when a single GPU trains on the global data, and the model training precision can approach that of single-GPU training on the global data. The multi-GPU parallel DNN model training method proposed in the present application can therefore solve the prior-art problem of low training precision when multiple GPUs train a DNN model in parallel.
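The claim that globally aggregated means reproduce single-GPU gradients can be checked numerically. The sketch below does not use the patent's own backward formulas, which are not available in this text; it uses the textbook batch-normalization backward pass, with the per-GPU correction mean assumed to be the mean of each output gradient times its normalized input. The toy loss, the constants and all function names are hypothetical.

```python
import math

EPS, GAMMA, BETA = 1e-5, 1.5, 0.3  # hypothetical constants

def wmean(values, counts):
    """Size-weighted mean of per-GPU values."""
    return sum(v * m for v, m in zip(values, counts)) / sum(counts)

def global_stats(subsets):
    """Global mean and variance from per-GPU means and means of squares."""
    counts = [len(s) for s in subsets]
    mu = wmean([sum(s) / len(s) for s in subsets], counts)
    nu = wmean([sum(x * x for x in s) / len(s) for s in subsets], counts)
    return mu, nu - mu * mu

def loss(subsets, weights):
    """Toy loss: weighted sum of BN outputs, so dl/dy equals the weights."""
    mu, var = global_stats(subsets)
    s = math.sqrt(var + EPS)
    return sum(w * (GAMMA * (x - mu) / s + BETA)
               for xs, ws in zip(subsets, weights) for x, w in zip(xs, ws))

def backward(subsets, grads_y):
    """Textbook BN backward using globally aggregated means (assumed form)."""
    mu, var = global_stats(subsets)
    s = math.sqrt(var + EPS)
    counts = [len(xs) for xs in subsets]
    # Per-GPU means first, then global size-weighted means.
    g_bar = wmean([sum(g) / len(g) for g in grads_y], counts)
    phi = wmean([sum(dy * (x - mu) / s for dy, x in zip(g, xs)) / len(xs)
                 for g, xs in zip(grads_y, subsets)], counts)
    return [[GAMMA / s * (dy - g_bar - (x - mu) / s * phi)
             for dy, x in zip(g, xs)] for g, xs in zip(grads_y, subsets)]

subsets = [[1.0, 2.0, 3.0], [4.0, 6.0], [5.0]]
weights = [[0.5, -1.0, 2.0], [0.3, 0.7], [-0.4]]
analytic = backward(subsets, weights)

# Central finite differences on the pooled-batch loss serve as the reference.
h, max_err = 1e-5, 0.0
for i, xs in enumerate(subsets):
    for j in range(len(xs)):
        plus = [list(s) for s in subsets]
        minus = [list(s) for s in subsets]
        plus[i][j] += h
        minus[i][j] -= h
        numeric = (loss(plus, weights) - loss(minus, weights)) / (2 * h)
        max_err = max(max_err, abs(numeric - analytic[i][j]))
print(max_err)
```

Each simulated GPU contributes only two scalars to the backward exchange, yet the per-element gradients agree with the finite-difference gradients of the pooled batch to within numerical error.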
The process shown in Fig. 4a describes the working principle of backward BN processing on the master GPU among the multiple GPUs. Backward BN processing on a slave GPU differs from that on the master GPU only in steps 2022 to 2024 above; the other processing is the same as steps 2021 and 2025 to 2028 shown in Fig. 4a. The backward BN processing of a slave GPU is described below with reference to Fig. 4b; the processing steps in Fig. 4b that are the same as in Fig. 4a are not described again.
Step 2021: receiving the backward BN input data subset;
Step 2022': in the case where the GPU is a slave GPU among the multiple GPUs, the slave GPU determines the backward BN input data subset mean set of the slave GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the slave GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;
Step 2023': sending the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;
Step 2024': receiving the global backward BN input data mean set from the master GPU, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2025: determining the gradient of each data item in the post-forward-BN data subset according to the backward BN input data subset;
Step 2026: determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2027: determining the gradient of the mean of the global forward BN input data according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
Step 2028: determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the post-forward-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the mean of the global forward BN input data, the global forward BN input data mean set and the mean of the global forward BN input data.
When the multiple GPUs are divided into a master GPU and slave GPUs, the master GPU determines the global backward BN input data mean set and each slave GPU receives the set determined by the master GPU, which saves processing resources on the slave GPUs.
In some other embodiments of the present application, the GPUs need not be divided into master and slave, and each GPU independently determines the global backward BN input data mean set. Backward BN processing on each independent GPU differs from that on the master GPU only in steps 2022 to 2024 above; the other processing is the same as steps 2021 and 2025 to 2028 shown in Fig. 4a. The processing of each GPU is described below with reference to Fig. 4c; the processing steps in Fig. 4c that are the same as in Fig. 4a are not described again.
Step 2021: receiving the backward BN input data subset;
Step 2022'': the GPU determines the backward BN input data subset mean set of the GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean;
Step 2023'': sending the backward BN input data subset mean set of the GPU to each of the other GPUs, and receiving the backward BN input data subset mean sets from each of the other GPUs;
Step 2024'': determining the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2025: determining the gradient of each data item in the post-forward-BN data subset according to the backward BN input data subset;
Step 2026: determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient correction data mean;
Step 2027: determining the gradient of the mean of the global forward BN input data according to the global forward BN input data mean set and the global forward BN gradient correction data mean;
Step 2028: determining the gradient of each data item in the forward BN input data subset according to the gradient of each data item in the post-forward-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the mean of the global forward BN input data, the global forward BN input data mean set and the mean of the global forward BN input data.
When each of the multiple GPUs is an independent GPU, each GPU determines the global backward BN input data mean set by itself, so the GPUs operate with a high degree of independence and no GPU depends on a global result computed by another GPU.
On the basis of the processing methods shown in Fig. 2 to Fig. 4c, the multi-GPU parallel DNN model training method provided by the embodiments of the present application further includes the following processing: determining the gradients of the training parameters of the BN layer according to the global forward BN input data mean set and the global backward BN input data mean set, where the training parameters include the offset parameters γ and β described above.
In some embodiments of the present application, the gradient of the offset parameter γ may be determined according to the formula [formula missing in the source], and the gradient of the offset parameter β according to the formula [formula missing in the source], where the two results are, respectively, the gradient of the offset parameter γ and the gradient of the offset parameter β when the GPU is the i-th GPU.
After the gradients of the offset parameters γ and β are determined, the values of γ and β can be updated using the determined gradients and a gradient descent algorithm, so as to optimize the DNN model.
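A single gradient descent update of γ and β might look as follows. The parameter gradients here are the textbook batch-normalization ones, not the patent's formulas (which are not available in this text); the learning rate `lr`, the toy loss and the sample values are hypothetical, and how per-GPU gradients are combined across GPUs is not specified by the source and is left out.

```python
def bn_param_grads(x_hat, grads_y):
    """Textbook gradients: dl/dgamma = sum(dy * x_hat), dl/dbeta = sum(dy)."""
    d_gamma = sum(dy * xh for dy, xh in zip(grads_y, x_hat))
    d_beta = sum(grads_y)
    return d_gamma, d_beta

def sgd_step(param, grad, lr):
    """One plain gradient descent update."""
    return param - lr * grad

x_hat = [-1.2, 0.1, 1.1]     # normalized inputs on one GPU (made-up values)
grads_y = [0.5, -1.0, 2.0]   # dl/dy for the same items (made-up values)

def toy_loss(gamma, beta):
    """Linear toy loss whose gradient with respect to y equals grads_y."""
    return sum(dy * (gamma * xh + beta) for dy, xh in zip(grads_y, x_hat))

gamma, beta, lr = 1.5, 0.3, 0.1
d_gamma, d_beta = bn_param_grads(x_hat, grads_y)
new_gamma = sgd_step(gamma, d_gamma, lr)
new_beta = sgd_step(beta, d_beta, lr)
print(toy_loss(gamma, beta), toy_loss(new_gamma, new_beta))
```

For this toy loss the update lowers the loss value, which is the behaviour the paragraph above relies on when it says the update optimizes the model.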
The multi-GPU parallel DNN model training device provided by the embodiments of the present application is described below. The device is provided in each of the multiple GPUs, and the GPU performs DNN model training on the data subset assigned to it. Fig. 5 shows a structural block diagram of the device, which includes a forward BN processing unit 51 and a backward BN processing unit 52.
The forward BN processing unit 51 is configured to, in the forward-propagation process: receive the forward BN input data subset; determine the global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set, to obtain the forward BN output data subset;
In some embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: in the case where the GPU is the master GPU among the multiple GPUs, the master GPU determines the forward BN input data subset mean set of the GPU according to the forward BN input data subset, where the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset; receives the forward BN input data subset mean sets from each of the slave GPUs; determines the global forward BN input data mean set according to the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the slave GPUs, where the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data; and sends the global forward BN input data mean set to each of the slave GPUs.
In other embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: in the case where the GPU is a slave GPU among the multiple GPUs, the slave GPU determines the forward BN input data subset mean set of the slave GPU according to the forward BN input data subset, where the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset; sends the determined forward BN input data subset mean set to the master GPU among the multiple GPUs; and receives the global forward BN input data mean set from the master GPU, where the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.
In other embodiments of the present application, the forward BN processing unit 51 determining the global forward BN input data mean set includes: determining the forward BN input data subset mean set of the GPU according to the forward BN input data subset, where the forward BN input data subset mean set includes the mean and the mean of squares of the forward BN input data subset; sending the forward BN input data subset mean set of the GPU to each of the other GPUs; receiving the forward BN input data subset mean sets from each of the other GPUs; and determining the global forward BN input data mean set according to the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, where the global forward BN input data mean set includes the mean and the mean of squares of the global forward BN input data.
The forward BN processing unit 51 determining the forward BN input data subset mean set of the GPU includes: determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}\ (j = 1, 2, \ldots, m_i)$, $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU; and determining the mean of squares of the forward BN input data subset according to the formula $\nu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $\nu_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.
The forward BN processing unit 51 determining the global forward BN input data mean set includes: determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i\,\mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the i-th GPU, and $\mu$ is the mean of the global forward BN input data; and determining the mean of squares of the global forward BN input data according to the formula $\nu = \frac{\sum_{i=1}^{n} m_i\,\nu_i}{\sum_{i=1}^{n} m_i}$, where $\nu_i$ is the mean of squares of the forward BN input data subset of the i-th GPU and $\nu$ is the mean of squares of the global forward BN input data.
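A short worked identity may help explain why exchanging only per-GPU means is sufficient. Assuming the global statistics are averages weighted by the subset sizes $m_i$ (as the role of $m_i$ in the definitions suggests), and using $m_i\,\mu_i = \sum_j x_{i,j}$ and $m_i\,\nu_i = \sum_j x_{i,j}^2$, the global values coincide with the statistics of the pooled data:

```latex
\mu \;=\; \frac{\sum_{i=1}^{n} m_i\,\mu_i}{\sum_{i=1}^{n} m_i}
    \;=\; \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}}{\sum_{i=1}^{n} m_i},
\qquad
\nu \;=\; \frac{\sum_{i=1}^{n} m_i\,\nu_i}{\sum_{i=1}^{n} m_i}
    \;=\; \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i} x_{i,j}^{2}}{\sum_{i=1}^{n} m_i}
```

When all subsets have the same size, the weights cancel and the global values reduce to plain averages of the per-GPU values.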
The forward BN processing unit 51 performing forward BN processing on the forward BN input data subset includes: performing a forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data, to obtain the post-forward-BN data subset; and performing an offset operation on each data item in the post-forward-BN data subset, to obtain the forward BN output data subset.
The forward BN processing unit 51 performing the forward BN operation on each data item in the forward BN input data subset includes:
determining the variance of the global forward BN input data according to the formula $\sigma^2 = \nu - \mu^2$, where $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;
performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where $B_i = \{x_{i,j}\}\ (j = 1, 2, \ldots, m_i)$, $B_i$ is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small non-zero value, and $\hat{x}_{i,j}$ is a data item in the post-forward-BN data subset.
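The variance formula follows from expanding the usual definition of the (biased) variance over the pooled data. Writing $M = \sum_i m_i$ for the total number of data items (a symbol introduced here only for this derivation):

```latex
\sigma^{2}
 \;=\; \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(x_{i,j}-\mu\right)^{2}
 \;=\; \underbrace{\frac{1}{M}\sum_{i,j} x_{i,j}^{2}}_{\nu}
       \;-\; 2\mu\,\underbrace{\frac{1}{M}\sum_{i,j} x_{i,j}}_{\mu}
       \;+\; \mu^{2}
 \;=\; \nu-\mu^{2}
```

This is why the mean and the mean of squares are enough to recover the global variance without a second pass over the data. In floating-point arithmetic the subtraction can lose precision when $\nu$ and $\mu^2$ are nearly equal, which is one practical reason for the small constant $\varepsilon$ under the square root.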
The forward BN processing unit 51 performing the offset operation on each data item in the post-forward-BN data subset includes: performing the offset operation on each data item in the post-forward-BN data subset according to the formula $y_{i,j} = \gamma\,\hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the post-forward-BN data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
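The forward BN operation and the offset operation can be combined into one small function per GPU. The following is a sketch under assumptions: `forward_bn` is a hypothetical name, the default `eps` value is chosen only for illustration, and the global mean set is computed directly from the pooled data here rather than exchanged between devices.

```python
import math

def forward_bn(xs, mu, nu, gamma, beta, eps=1e-5):
    """Forward BN on one GPU's subset using the global mean set (mu, nu)."""
    var = nu - mu * mu                           # global variance
    s = math.sqrt(var + eps)
    x_hat = [(x - mu) / s for x in xs]           # forward BN operation
    return [gamma * xh + beta for xh in x_hat]   # offset operation

subsets = [[1.0, 2.0, 3.0], [4.0, 6.0], [5.0]]
pooled = [x for s in subsets for x in s]
mu = sum(pooled) / len(pooled)                   # global mean
nu = sum(x * x for x in pooled) / len(pooled)    # global mean of squares

# Each simulated GPU normalizes only its own subset, with shared statistics.
outputs = [forward_bn(s, mu, nu, gamma=1.0, beta=0.0) for s in subsets]
flat = [y for ys in outputs for y in ys]
out_mean = sum(flat) / len(flat)
out_var = sum((y - out_mean) ** 2 for y in flat) / len(flat)
print(out_mean, out_var)
```

With γ = 1 and β = 0 the pooled outputs have mean close to zero and variance close to one, even though no simulated GPU ever sees another GPU's data items.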
The forward BN processing unit 51 is further configured to: send the global forward BN input data mean set to each of the other GPUs; or send the forward BN input data subset mean set of the GPU to each of the other GPUs.
The backward BN processing unit 52 is configured to, in the back-propagation process: receive the backward BN input data subset, where the backward BN input data subset is the gradient set of the forward BN output data subset obtained after the forward BN processing unit 51 performs forward BN processing; determine the global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN input data mean set, to obtain the gradient of each data item in the forward BN input data subset.
The backward BN processing unit 52 determining the global backward BN input data mean set includes: in the case where the GPU is the master GPU among the multiple GPUs, determining the backward BN input data subset mean set of the master GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set includes the backward BN input data subset mean and the forward BN gradient correction data mean; receiving the backward BN input data subset mean sets from each of the slave GPUs; determining the global backward BN input data mean set according to the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the slave GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean; and sending the global backward BN input data mean set to each of the slave GPUs.
In some embodiments of the present application, the backward BN processing unit 52 determining the global backward BN input data mean set includes: in the case where the GPU is a slave GPU among the multiple GPUs, the slave GPU determines the backward BN input data subset mean set of the slave GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the slave GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean; sends the determined backward BN input data subset mean set to the master GPU among the multiple GPUs; and receives the global backward BN input data mean set from the master GPU, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.
In other embodiments of the present application, the backward BN processing unit 52 determining the global backward BN input data mean set includes: the GPU determines the backward BN input data subset mean set of the GPU according to the backward BN input data subset and the forward BN input data subset, where the backward BN input data subset mean set of the GPU includes the backward BN input data subset mean and the forward BN gradient correction data mean; sends the backward BN input data subset mean set of the GPU to each of the other GPUs; receives the backward BN input data subset mean sets from each of the other GPUs; and determines the global backward BN input data mean set according to the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, where the global backward BN input data mean set includes the global backward BN input data mean and the global forward BN gradient correction data mean.
The backward BN processing unit 52 determines the backward BN input data subset mean set of the GPU as follows. The backward BN input data subset mean is determined according to the formula $\overline{\frac{\partial l}{\partial y_i}} = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial l}{\partial y_{i,j}}$, where $\left\{\frac{\partial l}{\partial y_{i,j}}\right\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the $i$-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, and $\overline{\frac{\partial l}{\partial y_i}}$ is the backward BN input data subset mean when the GPU is the $i$-th GPU. The forward BN gradient calibration data mean is determined according to the formula $\varphi_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}\,\frac{\partial l}{\partial y_{i,j}}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the $i$-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient calibration data mean when the GPU is the $i$-th GPU.
The backward BN processing unit 52 determines the global backward BN input data mean set as follows. The global backward BN input data mean is determined according to the formula $\overline{\frac{\partial l}{\partial y}} = \frac{\sum_{i=1}^{n} m_i\,\overline{\frac{\partial l}{\partial y_i}}}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the $i$-th GPU, $\overline{\frac{\partial l}{\partial y_i}}$ is the forward BN output data subset gradient mean of the $i$-th GPU, and $\overline{\frac{\partial l}{\partial y}}$ is the global backward BN input data mean. The global forward BN gradient calibration data mean is determined according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i\,\varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient calibration data mean of the $i$-th GPU and $\varphi$ is the global forward BN gradient calibration data mean.
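As a hedged illustration of the per-GPU and global backward statistics, the NumPy sketch below simulates three GPUs in one process. It assumes that the gradient calibration mean is the mean of the product of each input and its output gradient, and that subset means are combined by a count-weighted average; the function names are hypothetical.

```python
import numpy as np

def subset_backward_means(x_i, dy_i):
    """Return (m_i, mean of dl/dy, mean of x * dl/dy) for one GPU's subset."""
    m_i = x_i.shape[0]
    return m_i, dy_i.mean(axis=0), (x_i * dy_i).mean(axis=0)

def global_backward_means(per_gpu):
    """Count-weighted combination of the per-GPU means."""
    total = sum(m for m, _, _ in per_gpu)
    g = sum(m * g_i for m, g_i, _ in per_gpu) / total
    phi = sum(m * phi_i for m, _, phi_i in per_gpu) / total
    return g, phi

# Simulated subsets of different sizes on three GPUs, two features each.
rng = np.random.default_rng(0)
sizes = [3, 5, 4]
xs = [rng.normal(size=(m, 2)) for m in sizes]
dys = [rng.normal(size=(m, 2)) for m in sizes]

g, phi = global_backward_means(
    [subset_backward_means(x, dy) for x, dy in zip(xs, dys)]
)
```

Under these assumptions the combined values equal the statistics a single GPU would compute over the concatenated batch, which is why only a few scalars per feature need to be exchanged.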
The backward BN processing unit 52 performs backward BN processing on the forward BN input data subset as follows: determining, from the backward BN input data subset, the gradient of each data item in the forward BN normalized data subset; determining the gradient of the variance of the global forward BN input data from the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient calibration data mean; determining the gradient of the global forward BN input data mean from the global forward BN input data mean set and the global forward BN gradient calibration data mean; and determining the gradient of each data item in the forward BN input data subset from the gradient of each data item in the forward BN normalized data subset, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global forward BN input data.
The backward BN processing unit 52 determines the gradient of each data item in the forward BN normalized data subset according to the formula $\frac{\partial l}{\partial \hat{x}_{i,j}} = \gamma\,\frac{\partial l}{\partial y_{i,j}}$, where $\left\{\frac{\partial l}{\partial y_{i,j}}\right\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the $i$-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of the data item $\hat{x}_{i,j}$ in the forward BN normalized data subset.
The backward BN processing unit 52 determines the gradient of the variance of the global forward BN input data according to a formula [formula not preserved in this text], in which $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $\varphi$ is the global forward BN gradient calibration data mean, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data.
The backward BN processing unit 52 determines the gradient of the global forward BN input data mean according to a formula [formula not preserved in this text], in which $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $\varphi$ is the global forward BN gradient calibration data mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean.
The backward BN processing unit 52 determines the gradient of each data item in the forward BN input data subset according to a formula [formula not preserved in this text], in which $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of each data item in the forward BN normalized data subset, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data, $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean, and $\frac{\partial l}{\partial x_{i,j}}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
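To make the role of the global statistics in the backward pass concrete, the sketch below applies the standard batch-normalization chain rule using only global quantities. It is not a reproduction of the formulas of the present application, whose exact form and scaling are not available here; the function name, the argument `total` (total sample count over all GPUs) and the definition of `phi` as the global mean of the product of input and output gradient are assumptions.

```python
import numpy as np

def bn_input_grad(x_i, dy_i, mu, nu, g, phi, gamma, total, eps=1e-5):
    """Gradient of the loss w.r.t. one GPU's inputs, from global statistics only.

    mu, nu : global mean and global mean of squares of the inputs
    g, phi : global means of dl/dy and of x * dl/dy (assumed definitions)
    total  : total number of samples across all GPUs
    """
    var = nu - mu ** 2                         # sigma^2 = nu - mu^2
    inv_std = 1.0 / np.sqrt(var + eps)
    dxhat = gamma * dy_i                       # gradient of normalized data
    # Chain rule through the variance: sum over all samples of dxhat * (x - mu).
    dvar = -0.5 * gamma * total * (phi - mu * g) * inv_std ** 3
    # Chain rule through the mean: sum over all samples of dxhat.
    dmu = -gamma * total * g * inv_std
    return dxhat * inv_std + dvar * 2.0 * (x_i - mu) / total + dmu / total
```

Each GPU can evaluate this on its own subset once it holds the global statistics; concatenating the per-GPU results then gives the gradient that a single GPU would compute on the whole batch.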
The backward BN processing unit 52 is further configured to determine the gradients of the BN layer training parameters from the global forward BN input data mean set and the global backward BN input data mean set, the BN layer training parameters comprising the offset parameters $\gamma$ and $\beta$.
The backward BN processing unit 52 may determine the gradient of the offset parameter $\gamma$ according to a formula [formula not preserved in this text], in which $\varphi$ is the global forward BN gradient calibration data mean, $\mu$ is the mean of the global forward BN input data, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the $i$-th GPU, and $\frac{\partial l}{\partial \gamma_i}$ is the gradient of the offset parameter $\gamma$ when the GPU is the $i$-th GPU.
The backward BN processing unit 52 may determine the gradient of the offset parameter $\beta$ according to a formula [formula not preserved in this text], in which $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the $i$-th GPU, and $\frac{\partial l}{\partial \beta_i}$ is the gradient of the offset parameter $\beta$ when the GPU is the $i$-th GPU.
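The parameter gradients can likewise be expressed through the global statistics. The sketch below uses the standard batch-normalization identities and returns totals over all GPUs; the text above refers to per-GPU quantities involving $m_i$, whose exact scaling is not reproduced here, so the use of the total count `total`, the function name and the definition of `phi` are assumptions.

```python
import numpy as np

def bn_param_grads(mu, nu, g, phi, total, eps=1e-5):
    """Gradients of the loss w.r.t. gamma and beta from global statistics.

    mu, nu : global mean and global mean of squares of the inputs
    g, phi : global means of dl/dy and of x * dl/dy (assumed definitions)
    total  : total number of samples across all GPUs
    """
    inv_std = 1.0 / np.sqrt(nu - mu ** 2 + eps)
    dgamma = total * (phi - mu * g) * inv_std   # equals sum of dl/dy * x_hat
    dbeta = total * g                           # equals sum of dl/dy
    return dgamma, dbeta
```

Because both results depend only on the global statistics, every GPU that holds them obtains identical values for the two parameter gradients without any further exchange.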
With the device shown in Fig. 5, during back-propagation processing the backward BN processing is performed on the forward BN input data subset according to the determined global backward BN input data mean set, the backward BN input data subset and the global forward BN input data mean set. This further compensates for the widening data discrepancy caused by each GPU seeing only part of the data, so that the gradients obtained are similar to those of single-GPU training on the global data and are consistent across the multiple GPUs, which improves training accuracy. It thereby solves the prior-art problem that, when multiple GPUs train a DNN model in parallel, training accuracy is low because the data gradients are inconsistent between the GPUs.
The following describes the multi-GPU parallel DNN model training method provided by the embodiments of the present application in a practical application.
In a concrete application, the processes shown in Fig. 2, Fig. 3a and Fig. 4a can be integrated into the deep learning training framework MXNet to realize a fully executable technical solution. The system design of MXNet can be divided into a C++ layer and a Python layer. The C++ layer is mainly responsible for system-level functions such as task scheduling, memory optimization and computation graph optimization; the main function of the Python layer is to encapsulate the complete training process and to provide the interface through which users interact. In MXNet, the traditional Python-layer training process is as follows:
In an actual implementation, both the C++ layer and the Python layer are modified, and the Python interface can still be called normally after the modification. With the multi-GPU parallel DNN model training method provided by the embodiments of the present application, the Python-layer training process is as follows:
After the above processing is implemented, both training accuracy and test accuracy improve significantly. Fig. 6 compares the training accuracy of 3-GPU parallel training with that of single-GPU training. The training accuracy of the multi-GPU parallel training method provided by the present application (the 3-GPU parallel global-data training accuracy, solid line in Fig. 6) is clearly close to the training accuracy of single-GPU training on the global data (the single-GPU training accuracy, thick dashed line in Fig. 6), whereas the training accuracy of prior-art multi-GPU parallel training (the 3-GPU parallel local-data training accuracy, thin dashed line in Fig. 6) is clearly lower than that of single-GPU global-data training. Fig. 7 compares the test accuracy of 3-GPU parallel training with that of single-GPU training. The test accuracy of the multi-GPU parallel training method provided by the present application (the 3-GPU parallel global-data test accuracy, dashed line in Fig. 7) is close to the test accuracy of single-GPU training on the global data (the single-GPU test accuracy, thick solid line in Fig. 7), whereas the test accuracy of prior-art multi-GPU parallel training (the 3-GPU parallel local-data test accuracy, thin solid line in Fig. 7) is clearly lower than that of single-GPU global-data training. As the figures show, a model trained with the method provided by the embodiments of the present application reaches an accuracy similar to that of single-GPU training, about 15% higher than that of a model trained with prior-art multi-GPU parallel training.
The above is the core idea of the present invention. To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments more apparent, the technical solutions in the embodiments are described in further detail with reference to the accompanying drawings.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (44)

1. A deep neural network (DNN) model training method with multiple graphics processing units (GPUs) in parallel, characterized by comprising:
when one GPU among the multiple GPUs performs DNN model training on the data subset assigned to it, during forward-propagation processing: receiving a forward batch normalization (BN) input data subset; determining a global forward BN input data mean set; and performing forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset;
during back-propagation processing: receiving a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determining a global backward BN input data mean set; and performing backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN input data mean set, to obtain the gradient of each data item in the forward BN input data subset.
2. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:
where the GPU is the master GPU among the multiple GPUs, the master GPU determining the forward BN input data subset mean set of the GPU from the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
receiving the forward BN input data subset mean sets from each of the other slave GPUs;
determining the global forward BN input data mean set from the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the other slave GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data;
sending the global forward BN input data mean set to each of the other slave GPUs.
3. The method according to claim 2, characterized in that determining the forward BN input data subset mean set comprises:
determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$), $B_i$ is the forward BN input data subset when the GPU is the $i$-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the $i$-th GPU;
determining the mean of squares of the forward BN input data subset according to the formula $\nu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $\nu_i$ is the mean of squares of the forward BN input data subset when the GPU is the $i$-th GPU.
4. The method according to claim 2, characterized in that determining the global forward BN input data mean set comprises:
determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i\,\mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $\mu_i$ is the mean of the forward BN input data subset of the $i$-th GPU, $m_i$ is the number of data items in the forward BN input data subset of the $i$-th GPU, and $\mu$ is the mean of the global forward BN input data;
determining the mean of squares of the global forward BN input data according to the formula $\nu = \frac{\sum_{i=1}^{n} m_i\,\nu_i}{\sum_{i=1}^{n} m_i}$, where $\nu_i$ is the mean of squares of the forward BN input data subset of the $i$-th GPU and $\nu$ is the mean of squares of the global forward BN input data.
5. The method according to claim 2, characterized in that performing forward BN processing on the forward BN input data subset comprises:
performing a forward BN operation on each data item in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data, to obtain a forward BN normalized data subset;
performing an offset operation on each data item in the forward BN normalized data subset to obtain the forward BN output data subset.
6. The method according to claim 5, characterized in that performing the forward BN operation on each data item in the forward BN input data subset comprises:
determining the variance of the global forward BN input data according to the formula $\sigma^2 = \nu - \mu^2$, where $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;
performing the forward BN operation on each data item in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$), $B_i$ is the forward BN input data subset when the GPU is the $i$-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, and $\hat{x}_{i,j}$ is a data item in the forward BN normalized data subset.
7. The method according to claim 5, characterized in that performing the offset operation on each data item in the forward BN normalized data subset comprises:
performing the offset operation on each data item in the forward BN normalized data subset according to the formula $y_{i,j} = \gamma\,\hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a data item in the forward BN normalized data subset, and $y_{i,j}$ is a data item in the forward BN output data subset.
8. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:
where the GPU is a slave GPU among the multiple GPUs, the slave GPU determining the forward BN input data subset mean set of the slave GPU from the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
sending the determined forward BN input data subset mean set to the master GPU among the multiple GPUs;
receiving the global forward BN input data mean set from the master GPU, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data.
9. The method according to claim 1, characterized in that determining the global forward BN input data mean set comprises:
the GPU determining its forward BN input data subset mean set from the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
sending the forward BN input data subset mean set of the GPU to the other GPUs;
receiving the forward BN input data subset mean sets from each of the other GPUs;
determining the global forward BN input data mean set from the forward BN input data subset mean set of the GPU and the forward BN input data subset mean sets of the other GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data.
10. The method according to claim 5, characterized in that determining the global backward BN input data mean set comprises:
where the GPU is the master GPU among the multiple GPUs, the master GPU determining the backward BN input data subset mean set of the master GPU from the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set comprising a backward BN input data subset mean and a forward BN gradient calibration data mean;
receiving the backward BN input data subset mean sets from each of the other slave GPUs;
determining the global backward BN input data mean set from the backward BN input data subset mean set of the master GPU and the backward BN input data subset mean sets of the other slave GPUs, the global backward BN input data mean set comprising a global backward BN input data mean and a global forward BN gradient calibration data mean;
sending the global backward BN input data mean set to each of the other slave GPUs.
11. The method according to claim 10, characterized in that determining the backward BN input data subset mean set of the GPU comprises:
determining the backward BN input data subset mean according to the formula $\overline{\frac{\partial l}{\partial y_i}} = \frac{1}{m_i}\sum_{j=1}^{m_i}\frac{\partial l}{\partial y_{i,j}}$, where $\left\{\frac{\partial l}{\partial y_{i,j}}\right\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the $i$-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, and $\overline{\frac{\partial l}{\partial y_i}}$ is the backward BN input data subset mean when the GPU is the $i$-th GPU;
determining the forward BN gradient calibration data mean according to the formula $\varphi_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}\,\frac{\partial l}{\partial y_{i,j}}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the $i$-th GPU, $x_{i,j}$ is a data item in the forward BN input data subset, $m_i$ is the number of data items in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient calibration data mean when the GPU is the $i$-th GPU.
12. The method according to claim 10, characterized in that determining the global backward BN input data mean set comprises:
determining the global backward BN input data mean according to the formula $\overline{\frac{\partial l}{\partial y}} = \frac{\sum_{i=1}^{n} m_i\,\overline{\frac{\partial l}{\partial y_i}}}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data items in the forward BN input data subset of the $i$-th GPU, $\overline{\frac{\partial l}{\partial y_i}}$ is the forward BN output data subset gradient mean of the $i$-th GPU, and $\overline{\frac{\partial l}{\partial y}}$ is the global backward BN input data mean;
determining the global forward BN gradient calibration data mean according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i\,\varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient calibration data mean of the $i$-th GPU and $\varphi$ is the global forward BN gradient calibration data mean.
13. The method according to claim 10, characterized in that performing backward BN processing on the forward BN input data subset comprises:
determining, from the backward BN input data subset, the gradient of each data item in the forward BN normalized data subset;
determining the gradient of the variance of the global forward BN input data from the global forward BN input data mean set, the global backward BN input data mean and the global forward BN gradient calibration data mean;
determining the gradient of the global forward BN input data mean from the global forward BN input data mean set and the global forward BN gradient calibration data mean;
determining the gradient of each data item in the forward BN input data subset from the gradient of each data item in the forward BN normalized data subset, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean set and the mean of the global forward BN input data.
14. The method according to claim 13, characterized in that determining the gradient of each data item in the forward BN normalized data subset comprises:
determining the gradient of each data item in the forward BN normalized data subset according to the formula $\frac{\partial l}{\partial \hat{x}_{i,j}} = \gamma\,\frac{\partial l}{\partial y_{i,j}}$, where $\left\{\frac{\partial l}{\partial y_{i,j}}\right\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the $i$-th GPU, $l$ is the predetermined loss function, $y_{i,j}$ is a data item in the forward BN output data subset, $\frac{\partial l}{\partial y_{i,j}}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of the data item $\hat{x}_{i,j}$ in the forward BN normalized data subset.
15. The method according to claim 13, characterized in that determining the gradient of the variance of the global forward BN input data comprises:
determining the gradient of the variance of the global forward BN input data according to a formula [formula not preserved in this text], in which $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $\varphi$ is the global forward BN gradient calibration data mean, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data.
16. The method according to claim 13, characterized in that determining the gradient of the global forward BN input data mean comprises:
determining the gradient of the global forward BN input data mean according to a formula [formula not preserved in this text], in which $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $\varphi$ is the global forward BN gradient calibration data mean, $\gamma$ is an offset parameter, and $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean.
17. The method according to claim 13, characterized in that determining the gradient of each data item in the forward BN input data subset comprises:
determining the gradient of each data item in the forward BN input data subset according to a formula [formula not preserved in this text], in which $\frac{\partial l}{\partial \hat{x}_{i,j}}$ is the gradient of each data item in the forward BN normalized data subset, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $\frac{\partial l}{\partial \sigma^2}$ is the gradient of the variance of the global forward BN input data, $\frac{\partial l}{\partial \mu}$ is the gradient of the global forward BN input data mean, and $\frac{\partial l}{\partial x_{i,j}}$ is the gradient of the data item $x_{i,j}$ in the forward BN input data subset.
18. The method according to claim 10, characterized in that the method further comprises:
determining the gradients of the BN layer training parameters from the global forward BN input data mean set and the global backward BN input data mean set, the BN layer training parameters comprising the offset parameters $\gamma$ and $\beta$.
19. The method according to claim 18, characterized in that the gradient of the offset parameter $\gamma$ is determined according to a formula [formula not preserved in this text], in which $\varphi$ is the global forward BN gradient calibration data mean, $\mu$ is the mean of the global forward BN input data, $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = \nu - \mu^2$, $\nu$ is the mean of squares of the global forward BN input data, $\varepsilon$ is a fixed small nonzero value, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the $i$-th GPU, and $\frac{\partial l}{\partial \gamma_i}$ is the gradient of the offset parameter $\gamma$ when the GPU is the $i$-th GPU.
20. The method according to claim 18, characterized in that the gradient of the offset parameter $\beta$ is determined according to a formula [formula not preserved in this text], in which $\overline{\frac{\partial l}{\partial y}}$ is the global forward BN output data gradient mean, $m_i$ is the number of data items in the forward BN input data subset when the GPU is the $i$-th GPU, and $\frac{\partial l}{\partial \beta_i}$ is the gradient of the offset parameter $\beta$ when the GPU is the $i$-th GPU.
21. The method according to claim 5, characterized in that determining the global backward BN input data mean set comprises:
where the GPU is a slave GPU among the multiple GPUs, the slave GPU determining the backward BN input data subset mean set of the slave GPU from the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the slave GPU comprising a backward BN input data subset mean and a forward BN gradient calibration data mean;
sending the determined backward BN input data subset mean set to the master GPU among the multiple GPUs;
receiving the global backward BN input data mean set from the master GPU, the global backward BN input data mean set comprising a global backward BN input data mean and a global forward BN gradient calibration data mean.
22. The method according to claim 5, characterized in that determining the global backward BN input data mean set comprises:
the GPU determining its backward BN input data subset mean set from the backward BN input data subset and the forward BN input data subset, the backward BN input data subset mean set of the GPU comprising a backward BN input data subset mean and a forward BN gradient calibration data mean;
sending the backward BN input data subset mean set of the GPU to each of the other GPUs;
receiving the backward BN input data subset mean sets from each of the other GPUs;
determining the global backward BN input data mean set from the backward BN input data subset mean set of the GPU and the backward BN input data subset mean sets of the other GPUs, the global backward BN input data mean set comprising a global backward BN input data mean and a global forward BN gradient calibration data mean.
23. A deep neural network model training device with multiple graphics processing units in parallel, characterized in that the device is arranged in each GPU of the multiple GPUs and comprises:
a forward batch normalization (BN) processing unit, configured to, during forward-propagation processing: receive a forward BN input data subset; determine a global forward BN input data mean set; and perform forward BN processing on the forward BN input data subset according to the global forward BN input data mean set to obtain a forward BN output data subset;
a backward BN processing unit, configured to, during back-propagation processing: receive a backward BN input data subset, the backward BN input data subset being the gradient set of the forward BN output data subset; determine a global backward BN input data mean set; and perform backward BN processing on the forward BN input data subset according to the global backward BN input data mean set, the backward BN input data subset and the global forward BN input data mean set, to obtain the gradient of each data item in the forward BN input data subset.
24. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean set comprises:
where the GPU is the master GPU among the multiple GPUs, the master GPU determining the forward BN input data subset mean set of the GPU from the forward BN input data subset, the forward BN input data subset mean set comprising the mean and the mean of squares of the forward BN input data subset;
receiving the forward BN input data subset mean sets from each of the other slave GPUs;
determining the global forward BN input data mean set from the forward BN input data subset mean set of the master GPU and the forward BN input data subset mean sets of the other slave GPUs, the global forward BN input data mean set comprising the mean and the mean of squares of the global forward BN input data;
sending the global forward BN input data mean set to each of the other slave GPUs.
25. The device according to claim 24, characterized in that the forward BN processing unit determining the forward BN input data subset mean value set of the GPU comprises:
Determining the mean of the forward BN input data subset according to the formula $\mu_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a datum in the forward BN input data subset, $m_i$ is the number of data in the forward BN input data subset, and $\mu_i$ is the mean of the forward BN input data subset when the GPU is the i-th GPU;
Determining the mean of squares of the forward BN input data subset according to the formula $v_i = \frac{1}{m_i}\sum_{j=1}^{m_i} x_{i,j}^2$, where $v_i$ is the mean of squares of the forward BN input data subset when the GPU is the i-th GPU.
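As a minimal sketch of the per-GPU statistics in claim 25, the following Python computes the mean and the mean of squares of one GPU's forward BN input data subset. The function name `local_forward_stats` and the plain-list representation of the subset are illustrative assumptions and are not part of the claims.

```python
def local_forward_stats(xs):
    """Return (mu_i, v_i, m_i) for one GPU's forward BN input data subset.

    Hypothetical helper: xs is a plain list of floats standing in for B_i.
    """
    m_i = len(xs)
    if m_i == 0:
        raise ValueError("the subset must contain at least one datum")
    mu_i = sum(xs) / m_i                # mean of the subset
    v_i = sum(x * x for x in xs) / m_i  # mean of squares of the subset
    return mu_i, v_i, m_i


# Example: one simulated GPU holding three data.
mu_i, v_i, m_i = local_forward_stats([1.0, 2.0, 3.0])
```

Only the pair ($\mu_i$, $v_i$), together with the count $m_i$ needed for a count-weighted combination, would have to leave the GPU, however large the subset is.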
26. The device according to claim 24, characterized in that the forward BN processing unit determining the global forward BN input data mean value set comprises:
Determining the mean of the global forward BN input data according to the formula $\mu = \frac{\sum_{i=1}^{n} m_i \mu_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $\mu_i$ is the mean of the forward BN input data subset of the i-th GPU, $m_i$ is the number of data in the forward BN input data subset of the i-th GPU, and $\mu$ is the mean of the global forward BN input data;
Determining the mean of squares of the global forward BN input data according to the formula $v = \frac{\sum_{i=1}^{n} m_i v_i}{\sum_{i=1}^{n} m_i}$, where $v_i$ is the mean of squares of the forward BN input data subset of the i-th GPU, and $v$ is the mean of squares of the global forward BN input data.
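The count-weighted combination of claim 26 can be sketched as follows. The function name `combine_forward_stats` and the `(mu_i, v_i, m_i)` tuple layout are assumptions made for illustration; the example values correspond to two simulated GPUs holding [1, 2, 3] and [4, 5].

```python
def combine_forward_stats(per_gpu):
    """Combine per-GPU (mu_i, v_i, m_i) triples into the global (mu, v).

    Hypothetical helper: each triple is the mean, the mean of squares and
    the data count reported by one GPU.
    """
    total = sum(m_i for _, _, m_i in per_gpu)
    if total == 0:
        raise ValueError("at least one datum is required")
    mu = sum(m_i * mu_i for mu_i, _, m_i in per_gpu) / total
    v = sum(m_i * v_i for _, v_i, m_i in per_gpu) / total
    return mu, v


per_gpu = [(2.0, 14.0 / 3.0, 3), (4.5, 20.5, 2)]
mu, v = combine_forward_stats(per_gpu)
variance = v - mu * mu  # variance of all five data, as in claim 28
```

Weighting by $m_i$ makes the result equal to the statistics of the pooled data even when the GPUs hold subsets of different sizes.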
27. The device according to claim 24, characterized in that the forward BN processing unit performing forward BN processing on the forward BN input data subset comprises:
Performing a forward BN operation on each datum in the forward BN input data subset according to the mean and the mean of squares of the global forward BN input data, to obtain a forward post-BN data subset;
Performing an offset operation on each datum in the forward post-BN data subset, to obtain the forward BN output data subset.
28. The device according to claim 27, characterized in that the forward BN processing unit performing the forward BN operation on each datum in the forward BN input data subset comprises:
Determining the variance of the global forward BN input data according to the formula $\sigma^2 = v - \mu^2$, where $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, and $\sigma^2$ is the variance of the global forward BN input data;
Performing the forward BN operation on each datum in the forward BN input data subset according to the formula $\hat{x}_{i,j} = \frac{x_{i,j} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a datum in the forward BN input data subset, $m_i$ is the number of data in the forward BN input data subset, $\mu$ is the mean of the global forward BN input data, $\sigma^2$ is the variance of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, and $\hat{x}_{i,j}$ is a datum in the forward post-BN data subset.
29. The device according to claim 27, characterized in that the forward BN processing unit performing the offset operation on each datum in the forward post-BN data subset comprises:
Performing the offset operation on each datum in the forward post-BN data subset according to the formula $y_{i,j} = \gamma \hat{x}_{i,j} + \beta$, where $\gamma$ and $\beta$ are offset parameters, $\hat{x}_{i,j}$ is a datum in the forward post-BN data subset, and $y_{i,j}$ is a datum in the forward BN output data subset.
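Claims 27 to 29 together describe normalizing each datum with the global statistics and then applying the offset parameters. A minimal sketch, assuming plain Python lists and a hypothetical function name `forward_bn`, might look like this. The clamp of the variance at zero is an added safeguard against rounding in $v - \mu^2$ and is not taken from the claims.

```python
import math


def forward_bn(xs, mu, v, gamma, beta, eps=1e-5):
    """Normalize one GPU's subset with global statistics, then scale and shift.

    Hypothetical helper. mu and v are the global mean and mean of squares;
    eps plays the role of the small nonzero constant in claim 28.
    """
    # ASSUMPTION: clamp at zero because v - mu^2 can round slightly negative.
    var = max(v - mu * mu, 0.0)
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(x - mu) * inv_std for x in xs]  # forward post-BN data subset
    ys = [gamma * xh + beta for xh in x_hat]  # forward BN output data subset
    return x_hat, ys


# One simulated GPU holding [1, 2, 3]; global statistics of [1, 2, 3, 4, 5].
x_hat, ys = forward_bn([1.0, 2.0, 3.0], mu=3.0, v=11.0, gamma=2.0, beta=0.5)
```

Because every GPU normalizes with the same $\mu$ and $\sigma^2$, the outputs match what a single device would produce on the pooled batch, up to floating-point rounding.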
30. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean value set comprises:
In the case where the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU, the forward BN input data subset mean value set of the slave GPU according to the forward BN input data subset, wherein the forward BN input data subset mean value set comprises: the mean and the mean of squares of the forward BN input data subset;
Sending the determined forward BN input data subset mean value set to the main GPU among the multiple GPUs;
Receiving the global forward BN input data mean value set from the main GPU, wherein the global forward BN input data mean value set comprises: the mean and the mean of squares of the global forward BN input data.
31. The device according to claim 23, characterized in that the forward BN processing unit determining the global forward BN input data mean value set comprises:
Determining the forward BN input data subset mean value set of the GPU according to the forward BN input data subset, wherein the forward BN input data subset mean value set comprises: the mean and the mean of squares of the forward BN input data subset;
Sending the forward BN input data subset mean value set of the GPU to each of the other GPUs;
Receiving the forward BN input data subset mean value sets from each of the other GPUs;
Determining the global forward BN input data mean value set according to the forward BN input data subset mean value set of the GPU and the forward BN input data subset mean value sets of the other GPUs, wherein the global forward BN input data mean value set comprises: the mean and the mean of squares of the global forward BN input data.
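The two exchange patterns in claims 24, 30 and 31 can be compared with a small single-process simulation. This is only a sketch: the function names are hypothetical, the "GPUs" are list entries, and no real communication library is involved.

```python
def combine(stats):
    """Count-weighted combination of (mu_i, v_i, m_i) triples."""
    total = sum(m for _, _, m in stats)
    mu = sum(m * mu_i for mu_i, _, m in stats) / total
    v = sum(m * v_i for _, v_i, m in stats) / total
    return mu, v


def main_slave_exchange(stats, main=0):
    """Claims 24/30 pattern: slaves send to the main GPU, which combines
    the sets and sends the global result back to every slave."""
    gathered = [stats[main]] + [s for k, s in enumerate(stats) if k != main]
    global_stats = combine(gathered)
    return [global_stats for _ in stats]


def all_to_all_exchange(stats):
    """Claim 31 pattern: every GPU receives every set and combines locally.
    Combining in GPU-index order keeps the results bit-identical."""
    return [combine(list(stats)) for _ in stats]


stats = [(2.0, 14.0 / 3.0, 3), (4.5, 20.5, 2)]
via_main = main_slave_exchange(stats)
via_all = all_to_all_exchange(stats)
```

With n GPUs, the main/slave pattern moves 2(n - 1) mean value sets in two rounds, while the all-to-all pattern moves n(n - 1) sets in one round. Which is preferable depends on the interconnect, and the claims cover both.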
32. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean value set comprises:
In the case where the GPU is the main GPU among the multiple GPUs, determining the backward BN input data subset mean value set of the main GPU according to the backward BN input data subset and the forward BN input data subset, wherein the backward BN input data subset mean value set comprises: a backward BN input data subset mean and a forward BN gradient correction data mean;
Receiving the backward BN input data subset mean value sets from each of the slave GPUs;
Determining the global backward BN input data mean value set according to the backward BN input data subset mean value set of the main GPU and the backward BN input data subset mean value sets of the slave GPUs, wherein the global backward BN input data mean value set comprises: a global backward BN input data mean and a global forward BN gradient correction data mean;
Sending the global backward BN input data mean value set to each of the slave GPUs.
33. The device according to claim 32, characterized in that the backward BN processing unit determining the backward BN input data subset mean value set of the GPU comprises:
Determining the backward BN input data subset mean according to the formula $\overline{\partial l/\partial y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \frac{\partial l}{\partial y_{i,j}}$, where $\{\partial l/\partial y_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the i-th GPU, $l$ is a predetermined loss function, $y_{i,j}$ is a datum in the forward BN output data subset, $\partial l/\partial y_{i,j}$ is the gradient of $y_{i,j}$, and $\overline{\partial l/\partial y}_i$ is the backward BN input data subset mean when the GPU is the i-th GPU;
Determining the forward BN gradient correction data mean according to the formula [formula not reproduced], where $B_i = \{x_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the forward BN input data subset when the GPU is the i-th GPU, $x_{i,j}$ is a datum in the forward BN input data subset, $m_i$ is the number of data in the forward BN input data subset, and $\varphi_i$ is the forward BN gradient correction data mean when the GPU is the i-th GPU.
34. The device according to claim 32, characterized in that the backward BN processing unit determining the global backward BN input data mean value set comprises:
Determining the global backward BN input data mean according to the formula $\overline{\partial l/\partial y} = \frac{\sum_{i=1}^{n} m_i\,\overline{\partial l/\partial y}_i}{\sum_{i=1}^{n} m_i}$, where $n$ is the number of the multiple GPUs, $m_i$ is the number of data in the forward BN input data subset of the i-th GPU, $\overline{\partial l/\partial y}_i$ is the gradient mean of the forward BN output data subset of the i-th GPU, and $\overline{\partial l/\partial y}$ is the global backward BN input data mean;
Determining the global forward BN gradient correction data mean according to the formula $\varphi = \frac{\sum_{i=1}^{n} m_i \varphi_i}{\sum_{i=1}^{n} m_i}$, where $\varphi_i$ is the forward BN gradient correction data mean of the i-th GPU, and $\varphi$ is the global forward BN gradient correction data mean.
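The backward statistics of claims 33 and 34 can be sketched in the same way as the forward ones. The formula for the forward BN gradient correction data mean is not legible in this text, so the code below assumes it is the mean of the products $x_{i,j} \cdot \partial l/\partial y_{i,j}$. That choice is an assumption, as are the function names.

```python
def local_backward_stats(xs, grads_y):
    """Return (g_mean_i, phi_i, m_i) for one GPU.

    Hypothetical helper: xs is the forward BN input data subset and
    grads_y holds dl/dy for the matching forward BN outputs.
    """
    m_i = len(xs)
    if m_i == 0 or m_i != len(grads_y):
        raise ValueError("xs and grads_y must be non-empty and equally long")
    g_mean_i = sum(grads_y) / m_i
    # ASSUMPTION: phi_i is the mean of x * dl/dy (formula missing in source).
    phi_i = sum(x * g for x, g in zip(xs, grads_y)) / m_i
    return g_mean_i, phi_i, m_i


def combine_backward_stats(per_gpu):
    """Count-weighted combination of per-GPU (g_mean_i, phi_i, m_i)."""
    total = sum(m for _, _, m in per_gpu)
    g_mean = sum(m * g for g, _, m in per_gpu) / total
    phi = sum(m * p for _, p, m in per_gpu) / total
    return g_mean, phi


gpu0 = local_backward_stats([1.0, 2.0, 3.0], [0.5, -1.0, 1.5])
gpu1 = local_backward_stats([4.0, 6.0], [2.0, -0.5])
g_mean, phi = combine_backward_stats([gpu0, gpu1])
```

As in the forward pass, each GPU contributes two scalars and a count, so the backward exchange is no more expensive than the forward one.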
35. The device according to claim 32, characterized in that the backward BN processing unit performing backward BN processing on the forward BN input data subset comprises:
Determining the gradient of each datum in the forward post-BN data subset according to the backward BN input data subset;
Determining the gradient of the variance of the global forward BN input data according to the global forward BN input data mean value set, the global backward BN input data mean and the global forward BN gradient correction data mean;
Determining the gradient of the global forward BN input data mean according to the global forward BN input data mean value set and the global forward BN gradient correction data mean;
Determining the gradient of each datum in the forward BN input data subset according to the gradient of each datum in the forward post-BN data subset, the gradient of the variance of the global forward BN input data, the gradient of the global forward BN input data mean, the global forward BN input data mean value set and the mean of the global forward BN input data.
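The four steps of claim 35 follow the usual chain rule through batch normalization. The derivation below is a sketch under stated assumptions, because the formulas of the dependent claims are not legible in this text. It takes $\varphi$ to be the global mean of $x_{i,j}\,\partial l/\partial y_{i,j}$, writes $\bar g$ for the global mean of $\partial l/\partial y_{i,j}$ and $M = \sum_i m_i$ for the total number of data (both symbols are introduced here), and treats $\mu$ and $\sigma^2$ as the intermediate variables of textbook BN. The claimed intermediate formulas may group or scale the terms differently. The final per-datum gradient is fixed by the forward computation and does not depend on that grouping.

```latex
\begin{aligned}
\frac{\partial l}{\partial \hat{x}_{i,j}}
  &= \gamma\,\frac{\partial l}{\partial y_{i,j}},\\
\frac{\partial l}{\partial \sigma^2}
  &= -\frac{1}{2}(\sigma^2+\varepsilon)^{-3/2}
     \sum_{i,j}\frac{\partial l}{\partial \hat{x}_{i,j}}\,(x_{i,j}-\mu)
   = -\frac{\gamma M}{2}\,(\varphi-\mu\bar g)\,(\sigma^2+\varepsilon)^{-3/2},\\
\frac{\partial l}{\partial \mu}
  &= -\frac{1}{\sqrt{\sigma^2+\varepsilon}}
     \sum_{i,j}\frac{\partial l}{\partial \hat{x}_{i,j}}
   = -\frac{\gamma M\,\bar g}{\sqrt{\sigma^2+\varepsilon}},\\
\frac{\partial l}{\partial x_{i,j}}
  &= \frac{1}{\sqrt{\sigma^2+\varepsilon}}\,
     \frac{\partial l}{\partial \hat{x}_{i,j}}
   + \frac{2\,(x_{i,j}-\mu)}{M}\,\frac{\partial l}{\partial \sigma^2}
   + \frac{1}{M}\,\frac{\partial l}{\partial \mu}\\
  &= \frac{\gamma}{\sqrt{\sigma^2+\varepsilon}}
     \left[\frac{\partial l}{\partial y_{i,j}} - \bar g
     - \frac{(x_{i,j}-\mu)\,(\varphi-\mu\bar g)}{\sigma^2+\varepsilon}\right].
\end{aligned}
```

The contribution of $\sigma^2$ to $\partial l/\partial\mu$ is omitted because it is proportional to $\sum_{i,j}(x_{i,j}-\mu)$, which is zero. In the last line, everything other than the GPU's own $x_{i,j}$ and $\partial l/\partial y_{i,j}$ is a global scalar, which is why exchanging mean values alone is enough.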
36. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of each datum in the forward post-BN data subset comprises:
Determining the gradient of each datum in the forward post-BN data subset according to the formula $\frac{\partial l}{\partial \hat{x}_{i,j}} = \frac{\partial l}{\partial y_{i,j}} \cdot \gamma$, where $\{\partial l/\partial y_{i,j}\}$ ($j = 1, 2, \ldots, m_i$) is the backward BN input data subset when the GPU is the i-th GPU, $l$ is a predetermined loss function, $y_{i,j}$ is a datum in the forward BN output data subset, $\partial l/\partial y_{i,j}$ is the gradient of $y_{i,j}$, $\gamma$ is an offset parameter, and $\partial l/\partial \hat{x}_{i,j}$ is the gradient of the datum $\hat{x}_{i,j}$ in the forward post-BN data subset.
37. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of the variance of the global forward BN input data comprises:
Determining the gradient of the variance of the global forward BN input data according to the formula [formula not reproduced], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, $\varphi$ is the global forward BN gradient correction data mean, $\overline{\partial l/\partial y}$ is the global forward BN output data gradient mean, $\gamma$ is an offset parameter, and $\partial l/\partial \sigma^2$ is the gradient of the variance of the global forward BN input data.
38. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of the global forward BN input data mean comprises:
Determining the gradient of the global forward BN input data mean according to the formula [formula not reproduced], where $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, $\varphi$ is the global forward BN gradient correction data mean, $\gamma$ is an offset parameter, and $\partial l/\partial \mu$ is the gradient of the global forward BN input data mean.
39. The device according to claim 35, characterized in that the backward BN processing unit determining the gradient of each datum in the forward BN input data subset comprises:
Determining the gradient of each datum in the forward BN input data subset according to the formula [formula not reproduced], where $\partial l/\partial \hat{x}_{i,j}$ is the gradient of each datum in the forward post-BN data subset, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\mu$ is the mean of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, $\partial l/\partial \sigma^2$ is the gradient of the variance of the global forward BN input data, $\partial l/\partial \mu$ is the gradient of the global forward BN input data mean, and $\partial l/\partial x_{i,j}$ is the gradient of the datum $x_{i,j}$ in the forward BN input data subset.
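To illustrate how a GPU could obtain the gradient of its own input data from global scalars only, the sketch below uses the textbook BN backward pass written as $\frac{\gamma}{\sqrt{\sigma^2+\varepsilon}}\left[\frac{\partial l}{\partial y_{i,j}} - \bar g - \frac{(x_{i,j}-\mu)(\varphi-\mu\bar g)}{\sigma^2+\varepsilon}\right]$. This is an assumption, not the formula of the claim, which is not legible in this text. Here $\bar g$ (`g_mean`) is the global mean of $\partial l/\partial y$, $\varphi$ (`phi`) is assumed to be the global mean of $x \cdot \partial l/\partial y$, and all function and variable names are hypothetical.

```python
import math


def backward_bn_shard(xs, grads_y, mu, var, g_mean, phi, gamma, eps=1e-5):
    """Gradients dl/dx for one GPU's subset, using only global statistics.

    ASSUMPTION: textbook BN backward; phi is the global mean of x * dl/dy
    and g_mean is the global mean of dl/dy.
    """
    s2 = var + eps
    inv_std = 1.0 / math.sqrt(s2)
    # (phi - mu * g_mean) is the global mean of (x - mu) * dl/dy.
    corr = (phi - mu * g_mean) / s2
    return [gamma * inv_std * (g - g_mean - (x - mu) * corr)
            for x, g in zip(xs, grads_y)]


# Two simulated GPUs; global statistics are computed directly for brevity.
shards_x = [[1.0, 2.0, 3.0], [4.0, 6.0]]
shards_g = [[0.5, -1.0, 1.5], [2.0, -0.5]]
all_x = [x for xs in shards_x for x in xs]
all_g = [g for gs in shards_g for g in gs]
n = len(all_x)
mu = sum(all_x) / n
var = sum(x * x for x in all_x) / n - mu * mu
g_mean = sum(all_g) / n
phi = sum(x * g for x, g in zip(all_x, all_g)) / n
grads_x = [backward_bn_shard(xs, gs, mu, var, g_mean, phi, gamma=2.0)
           for xs, gs in zip(shards_x, shards_g)]
```

A useful sanity check for any implementation of this step is that the input gradients of one BN layer sum to zero over the whole batch, since shifting every input by the same constant leaves the normalized output unchanged.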
40. The device according to claim 32, characterized in that the backward BN processing unit is further configured to:
Determine the gradients of the BN layer training parameters according to the global forward BN input data mean value set and the global backward BN input data mean value set, wherein the BN layer training parameters comprise the offset parameters $\gamma$ and $\beta$.
41. The device according to claim 40, characterized in that the backward BN processing unit determines the gradient of the offset parameter $\gamma$ according to the formula [formula not reproduced], where $\varphi$ is the global forward BN gradient correction data mean, $\mu$ is the mean of the global forward BN input data, $\overline{\partial l/\partial y}$ is the global forward BN output data gradient mean, $\sigma^2$ is the variance of the global forward BN input data, $\sigma^2 = v - \mu^2$, $v$ is the mean of squares of the global forward BN input data, $\varepsilon$ is a fixed, very small nonzero value, $m_i$ is the number of data in the forward BN input data subset when the GPU is the i-th GPU, and $(\partial l/\partial \gamma)_i$ is the gradient of the offset parameter $\gamma$ when the GPU is the i-th GPU.
42. The device according to claim 40, characterized in that the backward BN processing unit determines the gradient of the offset parameter $\beta$ according to the formula [formula not reproduced], where $\overline{\partial l/\partial y}$ is the global forward BN output data gradient mean, $m_i$ is the number of data in the forward BN input data subset when the GPU is the i-th GPU, and $(\partial l/\partial \beta)_i$ is the gradient of the offset parameter $\beta$ when the GPU is the i-th GPU.
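Claims 41 and 42 list the quantities that enter the gradients of $\gamma$ and $\beta$, but their formulas are not legible in this text. One form that uses exactly those quantities, and that agrees with the textbook BN gradients when $\varphi$ is the global mean of $x \cdot \partial l/\partial y$, is sketched below. Both expressions and the name `bn_param_grads` are assumptions, not the claimed formulas.

```python
import math


def bn_param_grads(m_i, mu, var, g_mean, phi, eps=1e-5):
    """Per-GPU gradients of the offset parameters gamma and beta.

    ASSUMPTION (not taken from the source):
      d_gamma_i = m_i * (phi - mu * g_mean) / sqrt(var + eps)
      d_beta_i  = m_i * g_mean
    where g_mean is the global mean of dl/dy and phi is the global mean
    of x * dl/dy.
    """
    d_gamma = m_i * (phi - mu * g_mean) / math.sqrt(var + eps)
    d_beta = m_i * g_mean
    return d_gamma, d_beta


# Example: the GPU holds 3 of 5 data; global statistics of the whole batch.
d_gamma_0, d_beta_0 = bn_param_grads(m_i=3, mu=3.2, var=2.96,
                                     g_mean=0.5, phi=1.6)
```

Under this assumption, summing the per-GPU values over all GPUs gives the full-batch gradients, namely the sum of $\hat{x}_{i,j}\,\partial l/\partial y_{i,j}$ for $\gamma$ and the sum of $\partial l/\partial y_{i,j}$ for $\beta$.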
43. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean value set comprises:
In the case where the GPU is a slave GPU among the multiple GPUs, determining, by the slave GPU, the backward BN input data subset mean value set of the slave GPU according to the backward BN input data subset and the forward BN input data subset, wherein the backward BN input data subset mean value set of the slave GPU comprises: a backward BN input data subset mean and a forward BN gradient correction data mean;
Sending the determined backward BN input data subset mean value set to the main GPU among the multiple GPUs;
Receiving the global backward BN input data mean value set from the main GPU, wherein the global backward BN input data mean value set comprises: a global backward BN input data mean and a global forward BN gradient correction data mean.
44. The device according to claim 27, characterized in that the backward BN processing unit determining the global backward BN input data mean value set comprises:
Determining, by the GPU, the backward BN input data subset mean value set of the GPU according to the backward BN input data subset and the forward BN input data subset, wherein the backward BN input data subset mean value set of the GPU comprises: a backward BN input data subset mean and a forward BN gradient correction data mean;
Sending the backward BN input data subset mean value set of the GPU to each of the other GPUs;
Receiving the backward BN input data subset mean value sets from each of the other GPUs;
Determining the global backward BN input data mean value set according to the backward BN input data subset mean value set of the GPU and the backward BN input data subset mean value sets of the other GPUs, wherein the global backward BN input data mean value set comprises: a global backward BN input data mean and a global forward BN gradient correction data mean.
CN201710564223.4A 2017-07-12 2017-07-12 DNN model training method and device with multiple GPUs in parallel Active CN109255439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710564223.4A CN109255439B (en) 2017-07-12 2017-07-12 DNN model training method and device with multiple GPUs in parallel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710564223.4A CN109255439B (en) 2017-07-12 2017-07-12 DNN model training method and device with multiple GPUs in parallel

Publications (2)

Publication Number Publication Date
CN109255439A true CN109255439A (en) 2019-01-22
CN109255439B CN109255439B (en) 2021-04-02

Family

ID=65050560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710564223.4A Active CN109255439B (en) 2017-07-12 2017-07-12 DNN model training method and device with multiple GPUs in parallel

Country Status (1)

Country Link
CN (1) CN109255439B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308233A (en) * 2019-08-02 2021-02-02 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for processing data
CN112328532A (en) * 2020-11-02 2021-02-05 长沙景嘉微电子股份有限公司 Multi-GPU communication method and device, storage medium and electronic device
US20210089887A1 (en) * 2019-09-24 2021-03-25 Apple Inc. Variance-Based Learning Rate Control For Training Machine-Learning Models
CN113011563A (en) * 2021-03-19 2021-06-22 北京大学 Convolutional neural network batch normalization processing method based on GPU
CN117952815A (en) * 2023-12-26 2024-04-30 深圳市腾进达信息技术有限公司 Method and system for supporting multiple GPU (graphics processing Unit) to work simultaneously by single system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488662A (en) * 2013-04-01 2014-01-01 哈尔滨工业大学深圳研究生院 Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units
US20160259037A1 (en) * 2015-03-03 2016-09-08 Nvidia Corporation Radar based user interface
CN106096605A (en) * 2016-06-02 2016-11-09 史方 A kind of image obscuring area detection method based on degree of depth study and device
KR20170012019A (en) * 2015-07-24 2017-02-02 삼성전자주식회사 Method for optimizing parallel matrix multiplication in a system supporting multiple CPU and multiple GPU

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488662A (en) * 2013-04-01 2014-01-01 哈尔滨工业大学深圳研究生院 Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units
US20160259037A1 (en) * 2015-03-03 2016-09-08 Nvidia Corporation Radar based user interface
KR20170012019A (en) * 2015-07-24 2017-02-02 삼성전자주식회사 Method for optimizing parallel matrix multiplication in a system supporting multiple CPU and multiple GPU
CN106096605A (en) * 2016-06-02 2016-11-09 史方 A kind of image obscuring area detection method based on degree of depth study and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
M NIENIEWSKI: "Real-Time US Image Enhancement by Forward-Backward Diffusion Using GPU", 《IMAGE PROCESSING AND COMMUNICATIONS CHALLENGES 7》 *
SERGEY IOFFE ET AL: "Batch Normalization: Accelerating Deep Network Training by Reducing", 《ARXIV》 *
韩丹: "基于CPU-GPU的条件随机场并行化研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308233A (en) * 2019-08-02 2021-02-02 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for processing data
CN112308233B (en) * 2019-08-02 2024-07-19 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for processing data
US20210089887A1 (en) * 2019-09-24 2021-03-25 Apple Inc. Variance-Based Learning Rate Control For Training Machine-Learning Models
CN112328532A (en) * 2020-11-02 2021-02-05 长沙景嘉微电子股份有限公司 Multi-GPU communication method and device, storage medium and electronic device
CN112328532B (en) * 2020-11-02 2024-02-09 长沙景嘉微电子股份有限公司 Method and device for multi-GPU communication, storage medium and electronic device
CN113011563A (en) * 2021-03-19 2021-06-22 北京大学 Convolutional neural network batch normalization processing method based on GPU
CN117952815A (en) * 2023-12-26 2024-04-30 深圳市腾进达信息技术有限公司 Method and system for supporting multiple GPU (graphics processing Unit) to work simultaneously by single system

Also Published As

Publication number Publication date
CN109255439B (en) 2021-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200326

Address after: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District

Applicant after: BEIJING TUSENZHITU TECHNOLOGY Co.,Ltd.

Address before: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District

Applicant before: TuSimple

GR01 Patent grant