CN109919313A - Gradient transmission method and distributed training system - Google Patents
Gradient transmission method and distributed training system
- Publication number
- CN109919313A (application CN201910101338.9A)
- Authority
- CN
- China
- Prior art keywords
- gradient
- neural network
- network model
- neuron
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
This application discloses a gradient transmission method and a distributed training system, which improve the efficiency of transmitting the gradients generated during training and thereby the training efficiency of distributed training. The method comprises: obtaining, according to input training data, the gradients of the weights corresponding to the i-th layer of neurons of a neural network model; sending those gradients to a gradient buffer; judging whether the number of gradients stored in the gradient buffer exceeds a transmission threshold; sending, according to the judgment result, the gradients stored in the gradient buffer to a gradient collection module; obtaining the gradient mean of the weights corresponding to the i-th layer of neurons, computed from the gradients sent by the multiple neural network models and stored in the gradient collection module; and updating the corresponding weights according to that gradient mean, so as to execute the next iteration of the neural network model.
Description
Technical field
This application relates to the field of information technology, and in particular to a gradient transmission method and a distributed training system.
Background technique
Artificial intelligence (AI) currently receives significant attention, and its core technologies have achieved important breakthroughs in many fields. One of those core technologies is deep learning, a machine learning technique based on neural network models. A neural network model comprises multiple layers of neurons, and each layer of neurons corresponds to at least one weight. A neural network model can be used normally only after many iterations of training; an iteration determines, from massive training data, the optimal weights that minimize the difference between the model's prediction result and the prior knowledge.
To improve training efficiency, the iterations of a neural network model may be carried out as distributed training on multiple training devices. In one iteration across multiple training devices, the gradient of any given weight computed by the different devices may differ, so the devices need to transmit the computed gradient of each weight in order to determine a gradient mean. Each device then updates the weight using the gradient mean, so that for any weight the updated value is identical across all devices. After updating the weights of every layer of neurons, each training device uses the updated weights to carry out the next iteration of its neural network model.

How the multiple training devices synchronize the gradients obtained during an iteration therefore has a large effect on training efficiency.
Summary of the invention
The embodiments of the present application provide a gradient transmission method that improves the efficiency of synchronizing, among neural network models, the gradients generated during training.
In a first aspect, the present invention provides a gradient transmission method for a distributed training system of neural network models. The system includes multiple neural network models; each neural network model includes n layers of neurons, each layer of neurons corresponds to at least one weight, and n is a positive integer. The multiple neural network models carry out each iteration simultaneously, and the process is similar for each model, so an arbitrary neural network model in the distributed training system is taken as an example for illustration.

In one iteration, training data is first input, and the gradients of the weights corresponding to the i-th layer of neurons of the neural network model are obtained according to that training data, where i is a positive integer not greater than n. These gradients are sent to the gradient buffer of the neural network model. After the gradients have been sent to the gradient buffer, it is judged whether the number of gradients stored in the buffer exceeds the determined transmission threshold. According to the judgment result, the gradients stored in the buffer are sent to a gradient collection module, after which the buffer holds no gradients. In general, the stored gradients are sent to the gradient collection module once their number exceeds the determined transmission threshold. Since every neural network model sends its gradients to the gradient collection module, the module stores the gradients sent by the multiple models; from these, the gradient mean of the weights corresponding to the i-th layer of neurons of the neural network model is obtained. Finally, the weights corresponding to the i-th layer of neurons are updated according to this gradient mean, so as to execute the next iteration of the neural network model.
By setting a transmission threshold and comparing the number of stored gradients against it, it is determined whether to transmit the gradients; the gradient mean is then determined from the gradients of the multiple neural network models and used to update the weights. This not only realizes the transmission of gradients but also improves the training efficiency of distributed training. In addition, because the transmission threshold is determined according to the neural network model, different neural network models use different transmission thresholds, which avoids the loss of transmission efficiency caused by a mismatch between threshold and model and further improves the training efficiency of distributed training.
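The per-layer flow of the first aspect can be illustrated with a minimal sketch. This is not the patented apparatus; the function name `sync_layer`, the single scalar weight per layer, and the fixed learning rate are all simplifying assumptions made for illustration.

```python
def sync_layer(models, layer_grads, lr=0.1):
    """Sketch of one layer's synchronization: each model's gradient for
    layer i reaches a shared collection, the gradient mean is computed,
    and every model applies the same update to its copy of the weight.

    `models` maps model id -> current weight of layer i (one scalar weight
    for simplicity); `layer_grads` maps model id -> that model's gradient.
    """
    collection = list(layer_grads.values())      # gradients from all models
    mean = sum(collection) / len(collection)     # gradient mean
    # identical update on every model, keeping the replicas in sync
    return {m: w - lr * mean for m, w in models.items()}
```

Because every replica subtracts the same mean, the weights stay identical across models after the update, which is the property the first aspect relies on.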
With reference to the first aspect, in a first possible implementation of the first aspect, before judging whether the number of gradients stored in the gradient buffer exceeds the determined transmission threshold, a transmission threshold set containing m alternative transmission thresholds may be determined; the transmission duration corresponding to each of the m alternative thresholds is obtained over m iterations, and the transmission threshold is then determined from these m transmission durations.

Further, a threshold set including at least two alternative transmission thresholds may be determined in advance, and the transmission threshold determined from among them during the iterations of the distributed training system, so as to maximize transmission efficiency.
With reference to the first possible implementation of the first aspect, in a second possible implementation, when determining the transmission threshold from the m transmission durations, the alternative transmission threshold corresponding to the shortest of the m transmission durations may be selected as the transmission threshold.

Further, using the alternative threshold with the shortest transmission duration as the final transmission threshold realizes fast transmission and improves the efficiency of distributed training.
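The shortest-duration selection above can be sketched as follows. The callback name `measure_transmission_seconds` is hypothetical; in practice it would correspond to running one training iteration with the given threshold and timing the gradient transmissions.

```python
def pick_transmission_threshold(candidates, measure_transmission_seconds):
    """Try each candidate threshold for one iteration and keep the one
    with the shortest measured transmission duration.

    `measure_transmission_seconds(threshold)` is an assumed callback that
    returns the total gradient-transmission time for one iteration run
    with that threshold.
    """
    durations = [(measure_transmission_seconds(t), t) for t in candidates]
    _, best = min(durations)   # shortest transmission duration wins
    return best
```

One way to use it: run the m calibration iterations at the start of training, then keep the returned threshold fixed for the remaining iterations.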
With reference to the first or second possible implementation of the first aspect, in a third possible implementation, each alternative transmission threshold is not less than the number of weights corresponding to any single layer of neurons of the neural network model.
In a second aspect, the present invention provides a distributed training system having functional modules that implement the method of the first aspect and any of its possible implementations. The functional modules may be realized in hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above functions.

The distributed training system includes multiple neural network models and multiple transmission modules in one-to-one correspondence. Each neural network model includes n layers of neurons, each layer of neurons corresponds to at least one weight, and n is a positive integer.
In one iteration, the neural network model is configured to obtain, according to input training data, the gradients of the weights corresponding to its i-th layer of neurons, where i is a positive integer not greater than n, and to send these gradients to the gradient buffer of the neural network model.

The transmission module is configured to judge whether the number of gradients stored in the gradient buffer of the neural network model exceeds the determined transmission threshold, and, according to the judgment result, to send the gradients stored in the buffer to a gradient collection module.

The neural network model is further configured to obtain the gradient mean of the weights corresponding to its i-th layer of neurons, computed from the gradients sent by the multiple neural network models and stored in the gradient collection module, and to update the corresponding weights according to this gradient mean, so as to execute the next iteration of the neural network model.
By setting a transmission threshold and comparing the number of stored gradients against it, it is determined whether to transmit the gradients; the gradient mean is then determined from the gradients of the multiple neural network models and used to update the weights. This not only realizes the transmission of gradients but also improves the training efficiency of distributed training. In addition, because the transmission threshold is determined according to the neural network model, different neural network models use different transmission thresholds, which avoids the loss of transmission efficiency caused by a mismatch between threshold and model and further improves the training efficiency of distributed training.
In conjunction with the second aspect, in a first possible implementation of the second aspect, the transmission module is further configured to obtain, over m iterations, the transmission duration corresponding to each of the m alternative transmission thresholds in the transmission threshold set; to determine the transmission threshold from these m transmission durations; and to then trigger the step of judging whether the number of gradients stored in the gradient buffer of the neural network model exceeds the determined transmission threshold.

Further, a threshold set including at least two alternative transmission thresholds may be determined in advance, and the transmission threshold determined from among them during the iterations of the distributed training system, so as to maximize transmission efficiency.
In conjunction with the first possible implementation of the second aspect, in a second possible implementation, when determining the transmission threshold from the m transmission durations, the transmission module is specifically configured to select the alternative transmission threshold corresponding to the shortest of the m transmission durations as the transmission threshold.

Further, using the alternative threshold with the shortest transmission duration as the transmission threshold realizes fast transmission and improves the efficiency of distributed training.
In conjunction with the first or second possible implementation of the second aspect, in a third possible implementation, each alternative transmission threshold is not less than the number of weights corresponding to any single layer of neurons of the neural network model.
In a third aspect, the present invention provides a distributed training system that includes at least one training device, each training device comprising a processor and a memory. The memory is configured to store computer instructions; the processor is configured to execute the computer instructions in the memory so as to implement the method of the first aspect and any of its possible implementations.
In a fourth aspect, the present invention provides a non-volatile computer-readable storage medium storing computer instructions that, when executed by a processor, implement the method of the first aspect and any of its possible implementations.
In a fifth aspect, the present invention provides a computer program product which, when read and executed by a computer, causes the computer to execute the method of the first aspect and any of its possible implementations.
In a sixth aspect, the present invention provides a chip coupled to a memory, for reading and executing a software program stored in the memory so as to implement the method of the first aspect and any of its possible implementations.
Detailed description of the invention
Figure 1A is a schematic diagram of a neural network model provided in an embodiment of the present application;
Figure 1B is a schematic diagram of a distributed training system provided in an embodiment of the present application;
Figure 1C is a schematic diagram of a decentralized distributed training system provided in an embodiment of the present application;
Figure 2A is a schematic diagram of an iteration process provided in an embodiment of the present application;
Figure 2B is a schematic diagram of a gradient transmission process provided in an embodiment of the present application;
Figure 3 is a schematic diagram of a process for determining a transmission threshold provided in an embodiment of the present application;
Figure 4 is a schematic diagram of a threshold set provided in an embodiment of the present application;
Figure 5 is a schematic diagram of a distributed training system provided in an embodiment of the present application.
Specific embodiment
The embodiments of the present application are described in detail below with reference to the accompanying drawings.
To facilitate understanding of this scheme, neural network models are introduced first. It should be understood that a neural network model is a network model that imitates the behavioral characteristics of animal neural networks; relying on the complexity of the model, it processes information by adjusting the interconnections among a large number of internal nodes.

Training a neural network is the process of learning the weights corresponding to its neurons; the ultimate purpose is to obtain the weights corresponding to each layer of neurons of the trained neural network model.
The training process of a neural network model that may be applied in the embodiments of the present application is described in detail below with reference to Figure 1A.
Figure 1A is a schematic block diagram of a neural network model 100 provided in an embodiment of the present application. The neural network model 100 includes n layers of neurons; each of the n layers includes one or more neurons, and every neuron in a layer is connected to every neuron in the next layer. Taking the neural network model 100 in Figure 1A as an example, the 1st layer includes two neurons, each of the 2nd to (n-1)-th layers includes three neurons, and the n-th layer includes one neuron, where n is a positive integer not less than 2 and i in Figure 1A is a positive integer not greater than n and not less than 1. Each neuron has a corresponding weight.
One iteration of the training process of the neural network model 100 is described in detail below.
Training data is obtained from the training data set and input to the 1st layer of the neural network model 100; after passing through the neurons of the 1st through n-th layers, a prediction result is output from the n-th layer. Specifically, each layer of neurons has corresponding weights. The training data is input to the first-layer neurons, which output a value based on their weights; that output value becomes the input of the second-layer neurons, which output their own value based on their weights; and so on, until the n-th layer finally outputs a prediction result.
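The layer-by-layer forward pass above can be sketched minimally. One scalar weight per layer and a plain product stand in for the real weight matrices and activation functions; this is an illustrative assumption, not the model of the embodiment.

```python
def forward(layers, x):
    """Minimal sketch of the forward pass: each layer's output, computed
    from its weight, feeds the next layer until the n-th layer emits
    the prediction result.

    `layers` holds one scalar weight per layer for simplicity.
    """
    out = x
    for w in layers:
        out = w * out   # this layer's output becomes the next layer's input
    return out          # prediction result from the n-th layer
```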
During the training of the neural network model 100, the goal is for the prediction result output by the n-th layer to be as close as possible to the prior knowledge of the training data. Prior knowledge, also called ground truth, generally comprises the true result corresponding to the training data, provided by a person. The current prediction result can therefore be compared with the prior knowledge, and the weights of each layer of neurons in the neural network model 100 updated according to the difference between the two. (Before the first update there is usually an initialization process that sets the initial weights of each layer of neurons in the neural network model 100.) Accordingly, after the prediction result output by the n-th layer is obtained, an error algorithm is used to correct the neurons' weights according to the prediction result and the prior knowledge, as follows.

A loss function is calculated from the prediction result and the prior knowledge; according to the loss function, the weights of each layer of neurons in the neural network model 100 are corrected in the direction from the n-th layer to the 1st layer. Each weight can be corrected by computing its corresponding gradient, which is obtained from the loss function by differentiating the loss function with respect to that weight.
Correcting the weights according to the prediction result and the prior knowledge thus includes calculating the loss function from the two, and then computing, according to the loss function, the gradients of each layer's weights in the direction from the n-th layer to the 1st layer. In other words, the gradients of the weights are computed layer by layer in order from the n-th layer to the 1st layer: after the gradients of the i-th layer's weights have been computed, computation of the gradients of the (i-1)-th layer's weights begins. Once the gradients of every layer's weights have been obtained, the weights are corrected according to the gradients, completing one iteration.
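The n-to-1 layer order and the gradient-based correction can be sketched as follows. The helper `grads_fn` is a hypothetical stand-in for backpropagation, and the learning-rate update is one common correction rule, not necessarily the one used in the embodiment.

```python
def backward_and_update(weights, grads_fn, lr=0.1):
    """Sketch of one iteration's correction step: compute each layer's
    gradient from the n-th layer down to the 1st, then correct every
    weight with its gradient.

    `grads_fn(layer_index)` is an assumed helper returning the gradient
    of that layer's weight (in practice, dLoss/dWeight via backprop).
    """
    n = len(weights)
    grads = {}
    for i in range(n, 0, -1):      # n-th layer first, 1st layer last
        grads[i] = grads_fn(i)
    for i in range(1, n + 1):      # gradient-descent style correction
        weights[i - 1] -= lr * grads[i]
    return weights
```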
Over many iterations, the weights of each layer of neurons are continually corrected, so that the prediction result output by the neural network model 100 comes as close as possible to the prior knowledge of the training data.
The gradient transmission method in this application is applied to a distributed training system of neural network models. The distributed training system includes multiple training devices. An identical neural network model is deployed on each training device, and each device trains on different training data obtained from the training data set. After several iterations, the multiple training devices obtain multiple identically trained neural network models; the trained neural network model on any one of the training devices is the completed, distributively trained neural network model.
In addition, each training device also includes a transmission module. The distributed training process is explained taking the distributed training system 200 in Figure 1B as an example. Illustratively, only training device 210 and training device 220 of the distributed training system 200 are drawn in this embodiment; the actual number of training devices can be larger.
In one iteration of distributed training, each training device obtains training data from the training data set, with different data for each device. Each training device inputs its training data into the neural network model on that device: for example, training device 210 inputs its training data into neural network model 211, and training device 220 inputs its training data into neural network model 221. Each neural network model obtains a prediction result for its input training data and computes the gradients of each layer's weights from the prediction result and the prior knowledge. Because the training data input to each model differs, the gradients computed by each model in one iteration also differ. That is, in the distributed training system 200 of Figure 1B, the training data input to neural network model 211 differs from that input to neural network model 221, so, being based on different training data, the gradients obtained by model 211 differ from those obtained by model 221. If the weights in model 211 were adjusted using only model 211's gradients, and the weights in model 221 using only model 221's gradients, then after several iterations the weights of each layer of neurons in the trained model 211 would differ from those in the trained model 221, and the trained neural network models 211 and 221 would not be identical.
Therefore, during each iteration, after a neural network model obtains the gradients of its weights, the gradients are first sent to the corresponding gradient buffer for storage; when the transmission module in a training device determines that the number of gradients stored in the gradient buffer exceeds the transmission threshold, the transmission module transmits the gradients. Specifically, transmission module 212 in training device 210 sends the gradients to gradient collection module 230, and transmission module 222 in training device 220 likewise sends its gradients to gradient collection module 230. The gradient collection module thus stores the gradients from every gradient buffer, and the gradient mean of each weight is calculated from the gradients in the buffers of the neural network models stored in the gradient collection module. Each neural network model then updates the corresponding weight according to the gradient mean. Optionally, the gradient mean of a weight may be obtained by averaging the gradients of that weight from all models, by taking a weighted average of them, or by applying some other processing to them.
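The optional aggregation choices above (plain average or weighted average) can be sketched in a few lines; the function name and the `coeffs` parameter are illustrative, not part of the embodiment.

```python
def gradient_mean(per_model_grads, coeffs=None):
    """Aggregate one weight's gradients from all models into a gradient
    mean: a plain average, or a weighted average when per-model
    coefficients `coeffs` are supplied.
    """
    if coeffs is None:
        return sum(per_model_grads) / len(per_model_grads)
    # weighted average, normalized by the coefficient sum
    return sum(c * g for c, g in zip(coeffs, per_model_grads)) / sum(coeffs)
```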
The training devices in the distributed training system 200 can be deployed in a centralized or a decentralized manner. In a centralized deployment, the gradient collection module is either independent of the multiple training devices or deployed on one of them, and each training device transmits its gradients to that gradient collection module. In a decentralized deployment, a gradient collection module is deployed on each training device. The distributed training system 200 drawn in Figure 1B is a centralized deployment, with gradient collection module 230 independent of training device 210 and training device 220. In the decentralized deployment of the distributed training system 200 shown in Figure 1C, gradient collection module 2301 is deployed on training device 210 and gradient collection module 2302 on training device 220.
When a neural network model is trained with the distributed training system, the multiple training devices in the system execute the steps of training the neural network model in parallel. The training process is introduced below with reference to Figure 2A, taking one training device in the distributed training system as an example.
Step 21: Obtain training data from the training data set and input it into the neural network model; obtain the prediction result for the input training data according to the weights of each layer of neurons in the neural network model.
During the first iteration, the weights of every layer of neurons are determined by an initialization process, which is the process of setting the weights of every layer of neurons for each neural network model.
Step 22: Obtain the prior knowledge of the training data, and calculate the loss function from the prediction result and the prior knowledge of the training data.
Step 23: Obtain the transmission threshold, calculate the gradients of each layer's weights according to the loss function, and transmit the gradients of each layer's weights according to the transmission threshold.
When calculating the gradients of each layer's weights according to the loss function, the gradients of each layer must be computed in order from the last layer of neurons (the n-th layer) to the first layer.
Because the gradient mean of each weight must be calculated, the gradients of each layer's weights need to be transmitted after they have been computed. Transmitting the gradients of each layer's weights means sending them to the gradient collection module so that it can calculate the gradient means. One option is to transmit the gradients of all layers' weights only after all of them have been computed. But transmitting gradients takes time, so to improve efficiency the gradients computed so far can be transmitted after the gradients of only some layers' weights have been computed; during that transmission, the training device can begin computing the gradients of the preceding layers' weights (taking the 1st layer as the front-most layer), reducing the delay of waiting for gradient transmission before the next iteration starts. Specifically, whenever the gradients of one layer's weights have been computed, they are cached in the gradient buffer of the training device; a transmission threshold is set, the number of gradients cached in the buffer is compared with the threshold, and one transmission is started once the number of cached gradients exceeds the threshold.
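The cache-and-flush behavior described above can be sketched as a small class. The class name, the `send` callback, and the final `flush` are illustrative assumptions; the embodiment's transmission module may organize this differently.

```python
class GradientBuffer:
    """Sketch of the gradient buffer: per-layer gradients are cached, and
    one transmission is started whenever the number of cached gradients
    exceeds the transmission threshold."""

    def __init__(self, threshold, send):
        self.threshold = threshold   # transmission threshold
        self.send = send             # callback toward the gradient collection module
        self.cache = []

    def add_layer_grads(self, grads):
        self.cache.extend(grads)
        if len(self.cache) > self.threshold:
            self.send(list(self.cache))   # start one transmission
            self.cache.clear()            # buffer holds no gradients after sending

    def flush(self):
        # send whatever remains once the backward pass is finished
        if self.cache:
            self.send(list(self.cache))
            self.cache.clear()
```

With a threshold of 3, for example, two layers of two gradients each trigger one transmission of four gradients, and a final single gradient leaves in the closing flush.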
Setting the transmission threshold is key to improving training efficiency, and different neural network models have different transmission thresholds. Starting a transmission takes time, and a threshold that is too small causes transmissions to start too frequently, so it cannot effectively reduce the delay brought by transmission. Conversely, when the total number of weights across the layers of neurons of the neural network model is small, a threshold that is too large cannot effectively improve transmission efficiency. Therefore, the transmission threshold a neural network model uses must be determined according to the neural network model, and the thresholds determined for different models differ. Optionally, the transmission threshold is determined according to the total number of weights of all layers of neurons of the neural network model. The detailed process of determining the transmission threshold according to the neural network model is described later.
For the detailed process in step 23 of calculating the gradients of each layer's weights according to the loss function and transmitting them according to the transmission threshold, see the description of steps 231 to 236.
Step 24: The gradient collection module calculates the gradient mean of each weight among the weights of every layer of neurons.
Step 25: obtaining the corresponding gradient mean value of each weight, respective weights are updated using gradient mean value.
Through the above steps 21 to 25, a training device completes one iteration; it then obtains new training data from the training data set and carries out the next iteration based on the updated weights. Each training device completes one iteration according to steps 21 to 25 using different training data. After the weight update, the updated weights of all training devices are identical, and the multiple training devices can carry out the next iteration of the neural network model until the loss function calculated in step 22 satisfies a set condition, at which point the neural network model is considered trained.
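The iteration loop of steps 21 to 25 can be illustrated with a minimal Python sketch. The single linear layer, the squared-error loss, and the in-process list standing in for the gradient collection module are simplifying assumptions for illustration only, not the system described above:

```python
import numpy as np

# Minimal sketch of one iteration (steps 21-25) for a single linear layer.
# The list `grads` stands in for the gradient collection module, and the
# three loop passes stand in for three training devices.

def local_gradient(w, x, y):
    """Steps 21-23: prediction, squared-error loss, gradient of the weight."""
    pred = x @ w                     # step 21: prediction result
    return x.T @ (pred - y)          # step 23: gradient of 0.5*||pred - y||^2

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1))          # shared weight, identical on every device

grads = []
for _ in range(3):                   # three devices, different training data
    x, y = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
    grads.append(local_gradient(w, x, y))

grad_mean = np.mean(grads, axis=0)   # step 24: gradient mean per weight
w = w - 0.01 * grad_mean             # step 25: identical update on every device
print(w.shape)                       # (4, 1)
```

Because every device applies the same gradient mean, the weights stay synchronized across devices for the next iteration, which is the property the description relies on.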
The detailed process by which, in step 23 of one iteration, the gradients of the weights corresponding to every layer of neurons are calculated according to the loss function and transmitted according to the transmission threshold is described below with reference to Fig. 2B:
Step 231: the gradient of each weight corresponding to the current layer of neurons is calculated; after the gradient of each weight is calculated, the calculated gradient is sent to the gradient buffer to be cached.
The neural network model includes n layers of neurons; taking the i-th layer of neurons as the current layer, i is a positive integer no greater than n. In one iteration, the gradients of the weights corresponding to each layer of neurons are calculated in order from the n-th layer to the 1st layer, i.e., the initial value of i is n.
Each time the gradient of one weight has been calculated, that gradient can be cached.
Step 232: determine whether the calculation of the gradients of the weights corresponding to all layers of neurons of the neural network model has been completed.
A specific embodiment of this determination is to check whether the current layer of neurons is the first layer of the neural network model: if the current layer is the i-th layer, determine whether i is 1. If the calculation of the gradients of the weights corresponding to all layers has not yet been completed, step 233 is executed; if it has been completed, step 236 is executed directly, and the transmission of the gradients of the weights corresponding to all neurons is completed in step 236.
Step 233: determine whether the quantity of gradients stored in the current gradient buffer is not less than the transmission threshold; if so, step 234 is executed and the cached gradients are transmitted; if not, step 235 is executed and the gradients of the weights corresponding to the preceding layer of neurons continue to be calculated. In step 233, the size (storage capacity) of the gradients stored in the current gradient buffer can also be compared with the transmission threshold: if the size of the stored gradients is greater than the transmission threshold, step 234 is executed; otherwise step 235 is executed.
Step 234: the currently cached gradients are sent to the gradient collection module.
Step 234 further includes, after the currently cached gradients have been sent to the gradient collection module, deleting the gradients that have been transmitted to the gradient collection module; after step 234 has been executed, no transmitted gradients remain in the gradient buffer.
Step 235: taking the (i-1)-th layer as the current layer of neurons, step 231 is executed to calculate the gradients of the weights corresponding to the preceding layer. At this point the calculation of the gradients of the weights corresponding to all layers of neurons has not been completed, i.e., the layer whose calculation has just been completed is not the first layer, so the gradients of the weights corresponding to the layer preceding the just-completed layer still need to be calculated.
Step 236: the currently cached gradients are sent to the gradient collection module.
At this point the gradients of the weights corresponding to the 1st layer of neurons have been calculated, which completes the gradient calculation for the weights of all layers of neurons; the transmission of the gradients of the weights corresponding to all neurons still needs to be completed. The currently cached gradients (which include the gradients of the weights corresponding to the 1st layer of neurons calculated in step 231) are still stored in the gradient buffer and have not yet been transmitted, so they need to be transmitted, i.e., sent to the gradient collection module. Here, before sending the currently cached gradients to the gradient collection module, there is no need to judge whether the quantity of gradients cached in the buffer is not less than the transmission threshold. After step 236, the gradients that have been transmitted can also be deleted from the buffer, completing the calculation and transmission of the gradients of the weights corresponding to all layers of neurons in one iteration.
In this application, the transmission threshold is determined according to the neural network model, specifically according to the number of weights corresponding to each layer of neurons in the model. For example, every layer of neurons in the neural network model is traversed to obtain the total number of weights corresponding to each layer; the total number of weights corresponding to the i-th layer is q_i, where the neural network model includes n layers of neurons and i is a positive integer no greater than n. The maximum value, the minimum value, the median, or the average of the per-layer totals of weights can be used as the transmission threshold.
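As a hedged illustration of this paragraph, the candidate statistics over the per-layer weight totals q_1..q_n might be computed as follows (the function name and the example layer sizes are assumptions):

```python
import statistics

# Sketch of deriving transmission-threshold candidates from the per-layer
# weight counts: the maximum, minimum, median, or mean of the per-layer
# totals can each serve as the threshold.

def candidate_thresholds(per_layer_weight_counts):
    q = per_layer_weight_counts
    return {
        "max": max(q),
        "min": min(q),
        "median": statistics.median(q),
        "mean": sum(q) / len(q),
    }

# e.g. a model whose four layers hold 1000, 250, 4000, and 750 weights
print(candidate_thresholds([1000, 250, 4000, 750]))
# {'max': 4000, 'min': 250, 'median': 875.0, 'mean': 1500.0}
```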
Further, a threshold set including at least two alternative transmission thresholds can be predetermined, and during the iterations of the distributed training system the transmission threshold is determined from the at least two alternative transmission thresholds, so as to maximize transmission efficiency. Specifically, the detailed process of determining each alternative transmission threshold in the threshold set according to the neural network model can be found in steps 41 to 44 in Fig. 4.
The process of choosing an alternative transmission threshold from the threshold set is shown in Fig. 3. The threshold set includes m alternative transmission thresholds, m being a positive integer not less than 2. Fig. 3 provides an embodiment in which the transmission threshold is determined through m iterations of the neural network model training process.
Step 31: in the a-th iteration, training data is input into the neural network model; according to the weights corresponding to every layer of neurons in the neural network model, a prediction result for the input training data is obtained. The initial value of a is 1 and its final value is m.
Step 31 here corresponds to step 21 above; the detailed process is not repeated.
Step 32: the loss function is calculated according to the prediction result and prior knowledge.
Step 32 here corresponds to step 22 above; the detailed process is not repeated.
Step 33: an alternative transmission threshold for which the corresponding transmission duration has not yet been obtained is chosen from the threshold set.
In this application, the gradients of the weights corresponding to every layer of neurons are transmitted according to the transmission threshold. In order to determine the final transmission threshold, the transmission duration corresponding to each alternative transmission threshold in the threshold set is determined. The transmission duration corresponding to an alternative transmission threshold is, when one iteration is carried out using that alternative threshold, the duration from starting to calculate the gradients of the weights corresponding to the n-th layer of neurons to completing the transmission of the gradients of the weights corresponding to all n layers. In the a-th iteration, one alternative transmission threshold for which no corresponding transmission duration has been obtained can be chosen from the threshold set.
Step 34: the gradients of the weights corresponding to every layer of neurons are calculated according to the loss function, and those gradients are transmitted according to the alternative transmission threshold.
Step 34 here is similar to step 23 above; the detailed process is not repeated.
Step 35: the transmission duration corresponding to the alternative transmission threshold is obtained.
The transmission duration characterizes the time required for the training device to transmit the gradients of the weights corresponding to each layer of neurons according to the alternative transmission threshold. Specifically, the transmission duration of the alternative transmission threshold is the duration used by the training device from starting to calculate the gradients of the weights corresponding to the n-th layer of neurons to completing the transmission of the gradients of all layers of neurons.
Step 36: it is determined that the transmission duration corresponding to each alternative transmission threshold in the threshold set has been obtained.
Step 37: according to the transmission duration corresponding to each alternative transmission threshold, one transmission threshold is chosen, and the chosen threshold is used as the transmission threshold for subsequent iterations; the process of iterating according to the transmission threshold can be found in the description above.
Specifically, the alternative transmission threshold with the shortest transmission duration can be chosen as the transmission threshold for use in subsequent training.
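Steps 33 to 37 amount to timing one iteration per candidate and keeping the fastest. In the sketch below, `run_iteration` is a hypothetical stand-in for the real compute-and-transmit pass of step 34, simulated here with sleeps of differing lengths:

```python
import time

# Sketch of steps 33-37: run one timed iteration per candidate transmission
# threshold and keep the candidate with the shortest duration.

def pick_threshold(candidates, run_iteration):
    durations = {}
    for t in candidates:                  # iterations a = 1..m, one per candidate
        start = time.perf_counter()
        run_iteration(t)                  # step 34: compute and transmit
        durations[t] = time.perf_counter() - start   # step 35: duration
    return min(durations, key=durations.get)         # step 37: shortest wins

cost = {128: 0.03, 512: 0.01, 2048: 0.02}  # pretend threshold 512 is fastest
best = pick_threshold([128, 512, 2048], lambda t: time.sleep(cost[t]))
print(best)   # 512
```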
It should be noted that, when the threshold set contains multiple alternative transmission thresholds, each training device may determine the alternative transmission thresholds separately according to the neural network model; alternatively, one or several training devices may determine the alternative transmission thresholds according to the neural network model and distribute them to the other training devices; or a management device may determine the alternative transmission thresholds according to the neural network model and distribute them to each training device.
The order of step 33 relative to step 31 is not limited in this application, nor is the order of step 33 relative to step 32.
Taking Fig. 4 as an example, the detailed process of determining each alternative transmission threshold in the threshold set is described below:
Step 41: every layer of neurons in the neural network model is traversed to obtain the total number of weights corresponding to each layer, the total number of weights corresponding to the i-th layer being q_i, and the total quantity s of weights in the neural network model is calculated, where the neural network model includes n layers of neurons and i is a positive integer no greater than n.
Step 42: deduplication is performed on the per-layer totals q_1 to q_n of the weights corresponding to every layer of neurons, and each total obtained after deduplication is determined as an alternative transmission threshold in the threshold set.
Step 43: k alternative transmission thresholds are added to the threshold set.
In order to find a transmission threshold that fully exploits the performance of the distributed training, alternative transmission thresholds can also be added to the threshold set. The number k of added alternative transmission thresholds can be preset, for example k = 5, 10, or 15.
The order of step 42 and step 43 is not limited.
In one implementable manner, the number k of added alternative transmission thresholds can be determined according to the following formula, wherein s is the total quantity of weights in the neural network model, q_max is the maximum among the per-layer totals of weights corresponding to each layer of neurons, and x is a constant that can be, for example, 8, 10, or 15. Determining the value of k with the above formula is intended to ensure that at least x alternative transmission thresholds are added to the threshold set determined in step 42.
The k alternative transmission thresholds added to the threshold set can be generated randomly within a set value range. In general, to suit the scheme, each alternative transmission threshold in the threshold set satisfies a certain value range. To avoid transmitting the gradients of a layer's weights as soon as they are calculated — which would bring the problem of establishing too many communications — the value of an alternative transmission threshold can be not less than q_min, where q_min is the minimum among the per-layer totals of weights corresponding to each layer of neurons. To avoid transmitting the gradients of all layers' weights only after all of them have been calculated — which would bring the problem of long training time — the value of a threshold can be no more than the total quantity s of weights in the neural network model. The k added thresholds can therefore be k alternative transmission thresholds randomly chosen within the value range [q_min, s] and added to the threshold set.
To make the alternative transmission thresholds in the threshold set more uniform and to fill the search gap between q_max and the total quantity s, in one implementable manner the k added alternative transmission thresholds can be determined according to the following formula:
p_i = q_max + (i + 1) * (s - q_max) / k, i ∈ [0, k), where p_i is any alternative transmission threshold added to the threshold set.
The process of determining the k alternative transmission thresholds according to the above formula can be as follows: the quotient of k and the difference between the total quantity s of weights of the neural network model and the maximum value q_max among the per-layer totals q_i of weights corresponding to each layer of neurons is determined as the target step value; the sum of q_max and the target step value is taken as the base; the base is the first added alternative transmission threshold, and each subsequent alternative transmission threshold is obtained by adding the target step value to the most recently determined one, thereby producing the k alternative transmission thresholds added to the threshold set.
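The evenly spaced candidates p_i = q_max + (i + 1) * (s - q_max) / k, i ∈ [0, k), can be sketched directly; the function name and example layer sizes are illustrative assumptions:

```python
# Sketch of the evenly spaced candidate thresholds: a fixed target step
# fills the search gap between q_max and the total weight count s.

def spaced_candidates(per_layer_weight_counts, k):
    q_max = max(per_layer_weight_counts)
    s = sum(per_layer_weight_counts)
    step = (s - q_max) / k                    # target step value
    return [q_max + (i + 1) * step for i in range(k)]

counts = [1000, 250, 4000, 750]               # q_max = 4000, s = 6000
print(spaced_candidates(counts, k=4))         # [4500.0, 5000.0, 5500.0, 6000.0]
```

Note that the last candidate is always s itself, so the largest allowed threshold is included in the set.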
Step 44: for each alternative transmission threshold in the threshold set, determine whether the alternative transmission threshold is less than a preset value; if so, filter it out of the threshold set.
In order to use transmission resources reasonably and shrink the search space, smaller alternative transmission thresholds can also be rejected, i.e., alternative transmission thresholds smaller than a preset value are filtered out of the threshold set; the preset value can be s/1000.
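The filtering of step 44 with the suggested floor of s/1000 might be sketched as follows (the function name is an illustrative assumption):

```python
# Sketch of step 44: candidates below the preset floor (here s / 1000) are
# filtered out of the threshold set to shrink the search space.

def filter_candidates(candidates, s):
    floor = s / 1000
    return [t for t in candidates if t >= floor]

print(filter_candidates([3, 250, 4000], s=6000))   # [250, 4000]
```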
Step 44 can be carried out before or after step 43 and/or step 42, or simultaneously with step 42 and/or step 43. Generally speaking, the filtering judgment on alternative transmission thresholds can be carried out once after each alternative transmission threshold is determined, or after a part or all of the alternative transmission thresholds have been determined; the order of step 44 relative to the other steps of determining alternative transmission thresholds is not specifically limited in this application.
Through the above steps 41 to 44, each alternative transmission threshold in the threshold set is determined.
Based on the same inventive concept as the above gradient transmission method, an embodiment of the present application further provides a distributed training system. The distributed training system includes multiple neural network models and multiple transmission modules; the neural network model is specifically the neural network model 211 or 221 in Fig. 1B, and the transmission module is specifically the transmission module 212 or 222 in Fig. 1B.
The neural network model is configured to obtain, in one iteration and according to the input training data, the gradients of the weights corresponding to the i-th layer of neurons of the neural network model, where i is a positive integer no greater than n.
The transmission module is configured to send the gradients of the weights corresponding to the i-th layer of neurons of the neural network model to the gradient buffer of the neural network model; judge whether the quantity of gradients stored in the gradient buffer of the neural network model exceeds the determined transmission threshold; and, according to the judgment result, send the gradients stored in the gradient buffer of the neural network model to the gradient collection module.
The neural network model is further configured to obtain the gradient mean value of the weights corresponding to its i-th layer of neurons, the gradient mean value being obtained according to the gradients sent by the multiple neural network models and stored in the gradient collection module; and to update the weights corresponding to its i-th layer of neurons according to that gradient mean value, so as to execute the next iteration of the neural network model.
Exemplarily, the neural network model is further configured to obtain, through m iterations, the transmission duration corresponding to each of the m alternative transmission thresholds in the transmission threshold set; determine the transmission threshold according to the m transmission durations; and trigger the transmission module to execute the step of judging whether the quantity of gradients stored in the gradient buffer of the neural network model exceeds the determined transmission threshold.
Exemplarily, when determining the transmission threshold according to the m transmission durations, the neural network model is specifically configured to select, as the transmission threshold, the alternative transmission threshold corresponding to the shortest of the m transmission durations.
Exemplarily, each of the m alternative transmission thresholds is not less than the minimum of n totals, one of the n totals being the total number of weights corresponding to one layer of neurons among the n layers of neurons of the neural network model.
Based on the same inventive concept as the above gradient transmission method, as shown in Fig. 5, an embodiment of the present application further provides a distributed training system 500. The distributed training system includes multiple training devices; each training device includes n layers of neurons, every layer of neurons corresponds to at least one weight, n is a positive integer, and each training device includes a processor and a memory. Illustratively, only three training devices of the distributed training system 500 are drawn in Fig. 5. As shown in Fig. 5, the distributed training system includes a training device 50, a training device 51, and a training device 52, each of which includes a processor and a memory: the training device 50 includes a processor 501 and a memory 502, the training device 51 includes a processor 511 and a memory 512, and the training device 52 includes a processor 521 and a memory 522. The memory in each training device of the distributed training system 500 is used for storing computer instructions, and the processor in the training device executes the computer instructions in the memory to realize the devices and modules in the above centralized distributed training system 200 or in the above decentralized distributed training system 200. Specifically, the processor 501 and the processor 511 are used to realize the neural network model 211 and the transmission module 212 in the distributed training system 200 in Fig. 1B, and the gradient buffer in the neural network model 211 is realized by the memory 502 and the memory 512; if the gradient collection module 230 in the distributed training system 200 in Fig. 1B is realized by the processor 521, the distributed training system 500 is centralized. In the decentralized distributed training system 500, the processor 501, the processor 511, and the processor 521 are used to realize the neural network model 211, the transmission module 212, and the gradient collection module 2301 in the distributed training system 200 in Fig. 1C, and the gradient buffer in the neural network model 211 is realized by the memory 502, the memory 512, and the memory 522.
A computing device in the distributed training system 500 can also include a communication interface. For example, the computing device 50 includes a communication interface 503, and the computing device 51 includes a communication interface 513. A computing device realizes communication through its communication interface.
The processor can be a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), or any combination of the three.
The processor can further include a hardware chip or another general-purpose processor. The above hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc., or any combination thereof. The general-purpose processor can be a microprocessor, or the processor can be any conventional processor, etc.
It should also be understood that the memory referred to in the embodiments of the present application can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories. The non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory can be a random access memory (RAM), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM). It should be noted that the memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
"And/or" in this application describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
"Multiple" in this application refers to two or more.
Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a system, or a computer program product. Therefore, the present application can take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can guide a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instruction device realizing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
Claims (10)
1. A method of gradient transmission, the method being applied to a distributed training system of a neural network model, characterized in that the distributed training system includes multiple neural network models, each neural network model includes n layers of neurons, and every layer of neurons corresponds to at least one weight, wherein n is a positive integer, the method comprising:
obtaining, by each neural network model in one iteration and according to input training data, the gradient of the weight corresponding to the i-th layer of neurons of the neural network model, wherein i is a positive integer no greater than n;
sending the gradient of the weight corresponding to the i-th layer of neurons of the neural network model to the gradient buffer of the neural network model;
judging whether the quantity of gradients stored in the gradient buffer of the neural network model exceeds a determined transmission threshold;
sending, according to the judgment result, the gradients stored in the gradient buffer of the neural network model to a gradient collection module;
obtaining the gradient mean value of the weight corresponding to the i-th layer of neurons of the neural network model, the gradient mean value being obtained according to the gradients sent by the multiple neural network models and stored in the gradient collection module; and
updating the weight corresponding to the i-th layer of neurons of the neural network model according to the gradient mean value of the weight corresponding to the i-th layer of neurons of the neural network model, so as to execute a next iteration of the neural network model.
2. The method according to claim 1, characterized in that, before judging whether the quantity of gradients stored in the gradient buffer of the neural network model exceeds the determined transmission threshold, the method further comprises:
obtaining, through m iterations, a transmission duration corresponding to each of m alternative transmission thresholds in a transmission threshold set; and
determining the transmission threshold according to the m transmission durations.
3. The method according to claim 2, characterized in that determining the transmission threshold according to the m transmission durations comprises:
selecting, as the transmission threshold, the alternative transmission threshold corresponding to the shortest of the m transmission durations.
4. The method according to any one of claims 2-3, characterized in that each alternative transmission threshold is not less than the number of weights corresponding to any layer of neurons of the neural network model.
5. A distributed training system, wherein the distributed training system comprises a plurality of neural network models and a transmission module, each neural network model comprises n layers of neurons, and each layer of neurons corresponds to at least one weight, where n is a positive integer;
the neural network model is configured to: obtain, in one iteration and according to input training data, the gradients of the weights corresponding to the i-th layer of neurons of the neural network model, where i is a positive integer not greater than n; and send the gradients of the weights corresponding to the i-th layer of neurons of the neural network model to a gradient buffer of the neural network model;
the transmission module is configured to: determine whether the quantity of gradients stored in the gradient buffer of the neural network model exceeds a determined transmission threshold; and, according to the determination result, send the gradients stored in the gradient buffer of the neural network model to a gradient collection module;
the neural network model is further configured to: obtain the gradient mean of the weights corresponding to the i-th layer of neurons of the neural network model, computed from the gradients sent by the plurality of neural network models and stored in the gradient collection module; and update the weights corresponding to the i-th layer of neurons of the neural network model according to the gradient mean, so as to execute the next iteration of the neural network model.
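The collection step in claim 5 reduces to an element-wise average of the i-th-layer gradients contributed by the model replicas, followed by a weight update. A minimal sketch under those assumptions (the claims do not specify the update rule, so the plain gradient-descent step and its learning rate below are illustrative):

```python
def average_gradients(replica_grads):
    """Gradient collection module (sketch): replica_grads is a list with
    one entry per model replica, each an equal-length list of gradient
    values for the same layer; return their element-wise mean."""
    n = len(replica_grads)
    return [sum(vals) / n for vals in zip(*replica_grads)]

def update_weights(weights, grad_mean, lr=0.1):
    """Apply one plain gradient-descent step with the averaged gradient
    (the learning rate is an assumption, not part of the claims)."""
    return [w - lr * g for w, g in zip(weights, grad_mean)]
```

Because every replica applies the same averaged gradient, the replicas stay synchronized: each starts the next iteration from identical weights, which is what makes the per-layer buffering and batched transmission transparent to the training algorithm.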
6. The distributed training system according to claim 5, wherein the transmission module is further configured to: obtain, over m iterations, the transmission duration corresponding to each of m candidate transmission thresholds in a transmission threshold set; and determine the transmission threshold according to the m transmission durations.
7. The distributed training system according to claim 6, wherein, when determining the transmission threshold according to the m transmission durations, the transmission module is specifically configured to:
select, as the transmission threshold, the candidate transmission threshold corresponding to the shortest of the m transmission durations.
8. The distributed training system according to any one of claims 6-7, wherein each candidate transmission threshold is not less than the number of weights corresponding to any layer of neurons of the neural network model.
9. A distributed training system, wherein the distributed training system comprises at least one training device, the training device comprising a processor and a memory;
the memory is configured to store computer instructions; and
the processor is configured to execute the computer instructions in the memory to implement the method according to any one of claims 1-4.
10. A non-volatile computer-readable storage medium, wherein the storage medium stores computer instructions, and the computer instructions are executed by a processor to implement the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910101338.9A CN109919313B (en) | 2019-01-31 | 2019-01-31 | Gradient transmission method and distributed training system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109919313A true CN109919313A (en) | 2019-06-21 |
CN109919313B CN109919313B (en) | 2021-06-08 |
Family
ID=66961321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910101338.9A Active CN109919313B (en) | 2019-01-31 | 2019-01-31 | Gradient transmission method and distributed training system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109919313B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105825269A (en) * | 2016-03-15 | 2016-08-03 | 中国科学院计算技术研究所 | Parallel autoencoder based feature learning method and system |
US20160342887A1 (en) * | 2015-05-21 | 2016-11-24 | minds.ai inc. | Scalable neural network system |
US20160350651A1 (en) * | 2015-05-29 | 2016-12-01 | North Carolina State University | Automatically constructing training sets for electronic sentiment analysis |
CN106297774A (en) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | Distributed parallel training method and system for a neural network acoustic model |
CN106372402A (en) * | 2016-08-30 | 2017-02-01 | 中国石油大学(华东) | Parallelization method of convolutional neural networks in fuzzy region under big-data environment |
CN106991478A (en) * | 2016-01-20 | 2017-07-28 | 南京艾溪信息科技有限公司 | Apparatus and method for performing artificial neural network reverse training |
CN107018184A (en) * | 2017-03-28 | 2017-08-04 | 华中科技大学 | Distributed deep neural network cluster packet synchronization optimization method and system |
US20170286830A1 (en) * | 2016-04-04 | 2017-10-05 | Technion Research & Development Foundation Limited | Quantized neural network training and inference |
CN107301454A (en) * | 2016-04-15 | 2017-10-27 | 北京中科寒武纪科技有限公司 | Artificial neural network reverse training apparatus and method supporting discrete data representation |
CN107316078A (en) * | 2016-04-27 | 2017-11-03 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing artificial neural network self-learning computation |
JP2018036779A (en) * | 2016-08-30 | 2018-03-08 | 株式会社東芝 | Electronic device, method, and information processing system |
CN108122032A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | Neural network model training method, apparatus, chip and system |
CN108304918A (en) * | 2018-01-18 | 2018-07-20 | 中兴飞流信息科技有限公司 | Parameter exchange method and system for data-parallel deep learning |
WO2018155232A1 (en) * | 2017-02-23 | 2018-08-30 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
CN109102075A (en) * | 2018-07-26 | 2018-12-28 | 联想(北京)有限公司 | Gradient update method and related device in distributed training |
Non-Patent Citations (2)
Title |
---|
刘弘一 (Liu, Hongyi): "Research on Parallel Face Recognition Methods Based on Open-Source Deep Learning Frameworks", China Master's Theses Full-text Database, Information Science and Technology Series * |
张函 (Zhang, Han): "Research on GPU-Based Model Parallelism and Optimization Methods for Deep Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110619388A (en) * | 2019-09-20 | 2019-12-27 | 北京金山数字娱乐科技有限公司 | Gradient synchronization method and device in distributed training |
CN110619388B (en) * | 2019-09-20 | 2024-04-02 | 北京金山数字娱乐科技有限公司 | Gradient synchronization method and device in distributed training |
CN113469355A (en) * | 2020-03-30 | 2021-10-01 | 亚马逊技术股份有限公司 | Multi-model training pipeline in distributed system |
CN113469355B (en) * | 2020-03-30 | 2024-03-15 | 亚马逊技术股份有限公司 | Multi-model training pipeline in distributed system |
WO2021232907A1 (en) * | 2020-05-22 | 2021-11-25 | 华为技术有限公司 | Neural network model training apparatus and method, and related device |
CN111723933A (en) * | 2020-06-03 | 2020-09-29 | 上海商汤智能科技有限公司 | Training method of neural network model and related product |
WO2021244354A1 (en) * | 2020-06-03 | 2021-12-09 | 上海商汤智能科技有限公司 | Training method for neural network model, and related product |
CN111723933B (en) * | 2020-06-03 | 2024-04-16 | 上海商汤智能科技有限公司 | Training method of neural network model and related products |
CN111756602A (en) * | 2020-06-29 | 2020-10-09 | 上海商汤智能科技有限公司 | Communication timeout detection method in neural network model training and related product |
CN113282933A (en) * | 2020-07-17 | 2021-08-20 | 中兴通讯股份有限公司 | Federated learning method, device and system, electronic equipment and storage medium |
CN113282933B (en) * | 2020-07-17 | 2022-03-01 | 中兴通讯股份有限公司 | Federated learning method, device and system, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109919313B (en) | 2021-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109919313A (en) | Gradient transmission method and distributed training system | |
US20230252327A1 (en) | Neural architecture search for convolutional neural networks | |
CN109948029B (en) | Neural network self-adaptive depth Hash image searching method | |
Chuang et al. | The annealing robust backpropagation (ARBP) learning algorithm | |
CN108021983A (en) | Neural architecture search | |
CN109690576A (en) | Training machine learning models on multiple machine learning tasks | |
CN109711534A (en) | Dimensionality reduction model training method, device and electronic equipment | |
CN108280207A (en) | Method for constructing a perfect hash | |
CN111461284A (en) | Data discretization method, device, equipment and medium | |
CN108197307A (en) | Text feature selection method and system | |
KR20080052940A (en) | Method for controlling game character | |
JP7073171B2 (en) | Learning equipment, learning methods and programs | |
US6813390B2 (en) | Scalable expandable system and method for optimizing a random system of algorithms for image quality | |
CN109697511B (en) | Data reasoning method and device and computer equipment | |
CN110610231A (en) | Information processing method, electronic equipment and storage medium | |
CN113705724B (en) | Batch learning method of deep neural network based on self-adaptive L-BFGS algorithm | |
CN108965016A (en) | Virtual network mapping method and device | |
CN113487870A (en) | Method for generating anti-disturbance to intelligent single intersection based on CW (continuous wave) attack | |
CN114417999A (en) | Pedestrian re-identification method based on federal split learning | |
JP6926045B2 (en) | Neural networks, learning devices, learning methods, and programs | |
CN113554169A (en) | Model optimization method and device, electronic equipment and readable storage medium | |
CN115210717A (en) | Hardware optimized neural architecture search | |
CN113297310A (en) | Method for selecting block chain fragmentation verifier in Internet of things | |
WO2020087254A1 (en) | Optimization method for convolutional neural network, and related product | |
WO2002080563A2 (en) | Scalable expandable system and method for optimizing a random system of algorithms for image quality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20220214 Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province Patentee after: Huawei Cloud Computing Technology Co.,Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd. |