CN110472731A - Gradient synchronization method and apparatus for distributed training - Google Patents
Gradient synchronization method and apparatus for distributed training
- Publication number
- CN110472731A CN110472731A CN201910760056.XA CN201910760056A CN110472731A CN 110472731 A CN110472731 A CN 110472731A CN 201910760056 A CN201910760056 A CN 201910760056A CN 110472731 A CN110472731 A CN 110472731A
- Authority
- CN
- China
- Prior art keywords
- training
- gradient
- sub
- node
- accumulation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The application provides a gradient synchronization method and apparatus for distributed training. The method comprises: grouping the training data on each training node in a distributed training cluster to obtain multiple sub-training datasets on each training node, wherein the training nodes in the cluster are connected in a ring; computing the sub-training accumulated gradient of each sub-training dataset on the training nodes of the distributed training cluster; obtaining, from the sub-training accumulated gradients, the corresponding sub-training aggregated gradient; and synchronizing the sub-training aggregated gradient to each training node of the distributed training cluster. By accumulating the gradients of the sub-training datasets, the number of gradient synchronizations and the communication frequency are reduced, which speeds up model training.
Description
Technical field
This application relates to the field of computer technology, and in particular to a gradient synchronization method and apparatus for distributed training, a computing device, a computer-readable storage medium, and a chip.
Background
With the rapid development of computer technology, deep learning has also advanced quickly. As deep learning has matured, increasingly complex algorithms have been developed; these algorithms require large amounts of data and considerable time to train effectively, which has led to the development of distributed training.
In deep-learning model optimization, gradient descent is used to compute gradients and find the minimum of the loss function, thereby training the model and accelerating its convergence. In current distributed training, every completed training step requires the transfer and synchronization of gradient information so that gradients can be shared across the distributed training nodes to find the minimum loss. The high frequency and large volume of gradient transfers therefore make model training time-consuming, seriously delaying the speed of training.
Therefore, how to alleviate this problem has become an urgent issue.
Summary of the invention
In view of this, embodiments of the present application provide a gradient synchronization method and apparatus for distributed training, a computing device, a computer-readable storage medium, and a chip, so as to overcome the technical deficiencies of the prior art.
According to a first aspect of the embodiments of the present application, a gradient synchronization method for distributed training is provided, comprising:
grouping the training data on each training node in a distributed training cluster to obtain multiple sub-training datasets on each training node, wherein the training nodes in the cluster are connected in a ring;
computing the sub-training accumulated gradient of each sub-training dataset on the training nodes of the distributed training cluster;
obtaining, from the sub-training accumulated gradients, the corresponding sub-training aggregated gradient; and
synchronizing the sub-training aggregated gradient to each training node of the distributed training cluster.
According to a second aspect of the embodiments of the present application, a gradient synchronization apparatus for distributed training is provided, comprising:
a grouping module, configured to group the training data on each training node in a distributed training cluster to obtain multiple sub-training datasets on each training node, wherein the training nodes in the cluster are connected in a ring;
a computing module, configured to compute the sub-training accumulated gradient of each sub-training dataset on the training nodes of the distributed training cluster;
an accumulation module, configured to obtain, from the sub-training accumulated gradients, the corresponding sub-training aggregated gradient; and
a synchronization module, configured to synchronize the sub-training aggregated gradient to each training node of the distributed training cluster.
According to a third aspect of the embodiments of the present application, a computing device is provided, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the gradient synchronization method for distributed training.
According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, storing computer instructions which, when executed by a processor, implement the steps of the gradient synchronization method for distributed training.
According to a fifth aspect of the embodiments of the present application, a chip is provided, storing computer instructions which, when executed by the chip, implement the steps of the gradient synchronization method for distributed training.
In the gradient synchronization method for distributed training provided by the present application, the training data on each training node in the distributed training cluster is grouped to obtain multiple sub-training datasets on each node, with the training nodes connected in a ring; the sub-training accumulated gradient of each sub-training dataset is computed on the training nodes; the corresponding sub-training aggregated gradient is obtained from the sub-training accumulated gradients; and the sub-training aggregated gradient is synchronized to each training node of the cluster. During model training, the gradients of multiple computation steps are accumulated before synchronization, which significantly reduces the communication frequency and transfer time of gradient information, speeding up model training and improving its efficiency.
Brief description of the drawings
Fig. 1 is a structural block diagram of a computing device provided by an embodiment of the present application;
Fig. 2 is a flowchart of a gradient synchronization method for distributed training provided by an embodiment of the present application;
Fig. 3 is a flowchart of a method for computing a sub-training accumulated gradient provided by an embodiment of the present application;
Fig. 4 is a flowchart of a gradient synchronization method for distributed training provided by another embodiment of the present application;
Fig. 5 is a structural schematic diagram of a distributed training cluster provided by an embodiment of the present application;
Fig. 6 is a structural schematic diagram of a gradient synchronization apparatus for distributed training provided by an embodiment of the present application.
Specific embodiments
Many specific details are set forth in the following description to facilitate a full understanding of the application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a", "said", and "the" used in the embodiments and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of the application to describe various pieces of information, the information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the embodiments, "first" may also be referred to as "second", and similarly "second" may be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the terminology involved in one or more embodiments of the invention is explained.
Gradient: a gradient is a vector indicating that the directional derivative of a function at a given point attains its maximum value along the gradient direction, i.e., the function changes fastest, with the greatest rate of change, along that direction at that point. In model training, the gradient is used to find the minimum of the loss function, train the model, and accelerate its convergence. The number of training steps equals the number of gradient steps.
Gradient descent: an optimization algorithm, also commonly called the method of steepest descent. Gradient descent is one of the most common methods for solving unconstrained optimization problems and is now widely used in machine learning to iteratively approach a minimum-error model. In particular, it provides the theoretical basis for the backpropagation algorithm in neural networks.
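As a brief illustration of the gradient-descent definition above (purely illustrative and not part of the claimed method; the loss function and learning rate are made up for the demo), a one-dimensional descent can be sketched as:

```python
# Minimize the loss f(w) = (w - 3)^2, whose gradient is f'(w) = 2*(w - 3).
# Stepping opposite the gradient direction drives w toward the minimizer.

def gradient(w):
    return 2.0 * (w - 3.0)

def gradient_descent(w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * gradient(w)  # move against the gradient
    return w

w_min = gradient_descent(0.0)
print(round(w_min, 4))  # converges toward the minimizer w = 3
```

Each iteration here is one "gradient step" in the sense used above.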
Gradient accumulation: accumulating the gradients of multiple training steps together.
Distributed training: a method of training using multiple training nodes.
Sub-training accumulated gradient: the gradient of the training data on each training node of the distributed training cluster is computed, and the gradients of multiple computations are accumulated to obtain the sub-training accumulated gradient.
Sub-training aggregated gradient: the corresponding sub-training accumulated gradients of all nodes in the distributed training cluster are summed to obtain the sub-training aggregated gradient.
This application provides a gradient synchronization method and apparatus for distributed training, a computing device, a computer-readable storage medium, and a chip, which are described in detail one by one in the following embodiments.
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is connected to the memory 110 by a bus 130, and a database 150 is used for storing data.
The computing device 100 further includes an access device 140 that enables the computing device 100 to communicate via one or more networks 160. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.
In an embodiment of the present application, the above components of the computing device 100, as well as other components not shown in Fig. 1, may also be connected to each other, for example by a bus. It should be understood that the block diagram of the computing device shown in Fig. 1 is for illustrative purposes only and does not limit the scope of the application. Those skilled in the art may add or replace components as needed.
The computing device 100 can be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop, a notebook, a netbook, etc.), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smartwatch, smart glasses, etc.) or another type of mobile device, or a stationary computing device such as a desktop computer or a PC. The computing device 100 can also be a mobile or stationary server.
The processor 120 can execute the steps of the gradient synchronization method for distributed training shown in Fig. 2. Fig. 2 shows a flowchart of a gradient synchronization method for distributed training according to an embodiment of the present application, comprising steps 202 to 208.
Step 202: group the training data on each training node in the distributed training cluster to obtain multiple sub-training datasets on each training node, wherein the training nodes in the cluster are connected in a ring.
The distributed training cluster used in distributed training has multiple training nodes, each holding an identical copy of the model to be trained. All training samples are evenly distributed across the training nodes, and each node trains the model independently on its own training samples. The training nodes in the cluster are connected in a ring: each training node is assigned a corresponding preceding node and succeeding node. Each node completes its own training, computes its gradient, transfers the gradient to its succeeding node, and receives the gradient from its preceding node.
A sub-training dataset is one of the multiple groups obtained by grouping the training data on a training node.
Optionally, the training data on each training node is grouped according to the number of training nodes in the distributed training cluster to obtain multiple sub-training datasets on each node, wherein the number of sub-training datasets obtained equals the number of training nodes in the cluster.
If the number of training nodes in the cluster is n, the training data on each node is evenly and randomly divided into n groups, yielding n sub-training datasets, so that the number of sub-training datasets on each node equals the number of training nodes in the cluster. Randomly and evenly grouping the training data on each node preserves the randomness of the training data and ensures load balancing across the training nodes of the distributed training cluster.
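The even, random grouping described above can be sketched as follows (an illustrative sketch only; split_into_groups is a hypothetical helper, not from the patent):

```python
import random

def split_into_groups(samples, n, seed=0):
    """Evenly and randomly split a node's training samples into n
    sub-training datasets (one per training node in the ring)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # random grouping
    return [shuffled[i::n] for i in range(n)]  # round-robin -> even sizes

groups = split_into_groups(range(12), 3)
print([len(g) for g in groups])  # [4, 4, 4]
```

The round-robin slicing keeps group sizes within one sample of each other, matching the "evenly and randomly divided" requirement.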
Step 204: compute the sub-training accumulated gradient of each sub-training dataset on the training nodes of the distributed training cluster.
On each training node, the model is trained multiple times with each sub-training dataset; the sub-training gradient of each training step is computed, and the sub-training gradients of the multiple steps are accumulated into a sub-training accumulated gradient.
Gradients are closely tied to the training data: different data yields different computed gradients. Computing the sub-training gradients from the randomly grouped sub-training datasets on each node preserves the randomness of the sub-training gradients, which helps find the minimum loss quickly and accelerates model convergence.
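The per-node accumulation of step 204 can be sketched like this (illustrative only; compute_gradient is a stand-in for a real backward pass, and the batch values are made up for the demo):

```python
# Instead of synchronizing after every step, the gradients of several
# training steps are summed locally; only the sum is later communicated.

def compute_gradient(batch):
    # Stand-in for a real backward pass: here, just the batch mean.
    return sum(batch) / len(batch)

def accumulated_gradient(sub_dataset_batches):
    acc = 0.0
    for batch in sub_dataset_batches:    # one "training step" per batch
        acc += compute_gradient(batch)   # accumulate instead of syncing
    return acc

batches = [[1.0, 3.0], [2.0, 4.0]]
print(accumulated_gradient(batches))  # 2.0 + 3.0 = 5.0
```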
Optionally, referring to Fig. 3, step 204 can be realized by the following steps 302 to 306.
Step 302: obtain the preset gradient accumulation step count.
The number of steps over which gradients are accumulated after each round of training is the gradient accumulation step count; it is set in advance and must be obtained here.
In the embodiment provided by the present application, on each training node the gradients of 5 steps are accumulated after training, so the preset gradient accumulation step count obtained is 5.
Step 304: compute the sub-training gradients of the sub-training dataset over the accumulation steps.
On a given training node, the sub-training gradient of each group's sub-training dataset is computed for each of the accumulation steps.
In the embodiment provided by the present application, continuing the example above, on the first training node the first sub-training dataset is denoted d0; the sub-training gradient computed in the first step is denoted g1, in the second step g2, in the third step g3, in the fourth step g4, and in the fifth step g5.
Step 306: accumulate the sub-training gradients over the accumulation steps to obtain the sub-training accumulated gradient.
The sub-training gradients within the accumulation steps are summed to obtain the sub-training accumulated gradient.
In the embodiment provided by the present application, continuing the example, the sub-training accumulated gradient corresponding to the first sub-training dataset d0 is a0, where a0 = g1 + g2 + g3 + g4 + g5.
Step 206: obtain, from the sub-training accumulated gradients, the corresponding sub-training aggregated gradient.
Optionally, the sub-training accumulated gradients are summed across the training nodes of the distributed training cluster to obtain the sub-training aggregated gradient corresponding to each sub-training accumulated gradient.
The sub-training accumulated gradients corresponding to the sub-training datasets are summed across the training nodes. For example, if the distributed training cluster has 5 training nodes and each node has 5 sub-training datasets, the sub-training accumulated gradients of the 1st sub-training dataset on each node are summed, then those of the 2nd sub-training dataset, and so on, up to the sub-training accumulated gradients of the 5th sub-training dataset.
Step 208: synchronize the sub-training aggregated gradient to each training node of the distributed training cluster.
Optionally, the sub-training aggregated gradients are synchronized in turn, in a preset order, to the corresponding sub-training accumulated gradients of each training node.
Since the training nodes of the cluster are connected in a ring, the preset order can be clockwise or counterclockwise. The sub-training aggregated gradients on the training nodes are synchronized, in the preset order, to each training node of the distributed training cluster.
In the gradient synchronization method for distributed training provided by this embodiment, the sub-training gradients of the sub-training datasets are computed, the gradients of multiple steps are accumulated into sub-training accumulated gradients, and only then is gradient information transferred between the training nodes of the cluster. This reduces the communication frequency and the number of gradient transfers, effectively alleviating the time cost of gradient transfer during model training and saving training time.
Fig. 4 shows a gradient synchronization method for distributed training according to an embodiment of the present application; the method is described taking gradient synchronization in a distributed training cluster containing 3 training nodes as an example, and comprises steps 402 to 420.
Step 402: group the training data on each training node in the distributed training cluster to obtain multiple sub-training datasets on each training node, wherein the training nodes in the cluster are connected in a ring.
Step 404: compute the sub-training accumulated gradient of each sub-training dataset on the training nodes of the distributed training cluster.
Steps 402 to 404 are consistent with steps 202 to 204 above; for their specific explanation, refer to the details of steps 202 to 204 in the previous embodiment, which are not repeated here.
Fig. 5 shows the structural schematic diagram of the distributed training cluster provided by an embodiment of the present application; the explanation uses three training nodes in the cluster, with three groups of sub-training datasets on each training node.
The training nodes in Fig. 5 are connected in a ring, and each training node in Fig. 5 is assigned a corresponding preceding node and succeeding node: the preceding node of training node 0 is training node 2, and the succeeding node of training node 0 is training node 1; the preceding node of training node 1 is training node 0, and the succeeding node of training node 1 is training node 2; the preceding node of training node 2 is training node 1, and the succeeding node of training node 2 is training node 0.
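The ring wiring described for Fig. 5 can be expressed compactly (an illustrative sketch; ring_neighbors is a hypothetical helper, not from the patent):

```python
# With n nodes in a ring, node i has predecessor (i - 1) % n and
# successor (i + 1) % n, which reproduces the Fig. 5 assignments.

def ring_neighbors(i, n):
    return (i - 1) % n, (i + 1) % n  # (preceding node, succeeding node)

n = 3
for i in range(n):
    prev_node, next_node = ring_neighbors(i, n)
    print(f"node {i}: prev={prev_node}, next={next_node}")
# node 0: prev=2, next=1
# node 1: prev=0, next=2
# node 2: prev=1, next=0
```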
Step 406: determine whether the i-th sub-training accumulated gradient on the current training node is the starting sub-training accumulated gradient. If so, execute step 408; if not, execute step 410.
Here i is a positive integer. The starting sub-training accumulated gradient is the one from which gradient synchronization begins: on each training node of the distributed training cluster, each node corresponds to one sub-training accumulated gradient, and gradient synchronization starts from it. The starting sub-training accumulated gradient is the first sub-training accumulated gradient to begin gradient synchronization. The i-th sub-training accumulated gradient is the starting one on only a single training node of the cluster. For example, the 2nd sub-training accumulated gradient is the starting sub-training accumulated gradient only on training node 1 of the cluster; on the other training nodes of the cluster it is not treated as a starting sub-training accumulated gradient.
Further preferably, each training node automatically designates one starting sub-training accumulated gradient, whose index matches the node's own number. This guarantees load balancing in the distributed training cluster, makes full use of the training nodes, and saves resources.
In the embodiment provided by the present application, referring to Table 1, a0 on training node 0, b1 on training node 1, and c2 on training node 2 are the starting sub-training accumulated gradients; therefore step 408 is executed for a0, b1, and c2, and step 410 is executed for the other sub-training accumulated gradients.
Table 1
Step 408: send the starting sub-training accumulated gradient to the succeeding training node.
In the embodiment provided by the present application, referring to Table 1, the starting sub-training accumulated gradient a0 on training node 0 is sent to training node 1, the starting sub-training accumulated gradient b1 on training node 1 is sent to training node 2, and the starting sub-training accumulated gradient c2 on training node 2 is sent to training node 0.
Step 410: upon receiving the i-th sub-training accumulated gradient sent by the preceding training node, sum the i-th sub-training accumulated gradient of the preceding node with the i-th sub-training accumulated gradient of the current node to obtain the summed sub-training aggregated gradient.
For a sub-training accumulated gradient on a training node that is not a starting one, the node receives the sub-training accumulated gradient sent by its preceding node and sums it with its own sub-training accumulated gradient, obtaining the summed sub-training aggregated gradient.
In the embodiment provided by the present application, referring to Table 2, training node 0 receives the 3rd sub-training accumulated gradient c2 sent by training node 2 and sums it with its own 3rd sub-training accumulated gradient c0, obtaining the summed gradient c2+c0. By analogy, the summed gradient on training node 1 is a0+a1, and the summed gradient on training node 2 is b1+b2.
Table 2
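The summation round above can be simulated with scalar stand-ins for the gradients (illustrative only; the values and the reduce_round helper are made up for the demo, with node i sending its designated chunk to node (i+1) % n):

```python
# grads[node][chunk]: sub-training accumulated gradients; the names
# a0..c2 from the embodiment map to chunks 0..2 on nodes 0..2.
grads = {
    0: [1.0, 10.0, 100.0],   # a0, b0, c0
    1: [2.0, 20.0, 200.0],   # a1, b1, c1
    2: [4.0, 40.0, 400.0],   # a2, b2, c2
}
n = 3

def reduce_round(grads, sent_chunk_of):
    """Each node i sends chunk sent_chunk_of[i] to node (i + 1) % n,
    which adds it into its own copy of that chunk."""
    new = {i: list(v) for i, v in grads.items()}
    for i in range(n):
        k = sent_chunk_of[i]
        new[(i + 1) % n][k] += grads[i][k]
    return new

# Round 1: node 0 sends chunk 0 (a0), node 1 chunk 1 (b1), node 2 chunk 2 (c2).
after = reduce_round(grads, sent_chunk_of=[0, 1, 2])
print(after[0][2])  # c2 + c0 = 400.0 + 100.0 = 500.0
print(after[1][0])  # a0 + a1 = 1.0 + 2.0 = 3.0
print(after[2][1])  # b1 + b2 = 20.0 + 40.0 = 60.0
```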
Step 412: determine whether the summed sub-training aggregated gradient is the final sub-training aggregated gradient; if not, execute step 414; if so, execute step 416.
The final sub-training aggregated gradient is the gradient obtained after the sub-training accumulated gradients on the training nodes have been summed the required number of times: the number of summations is the number of training nodes in the cluster minus 1. That is, when the cluster has n training nodes, the final sub-training aggregated gradient is the gradient obtained after n-1 summations. When the summed gradient is not yet the final sub-training aggregated gradient, step 414 is executed; when it is, step 416 is executed.
Step 414: send the summed sub-training aggregated gradient to the succeeding training node.
In the embodiment provided by the present application, the cluster has 3 training nodes, so the final sub-training aggregated gradient should be the gradient obtained after 2 summations. Referring to Table 2, only 1 summation has been performed at this point; a0+a1, b1+b2, and c2+c0 are judged not to be final, so each summed gradient must be sent to the succeeding training node to continue the summation of the sub-training accumulated gradients.
Step 416: stop summing and obtain the final sub-training aggregated gradient.
In the embodiment provided by the present application, referring to Table 3, b1+b2+b0 on training node 0 is the gradient obtained after two summations and is therefore the final sub-training aggregated gradient; likewise, c2+c0+c1 on training node 1 and a0+a1+a2 on training node 2 are final sub-training aggregated gradients.
Table 3
Step 418: a preset training node receives the synchronization information issued by each training node of the distributed training cluster, and issues to each training node of the cluster the instruction to synchronize the sub-training aggregated gradients.
In a ring-connected distributed training cluster there is no node dedicated solely to receiving synchronization information, so a training node is designated in advance to receive and send synchronization information. This node also itself participates in the receiving and sending of gradient information, so that gradient information flows and synchronizes around the ring of training nodes in the cluster.
In the embodiment provided by the present application, training node 0 is chosen as the preset training node. After receiving from training node 0, training node 1, and training node 2 the synchronization information indicating that their sub-training aggregated gradients have finished summing and can be synchronized, it issues to training node 0, training node 1, and training node 2 the instruction to synchronize the sub-training aggregated gradients.
Optionally, the sub-training aggregated gradient is compressed to obtain a compressed sub-training aggregated gradient.
Because the sub-training aggregated gradient is the result of summation, its size is relatively large, and a long time would be needed to complete parameter synchronization. The sub-training aggregated gradient can therefore be compressed, yielding the compressed sub-training aggregated gradient.
In the embodiment provided by the present application, the obtained sub-training aggregated gradient is 32-bit and is compressed to 8-bit by gradient compression. Compressing the sub-training aggregated gradient, for example with Deep Gradient Compression (DGC), reduces the parameter size during synchronization, saves communication time, and improves model training efficiency.
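A minimal sketch of 32-bit to 8-bit gradient compression follows (a simple max-abs quantization for illustration only; this is neither the patent's exact compression scheme nor DGC itself, and the helper names are made up):

```python
# Scale gradients so the largest magnitude maps to 127, then round:
# each value then fits in a signed 8-bit integer instead of a float32.

def quantize_int8(grads):
    scale = max(abs(g) for g in grads) / 127.0 or 1.0  # avoid zero scale
    q = [round(g / scale) for g in grads]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

grads = [0.5, -1.27, 0.02]
q, s = quantize_int8(grads)
print(q)  # [50, -127, 2]
```

The receiver only needs the int8 values plus one scale per block, which is what shrinks the synchronized payload.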
Step 420: synchronize the sub-training accumulated gradient to each training node of the distributed training cluster.
Optionally, the compressed sub-training accumulated gradient is synchronized to each training node of the distributed training cluster.
Optionally, the compressed sub-training accumulated gradients are synchronized in sequence within the distributed training cluster until the final sub-training accumulated gradients have been synchronized; the number of synchronization rounds equals the number of training nodes in the distributed training cluster minus 1.
In the embodiment provided by the present application, to achieve load balancing of the distributed training cluster, the final sub-training accumulated gradients on the training nodes are synchronized in sequence within the distributed training cluster. As shown in Table 4, the final sub-training accumulated gradient a0+a1+a2 on training node 2 is synchronized to training node 0, the final sub-training accumulated gradient b1+b2+b0 on training node 0 is synchronized to training node 1, and the final sub-training accumulated gradient c2+c0+c1 on training node 1 is synchronized to training node 2.
Table 4
As shown in Table 5, the final sub-training accumulated gradient a0+a1+a2 on training node 0 is synchronized to training node 1, the final sub-training accumulated gradient b1+b2+b0 on training node 1 is synchronized to training node 2, and the final sub-training accumulated gradient c2+c0+c1 on training node 2 is synchronized to training node 0. At this point, in the present embodiment, the final sub-training accumulated gradients have undergone 2 rounds of synchronization within the distributed training cluster, completing the synchronization of the sub-training accumulated gradients on every training node of the distributed training cluster.
Table 5
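The two synchronization rounds described for Tables 4 and 5 match the all-gather (broadcast) phase of a ring all-reduce: each node starts holding one fully accumulated gradient block and, in N-1 rounds, forwards one block to its successor until every node holds every block. A minimal in-memory sketch follows; the node indexing and the local simulation are illustrative assumptions (a real cluster sends the blocks over the network).

```python
def ring_allgather(blocks):
    """Simulate the N-1 broadcast rounds on a ring of N nodes.

    blocks[i] is the fully accumulated gradient block initially held by
    node i; returns, per node, a dict mapping originating node -> block.
    """
    n = len(blocks)
    held = [{i: blocks[i]} for i in range(n)]  # node i starts with its own block
    for step in range(n - 1):                  # exactly N-1 synchronization rounds
        # collect the sends first so a round uses only last round's state
        sends = [(i, (i - step) % n, held[i][(i - step) % n]) for i in range(n)]
        for i, j, blk in sends:
            held[(i + 1) % n][j] = blk         # pass the block to the next ring node
    return held
```

For three nodes this performs exactly the two rounds of Tables 4 and 5, after which every training node holds all three accumulated blocks.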
In the gradient synchronization method for distributed training provided by the embodiments of the present application, the sub-training gradients of the sub-training data are computed and the training gradients of multiple steps are accumulated into sub-training accumulation gradients, which are then passed among the training nodes of the distributed training cluster. This reduces the communication frequency and the number of gradient-information transfers, effectively alleviating the serious time cost of gradient-information transfer during model training. When synchronizing the sub-training accumulated gradients, they are compressed, and the compressed sub-training accumulated gradients are synchronized among the ring-connected training nodes, shrinking the parameters involved in gradient synchronization, accelerating model training, and saving time.
Corresponding to the method embodiments above, the present application also provides embodiments of a gradient synchronization apparatus for distributed training. Fig. 5 shows a structural schematic diagram of the gradient synchronization apparatus for distributed training according to an embodiment of the application. As shown in Fig. 5, the apparatus includes:
a grouping module 502, configured to group the training data on each training node in a distributed training cluster to obtain multiple pieces of sub-training data on each training node, wherein the training nodes in the distributed training cluster are connected in a ring;
a computing module 504, configured to compute the sub-training accumulation gradient of each piece of sub-training data in the training nodes of the distributed training cluster;
an accumulation module 506, configured to obtain, from the sub-training accumulation gradients, the sub-training accumulated gradients corresponding to the sub-training accumulation gradients;
a synchronization module 508, configured to synchronize the sub-training accumulated gradients to each training node of the distributed training cluster.
Optionally, the grouping module 502 is further configured to group the training data on each training node according to the number of training nodes in the distributed training cluster, obtaining multiple pieces of sub-training data on each training node, wherein the number of pieces of sub-training data obtained equals the number of training nodes in the distributed training cluster.
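The grouping performed by this module can be sketched as follows. The use of `np.array_split` and list-based data is an illustrative assumption; the patent only requires that the number of groups equal the number of training nodes in the ring.

```python
import numpy as np

def group_training_data(node_data, num_nodes):
    """Split one node's training data into num_nodes sub-training groups.

    Each ring node later owns the accumulation of one group, which is
    what makes the per-block ring accumulation possible.
    """
    return np.array_split(np.asarray(node_data), num_nodes)
```

For example, 10 samples on a node in a 3-node ring are split into 3 groups whose sizes differ by at most one.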
Optionally, the computing module 504 is further configured to obtain a preset accumulation-gradient step count, compute the sub-training gradients of the sub-training data within the accumulation-gradient step count, and accumulate the sub-training gradients within the accumulation-gradient step count to obtain the sub-training accumulation gradient.
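The multi-step accumulation performed by this module can be sketched as below. `grad_fn`, which maps one sub-batch to its gradient, is a placeholder assumption standing in for a framework's backward pass; the point is that no communication happens until the preset step count is reached.

```python
import numpy as np

def accumulate_local_gradients(sub_batches, grad_fn, accum_steps):
    """Sum the per-step training gradients of accum_steps sub-batches
    into one local sub-training accumulation gradient.

    Synchronization then happens once per accum_steps backward passes
    instead of once per pass, cutting communication frequency.
    """
    accum = None
    for _, batch in zip(range(accum_steps), sub_batches):
        g = grad_fn(batch)                       # local backward pass, no sync
        accum = g if accum is None else accum + g
    return accum
```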
Optionally, the accumulation module 506 is further configured to accumulate the sub-training accumulation gradients across the training nodes of the distributed training cluster, obtaining the sub-training accumulated gradients corresponding to the sub-training accumulation gradients.
Optionally, the accumulation module includes:
a first judgment subunit, configured to judge whether the i-th sub-training accumulation gradient on the current training node is the starting sub-training accumulation gradient, where i is a positive integer;
a first sending subunit, configured to send the starting sub-training accumulation gradient to the next training node;
a receiving subunit, configured to, upon receiving the i-th sub-training accumulation gradient sent by the previous training node, accumulate the i-th sub-training accumulation gradient of the previous training node with the i-th sub-training accumulation gradient of the current training node to obtain an accumulated sub-training accumulated gradient;
a second judgment subunit, configured to judge whether the accumulated sub-training accumulated gradient is the final sub-training accumulated gradient;
an obtaining subunit, configured to stop the accumulation and obtain the sub-training accumulated gradient;
a second sending subunit, configured to send the accumulated sub-training accumulated gradient to the next training node.
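Taken together, these subunits implement the reduce-scatter phase of a ring all-reduce: a starting block is forwarded as-is, every received block is added to the local block before being forwarded, and after N-1 steps each node holds one fully accumulated block. A minimal in-memory sketch follows; the forwarding schedule and scalar blocks are illustrative assumptions (any consistent ring schedule yields the same sums, and arrays work the same way).

```python
def ring_reduce_scatter(node_blocks):
    """Simulate the accumulator subunits' ring pass for N nodes.

    node_blocks[i][j] is node i's local accumulation gradient for block j.
    Returns a dict mapping block index -> fully accumulated block; node i
    ends up owning block (i + 1) % N under this schedule.
    """
    n = len(node_blocks)
    partial = [dict() for _ in range(n)]            # running sums held per node
    for step in range(n - 1):
        for i in range(n):
            j = (i - step) % n                      # block node i forwards this step
            # starting block if nothing received yet, else the received partial sum
            s = partial[i].get(j, node_blocks[i][j])
            nxt = (i + 1) % n
            partial[nxt][j] = node_blocks[nxt][j] + s   # receiver adds its own block
    return {(i + 1) % n: partial[i][(i + 1) % n] for i in range(n)}
```

With three nodes this matches the description: each block travels around the ring once, gathering one local contribution per hop.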
Optionally, the apparatus further includes an instruction receiving-and-sending module, configured so that a preset training node receives the synchronization information issued by each training node in the distributed training cluster and issues to each training node in the distributed training cluster an instruction to synchronize the sub-training accumulated gradients.
Optionally, the synchronization module 508 is further configured to synchronize the sub-training accumulated gradients, in a preset order and in turn, into the sub-training accumulation gradients corresponding to each training node.
Optionally, the synchronization module 508 is further configured to compress the sub-training accumulated gradients to obtain compressed sub-training accumulated gradients, and to synchronize the compressed sub-training accumulated gradients to each training node of the distributed training cluster.
In the gradient synchronization apparatus for distributed training provided by the embodiments of the present application, the sub-training gradients of the sub-training data are computed and the training gradients of multiple steps are accumulated into sub-training accumulation gradients, which are then passed among the training nodes of the distributed training cluster. This reduces the communication frequency and the number of gradient-information transfers, effectively alleviating the serious time cost of gradient-information transfer during model training. When synchronizing the sub-training accumulated gradients, they are compressed, and the compressed sub-training accumulated gradients are synchronized among the ring-connected training nodes, shrinking the parameters involved in gradient synchronization, accelerating model training, and saving time.
An embodiment of the application also provides a computing device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the gradient synchronization method for distributed training.
An embodiment of the application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the gradient synchronization method for distributed training described above.
The above is an exemplary scheme of the computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the gradient synchronization method for distributed training described above belong to the same concept; for details not elaborated in the technical solution of the storage medium, refer to the description of the technical solution of the gradient synchronization method for distributed training above.
An embodiment of the present application discloses a chip storing computer instructions that, when executed by a processor, implement the steps of the gradient synchronization method for distributed training described above.
Specific embodiments of the present application have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that of the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application, certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to help illustrate the application. The alternative embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made in light of the content of this specification. These embodiments were chosen and described in detail in order to better explain the principles and practical applications of the application, so that those skilled in the art can better understand and use it. The application is limited only by the claims and their full scope and equivalents.
Claims (11)
1. A gradient synchronization method for distributed training, characterized by comprising:
grouping the training data on each training node in a distributed training cluster to obtain multiple pieces of sub-training data on each training node, wherein the training nodes in the distributed training cluster are connected in a ring;
computing a sub-training accumulation gradient for each piece of sub-training data in the training nodes of the distributed training cluster;
obtaining, from the sub-training accumulation gradient, a sub-training accumulated gradient corresponding to the sub-training accumulation gradient;
synchronizing the sub-training accumulated gradient to each training node of the distributed training cluster.
2. The gradient synchronization method for distributed training according to claim 1, wherein grouping the training data on each training node in the distributed training cluster to obtain multiple pieces of sub-training data on each training node comprises:
grouping the training data on each training node according to the number of training nodes in the distributed training cluster to obtain multiple pieces of sub-training data on each training node, wherein the number of pieces of sub-training data obtained equals the number of training nodes in the distributed training cluster.
3. The gradient synchronization method for distributed training according to claim 1, wherein computing the sub-training accumulation gradient of each piece of sub-training data in the training nodes of the distributed training cluster comprises:
obtaining a preset accumulation-gradient step count;
computing the sub-training gradients of the sub-training data within the accumulation-gradient step count;
accumulating the sub-training gradients within the accumulation-gradient step count to obtain the sub-training accumulation gradient.
4. The gradient synchronization method for distributed training according to claim 1, wherein obtaining, from the sub-training accumulation gradient, the sub-training accumulated gradient corresponding to the sub-training accumulation gradient comprises:
accumulating the sub-training accumulation gradient across each training node of the distributed training cluster to obtain the sub-training accumulated gradient corresponding to the sub-training accumulation gradient.
5. The gradient synchronization method for distributed training according to claim 4, wherein accumulating the sub-training accumulation gradient across each training node of the distributed training cluster to obtain the sub-training accumulated gradient corresponding to the sub-training accumulation gradient comprises:
judging whether the i-th sub-training accumulation gradient on the current training node is the starting sub-training accumulation gradient, where i is a positive integer;
if so, sending the starting sub-training accumulation gradient to the next training node;
if not, upon receiving the i-th sub-training accumulation gradient sent by the previous training node, accumulating the i-th sub-training accumulation gradient of the previous training node with the i-th sub-training accumulation gradient of the current training node to obtain an accumulated sub-training accumulated gradient;
judging whether the accumulated sub-training accumulated gradient is the final sub-training accumulated gradient;
if so, stopping the accumulation and obtaining the sub-training accumulated gradient;
if not, sending the accumulated sub-training accumulated gradient to the next training node.
6. The gradient synchronization method for distributed training according to claim 1, wherein, before synchronizing the sub-training accumulated gradient to each training node of the distributed training cluster, the method further comprises:
a preset training node receiving the synchronization information issued by each training node in the distributed training cluster, and issuing to each training node in the distributed training cluster an instruction to synchronize the sub-training accumulated gradient.
7. The gradient synchronization method for distributed training according to claim 1, wherein synchronizing the sub-training accumulated gradient to each training node of the distributed training cluster comprises:
synchronizing the sub-training accumulated gradients, in a preset order and in turn, into the sub-training accumulation gradients corresponding to each training node.
8. The gradient synchronization method for distributed training according to claim 7, wherein synchronizing the sub-training accumulated gradients, in a preset order and in turn, into the sub-training accumulation gradients corresponding to each training node comprises:
compressing the sub-training accumulated gradient to obtain a compressed sub-training accumulated gradient;
synchronizing the compressed sub-training accumulated gradient to each training node of the distributed training cluster.
9. A gradient synchronization apparatus for distributed training, characterized by comprising:
a grouping module, configured to group the training data on each training node in a distributed training cluster to obtain multiple pieces of sub-training data on each training node, wherein the training nodes in the distributed training cluster are connected in a ring;
a computing module, configured to compute a sub-training accumulation gradient for each piece of sub-training data in the training nodes of the distributed training cluster;
an accumulation module, configured to obtain, from the sub-training accumulation gradient, a sub-training accumulated gradient corresponding to the sub-training accumulation gradient;
a synchronization module, configured to synchronize the sub-training accumulated gradient to each training node of the distributed training cluster.
10. A computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method according to any one of claims 1 to 8.
11. A computer-readable storage medium storing computer instructions, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910760056.XA CN110472731A (en) | 2019-08-16 | 2019-08-16 | Gradient synchronous method and device during a kind of distribution is trained |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110472731A true CN110472731A (en) | 2019-11-19 |
Family
ID=68511040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910760056.XA Pending CN110472731A (en) | 2019-08-16 | 2019-08-16 | Gradient synchronous method and device during a kind of distribution is trained |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472731A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104463324A (en) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | Convolution neural network parallel processing method based on large-scale high-performance cluster |
Non-Patent Citations (3)
Title |
---|
Andrew Gibiansky, "Bringing HPC Techniques to Deep Learning", blog post * |
Pascal, "Why must gradients be manually zeroed before backpropagation in PyTorch?" (PyTorch中在反向传播前为什么要手动将梯度清零), Zhihu * |
Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally, "Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training", ICLR 2018 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723932A (en) * | 2020-06-03 | 2020-09-29 | 上海商汤智能科技有限公司 | Training method of neural network model and related product |
CN114764601A (en) * | 2022-05-05 | 2022-07-19 | 北京瑞莱智慧科技有限公司 | Gradient data fusion method and device and storage medium |
CN114764601B (en) * | 2022-05-05 | 2024-01-30 | 北京瑞莱智慧科技有限公司 | Gradient data fusion method, device and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | 
| SE01 | Entry into force of request for substantive examination | 
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119