CN109032630A - The update method of global parameter in a kind of parameter server - Google Patents
- Publication number
- CN109032630A (application number CN201810695184.6A)
- Authority
- CN
- China
- Prior art keywords
- parameter
- working node
- global
- global parameter
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/65—Updates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
Abstract
The invention discloses a method for updating the global parameters in a parameter server. The relevant parameters of the parameter server are first set and the global parameters are initialized; training data are then downloaded from a database and pre-processed; the pre-processed training data and the initialized global parameters are used to compute the local parameters of each working node; finally, the local parameters are returned to the parameter server, which iteratively updates the global parameters.
Description
Technical field
The invention belongs to the technical field of image communication and, more specifically, relates to a method for updating the global parameters in a parameter server.
Background art
The goal in designing a distributed machine learning system is acceleration; ideally, the speed-up should be linear: every compute node added should contribute one additional unit of single-machine speed. In practice, however, synchronizing computation tasks or parameters across nodes incurs extra overhead, and this overhead may exceed, even be many times larger than, the computation cost itself. If the system is poorly designed, this overhead prevents training from being accelerated on multiple machines; it can even happen that a machine learning training job run in parallel with several times the computing resources of a single machine turns out to be slower than the single machine.
The parameter server (PS) architecture is in fact similar to the client–server (CS) architecture. PS abstracts two main concepts: the parameter server and the working nodes (clients, or workers). The server holds data, and the compute nodes either send data to the server or request that the server return data. With these two concepts, the modules of distributed machine learning map onto PS as follows: the PS server side maintains the globally shared model parameters w_t, while the clients correspond to the working nodes that execute the computation tasks. The server side exposes two main APIs to the clients: push and pull.
At the beginning of each iteration, every client first calls pull to send a request asking the server to return the newest model parameters. After a compute node receives the returned model parameters, it copies this newest set of parameters over its old local parameters and then computes its gradient update. In other words, the pull operation of PS guarantees that every compute node obtains a copy of the most recent parameters before its computation starts.
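As a minimal sketch of the push/pull interface described above (not the patent's own implementation; class and method names are illustrative), a toy in-process parameter server might look like this:

```python
import numpy as np

class ParameterServer:
    """Toy server side: holds the shared global parameters w_t and
    exposes the two main APIs, push and pull."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=dim)  # randomly initialised global parameters

    def pull(self):
        # A worker requests a fresh copy of the newest global parameters.
        return self.w.copy()

    def push(self, update):
        # A worker sends its result back; here, a plain additive update.
        self.w += update

ps = ParameterServer(dim=3)
w_copy = ps.pull()            # a worker starts from the newest copy
ps.push(-0.01 * np.ones(3))   # stand-in for a pushed gradient step
```

In a real cluster the push and pull calls would be remote procedure calls rather than method calls, but the contract is the same: pull before computing, push after.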
In practice, a distributed cluster environment suffers from problems such as network delay, and the machines in the cluster differ in performance. When a distributed deep-learning algorithm runs in such a heterogeneous cluster environment, the stability of the algorithm degrades, and in severe cases the model fails to converge. This defeats the original purpose of training the model with a distributed cluster and fails to accelerate the training of the neural network.
For the asynchronous stochastic gradient descent (SGD) algorithm, delays in the system harm effective convergence: while a fast working node has already completed several iterations and its update values have been applied to the global parameters, the parameter server may receive a stale update value passed over by a slow working node. This delayed update value is applied to the global parameters in the same way as any other, which again drives the global parameters away from the direction of the optimal solution and slows the convergence of the whole model.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a method for updating the global parameters in a parameter server that dynamically updates the global parameters according to the delay degree of the weight parameters, thereby reducing the influence of high delay on the algorithm.
To achieve the above object, the method for updating the global parameters in a parameter server of the present invention is characterized by comprising the following steps:
(1) Global parameter initialization
Set a global timestamp t and a maximum number of training iterations T, t = 0, 1, 2, …, T−1. At t = 0 the parameter server randomly initializes the global parameters and sends the initialized global parameters w_0 to all working nodes.
(2) Training data preprocessing
Download training data from the database, divide them into m parts according to the number of working nodes, distribute the parts to the m working nodes, and store them in each working node's local data block.
(3) The working nodes train with the global parameters w_0 to obtain local parameters.
(3.1) At the t-th timestamp, each working node randomly selects n sample data from its local data block, n ≪ m;
(3.2) Using the mini-batch algorithm, train on the n sample data with the global parameters w_0 to obtain each node's per-sample training outputs ŷ_j(i), i = 1, 2, …, n, j = 1, 2, …, m, where m is the number of working nodes;
(3.3) Compute the loss function value L_j of the j-th working node:
where ŷ*_j(i) denotes the desired output of the j-th working node for the i-th training sample;
(3.4) Compute the gradient value ∇L_j from the loss function value L_j;
(3.5) Compute the local parameter w_j^t of the j-th working node at the t-th timestamp,
where η denotes the learning rate;
(4) Global parameter update
(4.1) The parameter server receives the local parameters w_j^t passed over by the working nodes one by one, and assigns each working node a delay degree in order of arrival:
d_j = t − t_τ
where d_j is the delay degree of the j-th working node and t_τ denotes the timestamp at which the j-th node last updated the global parameters;
(4.2) From the delay degree d_j of the j-th working node, compute the parameter α_j of the j-th working node:
where c is a constant;
(4.3) The parameter server updates the global parameters for the j-th working node;
similarly, the parameter server successively performs the updates for the remaining working nodes at the t-th timestamp;
(4.4) After the global parameter updates for all working nodes at the t-th timestamp are complete, set t = t + 1. The parameter server returns the updated global parameters w_{t+1} to the corresponding working nodes, and the procedure returns to step (3); the above steps repeat until the global timestamp t reaches the maximum number of iterations, at which point the iteration ends and the update of the global parameters is complete.
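Steps (1) through (4) can be sketched end to end as a small simulation. This is an illustrative sketch, not the patent's implementation: the gradient is a stand-in, and the exact formula for α_j is not reproduced in this text, so an exponential form consistent with the description (exponential in the delay degree, scaled here by the cluster size m) is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)
m, T, eta, c = 4, 3, 0.1, 1.0   # workers, max iterations T, learning rate, constant c

w = rng.normal(size=2)          # step (1): random initialisation of w0
blocks = [rng.normal(size=(20, 2)) for _ in range(m)]  # step (2): m local data blocks
last_t = [0] * m                # timestamp of each worker's previous update

for t in range(T):              # global timestamp t = 0, 1, ..., T-1
    for j in range(m):          # step (3): each worker computes a local parameter
        batch = blocks[j][rng.choice(20, size=5, replace=False)]
        grad = batch.mean(axis=0)        # stand-in for the gradient of the local loss
        w_local = w - eta * grad
        # step (4): delay degree, interpolation weight (assumed alpha form),
        # and linear-interpolation update of the global parameters
        d_j = t - last_t[j]
        alpha = np.exp(-c * d_j / m)
        w = (1 - alpha) * w + alpha * w_local
        last_t[j] = t
```

The loop structure mirrors the claimed method: workers never overwrite the global parameters outright; each push is blended in with a delay-dependent weight.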
The object of the invention is achieved as follows:
In the method for updating the global parameters in a parameter server of the present invention, the relevant parameters of the parameter server are first set and the global parameters are initialized; training data are then downloaded from the database and pre-processed; the pre-processed training data and the initialized global parameters are used to compute the local parameters of each working node; finally, the local parameters are returned to the parameter server, which iteratively updates the global parameters.
Meanwhile the update method of global parameter also has the advantages that in a kind of parameter server of the present invention
(1), this method is changed to the method for transmitting gradient value in network directly to transmit weighting parameter, then in parameter service
Linear interpolation calculating directly is carried out to all parameters on device, it is not convergent on large data collection to solve asynchronous protocol algorithm
Problem, and this method achieves preferable effect in image classification problem.
(2), in isomeric group, this method can perceive the delay for the weighting parameter that each working node transmitting comes
It spends, and dynamically determines the update of global parameter according to the size of degree of delay, can effectively reduce high latency and the overall situation is joined
Several influence, so that algorithm has better stability in isomeric group.
Detailed description of the invention
Fig. 1 is a flow chart of the method for updating the global parameters in a parameter server of the present invention;
Fig. 2 is a schematic diagram of the updates of the asynchronous SGD algorithm;
Fig. 3 is a schematic diagram of the first step of the asynchronous SGD algorithm;
Fig. 4 is a schematic diagram of the second step of the asynchronous SGD algorithm;
Fig. 5 is a schematic diagram of the updates of the asynchronous parameter-passing SGD algorithm;
Fig. 6 is a schematic diagram of the first step of the asynchronous parameter-passing SGD algorithm;
Fig. 7 is a schematic diagram of the second step of the asynchronous parameter-passing SGD algorithm;
Fig. 8 is a schematic diagram of the influence of a high-delay update value on the global parameters;
Fig. 9 is a schematic diagram of handling high delay with a dynamic α;
Fig. 10 is a statistical histogram of the number of iterations at convergence;
Fig. 11 is a statistical histogram of the average computation time of the working nodes;
Fig. 12 is a statistical histogram of the time used at convergence.
Specific embodiment
Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they would dilute the main content of the invention.
Embodiment
Fig. 1 is a flow chart of the method for updating the global parameters in a parameter server of the present invention.
In the present embodiment, as shown in Fig. 1, the method for updating the global parameters in a parameter server of the present invention includes the following steps:
S1. Global parameter initialization
Set a global timestamp t and a maximum number of training iterations T, t = 0, 1, 2, …, T−1. At t = 0 the parameter server randomly initializes the global parameters and sends the initialized global parameters w_0 to all working nodes.
S2. Training data preprocessing
Download training data from the database, divide them into m parts according to the number of working nodes, distribute the parts to the m working nodes, and store them in each working node's local data block.
S3. The working nodes train with the global parameters w_0 to obtain local parameters.
S3.1. At the t-th timestamp, each working node randomly selects n sample data from its local data block, n ≪ m;
S3.2. Using the mini-batch algorithm, train on the n sample data with the global parameters w_0 to obtain each node's per-sample training outputs, where m is the number of working nodes;
S3.3. Compute the loss function value L_j of the j-th working node:
where ŷ*_j(i) denotes the desired output of the j-th working node for the i-th training sample;
S3.4. Compute the gradient value ∇L_j from the loss function value L_j;
here, when training sample data with the mini-batch algorithm, the weight between neuron a and neuron b is w_ab, the output of this layer is x, the output of the previous layer is v, and they satisfy x = v·w_ab;
S3.5. Compute the local parameter w_j^t of the j-th working node at the t-th timestamp,
where η denotes the learning rate.
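A worker-side iteration of the kind described in S3.1–S3.5 can be sketched as follows. The linear model and squared-error loss are assumptions for illustration; the patent's own loss formula is given as an image and is not reproduced in this text.

```python
import numpy as np

def local_step(w_global, X, y, eta=0.01, n=4, rng=None):
    """One worker iteration on a linear model: draw n samples from the
    local data block, compute a squared-error loss L_j and its gradient,
    and return the local parameter w_global - eta * grad."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=n, replace=False)  # S3.1: sample n data
    Xb, yb = X[idx], y[idx]
    pred = Xb @ w_global                     # S3.2: per-sample training outputs
    loss = 0.5 * np.mean((pred - yb) ** 2)   # S3.3: loss vs. desired outputs
    grad = Xb.T @ (pred - yb) / n            # S3.4: gradient of the loss
    return w_global - eta * grad, loss       # S3.5: local parameter w_j^t

rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w_local, loss = local_step(np.zeros(3), X, y, rng=rng)
```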
In the present embodiment, the values transmitted between the working nodes and the parameter server are usually gradient values: once a working node has computed the gradient of the current mini-batch, it passes the gradient value to the parameter server. After receiving the gradient values, the parameter server updates the global parameters according to some algorithm: a synchronous protocol averages all the gradient values and then adds the average to the global parameters, whereas an asynchronous protocol adds each gradient value to the global parameters directly.
However, updating the global parameters with gradient values causes problems in the asynchronous case. Each time a working node passes a gradient value to the parameter server, the global parameters are updated once, so the global parameters exist in many versions, which strongly affects the effective convergence of the algorithm. As shown in Fig. 2, the first row shows the update process of the global parameters and the second row corresponds to the working nodes. When the algorithm starts running, W1, W2 and W3 all hold the initial global parameters w_0. Suppose the global parameters are updated in the order in which the working nodes are arranged in the figure: when W1 passes gradient value a to the parameter server, the newest global parameters w_a are obtained; then the gradient value b of W2 is received by the parameter server, and the computed global parameters w_{a,b} are returned to W2 as the initial parameters for its next iteration. And so on: the parameter exchange based on the asynchronous protocol continues indefinitely.
The problem can be seen from this. Take W1 as an example: in the first step of the algorithm, the initial parameters for training are w_0; gradient value a is computed and passed to the parameter server, which at that moment still holds w_0; the computation yields the newest parameters w_a, which are returned to W1 as the initial parameters for its next iteration. When W1 finishes its second iteration and passes the resulting gradient value d to the parameter server, the newest global parameters on the server are already w_{a,b,c}. The value d was computed from w_a, but it is finally applied to the parameters w_{a,b,c}. In this situation the update of the global parameters appears to change at random, and as shown in Fig. 3, when the number of machines increases, this randomness leads to more serious consequences; Figs. 3 and 4 illustrate the problem in a more intuitive way.
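The contrast between the two protocols described above can be made concrete with a small sketch (function names and the learning rate are illustrative, not from the patent):

```python
import numpy as np

def sync_update(w, grads, eta=0.1):
    # Synchronous protocol: average all workers' gradients, apply once.
    return w - eta * np.mean(grads, axis=0)

def async_update(w, grad, eta=0.1):
    # Asynchronous protocol: each arriving gradient is applied directly.
    return w - eta * grad

w0 = np.zeros(2)
grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
w_sync = sync_update(w0, grads)   # one averaged step
w_async = w0
for g in grads:                   # one step per arriving gradient
    w_async = async_update(w_async, g)
```

Under the asynchronous protocol each arrival creates a new version of the global parameters, which is exactly the version drift the figures illustrate.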
Suppose the working nodes always update the parameters in the same order. In the first step, all working nodes compute their gradients using the same initial global parameters. As shown in Fig. 4, the solid arrows in the figure indicate the update gradient vectors from the individual working nodes. The updates complete in the order 1, 2, 3; after the third machine finishes its update, the global parameters arrive at the final position 4 shown in Fig. 4. Ideally, after the updates complete, the update value taken by working node 1 has become the value at point 1, the global parameters taken by working node 2 are the value at point 2, and the value taken by working node 3 is the value at point 3; the computation of the second step then begins, as shown in Fig. 4.
The update produced by working node 1 was originally obtained by training from the parameters at position 1, but after it is transmitted to the parameter server, the gradient value is applied at the position of its starting point in the figure; likewise, the gradient values obtained by the training of working nodes 2 and 3 are also applied to the global parameters. At this point it can be seen that the true global parameters differ greatly in position from the parameters used in the local computations. When there are more machines and delays are taken into account, this difference becomes even larger, causing the update of the global parameters to appear as a random walk.
S4. Global parameter update
S4.1. The concept of delay degree is introduced here to quantify the size of the delay: the update value of each working node carries its own delay degree, so that on the parameter server the delay degree can be used to judge whether the value should be used in the update of the global parameters. The delay degree is computed as follows:
The parameter server receives the local parameters w_j^t passed over by the working nodes one by one, and assigns each working node a delay degree in order of arrival:
d_j = t − t_τ
where d_j is the delay degree of the j-th working node and t_τ denotes the timestamp at which the j-th node last updated the global parameters.
S4.2. From the delay degree d_j of the j-th working node, compute the parameter α_j of the j-th working node:
where c is a constant; an exponential function is used here to guarantee that the value of α_j still lies in the range 0 to 1.
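The exact formula image for α_j is not reproduced in this text. A plausible exponential form, consistent with the worked example later in the description (delay degree 8, cluster size m = 4, c = 1 giving α ≈ 0.13), is sketched below; the scaling by m is an assumption inferred from that example:

```python
import math

def alpha_j(d_j, m, c=1.0):
    # Assumed exponential form: exp of a negative quantity keeps
    # alpha_j in (0, 1], shrinking as the delay degree d_j grows.
    return math.exp(-c * d_j / m)

a = alpha_j(d_j=8, m=4, c=1.0)   # delay degree 8, cluster size 4, c = 1
```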
S4.3, parameter server update the global parameter of j-th of working node
Similarly, the global parameter of remaining working node when parameter server successively updates t-th of timestamp;
S4.4, all working node when t-th of timestamp global parameter update after the completion of, enable t=t+1, parameter
Server is by updated global parameter wt+1Each corresponding working node is returned to, step S3 is returned again to and repeats the above steps, until
When t arrival maximum number of iterations is stabbed by length of a game, iteration terminates, and completes the update of global parameter.
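The linear-interpolation update the description relies on can be sketched as follows (the blend form is inferred from the text's statement that both the original global parameters and the pushed local parameters must be retained):

```python
import numpy as np

def update_global(w_global, w_local, alpha):
    """Linear interpolation between the current global parameters and the
    local parameters pushed by one worker: a small alpha (high delay) lets
    the stale local parameters move the global parameters only slightly."""
    return (1.0 - alpha) * w_global + alpha * w_local

w_new = update_global(np.array([1.0, 1.0]),   # current global parameters
                      np.array([0.0, 2.0]),   # stale local parameters
                      alpha=0.13)             # heavily delayed worker
```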
In the present embodiment, as shown in Fig. 5, the update value passed over by each working node updates the global parameters directly. When the performance of the working nodes in the cluster differs greatly, the update orders of worker1 and worker2 in Fig. 5 show the problem: by the time worker1 transmits its first update value, worker2 is already performing its third update, and the global parameters on the parameter server have already been updated four times. The delay of worker1's update value is therefore very large, yet under the update mechanism of the earlier asynchronous SGD algorithm this value still updates the global parameters directly; this is the unreasonable part of the algorithm. From the foregoing analysis, the conclusion drawn here is that the influence of high-delay update values on the global parameters is the main reason why the algorithm's performance and efficiency decline in a distributed environment.
In a distributed environment with multiple working nodes, however, a working node should not completely replace the parameters on the server at each update, because the parameters passed over by one working node represent only the result of that node. The update of the global parameters should incorporate the parameter values of all working nodes; that is, when updating the global parameters, the value of the original global parameters and the parameter values passed over by the working node must both be retained. There are many ways to achieve this; this paper uses the method of linear interpolation.
Updating the global parameters with local parameters differs from updating them with gradient values much as moving a point from one position to another by coordinates differs from moving it by direction. When a direction is used, different starting points reach different destinations; when coordinates are used, the destination is always the same regardless of the starting position. Figs. 6 and 7 illustrate this case.
As shown in Fig. 6, the points a, b, c represent the weight parameter values passed over by the working nodes. Assume that the update order is fixed each time, i.e. the updates proceed in the order a, b, c. The leftmost black dot represents the global parameter value on the parameter server in the initial state; the other black dots 1, 2, 3 represent the values obtained after the updates, marked in update order. As shown, the result of the first step of updates is the position of the black dot labelled 3.
The second step is analysed in the same way, as shown in Fig. 7. After the two steps of updates, that is, after two complete iterations, the final position of the global parameters is closer to the optimal point than the initial value, and no random jitter appears that would move it away from the optimum.
In a heterogeneous environment, the influence of high delay in the cluster on the global parameters is very large. An example, shown in Fig. 8, illustrates this. The rightmost black dot represents the newest global parameters on the current parameter server, a value already very close to the optimal point. At this moment the parameter server receives an update parameter with very high delay, the position shown at point 1. Under the parameter update mechanism of the asynchronous SGD algorithm we can roughly compute the position of the new global parameters, namely point 2 in the figure. Relative to the previous value, point 2 not only deviates from the optimal direction but also deviates by a large distance; in a cluster environment with severe delays this problem is reflected even more clearly and causes large fluctuations in the global parameters.
As shown in Fig. 9, the rightmost black dot again represents the newest global parameters on the current parameter server, and point 1 is a delayed update parameter. Assume its delay degree is 8, the size m of the whole cluster is 4, and the value of c is set to 1; then the value of α is 0.13, and the new global parameters are obtained at the position shown by point 2 in Fig. 9. It can be seen that when a local parameter with a high delay degree is pushed to the parameter server, the value of α shrinks as the parameter is applied to the global parameters. As a result, the weight with which a high-delay local parameter contributes to the new global parameters becomes smaller, achieving the goal of making high-delay local parameters influence the global parameters as little as possible.
Example
To test the distributed algorithm, a convenient distributed cluster environment was built here with one server and four working nodes; the hardware and software configuration information is shown in Table 1.
In this experiment, the primary server and the parameter server logically run on the same physical server, and the data server and Redis also run on that server. Because the working nodes in the experimental environment all have multi-core processors, the work of additional compute nodes can be simulated simply by opening several tab pages in a browser.
Table 1 is the experimental environment configuration information table.
All experiments of the present embodiment address the image classification problem; indices such as the training error rate are analysed during training to evaluate the efficiency of the algorithm. Two classical image data sets were selected, MNIST and CIFAR10, and the new algorithm was tested on both data sets, which effectively exercises its various functions.
MNIST is a data set of grayscale handwritten digit images of size 28×28×1. The training data set contains 50,000 images in 10 categories. For the MNIST data set, the structure of the CNN can be configured directly through the user interface of MLitB; the configured parameters are shown in Table 2.
Table 2 is the CNN network parameter information table on the MNIST data set.
| Network layer index | Network layer type | Parameter information |
| 1 | Input layer | Size=(28,28,1) |
| 2 | Convolutional layer | Size=(5,5), stride=1, filters=8, actFun=relu |
| 3 | Pooling layer | Size=(2,2), stride=2 |
| 4 | Convolutional layer | Size=(5,5), stride=1, filters=16, actFun=relu |
| 5 | Pooling layer | Size=(3,3), stride=3 |
| 6 | Fully connected layer | Neurons=10, actFun=softmax |
For all experiments, the mini-batch size N_c used is 100 and the learning rate η is set to 0.01; each experiment is run 5 times and the experimental figures are drawn from the averaged results.
In addition, in the present embodiment, the heterogeneity degree HL is set to 1 and to 2, and the performance of the synchronous SGD algorithm, the asynchronous SGD algorithm and the asynchronous parameter-passing SGD algorithm on the MNIST data set is tested for each setting.
When the heterogeneity degree is 1, delays are added so that the computation time of every working node stays at about 1 second; when the heterogeneity degree is 2, the delay of half of the working nodes is increased to 2 seconds. In the code at the parameter server end, a variable step records the number of iterations of the whole system. The average computation time of a working node is computed from the system timestamps taken before and after each of the node's computations, printed to the browser console, and finally averaged.
As shown in Fig. 10, for the several algorithms, the figure illustrates how the number of iterations needed to reach the specified error rate changes when the heterogeneity degree grows from 1 to 2. The meaning of this index is: when convergence is reached, the fewer the iterations, the more effective updates each iteration produced, i.e. the global parameters moved closer to the optimal solution; the more the iterations, the fewer effective updates the algorithm made during the whole training process. For the algorithms using the asynchronous protocol, the number of iterations grows in every case. The asynchronous SGD algorithm updates the global parameters directly with gradient values, and when the asynchrony degree increases to 2 the number of iterations it needs to converge increases markedly, showing that the random jitter seriously harms the convergence of the model. The iteration count of the asynchronous parameter-passing SGD algorithm grows less, showing that this algorithm can perceive the delays in the cluster, reduce the influence of high delay on the global parameters, and cut down invalid updates.
As shown in Fig. 11, the figure illustrates how the average computation time of the several algorithms grows when the heterogeneity degree increases from 1 to 2. If the average computation time grows linearly with the heterogeneity degree of the cluster, it means that during the operation of the algorithm the nodes with slow computation speed strongly affect the nodes with fast computation speed, producing the linear growth of the average computation time. What is desired is that the average computation time grows slowly, rather than linearly, as the heterogeneity degree increases; such a result shows that the slow nodes in the cluster have little influence on the fast nodes and that the utilization of computing resources increases. The average computation time of the synchronous SGD algorithm is almost exactly linear: when the heterogeneity degree is 2, the average computation time doubles, because the strict synchronization mechanism makes the time of every computation round determined by the slowest working node; the experimental result matches the theoretical analysis. The average computation times of the algorithms using the asynchronous protocol grow little, and combining the two indices of iteration count and average computation time, the growth of the average computation time of the three algorithms is almost identical, showing that the asynchronous protocol allows the computing resources in the cluster to be fully utilized.
As shown in Fig. 12, the figure illustrates how the time used by the several algorithms to reach the specified error rate changes when the heterogeneity degree grows from 1 to 2. Examining the total running time of an algorithm as the heterogeneity degree of the cluster increases gives an intuitive judgement of its running speed, and combined with the two indices above allows the overall performance of the algorithm to be judged; whether an algorithm can reach convergence within an effective time is very important. Although the synchronous SGD algorithm performs poorly on average computation time, it makes the most effective updates, so its total running time is still within an acceptable range. The asynchronous SGD algorithm has obvious disadvantages in average computation time and iteration count; when the heterogeneity degree increases, its total running time greatly increases, so its efficiency drops markedly. The asynchronous parameter-passing SGD algorithm still performs best in total running time. In conclusion, the asynchronous parameter-passing SGD algorithm has very good stability when the heterogeneity degree increases.
To sum up, the method provided by the present invention changes the values transmitted between the parameter server and the working nodes from gradient values to weight parameters, and dynamically changes the update mechanism of the global parameters according to the delay degree of the weight parameters, thereby reducing the influence of high delay on the algorithm. Experiments show that the algorithm achieves good results on image classification problems and also runs stably in heterogeneous environments.
Although illustrative specific embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of the specific embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations and creations that use the inventive concept are within the scope of protection.
Claims (1)
1. A method for updating global parameters in a parameter server, characterized by comprising the following steps:
(1) Global parameter initialization
Set a global timestamp t and a maximum number of training iterations T, with t = 0, 1, 2, ..., T-1; when t = 0, the parameter server randomly initializes the global parameter and sends the initialized global parameter w0 to all working nodes;
(2) Training data preprocessing
A number of training data are downloaded from a database, divided into m parts according to the number of working nodes, and distributed to the m working nodes respectively, each part being stored in the local data block of its working node;
(3) Each working node trains on the global parameter w0 to obtain a local parameter;
(3.1) At the t-th timestamp, each working node randomly selects n sample data from its local data block, with n << m;
(3.2) Using the mini-batch algorithm, the n sample data are trained with the global parameter w0, and the training output value of each sample at each node is obtained, i = 1, 2, ..., n, j = 1, 2, ..., m, where m is the number of working nodes;
(3.3) Calculate the loss function value Lj of the j-th working node from the training output values and the desired outputs, where the desired output is that of the j-th working node on the i-th sample training;
(3.4) Calculate the gradient value ∇L from the loss function value L;
(3.5) Calculate the local parameter of the j-th working node at the t-th timestamp from the gradient value, where η denotes the learning rate;
(4) Global parameter update
(4.1) The parameter server receives the local parameters passed over by each working node in turn, then determines a degree of delay for each working node in order of arrival:
dj = t - tτ
where dj is the degree of delay of the working node and tτ denotes the timestamp at which the j-th node last updated the global parameter;
(4.2) From the degree of delay dj of the j-th working node, calculate the parameter αj of the j-th working node, where c is a constant;
(4.3) The parameter server updates the global parameter of the j-th working node; similarly, the parameter server successively updates the global parameters of the remaining working nodes at the t-th timestamp;
(4.4) After the global parameters of all working nodes at the t-th timestamp have been updated, let t = t + 1; the parameter server returns the updated global parameter wt+1 to each corresponding working node, and the method returns to step (3) and repeats the above steps until the global timestamp t reaches the maximum number of iterations, whereupon the iteration ends and the update of the global parameters is complete.
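The claimed loop of steps (1)-(4) can be sketched as follows. The formulas for the loss function, the local SGD step, and αj appear as figures in the original patent and are not reproduced in this excerpt, so the quadratic loss, the plain SGD step, and the form alpha = 1 / (1 + d / c) below are assumptions made only for illustration:

```python
import numpy as np

class ParameterServer:
    """Sketch of the server side of the claimed method (steps (1) and (4)).

    Assumption: alpha_j = 1 / (1 + d_j / c); the patent's exact formula
    for alpha_j is not reproduced in this excerpt.
    """

    def __init__(self, dim: int, c: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=dim)  # step (1): random init of w0
        self.t = 0                     # global timestamp t
        self.c = c
        self.last_t = {}               # t_tau per worker (last update timestamp)

    def delay(self, worker_id: int) -> int:
        # step (4.1): degree of delay d_j = t - t_tau
        return self.t - self.last_t.get(worker_id, 0)

    def push(self, worker_id: int, w_local: np.ndarray) -> None:
        d = self.delay(worker_id)
        alpha = 1.0 / (1.0 + d / self.c)  # step (4.2), assumed form
        # step (4.3): blend the worker's local parameter into the global one;
        # a zero-delay push (alpha = 1) is applied at full weight.
        self.w = (1.0 - alpha) * self.w + alpha * w_local
        self.last_t[worker_id] = self.t

def local_sgd_step(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                   eta: float = 0.1) -> np.ndarray:
    # step (3): one mini-batch SGD step on an assumed quadratic loss
    # L = (1/2n) * ||X w - y||^2, whose gradient is (1/n) * X^T (X w - y)
    n = len(y)
    grad = X.T @ (X @ w - y) / n
    return w - eta * grad
```

A full run would repeat step (3) on each of the m workers against the latest global parameter they hold and step (4) on the server, for T timestamps, with stragglers accumulating larger dj and therefore smaller αj.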
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810695184.6A CN109032630B (en) | 2018-06-29 | 2018-06-29 | Method for updating global parameters in parameter server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109032630A true CN109032630A (en) | 2018-12-18 |
CN109032630B CN109032630B (en) | 2021-05-14 |
Family
ID=65520873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810695184.6A Active CN109032630B (en) | 2018-06-29 | 2018-06-29 | Method for updating global parameters in parameter server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109032630B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140067738A1 (en) * | 2012-08-28 | 2014-03-06 | International Business Machines Corporation | Training Deep Neural Network Acoustic Models Using Distributed Hessian-Free Optimization |
CN105630882A (en) * | 2015-12-18 | 2016-06-01 | 哈尔滨工业大学深圳研究生院 | Remote sensing data deep learning based offshore pollutant identifying and tracking method |
CN106709565A (en) * | 2016-11-16 | 2017-05-24 | 广州视源电子科技股份有限公司 | Optimization method and device for neural network |
CN107784364A (en) * | 2016-08-25 | 2018-03-09 | 微软技术许可有限责任公司 | The asynchronous training of machine learning model |
Non-Patent Citations (2)
Title |
---|
Xia Yafeng et al.: "Bayes Inference of the Loss Function and Risk Function for Lognormal Parameter Estimation", Journal of Lanzhou University of Technology *
Xiao Hong et al.: "Process Neural Network Training Based on Piecewise Linear Interpolation", Computer Engineering *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109710289A (en) * | 2018-12-21 | 2019-05-03 | 南京邮电大学 | The update method of distributed parameters server based on deeply learning algorithm |
CN110929878A (en) * | 2019-10-30 | 2020-03-27 | 同济大学 | Distributed random gradient descent method |
CN110929878B (en) * | 2019-10-30 | 2023-07-04 | 同济大学 | Distributed random gradient descent method |
CN112990422A (en) * | 2019-12-12 | 2021-06-18 | 中科寒武纪科技股份有限公司 | Parameter server, client and weight parameter processing method and system |
CN111461207A (en) * | 2020-03-30 | 2020-07-28 | 北京奇艺世纪科技有限公司 | Picture recognition model training system and method |
Also Published As
Publication number | Publication date |
---|---|
CN109032630B (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109032630A (en) | The update method of global parameter in a kind of parameter server | |
Nguyen et al. | Federated learning with buffered asynchronous aggregation | |
Liu et al. | Adaptive asynchronous federated learning in resource-constrained edge computing | |
CN104714852B (en) | A kind of parameter synchronization optimization method and its system suitable for distributed machines study | |
CN110533183A (en) | The model partition and task laying method of heterogeneous network perception in a kind of assembly line distribution deep learning | |
KR101046614B1 (en) | Learner for resource limited devices | |
CN106156810A (en) | General-purpose machinery learning algorithm model training method, system and calculating node | |
CN107590139B (en) | Knowledge graph representation learning method based on cyclic matrix translation | |
CN103942197B (en) | Data monitoring processing method and equipment | |
CN108009642A (en) | Distributed machines learning method and system | |
CN105989374A (en) | Online model training method and equipment | |
CN104077425B (en) | A kind of text editing real-time collaborative method based on operation conversion | |
Zhan et al. | Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking | |
Ribero et al. | Federated learning under intermittent client availability and time-varying communication constraints | |
CN102289491B (en) | Parallel application performance vulnerability analyzing method and system based on fuzzy rule reasoning | |
CN114428907B (en) | Information searching method, device, electronic equipment and storage medium | |
CN109408669A (en) | A kind of content auditing method and device for different application scene | |
Xia et al. | PervasiveFL: Pervasive federated learning for heterogeneous IoT systems | |
CN106156142A (en) | The processing method of a kind of text cluster, server and system | |
CN105786979B (en) | Hidden link-based behavior analysis method and system for user to participate in hot topic | |
US11275756B2 (en) | System for extracting, categorizing and analyzing data for training user selection of products and services, and a method thereof | |
CN109711555B (en) | Method and system for predicting single-round iteration time of deep learning model | |
CN115186738B (en) | Model training method, device and storage medium | |
CN106533756B (en) | A kind of communication feature extracts, flow generation method and device | |
CN106294457A (en) | Network information push method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |