CN109032630A - Method for updating global parameters in a parameter server - Google Patents

Method for updating global parameters in a parameter server

Info

Publication number
CN109032630A
CN109032630A (application CN201810695184.6A)
Authority
CN
China
Prior art keywords
parameter
working node
global
global parameter
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810695184.6A
Other languages
Chinese (zh)
Other versions
CN109032630B (en)
Inventor
徐杰
唐淳
田野
盛纾纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201810695184.6A priority Critical patent/CN109032630B/en
Publication of CN109032630A publication Critical patent/CN109032630A/en
Application granted granted Critical
Publication of CN109032630B publication Critical patent/CN109032630B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G06F 8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Neurology (AREA)
  • Computer Security & Cryptography (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method for updating global parameters in a parameter server. The relevant parameters of the parameter server are first set and the global parameters are initialized; training data is then downloaded from a database and preprocessed; the preprocessed training data and the initialized global parameters are used to compute the local parameters of each working node; finally, the local parameters are returned to the parameter server, which iteratively updates the global parameters.

Description

Method for updating global parameters in a parameter server
Technical field
The invention belongs to the field of communication technology and, more specifically, relates to a method for updating global parameters in a parameter server.
Background art
The goal of designing a distributed machine learning system is acceleration; ideally, linear speedup should be obtained. That is, every additional compute node should, relative to a single machine, deliver roughly one extra unit of speedup. In practice, however, synchronizing computation tasks or parameters across nodes introduces additional overhead, and this overhead may exceed, sometimes by many times, the computation cost itself. If the system is poorly designed, this overhead prevents the training from being accelerated on multiple machines; it can even happen that a machine learning training program run in parallel with several times the computing resources of a single machine turns out to be slower than the single machine.
The parameter server (PS) architecture is in fact similar to the client-server (CS) architecture. PS abstracts two main concepts: the parameter server and the working node (client, or worker). The server holds some data, and the compute nodes can send data to the server or request the server to return data. With these two concepts, the computation flow of distributed machine learning can be mapped onto the two PS modules, server and working node, as follows: the PS server side maintains the globally shared model parameters w_t, while the clients correspond to the working nodes that execute the computation tasks. Meanwhile, the server side provides two main APIs to the clients: push and pull.
At the beginning of each iteration, every client first calls pull to send a request to the server, asking the server to return the latest model parameters. After a compute node receives the returned model parameters, it copies this fresh set of parameters over its old local copy and then computes the gradient update. In other words, the pull operation of PS ensures that each compute node obtains a copy of the most recent parameters before its computation starts.
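As a concrete illustration of the push/pull interaction described above, the following is a minimal sketch of such an interface; the class and method names are hypothetical and not taken from the patent.

```python
import numpy as np

class ParameterServer:
    """Minimal sketch of the PS abstraction: the server holds the globally
    shared model parameters w_t and exposes pull/push to the working nodes."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=dim)  # randomly initialized global parameters

    def pull(self):
        # A working node calls pull at the start of an iteration to obtain
        # a copy of the most recent global parameters.
        return self.w.copy()

    def push(self, update):
        # A working node sends back its locally computed result; how it is
        # merged into self.w depends on the chosen protocol (synchronous,
        # asynchronous, or the delay-aware update of this invention).
        self.apply_update(update)

    def apply_update(self, update):
        raise NotImplementedError("defined by the chosen update protocol")
```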
In actual use, the distributed cluster environment itself suffers from problems such as network delay, and the machines in the cluster differ in performance. When a distributed deep learning algorithm runs in such a heterogeneous cluster environment, the stability of the algorithm degrades, and in severe cases the model fails to converge. This runs counter to the original purpose of training the model on a distributed cluster and fails to accelerate the neural network training process.
For the asynchronous stochastic gradient descent (SGD) algorithm, delays in the system harm effective convergence: when a fast working node has already completed several iterations and its updates have been applied to the global parameters, the parameter server may then receive a delayed update passed over by a slow working node. This delayed update is applied to the global parameters in exactly the same way, which again pulls the global parameters away from the direction of the optimal solution and slows the convergence of the entire model.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a method for updating global parameters in a parameter server that dynamically updates the global parameters according to the degree of delay of the weight parameters, thereby reducing the influence of high delay on the algorithm.
To achieve the above object, the method for updating global parameters in a parameter server of the present invention is characterized by comprising the following steps:
(1) Global parameter initialization
Set a global timestamp t and a maximum number of training iterations T, with t = 0, 1, 2, ..., T-1. When t = 0, the parameter server randomly initializes the global parameters and sends the initialized global parameters w_0 to all working nodes;
(2) Training data preprocessing
Download a number of training samples from the database, divide them into m parts according to the number of working nodes, and distribute them to the m working nodes, each part being stored in the local data block of its working node;
(3) Each working node trains with the global parameters w_0 to obtain local parameters;
(3.1) At the t-th timestamp, each working node randomly selects n sample data from its local data block, n << m;
(3.2) Using the Mini-batch algorithm, train on the n sample data with the global parameters w_0 to obtain the training output value of each sample of each node, where m is the number of working nodes;
(3.3) Calculate the loss function value L_j of the j-th working node:
where the expected output denotes the desired value when the j-th working node trains on the i-th sample;
(3.4) Calculate the gradient value ∇L_j from the loss function value L_j;
(3.5) Calculate the local parameter of the j-th working node at the t-th timestamp:
where η denotes the learning rate;
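The worker-side computation of steps (3.1)-(3.5) can be sketched as follows. The formula images for the loss, the gradient and the local parameter are not reproduced in this text, so a linear model with a mean-squared-error loss and a single gradient step are assumed here purely for illustration.

```python
import numpy as np

def local_step(w_global, x_batch, y_batch, eta=0.01):
    """One worker iteration on a mini-batch of n samples: compute the loss
    L_j, its gradient, and the resulting local parameters (assumed here to be
    one gradient step away from the pulled global parameters)."""
    y_hat = x_batch @ w_global                   # training output of each sample
    residual = y_hat - y_batch                   # deviation from the expected outputs
    loss = 0.5 * np.mean(residual ** 2)          # loss value L_j (assumed mean-squared-error form)
    grad = x_batch.T @ residual / len(y_batch)   # gradient of L_j with respect to the parameters
    w_local = w_global - eta * grad              # local parameters = global parameters - eta * gradient
    return w_local, loss

# Example: 32 samples with 10 features, eta = 0.01 as in the embodiment.
rng = np.random.default_rng(0)
w0 = rng.normal(size=10)
x, y = rng.normal(size=(32, 10)), rng.normal(size=32)
w_local, loss = local_step(w0, x, y, eta=0.01)
```

The worker then pushes the local parameters themselves, rather than the gradient, to the parameter server.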
(4) Global parameter update
(4.1) The parameter server receives, one by one, the local parameters passed over by the working nodes and, in order of arrival, determines a degree of delay for each working node:
d_j = t - t_τ
where d_j is the degree of delay of the j-th working node and t_τ denotes the timestamp at which the j-th node last updated the global parameters;
(4.2) From the degree of delay d_j of the j-th working node, calculate the parameter α_j of the j-th working node:
where c is a constant;
(4.3) The parameter server updates the global parameters with respect to the j-th working node;
similarly, the parameter server successively updates the global parameters for the remaining working nodes at the t-th timestamp;
(4.4) After the global parameter updates for all working nodes at the t-th timestamp are completed, set t = t + 1. The parameter server returns the updated global parameters w_{t+1} to the corresponding working nodes and the procedure returns to step (3); the above steps are repeated until the global timestamp t reaches the maximum number of iterations, at which point the iteration ends and the update of the global parameters is complete.
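A compact sketch of the server-side update of step (4) follows. The formula for α_j is not reproduced in the original text; the exponential form α_j = exp(-c·d_j/m) used below is inferred from the remark in the embodiment that an exponential function keeps α_j between 0 and 1 and from the numeric example given for Fig. 9 (d_j = 8, m = 4, c = 1 giving α ≈ 0.13), and the merge is written as the linear interpolation described in the embodiment.

```python
import math

def server_update(w_global, w_local, t, last_update_t, j, m, c=1.0):
    """Delay-aware merge of the local parameters of working node j into the
    global parameters at timestamp t (sketch; the alpha formula is an
    inferred reconstruction)."""
    d_j = t - last_update_t[j]          # degree of delay d_j = t - t_tau
    alpha_j = math.exp(-c * d_j / m)    # assumed form; stays within (0, 1]
    w_new = (1.0 - alpha_j) * w_global + alpha_j * w_local   # linear interpolation
    last_update_t[j] = t                # node j has now updated at timestamp t
    return w_new
```

With this form, a stale local parameter receives a small α_j and therefore moves the global parameters only slightly, which is the behaviour described in steps (4.2) and (4.3).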
The object of the invention is achieved as follows:
in the method for updating global parameters in a parameter server of the present invention, the relevant parameters of the parameter server are first set and the global parameters are initialized; training data is then downloaded from the database and preprocessed; the preprocessed training data and the initialized global parameters are used to compute the local parameters of each working node; finally, the local parameters are returned to the parameter server, which iteratively updates the global parameters.
Meanwhile the update method of global parameter also has the advantages that in a kind of parameter server of the present invention
(1) Instead of transmitting gradient values over the network, the method transmits the weight parameters directly and then performs a linear interpolation over all parameters on the parameter server. This solves the problem that asynchronous-protocol algorithms fail to converge on large data sets, and the method achieves good results on image classification problems.
(2) In a heterogeneous cluster, the method can perceive the degree of delay of the weight parameters passed over by each working node and dynamically adjust the update of the global parameters according to the magnitude of that delay. This effectively reduces the influence of high delay on the global parameters, so that the algorithm is more stable in a heterogeneous cluster.
Brief description of the drawings
Fig. 1 is a flow chart of the method for updating global parameters in a parameter server according to the present invention;
Fig. 2 is a schematic diagram of the update performed by the asynchronous SGD algorithm;
Fig. 3 is a schematic diagram of the first step of the asynchronous SGD algorithm;
Fig. 4 is a schematic diagram of the second step of the asynchronous SGD algorithm;
Fig. 5 is a schematic diagram of the update performed by the asynchronous parameter-passing SGD algorithm;
Fig. 6 is a schematic diagram of the first step of the asynchronous parameter-passing SGD algorithm;
Fig. 7 is a schematic diagram of the second step of the asynchronous parameter-passing SGD algorithm;
Fig. 8 is a schematic diagram of the influence of a high-delay update value on the global parameters;
Fig. 9 is a schematic diagram of handling high delay with a dynamic α;
Fig. 10 is a histogram of the number of iterations at convergence;
Fig. 11 is a histogram of the average computation time of the working nodes;
Fig. 12 is a histogram of the time used to reach convergence.
Detailed description of the embodiments
A specific embodiment of the present invention is described below with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention. It should be noted that, in the following description, detailed descriptions of well-known functions and designs are omitted where they would obscure the main content of the invention.
Embodiment
Fig. 1 is a flow chart of the method for updating global parameters in a parameter server according to the present invention.
In the present embodiment, as shown in Fig. 1, the method for updating global parameters in a parameter server of the present invention comprises the following steps:
S1. Global parameter initialization
Set a global timestamp t and a maximum number of training iterations T, with t = 0, 1, 2, ..., T-1. When t = 0, the parameter server randomly initializes the global parameters and sends the initialized global parameters w_0 to all working nodes;
S2. Training data preprocessing
Download a number of training samples from the database, divide them into m parts according to the number of working nodes, and distribute them to the m working nodes, each part being stored in the local data block of its working node;
S3. Each working node trains with the global parameters w_0 to obtain local parameters;
S3.1. At the t-th timestamp, each working node randomly selects n sample data from its local data block, n << m;
S3.2. Using the Mini-batch algorithm, train on the n sample data with the global parameters w_0 to obtain the training output value of each sample of each node, where m is the number of working nodes;
S3.3. Calculate the loss function value L_j of the j-th working node:
where the expected output denotes the desired value when the j-th working node trains on the i-th sample;
S3.4. Calculate the gradient value ∇L_j from the loss function value L_j;
here, when training the sample data with the Mini-batch algorithm, the weight between neuron a and neuron b is denoted w_ab, the output of this layer is x, the output of the previous layer is v, and they satisfy x = v·w_ab;
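For the single weight described above, with x = v·w_ab, the gradient of the loss with respect to that weight follows from the chain rule; this one-line step is added here to complete the reasoning:

$$\frac{\partial L_j}{\partial w_{ab}} = \frac{\partial L_j}{\partial x}\cdot\frac{\partial x}{\partial w_{ab}} = \frac{\partial L_j}{\partial x}\,v .$$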
S3.5. Calculate the local parameter of the j-th working node at the t-th timestamp:
where η denotes the learning rate;
In the present embodiment, the value usually transmitted between a working node and the parameter server is the gradient value: once a working node has computed the gradient of the current mini-batch, it passes that gradient to the parameter server. After receiving the gradient values, the parameter server updates the global parameters according to a certain algorithm. A synchronous protocol, for example, averages all gradient values and then adds the average to the global parameters, whereas an asynchronous protocol adds each gradient value to the global parameters directly.
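The two conventional gradient-based protocols just described can be written compactly as follows; this is a sketch of the baselines (with an assumed learning-rate scaling), not of the method of the invention.

```python
import numpy as np

def sync_merge(w_global, gradients, eta=0.01):
    # Synchronous protocol: wait for all workers, average their gradients,
    # then apply the averaged gradient to the global parameters once.
    return w_global - eta * np.mean(gradients, axis=0)

def async_merge(w_global, gradient, eta=0.01):
    # Asynchronous protocol: apply each incoming gradient directly,
    # regardless of which version of the global parameters produced it.
    return w_global - eta * gradient
```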
However, if the global parameters are updated with gradient values, problems arise in the asynchronous case: every time a working node passes a gradient value to the parameter server, the global parameters are updated once, so the global parameters exist in many versions, which strongly affects effective convergence of the algorithm. As shown in Fig. 2, the first row shows the update process of the global parameters and the second row corresponds to the working nodes. When the algorithm starts running, W1, W2 and W3 all take the initial global parameters w0. Suppose the global parameters are updated in the order in which the working nodes are arranged in the figure. When W1 transmits gradient value a to the parameter server, the newest global parameters w_a are obtained; the parameter server then receives the gradient value b of W2 and returns the computed global parameters w_{a,b} to W2 as the initial parameters for its next iteration. And so on; the whole asynchronous-protocol parameter exchange continues in this way.
From this we can see where the problem arises. Take W1 as an example. In the first step of the algorithm, the initial parameters used for training are w0; the gradient value a is computed and passed to the parameter server, whose parameters are still w0 at that moment; the server computes the newest parameters w_a and returns them to W1 as the initial parameters for the next iteration. When W1 finishes its second iteration and passes the resulting gradient value d to the parameter server, the newest global parameters on the server are already w_{a,b,c}. In other words, d was computed from w_a but is finally applied to w_{a,b,c}. In this situation the updates of the global parameters look like random changes, and when the number of machines increases, as shown in Fig. 3, this randomness leads to more serious consequences. Figs. 3 and 4 illustrate the problem in a more intuitive way.
Assume that the working nodes always update the parameters in the same order. In the first step, all working nodes use the same initial global parameters to compute their gradients. As shown in Fig. 4, the solid arrows indicate the update gradient vectors from the working nodes. The updates are applied in the order 1, 2, 3; after the third machine has finished its update, the global parameters arrive at the final position 4 shown in Fig. 4. Ideally, after these updates are complete, the value taken by working node 1 would be the value at point 1, the global parameters taken by working node 2 would be the value at point 2, and the value taken by working node 3 would be the value at point 3, after which the computation of the second step begins, as shown in Fig. 4.
The update produced by working node 1 was originally obtained by training from the parameters at position 1, but after it is transmitted to the parameter server, the gradient value is applied from the starting point shown in the figure; likewise, the gradient values obtained by training at working nodes 2 and 3 are also applied to the global parameters. At this point the difference between the true global parameters and the parameter positions of the local computations is very large. When there are more machines and delays are taken into account, this difference becomes even larger, causing the updates of the global parameters to look like a random walk.
S4. Global parameter update
S4.1. This description introduces the concept of degree of delay to quantify the magnitude of delay: the update value of each working node carries its own degree of delay, so that the parameter server can judge from the degree of delay how the value should be used in the update of the global parameters. The degree of delay is calculated as follows:
the parameter server receives, one by one, the local parameters passed over by the working nodes and, in order of arrival, determines a degree of delay for each working node;
d_j = t - t_τ
where d_j is the degree of delay of the j-th working node and t_τ denotes the timestamp at which the j-th node last updated the global parameters;
S4.2. From the degree of delay d_j of the j-th working node, calculate the parameter α_j of the j-th working node:
where c is a constant; an exponential function is used here to guarantee that the value of α_j remains in the range 0 to 1.
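The formula image for α_j is not reproduced in this text. A form consistent with the exponential-function remark above and with the numeric example given below for Fig. 9 (d_j = 8, m = 4, c = 1 yielding α ≈ 0.13) would be the following; it is stated as an inferred reconstruction rather than a quotation of the original formula:

$$\alpha_j = e^{-c\,d_j/m}, \qquad e^{-1\cdot 8/4} = e^{-2} \approx 0.135 \approx 0.13 .$$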
S4.3. The parameter server updates the global parameters with respect to the j-th working node;
similarly, the parameter server successively updates the global parameters for the remaining working nodes at the t-th timestamp;
S4.4. After the global parameter updates for all working nodes at the t-th timestamp are completed, set t = t + 1. The parameter server returns the updated global parameters w_{t+1} to the corresponding working nodes and the procedure returns to step S3; the above steps are repeated until the global timestamp t reaches the maximum number of iterations, at which point the iteration ends and the update of the global parameters is complete.
In the present embodiment, as shown in Fig. 5, the update value passed over by each working node directly updates the global parameters. When the performance of the working nodes in the cluster differs considerably, it can be seen from the update sequences of worker1 and worker2 in Fig. 5 that by the time worker1 transmits its first update value, worker2 has already carried out its third update and the global parameters in the parameter server have already been updated four times. The delay of worker1's update value is therefore very large, yet under the update mechanism of the earlier asynchronous SGD algorithm that value still updates the global parameters directly; this is where the algorithm is unreasonable. From the foregoing analysis, the conclusion is that the influence of high-delay update values on the global parameters is the main reason why the performance and efficiency of the algorithm decline in a distributed environment.
However, in a distributed environment with multiple working nodes, a working node should not completely replace the parameters on the server at each update, because the parameters passed over by one working node represent only the result of that node. The update of the global parameters should incorporate the parameter values of all working nodes, i.e., when updating the global parameters, the value of the original global parameters and the parameter value passed over by the working node must both be retained. There are many ways to achieve this; here the method of linear interpolation is used.
When updating the global parameters, the difference between updating with local parameters and updating with gradient values is similar to moving a point from one position to another using, respectively, a coordinate or a direction. When a direction is used, the destination reached differs according to the starting point; when a coordinate is used, the destination is always the same regardless of the initial position. Figs. 6 and 7 illustrate this situation.
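The coordinate-versus-direction analogy can be checked with toy numbers (chosen here purely for illustration):

```python
# Two workers start from different parameter values.
starts = [0.0, 5.0]
step = -2.0    # a "direction": a gradient-style update moves relative to the start
target = 1.0   # a "coordinate": a transmitted parameter value names a destination

print([s + step for s in starts])   # [-2.0, 3.0] -> the endpoint depends on the start
print([target for _ in starts])     # [1.0, 1.0]  -> the same destination for every start
```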
As shown in Fig. 6, points a, b and c denote the weight parameter values passed over by the working nodes. Assume the update order is fixed each time, i.e., updates are applied in the order a, b, c. The leftmost black dot denotes the global parameter value in the parameter server in the initial state, and the other black dots 1, 2, 3 denote the values obtained after the updates, marked in update order. As shown, the result of the first step of updates is the position of the black dot labeled 3.
The second step is analyzed in the same way, as shown in Fig. 7. After two steps of updates, i.e., after two complete iterations, it can be seen that the final position is closer to the optimum point than the initial value; no random jitter occurs, so the parameters do not drift away from the optimum point.
In a heterogeneous environment, the influence of high delay in the cluster on the global parameters is very large. An example is used here for illustration, as shown in Fig. 8. The rightmost black dot represents the newest global parameters in the current parameter server, and this value is already very close to the optimum point. At this moment the parameter server receives an update parameter with very high delay, i.e., the position shown by point 1. According to the parameter update mechanism of the asynchronous SGD algorithm, the position of the new global parameters can roughly be worked out, i.e., the position of point 2 in the figure. It can be seen that, relative to the initial value, point 2 not only deviates from the optimal direction but also deviates by a large distance. This problem is reflected even more clearly in cluster environments with serious delay and causes large fluctuations of the global parameters.
As shown in Fig. 9, the rightmost black dot denotes the newest global parameters in the current parameter server, and point 1 is a delayed update parameter. Suppose its degree of delay is 8, the number of nodes m of the whole cluster is 4, and the value of c is set to 1; the resulting value of α is 0.13, from which the new global parameters can be obtained, at the position shown by point 2 in Fig. 9. It can be seen that when a local parameter with a high degree of delay is pushed to the parameter server and merged into the global parameters, the value of α is reduced. The result is that a local parameter with a high degree of delay is given a smaller weight when it is merged into the new global parameters, which achieves the goal of making high-delay local parameters influence the global parameters as little as possible.
Example
To test the distributed algorithm, one server and four working nodes are used here to build a convenient distributed cluster environment; the hardware and software configuration information is shown in Table 1.
In this experiment, the primary server and the parameter server logically run on the same physical server, and the data server and Redis also run on that server. Because the working nodes in the experimental environment all have multi-core processors, the work of additional compute nodes can be simulated simply by opening multiple tab pages in the browser.
Table 1 is the configuration information table of the experimental environment.
Table 1
All the experiments of this embodiment address the image classification problem; the efficiency of the algorithm is analyzed through indicators such as the training error rate during training. Two classical image data sets, MNIST and CIFAR10, were selected, and the new algorithm was tested on both of them, which allows the various functions of the algorithm to be examined effectively. MNIST is a data set of grayscale handwritten digit images of size 28×28×1. The training data set contains 50,000 images in 10 categories. For the MNIST data set, the structure of the CNN can be configured directly through the user interface of MLitB; the configured parameters are shown in Table 2.
Table 2 is the CNN network parameter information table on the MNIST data set.
Network layer index   Network layer type       Parameter information
1                     Input layer              Size=(28,28,1)
2                     Convolutional layer      Size=(5,5), stride=1, filters=8, actFun=relu
3                     Pooling layer            Size=(2,2), stride=2
4                     Convolutional layer      Size=(5,5), stride=1, filters=16, actFun=relu
5                     Pooling layer            Size=(3,3), stride=3
6                     Fully connected layer    Neurons=10, actFun=softmax
Table 2
For all experiments, the mini-batch size N_c used is 100 and the learning rate η is set to 0.01; each experiment is run 5 times and the experimental result figures are drawn from the averaged results.
In addition, in this embodiment the degree of heterogeneity HL is set to 1 and 2, and the performance of the synchronous SGD algorithm, the asynchronous SGD algorithm and the asynchronous parameter-passing SGD algorithm on the MNIST data set is tested for each setting.
When the degree of heterogeneity is 1, delays are added so that the computation time of every working node stays at about 1 second; when the degree of heterogeneity is 2, the delay of half of the working nodes is increased to 2 seconds. In the code at the parameter server end, the variable step records the number of iterations of the whole system. The average computation time of a working node is obtained by taking the system timestamps before and after each computation of the node, printing them to the browser console, and finally averaging.
Fig. 10 shows, for the several algorithms, how the number of iterations needed to reach the specified error rate changes when the degree of heterogeneity increases from 1 to 2. The meaning of this indicator is that the fewer iterations are needed to reach convergence, the more effective each update is, i.e., the closer each update brings the global parameters to the optimal solution; the more iterations are needed, the fewer effective updates the algorithm makes over the whole training process. For the algorithms that use the asynchronous protocol, the number of iterations increases. The asynchronous SGD algorithm updates the global parameters directly with gradient values, and when the degree of heterogeneity increases to 2, the increase in the number of iterations it needs to reach convergence is evident, showing that random jitter seriously affects the convergence of the model. The number of iterations of the asynchronous parameter-passing SGD algorithm grows less, showing that this algorithm can perceive the delay in the cluster, reduce the influence of high delay on the global parameters, and reduce ineffective updates.
Fig. 11 shows how the average computation time of the several algorithms grows when the degree of heterogeneity increases from 1 to 2. If the average computation time grows linearly with the degree of heterogeneity as the heterogeneity of the cluster increases, it means that during the operation of the algorithm the slow nodes strongly affect the fast nodes, which is why the average computation time grows linearly. What is desired is that the average computation time grows slowly, rather than linearly, when the degree of heterogeneity increases; such a result shows that the slow nodes in the cluster have little influence on the fast nodes and that the utilization of computing resources increases. The average computation time of the synchronous SGD algorithm grows almost exactly linearly: when the degree of heterogeneity is 2, the average computation time doubles, because the strict synchronization mechanism lets the slowest working node determine the time of every computation; the experimental result matches the theoretical analysis. The average computation time of the algorithms that use the asynchronous protocol grows little and, combining the iteration count and average computation time indicators, their growth in average computation time is almost identical, showing that the asynchronous protocol makes full use of the computing resources in the cluster.
Fig. 12 shows how the time needed by the several algorithms to reach the specified error rate changes when the degree of heterogeneity increases from 1 to 2. Examining the overall running time of an algorithm as the heterogeneity of the cluster increases gives an intuitive judgment of its running speed, and together with the two indicators above it allows the overall performance of the algorithm to be judged; whether an algorithm can reach convergence within an effective time is very important. Although the synchronous SGD algorithm does not perform well on average computation time, it has the largest number of effective updates, so its overall running time is still within an acceptable range. The asynchronous SGD algorithm has obvious disadvantages on average computation time and iteration count; when the degree of heterogeneity increases, its overall running time increases greatly, so its efficiency drops markedly. The asynchronous parameter-passing SGD algorithm still performs best in overall running time. In summary, the asynchronous parameter-passing SGD algorithm has very good stability when the degree of heterogeneity increases.
In summary, the method provided by the present invention changes the value transmitted between the parameter server and the working nodes from gradient values to weight parameters, and dynamically adjusts the update mechanism of the global parameters according to the degree of delay of the weight parameters, thereby reducing the influence of high delay on the algorithm. Experiments show that the algorithm achieves good results on image classification problems and also runs stably in a heterogeneous environment.
Although illustrative specific embodiments of the present invention have been described above so that those skilled in the art can understand the present invention, it should be clear that the present invention is not limited to the scope of these specific embodiments. For those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the present invention as defined and determined by the appended claims, and all innovations and creations using the concept of the present invention fall within the scope of protection.

Claims (1)

1. A method for updating global parameters in a parameter server, characterized by comprising the following steps:
(1) Global parameter initialization
Set a global timestamp t and a maximum number of training iterations T, with t = 0, 1, 2, ..., T-1; when t = 0, the parameter server randomly initializes the global parameters and sends the initialized global parameters w_0 to all working nodes;
(2) Training data preprocessing
Download a number of training samples from the database, divide them into m parts according to the number of working nodes, and distribute them to the m working nodes, each part being stored in the local data block of its working node;
(3) Each working node trains with the global parameters w_0 to obtain local parameters;
(3.1) At the t-th timestamp, each working node randomly selects n sample data from its local data block, n << m;
(3.2) Using the Mini-batch algorithm, train on the n sample data with the global parameters w_0 to obtain the training output value of each sample of each node, i = 1, 2, ..., n, j = 1, 2, ..., m, where m is the number of working nodes;
(3.3) Calculate the loss function value L_j of the j-th working node:
where the expected output denotes the desired value when the j-th working node trains on the i-th sample;
(3.4) Calculate the gradient value ∇L_j from the loss function value L_j;
(3.5) Calculate the local parameter of the j-th working node at the t-th timestamp:
where η denotes the learning rate;
(4) Global parameter update
(4.1) The parameter server receives, one by one, the local parameters passed over by the working nodes and, in order of arrival, determines a degree of delay for each working node:
d_j = t - t_τ
where d_j is the degree of delay of the working node and t_τ denotes the timestamp at which the j-th node last updated the global parameters;
(4.2) From the degree of delay d_j of the j-th working node, calculate the parameter α_j of the j-th working node:
where c is a constant;
(4.3) The parameter server updates the global parameters with respect to the j-th working node;
similarly, the parameter server successively updates the global parameters for the remaining working nodes at the t-th timestamp;
(4.4) After the global parameter updates for all working nodes at the t-th timestamp are completed, set t = t + 1; the parameter server returns the updated global parameters w_{t+1} to the corresponding working nodes and the procedure returns to step (3); the above steps are repeated until the global timestamp t reaches the maximum number of iterations, at which point the iteration ends and the update of the global parameters is complete.
CN201810695184.6A 2018-06-29 2018-06-29 Method for updating global parameters in parameter server Active CN109032630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810695184.6A CN109032630B (en) 2018-06-29 2018-06-29 Method for updating global parameters in parameter server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810695184.6A CN109032630B (en) 2018-06-29 2018-06-29 Method for updating global parameters in parameter server

Publications (2)

Publication Number Publication Date
CN109032630A true CN109032630A (en) 2018-12-18
CN109032630B CN109032630B (en) 2021-05-14

Family

ID=65520873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810695184.6A Active CN109032630B (en) 2018-06-29 2018-06-29 Method for updating global parameters in parameter server

Country Status (1)

Country Link
CN (1) CN109032630B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710289A (en) * 2018-12-21 2019-05-03 南京邮电大学 The update method of distributed parameters server based on deeply learning algorithm
CN110929878A (en) * 2019-10-30 2020-03-27 同济大学 Distributed random gradient descent method
CN111461207A (en) * 2020-03-30 2020-07-28 北京奇艺世纪科技有限公司 Picture recognition model training system and method
CN112990422A (en) * 2019-12-12 2021-06-18 中科寒武纪科技股份有限公司 Parameter server, client and weight parameter processing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140067738A1 (en) * 2012-08-28 2014-03-06 International Business Machines Corporation Training Deep Neural Network Acoustic Models Using Distributed Hessian-Free Optimization
CN105630882A (en) * 2015-12-18 2016-06-01 哈尔滨工业大学深圳研究生院 Remote sensing data deep learning based offshore pollutant identifying and tracking method
CN106709565A (en) * 2016-11-16 2017-05-24 广州视源电子科技股份有限公司 Optimization method and device for neural network
CN107784364A (en) * 2016-08-25 2018-03-09 微软技术许可有限责任公司 The asynchronous training of machine learning model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140067738A1 (en) * 2012-08-28 2014-03-06 International Business Machines Corporation Training Deep Neural Network Acoustic Models Using Distributed Hessian-Free Optimization
CN105630882A (en) * 2015-12-18 2016-06-01 哈尔滨工业大学深圳研究生院 Remote sensing data deep learning based offshore pollutant identifying and tracking method
CN107784364A (en) * 2016-08-25 2018-03-09 微软技术许可有限责任公司 The asynchronous training of machine learning model
CN106709565A (en) * 2016-11-16 2017-05-24 广州视源电子科技股份有限公司 Optimization method and device for neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xia Yafeng et al., "Bayesian inference of the loss function and risk function for lognormal parameter estimation", Journal of Lanzhou University of Technology *
Xiao Hong et al., "Process neural network training based on piecewise linear interpolation", Computer Engineering *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710289A (en) * 2018-12-21 2019-05-03 南京邮电大学 The update method of distributed parameters server based on deeply learning algorithm
CN110929878A (en) * 2019-10-30 2020-03-27 同济大学 Distributed random gradient descent method
CN110929878B (en) * 2019-10-30 2023-07-04 同济大学 Distributed random gradient descent method
CN112990422A (en) * 2019-12-12 2021-06-18 中科寒武纪科技股份有限公司 Parameter server, client and weight parameter processing method and system
CN111461207A (en) * 2020-03-30 2020-07-28 北京奇艺世纪科技有限公司 Picture recognition model training system and method

Also Published As

Publication number Publication date
CN109032630B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN109032630A (en) The update method of global parameter in a kind of parameter server
Nguyen et al. Federated learning with buffered asynchronous aggregation
Liu et al. Adaptive asynchronous federated learning in resource-constrained edge computing
CN104714852B (en) A kind of parameter synchronization optimization method and its system suitable for distributed machines study
CN110533183A (en) The model partition and task laying method of heterogeneous network perception in a kind of assembly line distribution deep learning
KR101046614B1 (en) Learner for resource limited devices
CN106156810A (en) General-purpose machinery learning algorithm model training method, system and calculating node
CN107590139B (en) Knowledge graph representation learning method based on cyclic matrix translation
CN103942197B (en) Data monitoring processing method and equipment
CN108009642A (en) Distributed machines learning method and system
CN105989374A (en) Online model training method and equipment
CN104077425B (en) A kind of text editing real-time collaborative method based on operation conversion
Zhan et al. Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking
Ribero et al. Federated learning under intermittent client availability and time-varying communication constraints
CN102289491B (en) Parallel application performance vulnerability analyzing method and system based on fuzzy rule reasoning
CN114428907B (en) Information searching method, device, electronic equipment and storage medium
CN109408669A (en) A kind of content auditing method and device for different application scene
Xia et al. PervasiveFL: Pervasive federated learning for heterogeneous IoT systems
CN106156142A (en) The processing method of a kind of text cluster, server and system
CN105786979B (en) Hidden link-based behavior analysis method and system for user to participate in hot topic
US11275756B2 (en) System for extracting, categorizing and analyzing data for training user selection of products and services, and a method thereof
CN109711555B (en) Method and system for predicting single-round iteration time of deep learning model
CN115186738B (en) Model training method, device and storage medium
CN106533756B (en) A kind of communication feature extracts, flow generation method and device
CN106294457A (en) Network information push method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant