CN108287763A - Parameter exchange method, working node and parameter server system - Google Patents

Parameter exchange method, working node and parameter server system

Info

Publication number
CN108287763A
CN108287763A (application CN201810084671.9A)
Authority
CN
China
Prior art keywords
parameter
new model
model
invalid
server system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810084671.9A
Other languages
Chinese (zh)
Inventor
姚波 (Yao Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongxing Flying Mdt Infotech Ltd
Original Assignee
Zhongxing Flying Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongxing Flying Mdt Infotech Ltd filed Critical Zhongxing Flying Mdt Infotech Ltd
Priority to CN201810084671.9A priority Critical patent/CN108287763A/en
Publication of CN108287763A publication Critical patent/CN108287763A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 — Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the present invention relate to the field of artificial intelligence and disclose a parameter exchange method, a working node, and a parameter server system. The method is applied to a working node in a parameter server system and comprises: calculating a new model parameter from training data, where the model parameter is used to update a parameter of the algorithm model of the parameter server system; judging whether the new model parameter is an invalid parameter, i.e., one that cannot bring the final model parameters of the algorithm model the expected optimization; and, if it is an invalid parameter, not pushing the new model parameter to the service nodes in the parameter server system. By customizing a parameter filtering rule for the model, embodiments of the present invention reduce the invalid parameters exchanged with the parameter server, thereby improving system efficiency and shortening the training time of deep learning.

Description

Parameter exchange method, working node and parameter server system
Technical field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular to a parameter exchange method, a working node, and a parameter server system.
Background art
Artificial Intelligence (AI) is a broad concept whose goal is to make computers think like humans. Machine Learning is a branch of artificial intelligence that studies how computers can simulate and realize human learning behavior so as to acquire new knowledge and improve performance. Deep Learning is a class of machine learning methods that uses multiple processing layers — composed of complex structures or multiple non-linear transformations (neural networks) — to perform high-level abstraction on data.
In machine learning and deep learning, distributed optimization has become a prerequisite, because a single machine can no longer handle the rapidly growing training data and model parameters. The Parameter Server is the third-generation framework for distributed optimization. Building on its predecessors, it greatly reduces network overhead and the cost and latency of synchronization through asynchronous communication and relaxed consistency requirements.
The inventor has found at least the following problems in the prior art: with the accumulation of information and the increasing complexity of models, the volume of training data reaches the order of terabytes or even petabytes, and the number of model parameters in the training process can rise to billions (e.g., 10^9) or even trillions (e.g., 10^12). Since the massive parameters of the model generally need to be accessed frequently by all worker nodes, many problems and challenges arise, including bandwidth pressure, synchronization, and fault tolerance. As the scale of the parameters that must be exchanged through the parameter server grows larger and larger, the time needed to train a model also becomes longer and longer; reducing the training time has therefore become one of the key issues in the development of deep learning.
Summary of the invention
Embodiments of the present invention aim to provide a parameter exchange method, a working node, and a parameter server system that reduce the invalid parameters exchanged with the parameter server by customizing a parameter filtering rule for the model, thereby improving system efficiency and reducing the training time of deep learning.
To solve the above technical problem, embodiments of the present invention provide a parameter exchange method applied to a working node in a parameter server system, comprising: calculating a new model parameter from training data, where the model parameter is used to update a parameter of the algorithm model of the parameter server system; judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization; and, if it is an invalid parameter, not pushing the new model parameter to the service nodes in the parameter server system.
Embodiments of the present invention further provide a working node applied to a parameter server system, comprising: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out the parameter exchange method described above.
Embodiments of the present invention further provide a parameter server system, comprising: a service node group and M working nodes according to claim 1, where M is a natural number greater than or equal to 1, and the M working nodes are communicatively connected to the service node group.
Compared with the prior art, in embodiments of the present invention each working node in the parameter server system does not push a newly calculated model parameter to the service node directly; instead, it decides whether to push by judging whether the new model parameter is an invalid parameter. That is, when it determines that the new model parameter cannot bring the algorithm model of the parameter server system the expected optimization, it judges the new model parameter to be invalid and does not push it to the service node. Since the effect of an invalid parameter on the optimization of the final model parameters of the algorithm model is negligible, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure of the parameter server system and improves its synchronization and fault tolerance, but also effectively shortens the training time.
In addition, the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value. Judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization specifically includes: calculating the norm of the new model parameter, judging whether the norm is smaller than a preset norm threshold, and, if it is smaller than the preset norm threshold, judging the new model parameter to be an invalid parameter. For a basic algorithm of the deep learning field such as gradient descent, a very small gradient value contributes negligibly to the optimization of the overall algorithm model; therefore, by computing the norm of the gradient value and comparing it with the preset norm threshold, it can be determined whether the current model parameter is an invalid parameter, which facilitates the filtering of invalid parameters.
In addition, the method further includes: if the norm of the parameter is greater than or equal to the preset norm threshold, judging that the new model parameter is not an invalid parameter.
In addition, the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value. Judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization specifically includes: judging whether the new model parameter is close to the optimal model parameters, and, if it is close to the optimal model parameters, judging the new model parameter to be an invalid parameter. In a gradient descent algorithm model, a model parameter close to the optimal model parameters also contributes negligibly to the optimization of the algorithm model; therefore, by judging whether the new model parameter calculated by the working node is close to the optimal model parameters of the algorithm model, the exchange of invalid parameters can be filtered out effectively.
In addition, judging whether the new model parameter is close to the final model parameters of the algorithm model is performed by judging, through the KKT conditions, whether the new model parameter is close to the final model parameters of the algorithm model.
In addition, the method further includes: if the new model parameter is not close to the final model parameters, judging that the new model parameter is not an invalid parameter.
In addition, the method further includes: if the new model parameter is not an invalid parameter, pushing the new model parameter to the service node.
In addition, calculating the new model parameter from the training data specifically includes: pulling training data from the service node, and calculating the new model parameter from the pulled training data.
Description of the drawings
One or more embodiments are illustrated by the figures in the corresponding drawings. These exemplary illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
Fig. 1 is a structural schematic diagram of the parameter server system to which the parameter exchange method of the first embodiment of the present invention is applied;
Fig. 2 is a flowchart of the parameter exchange method according to the first embodiment of the present invention;
Fig. 3 is a flowchart of the parameter exchange method according to the second embodiment of the present invention;
Fig. 4 is a structural schematic diagram of the working node according to the third embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, each embodiment of the present invention is explained in detail below with reference to the drawings. However, those skilled in the art will understand that many technical details are set forth in each embodiment of the present invention in order to help the reader better understand the application. Even without these technical details, and with various changes and modifications based on the following embodiments, the technical solution claimed in this application can still be realized.
The first embodiment of the present invention relates to a parameter exchange method applied to a working node in a parameter server system. The parameter exchange method includes: calculating a new model parameter from training data, where the new model parameter is used to update a parameter of the algorithm model of the parameter server system; judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization; and, if it is an invalid parameter, not pushing the new model parameter to the service nodes in the parameter server system. Compared with the prior art, in this embodiment each working node in the parameter server system does not push a newly calculated model parameter to the service node directly; instead, it decides whether to push by judging whether the new model parameter is an invalid parameter, i.e., when it determines that the new model parameter cannot bring the algorithm model of the parameter server system the expected optimization, it judges the new model parameter to be invalid and does not push it to the service node. Since the effect of an invalid parameter on the optimization of the final model parameters of the algorithm model is negligible, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure of the parameter server system and improves its synchronization and fault tolerance, but also effectively shortens the training time. The implementation details of the parameter exchange method of this embodiment are described specifically below; the following content is provided only to make these implementation details easier to understand and is not necessary for implementing this solution.
Please refer to the structural schematic diagram of the parameter server system shown in Fig. 1, which comprises a service node group 1 and a working node group. The working node group includes M working nodes, where M is a natural number greater than or equal to 1. Each working node 2 in the working node group is communicatively connected to the service nodes in the service node group, and the service nodes in the service node group jointly maintain and update the parameters of the algorithm model. This embodiment does not specifically limit the numbers of service nodes and working nodes in the parameter server system, nor does it specifically limit the communication mode between the service nodes and the working nodes; any parameter server system used for machine learning is applicable.
Please refer to the flowchart of the parameter exchange method of the first embodiment shown in Fig. 2; the method includes steps 201 to 204.
Step 201: calculate a new model parameter from the training data.
The new model parameter calculated by the working node is used to update a parameter of the algorithm model of the parameter server system.
Specifically, each working node (also called a Worker node) pulls training data from the server side (also called the Server side, i.e., the service node group) and calculates the new model parameter from the pulled training data. The Server side partitions the training data according to the number of Worker nodes. For example, when the parameter server system has 3 Worker nodes, 1 Server node, and one scheduling node (also called a Scheduler node), the Server side divides the training data into 3 parts and then distributes them to the Worker nodes for computation.
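The Server-side partitioning step described above can be sketched as follows. This is an illustrative sketch only; the patent does not fix a partitioning scheme, so a simple round-robin split is assumed here:

```python
def partition_training_data(samples, num_workers):
    """Split the training data into num_workers shards (Server-side step).

    The patent does not mandate a particular split, so a simple
    round-robin assignment is assumed for illustration.
    """
    shards = [[] for _ in range(num_workers)]
    for i, sample in enumerate(samples):
        shards[i % num_workers].append(sample)
    return shards

# Example: 3 Worker nodes, as in the text's 3-worker/1-server/1-scheduler setup.
data = list(range(10))          # stand-in for (features, label) records
shards = partition_training_data(data, 3)
```

Each shard is then handed to one Worker node, which trains on it independently.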
Step 202: judge whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization; if it is an invalid parameter, execute step 203; if it is not an invalid parameter, execute step 204.
Since different algorithm models correspond to different filtering rules (i.e., different ways of judging invalid parameters), in practice a filtering rule needs to be customized for each algorithm model and loaded into every working node in the parameter server system. Take the gradient descent algorithm model (gradient descent method for short) as an example. Gradient descent is an optimization algorithm, also commonly known as the steepest descent method. The steepest descent method is one of the simplest and oldest methods for solving unconstrained optimization problems, and many efficient algorithms are obtained by improving and correcting it. The steepest descent method uses the negative gradient direction as the search direction; the closer it gets to the target value, the smaller the step and the slower the progress. In other words, in gradient descent, a parameter update is very inefficient when the gradient value (the model parameter calculated by a working node from the training data) is very small: its contribution to the optimization of the overall algorithm model is negligible. That is, even if the service node does not apply such a gradient value to the parameters of the algorithm model, the correctness of the final model parameters solved by the overall algorithm is not affected. Therefore, a threshold can be set in advance, and each working node confirms the scale of the model parameter to be exchanged before the parameter exchange, filtering out small gradient values to reduce the exchange of invalid parameters.
Specifically, for the Stochastic Gradient Descent (SGD) algorithm, which is one kind of gradient descent algorithm, the algorithm model uses a gradient descent algorithm model, and each new model parameter calculated by a working node is a gradient value. Step 202 specifically includes sub-steps 2021 and 2022.
Sub-step 2021: calculate the norm of the new model parameter.
Sub-step 2022: judge whether the norm is smaller than the preset norm threshold; if it is smaller than the preset norm threshold, execute step 203; if it is greater than or equal to the preset norm threshold, execute step 204.
The method of computing the norm of a parameter is well known to those skilled in the art and is not repeated here.
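The two sub-steps above amount to a simple norm check before pushing. A minimal sketch in Python — the threshold values below are illustrative assumptions, since the text says the threshold should be chosen according to the specific data scale:

```python
import math

def is_invalid_update(gradient, norm_threshold):
    """Sub-steps 2021/2022: compute the L2 norm of the gradient update and
    treat the update as invalid (not worth exchanging) when the norm falls
    below the preset threshold."""
    norm = math.sqrt(sum(g * g for g in gradient))
    return norm < norm_threshold

# A near-zero gradient is filtered; a large one is pushed.
print(is_invalid_update([1e-4, -2e-4], norm_threshold=0.01))  # True  -> do not push
print(is_invalid_update([0.5, -0.3],   norm_threshold=0.01))  # False -> push
```

Any vector norm could be substituted here; the L2 norm is used as the most common choice.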
Step 203: do not push the new model parameter to the service nodes in the parameter server system.
For example, the working node may simply discard the calculated invalid parameter.
Step 204: push the new model parameter to the service nodes in the parameter server system.
The new model parameter can be pushed to the service node in a known parameter exchange manner.
The parameter filtering method of this embodiment is described in detail below, taking user profiling with logistic regression trained by gradient descent as an example, where the training data consists of user feature data and labels:
In the first step, the Server side partitions the training data and distributes it to the Worker nodes. This embodiment does not specifically limit the way the training data is partitioned.
In the second step, each Worker node pulls the latest parameters from the Server side and then trains with the training data. The present invention models the specific label using logistic regression. With reference to the linear regression prediction function, assume a feature vector x = (x_1, x_2, …) and h_θ(x) = θ^T x, where x is the variable, θ is the multidimensional parameter, and θ^T is the transpose of θ. When h(x) > 0, the output is 1; otherwise it is 0. The result is normalized using the sigmoid function, so that the prediction function of the logistic regression model is: h_θ(x) = g(θ^T x) = 1 / (1 + e^{−θ^T x}).
The range of this function lies in (0, 1). When the predicted value is greater than 0.5, the output is 1; otherwise it is 0. We then obtain P(y | x; θ) = (h_θ(x))^y (1 − h_θ(x))^{1−y}. Performing maximum-likelihood estimation on θ, and taking into account the independence across data dimensions, it can be deduced that: L(θ) = ∏_{i=1}^{m} P(y^{(i)} | x^{(i)}; θ) = ∏_{i=1}^{m} (h_θ(x^{(i)}))^{y^{(i)}} (1 − h_θ(x^{(i)}))^{1−y^{(i)}}, where P(·) denotes the conditional probability, ∏ denotes the product over i from 1 to m, and x^{(i)} denotes the x of the i-th sample.
Taking the logarithm of L(θ): ℓ(θ) = log L(θ) = ∑_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ].
After transformation, the cost function is expressed as: J(θ) = −(1/m) ℓ(θ) = −(1/m) ∑_{i=1}^{m} [ y^{(i)} log h_θ(x^{(i)}) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ].
Parameters are updated by gradient descent. The main idea is to move the parameter θ in the direction that minimizes the cost, repeating the following operation until the algorithm converges: θ_j := θ_j − α ∂J(θ)/∂θ_j.
From ∂J(θ)/∂θ_j = (1/m) ∑_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)} and the cost function, we obtain the parameter update formula: θ_j := θ_j − α (1/m) ∑_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}.
In the above formula, α is the step size, indicating the magnitude of each gradient descent step, and m is the number of training records (i.e., the total number of samples) on which the parameter update relies. The embodiment of the present invention uses stochastic gradient descent (SGD), i.e., the parameters of the model are adjusted according to only one sample at a time: θ_j := θ_j − α (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}. It should be noted that the formulas involved in this embodiment are well known in the art; the meanings and symbols in the formulas are well known to those skilled in the art and are not repeated here.
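The single-sample SGD update derived above can be sketched as follows. The feature values and learning rate are illustrative assumptions, not values from the patent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(theta, x, y, alpha):
    """One stochastic gradient descent update for logistic regression:
    theta_j := theta_j - alpha * (h_theta(x) - y) * x_j,
    using a single (x, y) sample, as the text describes."""
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [t - alpha * (h - y) * xi for t, xi in zip(theta, x)]

theta = [0.0, 0.0]
theta = sgd_step(theta, x=[1.0, 2.0], y=1, alpha=0.1)
# With theta = 0, h = 0.5, so the update is +0.1 * 0.5 * x = [0.05, 0.10].
```

The returned vector is exactly the "new model parameter" that the Worker node would then filter before pushing.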
In the third step, the Worker nodes perform parameter filtering: after each Worker node has executed stochastic gradient descent and obtained a new model parameter to be exchanged, it filters the new model parameter to be exchanged. This embodiment customizes the following filtering rule, which filters according to the scale of the model parameter to be exchanged (which can also be called the model parameter to be updated): the norm of the parameter update is used as the index of the parameter scale, and parameter updates whose norm exceeds the threshold (i.e., the preset norm threshold) are pushed to the Server side. The preset norm threshold can be set according to the specific data scale.
In the fourth step, parameter updates that do not satisfy the filtering rule (i.e., whose norm is greater than or equal to the preset norm threshold) are pushed to the Server side, and the Server side receives and merges the parameter updates; parameter updates that satisfy the filtering rule (i.e., whose norm is smaller than the preset norm threshold) are filtered out, i.e., not pushed to the Server side.
In the fifth step, the above steps are repeated until all of the training data has participated in training.
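Putting the five steps together, the Worker-side loop can be sketched as below. The `pull`/`push` callables are stand-ins for whatever RPC interface the parameter server actually exposes (the patent does not specify one), and the gradient computation is the single-sample logistic-regression update from the derivation above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def worker_loop(shard, pull, push, alpha, norm_threshold):
    """Steps 2-5 on one Worker node: pull the latest theta, compute a
    single-sample SGD update, and push it only if its L2 norm reaches
    the preset threshold (the customized filtering rule)."""
    pushed, filtered = 0, 0
    for x, y in shard:
        theta = pull()                                   # step 2: pull latest parameters
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        update = [-alpha * (h - y) * xi for xi in x]     # single-sample SGD update
        norm = math.sqrt(sum(u * u for u in update))
        if norm < norm_threshold:                        # step 3: filter invalid update
            filtered += 1
        else:
            push(update)                                 # step 4: push to Server side
            pushed += 1
    return pushed, filtered

# In-process stand-in for the Server side, for illustration only.
theta_store = [0.0, 0.0]

def pull():
    return list(theta_store)

def push(update):
    for j, u in enumerate(update):
        theta_store[j] += u                              # Server merges the update

shard = [([1.0, 2.0], 1), ([0.5, -1.0], 0)]
pushed, filtered = worker_loop(shard, pull, push, alpha=0.1, norm_threshold=1e-6)
```

In a real deployment `pull`/`push` would be network calls to the service node group, and the threshold would be tuned to the data scale as the text advises.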
From the description of the above embodiment, it can be seen that this embodiment has the following advantages: without affecting the correctness of the overall training, it reduces unnecessary parameter exchanges and the number of parameter updates, lowers the bandwidth load of the whole system, shortens the training time, and improves the efficiency of the system.
The second embodiment of the present invention relates to a parameter exchange method. The second embodiment is largely the same as the first embodiment; the main difference is that the first embodiment proposes parameter filtering based on the parameter scale for the gradient descent algorithm, while the second embodiment of the present invention proposes parameter filtering according to whether a parameter update is close to the final model parameters, further enriching the embodiments of the present invention.
In the gradient descent algorithm, a parameter update is also inefficient when it is close to the optimal model parameters: its contribution to the optimization of the overall model is negligible. System efficiency can therefore be improved and the training time shortened by filtering out parameter updates that are close to the optimal model parameters. Specifically, the KKT conditions can be used to filter out parameter updates that cannot move the model toward the optimal solution, thereby reducing unnecessary parameter exchanges. The KKT conditions are a method used when solving optimization problems; the optimization problem referred to here is generally finding the global minimum of a given function within a specified domain. All inequality constraints, equality constraints, and the objective function are written together as L(a, b, x) = f(x) + a·g(x) + b·h(x); the KKT conditions state that the optimal value must satisfy the following conditions:
the derivative of L(a, b, x) with respect to x is zero;
h(x) = 0;
a·g(x) = 0.
Parameter updates that these conditions indicate are close to the optimal value can be filtered out.
Please refer to the flowchart of the parameter exchange method shown in Fig. 3; the parameter exchange method of this embodiment includes steps 301 to 304. Step 301 corresponds to and is identical to step 201, and steps 303 and 304 correspond to and are identical to steps 203 and 204, respectively; they are not repeated here.
Step 302: judge whether the new model parameter is close to the optimal model parameters; if it is close to the optimal model parameters, judge the new model parameter to be an invalid parameter and execute step 303; if it is not close to the optimal model parameters, judge that it is not an invalid parameter and execute step 304.
Specifically, continuing the example of the first embodiment — user profiling with logistic regression trained by gradient descent, where the training data consists of user feature data and labels — the parameter filtering method of this embodiment is described as follows:
When performing parameter filtering, each working node substitutes θ_j into J(θ) and, according to the KKT conditions, computes ∂J(θ)/∂θ_j. When this value is close to 0, the parameter update (i.e., the new model parameter calculated by the working node) is already close to optimal, and the parameter update can be filtered, i.e., the new model parameter is not pushed to the Server side; otherwise, when the value is not close to 0, the new model parameter needs to be pushed to the Server side.
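This second filtering rule can be sketched in the same style. The tolerance below is an assumed constant — the patent only says the partial derivative should be "close to 0":

```python
def near_optimal(grad_j, tol=1e-3):
    """Second-embodiment filter: the partial derivative dJ/dtheta_j checked
    against the KKT stationarity condition; a value close to 0 means the
    update is already near-optimal and can be filtered."""
    return abs(grad_j) < tol

def should_push(gradient, tol=1e-3):
    """Push only if at least one component is still far from optimal."""
    return not all(near_optimal(g, tol) for g in gradient)

print(should_push([2e-4, -5e-4]))  # False: near the optimum, filter it
print(should_push([0.2, -0.05]))   # True: still informative, push it
```

The per-component check is one possible reading of the rule; a check on the norm of the full gradient vector would serve equally well.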
It is noted that in one example, each working node can also simultaneously according to first embodiment and the Whether the filtering rule that two embodiments are illustrated carries out the filtering of Invalid parameter, i.e., connect according to parameter mould length and parameter simultaneously Nearly optimal model parameters carry out parameter filtering, so as to more effectively filter out Invalid parameter.
Compared with the prior art, for the SGD algorithm model this embodiment filters out parameter updates that are close to the optimal model parameters of the system's algorithm model, which effectively reduces the number of parameter updates, lowers the bandwidth load, improves system efficiency, and shortens the training time.
The division of the above methods into steps is merely for clarity of description. During implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
The third embodiment of the present invention relates to a working node, as shown in Fig. 4, comprising: at least one processor 21 and a memory 22 communicatively connected to the at least one processor 21, where the memory 22 stores instructions executable by the at least one processor 21, and the instructions are executed by the at least one processor 21 so that the at least one processor 21 can carry out the parameter exchange method according to the first or second embodiment.
Compared with the prior art, in this embodiment each working node in the parameter server system does not push a newly calculated model parameter to the service node directly; instead, it decides whether to push by judging whether the new model parameter is an invalid parameter, i.e., when it determines that the new model parameter cannot bring the algorithm model of the parameter server system the expected optimization, it judges the new model parameter to be invalid and does not push it to the service node. Since the effect of an invalid parameter on the optimization of the final model parameters of the algorithm model is negligible, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure of the parameter server system and improves its synchronization and fault tolerance, but also effectively shortens the training time.
The memory and the processor are connected by a bus. The bus may comprise any number of interconnected buses and bridges, linking together the various circuits of one or more processors and the memory. The bus may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, all of which are well known in the art and therefore not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over the wireless medium through an antenna; furthermore, the antenna also receives data and transfers the data to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
It is not difficult to see that this embodiment is the apparatus embodiment corresponding to the first or second embodiment, and this embodiment can be implemented in cooperation with the first or second embodiment. The relevant technical details mentioned in the first or second embodiment are still valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment are also applicable to the first or second embodiment.
It is worth noting that each module involved in this embodiment is a logical module. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, this embodiment does not introduce units that are less closely related to solving the technical problem proposed by the present invention, but this does not mean that no other units exist in this embodiment.
The fourth embodiment of the present invention relates to a parameter server system. Referring again to Fig. 1, the parameter server system of this embodiment includes a service node group 1 and M working nodes as described in the third embodiment, where M is a natural number greater than or equal to 1, and the M working nodes are communicatively connected to the service node group. The service nodes in the service node group jointly maintain and update the parameters of the algorithm model. Each working node in the parameter server system computes new model parameters from the training data pulled from the service node group (i.e., the server side), filters the computed model parameters, and then pushes them to the server side. This embodiment places no particular limit on the value of M.
Compared with the prior art, in this embodiment of the present invention, after computing a new model parameter, each working node in the parameter server system does not push it directly to the service nodes. Instead, it first determines whether the new model parameter is an invalid parameter, and pushes only on that basis: if the new model parameter cannot yield the expected optimization of the algorithm model in the parameter server system, it is judged to be an invalid parameter and is not pushed to the service nodes. Since the contribution of invalid parameters to the optimization of the algorithm model's final parameters is negligible, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure on the parameter server system and improves the system's synchronization and fault tolerance, but also effectively shortens the training time.
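By way of illustration only (the patent provides no code, and the function and variable names here are assumptions), the worker-side filtering described above, using a gradient-norm threshold as the invalidity test, could be sketched as follows:

```python
import numpy as np

def compute_gradient(model_params, batch):
    # Illustrative stub: gradient of a least-squares loss over the
    # training batch pulled from the service nodes.
    X, y = batch
    preds = X @ model_params
    return X.T @ (preds - y) / len(y)

def is_invalid(gradient, norm_threshold=1e-3):
    # Invalidity test: a gradient whose norm is below a preset
    # threshold contributes negligibly to the final model parameters.
    return np.linalg.norm(gradient) < norm_threshold

def worker_step(model_params, batch, push_fn, norm_threshold=1e-3):
    # Compute the new model parameter, filter it, and push only if valid.
    grad = compute_gradient(model_params, batch)
    if is_invalid(grad, norm_threshold):
        return False          # filtered: no exchange, bandwidth saved
    push_fn(grad)             # valid: push to the service nodes
    return True
```

Here `push_fn` stands in for whatever push RPC the parameter server exposes; the filter simply gates the call, so pushes that would carry negligible updates never reach the network.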
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will understand that the above embodiments are specific embodiments for implementing the present invention, and that in practical applications various changes in form and detail may be made thereto without departing from the spirit and scope of the present invention.
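The KKT-based variant of the invalidity test (recited in claims 4 and 5 below), which judges whether the new parameter is already close to an optimum, can be sketched the same way. This is an illustrative reading under my own assumptions, as the patent gives no formulas: for a problem min f(x) subject to g(x) <= 0, closeness to the optimum is approximated by a small residual over the four KKT conditions.

```python
import numpy as np

def kkt_residual(grad_f, grad_g, lam, g_val):
    # KKT conditions for min f(x) subject to g(x) <= 0:
    #   stationarity:            grad_f + lam * grad_g = 0
    #   primal feasibility:      g(x) <= 0
    #   dual feasibility:        lam >= 0
    #   complementary slackness: lam * g(x) = 0
    stationarity = np.linalg.norm(np.atleast_1d(grad_f + lam * grad_g))
    primal = max(float(g_val), 0.0)
    dual = max(-float(lam), 0.0)
    slack = abs(float(lam) * float(g_val))
    return max(float(stationarity), primal, dual, slack)

def near_optimum(grad_f, grad_g, lam, g_val, tol=1e-4):
    # A new parameter with a small KKT residual is "close to the
    # optimal model parameters" and would be filtered as invalid.
    return bool(kkt_residual(grad_f, grad_g, lam, g_val) < tol)
```

For example, for min f(x) = x^2 subject to x >= 1 (i.e. g(x) = 1 - x <= 0), the point x = 1 with multiplier lam = 2 satisfies all four conditions and so would be judged near-optimal, while x = 2 with lam = 0 fails stationarity and would not.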

Claims (10)

1. A parameter exchange method, applied to a working node in a parameter server system, characterized by comprising:
calculating a new model parameter according to training data, the model parameter being used to update a parameter of an algorithm model of the parameter server system; and
judging whether the new model parameter is an invalid parameter that cannot yield the expected optimization of the final model parameter of the algorithm model, and if it is an invalid parameter, not pushing the new model parameter to a service node in the parameter server system.
2. The parameter exchange method according to claim 1, wherein the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value;
said judging whether the new model parameter is an invalid parameter that cannot yield the expected optimization of the final model parameter of the algorithm model specifically comprises:
calculating the parameter norm of the new model parameter; and
judging whether the parameter norm is less than a preset norm, and if it is less than the preset norm, judging that the new model parameter is an invalid parameter.
3. The parameter exchange method according to claim 2, further comprising: if the parameter norm is greater than or equal to the preset norm, judging that the new model parameter is not an invalid parameter.
4. The parameter exchange method according to claim 1, wherein the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value;
said judging whether the new model parameter is an invalid parameter that cannot yield the expected optimization of the final model parameter of the algorithm model specifically comprises:
judging whether the new model parameter is close to an optimal model parameter, and if it is close to the optimal model parameter, judging that the new model parameter is an invalid parameter.
5. The parameter exchange method according to claim 4, wherein whether the new model parameter is close to the final model parameter of the algorithm model is judged by means of the KKT (Karush-Kuhn-Tucker) conditions.
6. The parameter exchange method according to claim 4, further comprising: if the new model parameter is not close to the final model parameter, judging that the new model parameter is not an invalid parameter.
7. The parameter exchange method according to claim 1, further comprising: if the new model parameter is not an invalid parameter, pushing the new model parameter to the service node.
8. The parameter exchange method according to claim 1, wherein said calculating a new model parameter according to training data specifically comprises:
pulling training data from the service node; and
calculating the new model parameter according to the pulled training data.
9. A working node, applied to a parameter server system, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the parameter exchange method according to any one of claims 1 to 8.
10. A parameter server system, characterized by comprising: a service node group and M working nodes according to claim 9, M being a natural number greater than or equal to 1;
the M working nodes being communicatively connected to the service node group.
CN201810084671.9A 2018-01-29 2018-01-29 Parameter exchange method, working node and parameter server system Pending CN108287763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810084671.9A CN108287763A (en) 2018-01-29 2018-01-29 Parameter exchange method, working node and parameter server system


Publications (1)

Publication Number Publication Date
CN108287763A true CN108287763A (en) 2018-07-17

Family

ID=62835951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810084671.9A Pending CN108287763A (en) 2018-01-29 2018-01-29 Parameter exchange method, working node and parameter server system

Country Status (1)

Country Link
CN (1) CN108287763A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346629A (en) * 2014-10-24 2015-02-11 华为技术有限公司 Model parameter training method, device and system
CN106126578A (en) * 2016-06-17 2016-11-16 清华大学 A kind of web service recommendation method and device
US20170092264A1 (en) * 2015-09-24 2017-03-30 Microsoft Technology Licensing, Llc Detecting Actionable Items in a Conversation among Participants
CN107256393A (en) * 2017-06-05 2017-10-17 四川大学 The feature extraction and state recognition of one-dimensional physiological signal based on deep learning
CN107330516A (en) * 2016-04-29 2017-11-07 腾讯科技(深圳)有限公司 Model parameter training method, apparatus and system
US20170351511A1 (en) * 2015-12-22 2017-12-07 Opera Solutions Usa, Llc System and Method for Code and Data Versioning in Computerized Data Modeling and Analysis


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214512A (en) * 2018-08-01 2019-01-15 中兴飞流信息科技有限公司 A kind of parameter exchange method, apparatus, server and the storage medium of deep learning
CN109214512B (en) * 2018-08-01 2021-01-22 中兴飞流信息科技有限公司 Deep learning parameter exchange method, device, server and storage medium
WO2020084618A1 (en) * 2018-10-24 2020-04-30 Technion Research & Development Foundation Limited System and method for distributed training of a neural network
CN109492753A (en) * 2018-11-05 2019-03-19 中山大学 A kind of method of the stochastic gradient descent of decentralization
CN111126627A (en) * 2019-12-25 2020-05-08 四川新网银行股份有限公司 Model training system based on separation degree index
WO2021147620A1 (en) * 2020-01-23 2021-07-29 华为技术有限公司 Communication method, device, and system based on model training
CN115906982A (en) * 2022-11-15 2023-04-04 北京百度网讯科技有限公司 Distributed training method, gradient communication method, device and electronic equipment
CN115906982B (en) * 2022-11-15 2023-10-24 北京百度网讯科技有限公司 Distributed training method, gradient communication device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108287763A (en) Parameter exchange method, working node and parameter server system
Xie et al. An efficient approach for reducing the conservatism of LMI-based stability conditions for continuous-time T–S fuzzy systems
Valdez et al. Modular neural networks architecture optimization with a new nature inspired method using a fuzzy combination of particle swarm optimization and genetic algorithms
CN112598150B (en) Method for improving fire detection effect based on federal learning in intelligent power plant
Rkhami et al. On the use of graph neural networks for virtual network embedding
Leung et al. Parameter control system of evolutionary algorithm that is aided by the entire search history
CN113312177B (en) Wireless edge computing system and optimizing method based on federal learning
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN112287990A (en) Model optimization method of edge cloud collaborative support vector machine based on online learning
CN116644804B (en) Distributed training system, neural network model training method, device and medium
CN107870810A (en) Using method for cleaning, device, storage medium and electronic equipment
CN113853621A (en) Feature engineering in neural network optimization
Xu et al. Living with artificial intelligence: A paradigm shift toward future network traffic control
Behmandpoor et al. Federated learning based resource allocation for wireless communication networks
Chang et al. Power system network partitioning using tabu search
Ghesmoune et al. Clustering over data streams based on growing neural gas
Williams et al. Experimental results on learning stochastic memoryless policies for partially observable markov decision processes
CN116500896A (en) Intelligent real-time scheduling model and method for intelligent network-connected automobile domain controller multi-virtual CPU tasks
CN115983275A (en) Named entity identification method, system and electronic equipment
CN115470520A (en) Differential privacy and denoising data protection method under vertical federal framework
Ho et al. Adaptive communication for distributed deep learning on commodity GPU cluster
Tziouvaras et al. Edge AI for Industry 4.0: an Internet of Things approach
Chandrasekharam et al. Genetic algorithm for embedding a complete graph in a hypercube with a VLSI application
CN110516795A (en) A kind of method, apparatus and electronic equipment for model variable allocation processing device
Su et al. Cognitive virtual network topology reconfiguration method based on traffic prediction and link importance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180717
