CN108287763A - Parameter exchange method, working node and parameter server system - Google Patents
- Publication number
- CN108287763A CN108287763A CN201810084671.9A CN201810084671A CN108287763A CN 108287763 A CN108287763 A CN 108287763A CN 201810084671 A CN201810084671 A CN 201810084671A CN 108287763 A CN108287763 A CN 108287763A
- Authority
- CN
- China
- Prior art keywords
- parameter
- new model
- model
- invalid
- server system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Telephonic Communication Services (AREA)
Abstract
Embodiments of the present invention relate to the field of artificial intelligence and disclose a parameter exchange method, a working node, and a parameter server system. In the invention, the parameter exchange method is applied to a working node in a parameter server system and comprises: calculating a new model parameter according to training data, the model parameter being used to update the parameters of the algorithm model of the parameter server system; and judging whether the new model parameter is an invalid parameter, i.e., one that cannot bring the final model parameters of the algorithm model the expected optimization. If it is an invalid parameter, the new model parameter is not pushed to the service nodes of the parameter server system. By customizing parameter filtering rules for the model to reduce the invalid parameters exchanged with the parameter server, embodiments of the present invention can improve system efficiency and shorten the training time of deep learning.
Description
Technical field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular to a parameter exchange method, a working node, and a parameter server system.
Background technology
Artificial intelligence (AI) is a broad concept whose goal is to let computers think like humans. Machine learning is a branch of artificial intelligence that studies how computers can simulate and realize human learning behavior, acquire new knowledge, and improve performance. Deep learning is a family of machine learning methods that use multiple processing layers with complex structures, or multiple layers of nonlinear transformations (neural networks), to perform high-level abstraction of data.
In machine learning and deep learning, distributed optimization has become a prerequisite, because a single machine can no longer handle the rapidly growing volumes of training data and model parameters. The parameter server (Parameter Server) belongs to the third generation of distributed optimization frameworks. It adds many optimizations on top of its predecessors, greatly reducing network overhead, synchronization cost, and latency through asynchronous communication and relaxed consistency requirements.
The inventors have found at least the following problems in the prior art. With the accumulation of information and the growing complexity of models, the volume of training data reaches the terabyte or even petabyte scale, and the number of model parameters in the training process can rise to hundreds of millions (e.g., 10^9) or even trillions (e.g., 10^12). Because the massive model parameters generally need to be accessed frequently by all working nodes (worker nodes), many problems and challenges arise, including bandwidth pressure, synchronization, and fault tolerance. As the scale of the parameters that need to be exchanged through the parameter server grows, the time needed to train a model also grows, so reducing the training time has become one of the key issues in the development of deep learning.
Summary of the invention
Embodiments of the present invention aim to provide a parameter exchange method, a working node, and a parameter server system that reduce the invalid parameters exchanged with the parameter server by customizing parameter filtering rules for the model, thereby improving system efficiency and shortening the training time of deep learning.
To solve the above technical problems, embodiments of the present invention provide a parameter exchange method applied to a working node in a parameter server system, comprising: calculating a new model parameter according to training data, the model parameter being used to update the parameters of the algorithm model of the parameter server system; and judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization. If it is an invalid parameter, the new model parameter is not pushed to the service nodes of the parameter server system.
Embodiments of the present invention also provide a working node applied to a parameter server system, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the parameter exchange method described above.
Embodiments of the present invention also provide a parameter server system, comprising: a service node group and M working nodes as described in claim 1, where M is a natural number greater than or equal to 1, and the M working nodes are communicatively connected to the service node group.
Compared with the prior art, in embodiments of the present invention each working node in the parameter server system does not push a newly calculated model parameter directly to the service nodes. Instead, it first judges whether the new model parameter is an invalid parameter and decides accordingly whether to push it: when it judges that the new model parameter cannot bring the algorithm model of the parameter server system the expected optimization, it judges the new model parameter to be invalid and does not push it to the service nodes. Because invalid parameters contribute negligibly to the optimization of the final model parameters of the algorithm model, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure of the parameter server system and improves its synchronization and fault tolerance, but also effectively shortens the training time.
In addition, the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value. Judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization specifically includes: calculating the norm of the new model parameter and judging whether the norm is less than a preset norm; if it is less than the preset norm, the new model parameter is judged to be an invalid parameter. For gradient descent, a fundamental algorithm in the field of deep learning, a gradient of very small value contributes negligibly to the optimization of the overall algorithm model. Therefore, by calculating the norm of the gradient and comparing it with the preset norm, one can judge whether the current model parameter is invalid, which makes it easy to filter out invalid parameters.
In addition, the method further includes: if the norm of the parameter is greater than or equal to the preset norm, judging that the new model parameter is not an invalid parameter.
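As a small sketch of the norm-threshold rule described above (the threshold value and function names are illustrative assumptions, not part of the patent):

```python
import math

def is_invalid_update(gradient, preset_norm=1e-4):
    """Judge a gradient update invalid when its L2 norm falls below
    a preset norm, per the filtering rule sketched in this section."""
    norm = math.sqrt(sum(g * g for g in gradient))
    return norm < preset_norm

# A tiny gradient is filtered; a larger one would be pushed.
print(is_invalid_update([1e-6, -2e-6]))  # True -> do not push
print(is_invalid_update([0.5, -0.3]))    # False -> push to service node
```

In practice the preset norm would be tuned to the data scale, as the detailed description notes later.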
In addition, the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value. Judging whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization specifically includes: judging whether the new model parameter is close to the optimal model parameters; if it is close to the optimal model parameters, the new model parameter is judged to be an invalid parameter. In a gradient descent algorithm model, a model parameter close to the optimal model parameters of the algorithm model also contributes negligibly to the optimization of the algorithm model. Therefore, by judging whether the new model parameter calculated by a working node is close to the optimal model parameters of the algorithm model, the exchange of invalid parameters can be filtered out effectively.
In addition, judging whether the new model parameter is close to the final model parameters of the algorithm model is done via the KKT conditions.
In addition, the method further includes: if the new model parameter is not close to the final model parameters, judging that the new model parameter is not an invalid parameter.
In addition, the method further includes: if the new model parameter is not an invalid parameter, pushing the new model parameter to the service nodes.
In addition, calculating a new model parameter according to training data specifically includes: pulling training data from the service nodes; and calculating the new model parameter according to the pulled training data.
Description of the drawings
One or more embodiments are illustrated by the figures in the corresponding drawings. These exemplary illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
Fig. 1 is a schematic structural diagram of the parameter server system to which the parameter exchange method of the first embodiment of the invention is applied;
Fig. 2 is a flow chart of the parameter exchange method of the first embodiment of the invention;
Fig. 3 is a flow chart of the parameter exchange method of the second embodiment of the invention;
Fig. 4 is a schematic structural diagram of the working node of the third embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the embodiments of the invention are explained in detail below with reference to the drawings. However, those skilled in the art will understand that many technical details are given in the embodiments to help the reader better understand the application; even without these technical details and with various changes and modifications based on the following embodiments, the technical solution claimed by the application can still be realized.
The first embodiment of the present invention is related to a kind of parameter exchange method, the work being applied in parameter server system
Node.The parameter exchange method includes:New model parameter is calculated according to training data, which joins for updating
The parameter of the algorithm model of number server system, judges whether new model parameter is the final mask that cannot make algorithm model
Parameter obtains the Invalid parameter of expected optimization, if Invalid parameter, then the service node push not into parameter server system
New model parameter.In terms of existing technologies, each working node in parameter server system exists embodiment of the present invention
After new model parameter is calculated, it is not pushed to service node not instead of directly, by whether judging the new model parameter
Determine whether to push for Invalid parameter, i.e. the calculation in judging that the new model parameter cannot make parameter server system
When method model obtains expected optimization, judge that the new model parameter is Invalid parameter, does not push this newly to service node at this time
Model parameter.Since Invalid parameter is negligible to the optimization of the final mask parameter of algorithm model, so passing through
The exchange of Invalid parameter is filtered, the bandwidth pressure of parameter server system, the synchronization of lifting system and appearance can be not only mitigated
Wrong ability etc., but also can effectively shorten the training time.It is thin to the realization of the parameter exchange method of present embodiment below
Section is specifically described, and the following contents only for convenience of the realization details provided is understood, not implements the necessary of this programme.
Please refer to the schematic structural diagram of the parameter server system shown in Fig. 1, which includes a service node group 1 and a working node group. The working node group includes M working nodes, where M is a natural number greater than or equal to 1. Each working node 2 in the working node group is communicatively connected to the service nodes in the service node group, and the service nodes in the service node group jointly maintain and update the parameters of the algorithm model. This embodiment does not specifically limit the number of service nodes or working nodes in the parameter server system, nor their communication mode; any parameter server system used for machine learning is applicable.
Please refer to the flow chart of the parameter exchange method of the first embodiment shown in Fig. 2; the method includes steps 201 to 204.
Step 201: calculate a new model parameter according to training data.
The new model parameter calculated by the working node is used to update the parameters of the algorithm model of the parameter server system.
Specifically, each working node (also called a Worker node) pulls training data from the server side (also called the Server side, i.e., the service node group) and calculates the new model parameter according to the pulled training data. The Server side partitions the training data according to the number of Worker nodes. For example, when the parameter server system has 3 Worker nodes, 1 Server node, and one scheduling node (also called a Scheduler node), the Server side splits the training data into 3 parts and distributes them to the Worker nodes for computation.
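A minimal sketch of the Server-side split just described (the round-robin strategy and names are illustrative assumptions; the patent does not fix a partitioning scheme):

```python
def partition_training_data(records, num_workers):
    """Split training records into num_workers roughly equal shards,
    one per Worker node, as the Server side does before training."""
    shards = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        shards[i % num_workers].append(record)  # round-robin assignment
    return shards

# With 3 Worker nodes, 7 records are split 3/2/2.
shards = partition_training_data(list(range(7)), 3)
print([len(s) for s in shards])  # [3, 2, 2]
```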
Step 202: judge whether the new model parameter is an invalid parameter that cannot bring the final model parameters of the algorithm model the expected optimization. If it is an invalid parameter, go to step 203; if not, go to step 204.
Because different algorithm models have different filtering rules (i.e., different ways of judging invalid parameters), in practice the filtering rules must be customized for each algorithm model, and the customized rules are loaded into each working node in the parameter server system. Take the gradient descent algorithm model (gradient descent method for short) as an example. Gradient descent is an optimization algorithm, often also called the steepest descent method. It is one of the simplest and oldest methods for solving unconstrained optimization problems, and many efficient algorithms are obtained by improving and refining it. The steepest descent method uses the negative gradient direction as the search direction; the closer it gets to the target value, the smaller the step and the slower the progress. In other words, in gradient descent, a parameter update is very inefficient when the gradient value (i.e., the model parameter calculated by the working node from the training data) is very small; its contribution to the optimization of the overall algorithm model is negligible. That is, even if a service node does not apply such a gradient value to the parameters of the algorithm model, the correctness of the final model parameters solved by the overall algorithm model is not affected. A threshold can therefore be set in advance, and each working node confirms the scale of the model parameter to be exchanged before the parameter exchange, filtering out gradients of small scale to reduce the exchange of invalid parameters.
Specifically, for the stochastic gradient descent (SGD) algorithm, which is one kind of gradient descent algorithm, the algorithm model uses a gradient descent algorithm model, and each new model parameter calculated by a working node is a gradient value. Step 202 specifically includes sub-steps 2021 and 2022.
Sub-step 2021: calculate the norm of the new model parameter.
Sub-step 2022: judge whether the norm is less than the preset norm. If it is less than the preset norm, go to step 203; if it is greater than or equal to the preset norm, go to step 204.
The method for calculating the norm of a parameter is well known to those skilled in the art and is not described again here.
Step 203: do not push the new model parameter to the service nodes of the parameter server system.
For example, the working node can simply discard the invalid parameter it calculated.
Step 204: push the new model parameter to the service nodes of the parameter server system.
The new model parameter can be pushed to the service nodes according to a known parameter exchange scheme.
The parameter filtering method of this embodiment is described in detail below, taking as an example user profiling by logistic regression with gradient descent, where the training data are user feature data and labels:
First step: the Server side partitions the training data and distributes it to the Worker nodes. This embodiment does not specifically limit the way the training data is partitioned.
Second step: each Worker node pulls the latest parameters from the Server side (pull), and then trains with the training data. The invention models the specific label with logistic regression. Following the linear regression prediction function, assume the feature vector x = (x1, x2, ...), and h_θ(x) = θ^T x, where x is the variable, θ is a multidimensional parameter, and θ^T is the transpose of θ. When h(x) > 0 the output is 1, otherwise 0. The result is normalized with the sigmoid function, so that the prediction function of the logistic regression model is:

h_θ(x) = 1 / (1 + e^(−θ^T x))

The range of this function is (0, 1). When the predicted value is greater than 0.5 it is taken as 1, otherwise 0. Then we obtain P(y|x; θ) = (h_θ(x))^y · (1 − h_θ(x))^(1−y). Estimating θ by maximum likelihood and taking into account the independence of the data samples, we can deduce:

L(θ) = Π_{i=1}^{m} P(y^(i) | x^(i); θ) = Π_{i=1}^{m} (h_θ(x^(i)))^{y^(i)} · (1 − h_θ(x^(i)))^{1−y^(i)}

where P(·) denotes the conditional probability, Π denotes the product over i from 1 to m, and x^(i) denotes the x of the i-th sample.

Taking the logarithm of L(θ):

ℓ(θ) = log L(θ) = Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

After rearranging, the cost function is expressed as:

J(θ) = −(1/m) · ℓ(θ)

Parameters are updated with the gradient descent method. The main idea is to repeat the following operation in the direction that minimizes J with respect to θ, until the algorithm converges:

θ_j := θ_j − α · ∂J(θ)/∂θ_j

From ∂J(θ)/∂θ_j and the cost function, we obtain the parameter update formula:

θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i)

In the formulas above, α is the step size, i.e., the magnitude of each gradient descent step, and m is the number of training records (i.e., the total number of samples) on which the parameter update relies. The embodiment of the invention uses stochastic gradient descent (SGD), i.e., the parameters of the model are adjusted according to one sample at a time. Note that the formulas involved in this embodiment are well known in the art; the meanings and symbols in the formulas are well known to those skilled in the art and are not described again here.
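The per-sample SGD step of the derivation above can be sketched in code. The sigmoid and the update rule follow the standard formulas quoted in this section; the toy data and learning rate are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(theta, x, y, alpha=0.1):
    """One stochastic gradient descent update for logistic regression:
    theta_j := theta_j - alpha * (h_theta(x) - y) * x_j."""
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [t - alpha * (h - y) * xi for t, xi in zip(theta, x)]

# Toy run: one positive sample nudges theta toward classifying it as 1.
theta = [0.0, 0.0]
theta = sgd_step(theta, x=[1.0, 2.0], y=1)
print(all(t > 0 for t in theta))  # True: h was 0.5, so the update is positive
```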
Third step: the Worker nodes perform parameter filtering, i.e., after each Worker node has run stochastic gradient descent and obtained a new model parameter to be exchanged, the new model parameter is filtered. In this embodiment the following filtering rule is customized: filtering is done according to the scale of the model parameter to be exchanged (also called the model parameter to be updated). The norm of the parameter update is used as the index of parameter scale, and parameter updates whose norm exceeds the threshold (i.e., the preset norm) are pushed (push) to the Server side; the preset norm can be set according to the specific data scale.

Fourth step: push the parameter updates that do not satisfy the filtering rule (i.e., whose norm is greater than or equal to the preset norm) to the Server side, which receives and merges the parameter updates; parameter updates that satisfy the filtering rule (i.e., whose norm is less than the preset norm) are filtered out, i.e., not pushed to the Server side.

Fifth step: repeat the above steps until all the training data has participated in training.
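Putting these steps together, the Worker-side loop might look like the following sketch (the pull/push interfaces are hypothetical placeholders, not a real parameter server API):

```python
import math

PRESET_NORM = 1e-3  # assumed threshold; in practice tuned to the data scale

def l2_norm(update):
    return math.sqrt(sum(u * u for u in update))

def run_worker(shard, pull_params, compute_update, push_update):
    """Train over one data shard, pushing only updates whose norm
    meets the preset norm (second through fourth steps above)."""
    pushed = skipped = 0
    for sample in shard:
        theta = pull_params()                 # second step: pull latest params
        update = compute_update(theta, sample)
        if l2_norm(update) < PRESET_NORM:     # third step: filter
            skipped += 1                      # invalid update, drop it
        else:
            push_update(update)               # fourth step: push to Server
            pushed += 1
    return pushed, skipped

# Stub exchange functions to exercise the loop.
sent = []
pushed, skipped = run_worker(
    shard=[[1e-5, 0.0], [0.5, 0.5]],
    pull_params=lambda: [0.0, 0.0],
    compute_update=lambda theta, s: s,  # pretend the sample is the gradient
    push_update=sent.append,
)
print(pushed, skipped)  # 1 1
```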
From the description of the above embodiment, this embodiment has the following advantage: without affecting the correctness of the overall training, it reduces unnecessary parameter exchanges and the number of parameter updates, lowers the bandwidth load of the whole system, shortens the training time, and improves the efficiency of the system.
The second embodiment of the invention relates to a parameter exchange method. The second embodiment is roughly the same as the first; the main difference is that the first embodiment proposes, for the gradient descent algorithm, parameter filtering based on parameter scale, while the second embodiment proposes parameter filtering according to whether the parameter update is close to the final model parameters, further enriching the embodiments of the invention.
In a gradient descent algorithm, a parameter update that is close to the optimal model parameters is also inefficient; its contribution to the optimization of the overall model is negligible. System efficiency can therefore be improved and the training time shortened by filtering out parameter updates that are close to the optimal model parameters. Specifically, parameter updates that cannot move the model toward the optimal solution can be filtered out via the KKT conditions, reducing unnecessary parameter exchange. The KKT conditions are a method used when solving optimization problems; the optimization problem referred to here generally means finding the global minimum of a given function over a specified domain. Writing all inequality constraints, equality constraints, and the objective function as the formula L(a, b, x) = f(x) + a·g(x) + b·h(x), the KKT conditions state that the optimal value must satisfy the following:

the derivative of L(a, b, x) with respect to x is zero;
h(x) = 0;
a·g(x) = 0.

Parameter updates close to the optimal value can then be filtered out by these conditions.
Please refer to the flow chart of the parameter exchange method shown in Fig. 3. The parameter exchange method of this embodiment includes steps 301 to 304, where step 301 is identical to step 201, and steps 303 and 304 are identical to steps 203 and 204 respectively; they are not described again here.
Step 302: judge whether the new model parameter is close to the optimal model parameters. If it is close to the optimal model parameters, judge the new model parameter to be an invalid parameter and go to step 303; if it is not close, judge that it is not an invalid parameter and go to step 304.
Specifically, continuing the example from the first embodiment of user profiling by logistic regression with gradient descent, where the training data are user feature data and labels, the parameter filtering method of this embodiment is described as follows: when performing parameter filtering, each working node substitutes θ_j into J(θ) and, according to the KKT conditions, computes ∂J(θ)/∂θ_j. When its value is close to 0, the parameter update (i.e., the new model parameter calculated by the working node) is already close to optimal, and the update can be filtered, i.e., the new model parameter is not pushed to the Server side. Otherwise, when its value is not close to 0, the new model parameter must be pushed to the Server side.
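A sketch of the near-optimal check in this step, assuming the standard logistic-regression cost gradient quoted in the first embodiment (the tolerance value and function names are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def near_optimal(theta, samples, eps=1e-3):
    """Treat the update as near-optimal (hence filterable) when the
    gradient of the logistic-regression cost J(theta) is close to 0."""
    m = len(samples)
    grad = [0.0] * len(theta)
    for x, y in samples:
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for j, xj in enumerate(x):
            grad[j] += (h - y) * xj / m
    return all(abs(g) < eps for g in grad)

# Far from the optimum the gradient is large, so the update is pushed.
print(near_optimal([0.0, 0.0], [([1.0, 1.0], 1)]))  # False -> push
```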
It is worth noting that, in one example, each working node can also apply the filtering rules of the first and second embodiments at the same time, i.e., filter parameters according to both the parameter norm and the closeness of the parameter to the optimal model parameters, so as to filter out invalid parameters more effectively.
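The combined rule mentioned here can be sketched as a simple conjunction of the two checks (the threshold values and helper names are assumptions for illustration):

```python
import math

def should_push(update, grad_at_theta, preset_norm=1e-4, eps=1e-3):
    """Push only if the update passes BOTH filters: its norm is large
    enough (first embodiment) AND the model is not already near the
    optimum, judged by the cost gradient (second embodiment)."""
    big_enough = math.sqrt(sum(u * u for u in update)) >= preset_norm
    near_optimum = all(abs(g) < eps for g in grad_at_theta)
    return big_enough and not near_optimum

print(should_push([0.5, -0.3], grad_at_theta=[0.4, 0.2]))  # True
print(should_push([1e-6, 0.0], grad_at_theta=[0.4, 0.2]))  # False
```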
Compared with the prior art, for the SGD algorithm model this embodiment filters out parameter updates that are close to the optimal model parameters of the system's algorithm model, effectively reducing the number of parameter updates, lowering the bandwidth load, improving system efficiency, and shortening the training time.
The divisions between the steps of the various methods above are only for clarity of description. In implementation they may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they are within the protection scope of this patent. Adding insignificant modifications to an algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, is also within the protection scope of the patent.
Third embodiment of the invention is related to a kind of working node, as shown in figure 4, including:At least one processor 21 with
And the memory 22 communicated to connect at least one processor 21, wherein memory 22 is stored with can be by least one processor
21 instructions executed, instruction are executed by least one processor 21, so that at least one processor 21 is able to carry out such as the
One or second embodiment described in parameter exchange method.
Compared with the prior art, in embodiments of the present invention each working node in the parameter server system does not push a newly calculated model parameter directly to the service nodes. Instead, it first judges whether the new model parameter is an invalid parameter and decides accordingly whether to push it: when it judges that the new model parameter cannot bring the algorithm model of the parameter server system the expected optimization, it judges the new model parameter to be invalid and does not push it to the service nodes. Because invalid parameters contribute negligibly to the optimization of the final model parameters of the algorithm model, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure of the parameter server system and improves its synchronization and fault tolerance, but also effectively shortens the training time.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, electrically connecting one or more processors and the memory together. The bus may also electrically connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are all well known in the art and therefore are not further described here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; furthermore, the antenna also receives data and transfers the data to the processor.

The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory can be used to store data used by the processor when performing operations.
It is easy to see that this embodiment is a device embodiment corresponding to the first or second embodiment and can be implemented in cooperation with it. The related technical details mentioned in the first or second embodiment are still valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the related technical details mentioned in this embodiment are also applicable to the first or second embodiment.

It is worth mentioning that the modules involved in this embodiment are logical modules. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the invention, this embodiment does not introduce units that are less closely related to solving the technical problem raised by the invention, but this does not mean that no other units exist in this embodiment.
A fourth embodiment of the present invention relates to a parameter server system. Referring still to Fig. 1, the parameter server system of this embodiment includes a service node group 1 and M working nodes as described in the third embodiment, where M is a natural number greater than or equal to 1 and the M working nodes are communicatively connected to the service node group. The service nodes in the service node group jointly maintain and update the parameters of the algorithm model. Each working node in the parameter server system calculates new model parameters according to the training data pulled from the service node group (i.e., the server side), filters the calculated model parameters, and then pushes them to the server side. This embodiment does not particularly limit the value of M.
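The pull, compute, filter, and push cycle described above can be sketched as follows. This is a minimal illustrative sketch only: the class names (ServiceNodeGroup, WorkerNode), the toy single-weight least-squares model, and the norm threshold are assumptions made for illustration, not details specified by the patent.

```python
# Minimal sketch of the pull-compute-filter-push cycle between a service
# node group (server side) and a working node. All names are hypothetical.
import math

class ServiceNodeGroup:
    """Server side: jointly maintains and updates the model parameters."""
    def __init__(self, dim):
        self.params = [0.0] * dim
        self.pushes_received = 0

    def pull_training_data(self):
        # Stand-in for the training data a worker pulls from the server side.
        return [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

    def push(self, gradient, lr=0.1):
        # Apply a pushed (non-invalid) gradient to the shared parameters.
        self.pushes_received += 1
        self.params = [p - lr * g for p, g in zip(self.params, gradient)]

class WorkerNode:
    """Worker: computes a new model parameter and filters it before pushing."""
    def __init__(self, server, norm_threshold=1e-3):
        self.server = server
        self.norm_threshold = norm_threshold

    def step(self):
        data = self.server.pull_training_data()
        grad = self.compute_gradient(data)
        # Filter: push only when the gradient is not an invalid parameter.
        if not self.is_invalid(grad):
            self.server.push(grad)

    def compute_gradient(self, data):
        # Least-squares gradient for the toy model y ~ w * x.
        w = self.server.params[0]
        return [sum(2 * (w * x - y) * x for x, y in data) / len(data)]

    def is_invalid(self, gradient):
        # A gradient whose norm is below the preset threshold is judged invalid.
        return math.sqrt(sum(g * g for g in gradient)) < self.norm_threshold
```

Early in training the gradient norm is large and every step is pushed; near convergence the norm drops below the threshold and pushes are skipped, which is the bandwidth saving the embodiment describes.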
Compared with the prior art, in this embodiment of the present invention, after each working node in the parameter server system calculates a new model parameter, it does not push the parameter directly to the service nodes; instead, it first judges whether the new model parameter is an invalid parameter and thereby decides whether to push it. That is, when it judges that the new model parameter cannot make the algorithm model in the parameter server system obtain the expected optimization, it judges the new model parameter to be an invalid parameter and does not push it to the service nodes. Since invalid parameters contribute negligibly to the optimization of the final model parameters of the algorithm model, filtering out the exchange of invalid parameters not only relieves the bandwidth pressure on the parameter server system and improves the synchronization and fault-tolerance capabilities of the system, but also effectively shortens the training time.
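The worker-side filtering decision above can be sketched as a small pair of functions. This is a minimal sketch under stated assumptions: the function names and the default threshold value are hypothetical, and the norm test corresponds to the gradient-norm criterion used in the embodiments.

```python
# Hypothetical sketch of the invalid-parameter filter applied before pushing.
import math

def is_invalid_parameter(gradient, preset_norm=1e-3):
    # A gradient whose norm falls below the preset norm is judged invalid:
    # it cannot be expected to further optimize the final model parameters.
    norm = math.sqrt(sum(g * g for g in gradient))
    return norm < preset_norm

def filter_and_push(gradient, push_fn, preset_norm=1e-3):
    # Push the new model parameter to the service node only when it is
    # not an invalid parameter; otherwise skip the exchange entirely.
    if is_invalid_parameter(gradient, preset_norm):
        return False  # filtered out, saving bandwidth
    push_fn(gradient)
    return True
```

The push itself is left abstract (`push_fn`) because the patent does not fix a transport; only the judge-then-push ordering matters here.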
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program. The program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will understand that the above embodiments are specific embodiments for realizing the present invention, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of the present invention.
Claims (10)
1. A parameter exchange method, applied to a working node in a parameter server system, characterized by comprising:
calculating a new model parameter according to training data, the model parameter being used to update a parameter of an algorithm model in the parameter server system; and
judging whether the new model parameter is an invalid parameter that cannot make the final model parameters of the algorithm model obtain the expected optimization, and if it is an invalid parameter, not pushing the new model parameter to a service node in the parameter server system.
2. The parameter exchange method according to claim 1, characterized in that the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value;
the judging whether the new model parameter is an invalid parameter that cannot make the final model parameters of the algorithm model obtain the expected optimization specifically comprises:
calculating a parameter norm of the new model parameter; and
judging whether the parameter norm is less than a preset norm, and if it is less than the preset norm, judging that the new model parameter is an invalid parameter.
3. The parameter exchange method according to claim 2, characterized by further comprising: if the parameter norm is greater than or equal to the preset norm, judging that the new model parameter is not an invalid parameter.
4. The parameter exchange method according to claim 1, characterized in that the algorithm model is a gradient descent algorithm model and the new model parameter is a gradient value;
the judging whether the new model parameter is an invalid parameter that cannot make the final model parameters of the algorithm model obtain the expected optimization specifically comprises:
judging whether the new model parameter is close to the optimal model parameters, and if it is close to the optimal model parameters, judging that the new model parameter is an invalid parameter.
5. The parameter exchange method according to claim 4, characterized in that, in the judging whether the new model parameter is close to the final model parameters of the algorithm model, the Karush-Kuhn-Tucker (KKT) conditions are used to judge whether the new model parameter is close to the final model parameters of the algorithm model.
6. The parameter exchange method according to claim 4, characterized by further comprising: if the new model parameter is not close to the final model parameters, judging that the new model parameter is not an invalid parameter.
7. The parameter exchange method according to claim 1, characterized by further comprising: if the new model parameter is not an invalid parameter, pushing the new model parameter to the service node.
8. The parameter exchange method according to claim 1, characterized in that the calculating a new model parameter according to training data specifically comprises:
pulling training data from the service node; and
calculating the new model parameter according to the pulled training data.
9. A working node, applied to a parameter server system, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so as to enable the at least one processor to perform the parameter exchange method according to any one of claims 1 to 8.
10. A parameter server system, characterized by comprising: a service node group and M working nodes according to claim 9, M being a natural number greater than or equal to 1;
the M working nodes are communicatively connected to the service node group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810084671.9A CN108287763A (en) | 2018-01-29 | 2018-01-29 | Parameter exchange method, working node and parameter server system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108287763A true CN108287763A (en) | 2018-07-17 |
Family
ID=62835951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810084671.9A Pending CN108287763A (en) | 2018-01-29 | 2018-01-29 | Parameter exchange method, working node and parameter server system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108287763A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104346629A (en) * | 2014-10-24 | 2015-02-11 | 华为技术有限公司 | Model parameter training method, device and system |
CN106126578A (en) * | 2016-06-17 | 2016-11-16 | 清华大学 | A kind of web service recommendation method and device |
US20170092264A1 (en) * | 2015-09-24 | 2017-03-30 | Microsoft Technology Licensing, Llc | Detecting Actionable Items in a Conversation among Participants |
CN107256393A (en) * | 2017-06-05 | 2017-10-17 | 四川大学 | The feature extraction and state recognition of one-dimensional physiological signal based on deep learning |
CN107330516A (en) * | 2016-04-29 | 2017-11-07 | 腾讯科技(深圳)有限公司 | Model parameter training method, apparatus and system |
US20170351511A1 (en) * | 2015-12-22 | 2017-12-07 | Opera Solutions Usa, Llc | System and Method for Code and Data Versioning in Computerized Data Modeling and Analysis |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214512A (en) * | 2018-08-01 | 2019-01-15 | 中兴飞流信息科技有限公司 | A kind of parameter exchange method, apparatus, server and the storage medium of deep learning |
CN109214512B (en) * | 2018-08-01 | 2021-01-22 | 中兴飞流信息科技有限公司 | Deep learning parameter exchange method, device, server and storage medium |
WO2020084618A1 (en) * | 2018-10-24 | 2020-04-30 | Technion Research & Development Foundation Limited | System and method for distributed training of a neural network |
CN109492753A (en) * | 2018-11-05 | 2019-03-19 | 中山大学 | A kind of method of the stochastic gradient descent of decentralization |
CN111126627A (en) * | 2019-12-25 | 2020-05-08 | 四川新网银行股份有限公司 | Model training system based on separation degree index |
WO2021147620A1 (en) * | 2020-01-23 | 2021-07-29 | 华为技术有限公司 | Communication method, device, and system based on model training |
CN115906982A (en) * | 2022-11-15 | 2023-04-04 | 北京百度网讯科技有限公司 | Distributed training method, gradient communication method, device and electronic equipment |
CN115906982B (en) * | 2022-11-15 | 2023-10-24 | 北京百度网讯科技有限公司 | Distributed training method, gradient communication device and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180717 |