CN109978177A - Model training method, service processing method, apparatus and related device - Google Patents

Model training method, service processing method, apparatus and related device

Info

Publication number
CN109978177A
CN109978177A (application CN201910211389.7A; granted publication CN109978177B)
Authority
CN
China
Prior art keywords
target feature
gradient
momentum
approximate
accumulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910211389.7A
Other languages
Chinese (zh)
Other versions
CN109978177B (en)
Inventor
孙浩博 (Sun Haobo)
张红林 (Zhang Honglin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910211389.7A priority Critical patent/CN109978177B/en
Publication of CN109978177A publication Critical patent/CN109978177A/en
Application granted granted Critical
Publication of CN109978177B publication Critical patent/CN109978177B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention disclose a model training method, a service processing method, apparatuses and related devices. The model training method applied to a first node device includes: obtaining a weight parameter of a target feature in a target model; determining a current gradient accumulation of the target feature according to the weight parameter of the target feature; if the current gradient accumulation satisfies a preset condition, obtaining an approximate momentum of the target feature; and sending the approximate momentum of the target feature to a second node device, so that the second node device updates the target model using the approximate momentum of the target feature. The model training method of the embodiments of the present invention can improve model training efficiency, relieve communication pressure, and avoid additional memory consumption.

Description

Model training method, service processing method, apparatus and related device
Technical field
The present invention relates to the field of Internet technologies, in particular to the field of distributed machine learning, and more particularly to a model training method, a service processing method, a model training apparatus, a service processing apparatus, a node device, and a terminal.
Background technique
Machine learning (ML) is a multi-disciplinary field that can be applied to model training: a machine learning system learns from sample data and updates a model according to the learning results to obtain a model with improved performance. With the development of technology, distributed machine learning has emerged. Distributed machine learning refers to a mode of machine learning in which a machine learning task is distributed across multiple node devices for parallel processing. The mainstream framework of distributed machine learning (e.g., the parameter server architecture) is built from two roles: one role is the worker (working node device), which is mainly responsible for reading sample data and performing data computation; the other role is the server (service node device), which is mainly responsible for storing and updating weight parameters.
In model training based on distributed machine learning, a key factor affecting model training efficiency is the communication time between workers and servers. As sample data grows and models expand, more workers and servers must participate in model training, making communication interactions more frequent; communication time has increasingly become the bottleneck of model training. At present, gradient compression is generally performed using the GD (Gradient Dropping) scheme or the DGC (Deep Gradient Compression) scheme to reduce the number of communication interactions between workers and servers and thereby reduce communication time. However, the inventors have found in practice that current gradient compression schemes have various problems: the GD scheme converges slowly and has low model training efficiency; the DGC scheme needs to consume additional memory to store momentum, which increases memory overhead.
Summary of the invention
Embodiments of the present invention provide a model training method, a service processing method, apparatuses and related devices, which can improve model training efficiency, relieve communication pressure, and avoid additional memory consumption.
In one aspect, an embodiment of the present invention provides a model training method, applied to a first node device, the model training method including:
obtaining a weight parameter of a target feature in a target model;
determining a current gradient accumulation of the target feature according to the weight parameter of the target feature;
if the current gradient accumulation satisfies a preset condition, obtaining an approximate momentum of the target feature, the approximate momentum being a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature; and
sending the approximate momentum of the target feature to a second node device, so that the second node device updates the target model using the approximate momentum of the target feature.
In another aspect, an embodiment of the present invention provides another model training method, applied to a second node device, the model training method including:
receiving an approximate momentum of a target feature in a target model sent by a first node device, the approximate momentum of the target feature being obtained and sent by the first node device when a current gradient accumulation of the target feature satisfies a preset condition, the current gradient accumulation being determined according to a weight parameter of the target feature, and the approximate momentum being a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature; and
updating the target model using the approximate momentum of the target feature.
In yet another aspect, an embodiment of the present invention provides a service processing method, the service processing method including:
obtaining a service request;
in response to the service request, invoking a target model to process the requested service and obtain a processing result, the target model being trained using the above model training method; and
outputting the processing result.
In yet another aspect, an embodiment of the present invention provides a model training apparatus, running on a first node device, the model training apparatus including:
an obtaining unit, configured to obtain a weight parameter of a target feature in a target model;
a determining unit, configured to determine a current gradient accumulation of the target feature according to the weight parameter of the target feature;
the obtaining unit being further configured to obtain an approximate momentum of the target feature if the current gradient accumulation satisfies a preset condition, the approximate momentum being a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature; and
a sending unit, configured to send the approximate momentum of the target feature to a second node device, so that the second node device updates the target model using the approximate momentum of the target feature.
In yet another aspect, an embodiment of the present invention provides another model training apparatus, running on a second node device, the model training apparatus including:
a receiving unit, configured to receive an approximate momentum of a target feature in a target model sent by a first node device, the approximate momentum of the target feature being obtained and sent by the first node device when a current gradient accumulation of the target feature satisfies a preset condition, the current gradient accumulation being determined according to a weight parameter of the target feature, and the approximate momentum being a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature; and
an updating unit, configured to update the target model using the approximate momentum of the target feature.
In yet another aspect, an embodiment of the present invention provides a service processing apparatus, the service processing apparatus including:
an obtaining unit, configured to obtain a service request;
a processing unit, configured to invoke, in response to the service request, a target model to process the requested service and obtain a processing result, the target model being trained using the above model training method; and
an output unit, configured to output the processing result.
In yet another aspect, an embodiment of the present invention provides a node device, the node device including a communication interface and further including:
a processor, adapted to implement one or more instructions; and
a computer storage medium, the computer storage medium storing one or more first instructions, the one or more first instructions being adapted to be loaded by the processor to perform the following steps:
obtaining a weight parameter of a target feature in a target model;
determining a current gradient accumulation of the target feature according to the weight parameter of the target feature;
if the current gradient accumulation satisfies a preset condition, obtaining an approximate momentum of the target feature, the approximate momentum being a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature; and
sending the approximate momentum of the target feature to a second node device, so that the second node device updates the target model using the approximate momentum of the target feature.
Alternatively, the computer storage medium stores one or more second instructions, the one or more second instructions being adapted to be loaded by the processor to perform the following steps:
receiving an approximate momentum of a target feature in a target model sent by a first node device, the approximate momentum of the target feature being obtained and sent by the first node device when a current gradient accumulation of the target feature satisfies a preset condition, the current gradient accumulation being determined according to a weight parameter of the target feature, and the approximate momentum being a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature; and
updating the target model using the approximate momentum of the target feature.
In yet another aspect, an embodiment of the present invention provides a terminal, the terminal including an input device and an output device and further including:
a processor, adapted to implement one or more instructions; and
a computer storage medium, the computer storage medium storing one or more third instructions, the one or more third instructions being adapted to be loaded by the processor to perform the following steps:
obtaining a service request;
in response to the service request, invoking a target model to process the requested service and obtain a processing result, the target model being trained using the above model training method; and
outputting the processing result.
In yet another aspect, an embodiment of the present invention provides a computer storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by a processor to perform the above model training method and/or the above service processing method.
In the embodiments of the present invention, the current gradient accumulation of a target feature can be determined according to the weight parameter of the target feature in a target model, and when the current gradient accumulation of the target feature satisfies a preset condition, the first node device (acting as a worker) sends the approximate momentum of the target feature to the second node device (acting as a server), and the second node device updates the target model using the approximate momentum of the target feature. During this model training, the first node device and the second node device do not exchange gradients but exchange approximate momenta; this reduces the number of communication interactions between the first node device and the second node device, achieves a higher gradient compression ratio, reduces communication time, and relieves communication pressure, without adding extra memory overhead. In addition, updating the model with approximate momentum accelerates model convergence and shortens training time, thereby improving model training efficiency.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
Fig. 1a is an architecture diagram of a model training system according to an embodiment of the present invention;
Fig. 1b is a schematic diagram of the working principle of a model training system according to an embodiment of the present invention;
Fig. 2 is a flow diagram of a model training scheme according to an embodiment of the present invention;
Fig. 3 is a flow diagram of a model training method according to an embodiment of the present invention;
Fig. 4 is a flow diagram of a model training method according to another embodiment of the present invention;
Fig. 5 is a flow diagram of a deep gradient compression scheme according to an embodiment of the present invention;
Fig. 6 is a result plot of a scheme test according to an embodiment of the present invention;
Fig. 7 is a result plot of another scheme test according to an embodiment of the present invention;
Fig. 8 is a flow diagram of a service processing method according to an embodiment of the present invention;
Fig. 9a is a schematic diagram of a user interface according to an embodiment of the present invention;
Fig. 9b is an application scenario diagram of a service processing method according to an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of a model training apparatus according to an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of another model training apparatus according to an embodiment of the present invention;
Fig. 12 is a structural schematic diagram of a node device according to an embodiment of the present invention;
Fig. 13 is a structural schematic diagram of a service processing apparatus according to an embodiment of the present invention;
Fig. 14 is a structural schematic diagram of a terminal according to an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention.
In the embodiments of the present invention, distributed machine learning refers to a mode of machine learning in which a machine learning task is distributed across multiple node devices for parallel processing. A node device here may include, but is not limited to, a smart terminal, a PC (Personal Computer), a desktop computer, a server, and the like. The multiple node devices participating in distributed machine learning may constitute a model training system based on a PS (Parameter Server) architecture. As shown in Fig. 1a, the model training system may include at least one first node device 11 and at least one second node device 12; the number of first node devices may be the same as or different from the number of second node devices. The first node device 11 may act as a worker, and the second node device 12 may act as a server. Optionally, the model training system may further include a third node device 13, which may act as a scheduler (scheduling node device), mainly responsible for scheduling the servers and workers and for management tasks such as task assignment, heartbeat monitoring, and fault recovery for the servers and workers. Servers and workers communicate point-to-point, while no two workers communicate with each other. When physically deploying the model training system, servers and workers may be deployed separately, or a server and a worker may be deployed together as a unit, so as to balance the network communication of each node device.
The main flow of model training using the model training system shown in Fig. 1a includes the following steps, which may also be seen in Fig. 1b:
1. Read: the first node device reads the sample data used for training the target model; as shown in Fig. 1b, the sample data may be read from HDFS (Hadoop Distributed File System), and a cache is established in the local space of the first node device.
2. Pull: the first node device pulls from the second node device the first weight parameters of the valid target features in the sample data, where a valid target feature is a target feature that has a feature value; correspondingly, the second node device, in response to the pull operation of the first node device, sends the first weight parameters of the target features to the first node device.
3. Compute: the first node device computes the gradient of the target feature according to the first weight parameter of the target feature.
4. Push: the first node device sends the gradient of the target feature to the second node device.
5. Update: the second node device updates the target model according to the gradient of the target feature; specifically, the second node device updates the first weight parameter of the target feature according to the gradient of the target feature.
Steps 1-5 are executed iteratively to obtain a fully trained target model (a minimal illustrative sketch of this loop is given below). If the target model is an online model (i.e., a model trained in real time), its update training is complete when the accuracy of the target model reaches a preset accuracy threshold; if the target model is an offline model (i.e., a model trained in advance), its update training is complete when the target model no longer changes, i.e., converges. It should be noted that in each round of model training, the second node device, in response to the pull operation of the first node device, sends the first weight parameters updated in the previous round of model training to the first node device.
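To make the five steps concrete, the following is a minimal single-process Python sketch of the Read/Pull/Compute/Push/Update loop. The linear model, the squared-loss gradient, and all class and variable names are assumptions made for illustration and are not taken from the patent.

```python
# Minimal single-process sketch of the Read/Pull/Compute/Push/Update flow
# (illustrative assumptions only; not the patent's implementation).
import numpy as np

class Server:
    """Plays the second node device: stores and updates weight parameters."""
    def __init__(self, num_features, lr=0.1):
        self.weights = np.zeros(num_features)  # first weight parameters
        self.lr = lr

    def pull(self, feature_ids):
        # Step 2 (Pull): return the weights of the requested valid features.
        return self.weights[feature_ids]

    def update(self, feature_ids, gradients):
        # Step 5 (Update): gradient step on the pulled features only.
        self.weights[feature_ids] -= self.lr * gradients

class Worker:
    """Plays the first node device: reads samples and computes gradients."""
    def __init__(self, server):
        self.server = server

    def train_step(self, x, y):
        # Step 1 (Read): x is one sparse sample; only its non-zero entries
        # are valid target features that need weights.
        feature_ids = np.nonzero(x)[0]
        w = self.server.pull(feature_ids)    # Step 2 (Pull)
        pred = float(w @ x[feature_ids])     # Step 3 (Compute)
        grad = (pred - y) * x[feature_ids]   # squared-loss gradient
        # Steps 4-5 (Push + Update): in this single-process sketch the push
        # is simply a direct call into the server.
        self.server.update(feature_ids, grad)

server = Server(num_features=8)
worker = Worker(server)
worker.train_step(np.array([0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0]), y=1.0)
```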
Practice shows that the above main flow of model training has the following problems. (1) As sample data grows and the target model expands, more first node devices and second node devices are needed for model training; and each time a first node device finishes computing the gradient of the target feature, it sends the gradient to the second node device, making communication between the first node device and the second node device more frequent; communication time has increasingly become the communication performance bottleneck of model training, and the communication pressure is large. (2) Updating the model with gradients converges slowly, so model training efficiency is low. (3) The model training system updates the target model using a parallel machine learning mechanism, with each first node device performing data computation according to its own machine learning task; since the computation progress of the first node devices may differ, when a first node device with slower progress sends a stale parameter update value (such as the gradient of the target feature) to the second node device, the second node device may be guided in a wrong update direction when updating the target model according to the stale parameter update value, harming the model training effect.
In the related art of the embodiments of the present invention, the following technical solutions are generally used to solve the above problems. (1) The GD scheme: it accumulates the gradient of the target feature in the local space of the first node device, and when the gradient accumulation satisfies a send condition, the first node device sends the gradient accumulation to the second node device. (2) The DGC scheme: it performs model training mainly by combining the GD scheme with error feedback, and consists of two parts, Momentum Correction and Momentum Factor Masking. Momentum correction means moving the momentum computation step from the second node device to the first node device; momentum factor masking means that when the current momentum accumulation satisfies the send condition, the current momentum accumulation is sent to the second node device as the parameter update value and the corresponding momentum and momentum accumulation are cleared; otherwise, the current momentum accumulation continues to be cached.
The inventors have found in practice that the above existing schemes have the following problems. (1) The GD scheme suffers model performance loss at high compression ratios, and it cannot solve problems (2) and (3) above. (2) The DGC scheme cannot solve problem (3) above, and it needs to consume additional memory to store momentum. In addition, the inventors have also found that, under gradient compression, the second node device in existing schemes directly updates the target model according to the parameter update value after receiving it from the first node device; such processing usually loses part of the history gradient information of the target feature, resulting in a lower compression ratio. Moreover, since the GD scheme and the DGC scheme send the parameter update value only after the send condition is satisfied, they also cause the stale-parameter problem: the first weight parameter pulled by the first node device from the second node device is usually missing the information whose transmission has been delayed, so the gradient of the target feature computed from the first weight parameter aggravates the inconsistency between the global model update and the delay, harming the model training effect.
Based on this, an embodiment of the present invention proposes a model training scheme (the SGC scheme for short) as shown in Fig. 2, which can be applied to the model training system mentioned above. The SGC scheme is suitable for a variety of algorithms in the algorithm library, such as the Linear Regression algorithm, the Logistic Regression algorithm, the Factorization Machine algorithm, the Field-aware Factorization Machine algorithm, the DNN (deep neural network) algorithm, and so on. In model training, the user only needs to set a suitable compression ratio in the configuration file; the model training system then trains and updates the target model according to the compression ratio, which can effectively reduce communication time and enlarge the parallel scale. According to service requirements, the embodiments of the present invention may also apply the SGC scheme in different application scenarios, such as sparse scenarios; a sparse scenario is one in which the model has features on the order of tens of billions, hundreds of billions, or more, while the number of valid features is small.
In Fig. 2, t denotes the iteration number of model training; w_t denotes the first weight parameter pulled by the first node device from the second node device; g_t denotes the target gradient of the target feature; r_t denotes the current gradient accumulation of the target feature, i.e., the gradient accumulation in the t-th round of model training; r_{t-1} denotes the history gradient accumulation of the target feature, i.e., the gradient accumulation in the (t-1)-th round of model training; ṽ_t^(i) denotes the approximate momentum sent by the i-th first node device in the t-th round of model training, which also serves as the parameter update value of the target feature sent by that device; S_t denotes the mean of the approximate momenta of the target feature computed in the t-th round of model training; V_t denotes the current global momentum of the target feature, i.e., the global momentum in the t-th round of model training. The flow shown in Fig. 2 shows that the SGC scheme proposed by the embodiments of the present invention has at least the following three improvements over existing schemes:
(1) Momentum Approximation: when the current gradient accumulation satisfies the preset condition, the first node device obtains the approximate momentum of the target feature, so that the parameter update value pushed to the second node device is the approximate momentum; the approximate momentum here is a momentum obtained by performing momentum approximation according to the gradient accumulation of the target feature. What the first node device and the second node device exchange is the approximate momentum, which reduces the number of communication interactions between the first node device and the second node device, reduces communication time, and relieves communication pressure, without adding extra memory overhead; moreover, updating the target model with the approximate momentum accelerates the convergence of the target model, shortens training time, and improves model training efficiency.
(2) Long-term Gradient Compensation: the second node device saves the history gradient information of the target feature through the global momentum corresponding to the target feature, so that when subsequently updating the target model, the history gradient information can accelerate convergence; this compensates for the loss of the history gradient information of the target feature in existing schemes and effectively improves the compression ratio.
(3) Local Update: before computing the gradient of the target feature according to the first weight parameter pulled from the second node device, the first node device may perform a local update on the first weight parameter using the history gradient accumulation of the target feature. In this way, the gradient computed from the updated first weight parameter takes into account the information whose transmission was delayed, which solves the stale-parameter problem, alleviates the inconsistency between the global model update and the delay, reduces the negative effects of gradient compression, and improves the model training effect.
Based on the above description, an embodiment of the present invention proposes a model training method, which can be applied to the model training system mentioned above and executed by any first node device and any second node device in the model training system. Referring to Fig. 3, the model training method may include the following steps S301-S305:
S301: the first node device obtains a weight parameter of a target feature in a target model.
The target model may be a model specified by the user, or a model to be optimized and updated as determined by service requirements. A target model usually includes one or more features; a feature is a salient property of the model and the key by which the model distinguishes things/data, and different models have different features. For example, if target model A is a model for recommending content to users, the features in target model A may include an age feature, a gender feature, a user interest feature, a content topic feature, and so on; that is, target model A determines the content to be recommended to a user mainly from dimensions such as age, gender, user interest, and content topic. As another example, if target model B is a model for image processing, the features in target model B may include a texture feature, a color feature, a shape feature, and so on; that is, target model B performs image processing mainly from dimensions such as texture, color, and shape.
The target feature may be any feature in the target model, and the weight parameter of the target feature may include a first weight parameter or a second weight parameter. The first weight parameter is the parameter pulled by the first node device from the second node device; the second weight parameter is the parameter obtained after the first node device performs a local update on the first weight parameter using the history gradient accumulation of the target feature. Correspondingly, one specific embodiment of step S301 may be: pulling the first weight parameter of the target feature in the target model from the second node device; correspondingly, the second node device may, in response to the pull operation of the first node device, send the first weight parameter of the target feature in the target model to the first node device; in this embodiment, the weight parameter of the target feature includes the first weight parameter. Optionally, another specific embodiment of step S301 may be: pulling the first weight parameter of the target feature in the target model from the second node device, and updating the first weight parameter using the history gradient accumulation of the target feature to obtain the second weight parameter; in this embodiment, the weight parameter of the target feature includes the second weight parameter.
S302: the first node device determines the current gradient accumulation of the target feature according to the weight parameter of the target feature.
In the embodiments of the present invention, the current gradient accumulation refers to the gradient accumulated between the current round of model training and the most recent round in which an approximate momentum was obtained and sent. For example, suppose the current round is the 10th training iteration: if an approximate momentum was last obtained and sent in the 5th training iteration, the current gradient accumulation is the gradient accumulated from the 6th to the 10th training iteration; and if the first node device has never obtained or sent an approximate momentum before the 10th training iteration, the current gradient accumulation is the gradient accumulated from the 1st to the 10th training iteration. When accumulating gradients, the embodiments of the present invention store gradients in a key-value structure: the key stores the feature ID of the target feature, and the value stores the gradient of the target feature.
In a specific implementation, the current gradient accumulation of the target feature may be obtained by merging the target gradient computed in the current round of model training with the history gradient accumulation. Specifically, the first node device may first perform gradient computation according to the weight parameter of the target feature to obtain the target gradient, and then merge the history gradient accumulation of the target feature with the target gradient to obtain the current gradient accumulation of the target feature.
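A minimal sketch of how such key-value gradient accumulation might look on the first node device is given below; the dictionary layout, function name, and the rendering of the merge as addition are illustrative assumptions (the patent only specifies the key-value layout).

```python
# Sketch of key-value gradient accumulation on the first node device
# (illustrative assumptions; names and the additive merge are not from the patent).
from collections import defaultdict

# key: feature ID of the target feature; value: accumulated gradient
gradient_accumulation = defaultdict(float)

def accumulate(feature_id, target_gradient, eps):
    # Merge the freshly computed target gradient, weighted by the gradient
    # learning rate eps, into the history accumulation to obtain the
    # current gradient accumulation of this feature.
    gradient_accumulation[feature_id] += eps * target_gradient
    return gradient_accumulation[feature_id]
```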
S303: if the current gradient accumulation satisfies a preset condition, obtain the approximate momentum of the target feature.
In one embodiment, the preset condition may include: the value of the current gradient accumulation is greater than a preset threshold. In this embodiment, the first node device may detect whether the value of the current gradient accumulation is greater than the preset threshold; if so, the current gradient accumulation satisfies the preset condition, and the step of obtaining the approximate momentum of the target feature may be executed.
In another embodiment, the first node device may periodically obtain and send the approximate momentum of the target feature at a preset frequency. Correspondingly, the preset condition may also include: the accumulation count corresponding to the current gradient accumulation is greater than a preset count. In this embodiment, the first node device may detect whether the accumulation count corresponding to the current gradient accumulation is greater than the preset count; if so, the current gradient accumulation satisfies the preset condition, and the step of obtaining the approximate momentum of the target feature may be executed. The preset count may be determined according to the preset frequency; for example, if the preset frequency is to obtain and send one approximate momentum for every 5 accumulated gradients, the preset count is 5. The accumulation count corresponding to the current gradient accumulation may or may not be the same as the number of model iterations. For example, if the current round is the 3rd training iteration (i.e., the number of model iterations is 3) and the current gradient accumulation has accumulated the gradients of these 3 iterations, the accumulation count is also 3; in this case the accumulation count and the number of model iterations are the same. As another example, if the current round is the 5th training iteration (i.e., the number of model iterations is 5) and the first node device obtains and sends one approximate momentum for every 3 accumulated gradients, then the first node device sent an approximate momentum in the 3rd training iteration, and the current gradient accumulation has only accumulated the gradients of the 4th and 5th iterations; the accumulation count is 2, and in this case the accumulation count and the number of model iterations differ.
It should be noted that the preset threshold and preset frequency mentioned above may be set according to the actual service requirements of model training or according to empirical values; it should also be understood that the preset condition may be adjusted according to actual service requirements. Each time the current gradient accumulation satisfies the preset condition and the approximate momentum of the target feature has been obtained and sent, the first node device may reset the current gradient accumulation.
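The two preset conditions described above could be checked as in the following sketch; the helper name and the use of >= for the count-based comparison are assumptions for illustration.

```python
# Sketch of the preset-condition check (threshold-based and count-based);
# the name and the >= comparison are illustrative assumptions.
def meets_preset_condition(current_accumulation, accumulate_count,
                           threshold=None, push_every=None):
    if threshold is not None and abs(current_accumulation) > threshold:
        return True  # value of the current gradient accumulation exceeds the preset threshold
    if push_every is not None and accumulate_count >= push_every:
        return True  # one approximate momentum per push_every accumulated gradients
    return False
```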
S304: the first node device sends the approximate momentum of the target feature to the second node device.
After obtaining the approximate momentum of the target feature, the first node device may push the approximate momentum to the second node device as the parameter update value. Correspondingly, the second node device may receive the approximate momentum of the target feature in the target model sent by the first node device; the approximate momentum of the target feature is obtained and sent by the first node device when the current gradient accumulation of the target feature satisfies the preset condition, and the current gradient accumulation is determined according to the weight parameter of the target feature.
S305: the second node device updates the target model using the approximate momentum of the target feature.
After receiving the approximate momentum of the target feature in the target model sent by the first node device, the second node device may update the target model using the approximate momentum of the target feature. In one embodiment, the second node device may directly update the target model using the approximate momentum of the target feature; specifically, it may directly update the first weight parameter of the target feature in the target model using the approximate momentum of the target feature, thereby updating the target model.
In another embodiment, the second node device may update the target model using the approximate momentum of the target feature based on a long-term gradient compensation strategy. The long-term gradient compensation strategy here refers to a strategy of compensating the approximate momentum of the target feature with the history gradient information of the target feature to obtain the global momentum of the target feature, and updating the target model using the global momentum. Based on the long-term gradient compensation strategy, the second node device may update the first weight parameter of the target feature in the target model using the approximate momentum of the target feature, thereby updating the target model. In a specific implementation, the second node device may compute the mean of the approximate momenta of the target feature, compute the current global momentum of the target feature according to the computed mean, and update the first weight parameter of the target feature using the current global momentum. Updating the target model based on the long-term gradient compensation strategy retains the history gradient information of the target feature and thereby avoids the problem of lost history gradient information; it can effectively improve the compression ratio, and the history gradient information can further accelerate model convergence, effectively shortening the time consumed by the model training task.
In the embodiments of the present invention, the current gradient accumulation of the target feature can be determined according to the weight parameter of the target feature in the target model, and when the current gradient accumulation of the target feature satisfies the preset condition, the first node device (acting as a worker) sends the approximate momentum of the target feature to the second node device (acting as a server), and the second node device updates the target model using the approximate momentum of the target feature. During this model training, the first node device and the second node device do not exchange gradients or gradient accumulations but exchange approximate momenta; this reduces the number of communication interactions between the first node device and the second node device, achieves a higher gradient compression ratio, reduces communication time, and relieves communication pressure, without adding extra memory overhead. In addition, updating the model with approximate momentum accelerates model convergence and shortens training time, thereby improving model training efficiency.
Refer to Fig. 4, which is a flow diagram of another model training method according to an embodiment of the present invention. The model training method may be applied to the model training system mentioned above and executed by any first node device and any second node device in the model training system. As shown in Fig. 4, the model training method may include the following steps S401-S407:
S401: the first node device pulls the first weight parameter of the target feature in the target model from the second node device. Correspondingly, the second node device, in response to the pull operation of the first node device, sends the first weight parameter of the target feature in the target model to the first node device.
S402: the first node device updates the first weight parameter using the history gradient accumulation of the target feature to obtain the second weight parameter.
As described above, the first weight parameter pulled by the first node device from the second node device is usually missing the information whose transmission has been delayed, which leads to the stale-parameter problem. Therefore, in order to take the delayed gradient information into account, the embodiments of the present invention may update the first weight parameter using the history gradient accumulation to obtain the second weight parameter, so that the gradient subsequently computed from the second weight parameter takes the delayed gradient information into account; this solves the stale-parameter problem, alleviates the inconsistency between the global model update and the delay, and improves the model training effect.
Research shows that a target model is trained by being updated along the direction opposite to the gradient. Therefore, when updating the first weight parameter with the history gradient accumulation to obtain the second weight parameter, the difference between the first weight parameter w_t and the history gradient accumulation r_{t-1} may be computed, and the computed difference is taken as the second weight parameter w'_t, as in the following formula 1.1:
w'_t = w_t - r_{t-1}    (formula 1.1)
S403: the first node device performs gradient computation according to the weight parameter of the target feature to obtain the target gradient, the weight parameter including the second weight parameter.
S404: the first node device merges the history gradient accumulation of the target feature with the target gradient to obtain the current gradient accumulation of the target feature.
In a specific implementation of steps S403-S404, the first node device may perform gradient computation according to the second weight parameter of the target feature using the rule shown in formula 1.2 to obtain the target gradient.
The first node device may also obtain the history gradient accumulation of the target feature from the local space of the first node device, and execute step S404 after obtaining the target gradient. A specific embodiment of step S404 may be: first, obtain the gradient learning rate; second, weight the target gradient by the gradient learning rate to obtain the intermediate gradient, as shown in formula 1.3; then, merge the intermediate gradient with the history gradient accumulation of the target feature to obtain the current gradient accumulation of the target feature, as shown in formula 1.4.
g_t = ∇(w'_t)    (formula 1.2)
g'_t = ε·g_t    (formula 1.3)
r_t = sort(r_{t-1} + ε·g_t)    (formula 1.4)
In formulas 1.2-1.4, g_t denotes the target gradient computed from the second weight parameter in the t-th round of model training; ∇(·) denotes the gradient computation rule; g'_t denotes the intermediate gradient; ε denotes the gradient learning rate; sort(·) denotes the merge operation.
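Formulas 1.1-1.4 can be read as the following worker-side sketch; grad_fn stands in for the unspecified gradient computation rule ∇(·), and the merge sort(·) is rendered as plain addition, both of which are assumptions made for illustration.

```python
def worker_gradient_accumulation(w_t, r_prev, grad_fn, eps):
    """Sketch of formulas 1.1-1.4 (illustrative; grad_fn is an assumed
    stand-in for the gradient computation rule)."""
    w2 = w_t - r_prev     # formula 1.1: local update -> second weight parameter
    g_t = grad_fn(w2)     # formula 1.2: target gradient at the updated weights
    g_mid = eps * g_t     # formula 1.3: weight by the gradient learning rate
    r_t = r_prev + g_mid  # formula 1.4: merge into the current accumulation
    return r_t
```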
S405: if the current gradient accumulation satisfies the preset condition, obtain the approximate momentum of the target feature.
By comparing the relationship between momentum and gradient accumulation, the embodiments of the present invention propose a momentum approximation algorithm based on gradient compression, i.e., an algorithm that can perform momentum approximation according to the gradient accumulation. Based on this, a specific embodiment of step S405 may include the following steps s11-s12:
s11: if the current gradient accumulation satisfies the preset condition, obtain the momentum approximation algorithm.
s12: compute the approximate momentum of the target feature from the history gradient accumulation and the current gradient accumulation of the target feature using the momentum approximation algorithm.
Specifically, the gradient decay coefficient α may first be applied to the history gradient accumulation r_{t-1}, as shown in formula 1.5; then the decayed history gradient accumulation r'_{t-1} is merged with the current gradient accumulation r_t to obtain the approximate momentum ṽ_t of the target feature, as shown in formula 1.6:
r'_{t-1} = α·r_{t-1}    (formula 1.5)
ṽ_t = sort(r'_{t-1} + r_t)    (formula 1.6)
It should be understood that the specific embodiment of step S405 is not limited to the above; in other embodiments, the momentum approximation may also be performed using only the history gradient accumulation, or only the current gradient accumulation, to obtain the approximate momentum of the target feature. Specifically, the specific embodiment of step S405 may also be: if the current gradient accumulation satisfies the preset condition, obtain the momentum approximation algorithm, and compute the approximate momentum of the target feature from the history gradient accumulation of the target feature using the momentum approximation algorithm; alternatively, if the current gradient accumulation satisfies the preset condition, obtain the momentum approximation algorithm, and compute the approximate momentum of the target feature from the current gradient accumulation of the target feature using the momentum approximation algorithm.
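The momentum approximation of formulas 1.5-1.6 could look like the following sketch; rendering the merge sort(·) as addition is an assumption consistent with the formulas above.

```python
def approximate_momentum(r_prev, r_t, alpha):
    """Sketch of the momentum approximation (formulas 1.5-1.6);
    the merge sort(.) is rendered as addition, an assumption."""
    r_prev_decayed = alpha * r_prev  # formula 1.5: decay the history accumulation
    return r_prev_decayed + r_t      # formula 1.6: merge to get the approximate momentum
```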
S406: the first node device sends the approximate momentum of the target feature to the second node device; correspondingly, the second node device may receive the approximate momentum of the target feature in the target model sent by the first node device.
S407: the second node device updates the target model using the approximate momentum of the target feature based on the long-term gradient compensation strategy.
In a specific implementation, step S407 may include the following steps s21-s23:
s21: compute the mean of the approximate momenta of the target feature. Specifically, the second node device may receive the approximate momenta of the target feature sent by all related first node devices relevant to the target feature, and then compute the mean S_t of the received approximate momenta of the target feature, as shown in formula 1.7:
S_t = (1/N)·Σ_{i=1..N} ṽ_t^(i)    (formula 1.7)
where N denotes the number of related first node devices relevant to the target feature; the related first node devices relevant to the target feature are the first node devices that participate in computing the approximate momentum of the target feature. For example, suppose the model training system contains 10 first node devices, numbered 1 to 10, and only the first node devices numbered 2 and 5 participate in computing the approximate momentum of the target feature; then the related first node devices are first node device 2 and first node device 5.
s22: compute the current global momentum of the target feature according to the computed mean. Specifically, the history global momentum V_{t-1} may be obtained and decayed by the momentum decay coefficient α, and the current global momentum V_t is determined from the decayed history global momentum and the mean S_t, as shown in formula 1.8:
V_t = α·V_{t-1} + S_t    (formula 1.8)
s23: update the first weight parameter w_t of the target feature using the current global momentum V_t, as shown in formula 1.9:
w''_t = w_t - V_t    (formula 1.9)
where w''_t denotes the weight parameter obtained by updating the first weight parameter with the global momentum. Since the embodiments of the present invention first receive the approximate momenta of the target feature sent by all related first node devices and then compute the mean of the received approximate momenta to obtain the global momentum of the target feature, the global momentum retains the history gradient information of the target feature. The global momentum also takes into account the first node devices whose computation progress is slower, avoiding the stale-parameter problem caused by the slower computation progress of some first node devices; this avoids the inconsistency between the global model and stale parameters and improves both the model training effect and model training efficiency. Moreover, because the second node device updates the first weight parameter of the target feature using a global momentum that retains history gradient information, the history gradient information of the target feature can further accelerate the convergence of the target model during the update.
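Steps s21-s23 (formulas 1.7-1.9) amount to the following server-side sketch; the function name and the array representation of the received momenta are illustrative assumptions.

```python
import numpy as np

def server_long_term_compensation(w_t, v_global_prev, approx_momenta, alpha):
    """Sketch of long-term gradient compensation (formulas 1.7-1.9);
    illustrative names, not the patent's implementation."""
    s_t = np.mean(approx_momenta, axis=0)   # formula 1.7: mean over the N related workers
    v_global = alpha * v_global_prev + s_t  # formula 1.8: decay history global momentum, add mean
    w_new = w_t - v_global                  # formula 1.9: update the first weight parameter
    return w_new, v_global
```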
To further verify the superiority of the long-term gradient compensation strategy in the embodiments of the present invention, the inventors tested, in the limiting case, the momentum on the second node device side under the DGC scheme and under the scheme of the embodiments of the present invention respectively, and obtained the following test results:
DGC scheme: V_{KT+1} = g_{KT+1}
The scheme of the embodiments of the present invention: the global momentum still accumulates the history gradients.
Comparing the above test results: in the limiting case, the second node device in the DGC scheme directly uses the gradient as the global momentum, and this global momentum does not accumulate history gradient information; whereas the global momentum in the embodiments of the present invention still accumulates history gradient information. It can be seen that even in the limiting case, the embodiments of the present invention can still retain history gradient information through the global momentum, compensating for the loss of the history gradient information of the target feature in existing schemes, so that the parallel algorithm achieves consistency with the single-machine algorithm, and the compression ratio and model training efficiency are further effectively improved.
In the embodiments of the present invention, the current gradient accumulation of the target feature can be determined according to the weight parameter of the target feature in the target model, and when the current gradient accumulation of the target feature satisfies the preset condition, the first node device (acting as a worker) sends the approximate momentum of the target feature to the second node device (acting as a server), and the second node device updates the target model using the approximate momentum of the target feature. During this model training, the first node device and the second node device do not exchange gradients or gradient accumulations but exchange approximate momenta; this reduces the number of communication interactions between the first node device and the second node device, achieves a higher gradient compression ratio, reduces communication time, and relieves communication pressure, without adding extra memory overhead. In addition, updating the model with approximate momentum accelerates model convergence and shortens training time, thereby improving model training efficiency. Moreover, based on the long-term gradient compensation strategy, the history gradient information of the target feature can be retained, avoiding the problem of lost history gradient information; this effectively improves the compression ratio, and the history gradient information can further accelerate model convergence, effectively shortening the time consumed by the model training task.
To highlight the beneficial effects of the model training method proposed by the embodiments of the present invention, the inventors compared schemes in two respects, model computation and model testing, as follows:
(1) Model computation:
Taking the DGC scheme as an example of an existing scheme (its flow is shown in Fig. 5), the model computation process of the DGC scheme is given by formulas 2.1-2.5, which include the local momentum and momentum accumulation updates:
v_t = β·v_{t-1} + ε·g_to    (formula 2.2)
R_t = R_{t-1} + v_t    (formula 2.3)
In formulas 2.1-2.5, g_to denotes the gradient computed from the first weight parameter; β denotes the momentum decay coefficient; ε denotes the gradient learning rate; v_t denotes the current local momentum of the target feature, i.e., the local momentum in the t-th round of model training; v_{t-1} denotes the history local momentum of the target feature, i.e., the local momentum in the (t-1)-th round of model training; R_t denotes the current momentum accumulation of the target feature, i.e., the momentum accumulation in the t-th round of model training; R_{t-1} denotes the history momentum accumulation of the target feature, i.e., the momentum accumulation in the (t-1)-th round of model training; R_t > δ indicates that the current momentum accumulation satisfies the send condition; and w'''_t denotes the weight parameter obtained after the second node device updates the first weight parameter w_t using the parameter update value of the target feature sent by the i-th first node device in the t-th round of model training. Arranging and merging formulas 2.1-2.5 yields a corresponding closed-form update expression for the DGC scheme.
In the case where only momentum approximation is considered, the model computation process of the model training scheme proposed by the embodiments of the present invention is given by formulas 2.6-2.10, which include the gradient accumulation update:
r_t = r_{t-1} + ε·g_to    (formula 2.7)
The meanings of the symbols in formulas 2.6-2.10 are as described in the foregoing embodiments and are not repeated here. Arranging and merging formulas 2.6-2.10, the following expression can be obtained:
w'''_{t+T} = w_t - ε·[g_{to+T} + (1+α)·g_{to+T-1} + … + (1+α)·g_{to+1}]
Comparing formulas 2.1-2.5 of the DGC scheme above with formulas 2.6-2.10 of the scheme proposed by the embodiments of the present invention when only momentum approximation is considered, it can be seen that, compared with the DGC scheme, this scheme uses momentum approximation and therefore does not need to compute, or consume additional memory to store, the history momentum of the target feature, which saves memory; the memory of the first node device thus no longer becomes the bottleneck of gradient compression in model training, which helps enlarge the parallel scale of model training.
(2) Model testing:
(1) Test environment settings:
Target model settings: the convex optimization model is an l2-Regularized Logistic Regression model, and the non-convex optimization model is an FNN model. Convex optimization refers to optimization in which the objective function is a convex function and the set to which the variables belong is a convex set.
Test data sets: one public data set (the KDD2010 data set) and two Tencent internal data sets (the URP data set and the CTR data set) are used, as shown in Table 1:
Table 1

Dataset    Size      Dimension   #non-zero-features
KDD2010    2.5GB     20.2M       19.3M
URP        169.7GB   400M        100M
CTR        3.1TB     100G        10G
Testing scheme: do not carry out the Error function scheme (Baseline scheme) of gradient compression, GD scheme, GD-LU scheme, DGC scheme, DGC-LU scheme, SGC-MA scheme, SGC-MA-LG scheme and SGC scheme.Wherein, GD-LU is in GD scheme The scheme of local update is added, DGC-LU is the scheme that local update is added in DGC scheme, and SGC-MA is the embodiment of the present invention In the approximate scheme of momentum is used only, SGC-MA-LG is that momentum approximation and long-term gradient are used only in the embodiment of the present invention to compensate Scheme, SGC is the scheme for having used momentum approximate, long-term gradient compensation and local update, i.e. the present invention is implemented complete Scheme.
Evaluation indexes: AUC (an index for assessing ranking ability) and logloss (a loss function).
Cluster environment: 128GB RAM, 48 cores, and 10GB Ethernet.
(2) test result:
A. Model performance and convergence rate: the momentum approximation algorithm of the SGC scheme proposed by the embodiment of the present invention obtains a convergence rate similar to that of the DGC scheme; the long-term gradient compensation strategy of the SGC scheme significantly improves the convergence rate; and the local update strategy of the SGC scheme alleviates the parameter staleness problem, further improving the model effect. Specific test result data can be found in Table 2:
Table 2
The model AUC effect plots and model logloss effect plots obtained by testing the different schemes on the above test data sets are shown in Fig. 6. Panels (a)-(c) of Fig. 6 show the model AUC effects obtained by the different schemes on the different data sets; it can be seen that the model AUC effect corresponding to the SGC scheme is the best. Panels (d)-(f) of Fig. 6 show the model logloss effects obtained by the different schemes on the different data sets; comparison shows that the curve corresponding to the SGC scheme is the lowest, i.e. the logloss effect of the SGC scheme is the best.
B. Memory usage and communication time: for high-dimensional sparse models, the memory bottleneck brought by gradient compression is severe. The SGC scheme proposed by the embodiment of the present invention reduces memory usage relative to the DGC scheme, and by reducing memory usage, the SGC scheme further reduces communication cost.
Panel (a) of Fig. 7 shows the memory consumption of the DGC scheme and the SGC scheme when the gradient compression ratio is 99.99%, 90.00%, and 50.00%, respectively; it can be seen that the memory consumption of the DGC scheme is much larger than that of the SGC scheme. Panel (b) of Fig. 7 shows the situation of the DGC scheme and the SGC scheme when the gradient compression ratio is 99.99%; it can be seen that under the same gradient compression, the DGC scheme needs three communication interactions while the SGC scheme needs only one. It follows that the SGC scheme can further reduce communication interactions, alleviate communication pressure, and shorten model training time. Panel (c) of Fig. 7 shows the network communication time consumed by the first node device under the uncompressed momentum algorithm scheme, the DGC scheme, and the SGC scheme; it can be seen that in the sending phase, the time consumption of the SGC scheme is far less than that of the uncompressed scheme and also smaller than that of the DGC scheme.
Based on the description of the above embodiments, the embodiment of the present invention also proposes a business processing method, which can be executed by a terminal; the terminal here may include but is not limited to: a smart terminal, a personal computer, a tablet computer, a desktop computer, etc. Referring to Fig. 8, the business processing method may specifically include the following steps S801-S803:
S801, obtain a service request.
The service request may include but is not limited to: a web content recommendation request, an audio-video recommendation request, an image processing request, a face recognition request, etc. The terminal can obtain the service request after detecting a business trigger operation; the business trigger operation here may include but is not limited to: a refresh operation on multimedia data, a search operation for multimedia data, an image processing operation, and so on, where the multimedia data may include web content data and/or audio-video data.
Optionally, the terminal may also obtain service requests periodically at a fixed frequency, which can be set according to actual demand or an empirical value. For example, when the business processing method is applied in a real-time information application (such as a browser), the demand for the terminal to frequently push the latest information to the user is high, so a larger fixed frequency may be set in this case, such as 5 times/minute. For another example, when the business processing method is applied in a music application, music is usually updated only once a day, so the terminal does not need to push music to the user frequently, and a smaller fixed frequency may be set in this case, such as 1 time/day.
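As a rough illustration of this fixed-frequency option (not part of the embodiment itself), the Python sketch below obtains service requests on a timer; fetch_service_request and handle_request are hypothetical hooks, and the default frequency mirrors the 5 times/minute example above.

```python
import time

def poll_service_requests(fetch_service_request, handle_request, requests_per_minute=5.0):
    """Obtain service requests periodically at a fixed frequency.

    requests_per_minute=5.0 matches the real-time information example above
    (5 times/minute); a music application could pass 1.0 / (24 * 60),
    i.e. once per day.
    """
    interval_seconds = 60.0 / requests_per_minute
    while True:
        request = fetch_service_request()  # hypothetical hook that builds the request
        if request is not None:
            handle_request(request)        # hand the request off to step S802
        time.sleep(interval_seconds)
```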
S802, in response to the service request, call the target model to process the requested business and obtain a processing result.
The target model can be obtained by training with any model training method shown in Fig. 3 or Fig. 4. In one embodiment, the target model may be trained and updated in advance using the above model training method. In another embodiment, the target model may also be trained and updated in real time, in response to the service request, according to the user's history portrait data and using the above model training method; the embodiment of the present invention does not limit this.
When the service request includes a web content recommendation request, a specific implementation of step S802 may be: in response to the web content recommendation request, obtain at least one candidate web content to be recommended, where a candidate web content includes one or more of the following types of content: text, audio-video, and image; call the target model to perform click rate prediction on each candidate web content to obtain the predicted click rate of each candidate web content; and determine the target web content according to the predicted click rate of each candidate web content, where the predicted click rate of the target web content meets an output condition. The predicted click rate of the target web content meeting the output condition includes: the predicted click rate of the target web content is greater than a preset click rate threshold; or, the predicted click rate of the target web content is greater than the predicted click rates of the remaining candidate web contents other than the target web content, and so on.
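As a rough sketch of this selection logic (not the embodiment's actual code), the Python function below scores candidates with a click-rate predictor and returns the content meeting either form of the output condition; predict_click_rate and the candidate representation are hypothetical.

```python
def recommend_web_content(candidates, predict_click_rate, click_rate_threshold=None):
    """Select target web content whose predicted click rate meets the output condition.

    candidates: candidate web contents (text / audio-video / image items).
    predict_click_rate: hypothetical wrapper around the trained target model.
    """
    scored = [(content, predict_click_rate(content)) for content in candidates]
    if click_rate_threshold is not None:
        # Output condition 1: predicted click rate greater than a preset threshold.
        return [content for content, rate in scored if rate > click_rate_threshold]
    # Output condition 2: predicted click rate greater than those of all
    # remaining candidate web contents.
    best_content, _ = max(scored, key=lambda pair: pair[1])
    return [best_content]
```

For example, with the four candidates of the scenario illustrated further below (predicted click rates of 80%, 50%, 85%, and 70%) and no threshold, the function returns the single candidate with the 85% predicted click rate.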
When the service request includes a face recognition request, a specific implementation of step S802 may be: in response to the face recognition request, obtain a target face image to be recognized; call the target model to perform face recognition on the target face image to obtain a recognition result, where the recognition result may include at least one of the following: an expression identifier of the target face image, an identity, etc.
S803, output the processing result.
For ease of understanding, the embodiment of the present invention is illustrated below by taking the application of the business processing method to a web content recommendation scenario as an example:
During the process of the user browsing web content using the terminal, the terminal can provide a business interface for the user in the user interface; the business interface may include a recommendation interface 11 for recommending a Feeds stream and/or a direct search interface 12 for searching web content, as shown in Fig. 9. Here, Feeds refers to an information flow that is continuously updated and presented to the user. The user can click the recommendation interface 11 to obtain the latest web content, or click the direct search interface 12 to obtain the web content the user wishes to search and query.
Take as an example the case where the user clicks the recommendation interface 11 and the web content currently displayed in the terminal's user interface is web content a: after the terminal detects the user's click operation, it can obtain a web content recommendation request, and in response to the web content recommendation request, it obtains at least one candidate web content to be recommended, including: candidate web content A, candidate web content B, candidate web content C, and candidate web content D. The target model can then be called to perform click rate prediction on these 4 candidate web contents, obtaining a predicted click rate of 80% for candidate web content A, 50% for candidate web content B, 85% for candidate web content C, and 70% for candidate web content D. The terminal can determine candidate web content C, which has the highest predicted click rate, as the target web content, and output candidate web content C in the user interface, as shown in Fig. 9b. Performing business processing by calling the target model trained with the model training method shown in Fig. 3 or Fig. 4 improves the overall exposure efficiency of web content and the exposure efficiency of the main Feed.
The embodiment of the present invention can, after obtaining a service request, call the target model in response to the service request to process the requested business and obtain a processing result, and then output the processing result. Since the target model is trained using the model training method shown in Fig. 3 or Fig. 4, the response speed of the target model is fast, and calling the target model for business processing can reduce processing time and improve processing efficiency.
Based on the description of the above model training method embodiments, the embodiment of the present invention further provides a model training apparatus, which may be a computer program (including program code) running in the first node device. The model training apparatus can execute steps S301-S304 shown in Fig. 3 or steps S401-S406 shown in Fig. 4. Referring to Fig. 10, the model training apparatus can run the following units:
Acquiring unit 101, configured to obtain the weight parameter of the target feature in the target model;
Determination unit 102, configured to determine the current gradient cumulant of the target feature according to the weight parameter of the target feature;
The acquiring unit 101 is further configured to obtain the approximate momentum of the target feature if the current gradient cumulant meets the preset condition, where the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature;
Sending unit 103, configured to send the approximate momentum of the target feature to the second node device, so that the second node device updates the target model using the approximate momentum of the target feature.
In one embodiment, when the acquiring unit 101 is configured to obtain the approximate momentum of the target feature if the current gradient cumulant meets the preset condition, it may be specifically configured to:
if the current gradient cumulant meets the preset condition, obtain a momentum approximate algorithm;
calculate the approximate momentum of the target feature using the momentum approximate algorithm according to the history gradient cumulant of the target feature and the current gradient cumulant.
In another embodiment, when the acquiring unit 101 is configured to calculate the approximate momentum of the target feature using the momentum approximate algorithm according to the history gradient cumulant of the target feature and the current gradient cumulant, it may be specifically configured to:
call a gradient decay factor to perform attenuation processing on the history gradient cumulant;
merge the history gradient cumulant after attenuation processing with the current gradient cumulant to obtain the approximate momentum of the target feature.
In another embodiment, the weight parameter includes a first weight parameter; correspondingly, when the acquiring unit 101 is configured to obtain the weight parameter of the target feature in the target model, it may be specifically configured to:
pull the first weight parameter of the target feature in the target model from the second node device.
In another embodiment, the weight parameter includes a second weight parameter; correspondingly, when the acquiring unit 101 is configured to obtain the weight parameter of the target feature in the target model, it may be specifically configured to:
pull the first weight parameter of the target feature in the target model from the second node device;
update the first weight parameter using the history gradient cumulant of the target feature to obtain the second weight parameter.
In another embodiment, when the determination unit 102 is configured to determine the current gradient cumulant of the target feature according to the weight parameter of the target feature, it may be specifically configured to:
perform gradient calculation according to the weight parameter of the target feature to obtain a target gradient;
merge the history gradient cumulant of the target feature with the target gradient to obtain the current gradient cumulant of the target feature.
In another embodiment, when the determination unit 102 is configured to merge the history gradient cumulant of the target feature with the target gradient to obtain the current gradient cumulant of the target feature, it may be specifically configured to:
obtain a gradient learning rate;
weight the target gradient using the gradient learning rate to obtain an intermediate gradient;
merge the intermediate gradient with the history gradient cumulant of the target feature to obtain the current gradient cumulant of the target feature.
According to an embodiment of the present invention, steps S301-S304 shown in Fig. 3 and steps S401-S406 shown in Fig. 4 may each be performed by a unit in the model training apparatus shown in Fig. 10. For example, steps S301 and S303 shown in Fig. 3 may be executed by the acquiring unit 101 shown in Fig. 10, step S302 by the determination unit 102 shown in Fig. 10, and step S304 by the sending unit 103 shown in Fig. 10; for another example, steps S401-S402 shown in Fig. 4 may be executed by the acquiring unit 101 shown in Fig. 10, steps S403-S404 by the determination unit 102 shown in Fig. 10, and steps S405 and S406 by the acquiring unit 101 and the sending unit 103 shown in Fig. 10, respectively.
Based on the description of the above model training method embodiments, another model training apparatus is further provided in another embodiment, which may be a computer program (including program code) running in the second node device. Referring to Fig. 11, the model training apparatus can run the following units:
Receiving unit 201, configured to receive the approximate momentum of the target feature in the target model sent by the first node device, where the approximate momentum of the target feature is obtained and sent by the first node device when the current gradient cumulant of the target feature meets the preset condition, the current gradient cumulant is determined according to the weight parameter of the target feature, and the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature;
Updating unit 202, configured to update the target model using the approximate momentum of the target feature.
In one embodiment, the model training apparatus may also include a sending unit 203; before the approximate momentum of the target feature in the target model sent by the first node device is received, the sending unit 203 can be configured to:
in response to a pull operation of the first node device, send the first weight parameter of the target feature in the target model to the first node device.
In another embodiment, when the updating unit 202 is configured to update the target model using the approximate momentum of the target feature, it may be specifically configured to:
update the target model based on a long-term gradient compensation strategy and using the approximate momentum of the target feature.
In another embodiment, when the updating unit 202 is configured to update the target model based on the long-term gradient compensation strategy and using the approximate momentum of the target feature, it may be specifically configured to:
seek the mean value of the approximate momenta of the target feature;
calculate the current global momentum of the target feature according to the sought mean value, and update the first weight parameter of the target feature using the current global momentum.
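As a minimal sketch of this server-side step, the Python function below averages the approximate momenta received from the first node devices and applies a global-momentum update. The exact recurrence for the current global momentum and the subtractive weight update are assumptions here, since the embodiment only states that the global momentum is computed from the mean and used to update the first weight parameter.

```python
def server_update(weight, prev_global_momentum, approx_momenta, beta=0.9):
    """Second-node-device update for one target feature (a sketch).

    approx_momenta: the approximate momenta of the target feature received
    from the first node devices in this round.
    """
    # Mean value of the received approximate momenta.
    mean_momentum = sum(approx_momenta) / len(approx_momenta)
    # Current global momentum computed from the mean; the recurrence below
    # (decayed previous global momentum plus the mean) is an assumption.
    global_momentum = beta * prev_global_momentum + mean_momentum
    # Update the first weight parameter using the current global momentum;
    # a plain subtractive update is assumed here.
    weight = weight - global_momentum
    return weight, global_momentum
```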
According to an embodiment of the present invention, step S305 shown in Fig. 3 and step S407 shown in Fig. 4 may be executed by the updating unit 202 in the model training apparatus shown in Fig. 11.
According to another embodiment of the present invention, the units in the model training apparatus shown in Fig. 10 or Fig. 11 may be separately or wholly merged into one or several other units, or one (or some) of the units may be further split into multiple functionally smaller units; this can achieve the same operations without affecting the technical effects of the embodiments of the present invention. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present invention, the model training apparatus may also include other units, and in practical applications these functions may also be realized with the assistance of other units and may be realized by multiple units in cooperation.
According to another embodiment of the present invention, a computer program (including program code) capable of executing the steps involved in the related methods shown in Fig. 3 to Fig. 4 may be run on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), to construct the model training apparatus shown in Fig. 10 or Fig. 11 and to implement the model training method of the embodiment of the present invention. The computer program may be recorded on, for example, a computer readable recording medium, loaded into the above computing device through the computer readable recording medium, and run therein.
In the embodiment of the present invention, the current gradient cumulant of the target feature can be determined according to the weight parameter of the target feature in the target model, and when the current gradient cumulant of the target feature meets the preset condition, the first node device (as a worker) sends the approximate momentum of the target feature to the second node device (as a server), and the second node device updates the target model using the approximate momentum of the target feature. During this model training, the first node device and the second node device do not pass gradients to each other but instead exchange approximate momenta, which can reduce the number of communication interactions between the first node device and the second node device, realize a higher gradient compression ratio, reduce communication time, and alleviate communication pressure, without increasing additional memory overhead. In addition, updating the model with the approximate momentum can accelerate the convergence rate of the model and reduce training time, thereby improving model training efficiency.
Based on the description of the above model training method embodiments and model training apparatus embodiments, the embodiment of the present invention further provides a node device. The node device includes at least a processor 301, a communication interface 302, and a computer storage medium 303. Optionally, the communication interface 302 may also include a radio frequency receiver and a radio frequency transmitter. The node device provided by the embodiment of the present invention may be the aforementioned first node device or the aforementioned second node device. The radio frequency receiver in the first node device can be used to receive the first weight parameter of the target feature sent by the second node device, and the radio frequency transmitter can be used to send the approximate momentum of the target feature to the second node device; the radio frequency receiver in the second node device can be used to receive the approximate momentum of the target feature sent by the first node device, and the radio frequency transmitter can be used to send the first weight parameter of the target feature to the first node device. Taking the first node device as an example, a structural schematic diagram can be found in Fig. 12.
The computer storage medium 303 can be stored in the memory of the node device; the computer storage medium 303 is used to store a computer program, the computer program includes program instructions, and the processor 301 is used to execute the program instructions stored in the computer storage medium 303. The processor 301 (or CPU (Central Processing Unit)) is the computing core and control core of the node device; it is adapted to implement one or more instructions, and specifically adapted to load and execute one or more instructions so as to realize the corresponding method flows or corresponding functions. In one embodiment, when the node device is the first node device, the processor 301 described in the embodiment of the present invention can load and execute one or more first instructions stored in the computer storage medium 303 to carry out a series of model training processing, including: obtaining the weight parameter of the target feature in the target model; determining the current gradient cumulant of the target feature according to the weight parameter of the target feature; if the current gradient cumulant meets the preset condition, obtaining the approximate momentum of the target feature, where the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature; sending the approximate momentum of the target feature to the second node device so that the second node device updates the target model using the approximate momentum of the target feature; and so on. In another embodiment, when the node device is the second node device, the processor 301 described in the embodiment of the present invention can load and execute one or more second instructions stored in the computer storage medium 303 to carry out a series of model training processing, including: receiving the approximate momentum of the target feature in the target model sent by the first node device, where the approximate momentum of the target feature is obtained and sent by the first node device when the current gradient cumulant of the target feature meets the preset condition, the current gradient cumulant is determined according to the weight parameter of the target feature, and the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature; updating the target model using the approximate momentum of the target feature; and so on.
The embodiment of the present invention further provides a computer storage medium (Memory), which is a memory device in the node device and is used to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium in the node device, and of course may also include an extended storage medium supported by the node device. The computer storage medium provides storage space, and the storage space stores the operating system of the node device. In addition, one or more instructions suitable for being loaded and executed by the processor 301 are also stored in the storage space, and these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed RAM memory, or may be a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more first instructions stored in the computer storage medium can be loaded and executed by the processor 301 to realize the corresponding steps of the method in the above model training embodiments; in specific implementation, the one or more first instructions in the computer storage medium are loaded by the processor 301 to execute the following steps:
obtain the weight parameter of the target feature in the target model;
determine the current gradient cumulant of the target feature according to the weight parameter of the target feature;
if the current gradient cumulant meets the preset condition, obtain the approximate momentum of the target feature, where the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature;
send the approximate momentum of the target feature to the second node device so that the second node device updates the target model using the approximate momentum of the target feature.
In one embodiment, when the approximate momentum of the target feature is obtained if the current gradient cumulant meets the preset condition, the one or more first instructions are loaded by the processor 301 to specifically execute:
if the current gradient cumulant meets the preset condition, obtain a momentum approximate algorithm;
calculate the approximate momentum of the target feature using the momentum approximate algorithm according to the history gradient cumulant of the target feature and the current gradient cumulant.
In another embodiment, when the approximate momentum of the target feature is calculated using the momentum approximate algorithm according to the history gradient cumulant of the target feature and the current gradient cumulant, the one or more first instructions are loaded by the processor 301 to specifically execute:
call a gradient decay factor to perform attenuation processing on the history gradient cumulant;
merge the history gradient cumulant after attenuation processing with the current gradient cumulant to obtain the approximate momentum of the target feature.
In another embodiment, the weight parameter includes a first weight parameter; correspondingly, when the weight parameter of the target feature in the target model is obtained, the one or more first instructions are loaded by the processor 301 to specifically execute:
pull the first weight parameter of the target feature in the target model from the second node device.
In another embodiment, the weight parameter includes a second weight parameter; correspondingly, when the weight parameter of the target feature in the target model is obtained, the one or more first instructions are loaded by the processor 301 to specifically execute:
pull the first weight parameter of the target feature in the target model from the second node device;
update the first weight parameter using the history gradient cumulant of the target feature to obtain the second weight parameter.
In another embodiment, when the current gradient cumulant of the target feature is determined according to the weight parameter of the target feature, the one or more first instructions can be loaded by the processor 301 to specifically execute:
perform gradient calculation according to the weight parameter of the target feature to obtain a target gradient;
merge the history gradient cumulant of the target feature with the target gradient to obtain the current gradient cumulant of the target feature.
In another embodiment, when the history gradient cumulant of the target feature is merged with the target gradient to obtain the current gradient cumulant of the target feature, the one or more first instructions can be loaded by the processor 301 to specifically execute:
obtain a gradient learning rate;
weight the target gradient using the gradient learning rate to obtain an intermediate gradient;
merge the intermediate gradient with the history gradient cumulant of the target feature to obtain the current gradient cumulant of the target feature.
In another embodiment, one or more second instructions stored in the computer storage medium can be loaded and executed by the processor 301 to realize the corresponding steps of the method in the above model training embodiments; in specific implementation, the one or more second instructions in the computer storage medium are loaded by the processor 301 to execute the following steps:
receive the approximate momentum of the target feature in the target model sent by the first node device, where the approximate momentum of the target feature is obtained and sent by the first node device when the current gradient cumulant of the target feature meets the preset condition, the current gradient cumulant is determined according to the weight parameter of the target feature, and the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature;
update the target model using the approximate momentum of the target feature.
In one embodiment, before the approximate momentum of the target feature in the target model sent by the first node device is received, the one or more second instructions are loaded by the processor 301 to specifically execute:
in response to a pull operation of the first node device, send the first weight parameter of the target feature in the target model to the first node device.
In another embodiment, when the target model is updated using the approximate momentum of the target feature, the one or more second instructions are loaded by the processor 301 to specifically execute:
update the target model based on a long-term gradient compensation strategy and using the approximate momentum of the target feature.
In another embodiment, when the target model is updated based on the long-term gradient compensation strategy and using the approximate momentum of the target feature, the one or more second instructions are loaded by the processor 301 to specifically execute:
seek the mean value of the approximate momenta of the target feature;
calculate the current global momentum of the target feature according to the sought mean value, and update the first weight parameter of the target feature using the current global momentum.
In the embodiment of the present invention, the current gradient cumulant of the target feature can be determined according to the weight parameter of the target feature in the target model, and when the current gradient cumulant of the target feature meets the preset condition, the first node device (as a worker) sends the approximate momentum of the target feature to the second node device (as a server), and the second node device updates the target model using the approximate momentum of the target feature. During this model training, the first node device and the second node device do not pass gradients to each other but instead exchange approximate momenta, which can reduce the number of communication interactions between the first node device and the second node device, realize a higher gradient compression ratio, reduce communication time, and alleviate communication pressure, without increasing additional memory overhead. In addition, updating the model with the approximate momentum can accelerate the convergence rate of the model and reduce training time, thereby improving model training efficiency.
Based on the description of the above business processing method embodiment, the embodiment of the present invention further discloses a business processing apparatus, which may be a computer program (including program code) running in a terminal. The business processing apparatus can execute the method shown in Fig. 8. Referring to Fig. 13, the business processing apparatus can run the following units:
Acquiring unit 401, configured to obtain a service request;
Processing unit 402, configured to, in response to the service request, call the target model to process the requested business and obtain a processing result, where the target model is obtained by training with the above model training method;
Output unit 403, configured to output the processing result.
In one embodiment, the service request includes: web page contents recommendation request;Correspondingly, processing unit 402 For can when invocation target model handles to obtain processing result to requested business execution in response to the service request It is specifically used for:
In response to the web page contents recommendation request, at least one candidate web pages content to be recommended, the candidate are obtained Web page contents include the content of one or more of type: text, audio-video and image;
Invocation target model carries out clicking rate prediction to each candidate web pages content, obtains the pre- of each candidate web pages content Survey clicking rate;
Targeted web content is determined according to the prediction clicking rate of each candidate web pages content, the targeted web content Prediction clicking rate meets output condition.
According to an embodiment of the present invention, each step involved in the method shown in Fig. 8 may be performed by a unit in the business processing apparatus shown in Fig. 13. For example, steps S801-S803 shown in Fig. 8 may be executed by the acquiring unit 401, the processing unit 402, and the output unit 403 shown in Fig. 13, respectively. According to another embodiment of the present invention, the units in the business processing apparatus shown in Fig. 13 may be separately or wholly merged into one or several other units, or one (or some) of the units may be further split into multiple functionally smaller units; this can achieve the same operations without affecting the technical effects of the embodiment of the present invention. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present invention, the business processing apparatus may also include other units, and in practical applications these functions may also be realized with the assistance of other units and may be realized by multiple units in cooperation.
According to another embodiment of the present invention, a computer program (including program code) capable of executing the steps involved in the related method shown in Fig. 8 may be run on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), to construct the business processing apparatus shown in Fig. 13 and to implement the business processing method of the embodiment of the present invention. The computer program may be recorded on, for example, a computer readable recording medium, loaded into the above computing device through the computer readable recording medium, and run therein.
The embodiment of the present invention can, after obtaining a service request, call the target model in response to the service request to process the requested business and obtain a processing result, and then output the processing result. Since the target model is trained using the model training method shown in Fig. 3 or Fig. 4, the response speed of the target model is fast, and calling the target model for business processing can reduce processing time and improve processing efficiency.
Based on the description of the above method embodiments and apparatus embodiments, the embodiment of the present invention further provides a terminal. Referring to Fig. 14, the terminal includes at least a processor 501, an input device 502, an output device 503, and a computer storage medium 504. Optionally, the input device 502 may also include hardware devices for human-computer interaction such as a keyboard and a touch screen; the terminal may also include a battery for supplying power to the terminal to support its normal operation, and so on.
The computer storage medium 504 can be stored in the memory of the terminal; the computer storage medium 504 is used to store a computer program, the computer program includes program instructions, and the processor 501 is used to execute the program instructions stored in the computer storage medium 504. The processor 501 (or CPU (Central Processing Unit)) is the computing core and control core of the terminal; it is adapted to implement one or more instructions, and specifically adapted to load and execute one or more instructions so as to realize the corresponding method flows or corresponding functions. In one embodiment, the processor 501 described in the embodiment of the present invention can be used to carry out a series of business processing, including: obtaining a service request; in response to the service request, calling the target model to process the requested business and obtain a processing result; outputting the processing result; and so on.
The embodiment of the present invention further provides a computer storage medium (Memory), which is a memory device in the terminal and is used to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium in the terminal, and of course may also include an extended storage medium supported by the terminal. The computer storage medium provides storage space, and the storage space stores the operating system of the terminal. In addition, one or more instructions suitable for being loaded and executed by the processor 501 are also stored in the storage space, and these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed RAM memory, or may be a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in the computer storage medium can be loaded and executed by the processor 501 to realize the corresponding steps of the method in the above business processing embodiment; in specific implementation, the one or more third instructions in the computer storage medium are loaded by the processor 501 to execute the following steps:
obtain a service request;
in response to the service request, call the target model to process the requested business and obtain a processing result, where the target model is obtained by training with the model training method shown in Fig. 3 or Fig. 4;
output the processing result.
In one embodiment, the service request includes: web page contents recommendation request;Correspondingly, in response to institute State service request, when invocation target model handles to obtain processing result to requested business execution, described one or one with Upper instruction can also be loaded by processor 501 and specifically be executed:
In response to the web page contents recommendation request, at least one candidate web pages content to be recommended, the candidate are obtained Web page contents include the content of one or more of type: text, audio-video and image;
Invocation target model carries out clicking rate prediction to each candidate web pages content, obtains the pre- of each candidate web pages content Survey clicking rate;
Targeted web content is determined according to the prediction clicking rate of each candidate web pages content, the targeted web content Prediction clicking rate meets output condition.
The embodiment of the present invention can, after obtaining a service request, call the target model in response to the service request to process the requested business and obtain a processing result, and then output the processing result. Since the target model is trained using the model training method shown in Fig. 3 or Fig. 4, the response speed of the target model is fast, and calling the target model for business processing can reduce processing time and improve processing efficiency.
The above disclosure is only the preferred embodiments of the present invention, which of course cannot be used to limit the scope of rights of the present invention; therefore, equivalent changes made in accordance with the claims of the present invention still fall within the scope of the present invention.

Claims (15)

1. A model training method, applied to a first node device, characterized by comprising:
obtaining a weight parameter of a target feature in a target model;
determining a current gradient cumulant of the target feature according to the weight parameter of the target feature;
if the current gradient cumulant meets a preset condition, obtaining an approximate momentum of the target feature, wherein the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature;
sending the approximate momentum of the target feature to a second node device so that the second node device updates the target model using the approximate momentum of the target feature.
2. The method according to claim 1, characterized in that the obtaining the approximate momentum of the target feature if the current gradient cumulant meets a preset condition comprises:
if the current gradient cumulant meets the preset condition, obtaining a momentum approximate algorithm;
calculating the approximate momentum of the target feature using the momentum approximate algorithm according to a history gradient cumulant of the target feature and the current gradient cumulant.
3. The method according to claim 2, characterized in that the calculating the approximate momentum of the target feature using the momentum approximate algorithm according to the history gradient cumulant of the target feature and the current gradient cumulant comprises:
calling a gradient decay factor to perform attenuation processing on the history gradient cumulant;
merging the history gradient cumulant after attenuation processing with the current gradient cumulant to obtain the approximate momentum of the target feature.
4. The method according to claim 1, characterized in that the weight parameter comprises a first weight parameter; the obtaining the weight parameter of the target feature in the target model comprises:
pulling the first weight parameter of the target feature in the target model from the second node device.
5. The method according to claim 1, characterized in that the weight parameter comprises a second weight parameter; the obtaining the weight parameter of the target feature in the target model comprises:
pulling the first weight parameter of the target feature in the target model from the second node device;
updating the first weight parameter using the history gradient cumulant of the target feature to obtain the second weight parameter.
6. The method according to claim 1, characterized in that the determining the current gradient cumulant of the target feature according to the weight parameter of the target feature comprises:
performing gradient calculation according to the weight parameter of the target feature to obtain a target gradient;
merging the history gradient cumulant of the target feature with the target gradient to obtain the current gradient cumulant of the target feature.
7. The method according to claim 6, characterized in that the merging the history gradient cumulant of the target feature with the target gradient to obtain the current gradient cumulant of the target feature comprises:
obtaining a gradient learning rate;
weighting the target gradient using the gradient learning rate to obtain an intermediate gradient;
merging the intermediate gradient with the history gradient cumulant of the target feature to obtain the current gradient cumulant of the target feature.
8. A model training method, applied to a second node device, characterized by comprising:
receiving an approximate momentum of a target feature in a target model sent by a first node device, wherein the approximate momentum of the target feature is obtained and sent by the first node device when a current gradient cumulant of the target feature meets a preset condition, the current gradient cumulant is determined according to a weight parameter of the target feature, and the approximate momentum is a momentum obtained by performing momentum approximate calculation according to the gradient cumulant of the target feature;
updating the target model using the approximate momentum of the target feature.
9. The method according to claim 8, characterized in that before the receiving the approximate momentum of the target feature in the target model sent by the first node device, the method further comprises:
in response to a pull operation of the first node device, sending a first weight parameter of the target feature in the target model to the first node device.
10. The method according to claim 9, characterized in that the updating the target model using the approximate momentum of the target feature comprises:
updating the target model based on a long-term gradient compensation strategy and using the approximate momentum of the target feature.
11. The method according to claim 10, characterized in that the updating the target model based on the long-term gradient compensation strategy and using the approximate momentum of the target feature comprises:
seeking a mean value of the approximate momenta of the target feature;
calculating a current global momentum of the target feature according to the sought mean value, and updating the first weight parameter of the target feature using the current global momentum.
12. A business processing method, characterized in that the method comprises:
obtaining a service request;
in response to the service request, calling a target model to process the requested business and obtain a processing result, wherein the target model is obtained by training with the model training method according to any one of claims 1-11;
outputting the processing result.
13. The method according to claim 12, characterized in that the service request comprises a web content recommendation request; the calling the target model in response to the service request to process the requested business and obtain a processing result comprises:
in response to the web content recommendation request, obtaining at least one candidate web content to be recommended, wherein a candidate web content comprises one or more of the following types of content: text, audio-video, and image;
calling the target model to perform click rate prediction on each candidate web content to obtain a predicted click rate of each candidate web content;
determining a target web content according to the predicted click rate of each candidate web content, wherein the predicted click rate of the target web content meets an output condition.
14. A node device, comprising a communication interface, characterized by further comprising:
a processor, adapted to implement one or more instructions; and
a computer storage medium, wherein the computer storage medium stores one or more first instructions, and the one or more first instructions are adapted to be loaded by the processor to execute the model training method according to any one of claims 1-7; or, the computer storage medium stores one or more second instructions, and the one or more second instructions are adapted to be loaded by the processor to execute the model training method according to any one of claims 8-11.
15. A terminal, comprising an input device and an output device, characterized by further comprising:
a processor, adapted to implement one or more instructions; and
a computer storage medium, wherein the computer storage medium stores one or more third instructions, and the one or more third instructions are adapted to be loaded by the processor to execute the business processing method according to claim 12 or 13.
CN201910211389.7A 2019-03-19 2019-03-19 Model training method, service processing method, device and related equipment Active CN109978177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910211389.7A CN109978177B (en) 2019-03-19 2019-03-19 Model training method, service processing method, device and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910211389.7A CN109978177B (en) 2019-03-19 2019-03-19 Model training method, service processing method, device and related equipment

Publications (2)

Publication Number Publication Date
CN109978177A true CN109978177A (en) 2019-07-05
CN109978177B CN109978177B (en) 2023-06-23

Family

ID=67079563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910211389.7A Active CN109978177B (en) 2019-03-19 2019-03-19 Model training method, service processing method, device and related equipment

Country Status (1)

Country Link
CN (1) CN109978177B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104706A (en) * 2020-08-24 2020-12-18 中国银联股份有限公司 Method, device, equipment and storage medium for releasing model in distributed system
WO2021053508A1 (en) * 2019-09-20 2021-03-25 International Business Machines Corporation Maintaining data privacy in a shared detection model system
CN113095407A (en) * 2021-04-12 2021-07-09 哈尔滨理工大学 Efficient asynchronous federated learning method for reducing communication times
US11080352B2 (en) 2019-09-20 2021-08-03 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11188320B2 (en) 2019-09-20 2021-11-30 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
US11216268B2 (en) 2019-09-20 2022-01-04 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
CN115994590A (en) * 2023-03-23 2023-04-21 浪潮电子信息产业股份有限公司 Data processing method, system, equipment and storage medium based on distributed cluster

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030156762A1 (en) * 2001-10-15 2003-08-21 Jonas August Volterra filters for enhancement of contours in images
WO2006029297A2 (en) * 2004-09-10 2006-03-16 Hoftberg Steven Game theoretic prioritization scheme for mobile ad hoc networks permitting hierarchal deference
US20170228645A1 (en) * 2016-02-05 2017-08-10 Nec Laboratories America, Inc. Accelerating deep neural network training with inconsistent stochastic gradient descent
WO2017220032A1 (en) * 2016-06-24 2017-12-28 平安科技(深圳)有限公司 Vehicle license plate classification method and system based on deep learning, electronic apparatus, and storage medium
CN107622307A (en) * 2017-09-11 2018-01-23 浙江工业大学 A kind of Undirected networks based on deep learning connect side right weight Forecasting Methodology
US20180053134A1 (en) * 2016-08-18 2018-02-22 Proov Systems Ltd. System, method and computer product for management of proof-of-concept software pilots, including neural network-based kpi prediction
CN108491928A (en) * 2018-03-29 2018-09-04 腾讯科技(深圳)有限公司 Model parameter training method, device, server and storage medium
WO2018187764A1 (en) * 2017-04-06 2018-10-11 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Isotropic generalized diffusion tensor mri
CN108898560A (en) * 2018-06-21 2018-11-27 四川大学 Rock core CT image super-resolution rebuilding method based on Three dimensional convolution neural network
US20180357552A1 (en) * 2016-01-27 2018-12-13 Bonsai AI, Inc. Artificial Intelligence Engine Having Various Algorithms to Build Different Concepts Contained Within a Same AI Model
CN109117951A (en) * 2018-01-15 2019-01-01 重庆大学 Probabilistic Load Flow on-line calculation method based on BP neural network
CN109272046A (en) * 2018-09-26 2019-01-25 北京科技大学 Deep learning method based on L2 again regularization Adam switching simulated tempering SGD
WO2019042571A1 (en) * 2017-09-04 2019-03-07 Huawei Technologies Co., Ltd. Asynchronous gradient averaging distributed stochastic gradient descent

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030156762A1 (en) * 2001-10-15 2003-08-21 Jonas August Volterra filters for enhancement of contours in images
WO2006029297A2 (en) * 2004-09-10 2006-03-16 Hoftberg Steven Game theoretic prioritization scheme for mobile ad hoc networks permitting hierarchal deference
US20180357552A1 (en) * 2016-01-27 2018-12-13 Bonsai AI, Inc. Artificial Intelligence Engine Having Various Algorithms to Build Different Concepts Contained Within a Same AI Model
US20170228645A1 (en) * 2016-02-05 2017-08-10 Nec Laboratories America, Inc. Accelerating deep neural network training with inconsistent stochastic gradient descent
WO2017220032A1 (en) * 2016-06-24 2017-12-28 平安科技(深圳)有限公司 Vehicle license plate classification method and system based on deep learning, electronic apparatus, and storage medium
US20180053134A1 (en) * 2016-08-18 2018-02-22 Proov Systems Ltd. System, method and computer product for management of proof-of-concept software pilots, including neural network-based kpi prediction
WO2018187764A1 (en) * 2017-04-06 2018-10-11 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Isotropic generalized diffusion tensor mri
WO2019042571A1 (en) * 2017-09-04 2019-03-07 Huawei Technologies Co., Ltd. Asynchronous gradient averaging distributed stochastic gradient descent
CN107622307A (en) * 2017-09-11 2018-01-23 浙江工业大学 A kind of Undirected networks based on deep learning connect side right weight Forecasting Methodology
CN109117951A (en) * 2018-01-15 2019-01-01 重庆大学 Probabilistic Load Flow on-line calculation method based on BP neural network
CN108491928A (en) * 2018-03-29 2018-09-04 腾讯科技(深圳)有限公司 Model parameter training method, device, server and storage medium
CN108898560A (en) * 2018-06-21 2018-11-27 四川大学 Rock core CT image super-resolution rebuilding method based on Three dimensional convolution neural network
CN109272046A (en) * 2018-09-26 2019-01-25 北京科技大学 Deep learning method based on L2 again regularization Adam switching simulated tempering SGD

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAGHOUB, M等: "Evidence of nonlocality due to a gradient term in the optical model", 《NUCLEAR PHYSICS》 *
刘建伟等: "基于值函数和策略梯度的深度强化学习综述", 《计算机学报》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021053508A1 (en) * 2019-09-20 2021-03-25 International Business Machines Corporation Maintaining data privacy in a shared detection model system
US11080352B2 (en) 2019-09-20 2021-08-03 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11157776B2 (en) 2019-09-20 2021-10-26 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11188320B2 (en) 2019-09-20 2021-11-30 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
US11216268B2 (en) 2019-09-20 2022-01-04 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
CN114402301A (en) * 2019-09-20 2022-04-26 国际商业机器公司 Maintaining data privacy in a shared detection model system
CN114402301B (en) * 2019-09-20 2023-01-31 国际商业机器公司 System and method for maintaining data privacy in a shared detection model system
CN112104706A (en) * 2020-08-24 2020-12-18 China UnionPay Co., Ltd. Method, device, equipment and storage medium for releasing model in distributed system
CN112104706B (en) * 2020-08-24 2022-12-20 China UnionPay Co., Ltd. Method, device, equipment and storage medium for releasing model in distributed system
CN113095407A (en) * 2021-04-12 2021-07-09 Harbin University of Science and Technology Efficient asynchronous federated learning method for reducing the number of communication rounds
CN115994590A (en) * 2023-03-23 2023-04-21 Inspur Electronic Information Industry Co., Ltd. Data processing method, system, equipment and storage medium based on distributed cluster

Also Published As

Publication number Publication date
CN109978177B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN109978177A (en) Model training method, method for processing business, device and relevant device
Ma et al. Query-based workload forecasting for self-driving database management systems
CN109714400B (en) Container cluster-oriented energy consumption optimization resource scheduling system and method thereof
CN102227121B (en) Distributed cache strategy adaptive switching method based on machine learning and system thereof
US9348924B2 (en) Almost online large scale collaborative filtering based recommendation system
US9535938B2 (en) Efficient and fault-tolerant distributed algorithm for learning latent factor models through matrix factorization
Che et al. A deep reinforcement learning approach to the optimization of data center task scheduling
Fan et al. PA-cache: Evolving learning-based popularity-aware content caching in edge networks
Park et al. AMBLE: Adjusting mini-batch and local epoch for federated learning with heterogeneous devices
CN111126495A (en) Model training method, information prediction method, device, storage medium and equipment
Liu et al. Communication-efficient asynchronous federated learning in resource-constrained edge computing
Li et al. Intermediate data placement and cache replacement strategy under Spark platform
Kumar et al. Hold'em or fold'em? aggregation queries under performance variations
Zhang et al. Learning-driven interference-aware workload parallelization for streaming applications in heterogeneous cluster
Ismaeel et al. Multivariate time series ELM for cloud data centre workload prediction
Shen et al. Host load prediction with bi-directional long short-term memory in cloud computing
Rang et al. Data life aware model updating strategy for stream-based online deep learning
Çiçek et al. Smartphone power management based on ConvLSTM model
Diao et al. Comparative studies of load balancing with control and optimization techniques
Wang PoissonMat: Remodeling Matrix Factorization using Poisson Distribution and Solving the Cold Start Problem without Input Data
Gao et al. Workload prediction of cloud workflow based on graph neural network
Chen Pre-fetching and Re-fetching in Web Caching systems: Algorithms and Simulation
Yan et al. Merging visual features and temporal dynamics in sequential recommendation
Zhang et al. Monitoring-based task scheduling in large-scale SaaS cloud
Zhang et al. Two-level task scheduling with multi-objectives in geo-distributed and large-scale SaaS cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant