CN109389222A - A fast adaptive neural network optimization method

A fast adaptive neural network optimization method

Info

Publication number
CN109389222A
Authority
CN
China
Prior art keywords
moment
neural network
gradient
momentum
current time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811318475.XA
Other languages
Chinese (zh)
Inventor
王好谦
章书豪
张永兵
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University filed Critical Shenzhen Graduate School Tsinghua University
Priority to CN201811318475.XA priority Critical patent/CN109389222A/en
Publication of CN109389222A publication Critical patent/CN109389222A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fast adaptive neural network optimization method, comprising: S1, initializing the parameters of the neural network optimizer; S2, taking the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer; S3, according to the gradient data of the neural network, calculating the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculating the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum; S4, correcting the initial learning rate of the optimizer; S5, updating the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum; S6, applying weight decay to the gradient of the neural network, and then using the weight-decayed gradient to continue updating the network parameters obtained in step S5. The performance of the optimization method of the invention is better than that of the existing SGD method.

Description

A fast adaptive neural network optimization method
Technical field
The present invention relates to the fields of machine learning and pattern recognition, and in particular to a fast adaptive neural network optimizer algorithm.
Background art
Optimization methods based on stochastic gradient descent (SGD) play a very important role in practical applications across many fields of science and engineering. Many problems in these fields can be converted into the optimization of certain objective functions, i.e. maximizing or minimizing them with respect to their parameters. If the objective function is differentiable with respect to its parameters, stochastic gradient descent is a relatively effective optimization approach, because in terms of arithmetic complexity, computing the first-order partial derivatives with respect to the parameters has the same order of complexity as evaluating the objective function itself. In general, the objective function is stochastic; for example, many objective functions are composed of sub-functions over different data samples. In this case, taking a stochastic gradient descent step on each sub-function is more efficient.
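For concreteness, the plain SGD update described above can be sketched in a few lines of Python; this is an illustrative example only, and the function name, learning rate and toy objective used here are not taken from the patent:

    import numpy as np

    def sgd_step(params, grad, lr=0.01):
        # One plain SGD step: move the parameters against the gradient
        # with a fixed learning rate.
        return params - lr * grad

    # Example: one step on the objective f(x) = x^2, whose gradient is 2x.
    x = np.array([3.0])
    x = sgd_step(x, 2 * x, lr=0.1)   # x moves toward the minimum at 0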
SGD has been proven to be an effective optimization method and has therefore become the optimization method commonly used in current deep learning and other learning tasks. However, the existing SGD optimization algorithm has several problems:
1. The learning rate of the SGD optimization algorithm is constant; it cannot set different learning rates for different network parameters to improve performance;
2. The SGD optimization algorithm converges slowly, and the training process is prone to oscillation;
3. The existing SGD optimization algorithm usually applies weight decay to the parameters as a first step; however, the subsequent parameter update with a fixed learning rate can, to a certain extent, offset the effect of the weight decay, which degrades the performance of the algorithm.
The above disclosure of background art is intended only to assist in understanding the inventive concept and technical solution of the present invention, and does not necessarily belong to the prior art of the present patent application. In the absence of clear evidence that the above content was disclosed before the filing date of the present patent application, the above background art should not be used to evaluate the novelty and inventiveness of the present application.
Summary of the invention
The main object of the present invention is to propose a fast adaptive neural network optimization method so as to solve the aforementioned problems existing in the existing stochastic gradient descent method.
A fast adaptive neural network optimization method comprises the following steps:
S1. Initialize the parameters of the neural network optimizer;
S2. Take the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer;
S3. According to the gradient data of the neural network, calculate the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculate the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum;
S4. Correct the initial learning rate of the optimizer;
S5. Update the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum;
S6. Apply weight decay to the gradient of the neural network, and then use the weight-decayed gradient to continue updating the network parameters obtained in step S5.
Further, the initial parameters of the optimizer in step S2 include the initial learning rate, the weight decay coefficient and the differential coefficient.
Further, in step S3 the current-time first-order momentum and the current-time second-order momentum are calculated as follows:
where α1 and α2 are the first-order and second-order momentum coefficients respectively, each with a value in the range 0.9 to 1; kd is the differential coefficient; and the quantities entering the calculation are the previous-time first-order and second-order momenta, the gradient of the neural network at time t, and the gradient of the neural network at the moment preceding t.
Further, α1=0.9, α2=0.999.
Further, in step S4 the initial learning rate is corrected by the following formula:
where lrt is the corrected learning rate at time t, lr0 is the initial learning rate, and n is the number of iterations.
Further, in step S5 the network parameters of the neural network are updated according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum as follows:
where ft(θ) and ft-1(θ) are the network parameters of the neural network at the current time t and at the previous moment respectively, and ε is a constant with 0 < ε < 0.001.
Further, in step S6 weight decay is applied to the gradient of the neural network as follows:
where wd is the weight decay coefficient, and sign(ft(θ)) is the sign function, representing the sign of the gradient; when weight decay is performed, if the sign of the gradient is negative, the corresponding gradient value is set to 0.
In the above optimization method provided by the invention, the first-order momentum and the second-order momentum are computed from the gradient data of the neural network, and drift correction is applied to the gradient: a first-order differential term is added for the first-order momentum, and the square of the first-order differential is added for the second-order momentum, so as to suppress the overshoot that the first-order and second-order momenta produce during gradient accumulation. By correcting the initial learning rate, different learning rates are set for different parameters during optimization. Therefore, the present invention does not require extensive parameter tuning and has a very fast convergence rate. Finally, to address the phenomenon that the traditional SGD algorithm offsets the effect of weight decay, the present invention changes the order of weight decay and adopts a decay scheme without negative coefficients, which makes the solution easier to obtain in the optimization process and accelerates convergence.
Brief description of the drawings
Fig. 1 is a flowchart of a fast adaptive neural network optimization method provided by a specific embodiment of the present invention;
Fig. 2 is a comparison of the performance of the fast adaptive neural network optimization method provided by the present invention with that of the existing SGD method.
Specific embodiment
The invention will be further described below with reference to the accompanying drawings and specific embodiments.
A specific embodiment of the present invention provides a fast adaptive neural network optimization method that is applicable to neural networks of arbitrary structure. With reference to Fig. 1, the optimization method includes the following steps S1 to S6:
S1. Initialize the parameters of the neural network optimizer. The initial parameters of the optimizer include the initial learning rate lr0, the weight decay coefficient wd and the differential coefficient kd.
S2. Take the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer.
S3. According to the gradient data of the neural network, calculate the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculate the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum. Based on the gradient data obtained from the forward propagation of the neural network, the first-order momentum and the second-order momentum at the current time (time t) are calculated by the following formulas (1) and (2), respectively:
where α1 and α2 are the first-order and second-order momentum coefficients respectively, each with a value in the range 0.9 to 1, and more preferably α1 = 0.9 and α2 = 0.999; the quantities entering the two formulas are the previous-time first-order and second-order momenta, the gradient of the neural network at time t, and the gradient of the neural network at the moment preceding t. Through these two formulas, a first-order differential term is added for the first-order momentum and the square of the first-order differential is added for the second-order momentum, so as to suppress the overshoot that the first-order and second-order momenta produce during gradient accumulation.
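Formulas (1) and (2) are only reproduced as images in the filing, so the sketch below is one plausible reading of them based on the description above: Adam-style exponential moving averages, with the first-order differential of the gradient added for the first-order momentum and its square added for the second-order momentum. The names m_t, v_t, g_t, the default value of k_d and the exact combination of terms are assumptions made here for illustration, not the patent's own notation:

    def update_moments(g_t, g_prev, m_prev, v_prev, alpha1=0.9, alpha2=0.999, k_d=0.1):
        # One possible reading of formulas (1) and (2): exponential moving
        # averages of the gradient and of its square, each augmented with the
        # first-order differential (g_t - g_prev) to damp overshoot.
        # k_d is the differential coefficient; its value here is an assumption.
        diff = g_t - g_prev
        m_t = alpha1 * m_prev + (1 - alpha1) * (g_t + k_d * diff)            # first-order momentum
        v_t = alpha2 * v_prev + (1 - alpha2) * (g_t ** 2 + k_d * diff ** 2)  # second-order momentum
        return m_t, v_t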
S4. Correct the initial learning rate of the optimizer. Specifically, the learning rate is corrected using the following formula (3):
where lrt is the corrected learning rate at time t and n is the number of iterations. By correcting the initial learning rate, different learning rates are set for different parameters during optimization, which overcomes the problem that the constant learning rate of the existing SGD method limits the performance of the algorithm.
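Formula (3) is likewise not reproduced in this text; it takes the initial learning rate lr0 and the iteration count n as inputs. As a placeholder, the sketch below uses an Adam-style bias correction with those same inputs; this particular expression is an assumption standing in for the patent's formula, shown only to illustrate how lr0 and n could enter:

    import math

    def corrected_lr(lr0, n, alpha1=0.9, alpha2=0.999):
        # Placeholder for formula (3): a bias-corrected learning rate after
        # n iterations (n >= 1), assumed here for illustration only.
        return lr0 * math.sqrt(1 - alpha2 ** n) / (1 - alpha1 ** n)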
S5. Update the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum. Specifically, this step can be implemented using the following formula (4):
where ft(θ) and ft-1(θ) are the network parameters of the neural network at the current time t and at the previous moment respectively, and ε is a constant whose role is to prevent the denominator of formula (4) from being 0; preferably 0 < ε < 0.001.
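Formula (4) is also only present as an image. Given the stated roles of the corrected learning rate, the two momenta and the small constant ε that keeps the denominator away from zero, an Adam-like update of the following form is a reasonable sketch; it is an assumption rather than the patent's exact expression:

    import numpy as np

    def update_params(theta_prev, m_t, v_t, lr_t, eps=1e-4):
        # Sketch of formula (4): scale the first-order momentum by the inverse
        # square root of the second-order momentum; eps (with 0 < eps < 0.001
        # per the description) keeps the denominator nonzero.
        return theta_prev - lr_t * m_t / (np.sqrt(v_t) + eps)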
S6. Apply weight decay to the gradient of the neural network, and then use the weight-decayed gradient to continue updating the network parameters obtained in step S5. In the preceding step S5, the network parameters of the neural network are first updated once using the current first-order momentum and second-order momentum computed from the original gradient data; then, in this step S6, weight decay is applied to the gradient, and the weight-decayed gradient is subsequently used to update the network parameters again. The weight decay uses the following formula (5):
where sign(ft(θ)) is the sign function, representing the sign of the gradient; when weight decay is performed, if the sign of the gradient is negative, the corresponding gradient value is set to 0. The present invention changes the order of weight decay and adopts a decay scheme without negative coefficients, which makes the solution easier to obtain in the optimization process and accelerates convergence.
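Formula (5) is not reproduced here either, so its exact form cannot be recovered from this text. The sketch below is one reading of the description: a decay term following the sign of the current parameters is added to the gradient, entries whose sign comes out negative are set to 0 (the "no negative coefficient" decay), and the decayed gradient is then used to update the parameters obtained in step S5 once more. All names, default values and the precise form are assumptions:

    import numpy as np

    def weight_decay_and_update(theta_t, g_t, lr_t, wd=1e-4):
        # One reading of step S6 / formula (5): decay the gradient using the
        # sign of the current parameters, clip negative entries to zero, and
        # continue updating the parameters produced by step S5.
        decayed = np.maximum(g_t + wd * np.sign(theta_t), 0.0)   # weight-decayed gradient, no negative values
        return theta_t - lr_t * decayed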
By simulating the optimization method of the invention and the existing SGD method, the performance curves shown in Fig. 2 are obtained. In Fig. 2, curve L1 represents the simulation result of the optimization method of the invention, and curve L2 represents the simulation result of the traditional SGD method. It is well known in this field that the lower the Loss value the better, and the higher the Acc value the better. It can be seen from Fig. 2 that: (a) as the iterations proceed, the TrainLoss of the invention is significantly lower than that of the SGD method; (b) the ValidLoss of the invention is slightly lower than that of the SGD method; (c) as the iterations proceed, the TrainAcc of the invention is clearly higher than that of the SGD method; (d) the ValidAcc of the invention is higher than that of the SGD method. It can be seen that every aspect of the performance of the optimization method of the invention is better than that of the existing SGD method.
The above content is a further detailed description of the present invention in combination with specific preferred embodiments, and it cannot be concluded that the specific implementation of the present invention is limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several equivalent substitutions or obvious modifications with the same performance or use may also be made without departing from the concept of the present invention, and all of them should be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A fast adaptive neural network optimization method, characterized by comprising the following steps:
S1. Initialize the parameters of the neural network optimizer;
S2. Take the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer;
S3. According to the gradient data of the neural network, calculate the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculate the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum;
S4. Correct the initial learning rate of the optimizer;
S5. Update the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum;
S6. Apply weight decay to the gradient of the neural network, and then use the weight-decayed gradient to continue updating the network parameters obtained in step S5.
2. The fast adaptive neural network optimization method according to claim 1, characterized in that the initial parameters of the optimizer in step S2 include the initial learning rate, the weight decay coefficient and the differential coefficient.
3. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S3, the current-time first-order momentum and the current-time second-order momentum are calculated as follows:
where α1 and α2 are the first-order and second-order momentum coefficients respectively, each with a value in the range 0.9 to 1; kd is the differential coefficient; and the quantities entering the calculation are the previous-time first-order and second-order momenta, the gradient of the neural network at time t, and the gradient of the neural network at the moment preceding t.
4. The fast adaptive neural network optimization method according to claim 3, characterized in that α1 = 0.9 and α2 = 0.999.
5. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S4, the initial learning rate is corrected by the following formula:
where lrt is the corrected learning rate at time t, lr0 is the initial learning rate, and n is the number of iterations.
6. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S5, the network parameters of the neural network are updated according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum as follows:
where ft(θ) and ft-1(θ) are the network parameters of the neural network at the current time t and at the previous moment respectively, and ε is a constant with 0 < ε < 0.001.
7. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S6, weight decay is applied to the gradient of the neural network as follows:
where wd is the weight decay coefficient, and sign(ft(θ)) is the sign function, representing the sign of the gradient; when weight decay is performed, if the sign of the gradient is negative, the corresponding gradient value is set to 0.
CN201811318475.XA 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method Pending CN109389222A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811318475.XA CN109389222A (en) 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811318475.XA CN109389222A (en) 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method

Publications (1)

Publication Number Publication Date
CN109389222A true CN109389222A (en) 2019-02-26

Family

ID=65428554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811318475.XA Pending CN109389222A (en) 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method

Country Status (1)

Country Link
CN (1) CN109389222A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110187499A (en) * 2019-05-29 2019-08-30 哈尔滨工业大学(深圳) A kind of design method of on piece integrated optical power attenuator neural network based
CN110782016A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and apparatus for optimizing neural network architecture search
CN110782017A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
CN110782017B (en) * 2019-10-25 2022-11-22 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
US11631030B2 (en) 2020-02-11 2023-04-18 International Business Machines Corporation Learning with moment estimation using different time constants
CN114925829A (en) * 2022-07-18 2022-08-19 山东海量信息技术研究院 Neural network training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109389222A (en) A kind of quick adaptive neural network optimization method
CN102998973B (en) The multi-model Adaptive Control device of a kind of nonlinear system and control method
CN103995805B (en) The word processing method of the big data of text-oriented
RU2351974C1 (en) Method and device to control flying vehicle pitch
CN105759612A (en) Differential game anti-interception maneuver penetration/accurate strike guiding method with falling angle constraint
CN107086916A (en) A kind of Synchronization of Chaotic Systems based on fractional order adaptive sliding-mode observer
CN109858612A (en) A kind of adaptive deformation cavity convolution method
CN106202666A (en) A kind of computational methods of marine shafting bearing adjustment of displacement
CN106682735A (en) BP neural network algorithm based on PID adjustment
WO2018076331A1 (en) Neural network training method and apparatus
CN115686048B (en) Dynamic triggering limited time control method for executor limited spacecraft intersection system
CN108280207A (en) A method of the perfect Hash of construction
CN103676786B (en) A kind of curve smoothing method based on acceleration principle
CN103984200A (en) Design method of auxiliary graph as well as production method and photoetching method of test map
CN107273975A (en) A kind of rarefaction back-propagating training method of neural network model
CN104717058B (en) Password traversal method and device
CN110399697A (en) Control distribution method based on the aircraft for improving genetic learning particle swarm algorithm
CN106019949A (en) Adaptive order fractional order fuzzy PI lambda controller method
CN108710944A (en) One kind can train piece-wise linear activation primitive generation method
CN104571100B (en) A kind of non-minimum phase hypersonic aircraft control method
Parent Positivity-preserving flux difference splitting schemes
CN106384002B (en) Flood forecasting real-time correction method based on back-fitting algorithm
CN110826688B (en) Training method for guaranteeing stable convergence of maximum and minimum loss functions of GAN model
CN103903003A (en) Method for using Widrow-Hoff learning algorithm
Dong et al. Application of Adam-BP neural network in leveling fitting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190226