CN109389222A - A fast adaptive neural network optimization method

A fast adaptive neural network optimization method

Info

Publication number
CN109389222A
Authority
CN
China
Prior art keywords
moment
neural network
gradient
momentum
current time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811318475.XA
Other languages
Chinese (zh)
Inventor
王好谦
章书豪
张永兵
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University filed Critical Shenzhen Graduate School Tsinghua University
Priority to CN201811318475.XA priority Critical patent/CN109389222A/en
Publication of CN109389222A publication Critical patent/CN109389222A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fast adaptive neural network optimization method, comprising: S1, initializing the parameters of the neural network optimizer; S2, taking the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer; S3, according to the gradient data of the neural network, calculating the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculating the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum; S4, correcting the initial learning rate of the optimizer; S5, updating the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum; S6, applying weight decay to the gradient of the neural network, and then using the weight-decayed gradient to continue updating the network parameters obtained in step S5. The performance of the optimization method of the invention is better than that of the existing SGD method.

Description

A fast adaptive neural network optimization method
Technical field
The present invention relates to the fields of machine learning and pattern recognition, and in particular to a fast adaptive neural network optimizer algorithm.
Background art
Optimization methods based on stochastic gradient descent (SGD) play a very important role in practical applications across many fields of science and engineering. Many problems in these fields can be converted into the optimization of certain objective functions, i.e. maximizing or minimizing them with respect to their parameters. If the objective function is differentiable with respect to its parameters, stochastic gradient descent is a relatively effective optimization approach, because in terms of arithmetic complexity, computing the first-order partial derivatives with respect to the parameters has the same order of complexity as evaluating the objective function itself. In general, the objective function is stochastic; for example, many objective functions are composed of sub-functions over different data samples. In this case, taking a stochastic gradient descent step on each sub-function is more efficient.
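For concreteness, the plain SGD update described above can be sketched in a few lines of Python; this is an illustrative example only, and the function name, learning rate and toy objective used here are not taken from the patent:

    import numpy as np

    def sgd_step(params, grad, lr=0.01):
        # One plain SGD step: move the parameters against the gradient
        # with a fixed learning rate.
        return params - lr * grad

    # Example: one step on the objective f(x) = x^2, whose gradient is 2x.
    x = np.array([3.0])
    x = sgd_step(x, 2 * x, lr=0.1)   # x moves toward the minimum at 0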
SGD has been proven to be an effective optimization method and has therefore become the optimization method commonly used in current deep learning and other learning tasks. However, the existing SGD optimization algorithm has several problems:
1. The learning rate of the SGD optimization algorithm is constant; it cannot set different learning rates for different network parameters to improve performance;
2. The SGD optimization algorithm converges slowly, and the training process is prone to oscillation;
3. The existing SGD optimization algorithm usually applies weight decay to the parameters as a first step; however, the subsequent parameter update with a fixed learning rate can, to a certain extent, offset the effect of the weight decay, which degrades the performance of the algorithm.
The above disclosure of background art is intended only to assist in understanding the inventive concept and technical solution of the present invention, and does not necessarily belong to the prior art of the present patent application. In the absence of clear evidence that the above content was disclosed before the filing date of the present patent application, the above background art should not be used to evaluate the novelty and inventiveness of the present application.
Summary of the invention
The main object of the present invention is to propose a fast adaptive neural network optimization method so as to solve the aforementioned problems existing in the existing stochastic gradient descent method.
A fast adaptive neural network optimization method comprises the following steps:
S1. Initialize the parameters of the neural network optimizer;
S2. Take the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer;
S3. According to the gradient data of the neural network, calculate the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculate the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum;
S4. Correct the initial learning rate of the optimizer;
S5. Update the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum;
S6. Apply weight decay to the gradient of the neural network, and then use the weight-decayed gradient to continue updating the network parameters obtained in step S5.
Further, the initial parameters of the optimizer in step S2 include the initial learning rate, the weight decay coefficient and the differential coefficient.
Further, in step S3 the current-time first-order momentum and the current-time second-order momentum are calculated as follows:
where α1 and α2 are the first-order and second-order momentum coefficients respectively, each with a value in the range 0.9 to 1; kd is the differential coefficient; and the quantities entering the calculation are the previous-time first-order and second-order momenta, the gradient of the neural network at time t, and the gradient of the neural network at the moment preceding t.
Further, α1=0.9, α2=0.999.
Further, in step S4 the initial learning rate is corrected by the following formula:
where lrt is the corrected learning rate at time t, lr0 is the initial learning rate, and n is the number of iterations.
Further, in step S5 the network parameters of the neural network are updated according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum as follows:
where ft(θ) and ft-1(θ) are the network parameters of the neural network at the current time t and at the previous moment respectively, and ε is a constant with 0 < ε < 0.001.
Further, in step S6 weight decay is applied to the gradient of the neural network as follows:
where wd is the weight decay coefficient, and sign(ft(θ)) is the sign function, representing the sign of the gradient; when weight decay is performed, if the sign of the gradient is negative, the corresponding gradient value is set to 0.
In the above optimization method provided by the invention, the first-order momentum and the second-order momentum are computed from the gradient data of the neural network, and drift correction is applied to the gradient: a first-order differential term is added for the first-order momentum, and the square of the first-order differential is added for the second-order momentum, so as to suppress the overshoot that the first-order and second-order momenta produce during gradient accumulation. By correcting the initial learning rate, different learning rates are set for different parameters during optimization. Therefore, the present invention does not require extensive parameter tuning and has a very fast convergence rate. Finally, to address the phenomenon that the traditional SGD algorithm offsets the effect of weight decay, the present invention changes the order of weight decay and adopts a decay scheme without negative coefficients, which makes the solution easier to obtain in the optimization process and accelerates convergence.
Brief description of the drawings
Fig. 1 is a flowchart of a fast adaptive neural network optimization method provided by a specific embodiment of the present invention;
Fig. 2 is a comparison of the performance of the fast adaptive neural network optimization method provided by the present invention with that of the existing SGD method.
Specific embodiment
The invention will be further described below with reference to the accompanying drawings and specific embodiments.
A specific embodiment of the present invention provides a fast adaptive neural network optimization method that is applicable to neural networks of arbitrary structure. With reference to Fig. 1, the optimization method includes the following steps S1 to S6:
S1. Initialize the parameters of the neural network optimizer. The initial parameters of the optimizer include the initial learning rate lr0, the weight decay coefficient wd and the differential coefficient kd.
S2. Take the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer.
S3. According to the gradient data of the neural network, calculate the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculate the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum. Based on the gradient data obtained from the forward propagation of the neural network, the first-order momentum and the second-order momentum at the current time (time t) are calculated by the following formulas (1) and (2), respectively:
where α1 and α2 are the first-order and second-order momentum coefficients respectively, each with a value in the range 0.9 to 1, and more preferably α1 = 0.9 and α2 = 0.999; the quantities entering the two formulas are the previous-time first-order and second-order momenta, the gradient of the neural network at time t, and the gradient of the neural network at the moment preceding t. Through these two formulas, a first-order differential term is added for the first-order momentum and the square of the first-order differential is added for the second-order momentum, so as to suppress the overshoot that the first-order and second-order momenta produce during gradient accumulation.
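Formulas (1) and (2) are only reproduced as images in the filing, so the sketch below is one plausible reading of them based on the description above: Adam-style exponential moving averages, with the first-order differential of the gradient added for the first-order momentum and its square added for the second-order momentum. The names m_t, v_t, g_t, the default value of k_d and the exact combination of terms are assumptions made here for illustration, not the patent's own notation:

    def update_moments(g_t, g_prev, m_prev, v_prev, alpha1=0.9, alpha2=0.999, k_d=0.1):
        # One possible reading of formulas (1) and (2): exponential moving
        # averages of the gradient and of its square, each augmented with the
        # first-order differential (g_t - g_prev) to damp overshoot.
        # k_d is the differential coefficient; its value here is an assumption.
        diff = g_t - g_prev
        m_t = alpha1 * m_prev + (1 - alpha1) * (g_t + k_d * diff)            # first-order momentum
        v_t = alpha2 * v_prev + (1 - alpha2) * (g_t ** 2 + k_d * diff ** 2)  # second-order momentum
        return m_t, v_t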
S4. Correct the initial learning rate of the optimizer. Specifically, the learning rate is corrected using the following formula (3):
where lrt is the corrected learning rate at time t and n is the number of iterations. By correcting the initial learning rate, different learning rates are set for different parameters during optimization, which overcomes the problem that the constant learning rate of the existing SGD method limits the performance of the algorithm.
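Formula (3) is likewise not reproduced in this text; it takes the initial learning rate lr0 and the iteration count n as inputs. As a placeholder, the sketch below uses an Adam-style bias correction with those same inputs; this particular expression is an assumption standing in for the patent's formula, shown only to illustrate how lr0 and n could enter:

    import math

    def corrected_lr(lr0, n, alpha1=0.9, alpha2=0.999):
        # Placeholder for formula (3): a bias-corrected learning rate after
        # n iterations (n >= 1), assumed here for illustration only.
        return lr0 * math.sqrt(1 - alpha2 ** n) / (1 - alpha1 ** n)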
S5. Update the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum. Specifically, this step can be implemented using the following formula (4):
where ft(θ) and ft-1(θ) are the network parameters of the neural network at the current time t and at the previous moment respectively, and ε is a constant whose role is to prevent the denominator of formula (4) from being 0; preferably 0 < ε < 0.001.
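Formula (4) is also only present as an image. Given the stated roles of the corrected learning rate, the two momenta and the small constant ε that keeps the denominator away from zero, an Adam-like update of the following form is a reasonable sketch; it is an assumption rather than the patent's exact expression:

    import numpy as np

    def update_params(theta_prev, m_t, v_t, lr_t, eps=1e-4):
        # Sketch of formula (4): scale the first-order momentum by the inverse
        # square root of the second-order momentum; eps (with 0 < eps < 0.001
        # per the description) keeps the denominator nonzero.
        return theta_prev - lr_t * m_t / (np.sqrt(v_t) + eps)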
S6. Apply weight decay to the gradient of the neural network, and then use the weight-decayed gradient to continue updating the network parameters obtained in step S5. In the preceding step S5, the network parameters of the neural network are first updated once using the current first-order momentum and second-order momentum computed from the original gradient data; then, in this step S6, weight decay is applied to the gradient, and the weight-decayed gradient is subsequently used to update the network parameters again. The weight decay uses the following formula (5):
where sign(ft(θ)) is the sign function, representing the sign of the gradient; when weight decay is performed, if the sign of the gradient is negative, the corresponding gradient value is set to 0. The present invention changes the order of weight decay and adopts a decay scheme without negative coefficients, which makes the solution easier to obtain in the optimization process and accelerates convergence.
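Formula (5) is not reproduced here either, so its exact form cannot be recovered from this text. The sketch below is one reading of the description: a decay term following the sign of the current parameters is added to the gradient, entries whose sign comes out negative are set to 0 (the "no negative coefficient" decay), and the decayed gradient is then used to update the parameters obtained in step S5 once more. All names, default values and the precise form are assumptions:

    import numpy as np

    def weight_decay_and_update(theta_t, g_t, lr_t, wd=1e-4):
        # One reading of step S6 / formula (5): decay the gradient using the
        # sign of the current parameters, clip negative entries to zero, and
        # continue updating the parameters produced by step S5.
        decayed = np.maximum(g_t + wd * np.sign(theta_t), 0.0)   # weight-decayed gradient, no negative values
        return theta_t - lr_t * decayed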
By simulating the optimization method of the invention and the existing SGD method, the performance curves shown in Fig. 2 are obtained. In Fig. 2, curve L1 represents the simulation result of the optimization method of the invention, and curve L2 represents the simulation result of the traditional SGD method. It is well known in this field that the lower the Loss value the better, and the higher the Acc value the better. It can be seen from Fig. 2 that: (a) as the iterations proceed, the TrainLoss of the invention is significantly lower than that of the SGD method; (b) the ValidLoss of the invention is slightly lower than that of the SGD method; (c) as the iterations proceed, the TrainAcc of the invention is clearly higher than that of the SGD method; (d) the ValidAcc of the invention is higher than that of the SGD method. It can be seen that every aspect of the performance of the optimization method of the invention is better than that of the existing SGD method.
The above content is a further detailed description of the present invention in combination with specific preferred embodiments, and it cannot be concluded that the specific implementation of the present invention is limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several equivalent substitutions or obvious modifications with the same performance or use may also be made without departing from the concept of the present invention, and all of them should be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A fast adaptive neural network optimization method, characterized by comprising the following steps:
S1. Initialize the parameters of the neural network optimizer;
S2. Take the initial parameters of the optimizer and the gradient data obtained from the forward propagation of the neural network as the input of the optimizer;
S3. According to the gradient data of the neural network, calculate the current-time first-order momentum using the current-time gradient, the previous-time gradient and the previous-time first-order momentum, and calculate the current-time second-order momentum using the current-time gradient, the previous-time gradient and the previous-time second-order momentum;
S4. Correct the initial learning rate of the optimizer;
S5. Update the network parameters of the neural network according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum;
S6. Apply weight decay to the gradient of the neural network, and then use the weight-decayed gradient to continue updating the network parameters obtained in step S5.
2. The fast adaptive neural network optimization method according to claim 1, characterized in that the initial parameters of the optimizer in step S2 include the initial learning rate, the weight decay coefficient and the differential coefficient.
3. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S3, the current-time first-order momentum and the current-time second-order momentum are calculated as follows:
where α1 and α2 are the first-order and second-order momentum coefficients respectively, each with a value in the range 0.9 to 1; kd is the differential coefficient; and the quantities entering the calculation are the previous-time first-order and second-order momenta, the gradient of the neural network at time t, and the gradient of the neural network at the moment preceding t.
4. The fast adaptive neural network optimization method according to claim 3, characterized in that α1 = 0.9 and α2 = 0.999.
5. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S4, the initial learning rate is corrected by the following formula:
where lrt is the corrected learning rate at time t, lr0 is the initial learning rate, and n is the number of iterations.
6. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S5, the network parameters of the neural network are updated according to the corrected learning rate, the current-time first-order momentum and the current-time second-order momentum as follows:
where ft(θ) and ft-1(θ) are the network parameters of the neural network at the current time t and at the previous moment respectively, and ε is a constant with 0 < ε < 0.001.
7. The fast adaptive neural network optimization method according to claim 1, characterized in that, in step S6, weight decay is applied to the gradient of the neural network as follows:
where wd is the weight decay coefficient, and sign(ft(θ)) is the sign function, representing the sign of the gradient; when weight decay is performed, if the sign of the gradient is negative, the corresponding gradient value is set to 0.
CN201811318475.XA 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method Pending CN109389222A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811318475.XA CN109389222A (en) 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811318475.XA CN109389222A (en) 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method

Publications (1)

Publication Number Publication Date
CN109389222A true CN109389222A (en) 2019-02-26

Family

ID=65428554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811318475.XA Pending CN109389222A (en) 2018-11-07 2018-11-07 A kind of quick adaptive neural network optimization method

Country Status (1)

Country Link
CN (1) CN109389222A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110187499A (en) * 2019-05-29 2019-08-30 哈尔滨工业大学(深圳) A kind of design method of on piece integrated optical power attenuator neural network based
CN110782016A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and apparatus for optimizing neural network architecture search
CN110782017A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
CN110782017B (en) * 2019-10-25 2022-11-22 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
US11631030B2 (en) 2020-02-11 2023-04-18 International Business Machines Corporation Learning with moment estimation using different time constants
CN114925829A (en) * 2022-07-18 2022-08-19 山东海量信息技术研究院 Neural network training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109389222A (en) A kind of quick adaptive neural network optimization method
CN102998973B (en) The multi-model Adaptive Control device of a kind of nonlinear system and control method
CN103995805B (en) The word processing method of the big data of text-oriented
RU2351974C1 (en) Method and device to control flying vehicle pitch
CN105759612A (en) Differential game anti-interception maneuver penetration/accurate strike guiding method with falling angle constraint
CN107086916A (en) A kind of Synchronization of Chaotic Systems based on fractional order adaptive sliding-mode observer
CN109858612A (en) A kind of adaptive deformation cavity convolution method
CN106202666A (en) A kind of computational methods of marine shafting bearing adjustment of displacement
CN106682735A (en) BP neural network algorithm based on PID adjustment
WO2018076331A1 (en) Neural network training method and apparatus
CN115686048B (en) Dynamic triggering limited time control method for executor limited spacecraft intersection system
CN108280207A (en) A method of the perfect Hash of construction
CN103676786B (en) A kind of curve smoothing method based on acceleration principle
CN103984200A (en) Design method of auxiliary graph as well as production method and photoetching method of test map
CN107273975A (en) A kind of rarefaction back-propagating training method of neural network model
CN104717058B (en) Password traversal method and device
CN110399697A (en) Control distribution method based on the aircraft for improving genetic learning particle swarm algorithm
CN106019949A (en) Adaptive order fractional order fuzzy PI lambda controller method
CN108710944A (en) One kind can train piece-wise linear activation primitive generation method
CN104571100B (en) A kind of non-minimum phase hypersonic aircraft control method
Parent Positivity-preserving flux difference splitting schemes
CN106384002B (en) Flood forecasting real-time correction method based on back-fitting algorithm
CN110826688B (en) Training method for guaranteeing stable convergence of maximum and minimum loss functions of GAN model
CN103903003A (en) Method for using Widrow-Hoff learning algorithm
Dong et al. Application of Adam-BP neural network in leveling fitting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190226