CN109389222A - A kind of quick adaptive neural network optimization method - Google Patents
A kind of quick adaptive neural network optimization method Download PDFInfo
- Publication number
- CN109389222A CN109389222A CN201811318475.XA CN201811318475A CN109389222A CN 109389222 A CN109389222 A CN 109389222A CN 201811318475 A CN201811318475 A CN 201811318475A CN 109389222 A CN109389222 A CN 109389222A
- Authority
- CN
- China
- Prior art keywords
- moment
- neural network
- gradient
- momentum
- current time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Geometry (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a kind of quick adaptive neural network optimization methods, comprising: S1, initialize neural network optimizer parameter;S2, the gradient data that the initial parameter of the optimizer and neural network propagated forward come are as the input of the optimizer;S3, the gradient data according to neural network, the current time single order moment of momentum is calculated using current time gradient, last moment gradient and the last moment single order moment of momentum, calculates the current time second order moment of momentum using current time gradient, last moment gradient and the last moment second order moment of momentum;The initial learning rate of S4, the amendment optimizer;S5, the network parameter of the neural network is updated according to revised learning rate, the current time single order moment of momentum and the current time second order moment of momentum;S6, weight decaying is carried out to the gradient of the neural network, then the gradient after being decayed with weight continues to update the obtained network parameter of step S5.Optimization method performance of the invention is better than existing SGD method.
Description
Technical field
The present invention relates to machine learning and area of pattern recognition, and in particular to a kind of quick adaptive neural network optimization
Device algorithm.
Background technique
Optimization method based on stochastic gradient descent (SGD), in many science and the practical application of engineering field
Tool plays a very important role, and many problems in these areas can be converted into the optimization to certain objective functions, to it
It carries out parameter maximum, minimize.If objective function can be micro- for its parameter, stochastic gradient descent is one opposite
Effective optimal way calculates its single order local derviation according to the parameter of objective function and calculates mesh because saying from arithmetic complexity
The complexity of scalar functions is identical.Under normal conditions, objective function is random, for example, many objective functions are by different increments
Subfunction composition in this.In this case, for each subfunction, take stochastic gradient descent more efficient.
It is a kind of effective optimization method that SGD, which has been proved, thus is become in present deep learning and other study
The optimization method generallyd use.However, there are some problems for existing SGD optimization algorithm:
1, SGD optimization algorithm learning rate is constant, can not accomplish that different study is arranged according to different network parameters
Rate is to improving performance;
2, SGD optimization algorithm convergence rate is slower, and training process is easy to appear concussion;
3, the existing SGD optimization algorithm first step usually carries out weight decaying to parameter, however below using fixed study
Rate carries out parameter and updates the effect that can offset weight decaying to a certain extent, to influence the performance of algorithm.
The disclosure of background above technology contents is only used for auxiliary and understands inventive concept and technical solution of the invention, not
The prior art for necessarily belonging to present patent application shows above content in the applying date of present patent application in no tangible proof
Before have disclosed in the case where, above-mentioned background technique should not be taken to evaluation the application novelty and creativeness.
Summary of the invention
It is existing to solve it is a primary object of the present invention to propose a kind of quick adaptive neural network optimization method
Foregoing problems present in stochastic gradient descent method.
A kind of quick adaptive neural network optimization method, comprising the following steps:
S1, the initialization neural network optimizer parameter;
S2, the gradient data that the initial parameter of the optimizer and the neural network propagated forward come are as described in
The input of optimizer;
S3, according to the gradient data of the neural network, use current time gradient, last moment gradient and last moment
The single order moment of momentum calculates the current time single order moment of momentum, uses current time gradient, last moment gradient and last moment second order
The moment of momentum calculates the current time second order moment of momentum;
The initial learning rate of S4, the amendment optimizer;
S5, according to revised learning rate, the current time single order moment of momentum and the current time second order moment of momentum update described in
The network parameter of neural network;
S6, weight decaying is carried out to the gradient of the neural network, then the gradient after being decayed with weight continues to update step
The obtained network parameter of S5.
Further, the initial parameter of optimizer described in step S2 include initial learning rate, weight attenuation coefficient and
Differential coefficient.
Further, in step S3, the current time single order moment of momentumWith the current time second order moment of momentumCalculating
It is as follows:
Wherein, α1、α2Respectively single order momentum moment coefficient and second order momentum moment coefficient and α1、α2Value range be 0.9~1,Respectively indicate the single order moment of momentum and the second order moment of momentum of last moment, kdFor differential coefficient,It is described
Neural network t moment gradient,For the neural network the last moment of t moment gradient.
Further, α1=0.9, α2=0.999.
Further, the initial learning rate is corrected by following formula in step S4:
Wherein, lrtFor the learning rate that t moment is corrected, lr0For initial learning rate, n is the number of iterations.
Further, step S5 is according to revised learning rate, the current time single order moment of momentum and current time second order
The mode that the moment of momentum updates the network parameter of the neural network is as follows:
Wherein, ft(θ)、ft-1(θ) is respectively the current time t of the neural network, the network parameter of last moment, and ε is one normal
Number, and 0 < ε < 0.001.
Further, as follows to the mode of the gradient progress weight decaying of the neural network in step S6:
Wherein, wdFor weight attenuation coefficient;sign(ft(θ)) it is sign function, represent the symbol of gradient;Carrying out weight decaying
When, if the symbol of the gradient is negative, corresponding gradient value is 0.
Above-mentioned optimization method provided by the invention seeks the single order moment of momentum and second order by the gradient data to neural network
The moment of momentum carries out drift correction to gradient: for the single order moment of momentum, first differential item is added, for the second order moment of momentum, addition
Square of first differential, come the overshoot for inhibiting the single order moment of momentum and the second order moment of momentum to generate during gradient is accumulative;By right
Initial learning rate is modified, and realizes in optimization process and different learning rates is arranged for different parameters.Therefore, the present invention is not
It needs largely to adjust ginseng, and has cracking convergence rate.Finally, the phenomenon that offsetting weight decaying for tradition SGD algorithm, this
Invention changes the sequence of weight decaying, using no negative coefficient damped manner, optimization process is made to be easier to obtain coefficient solution, accelerates to receive
Hold back speed.
Detailed description of the invention
Fig. 1 is a kind of flow chart for quick adaptive neural network optimization method that the specific embodiment of the invention provides;
Fig. 2 is quick adaptive neural network optimization method provided by the invention compared with the performance of existing SGD method
Figure.
Specific embodiment
The invention will be further described with specific embodiment with reference to the accompanying drawing.
A specific embodiment of the invention provides a kind of quick adaptive neural network optimization method, can be adapted for
The neural network of arbitrary structures.With reference to Fig. 1, which includes the following steps S1~S6:
S1, the initialization neural network optimizer parameter.The initial parameter of optimizer includes initial learning rate
lr0, weight attenuation coefficient wdWith differential coefficient kd。
S2, the gradient data that the initial parameter of optimizer and neural network propagated forward come are as the defeated of optimizer
Enter.
S3, the gradient data according to neural network use current time gradient, last moment gradient and last moment single order
The moment of momentum calculates the current time single order moment of momentum, uses current time gradient, last moment gradient and last moment second order momentum
Square calculates the current time second order moment of momentum.According to the gradient data that neural network propagated forward comes, using following formula (1) and
(2) the single order moment of momentum at current time (t moment) is calculated separatelyThe second order moment of momentum
Wherein, α1、α2Respectively single order momentum moment coefficient and second order momentum moment coefficient and α1、α2Value range be 0.9
~1, more preferably α1=0.9, α2=0.999;The single order moment of momentum and the second order for respectively indicating last moment are dynamic
Measure square;For neural network t moment gradient;For the neural network t moment last moment
Gradient.By above-mentioned two formula, for the single order moment of momentum, add first differential item, to the second order moment of momentum, it is micro- to add single order
Square divided, come the overshoot for inhibiting the single order moment of momentum and the second order moment of momentum to generate during gradient is accumulative.
The initial learning rate of S4, the amendment optimizer.Specifically corrected using following formula (3) Lai Jinhang learning rate:
Wherein, lrtFor the learning rate that t moment is corrected, n is the number of iterations.By being modified to initial learning rate, realize
Different learning rates is set for different parameters in optimization process, overcomes the existing SGD method to limit since learning rate is constant
The problem of algorithm performance.
S5, according to revised learning rate, the current time single order moment of momentum and the current time second order moment of momentum update described in
The network parameter of neural network.Can specifically following formula (4) Lai Shixian be used in this step:
Wherein, ft(θ)、ft-1(θ) is respectively the current time t of the neural network, the network parameter of last moment, and ε is
One constant is 0 its role is to prevent the denominator of above formula (4);Preferably 0 < ε < 0.001.
S6, weight decaying is carried out to the gradient of the neural network, then the gradient after being decayed with weight continues to update step
The obtained network parameter of S5.It is that the current single order moment of momentum and second order are used based on original gradient data in previous step S5
The moment of momentum first once updates the network parameter of neural network;Then gradient weight decaying is carried out by this step S6 again,
It is subsequent that the gradient after weight decaying is recycled to go to update the network parameter.Specific weight decaying uses following formula (5):
Wherein, sign (ft(θ)) it is sign function, represent the symbol of gradient;When carrying out weight decaying, if the symbol of the gradient
It number is negative, then corresponding gradient value is 0.The present invention changes the sequence of weight decaying, using no negative coefficient damped manner, so that excellent
Change process is easier to obtain coefficient solution, accelerates convergence rate.
By emulating optimization method of the invention and existing SGD method, performance curve as shown in Figure 2, Fig. 2 are obtained
In, curve L1 represents the simulation result of optimization method of the invention, and curve L2 represents the simulation result of traditional SGD method.At this
Field is well known that the lower Loss values the better, and the higher the better for Acc values.As can be seen from Figure 2: (a) with repeatedly
The progress in generation, TrainLoss of the invention are significantly lower than SGD method;(b) ValidLoss of the invention ratio SGD method is slightly lower;(c)
With iterations going on, TrainAcc of the invention is apparently higher than SGD method;(d) ValidAcc of the invention ratio SGD method is high.
As it can be seen that the properties of optimization method of the invention are better than existing SGD method.
The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and it cannot be said that
Specific implementation of the invention is only limited to these instructions.For those skilled in the art to which the present invention belongs, it is not taking off
Under the premise of from present inventive concept, several equivalent substitute or obvious modifications can also be made, and performance or use is identical, all answered
When being considered as belonging to protection scope of the present invention.
Claims (7)
1. a kind of quick adaptive neural network optimization method, which comprises the following steps:
S1, the initialization neural network optimizer parameter;
S2, the gradient data that the initial parameter of the optimizer and the neural network propagated forward come are as the optimization
The input of device;
S3, according to the gradient data of the neural network, use current time gradient, last moment gradient and last moment single order
The moment of momentum calculates the current time single order moment of momentum, uses current time gradient, last moment gradient and last moment second order momentum
Square calculates the current time second order moment of momentum;
The initial learning rate of S4, the amendment optimizer;
S5, the nerve is updated according to revised learning rate, the current time single order moment of momentum and the current time second order moment of momentum
The network parameter of network;
S6, weight decaying is carried out to the gradient of the neural network, then the gradient after being decayed with weight continues to update step S5 institute
Obtained network parameter.
2. quickly adaptive neural network optimization method as described in claim 1, which is characterized in that excellent described in step S2
The initial parameter for changing device includes initial learning rate, weight attenuation coefficient and differential coefficient.
3. quickly adaptive neural network optimization method as described in claim 1, which is characterized in that in step S3, currently
The moment single order moment of momentumWith the current time second order moment of momentumCalculating it is as follows:
Wherein, α1、α2Respectively single order momentum moment coefficient and second order momentum moment coefficient and α1、α2Value range be 0.9~1,Respectively indicate the single order moment of momentum and the second order moment of momentum of last moment, kdFor differential coefficient,It is described
Neural network t moment gradient,For the neural network the last moment of t moment gradient.
4. quickly adaptive neural network optimization method as claimed in claim 3, which is characterized in that α1=0.9, α2=
0.999。
5. quickly adaptive neural network optimization method as described in claim 1, which is characterized in that by such as in step S4
Lower formula corrects the initial learning rate:
Wherein, lrtFor the learning rate that t moment is corrected, lr0For initial learning rate, n is the number of iterations.
6. quickly adaptive neural network optimization method as described in claim 1, which is characterized in that step S5 is according to amendment
Learning rate, the current time single order moment of momentum and the current time second order moment of momentum afterwards updates the network parameter of the neural network
Mode is as follows:
Wherein, ft(θ)、ft-1(θ) is respectively the current time t of the neural network, the network parameter of last moment, and ε is one normal
Number, and 0 < ε < 0.001.
7. quickly adaptive neural network optimization method as described in claim 1, which is characterized in that described in step S6
The mode that the gradient of neural network carries out weight decaying is as follows:
Wherein, wdFor weight attenuation coefficient;sign(ft(θ)) it is sign function, represent the symbol of gradient;Carrying out weight decaying
When, if the symbol of the gradient is negative, corresponding gradient value is 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811318475.XA CN109389222A (en) | 2018-11-07 | 2018-11-07 | A kind of quick adaptive neural network optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811318475.XA CN109389222A (en) | 2018-11-07 | 2018-11-07 | A kind of quick adaptive neural network optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109389222A true CN109389222A (en) | 2019-02-26 |
Family
ID=65428554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811318475.XA Pending CN109389222A (en) | 2018-11-07 | 2018-11-07 | A kind of quick adaptive neural network optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109389222A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110187499A (en) * | 2019-05-29 | 2019-08-30 | 哈尔滨工业大学(深圳) | A kind of design method of on piece integrated optical power attenuator neural network based |
CN110782016A (en) * | 2019-10-25 | 2020-02-11 | 北京百度网讯科技有限公司 | Method and apparatus for optimizing neural network architecture search |
CN110782017A (en) * | 2019-10-25 | 2020-02-11 | 北京百度网讯科技有限公司 | Method and device for adaptively adjusting learning rate |
CN114925829A (en) * | 2022-07-18 | 2022-08-19 | 山东海量信息技术研究院 | Neural network training method and device, electronic equipment and storage medium |
US11631030B2 (en) | 2020-02-11 | 2023-04-18 | International Business Machines Corporation | Learning with moment estimation using different time constants |
-
2018
- 2018-11-07 CN CN201811318475.XA patent/CN109389222A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110187499A (en) * | 2019-05-29 | 2019-08-30 | 哈尔滨工业大学(深圳) | A kind of design method of on piece integrated optical power attenuator neural network based |
CN110782016A (en) * | 2019-10-25 | 2020-02-11 | 北京百度网讯科技有限公司 | Method and apparatus for optimizing neural network architecture search |
CN110782017A (en) * | 2019-10-25 | 2020-02-11 | 北京百度网讯科技有限公司 | Method and device for adaptively adjusting learning rate |
CN110782017B (en) * | 2019-10-25 | 2022-11-22 | 北京百度网讯科技有限公司 | Method and device for adaptively adjusting learning rate |
US11631030B2 (en) | 2020-02-11 | 2023-04-18 | International Business Machines Corporation | Learning with moment estimation using different time constants |
CN114925829A (en) * | 2022-07-18 | 2022-08-19 | 山东海量信息技术研究院 | Neural network training method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109389222A (en) | A kind of quick adaptive neural network optimization method | |
CN102998973B (en) | The multi-model Adaptive Control device of a kind of nonlinear system and control method | |
CN103995805B (en) | The word processing method of the big data of text-oriented | |
RU2351974C1 (en) | Method and device to control flying vehicle pitch | |
CN105759612A (en) | Differential game anti-interception maneuver penetration/accurate strike guiding method with falling angle constraint | |
CN107086916A (en) | A kind of Synchronization of Chaotic Systems based on fractional order adaptive sliding-mode observer | |
CN109858612A (en) | A kind of adaptive deformation cavity convolution method | |
CN106202666A (en) | A kind of computational methods of marine shafting bearing adjustment of displacement | |
CN106682735A (en) | BP neural network algorithm based on PID adjustment | |
WO2018076331A1 (en) | Neural network training method and apparatus | |
CN115686048B (en) | Dynamic triggering limited time control method for executor limited spacecraft intersection system | |
CN108280207A (en) | A method of the perfect Hash of construction | |
CN103676786B (en) | A kind of curve smoothing method based on acceleration principle | |
CN103984200A (en) | Design method of auxiliary graph as well as production method and photoetching method of test map | |
CN107273975A (en) | A kind of rarefaction back-propagating training method of neural network model | |
CN104717058B (en) | Password traversal method and device | |
CN110399697A (en) | Control distribution method based on the aircraft for improving genetic learning particle swarm algorithm | |
CN106019949A (en) | Adaptive order fractional order fuzzy PI lambda controller method | |
CN108710944A (en) | One kind can train piece-wise linear activation primitive generation method | |
CN104571100B (en) | A kind of non-minimum phase hypersonic aircraft control method | |
Parent | Positivity-preserving flux difference splitting schemes | |
CN106384002B (en) | Flood forecasting real-time correction method based on back-fitting algorithm | |
CN110826688B (en) | Training method for guaranteeing stable convergence of maximum and minimum loss functions of GAN model | |
CN103903003A (en) | Method for using Widrow-Hoff learning algorithm | |
Dong et al. | Application of Adam-BP neural network in leveling fitting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190226 |