CN110458287A - Parameter updating method, device, terminal and storage medium of a neural network optimizer - Google Patents

Parameter updating method, device, terminal and storage medium of a neural network optimizer

Info

Publication number
CN110458287A
Authority
CN
China
Prior art keywords
neural network
optimizer
learning rate
network model
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910117536.4A
Other languages
Chinese (zh)
Other versions
CN110458287B (en)
Inventor
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910117536.4A priority Critical patent/CN110458287B/en
Publication of CN110458287A publication Critical patent/CN110458287A/en
Application granted granted Critical
Publication of CN110458287B publication Critical patent/CN110458287B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention is applied to the field of deep learning and discloses a parameter updating method, device, terminal and storage medium for a neural network optimizer. The method includes: when an optimizer is used to optimize a neural network model, setting the initial value of the learning rate in the optimizer as a reference learning rate, and obtaining the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases; calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate; increasing the attenuation rate according to the ratio k, and re-executing the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed. The present invention realizes dynamic updating of the parameters of a neural network optimizer and improves the execution effect of the classification model.

Description

Parameter updating method and device of neural network optimizer, terminal and storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to a method, an apparatus, a terminal, and a computer-readable storage medium for updating parameters of a neural network optimizer.
Background
At present, neural network models based on deep learning are mostly optimized with SGD (Stochastic Gradient Descent), which achieves good precision, i.e., a low error rate, to a certain extent. However, because SGD draws its samples randomly, the model may with a certain probability fall into a locally optimal solution, such as a saddle point, and the resulting model performs poorly.
Disclosure of Invention
The invention mainly aims to provide a parameter updating method, apparatus and terminal of a neural network optimizer, and a computer-readable storage medium, in order to solve the problem that when a deep learning model is optimized by an optimizer based on SGD (Stochastic Gradient Descent), the execution effect of the model is poor.
In order to achieve the above object, the present invention provides a parameter updating method for a neural network optimizer, the method comprising the steps of:
when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize a neural network model, setting the initial value of the learning rate in the optimizer as a reference learning rate, and obtaining the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
calculating a ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
and increasing the attenuation rate according to the ratio k, and re-executing the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1).
Optionally, the step of increasing the attenuation rate according to the ratio k includes:
increasing the attenuation rate according to a formula expressed in terms of the attenuation rate before adjustment and the ratio k, wherein γ₁ is the adjusted attenuation rate, γ₀ is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate.
Optionally, after the step of looking up the corresponding learning rate according to the gradient, the method further includes:
monitoring whether the optimization of the neural network model by the optimizer has reached a preset stage;
when it is monitored that the optimization of the neural network model by the optimizer has reached the preset stage, executing the step of calculating the ratio k of the found learning rate to the reference learning rate;
when it is monitored that the optimization of the neural network model by the optimizer has not reached the preset stage, returning to the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed.
Optionally, the step of monitoring whether the optimization of the neural network model by the optimizer reaches a preset stage includes:
after the optimizer completes one optimization of the neural network model and a batch of training sample data, whose batch size is selected according to a preset criterion, is input into the optimized neural network model, monitoring whether the accumulated amount of training sample data input to the neural network model is equal to a preset stage threshold; when the accumulated amount of training sample data input to the neural network model is equal to the preset stage threshold, determining that the optimization of the neural network model by the optimizer has reached the preset stage.
Optionally, the preset criterion is that the batch size increases as the number of times the optimizer optimizes the neural network model increases.
Optionally, the preset stage threshold is N times a preset value, where N is an integer greater than or equal to 1.
Optionally, the step of optimizing the neural network model using an optimizer based on a stochastic gradient descent combined with a momentum method includes:
optimizing the neural network model with an optimizer according to a formula including θₙ = θₙ₋₁ − vₜ, where vₜ = γ·vₜ₋₁ + ε·∇, θₙ is the currently optimized neural network model parameter, θₙ₋₁ is the neural network model parameter from the previous optimization, vₜ is the momentum for the current optimization of the neural network model, γ is the attenuation rate, vₜ₋₁ is the momentum from the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the optimization order.
In order to achieve the above object, the present invention further provides a parameter updating apparatus of a neural network optimizer, the apparatus including:
a searching module, configured to set an initial value of the learning rate in the optimizer as a reference learning rate when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize the neural network model, and to obtain the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
a calculation module, configured to calculate the ratio k of the found learning rate to the reference learning rate, and to update the found learning rate to be the reference learning rate;
and a promotion module, configured to increase the attenuation rate according to the ratio k, and to re-execute the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1).
In order to achieve the above object, the present invention further provides a terminal, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the parameter updating method of the neural network optimizer as described above.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the parameter updating method of the neural network optimizer as described above.
When an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize a neural network model, the initial value of the learning rate in the optimizer is set as a reference learning rate, and the gradient is obtained when the optimizer performs stochastic gradient descent optimization, so that the corresponding learning rate is looked up according to the gradient, wherein the learning rate decreases as the gradient decreases; the ratio k of the found learning rate to the reference learning rate is calculated, and the found learning rate is updated to be the reference learning rate; the attenuation rate is increased according to the ratio k, and the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization is re-executed when the optimizer is used to optimize the neural network model, so that the corresponding learning rate is looked up according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1). In this way, the parameters of the optimizer, including the learning rate and the attenuation rate, are dynamically adjusted and updated, the neural network model is optimized by an optimizer that combines the stochastic gradient descent method, the momentum method and the dynamically adjusted optimizer parameters, and the execution effect of the model is improved. In addition, because the attenuation rate is generally defined as a constant in the prior art, increasing the attenuation rate through the change of the learning rate, compared with the prior art, accelerates the convergence of the neural network model in actual operation.
Drawings
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a parameter updating method of a neural network optimizer according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a parameter updating method of a neural network optimizer according to another embodiment of the present invention;
FIG. 4 is a functional block diagram of a parameter updating apparatus of the neural network optimizer.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a hardware structure of a terminal provided by the present invention. The terminal may be a server or a computer comprising components such as a memory 10 and a processor 20. In the terminal, the processor 20 is connected to the memory 10, and the memory 10 stores a computer program which, when executed by the processor 20, implements the steps of the methods in the following embodiments.
The memory 10 may be used to store software programs and various data. The memory 10 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as optimizing a neural network model using an optimizer), and the like; the data storage area may include a database and may store data or information created according to the use of the terminal, and the like. Further, the memory 10 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 20, which is a control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal and processes data by operating or executing software programs and/or modules stored in the memory 10 and calling data stored in the memory 10, thereby performing overall monitoring of the terminal. Processor 20 may include one or more processing units; alternatively, the processor 20 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 20.
Although not shown in fig. 1, the terminal may further include a circuit control module for connecting to a power supply to ensure the normal operation of other components. The terminal may further include a display module, configured to extract data in the memory 10 and display the data as a front-end display interface of the terminal and an operation result of the neural network model when the neural network model is applied to classification. The terminal may further include a communication module for connecting with an external communication device through a network. The communication module can receive a request sent by an external communication device and can also send the request, an instruction and information to the external communication device.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
Based on the hardware structure, various embodiments of the method of the invention are provided.
Referring to fig. 2, in an embodiment of the parameter updating method of the neural network optimizer of the present invention, the method includes:
Step S10, when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize the neural network model, setting the initial value of the learning rate in the optimizer as a reference learning rate, and obtaining the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
the neural network model in the present scheme is a classification model, and can be used for text classification, image classification, and the like, and a brief process description is given below by using the text classification. After pre-processing operations such as word segmentation and the like are carried out on a training text, and the text word segmentation is converted into corresponding word vectors through a pre-trained dictionary, the word vectors are required to be input into a feature extraction neural network to obtain output word vectors, then the output word vectors are input into a preset classifier, the classifier has multiple lines, and after the operation of the classifier is finished, the classifier can output the classification probability of the input word vectors corresponding to each line. It is understood that the result of adding the classification probabilities corresponding to all the rows is 1, and the row with the highest classification probability can be selected as the classification result corresponding to the input word vector by default in the program.
It should be noted that the feature extraction neural network applied to the input word vectors performs the feature engineering: it retains the main features of the word vectors, so that the output word vectors encapsulate enough information for classification and have a strong feature expression capability. The operation of the classifier can follow technical means commonly used in the art; for example, the probability corresponding to each class can be obtained using a Softmax function. In addition, it should be noted that the classifier and the feature extraction neural network together form a complete neural network model and can in practice be connected as successive neural network layers; they are introduced separately here only to distinguish their functional roles. Of course, the classifier and the feature extraction neural network may also be configured separately.
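By way of illustration only, the following Python sketch (the feature dimension, weights and number of classes are assumptions, not values from this disclosure) shows how an output word vector can be mapped to per-class probabilities with a Softmax function and how the class with the highest probability is selected:

```python
import numpy as np

def softmax(logits):
    """Convert raw classifier scores into class probabilities that sum to 1."""
    shifted = logits - np.max(logits)      # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Illustrative output word vector from the feature extraction network and an
# illustrative linear classifier with 3 output classes (all values hypothetical).
rng = np.random.default_rng(0)
feature_vector = rng.normal(size=128)      # assumed dimension of the output word vector
W = rng.normal(size=(3, 128))              # classifier weights, one row per class
b = np.zeros(3)

probs = softmax(W @ feature_vector + b)    # classification probability for each class
predicted_class = int(np.argmax(probs))    # class with the highest probability
print(probs.sum(), predicted_class)        # probabilities sum to 1
```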
In this process, the parameters and weights in the neural network model need to be optimized so that the classification result output by the text classification model (i.e., the neural network model) fits reality, where the text classification model may be at least one of TextCNN (Text Convolutional Neural Network), TextRNN (Text Recurrent Neural Network), or TextRCNN (Text Recurrent Convolutional Neural Network).
In this embodiment, the process of optimizing the neural network model by the optimizer refers to a complete training and optimization process of the neural network model by an optimizer constructed by combining SGD with the momentum method, which includes multiple optimizations until the training of the neural network model is completed. The optimizer involves all parameters related to stochastic gradient descent and momentum; these parameters have corresponding initial values at the first optimization, and the initial value of the learning rate can be set as the reference learning rate. The learning rate represents how quickly the adjusted parameters approach their optimal values and determines the performance of the neural network model run by the computer. In order to optimize the neural network model, the processor can control the learning rate so that it gradually decreases as the gradient in the stochastic gradient descent method decreases; the learning rate thus shrinks over the iterative optimization of the model, slowing the adjustment and ensuring accuracy.
For the adjustment of the learning rate, an association between the gradient and the learning rate may be established in the memory in advance, and the corresponding learning rate is then found for adjustment by obtaining the gradient during stochastic gradient descent optimization, where the overall trend of this association is that the learning rate decreases as the gradient decreases. Further, the gradient used in stochastic gradient descent by the optimizer is obtained by differentiating the loss function of the neural network model; in text classification, the loss function can be obtained by combining the output probability of each class with the actual class of the text, for example by computing the cross entropy, which is not described again here.
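As a non-limiting illustration of such a pre-stored association, the Python sketch below uses an assumed lookup table (the gradient thresholds and learning-rate values are hypothetical, not taken from this disclosure) in which the learning rate decreases as the gradient magnitude decreases:

```python
import numpy as np

# Pre-stored association between gradient magnitude and learning rate:
# (lower bound of gradient norm, learning rate). Larger gradients map to larger rates.
LEARNING_RATE_TABLE = [
    (1e-1, 0.02),
    (1e-2, 0.002),
    (1e-3, 0.0002),
    (0.0,  0.00002),
]

def lookup_learning_rate(gradient):
    """Return the learning rate associated with the current gradient magnitude."""
    grad_norm = np.linalg.norm(gradient)
    for lower_bound, lr in LEARNING_RATE_TABLE:
        if grad_norm >= lower_bound:
            return lr
    return LEARNING_RATE_TABLE[-1][1]

grad = np.array([3e-3, -4e-3])        # example gradient obtained from the loss function
print(lookup_learning_rate(grad))     # gradient norm 5e-3 falls in the third band -> 0.0002
```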
Step S20, calculating the ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
In the whole optimization of the neural network model by the optimizer, the learning rate can be adjusted after each optimization; the ratio of the adjusted learning rate to the reference learning rate is then calculated, and the adjusted learning rate is updated to be the reference learning rate, so that the ratio can be updated in real time as the number of times the optimizer optimizes the neural network model grows.
Step S30, increasing the attenuation rate according to the ratio k, and re-executing the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1).
It should be noted that, in the optimization of a neural network model by an existing optimizer, the attenuation rate is generally a constant, whereas in the present application the attenuation rate is defined as a variable that is adjusted according to the change of the learning rate, i.e., according to the ratio of the learning rates before and after an optimization. It can be understood that the found learning rate decreases as the gradient decreases, and an adjustment of the learning rate is usually a change of order of magnitude, so the ratio lies between 0 and 1 and the actual attenuation rate becomes larger and larger. After the attenuation rate and the learning rate have been adjusted, the optimizer can again be used to optimize the neural network model until the optimization of the neural network model is completed.
Optionally, the attenuation rate before and after an optimization of the neural network model may be adjusted according to a formula expressed in terms of the attenuation rate before adjustment and the ratio k, where γ₁ is the adjusted attenuation rate, γ₀ is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate. For example, if the learning rate is adjusted from 0.02 to 0.0002, then k = 0.0002 / 0.02 = 0.01, and the increased attenuation rate can be calculated from k together with the attenuation rate before adjustment.
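Since the exact adjustment formula is not reproduced in this text, the Python sketch below only illustrates the flow of this step; boost_attenuation_rate is a hypothetical placeholder showing that the adjusted attenuation rate γ₁ is computed from γ₀ and k and grows as k shrinks, and the numerical values repeat the worked example above:

```python
def boost_attenuation_rate(gamma_0, k):
    """Hypothetical update rule: move gamma_0 toward 1 as k shrinks, with k in (0, 1)."""
    return gamma_0 + (1.0 - gamma_0) * (1.0 - k)

reference_lr = 0.02              # reference learning rate before this optimization
found_lr = 0.0002                # learning rate looked up from the current gradient
gamma_0 = 0.9                    # attenuation rate before adjustment (illustrative)

k = found_lr / reference_lr                    # k = 0.0002 / 0.02 = 0.01
gamma_1 = boost_attenuation_rate(gamma_0, k)   # adjusted (larger) attenuation rate
reference_lr = found_lr                        # the found learning rate becomes the new reference
print(k, gamma_1)                              # 0.01, 0.999
```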
In this embodiment, when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize a neural network model, the initial value of the learning rate in the optimizer is set as a reference learning rate, and the gradient is obtained when the optimizer performs stochastic gradient descent optimization, so that the corresponding learning rate is looked up according to the gradient, wherein the learning rate decreases as the gradient decreases; the ratio k of the found learning rate to the reference learning rate is calculated, and the found learning rate is updated to be the reference learning rate; the attenuation rate is increased according to the ratio k, and the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization is re-executed when the optimizer is used to optimize the neural network model, so that the corresponding learning rate is looked up according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1). In this way, the parameters of the optimizer, including the learning rate and the attenuation rate, are dynamically adjusted and updated, the neural network model is optimized by an optimizer that combines the stochastic gradient descent method, the momentum method and the dynamically adjusted optimizer parameters, and the execution effect of the model is improved. In addition, because the attenuation rate is generally defined as a constant in the prior art, increasing the attenuation rate in the optimizer through the change of the learning rate can, compared with the prior art, accelerate the convergence of the neural network model while the optimizer actually optimizes and runs the neural network, so that the optimal state is reached as soon as possible.
Further, in other embodiments, the optimization of the neural network model based on stochastic gradient descent combined with the momentum method may use an optimizer according to a formula including θₙ = θₙ₋₁ − vₜ, where vₜ = γ·vₜ₋₁ + ε·∇, θₙ is the currently optimized neural network model parameter, θₙ₋₁ is the neural network model parameter from the previous optimization, vₜ is the momentum for the current optimization of the neural network model, γ is the attenuation rate, vₜ₋₁ is the momentum from the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the optimization order. The neural network model parameters may refer to the weight coefficients of the neural network model and the like. Combining momentum and stochastic gradient descent with the updating and adjustment of the learning rate and attenuation rate parameters in the optimizer reduces oscillation when the parameters in the neural network model are optimized and updated by a computer program, greatly improves computational efficiency, increases the convergence speed of the neural network model, speeds up training and gives a better effect.
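A minimal Python sketch of this update, assuming the momentum form vₜ = γ·vₜ₋₁ + ε·∇ implied by the variable definitions above (the loss function and numerical values are illustrative assumptions, not part of this disclosure):

```python
import numpy as np

def momentum_sgd_step(theta_prev, v_prev, grad, gamma, epsilon):
    """One optimization step: v_t = gamma * v_{t-1} + epsilon * grad,
    theta_n = theta_{n-1} - v_t (stochastic gradient descent combined with momentum)."""
    v_t = gamma * v_prev + epsilon * grad    # momentum for the current optimization
    theta_n = theta_prev - v_t               # updated neural network model parameters
    return theta_n, v_t

theta = np.array([0.5, -0.3])                # illustrative parameter vector (weights)
v = np.zeros_like(theta)                     # initial momentum
gamma, epsilon = 0.9, 0.01                   # attenuation rate and learning rate (illustrative)

for _ in range(3):                           # a few optimization steps
    grad = 2 * theta                         # gradient of an assumed loss ||theta||^2
    theta, v = momentum_sgd_step(theta, v, grad, gamma, epsilon)
print(theta)
```

In a full implementation the constants gamma and epsilon would be replaced at each step by the attenuation rate and learning rate produced by steps S10 to S30 above.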
Further, referring to fig. 3, in another embodiment, after the step S10, the method further includes:
Step S40, monitoring whether the optimization of the neural network model by the optimizer has reached a preset stage; if yes, executing step S20; if not, executing step S50;
Step S50, returning to the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed.
In the present embodiment, the number of times the attenuation rate is adjusted is limited on the basis of the above embodiments, mainly because an adjustment of the learning rate is usually a change of order of magnitude; if the attenuation rate were adjusted every time the learning rate is adjusted, the actual change in the attenuation rate could be small and would have little influence on the convergence speed of the neural network model. After the learning rate has been adjusted by each lookup operation, determining whether the optimization has reached a preset stage decides whether the attenuation rate is adjusted; this reduces the number of times the attenuation rate is adjusted, ensures that each update of the attenuation rate has a substantive influence on the optimization of the neural network model, and indirectly shortens the time the terminal needs to update the parameters of the optimizer.
Optionally, in this embodiment, whether the optimization of the neural network model by the optimizer has reached the preset stage may be determined from the amount of batch data continuously input. After the optimizer completes one optimization of the neural network model and a batch of training sample data, whose batch size is selected according to a preset criterion, is input into the optimized neural network model, whether the accumulated amount of training sample data input to the neural network model is equal to a preset stage threshold is monitored; when the accumulated amount of training sample data input to the neural network model is equal to the preset stage threshold, it is determined that the optimization of the neural network model by the optimizer has reached the preset stage. Otherwise, when the accumulated amount of training sample data input to the neural network model is not equal to the preset stage threshold, it is determined that the optimization of the neural network model by the optimizer has not reached the preset stage. Further, the preset stage threshold may be set according to actual needs; for example, different thresholds may be set for different stages as the accumulated amount of data input to the neural network model increases, and the difference between adjacent thresholds may be the same or different. Taking the case where the difference between adjacent thresholds of each stage is the same as an example, all the preset stage thresholds are N times a certain preset value, where N is an integer greater than or equal to 1.
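The following Python sketch illustrates the preset-stage check under the equal-difference case described above (the preset value and batch sizes are assumed for illustration only):

```python
class StageMonitor:
    """Tracks the accumulated training data fed to the model and reports when the
    total reaches the next multiple of the preset value (N * preset_value)."""
    def __init__(self, preset_value):
        self.preset_value = preset_value
        self.accumulated = 0
        self.next_threshold = preset_value

    def reached_preset_stage(self, batch_size):
        self.accumulated += batch_size
        if self.accumulated >= self.next_threshold:
            self.next_threshold += self.preset_value   # advance to the next stage (N -> N + 1)
            return True
        return False

monitor = StageMonitor(preset_value=1000)              # illustrative stage size
for batch_size in (100, 200, 300, 400, 500):           # batch size grows with optimization count
    if monitor.reached_preset_stage(batch_size):
        print("adjust attenuation rate after", monitor.accumulated, "samples")
    # otherwise only the learning rate is looked up and updated in this round
```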
In addition, whether the preset stage has been reached may also be determined according to the number of times the optimizer has optimized the neural network model; for example, the attenuation rate is adjusted once for every Q adjustments of the learning rate, where Q may, for example, be equal to 10.
In the whole process of optimizing the neural network model by the optimizer, after each optimization of the neural network model, a batch of data of a certain batch size (batch_size) is input into the once-optimized neural network model. Applied to text classification, the batch data is a batch of input word vectors. It should be noted that, in the process of optimizing and training the neural network model, the training text needs to be input into the neural network; one complete pass of the training sample data set forward through the neural network and back is called an EPOCH, but when the data set is very large, an EPOCH needs to be divided into several batch-size inputs. The batch size determines the descent direction: within a reasonable range, the larger the batch size, the more accurate the determined gradient descent direction and the smaller the training oscillation. The batch size can be adjusted to grow as the number of optimizations increases, or it can be adjusted dynamically according to the output results.
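As an illustration of splitting one EPOCH into batches whose size grows with the number of optimizations (the initial size and growth step are assumptions, not values from this disclosure), consider:

```python
def iterate_epoch(samples, initial_batch_size=32, growth_step=32):
    """Yield successive batches from one epoch, enlarging the batch size each time."""
    batch_size = initial_batch_size
    start = 0
    while start < len(samples):
        yield samples[start:start + batch_size]
        start += batch_size
        batch_size += growth_step        # larger batches as the optimization count grows

epoch = list(range(500))                 # stand-in for one epoch of input word vectors
for batch in iterate_epoch(epoch):
    print(len(batch))                    # 32, 64, 96, 128, 160, 20
```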
Referring to fig. 4, the present invention also provides a parameter updating apparatus of a neural network optimizer, where the apparatus may be a computer or a server, and the apparatus includes:
a searching module 10, configured to set the initial value of the learning rate in the optimizer as a reference learning rate when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize a neural network model, and to obtain the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
a calculating module 20, configured to calculate the ratio k of the found learning rate to the reference learning rate, and to update the found learning rate to be the reference learning rate;
and a promotion module 30, configured to increase the attenuation rate according to the ratio k, and to re-execute the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1).
Further, in another embodiment, the promotion module is further configured to increase the attenuation rate according to a formula expressed in terms of the attenuation rate before adjustment and the ratio k, wherein γ₁ is the adjusted attenuation rate, γ₀ is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate.
Further, in yet another embodiment, the apparatus further comprises a monitoring module; wherein,
the monitoring module is configured to monitor whether the optimization of the neural network model by the optimizer has reached a preset stage; to trigger the calculation module to calculate the ratio k of the found learning rate to the reference learning rate when it is monitored that the optimization of the neural network model by the optimizer has reached the preset stage; and, when it is monitored that the optimization of the neural network model by the optimizer has not reached the preset stage, to return to the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed.
Further, in another embodiment, the monitoring module is further configured to monitor, after the optimizer completes one optimization of the neural network model and a batch of training sample data, whose batch size is selected according to a preset criterion, is input into the optimized neural network model, whether the accumulated amount of training sample data input to the neural network model is equal to a preset stage threshold; when the accumulated amount of training sample data input to the neural network model is equal to the preset stage threshold, it is determined that the optimization of the neural network model by the optimizer has reached the preset stage.
Further, in yet another embodiment, the preset criterion is that the batch size increases as the number of times the optimizer optimizes the neural network model increases.
Further, in another embodiment, the preset stage threshold is N times a preset value, where N is an integer greater than or equal to 1.
Further, in yet another embodiment, the apparatus further comprises:
an optimization module, configured to optimize the neural network model with an optimizer according to a formula including θₙ = θₙ₋₁ − vₜ, where vₜ = γ·vₜ₋₁ + ε·∇, θₙ is the currently optimized neural network model parameter, θₙ₋₁ is the neural network model parameter from the previous optimization, vₜ is the momentum for the current optimization of the neural network model, γ is the attenuation rate, vₜ₋₁ is the momentum from the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the optimization order.
The invention also proposes a computer-readable storage medium on which a computer program is stored. The computer-readable storage medium may be the Memory 10 in the terminal in fig. 1, and may also be at least one of a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, and an optical disk, and the computer-readable storage medium includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, a terminal, or a network device) having a processor to execute the method according to the embodiments of the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or server that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or server. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or service that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for updating parameters of a neural network optimizer, the method comprising the steps of:
when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize a neural network model, setting the initial value of the learning rate in the optimizer as a reference learning rate, and obtaining the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
calculating a ratio k of the found learning rate to the reference learning rate, and updating the found learning rate to be the reference learning rate;
and increasing the attenuation rate according to the ratio k, and re-executing the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1).
2. The parameter updating method of the neural network optimizer of claim 1, wherein the step of increasing the decay rate according to the ratio k comprises:
increasing the attenuation rate according to a formula expressed in terms of the attenuation rate before adjustment and the ratio k, wherein γ₁ is the adjusted attenuation rate, γ₀ is the attenuation rate before adjustment, and k is the ratio of the found learning rate to the reference learning rate.
3. The parameter updating method of the neural network optimizer of claim 1, wherein after the step of looking up the corresponding learning rate according to the gradient, the method further comprises:
monitoring whether the optimization of the neural network model by the optimizer has reached a preset stage;
when it is monitored that the optimization of the neural network model by the optimizer has reached the preset stage, executing the step of calculating the ratio k of the found learning rate to the reference learning rate;
when it is monitored that the optimization of the neural network model by the optimizer has not reached the preset stage, returning to the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed.
4. The parameter updating method of the neural network optimizer of claim 3, wherein the step of monitoring whether the optimization of the neural network model by the optimizer reaches a preset stage comprises:
after the optimizer completes one optimization of the neural network model and a batch of training sample data, whose batch size is selected according to a preset criterion, is input into the optimized neural network model, monitoring whether the accumulated amount of training sample data input to the neural network model is equal to a preset stage threshold; when the accumulated amount of training sample data input to the neural network model is equal to the preset stage threshold, determining that the optimization of the neural network model by the optimizer has reached the preset stage.
5. The parameter updating method of the neural network optimizer of claim 4, wherein the preset criterion is that the batch size increases as the number of times the optimizer optimizes the neural network model increases.
6. The parameter updating method of the neural network optimizer of claim 4, wherein the preset stage threshold is N times a preset value, where N is an integer greater than or equal to 1.
7. The parameter updating method of the neural network optimizer of any one of claims 1-6, wherein the step of optimizing the neural network model using an optimizer based on stochastic gradient descent combined with a momentum method comprises:
optimizing the neural network model with an optimizer according to a formula including θₙ = θₙ₋₁ − vₜ, where vₜ = γ·vₜ₋₁ + ε·∇, θₙ is the currently optimized neural network model parameter, θₙ₋₁ is the neural network model parameter from the previous optimization, vₜ is the momentum for the current optimization of the neural network model, γ is the attenuation rate, vₜ₋₁ is the momentum from the previous optimization of the neural network model, ε is the learning rate, ∇ is the gradient, and t is the optimization order.
8. An apparatus for updating parameters of a neural network optimizer, the apparatus comprising:
a searching module, configured to set an initial value of the learning rate in the optimizer as a reference learning rate when an optimizer based on stochastic gradient descent combined with a momentum method is used to optimize the neural network model, and to obtain the gradient when the optimizer performs stochastic gradient descent optimization, so as to look up the corresponding learning rate according to the gradient, wherein the learning rate decreases as the gradient decreases;
a calculation module, configured to calculate the ratio k of the found learning rate to the reference learning rate, and to update the found learning rate to be the reference learning rate;
and a promotion module, configured to increase the attenuation rate according to the ratio k, and to re-execute the step of obtaining the gradient when the optimizer performs stochastic gradient descent optimization when the optimizer is used to optimize the neural network model, so as to look up the corresponding learning rate according to the gradient, until the optimization of the neural network model is completed, wherein k ∈ (0, 1).
9. A terminal, characterized in that the terminal comprises: memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the parameter updating method of the neural network optimizer of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the parameter updating method of a neural network optimizer as claimed in any one of claims 1 to 7.
CN201910117536.4A 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer Active CN110458287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910117536.4A CN110458287B (en) 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910117536.4A CN110458287B (en) 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer

Publications (2)

Publication Number Publication Date
CN110458287A true CN110458287A (en) 2019-11-15
CN110458287B CN110458287B (en) 2024-05-07

Family

ID=68480590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910117536.4A Active CN110458287B (en) 2019-02-15 2019-02-15 Parameter updating method, device, terminal and storage medium of neural network optimizer

Country Status (1)

Country Link
CN (1) CN110458287B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143148A (en) * 2019-12-30 2020-05-12 北京奇艺世纪科技有限公司 Model parameter determination method, device and storage medium
CN112616230A (en) * 2020-12-21 2021-04-06 江苏恒通照明集团有限公司 Remote operation and maintenance control system for intelligent street lamp
CN113313248A (en) * 2021-02-26 2021-08-27 阿里巴巴集团控股有限公司 Shared adaptive degree optimization method and device
CN114266324A (en) * 2021-12-30 2022-04-01 智慧眼科技股份有限公司 Model visualization modeling method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010010861A1 (en) * 2008-07-23 2010-01-28 Matsumura Takatoshi Management device for digital data on recollections and program
CN108073075A (en) * 2017-12-21 2018-05-25 苏州大学 Silicon micro accerometer temperature-compensation method, system based on GA Optimized BP Neural Networks
US20180253618A1 (en) * 2016-06-24 2018-09-06 Ping An Technology (Shenzhen) Co., Ltd. Method, system, electronic device, and medium for classifying license plates based on deep learning
CN109117953A (en) * 2018-09-11 2019-01-01 北京迈格威科技有限公司 Network parameter training method and system, server, client and storage medium
CN109165724A (en) * 2018-08-06 2019-01-08 哈工大大数据(哈尔滨)智能科技有限公司 A kind of gradient neural network based decline the number of iterations prediction technique and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010010861A1 (en) * 2008-07-23 2010-01-28 Matsumura Takatoshi Management device for digital data on recollections and program
US20180253618A1 (en) * 2016-06-24 2018-09-06 Ping An Technology (Shenzhen) Co., Ltd. Method, system, electronic device, and medium for classifying license plates based on deep learning
CN108073075A (en) * 2017-12-21 2018-05-25 苏州大学 Silicon micro accerometer temperature-compensation method, system based on GA Optimized BP Neural Networks
CN109165724A (en) * 2018-08-06 2019-01-08 哈工大大数据(哈尔滨)智能科技有限公司 A kind of gradient neural network based decline the number of iterations prediction technique and device
CN109117953A (en) * 2018-09-11 2019-01-01 北京迈格威科技有限公司 Network parameter training method and system, server, client and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭亚宇; 孙立功; 苏兆仁: "Application of an improved BP neural network to subgrade settlement prediction" (改进的BP神经网络在路基沉降预测中的应用), 勘察科学技术, no. 05, 31 October 2010 (2010-10-31), pages 28-31 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143148A (en) * 2019-12-30 2020-05-12 北京奇艺世纪科技有限公司 Model parameter determination method, device and storage medium
CN111143148B (en) * 2019-12-30 2023-09-12 北京奇艺世纪科技有限公司 Model parameter determining method, device and storage medium
CN112616230A (en) * 2020-12-21 2021-04-06 江苏恒通照明集团有限公司 Remote operation and maintenance control system for intelligent street lamp
CN113313248A (en) * 2021-02-26 2021-08-27 阿里巴巴集团控股有限公司 Shared adaptive degree optimization method and device
CN114266324A (en) * 2021-12-30 2022-04-01 智慧眼科技股份有限公司 Model visualization modeling method and device, computer equipment and storage medium
CN114266324B (en) * 2021-12-30 2023-04-07 智慧眼科技股份有限公司 Model visualization modeling method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110458287B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN109947940B (en) Text classification method, device, terminal and storage medium
CN110458287A (en) Parameter updating method, device, terminal and the storage medium of Neural Network Optimization device
US12008445B2 (en) Black-box optimization using neural networks
US20180158449A1 (en) Method and device for waking up via speech based on artificial intelligence
CN110546656A (en) Feedforward generation type neural network
CN111461343B (en) Model parameter updating method and related equipment thereof
CN112434188B (en) Data integration method, device and storage medium of heterogeneous database
KR20220092776A (en) Apparatus and method for quantizing neural network models
CN112686383B (en) Method, system and device for reducing distributed random gradient of communication parallelism
CN112687266A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN110597847A (en) SQL statement automatic generation method, device, equipment and readable storage medium
CN112183750A (en) Neural network model training method and device, computer equipment and storage medium
CN111210017B (en) Method, device, equipment and storage medium for determining layout sequence and data processing
CN114936323A (en) Graph representation model training method and device and electronic equipment
CN110874635B (en) Deep neural network model compression method and device
CN111460829A (en) Intention identification method, device and equipment under multi-scene application and storage medium
CN117971487A (en) High-performance operator generation method, device, equipment and storage medium
US20210271932A1 (en) Method, device, and program product for determining model compression rate
CN117633621A (en) Training method and device for open set classification model, electronic equipment and storage medium
CN113012687A (en) Information interaction method and device and electronic equipment
CN112396100B (en) Optimization method, system and related device for fine-grained classification model
EP3767548A1 (en) Delivery of compressed neural networks
CN112735392B (en) Voice processing method, device, equipment and storage medium
CN113822455B (en) Time prediction method, device, server and storage medium
KR20220010419A (en) Electronice device and learning method for low complexity artificial intelligentce model learning based on selecting the dynamic prediction confidence thresholed

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant