WO2022095432A1 - Neural network model training method and apparatus, computer device, and storage medium - Google Patents

Neural network model training method and apparatus, computer device, and storage medium

Info

Publication number
WO2022095432A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
order moment
preset
current
neural network
Prior art date
Application number
PCT/CN2021/097319
Other languages
English (en)
French (fr)
Inventor
李国安
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022095432A1 publication Critical patent/WO2022095432A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence, and in particular, to a neural network model training method, apparatus, computer equipment, and storage medium.
  • current deep-learning text classification models feed a word vector into a trained neural network model to obtain the classification result for that word vector.
  • to make the classification results better match reality, the neural network model needs to be trained before text classification.
  • with the development of neural networks, and especially of deep learning, the number of neurons in a model may exceed tens of millions.
  • at that scale, the efficiency of gradient descent far exceeds that of analytically inverting the normal-equation matrix, which has made gradient descent the principal method of neural network training.
  • in modern deep learning, the training period of a neural network usually ranges from several hours to several days; how to improve the training efficiency of gradient descent, and how to keep gradient descent stable in complex, large-scale scenarios, has long been a research direction for deep learning researchers.
  • the purpose of the embodiments of the present application is to provide a neural network model training method, apparatus, computer device and computer-readable storage medium, so as to solve the problems in the prior art that the training effect is poor and the training efficiency is low when training a neural network model.
  • an embodiment of the present application provides a neural network model training method, including:
  • obtaining a training sample set, the training sample set including a plurality of training sample data;
  • inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
  • calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
  • calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
  • calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
  • calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula;
  • calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  • the gradient algorithm is specifically: $g_t^l = \nabla_\theta L(\theta_t^l;\, x_t,\, y_t)$, where $g_t^l$ is the gradient of the loss function $L$ at the current number of training steps $t$, $\theta_t^l$ is the current weight of the l-th layer, $x_t$ is the sampled training sample data, and $y_t$ is the corresponding target value;
  • calculating the corrected first-order moment estimate when training at the current number of training steps according to the first parameter, the first first-order moment estimate, the gradient, the preset first-order moment estimation calculation formula, and the preset first-order moment estimation correction formula includes:
  • calculating the second first-order moment estimate when training at the current number of training steps according to the first parameter, the first first-order moment estimate, the gradient and the preset first-order moment estimation calculation formula, wherein the first-order moment estimation calculation formula is: $m_t^l = \beta_1 m_{t-1}^l + (1-\beta_1)\, g_t^l$, where $m_t^l$ is the second first-order moment estimate, $m_{t-1}^l$ is the first first-order moment estimate, and $\beta_1$ is the first parameter;
  • calculating the corrected first-order moment estimate according to the second first-order moment estimate and the first-order moment estimation correction formula, wherein the first-order moment estimation correction formula is: $\hat m_t^l = m_t^l / (1-\beta_1^t)$, where $\hat m_t^l$ is the corrected first-order moment estimate, $\beta_1^t$ is the first parameter $\beta_1$ raised to the power $t$, and $t$ is the current number of training steps.
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to the second parameter, the first second-order moment estimate, the gradient, the preset second-order moment estimation calculation formula, and the preset second-order moment estimation correction formula includes:
  • calculating the second second-order moment estimate when training at the current number of training steps according to the second parameter, the first second-order moment estimate, the gradient and the preset second-order moment estimation calculation formula, wherein the second-order moment estimation calculation formula is: $v_t^l = \beta_2 v_{t-1}^l + (1-\beta_2)\,(g_t^l)^2$, where $v_t^l$ is the second second-order moment estimate, $v_{t-1}^l$ is the first second-order moment estimate, and $\beta_2$ is the second parameter;
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to the second second-order moment estimate and the second-order moment estimation correction formula, wherein the second-order moment estimation correction formula is: $\hat v_t^l = v_t^l / (1-\beta_2^t)$, where $\hat v_t^l$ is the corrected second-order moment estimate, $\beta_2^t$ is the second parameter $\beta_2$ raised to the power $t$, and $t$ is the current number of training steps.
  • the coefficient correction calculation formula is specifically: $r_t^l = \hat m_t^l / (\sqrt{\hat v_t^l} + \varepsilon)$, where $r_t^l$ is the correction coefficient and $\varepsilon$ is a preset constant.
  • the learning rate correction calculation formula computes the second corrected learning rate $\eta_{t+1}^l$ for the next training step from the correction coefficient, the current weight, the third parameter $\beta_3$, and the first corrected learning rate $\eta_t^l$ when training at the current number of training steps.
  • the weight calculation formula computes the updated weight $\theta_{t+1}^l$ (the second weight) of the l-th layer from the current weight, the second corrected learning rate, the correction coefficient, and the third parameter.
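For concreteness, the per-layer update described above can be sketched in Python. The moment estimates, bias corrections, and correction coefficient follow the standard Adam-style forms the text describes; the learning rate correction and weight calculation formulas are published only as images, so the LAMB-style trust-ratio form and the decoupled `beta3 * w` decay term below are assumptions rather than the patent's exact equations:

```python
import numpy as np

def update_layer(w, grad, m, v, lr, t, beta1=0.9, beta2=0.999,
                 beta3=0.01, eps=1e-10):
    """One update of a single layer's weights at training step t (t >= 1)."""
    # First-order moment estimate and its bias correction
    m = beta1 * m + (1.0 - beta1) * grad
    m_hat = m / (1.0 - beta1 ** t)
    # Second-order moment estimate and its bias correction
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    v_hat = v / (1.0 - beta2 ** t)
    # Correction coefficient; eps keeps the denominator strictly positive
    r = m_hat / (np.sqrt(v_hat) + eps)
    # Assumed learning-rate correction (hypothetical trust-ratio form)
    update = r + beta3 * w
    lr_next = lr * np.linalg.norm(w) / (np.linalg.norm(update) + eps)
    # Assumed weight calculation formula
    w_next = w - lr_next * update
    return w_next, m, v, lr_next
```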
  • the embodiment of the present application also provides a neural network model training apparatus, including:
  • an acquisition module, configured to acquire a training sample set, the training sample set including a plurality of training sample data; and
  • a training module, configured to input the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
  • calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
  • calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
  • calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
  • calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula;
  • calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  • the embodiments of the present application also provide a computer device, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • obtaining a training sample set, the training sample set including a plurality of training sample data;
  • inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
  • calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
  • calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
  • calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
  • calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula;
  • calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  • the embodiments of the present application further provide a computer-readable storage medium, where computer-readable instructions are stored in the computer-readable storage medium, and the computer-readable instructions can be executed by at least one processor to cause the at least one processor to perform the following steps:
  • obtaining a training sample set, the training sample set including a plurality of training sample data;
  • inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
  • calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
  • calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
  • calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
  • calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula;
  • calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  • with the method, apparatus, device and medium above, a training sample set including a plurality of training sample data is obtained, and the training sample data in the training sample set are input into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the learning rate is updated in an adaptive manner.
  • the weights of the neural network model are thereby correspondingly adaptively updated, so no other hyperparameters need to be adjusted during the training of the neural network model, which reduces the difficulty of training parameter tuning, saves time and cost, and improves training efficiency.
  • FIG. 1 is a flowchart of Embodiment 1 of the neural network model training method of the present application.
  • FIG. 2 is a detailed flowchart of the steps of updating the weight of the l-th layer in the neural network model after the neural network model is trained for each number of training steps.
  • FIG. 3 is a schematic diagram of program modules of Embodiment 2 of the neural network model training apparatus of the present application.
  • FIG. 4 is a schematic diagram of a hardware structure of Embodiment 3 of a computer device of the present application.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, the information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another.
  • for example, without departing from the scope of the present disclosure, first information may also be referred to as second information and, similarly, second information may also be referred to as first information.
  • depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
  • referring to FIG. 1, a flowchart of the steps of a neural network model training method according to Embodiment 1 of the present application is shown. It can be understood that the flowchart in this method embodiment is not used to limit the order in which the steps are executed. The following takes the computer device 2 as the execution subject for exemplary description. The details are as follows.
  • step S10: a training sample set is obtained, and the training sample set includes a plurality of training sample data.
  • the training sample set may be a text set, an image set, or a voice set, etc.
  • the training sample set is described by taking a text set as an example.
  • the text set contains a plurality of text data, and each text data carries a text label, and the text label is used to indicate the category to which the text belongs.
  • Step S11: the training sample data in the training sample set are input into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through steps S20-S26.
  • here the l-th layer refers to each layer in the neural network model; that is, the weight of every layer of the neural network model can be updated through steps S20-S26.
  • convergence refers to the following: during neural network training, if the loss value keeps fluctuating back and forth or remains high and cannot enter the tolerance range, the network has not converged; if the loss value is minimal, i.e., the training results are closer to the real results and an optimal solution has been obtained, the network has converged.
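One simple way to operationalize this convergence test in code (the window size and tolerance are illustrative choices, not values from the patent):

```python
def has_converged(losses, tol=1e-4, window=10):
    # Converged when the average loss over the last window stops improving
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) < tol
```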
  • in the prior art, when a neural network model is trained, gradient descent is used to update the current weight θ_t of the neural network model, and the update usually employs a global learning rate.
  • the specific algorithm is: $\theta_{t+1} = \theta_t - \eta\, \nabla L(\theta_t)$, where $t$ is the current number of training steps, $\theta_t$ is the current weight at training step $t$, $\eta$ is the learning rate (a fixed value), $\nabla L(\theta_t)$ is the gradient of the loss function $L(\theta_t)$, $\theta_{t+1}$ is the weight at training step $t+1$, and $\nabla$ denotes differentiation.
  • in the existing method, the weight parameters are iteratively optimized along the gradient descent direction to reduce the value of the loss function.
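As a baseline for contrast, a minimal sketch of this fixed global-learning-rate update:

```python
def sgd_step(theta, grad, eta=0.01):
    # Every layer and every step share the same fixed learning rate eta
    return theta - eta * grad
```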
  • during neural network training, the learning rate is a hyperparameter that controls the magnitude of the weight updates, as well as the speed and accuracy of training. If the learning rate is too large, the objective (cost) function tends to fluctuate widely, making it difficult to find the optimum; if the learning rate is set too small, convergence is too slow and time-consuming.
  • in this embodiment, when the training sample data in the training sample set are input into the neural network model for iterative training, the weights of the model are not updated with a global learning rate; instead, after each number of training steps, the weight of the l-th layer in the neural network model is updated through steps S20-S26.
  • a number of training steps refers to the process of inputting one batch (Batch Size) of training sample data into the neural network model for one iteration (Iteration) of training; after each iteration of training of the neural network model, the weights θ_t of every layer in the model are updated, i.e., the model is optimized once.
  • one iteration in this embodiment refers to the process of inputting one batch of sample data into the neural network model and completing training on that batch of training sample data.
  • exemplarily, after the neural network model is trained for each number of training steps, updating the weight of the l-th layer in the neural network model includes:
  • Step S20: the gradient of the loss function at the current number of training steps is calculated according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm.
  • in this embodiment, before the weight of the l-th layer of the neural network model is updated, the following may first be obtained: the preset first parameter $\beta_1$, second parameter $\beta_2$, third parameter $\beta_3$, the loss function $L(\theta)$, and the current number of training steps $t$; the training sample data $x_t$ sampled when training at the current number of training steps $t$ and the target values $y_t$ corresponding to the training sample data; the first first-order moment estimate $m_{t-1}^l$ and the first second-order moment estimate $v_{t-1}^l$ of the l-th layer in the neural network model when training at the training step $t-1$ preceding the current one; the first corrected learning rate $\eta_t^l$ when training at the current number of training steps; and the current weight $\theta_t^l$ of the l-th layer when training at the current number of training steps.
  • the first parameter $\beta_1$, the second parameter $\beta_2$ and the third parameter $\beta_3$ are preset values; preferably $0.9 \le \beta_1 < 1$, $0.99 \le \beta_2 < 1$, and $0 < \beta_3 \le 0.1$.
  • the first first-order moment estimate $m_{t-1}^l$ is the first-order moment estimate calculated during the previous iteration of training of the neural network model; for example, if the current training step $t$ is the fifth iteration, $m_{t-1}^l$ is the first-order moment estimate calculated after the fourth iteration.
  • the first second-order moment estimate $v_{t-1}^l$ is the second-order moment estimate calculated during the previous iteration; for example, if the current training step $t$ is the fifth iteration, $v_{t-1}^l$ is the second-order moment estimate calculated after the fourth iteration.
  • the first corrected learning rate $\eta_t^l$ is the learning rate calculated during the previous iteration; for example, if the current training step $t$ is the fifth iteration, $\eta_t^l$ is the learning rate calculated after the fourth iteration.
  • the training sample data $x_t$ refers to one batch (Batch Size) of training sample data sampled from the training sample set during the iteration with training step $t$; for example, if the batch size is 256, $x_t$ consists of 256 training sample data sampled from the training sample set.
  • the target values $y_t$ are the sample labels corresponding to the training sample data $x_t$; the number of target values is the same as the batch size.
  • the gradient algorithm is: $g_t^l = \nabla_\theta L(\theta_t^l;\, x_t,\, y_t)$, where $g_t^l$ is the gradient of the loss function at the current number of training steps, $\theta_t^l$ is the current weight, $x_t$ is the training sample data, $y_t$ is the target value, and $L$ is the loss function.
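As an illustration only (the patent does not fix a particular model or loss function; the linear model and mean-squared-error loss here are assumptions):

```python
import numpy as np

def mse_loss_grad(theta, x_batch, y_batch):
    # Linear model y_pred = x @ theta with MSE loss, for illustration
    y_pred = x_batch @ theta
    loss = np.mean((y_pred - y_batch) ** 2)
    grad = 2.0 * x_batch.T @ (y_pred - y_batch) / len(x_batch)
    return loss, grad
```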
  • Step S21: the corrected first-order moment estimate $\hat m_t^l$ when training at the current number of training steps is calculated according to the preset first parameter $\beta_1$, the first first-order moment estimate $m_{t-1}^l$ of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient $g_t^l$, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula.
  • the first-order moment estimation calculation formula is: $m_t^l = \beta_1 m_{t-1}^l + (1-\beta_1)\, g_t^l$, where $m_t^l$ is the second first-order moment estimate, $m_{t-1}^l$ is the first first-order moment estimate, and $\beta_1$ is the first parameter.
  • the first-order moment estimation correction formula is: $\hat m_t^l = m_t^l / (1-\beta_1^t)$, where $\hat m_t^l$ is the corrected first-order moment estimate, $\beta_1^t$ is the first parameter $\beta_1$ raised to the power $t$, and $t$ is the current number of training steps.
  • in this embodiment, after $\beta_1$, $m_{t-1}^l$ and $g_t^l$ have been obtained, the second first-order moment estimate $m_t^l$ at the current number of training steps is first calculated from these values and the first-order moment estimation calculation formula; the corrected first-order moment estimate $\hat m_t^l$ is then calculated from $m_t^l$ and the first-order moment estimation correction formula, i.e., $\hat m_t^l$ is the first-order moment estimate obtained after correcting $m_t^l$.
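A minimal sketch of this step (variable names are illustrative):

```python
def first_moment(m_prev, grad, beta1, t):
    m = beta1 * m_prev + (1.0 - beta1) * grad  # second first-order moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias-corrected (corrected) estimate
    return m, m_hat
```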
  • Step S22: the corrected second-order moment estimate $\hat v_t^l$ when training at the current number of training steps is calculated according to the preset second parameter $\beta_2$, the first second-order moment estimate $v_{t-1}^l$ of the l-th layer in the neural network model when training at the preceding training step, the gradient $g_t^l$, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula.
  • the second-order moment estimation calculation formula is: $v_t^l = \beta_2 v_{t-1}^l + (1-\beta_2)\,(g_t^l)^2$, where $v_t^l$ is the second second-order moment estimate, $v_{t-1}^l$ is the first second-order moment estimate, and $\beta_2$ is the second parameter.
  • the second-order moment estimation correction formula is: $\hat v_t^l = v_t^l / (1-\beta_2^t)$, where $\hat v_t^l$ is the corrected second-order moment estimate, $\beta_2^t$ is the second parameter $\beta_2$ raised to the power $t$, and $t$ is the current number of training steps.
  • in this embodiment, after $\beta_2$, $v_{t-1}^l$ and $g_t^l$ have been obtained, the second second-order moment estimate $v_t^l$ at the current number of training steps is first calculated from these values and the second-order moment estimation calculation formula; the corrected second-order moment estimate $\hat v_t^l$ is then calculated from $v_t^l$ and the second-order moment estimation correction formula, i.e., $\hat v_t^l$ is the second-order moment estimate obtained after correcting $v_t^l$.
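The mirror-image sketch for the second-order moment:

```python
def second_moment(v_prev, grad, beta2, t):
    v = beta2 * v_prev + (1.0 - beta2) * grad ** 2  # second second-order moment estimate
    v_hat = v / (1.0 - beta2 ** t)                  # bias-corrected (corrected) estimate
    return v, v_hat
```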
  • Step S23: the correction coefficient $r_t^l$ when training at the current number of training steps is calculated according to the corrected first-order moment estimate $\hat m_t^l$, the corrected second-order moment estimate $\hat v_t^l$, and a preset coefficient correction calculation formula.
  • the coefficient correction calculation formula is: $r_t^l = \hat m_t^l / (\sqrt{\hat v_t^l} + \varepsilon)$, where $r_t^l$ is the correction coefficient and $\varepsilon$ is a preset constant.
  • the constant is preferably a very small value, such as $10^{-10}$; adding this very small constant avoids division by zero.
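A sketch of this step (the placement of epsilon outside the square root follows the formula reconstructed above):

```python
import numpy as np

def correction_coefficient(m_hat, v_hat, eps=1e-10):
    # eps keeps the denominator strictly positive, avoiding division by zero
    return m_hat / (np.sqrt(v_hat) + eps)
```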
  • Step S24: the second corrected learning rate $\eta_{t+1}^l$ used when training at the training step following the current one is calculated according to the current weight $\theta_t^l$, the correction coefficient $r_t^l$, the preset third parameter $\beta_3$, the first corrected learning rate $\eta_t^l$ when training at the current number of training steps, and the preset learning rate correction calculation formula.
  • in the learning rate correction calculation formula, $\beta_3$ is the third parameter and $\eta_t^l$ is the first corrected learning rate when training at the current number of training steps.
  • in this way the learning rate is calculated adaptively for each layer of the neural network, so that the calculated learning rate is more accurate, which in turn improves the magnitude of the weight updates, the speed and accuracy of training, and the speed of convergence.
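The learning rate correction calculation formula itself is published only as an image; a LAMB-style trust-ratio form that is consistent with the operands listed above would be (an assumption, not the patent's exact equation):

```python
import numpy as np

def corrected_learning_rate(w, r, lr, beta3, eps=1e-10):
    # Hypothetical reconstruction: scale the previous per-layer learning
    # rate by the ratio of the weight norm to the update norm
    update = r + beta3 * w
    return lr * np.linalg.norm(w) / (np.linalg.norm(update) + eps)
```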
  • Step S25: the updated weight $\theta_{t+1}^l$ of the l-th layer is calculated according to the current weight $\theta_t^l$, the second corrected learning rate $\eta_{t+1}^l$, the correction coefficient $r_t^l$, the third parameter $\beta_3$, and the preset weight calculation formula.
  • in the weight calculation formula, $\theta_{t+1}^l$ is the updated weight of the l-th layer.
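The weight calculation formula is likewise published only as an image; a form consistent with its operands, with $\beta_3$ acting as a decoupled weight-decay coefficient, would be (again an assumption):

```python
def updated_weight(w, lr_next, r, beta3):
    # Hypothetical reconstruction: theta_{t+1} = theta_t - lr * (r + beta3 * theta_t)
    return w - lr_next * (r + beta3 * w)
```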
  • in the neural network model training method provided by this embodiment, a training sample set including a plurality of training sample data is obtained; the training sample data in the training sample set are input into a preset neural network model for iterative training until the neural network model converges; and, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through steps S20-S26: the gradient is calculated, the corrected first-order and second-order moment estimates are calculated, the correction coefficient is calculated from them and the preset coefficient correction calculation formula, the second corrected learning rate is calculated, and the updated weight of the l-th layer is obtained.
  • the learning rate is thus adaptively updated, so that the weights of the neural network model are correspondingly adaptively updated; no other hyperparameters need to be adjusted during the training of the neural network model, which reduces the difficulty of parameter tuning, saves time and cost, and improves training efficiency.
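Putting the sketches above together, an illustrative per-layer training loop (the model, loss, and data are placeholders, and steps S24-S25 use the assumed forms noted earlier):

```python
import numpy as np

def train_layer(w, batches, lr0=1e-3, beta1=0.9, beta2=0.999, beta3=0.01):
    """Iteratively train one layer's weight vector w on (x, y) batches."""
    m = np.zeros_like(w)  # initialize first-order moment estimate m_0 = 0
    v = np.zeros_like(w)  # initialize second-order moment estimate v_0 = 0
    lr = lr0
    for t, (x, y) in enumerate(batches, start=1):
        _, g = mse_loss_grad(w, x, y)                  # step S20
        m, m_hat = first_moment(m, g, beta1, t)        # step S21
        v, v_hat = second_moment(v, g, beta2, t)       # step S22
        r = correction_coefficient(m_hat, v_hat)       # step S23
        lr = corrected_learning_rate(w, r, lr, beta3)  # step S24 (assumed form)
        w = updated_weight(w, lr, r, beta3)            # step S25 (assumed form)
    return w
```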
  • FIG. 3 shows a schematic diagram of program modules of a neural network model training apparatus 300 in an embodiment of the present application.
  • the neural network model training apparatus 300 can be applied to computer equipment such as a mobile phone, a tablet personal computer, a laptop computer, a server, or other equipment with a data transmission function.
  • in this embodiment, the neural network model training apparatus 300 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the present application and realize the neural network model training method described above.
  • a program module referred to in the embodiments of the present application is a series of computer-readable instruction segments capable of completing a specific function, and is more suitable than the program itself for describing the execution process of the neural network model training method in the storage medium.
  • in this embodiment, the neural network model training apparatus 300 includes an acquisition module 301 and a training module 302. The following description introduces the functions of each program module of this embodiment in detail:
  • the obtaining module 301 is configured to obtain a training sample set, where the training sample set includes a plurality of training sample data.
  • the training sample set may be a text set, an image set, or a voice set, etc.
  • the training sample set is described by taking a text set as an example.
  • the text set contains a plurality of text data, and each text data carries a text label, and the text label is used to indicate the category to which the text belongs.
  • the training module 302 is configured to input the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the training module 302 updates the weight of the l-th layer in the neural network model through a first computing unit, a second computing unit, a third computing unit, a fourth computing unit, a fifth computing unit and a sixth computing unit.
  • here the l-th layer refers to each layer in the neural network model; that is, the weight of every layer of the neural network model can be updated through the first computing unit, the second computing unit, the third computing unit, the fourth computing unit, the fifth computing unit and the sixth computing unit.
  • convergence refers to the following: during neural network training, if the loss value keeps fluctuating back and forth or remains high and cannot enter the tolerance range, the network has not converged; if the loss value is minimal, i.e., the training results are closer to the real results and an optimal solution has been obtained, the network has converged.
  • in the prior art, when a neural network model is trained, gradient descent is used to update the current weight θ_t of the neural network model, and the update usually employs a global learning rate; the specific algorithm is: $\theta_{t+1} = \theta_t - \eta\, \nabla L(\theta_t)$, where $t$ is the current number of training steps, $\theta_t$ is the current weight at training step $t$, $\eta$ is the learning rate (a fixed value), $\nabla L(\theta_t)$ is the gradient of the loss function $L(\theta_t)$, and $\theta_{t+1}$ is the weight at training step $t+1$.
  • in the existing method, the weight parameters are iteratively optimized along the gradient descent direction to reduce the value of the loss function.
  • during neural network training, the learning rate is a hyperparameter that controls the magnitude of the weight updates, as well as the speed and accuracy of training. If the learning rate is too large, the objective (cost) function tends to fluctuate widely, making it difficult to find the optimum; if the learning rate is set too small, convergence is too slow and time-consuming.
  • in this embodiment, the training module 302 updates the weight of the l-th layer in the neural network model through the first computing unit, the second computing unit, the third computing unit, the fourth computing unit, the fifth computing unit and the sixth computing unit.
  • a number of training steps refers to the process of inputting one batch (Batch Size) of training sample data into the neural network model for one iteration (Iteration) of training; after each iteration of training of the neural network model, the weights θ_t of every layer in the model are updated, i.e., the model is optimized once.
  • one iteration in this embodiment refers to the process of inputting one batch of sample data into the neural network model and completing training on that batch of training sample data.
  • the first computing unit is configured to calculate the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and the preset gradient algorithm.
  • before the weight of the l-th layer is updated, the following may first be obtained: the preset first parameter $\beta_1$, second parameter $\beta_2$, third parameter $\beta_3$, the loss function $L(\theta)$, and the current number of training steps $t$; the training sample data $x_t$ sampled when training at the current number of training steps $t$ and the target values $y_t$ corresponding to the training sample data; the first first-order moment estimate $m_{t-1}^l$ and the first second-order moment estimate $v_{t-1}^l$ of the l-th layer in the neural network model when training at the training step $t-1$ preceding the current one; the first corrected learning rate $\eta_t^l$ when training at the current number of training steps; and the current weight $\theta_t^l$ of the l-th layer when training at the current number of training steps.
  • the first parameter $\beta_1$, the second parameter $\beta_2$ and the third parameter $\beta_3$ are preset values; preferably $0.9 \le \beta_1 < 1$, $0.99 \le \beta_2 < 1$, and $0 < \beta_3 \le 0.1$.
  • the first first-order moment estimate $m_{t-1}^l$ is the first-order moment estimate calculated during the previous iteration of training of the neural network model; for example, if the current training step $t$ is the fifth iteration, $m_{t-1}^l$ is the first-order moment estimate calculated after the fourth iteration.
  • the first second-order moment estimate $v_{t-1}^l$ is the second-order moment estimate calculated during the previous iteration; for example, if the current training step $t$ is the fifth iteration, $v_{t-1}^l$ is the second-order moment estimate calculated after the fourth iteration.
  • the first corrected learning rate $\eta_t^l$ is the learning rate calculated during the previous iteration; for example, if the current training step $t$ is the fifth iteration, $\eta_t^l$ is the learning rate calculated after the fourth iteration.
  • the training sample data $x_t$ refers to one batch (Batch Size) of training sample data sampled from the training sample set during the iteration with training step $t$; for example, if the batch size is 256, $x_t$ consists of 256 training sample data sampled from the training sample set.
  • the target values $y_t$ are the sample labels corresponding to the training sample data $x_t$; the number of target values is the same as the batch size.
  • the gradient algorithm is: $g_t^l = \nabla_\theta L(\theta_t^l;\, x_t,\, y_t)$, where $g_t^l$ is the gradient of the loss function at the current number of training steps, $\theta_t^l$ is the current weight, $x_t$ is the training sample data, $y_t$ is the target value, and $L$ is the loss function.
  • the second computing unit is configured to calculate the corrected first-order moment estimate $\hat m_t^l$ when training at the current number of training steps according to the preset first parameter $\beta_1$, the first first-order moment estimate $m_{t-1}^l$ of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient $g_t^l$, the preset first-order moment estimation calculation formula, and the preset first-order moment estimation correction formula.
  • the first-order moment estimation calculation formula is: $m_t^l = \beta_1 m_{t-1}^l + (1-\beta_1)\, g_t^l$, where $m_t^l$ is the second first-order moment estimate, $m_{t-1}^l$ is the first first-order moment estimate, and $\beta_1$ is the first parameter.
  • the first-order moment estimation correction formula is: $\hat m_t^l = m_t^l / (1-\beta_1^t)$, where $\hat m_t^l$ is the corrected first-order moment estimate, $\beta_1^t$ is the first parameter $\beta_1$ raised to the power $t$, and $t$ is the current number of training steps.
  • after $\beta_1$, $m_{t-1}^l$ and $g_t^l$ have been obtained, the second first-order moment estimate $m_t^l$ at the current number of training steps is first calculated from these values and the first-order moment estimation calculation formula; the corrected first-order moment estimate $\hat m_t^l$ is then calculated from $m_t^l$ and the first-order moment estimation correction formula, i.e., $\hat m_t^l$ is the first-order moment estimate obtained after correcting $m_t^l$.
  • the third computing unit is configured to calculate the corrected second-order moment estimate $\hat v_t^l$ when training at the current number of training steps according to the preset second parameter $\beta_2$, the first second-order moment estimate $v_{t-1}^l$ of the l-th layer in the neural network model when training at the preceding training step, the gradient $g_t^l$, the preset second-order moment estimation calculation formula, and the preset second-order moment estimation correction formula.
  • the second-order moment estimation calculation formula is: $v_t^l = \beta_2 v_{t-1}^l + (1-\beta_2)\,(g_t^l)^2$, where $v_t^l$ is the second second-order moment estimate, $v_{t-1}^l$ is the first second-order moment estimate, and $\beta_2$ is the second parameter.
  • the second-order moment estimation correction formula is: $\hat v_t^l = v_t^l / (1-\beta_2^t)$, where $\hat v_t^l$ is the corrected second-order moment estimate, $\beta_2^t$ is the second parameter $\beta_2$ raised to the power $t$, and $t$ is the current number of training steps.
  • after $\beta_2$, $v_{t-1}^l$ and $g_t^l$ have been obtained, the second second-order moment estimate $v_t^l$ at the current number of training steps is first calculated from these values and the second-order moment estimation calculation formula; the corrected second-order moment estimate $\hat v_t^l$ is then calculated from $v_t^l$ and the second-order moment estimation correction formula, i.e., $\hat v_t^l$ is the second-order moment estimate obtained after correcting $v_t^l$.
  • the fourth computing unit is configured to calculate the correction coefficient $r_t^l$ when training at the current number of training steps according to the corrected first-order moment estimate $\hat m_t^l$, the corrected second-order moment estimate $\hat v_t^l$, and the preset coefficient correction calculation formula.
  • the coefficient correction calculation formula is: $r_t^l = \hat m_t^l / (\sqrt{\hat v_t^l} + \varepsilon)$, where $r_t^l$ is the correction coefficient and $\varepsilon$ is a preset constant.
  • the constant is preferably a very small value, such as $10^{-10}$; adding this very small constant avoids division by zero.
  • the fifth computing unit is configured to calculate the second corrected learning rate $\eta_{t+1}^l$ used when training at the training step following the current one according to the current weight $\theta_t^l$, the correction coefficient $r_t^l$, the preset third parameter $\beta_3$, the first corrected learning rate $\eta_t^l$ when training at the current number of training steps, and the preset learning rate correction calculation formula.
  • in the learning rate correction calculation formula, $\beta_3$ is the third parameter and $\eta_t^l$ is the first corrected learning rate when training at the current number of training steps.
  • in this way the learning rate is calculated adaptively for each layer of the neural network, so that the calculated learning rate is more accurate, which in turn improves the magnitude of the weight updates, the speed and accuracy of training, and the speed of convergence.
  • the sixth computing unit is configured to calculate the updated weight $\theta_{t+1}^l$ of the l-th layer according to the current weight $\theta_t^l$, the second corrected learning rate $\eta_{t+1}^l$, the correction coefficient $r_t^l$, the third parameter $\beta_3$, and the preset weight calculation formula.
  • in the weight calculation formula, $\theta_{t+1}^l$ is the updated weight of the l-th layer.
  • in the neural network model training apparatus provided by this embodiment, a training sample set including a plurality of training sample data is obtained; the training sample data in the training sample set are input into a preset neural network model for iterative training until the neural network model converges; and, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the six computing units described above.
  • the learning rate is thus adaptively updated, so that the weights of the neural network model are correspondingly adaptively updated; no other hyperparameters need to be adjusted during the training of the neural network model, which reduces the difficulty of parameter tuning, saves time and cost, and improves training efficiency.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • the computer equipment 2 may be a rack-type server, a blade-type server, a tower-type server or a cabinet-type server (including an independent server, or a server cluster composed of multiple servers) and the like.
  • in this embodiment, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can communicate with each other through a system bus, wherein:
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, magnetic disks, optical disks, and the like.
  • the memory 21 may be an internal storage unit of the computer device 2 , such as a hard disk or a memory of the computer device 2 .
  • the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card (Flash Card) equipped on the computer device 2.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used for storing the operating system installed on the computer device 2 and various application software, such as the program code of the neural network model training apparatus 300.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 22 is typically used to control the overall operation of the computer device 2 .
  • the processor 22 is configured to run the program code or process the data stored in the memory 21, for example, to run the neural network model training apparatus 300, so as to implement the neural network model training method of the foregoing embodiments.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer equipment 2 and other electronic devices.
  • the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • the network may be a wireless or wired network such as an intranet (Intranet), the Internet (Internet), a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, or Wi-Fi.
  • FIG. 4 only shows the computer device 2 having components 21-23, but it should be understood that it is not required to implement all of the shown components and that more or less components may be implemented instead.
  • the neural network model training apparatus 300 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the neural network model training method of the present application.
  • This embodiment also provides a computer-readable storage medium, which may be non-volatile or volatile, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application mall, on which computer-readable instructions are stored; when the program is executed by a processor, a corresponding function is realized.
  • the computer-readable storage medium of this embodiment is used to store the neural network model training apparatus 300, so as to implement the following steps when executed by the processor:
  • obtaining a training sample set, the training sample set including a plurality of training sample data;
  • inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
  • calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
  • calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
  • calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
  • calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
  • calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula;
  • calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.


Abstract

A neural network model training method is provided. After a neural network model is trained for each number of training steps, a gradient is calculated according to the current weight at the current number of training steps, the sample data, the target values, and a gradient algorithm; a corrected first-order moment estimate is calculated according to a first parameter, the first-order moment estimate at the preceding step, the gradient, a first-order moment estimation formula, and a first-order moment estimation correction formula; a corrected second-order moment estimate is calculated according to a second parameter, the second-order moment estimate at the preceding step, the gradient, a second-order moment estimation formula, and a second-order moment estimation correction formula; a correction coefficient is calculated according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a coefficient correction formula; a corrected learning rate for the next step is calculated according to the current weight, the correction coefficient, a third parameter, the current learning rate, and a learning rate correction formula; and the updated weight of the model is calculated according to the current weight, the corrected learning rate, the correction coefficient, the third parameter, and a weight formula. The method can improve model training efficiency.

Description

Neural network model training method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application with application No. 202011225964.8, filed with the Chinese Patent Office on November 5, 2020 and entitled "Neural network model training method, apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of artificial intelligence, and in particular to a neural network model training method and apparatus, a computer device, and a storage medium.
Background
Current deep-learning text classification models feed word vectors into a trained neural network model to obtain the classification results of the word vectors. To make the classification results better match reality, the neural network model must be trained before text classification. With the development of neural networks, and especially of deep learning, the number of neurons may exceed tens of millions; at that scale, the efficiency of gradient descent far exceeds that of analytically inverting the normal-equation matrix, making gradient descent the principal method of neural network training. In modern deep learning, the training period of a neural network usually ranges from several hours to several days. How to improve the training efficiency of gradient descent, and how to keep gradient descent stable in complex, large-scale scenarios, has long been a research direction for deep learning researchers.
Some optimization algorithms have achieved practical results, for example gradient optimizers such as SGD, RMSProp, AdaDelta and Adam, which are applied in different fields. However, the inventor found that, as the amount of training data and the available computing resources increase, large-scale deep training sometimes fails to converge and the expected results cannot be obtained, making the training process increasingly difficult.
Summary
In view of this, the purpose of the embodiments of the present application is to provide a neural network model training method, apparatus, computer device and computer-readable storage medium, so as to solve the problems in the prior art that the training effect is poor and the training efficiency is low when training a neural network model.
To achieve the above purpose, an embodiment of the present application provides a neural network model training method, including:
obtaining a training sample set, the training sample set including a plurality of training sample data;
inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula; and
calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
Optionally, the gradient algorithm is specifically: $g_t^l = \nabla_\theta L(\theta_t^l;\, x_t,\, y_t)$, where $g_t^l$ is the gradient of the loss function at the current number of training steps, $\theta_t^l$ is the current weight, $x_t$ is the training sample data, $y_t$ is the target value, and $L$ is the loss function.
Optionally, calculating the corrected first-order moment estimate when training at the current number of training steps according to the first parameter, the first first-order moment estimate, the gradient, the preset first-order moment estimation calculation formula, and the preset first-order moment estimation correction formula includes:
calculating the second first-order moment estimate when training at the current number of training steps according to the first parameter, the first first-order moment estimate, the gradient and the preset first-order moment estimation calculation formula, wherein the first-order moment estimation calculation formula is specifically: $m_t^l = \beta_1 m_{t-1}^l + (1-\beta_1)\, g_t^l$, where $m_t^l$ is the second first-order moment estimate, $m_{t-1}^l$ is the first first-order moment estimate, and $\beta_1$ is the first parameter; and
calculating the corrected first-order moment estimate according to the second first-order moment estimate and the first-order moment estimation correction formula, wherein the first-order moment estimation correction formula is specifically: $\hat m_t^l = m_t^l / (1-\beta_1^t)$, where $\hat m_t^l$ is the corrected first-order moment estimate, $\beta_1^t$ is the first parameter $\beta_1$ raised to the power $t$, and $t$ is the current number of training steps.
Optionally, calculating the corrected second-order moment estimate when training at the current number of training steps according to the second parameter, the first second-order moment estimate, the gradient, the preset second-order moment estimation calculation formula, and the preset second-order moment estimation correction formula includes:
calculating the second second-order moment estimate when training at the current number of training steps according to the second parameter, the first second-order moment estimate, the gradient and the preset second-order moment estimation calculation formula, wherein the second-order moment estimation calculation formula is specifically: $v_t^l = \beta_2 v_{t-1}^l + (1-\beta_2)\,(g_t^l)^2$, where $v_t^l$ is the second second-order moment estimate, $v_{t-1}^l$ is the first second-order moment estimate, and $\beta_2$ is the second parameter; and
calculating the corrected second-order moment estimate when training at the current number of training steps according to the second second-order moment estimate and the second-order moment estimation correction formula, wherein the second-order moment estimation correction formula is specifically: $\hat v_t^l = v_t^l / (1-\beta_2^t)$, where $\hat v_t^l$ is the corrected second-order moment estimate, $\beta_2^t$ is the second parameter $\beta_2$ raised to the power $t$, and $t$ is the current number of training steps.
Optionally, the coefficient correction calculation formula is specifically: $r_t^l = \hat m_t^l / (\sqrt{\hat v_t^l} + \varepsilon)$, where $r_t^l$ is the correction coefficient and $\varepsilon$ is a preset constant.
Optionally, in the learning rate correction calculation formula, $\eta_{t+1}^l$ is the second corrected learning rate, $\beta_3$ is the third parameter, and $\eta_t^l$ is the first corrected learning rate when training at the current number of training steps.
Optionally, in the weight calculation formula, $\theta_{t+1}^l$ is the updated weight (the second weight) of the l-th layer.
To achieve the above purpose, an embodiment of the present application further provides a neural network model training apparatus, including:
an acquisition module, configured to acquire a training sample set, the training sample set including a plurality of training sample data; and
a training module, configured to input the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula; and
calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
To achieve the above purpose, an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
obtaining a training sample set, the training sample set including a plurality of training sample data;
inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula; and
calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
To achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so as to cause the at least one processor to perform the following steps:
obtaining a training sample set, the training sample set including a plurality of training sample data;
inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through the following steps:
calculating the gradient of the loss function at the current number of training steps according to the current weight of the neural network model when training at the current number of training steps, the training sample data sampled when training at the current number of training steps, the target values corresponding to the training sample data, and a preset gradient algorithm;
calculating the corrected first-order moment estimate when training at the current number of training steps according to a preset first parameter, the first first-order moment estimate of the l-th layer in the neural network model when training at the training step preceding the current one, the gradient, a preset first-order moment estimation calculation formula, and a preset first-order moment estimation correction formula;
calculating the corrected second-order moment estimate when training at the current number of training steps according to a preset second parameter, the first second-order moment estimate of the l-th layer in the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimation calculation formula, and a preset second-order moment estimation correction formula;
calculating the correction coefficient when training at the current number of training steps according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
calculating the second corrected learning rate for training at the training step following the current one according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate when training at the current number of training steps, and a preset learning rate correction calculation formula; and
calculating the updated weight of the l-th layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
With the neural network model training method and apparatus, computer device, and computer-readable storage medium provided by the embodiments of the present application, a training sample set including a plurality of training sample data is obtained, and the training sample data in the training sample set are input into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the learning rate is updated in an adaptive manner, so that the weights of the neural network model are correspondingly adaptively updated. As a result, no other hyperparameters need to be adjusted during the training of the neural network model, which reduces the difficulty of training parameter tuning, saves time and cost, and improves training efficiency.
Brief Description of the Drawings
FIG. 1 is a flowchart of Embodiment 1 of the neural network model training method of the present application.
FIG. 2 is a detailed flowchart of the steps of updating the weight of the l-th layer in the neural network model after the neural network model is trained for each number of training steps.
FIG. 3 is a schematic diagram of the program modules of Embodiment 2 of the neural network model training apparatus of the present application.
FIG. 4 is a schematic diagram of the hardware structure of Embodiment 3 of the computer device of the present application.
Detailed Description
The advantages of the present application are further described below with reference to the accompanying drawings and specific embodiments.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, the information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information and, similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
Referring to FIG. 1, a flowchart of the steps of the neural network model training method according to Embodiment 1 of the present application is shown. It can be understood that the flowchart in this method embodiment is not used to limit the order in which the steps are executed. The following takes the computer device 2 as the execution subject for exemplary description. The details are as follows.
Step S10: a training sample set is obtained, and the training sample set includes a plurality of training sample data.
Specifically, the training sample set may be a text set, an image set, a speech set, or the like; in this embodiment, the training sample set is described taking a text set as an example. The text set contains a plurality of text data, each text data carries a text label, and the text label is used to indicate the category to which the text belongs.
Step S11: the training sample data in the training sample set are input into a preset neural network model for iterative training until the neural network model converges, wherein, after the neural network model is trained for each number of training steps, the weight of the l-th layer in the neural network model is updated through steps S20-S26, where the l-th layer refers to each layer in the neural network model; that is, the weight of every layer in the neural network model can be updated through steps S20-S26.
Specifically, convergence refers to the following: during neural network training, if the loss value keeps fluctuating back and forth or remains high and cannot enter the tolerance range, the network has not converged; if the loss value is minimal, i.e., the training results are closer to the real results and an optimal solution has been obtained, the network has converged.
When training a neural network model in the prior art, gradient descent is used to update the current weight θ_t of the neural network model, and the update of the current weight θ_t usually employs a global learning rate. The specific algorithm is: $\theta_{t+1} = \theta_t - \eta\, \nabla L(\theta_t)$, where $t$ is the current number of training steps, $\theta_t$ is the current weight at training step $t$, $\eta$ is the learning rate (a fixed value), $\nabla L(\theta_t)$ is the gradient of the loss function $L(\theta_t)$, $\theta_{t+1}$ is the weight at training step $t+1$, and $\nabla$ denotes differentiation. In the existing method, the weight parameters are iteratively optimized along the gradient descent direction to reduce the value of the loss function.
需要说明的是,在进行神经网络训练过程中,学习率作为一个超参数控制了权重更新的幅度,以及训练的速度和精度。学习率太大,容易导致目标(代价)函数波动较大从而难以找到最优,而弱学习率设置太小,则会导致收敛过慢耗时太长。
In this embodiment, when the training sample data in the training sample set is input into the neural network model for iterative training, the weights of the model are not updated with a global learning rate; instead, after each training step, the weight of the lth layer of the neural network model is updated through steps S20-S26.
Here, a training step refers to the process of inputting one batch (batch size) of training sample data into the neural network model for one iteration of training. After each iteration of training the neural network model, the weights θ_t of the layers of the neural network model are updated, that is, the model is optimized once.
It should be noted that one iteration in this embodiment refers to the process of inputting one batch of sample data into the neural network model and completing training on that batch of training sample data.
Exemplarily, referring to FIG. 2, updating the weight of the lth layer of the neural network model after each training step of the neural network model includes:
Step S20: calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm.
In this embodiment, before the weight of the lth layer of the neural network model is updated, the following may first be acquired: the preset first parameter β_1, second parameter β_2, and third parameter β_3; the loss function L(θ); the current training step t; the training sample data x_i sampled when training at the current training step t; the target values y_i corresponding to the training sample data; the first first-order moment estimate m_{t-1}^l of the lth layer of the neural network model when training at the training step t-1 preceding the current training step; the first second-order moment estimate v_{t-1}^l of the lth layer of the neural network model when training at the preceding training step; the first corrected learning rate η_t^l for training at the current training step; and the current weight θ_t^l of the lth layer of the neural network model when training at the current training step.
The first parameter β_1, the second parameter β_2, and the third parameter β_3 are preset values. The specific value of the first parameter β_1 is preferably 0.9 ≤ β_1 < 1, that of the second parameter β_2 is preferably 0.99 ≤ β_2 < 1, and that of the third parameter β_3 is preferably 0 < β_3 ≤ 0.1.
The current training step t refers to the number of iterations of training that have currently been completed on the neural network model; that is, the specific value of the current training step t is determined by the number of iterations of training completed so far. For example, if 5 iterations of training have been completed, the current training step is t = 5.
The first first-order moment estimate m_{t-1}^l is the first-order moment estimate calculated during the previous iteration of training the neural network model. For example, if the current training step t corresponds to the 5th iteration of training the neural network model, the first first-order moment estimate m_{t-1}^l is the first-order moment estimate calculated after the 4th iteration of training.
The first second-order moment estimate v_{t-1}^l is the second-order moment estimate calculated during the previous iteration of training the neural network model. For example, if the current training step t corresponds to the 5th iteration of training, the first second-order moment estimate v_{t-1}^l is the second-order moment estimate calculated after the 4th iteration of training.
The first corrected learning rate η_t^l is the learning rate calculated during the previous iteration of training the neural network model. For example, if the current training step t corresponds to the 5th iteration of training, the first corrected learning rate η_t^l is the learning rate calculated after the 4th iteration of training.
The training sample data x_i refers to one batch (batch size) of training sample data sampled from the training sample set during the iteration of training at training step t. For example, if the batch size is 256, the training sample data x_i consists of 256 training samples sampled from the training sample set.
The target values y_i are the sample labels corresponding to the training sample data x_i; the number of target values equals the batch size.
In an exemplary embodiment, before starting the iterative training of the neural network model, the training step t, the first-order moment estimate m_0, the second-order moment estimate v_0, and the weights θ_0 of the neural network may be initialized. Specifically, the training step may be initialized as t = 0, the first-order moment estimate as m_0 = 0, the second-order moment estimate as v_0 = 0, and the weights of the neural network as θ_0 ∈ R^d, where R^d is the d-dimensional real vector space in which the weights live.
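As an illustration, this initialization can be sketched in Python (NumPy); the parameter dimension d and the initial per-layer learning rate are assumed values used only to make the running example below concrete:

```python
import numpy as np

d = 8                                    # assumed number of weights in layer l
t = 0                                    # training step t = 0
m = np.zeros(d)                          # first-order moment estimate m_0 = 0
v = np.zeros(d)                          # second-order moment estimate v_0 = 0
theta = np.random.default_rng(0).normal(scale=0.1, size=d)  # weights theta_0
eta = 1e-3                               # assumed initial per-layer learning rate
```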
In an exemplary embodiment, the gradient algorithm is:
g_t = ∇_θ L(θ_t^l; x_i, y_i)
where g_t is the gradient of the loss function at the current training step, θ_t^l is the current weight, x_i is the training sample data, y_i is the target value, and L(·) is the loss function.
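Continuing the sketch from the initialization above, step S20 can be illustrated with an assumed least-squares loss standing in for the model's actual loss function; x_i and y_i represent one sampled batch:

```python
rng = np.random.default_rng(1)
x_i = rng.normal(size=(256, d))          # one batch of training sample data
y_i = rng.normal(size=256)               # target values for the batch

def loss_grad(theta, x, y):
    # Gradient g_t of the toy loss L = 0.5 * mean((x @ theta - y)^2) w.r.t. theta
    return x.T @ (x @ theta - y) / len(y)

g = loss_grad(theta, x_i, y_i)           # gradient at the current training step
t += 1                                   # this batch completes training step t
```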
Step S21: calculating the corrected first-order moment estimate m̂_t^l for training at the current training step according to the preset first parameter β_1, the first first-order moment estimate m_{t-1}^l of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient g_t, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula.
Specifically, the first-order moment estimate calculation formula is:
m_t^l = β_1 · m_{t-1}^l + (1 - β_1) · g_t
where m_t^l is the second first-order moment estimate, m_{t-1}^l is the first first-order moment estimate, and β_1 is the first parameter.
The first-order moment estimate correction formula is:
m̂_t^l = m_t^l / (1 - β_1^t)
where m̂_t^l is the corrected first-order moment estimate, β_1^t is the first parameter β_1 raised to the power t, and t is the current training step.
In this embodiment, after the first parameter β_1, the first first-order moment estimate m_{t-1}^l, and the gradient g_t are acquired, the second first-order moment estimate m_t^l for training at the current training step can first be calculated from these acquired values and the first-order moment estimate calculation formula. Then, the corrected first-order moment estimate m̂_t^l for training at the current training step can be calculated from the calculated second first-order moment estimate m_t^l and the first-order moment estimate correction formula. The corrected first-order moment estimate m̂_t^l is the first-order moment estimate obtained by correcting the second first-order moment estimate m_t^l.
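In the running sketch, step S21 is the exponential moving average with bias correction described above:

```python
beta1 = 0.9                              # first parameter, 0.9 <= beta1 < 1
m = beta1 * m + (1 - beta1) * g          # second first-order moment estimate m_t
m_hat = m / (1 - beta1 ** t)             # corrected first-order moment estimate
```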
Step S22: calculating the corrected second-order moment estimate v̂_t^l for training at the current training step according to the preset second parameter β_2, the first second-order moment estimate v_{t-1}^l of the lth layer of the neural network model when training at the preceding training step, the gradient g_t, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula.
Specifically, the second-order moment estimate calculation formula is:
v_t^l = β_2 · v_{t-1}^l + (1 - β_2) · (g_t)^2
where v_t^l is the second second-order moment estimate, v_{t-1}^l is the first second-order moment estimate, and β_2 is the second parameter.
The second-order moment estimate correction formula is:
v̂_t^l = v_t^l / (1 - β_2^t)
where v̂_t^l is the corrected second-order moment estimate, β_2^t is the second parameter β_2 raised to the power t, and t is the current training step.
In this embodiment, after the second parameter β_2, the first second-order moment estimate v_{t-1}^l, and the gradient g_t are acquired, the second second-order moment estimate v_t^l for training at the current training step can first be calculated from these acquired values and the second-order moment estimate calculation formula. Then, the corrected second-order moment estimate v̂_t^l for training at the current training step can be calculated from the calculated second second-order moment estimate v_t^l and the second-order moment estimate correction formula. The corrected second-order moment estimate v̂_t^l is the second-order moment estimate obtained by correcting the second second-order moment estimate v_t^l.
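Step S22 mirrors step S21, applied to the squared gradient:

```python
beta2 = 0.999                            # second parameter, 0.99 <= beta2 < 1
v = beta2 * v + (1 - beta2) * g ** 2     # second second-order moment estimate v_t
v_hat = v / (1 - beta2 ** t)             # corrected second-order moment estimate
```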
Step S23: calculating the correction coefficient r_t^l for training at the current training step according to the corrected first-order moment estimate m̂_t^l, the corrected second-order moment estimate v̂_t^l, and a preset coefficient correction calculation formula.
Specifically, the coefficient correction calculation formula is:
r_t^l = m̂_t^l / (√(v̂_t^l) + ε)
where r_t^l is the correction coefficient and ε is a preset constant. The constant is preferably a very small value, for example 10^(-10).
In this embodiment, adding a very small constant avoids division by zero.
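Step S23 then reads, with the small constant guarding the division:

```python
eps = 1e-10                              # preset constant epsilon
r = m_hat / (np.sqrt(v_hat) + eps)       # correction coefficient r_t
```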
Step S24: calculating the second corrected learning rate η_{t+1}^l for training at the training step following the current training step according to the current weight θ_t^l, the correction coefficient r_t^l, the preset third parameter β_3, the first corrected learning rate η_t^l for training at the current training step, and a preset learning-rate correction calculation formula.
Specifically, the learning-rate correction calculation formula computes the second corrected learning rate η_{t+1}^l from the third parameter β_3, the current weight θ_t^l, the correction coefficient r_t^l, and the first corrected learning rate η_t^l for training at the current training step (the formula itself appears as an image in the original filing).
In this embodiment, the learning rate is calculated adaptively for each layer of the neural network, making the calculated learning rate more accurate. This better scales the magnitude of the weight updates and improves the speed and accuracy of training as well as the speed of convergence.
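The exact learning-rate correction formula is embedded as an image in the filing; purely as a stand-in, the sketch below assumes a LAMB-style layer-wise trust ratio smoothed by β_3. This is an assumption for illustration, not the patent's formula:

```python
beta3 = 0.01                             # third parameter, 0 < beta3 <= 0.1
trust = np.linalg.norm(theta) / (np.linalg.norm(r) + eps)  # layer trust ratio
eta = (1 - beta3) * eta + beta3 * trust  # assumed form of the corrected rate
```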
Step S25: calculating the updated weight θ_{t+1}^l of the lth layer according to the current weight θ_t^l, the second corrected learning rate η_{t+1}^l, the correction coefficient r_t^l, the third parameter β_3, and a preset weight calculation formula.
Specifically, the weight calculation formula computes θ_{t+1}^l, the updated weight of the lth layer, from the current weight θ_t^l, the second corrected learning rate η_{t+1}^l, the correction coefficient r_t^l, and the third parameter β_3 (the formula itself appears as an image in the original filing).
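The weight calculation formula likewise appears as an image in the filing; the sketch closes with an assumed update in which β_3 acts as a decoupled weight-decay coefficient, again an illustrative assumption rather than the patent's exact formula:

```python
theta = theta - eta * (r + beta3 * theta)  # assumed weight update for layer l
```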
In this embodiment, a training sample set including a plurality of training sample data is acquired; the training sample data in the training sample set is input into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps: calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm; calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula; calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula; calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula; calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula. By adaptively updating the learning rate in this manner during training, the weights of the neural network model are adaptively updated accordingly, so that no other hyperparameters need to be tuned while training the neural network model, which lowers the difficulty of hyperparameter tuning, saves time, and improves training efficiency.
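Gathering the pieces, one complete per-layer update can be written as a single function. The moment updates and bias corrections follow the formulas given above, while the learning-rate and weight-update lines repeat the assumed LAMB-style forms flagged earlier:

```python
import numpy as np

def update_layer(theta, m, v, eta, g, t,
                 beta1=0.9, beta2=0.999, beta3=0.01, eps=1e-10):
    """One adaptive per-layer update; returns the state for the next step."""
    m = beta1 * m + (1 - beta1) * g            # first-order moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2       # second-order moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    r = m_hat / (np.sqrt(v_hat) + eps)         # correction coefficient
    trust = np.linalg.norm(theta) / (np.linalg.norm(r) + eps)
    eta = (1 - beta3) * eta + beta3 * trust    # assumed learning-rate form
    theta = theta - eta * (r + beta3 * theta)  # assumed weight-update form
    return theta, m, v, eta
```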
Referring further to FIG. 3, a schematic diagram of the program modules of a neural network model training apparatus 300 in an embodiment of the present application is shown. The neural network model training apparatus 300 may be applied in a computer device, which may be a mobile phone, a tablet personal computer, a laptop computer, a server, or another device with a data transmission function. In this embodiment, the neural network model training apparatus 300 may include, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the present application and implement the above neural network model training method. A program module referred to in the embodiments of the present application is a series of computer-readable instruction segments capable of performing a specific function, and is better suited than the program itself to describing the execution process of the neural network model training method in the storage medium. In an exemplary embodiment, the neural network model training apparatus 300 includes an acquisition module 301 and a training module 302. The following description introduces the functions of the program modules of this embodiment in detail:
The acquisition module 301 is configured to acquire a training sample set, the training sample set including a plurality of training sample data.
Specifically, the training sample set may be a text set, an image set, a speech set, or the like. In this embodiment, the training sample set is described taking a text set as an example. The text set contains a plurality of text data, and each piece of text data carries a text label indicating the category to which the text belongs.
The training module 302 is configured to input the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the training module 302 updates the weight of the lth layer of the neural network model through a first calculation unit, a second calculation unit, a third calculation unit, a fourth calculation unit, a fifth calculation unit, and a sixth calculation unit. Here, the lth layer refers to each layer of the neural network model; that is, the weight of every layer of the neural network model can be updated through the first through sixth calculation units.
Specifically, convergence refers to the following: during neural network training, if the loss value keeps fluctuating or remains persistently high and cannot enter the tolerance range, the network has not converged; if the loss value is minimal, that is, the training result is closest to the true result and the optimal solution is obtained, the network has converged.
When training a neural network model, the prior art uses the gradient descent method to update the current weight θ_t of the neural network model, and the update is usually performed with a global learning rate. The specific algorithm is:
θ_{t+1} = θ_t - η · ∇_θ L(θ_t)
where t denotes the current training step, θ_t denotes the current weight at training step t, η denotes the learning rate, which is a fixed value, ∇_θ L(θ_t) denotes the gradient of the loss function L(θ_t), and θ_{t+1} denotes the weight at training step t+1. Existing methods iterate continually along the direction of gradient descent to optimize the weight parameters and reduce the value of the loss function.
It should be noted that during neural network training, the learning rate is a hyperparameter that controls the magnitude of weight updates as well as the speed and accuracy of training. If the learning rate is too large, the objective (cost) function tends to fluctuate sharply, making it difficult to find the optimum; if the learning rate is set too small, convergence is too slow and takes too long.
In this embodiment, when the training sample data in the training sample set is input into the neural network model for iterative training, the weights of the model are not updated with a global learning rate; instead, after each training step, the training module 302 updates the weight of the lth layer of the neural network model through the first, second, third, fourth, fifth, and sixth calculation units.
Here, a training step refers to the process of inputting one batch (batch size) of training sample data into the neural network model for one iteration of training. After each iteration of training the neural network model, the weights θ_t of the layers of the neural network model are updated, that is, the model is optimized once.
It should be noted that one iteration in this embodiment refers to the process of inputting one batch of sample data into the neural network model and completing training on that batch of training sample data.
The first calculation unit is configured to calculate the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm.
In this embodiment, before the weight of the lth layer of the neural network model is updated, the following may first be acquired: the preset first parameter β_1, second parameter β_2, and third parameter β_3; the loss function L(θ); the current training step t; the training sample data x_i sampled when training at the current training step t; the target values y_i corresponding to the training sample data; the first first-order moment estimate m_{t-1}^l of the lth layer of the neural network model when training at the training step t-1 preceding the current training step; the first second-order moment estimate v_{t-1}^l of the lth layer of the neural network model when training at the preceding training step; the first corrected learning rate η_t^l for training at the current training step; and the current weight θ_t^l of the lth layer of the neural network model when training at the current training step.
The first parameter β_1, the second parameter β_2, and the third parameter β_3 are preset values. The specific value of the first parameter β_1 is preferably 0.9 ≤ β_1 < 1, that of the second parameter β_2 is preferably 0.99 ≤ β_2 < 1, and that of the third parameter β_3 is preferably 0 < β_3 ≤ 0.1.
The current training step t refers to the number of iterations of training that have currently been completed on the neural network model; that is, the specific value of the current training step t is determined by the number of iterations of training completed so far. For example, if 5 iterations of training have been completed, the current training step is t = 5.
The first first-order moment estimate m_{t-1}^l is the first-order moment estimate calculated during the previous iteration of training the neural network model. For example, if the current training step t corresponds to the 5th iteration of training the neural network model, the first first-order moment estimate m_{t-1}^l is the first-order moment estimate calculated after the 4th iteration of training.
The first second-order moment estimate v_{t-1}^l is the second-order moment estimate calculated during the previous iteration of training the neural network model. For example, if the current training step t corresponds to the 5th iteration of training, the first second-order moment estimate v_{t-1}^l is the second-order moment estimate calculated after the 4th iteration of training.
The first corrected learning rate η_t^l is the learning rate calculated during the previous iteration of training the neural network model. For example, if the current training step t corresponds to the 5th iteration of training, the first corrected learning rate η_t^l is the learning rate calculated after the 4th iteration of training.
The training sample data x_i refers to one batch (batch size) of training sample data sampled from the training sample set during the iteration of training at training step t. For example, if the batch size is 256, the training sample data x_i consists of 256 training samples sampled from the training sample set.
The target values y_i are the sample labels corresponding to the training sample data x_i; the number of target values equals the batch size.
In an exemplary embodiment, before starting the iterative training of the neural network model, the training step t, the first-order moment estimate m_0, the second-order moment estimate v_0, and the weights θ_0 of the neural network may be initialized. Specifically, the training step may be initialized as t = 0, the first-order moment estimate as m_0 = 0, the second-order moment estimate as v_0 = 0, and the weights of the neural network as θ_0 ∈ R^d, where R^d is the d-dimensional real vector space in which the weights live.
In an exemplary embodiment, the gradient algorithm is:
g_t = ∇_θ L(θ_t^l; x_i, y_i)
where g_t is the gradient of the loss function at the current training step, θ_t^l is the current weight, x_i is the training sample data, y_i is the target value, and L(·) is the loss function.
The second calculation unit is configured to calculate the corrected first-order moment estimate m̂_t^l for training at the current training step according to the preset first parameter β_1, the first first-order moment estimate m_{t-1}^l of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient g_t, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula.
Specifically, the first-order moment estimate calculation formula is:
m_t^l = β_1 · m_{t-1}^l + (1 - β_1) · g_t
where m_t^l is the second first-order moment estimate, m_{t-1}^l is the first first-order moment estimate, and β_1 is the first parameter.
The first-order moment estimate correction formula is:
m̂_t^l = m_t^l / (1 - β_1^t)
where m̂_t^l is the corrected first-order moment estimate, β_1^t is the first parameter β_1 raised to the power t, and t is the current training step.
In this embodiment, after the first parameter β_1, the first first-order moment estimate m_{t-1}^l, and the gradient g_t are acquired, the second first-order moment estimate m_t^l for training at the current training step can first be calculated from these acquired values and the first-order moment estimate calculation formula. Then, the corrected first-order moment estimate m̂_t^l for training at the current training step can be calculated from the calculated second first-order moment estimate m_t^l and the first-order moment estimate correction formula. The corrected first-order moment estimate m̂_t^l is the first-order moment estimate obtained by correcting the second first-order moment estimate m_t^l.
The third calculation unit is configured to calculate the corrected second-order moment estimate v̂_t^l for training at the current training step according to the preset second parameter β_2, the first second-order moment estimate v_{t-1}^l of the lth layer of the neural network model when training at the preceding training step, the gradient g_t, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula.
Specifically, the second-order moment estimate calculation formula is:
v_t^l = β_2 · v_{t-1}^l + (1 - β_2) · (g_t)^2
where v_t^l is the second second-order moment estimate, v_{t-1}^l is the first second-order moment estimate, and β_2 is the second parameter.
The second-order moment estimate correction formula is:
v̂_t^l = v_t^l / (1 - β_2^t)
where v̂_t^l is the corrected second-order moment estimate, β_2^t is the second parameter β_2 raised to the power t, and t is the current training step.
In this embodiment, after the second parameter β_2, the first second-order moment estimate v_{t-1}^l, and the gradient g_t are acquired, the second second-order moment estimate v_t^l for training at the current training step can first be calculated from these acquired values and the second-order moment estimate calculation formula. Then, the corrected second-order moment estimate v̂_t^l for training at the current training step can be calculated from the calculated second second-order moment estimate v_t^l and the second-order moment estimate correction formula. The corrected second-order moment estimate v̂_t^l is the second-order moment estimate obtained by correcting the second second-order moment estimate v_t^l.
The fourth calculation unit is configured to calculate the correction coefficient r_t^l for training at the current training step according to the corrected first-order moment estimate m̂_t^l, the corrected second-order moment estimate v̂_t^l, and a preset coefficient correction calculation formula.
Specifically, the coefficient correction calculation formula is:
r_t^l = m̂_t^l / (√(v̂_t^l) + ε)
where r_t^l is the correction coefficient and ε is a preset constant. The constant is preferably a very small value, for example 10^(-10).
In this embodiment, adding a very small constant avoids division by zero.
The fifth calculation unit is configured to calculate the second corrected learning rate η_{t+1}^l for training at the training step following the current training step according to the current weight θ_t^l, the correction coefficient r_t^l, the preset third parameter β_3, the first corrected learning rate η_t^l for training at the current training step, and a preset learning-rate correction calculation formula.
Specifically, the learning-rate correction calculation formula computes the second corrected learning rate η_{t+1}^l from the third parameter β_3, the current weight θ_t^l, the correction coefficient r_t^l, and the first corrected learning rate η_t^l for training at the current training step (the formula itself appears as an image in the original filing).
In this embodiment, the learning rate is calculated adaptively for each layer of the neural network, making the calculated learning rate more accurate. This better scales the magnitude of the weight updates and improves the speed and accuracy of training as well as the speed of convergence.
The sixth calculation unit is configured to calculate the updated weight θ_{t+1}^l of the lth layer according to the current weight θ_t^l, the second corrected learning rate η_{t+1}^l, the correction coefficient r_t^l, the third parameter β_3, and a preset weight calculation formula.
Specifically, the weight calculation formula computes θ_{t+1}^l, the updated weight of the lth layer, from the current weight θ_t^l, the second corrected learning rate η_{t+1}^l, the correction coefficient r_t^l, and the third parameter β_3 (the formula itself appears as an image in the original filing).
In this embodiment, a training sample set including a plurality of training sample data is acquired; the training sample data in the training sample set is input into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps: calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm; calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula; calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula; calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula; calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula. By adaptively updating the learning rate in this manner during training, the weights of the neural network model are adaptively updated accordingly, so that no other hyperparameters need to be tuned while training the neural network model, which lowers the difficulty of hyperparameter tuning, saves time, and improves training efficiency.
Referring to FIG. 4, a schematic diagram of the hardware architecture of the computer device according to an embodiment of the present application is shown. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers), etc. As shown in FIG. 4, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, and a network interface 23, which can communicate with one another through a system bus. Here:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the neural network model training apparatus 300. In addition, the memory 21 may also be used to temporarily store various kinds of data that have been output or are to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example, to run the neural network model training apparatus 300 so as to implement the neural network model training method of the above embodiments.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 2 and other electronic apparatuses. For example, the network interface 23 is used to connect the computer device 2 with an external terminal through a network and to establish data transmission channels and communication connections between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be pointed out that FIG. 4 only shows the computer device 2 with components 21-23, but it should be understood that implementing all of the illustrated components is not required, and more or fewer components may be implemented instead.
In this embodiment, the neural network model training apparatus 300 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the neural network model training method of the present application.
This embodiment further provides a computer-readable storage medium, which may be non-volatile or volatile, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, and the like, on which computer-readable instructions are stored, the corresponding functions being implemented when the program is executed by a processor. The computer-readable storage medium of this embodiment is used to store the neural network model training apparatus 300, which, when executed by a processor, implements the following steps:
acquiring a training sample set, the training sample set including a plurality of training sample data;
inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps:
calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm;
calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula;
calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula;
calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and
calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A neural network model training method, comprising:
    acquiring a training sample set, the training sample set including a plurality of training sample data;
    inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps:
    calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm;
    calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula;
    calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula;
    calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
    calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and
    calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  2. The neural network model training method according to claim 1, wherein the gradient algorithm is:
    g_t = ∇_θ L(θ_t^l; x_i, y_i)
    where g_t is the gradient of the loss function at the current training step, θ_t^l is the current weight, x_i is the training sample data, y_i is the target value, and L(·) is the loss function.
  3. The neural network model training method according to claim 2, wherein said calculating the corrected first-order moment estimate for training at the current training step according to the preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, the preset first-order moment estimate calculation formula, and the preset first-order moment estimate correction formula comprises:
    calculating the second first-order moment estimate for training at the current training step according to the first parameter, the first first-order moment estimate, the gradient, and the preset first-order moment estimate calculation formula, wherein the first-order moment estimate calculation formula is:
    m_t^l = β_1 · m_{t-1}^l + (1 - β_1) · g_t
    where m_t^l is the second first-order moment estimate, m_{t-1}^l is the first first-order moment estimate, and β_1 is the first parameter;
    calculating the corrected first-order moment estimate according to the second first-order moment estimate and the first-order moment estimate correction formula, wherein the first-order moment estimate correction formula is:
    m̂_t^l = m_t^l / (1 - β_1^t)
    where m̂_t^l is the corrected first-order moment estimate, β_1^t is the first parameter β_1 raised to the power t, and t is the current training step.
  4. The neural network model training method according to claim 3, wherein said calculating the corrected second-order moment estimate for training at the current training step according to the preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, the preset second-order moment estimate calculation formula, and the preset second-order moment estimate correction formula comprises:
    calculating the second second-order moment estimate for training at the current training step according to the second parameter, the first second-order moment estimate, the gradient, and the preset second-order moment estimate calculation formula, wherein the second-order moment estimate calculation formula is:
    v_t^l = β_2 · v_{t-1}^l + (1 - β_2) · (g_t)^2
    where v_t^l is the second second-order moment estimate, v_{t-1}^l is the first second-order moment estimate, and β_2 is the second parameter;
    calculating the corrected second-order moment estimate for training at the current training step according to the second second-order moment estimate and the second-order moment estimate correction formula, wherein the second-order moment estimate correction formula is:
    v̂_t^l = v_t^l / (1 - β_2^t)
    where v̂_t^l is the corrected second-order moment estimate, β_2^t is the second parameter β_2 raised to the power t, and t is the current training step.
  5. The neural network model training method according to claim 4, wherein the coefficient correction calculation formula is:
    r_t^l = m̂_t^l / (√(v̂_t^l) + ε)
    where r_t^l is the correction coefficient and ε is a preset constant.
  6. The neural network model training method according to claim 5, wherein the learning-rate correction calculation formula computes the second corrected learning rate η_{t+1}^l from the third parameter β_3, the current weight θ_t^l, the correction coefficient r_t^l, and the first corrected learning rate η_t^l for training at the current training step (the formula itself appears as an image in the original filing).
  7. The neural network model training method according to claim 6, wherein the weight calculation formula computes θ_{t+1}^l, the updated weight, from the current weight θ_t^l, the second corrected learning rate η_{t+1}^l, the correction coefficient r_t^l, and the third parameter β_3 (the formula itself appears as an image in the original filing).
  8. A neural network model training apparatus, comprising:
    an acquisition module configured to acquire a training sample set, the training sample set including a plurality of training sample data;
    a training module configured to input the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps:
    calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm;
    calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula;
    calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula;
    calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
    calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and
    calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  9. A computer device, the computer device comprising a memory and a processor, the memory storing computer-readable instructions executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the following steps:
    acquiring a training sample set, the training sample set including a plurality of training sample data;
    inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps:
    calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm;
    calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula;
    calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula;
    calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
    calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and
    calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  10. The computer device according to claim 9, wherein the gradient algorithm is:
    g_t = ∇_θ L(θ_t^l; x_i, y_i)
    where g_t is the gradient of the loss function at the current training step, θ_t^l is the current weight, x_i is the training sample data, y_i is the target value, and L(·) is the loss function.
  11. The computer device according to claim 10, wherein said calculating the corrected first-order moment estimate for training at the current training step according to the preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, the preset first-order moment estimate calculation formula, and the preset first-order moment estimate correction formula comprises:
    calculating the second first-order moment estimate for training at the current training step according to the first parameter, the first first-order moment estimate, the gradient, and the preset first-order moment estimate calculation formula, wherein the first-order moment estimate calculation formula is:
    m_t^l = β_1 · m_{t-1}^l + (1 - β_1) · g_t
    where m_t^l is the second first-order moment estimate, m_{t-1}^l is the first first-order moment estimate, and β_1 is the first parameter;
    calculating the corrected first-order moment estimate according to the second first-order moment estimate and the first-order moment estimate correction formula, wherein the first-order moment estimate correction formula is:
    m̂_t^l = m_t^l / (1 - β_1^t)
    where m̂_t^l is the corrected first-order moment estimate, β_1^t is the first parameter β_1 raised to the power t, and t is the current training step.
  12. The computer device according to claim 11, wherein said calculating the corrected second-order moment estimate for training at the current training step according to the preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, the preset second-order moment estimate calculation formula, and the preset second-order moment estimate correction formula comprises:
    calculating the second second-order moment estimate for training at the current training step according to the second parameter, the first second-order moment estimate, the gradient, and the preset second-order moment estimate calculation formula, wherein the second-order moment estimate calculation formula is:
    v_t^l = β_2 · v_{t-1}^l + (1 - β_2) · (g_t)^2
    where v_t^l is the second second-order moment estimate, v_{t-1}^l is the first second-order moment estimate, and β_2 is the second parameter;
    calculating the corrected second-order moment estimate for training at the current training step according to the second second-order moment estimate and the second-order moment estimate correction formula, wherein the second-order moment estimate correction formula is:
    v̂_t^l = v_t^l / (1 - β_2^t)
    where v̂_t^l is the corrected second-order moment estimate, β_2^t is the second parameter β_2 raised to the power t, and t is the current training step.
  13. The computer device according to claim 12, wherein the coefficient correction calculation formula is:
    r_t^l = m̂_t^l / (√(v̂_t^l) + ε)
    where r_t^l is the correction coefficient and ε is a preset constant.
  14. The computer device according to claim 13, wherein the learning-rate correction calculation formula computes the second corrected learning rate η_{t+1}^l from the third parameter β_3, the current weight θ_t^l, the correction coefficient r_t^l, and the first corrected learning rate η_t^l for training at the current training step (the formula itself appears as an image in the original filing).
  15. A computer-readable storage medium storing computer-readable instructions, the computer-readable instructions being executable by at least one processor to cause the at least one processor to execute the following steps:
    acquiring a training sample set, the training sample set including a plurality of training sample data;
    inputting the training sample data in the training sample set into a preset neural network model for iterative training until the neural network model converges, wherein, after each training step of the neural network model, the weight of the lth layer of the neural network model is updated through the following steps:
    calculating the gradient of the loss function at the current training step according to the current weight of the neural network model when training at the current training step, the training sample data sampled when training at the current training step, the target value corresponding to the training sample data, and a preset gradient algorithm;
    calculating a corrected first-order moment estimate for training at the current training step according to a preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, a preset first-order moment estimate calculation formula, and a preset first-order moment estimate correction formula;
    calculating a corrected second-order moment estimate for training at the current training step according to a preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, a preset second-order moment estimate calculation formula, and a preset second-order moment estimate correction formula;
    calculating a correction coefficient for training at the current training step according to the corrected first-order moment estimate, the corrected second-order moment estimate, and a preset coefficient correction calculation formula;
    calculating a second corrected learning rate for training at the training step following the current training step according to the current weight, the correction coefficient, a preset third parameter, the first corrected learning rate for training at the current training step, and a preset learning-rate correction calculation formula; and
    calculating the updated weight of the lth layer according to the current weight, the second corrected learning rate, the correction coefficient, the third parameter, and a preset weight calculation formula.
  16. The computer-readable storage medium according to claim 15, wherein the gradient algorithm is:
    g_t = ∇_θ L(θ_t^l; x_i, y_i)
    where g_t is the gradient of the loss function at the current training step, θ_t^l is the current weight, x_i is the training sample data, y_i is the target value, and L(·) is the loss function.
  17. The computer-readable storage medium according to claim 16, wherein said calculating the corrected first-order moment estimate for training at the current training step according to the preset first parameter, the first first-order moment estimate of the lth layer of the neural network model when training at the training step preceding the current training step, the gradient, the preset first-order moment estimate calculation formula, and the preset first-order moment estimate correction formula comprises:
    calculating the second first-order moment estimate for training at the current training step according to the first parameter, the first first-order moment estimate, the gradient, and the preset first-order moment estimate calculation formula, wherein the first-order moment estimate calculation formula is:
    m_t^l = β_1 · m_{t-1}^l + (1 - β_1) · g_t
    where m_t^l is the second first-order moment estimate, m_{t-1}^l is the first first-order moment estimate, and β_1 is the first parameter;
    calculating the corrected first-order moment estimate according to the second first-order moment estimate and the first-order moment estimate correction formula, wherein the first-order moment estimate correction formula is:
    m̂_t^l = m_t^l / (1 - β_1^t)
    where m̂_t^l is the corrected first-order moment estimate, β_1^t is the first parameter β_1 raised to the power t, and t is the current training step.
  18. The computer-readable storage medium according to claim 17, wherein said calculating the corrected second-order moment estimate for training at the current training step according to the preset second parameter, the first second-order moment estimate of the lth layer of the neural network model when training at the preceding training step, the gradient, the preset second-order moment estimate calculation formula, and the preset second-order moment estimate correction formula comprises:
    calculating the second second-order moment estimate for training at the current training step according to the second parameter, the first second-order moment estimate, the gradient, and the preset second-order moment estimate calculation formula, wherein the second-order moment estimate calculation formula is:
    v_t^l = β_2 · v_{t-1}^l + (1 - β_2) · (g_t)^2
    where v_t^l is the second second-order moment estimate, v_{t-1}^l is the first second-order moment estimate, and β_2 is the second parameter;
    calculating the corrected second-order moment estimate for training at the current training step according to the second second-order moment estimate and the second-order moment estimate correction formula, wherein the second-order moment estimate correction formula is:
    v̂_t^l = v_t^l / (1 - β_2^t)
    where v̂_t^l is the corrected second-order moment estimate, β_2^t is the second parameter β_2 raised to the power t, and t is the current training step.
  19. The computer-readable storage medium according to claim 18, wherein the coefficient correction calculation formula is:
    r_t^l = m̂_t^l / (√(v̂_t^l) + ε)
    where r_t^l is the correction coefficient and ε is a preset constant.
  20. The computer-readable storage medium according to claim 19, wherein the learning-rate correction calculation formula computes the second corrected learning rate η_{t+1}^l from the third parameter β_3, the current weight θ_t^l, the correction coefficient r_t^l, and the first corrected learning rate η_t^l for training at the current training step (the formula itself appears as an image in the original filing).
PCT/CN2021/097319 2020-11-05 2021-05-31 神经网络模型训练方法、装置、计算机设备及存储介质 WO2022095432A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011225964.8A CN112183750A (zh) 2020-11-05 2020-11-05 神经网络模型训练方法、装置、计算机设备及存储介质
CN202011225964.8 2020-11-05

Publications (1)

Publication Number Publication Date
WO2022095432A1 true WO2022095432A1 (zh) 2022-05-12

Family

ID=73917852

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097319 WO2022095432A1 (zh) 2020-11-05 2021-05-31 神经网络模型训练方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN112183750A (zh)
WO (1) WO2022095432A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183750A (zh) 2020-11-05 2021-01-05 Ping An Technology (Shenzhen) Co., Ltd. Neural network model training method and apparatus, computer device, and storage medium
CN112766493B (zh) * 2021-01-19 2023-04-07 Beijing SenseTime Technology Development Co., Ltd. Training method and apparatus for multi-task neural network, electronic device, and storage medium
CN114841341B (zh) * 2022-04-25 2023-04-28 Beijing Baidu Netcom Science and Technology Co., Ltd. Image processing model training and image processing method, apparatus, device, and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909990A (zh) * 2017-03-01 2017-06-30 Tencent Technology (Shenzhen) Co., Ltd. Prediction method and apparatus based on historical data
CN107944386A (zh) * 2017-11-22 2018-04-20 Tianjin University Visual scene recognition method based on convolutional neural network
JP2020077392A (ja) * 2018-10-08 2020-05-21 Stradvision, Inc. Method and apparatus for learning a neural network with an adaptive learning rate, and testing method and apparatus using the same
CN110033081A (zh) * 2019-03-08 2019-07-19 Huawei Technologies Co., Ltd. Method and apparatus for determining a learning rate
CN110942142A (zh) * 2019-11-29 2020-03-31 Guangzhou Baiguoyuan Information Technology Co., Ltd. Neural network training and face detection method, apparatus, device, and storage medium
CN111738408A (zh) * 2020-05-14 2020-10-02 Ping An Technology (Shenzhen) Co., Ltd. Loss function optimization method, apparatus, device, and storage medium
CN112183750A (zh) * 2020-11-05 2021-01-05 Ping An Technology (Shenzhen) Co., Ltd. Neural network model training method and apparatus, computer device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114936323A (zh) * 2022-06-07 2022-08-23 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method and apparatus for graph representation model, and electronic device
CN114936323B (zh) * 2022-06-07 2023-06-30 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method and apparatus for graph representation model, and electronic device

Also Published As

Publication number Publication date
CN112183750A (zh) 2021-01-05

Similar Documents

Publication Publication Date Title
WO2022095432A1 (zh) Neural network model training method and apparatus, computer device, and storage medium
US10936949B2 (en) Training machine learning models using task selection policies to increase learning progress
CN111091199B (zh) Federated learning method and apparatus based on differential privacy, and storage medium
US9830526B1 (en) Generating image features based on robust feature-learning
US11694109B2 (en) Data processing apparatus for accessing shared memory in processing structured data for modifying a parameter vector data structure
WO2021174935A1 (zh) Training method and system for generative adversarial neural network
CN108733508B (zh) Method and system for controlling data backup
EP3504666A1 (en) Asychronous training of machine learning model
WO2021089013A1 (zh) Spatial graph convolutional network training method, electronic device, and storage medium
CN110135681B (zh) Risky user identification method and apparatus, readable storage medium, and terminal device
WO2021051556A1 (zh) Deep learning weight updating method and system, computer device, and storage medium
WO2022110640A1 (zh) Model optimization method and apparatus, computer device, and storage medium
CN117313789A (zh) Black-box optimization using neural networks
US20220164666A1 (en) Efficient mixed-precision search for quantizers in artificial neural networks
JP2022063250A (ja) SuperLoss: a generic loss for robust curriculum learning
CN115101061A (zh) Speech recognition model training method and apparatus, storage medium, and electronic device
CN114581868A (zh) Image analysis method and apparatus based on model channel pruning
KR101700030B1 (ko) Image object search method using prior information and apparatus for performing the same
US20110264609A1 (en) Probabilistic gradient boosted machines
US20220044109A1 (en) Quantization-aware training of quantized neural networks
CN114758130B (zh) Image processing and model training method, apparatus, device, and storage medium
CN110046670B (zh) Feature vector dimensionality reduction method and apparatus
US20220398437A1 (en) Depth-Parallel Training of Neural Networks
CN111062477B (zh) Data processing method and apparatus, and storage medium
CN110782017B (zh) Method and apparatus for adaptively adjusting learning rate

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21888135; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21888135; Country of ref document: EP; Kind code of ref document: A1)