CN111738408A - Method, device and equipment for optimizing loss function and storage medium


Info

Publication number
CN111738408A
Authority
CN
China
Prior art keywords
weight
slow
machine learning
optimizer
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010405723.5A
Other languages
Chinese (zh)
Inventor
郭跃超
谯轶轩
唐义君
王俊
高鹏
谢国彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010405723.5A priority Critical patent/CN111738408A/en
Priority to PCT/CN2020/118303 priority patent/WO2021139237A1/en
Publication of CN111738408A publication Critical patent/CN111738408A/en
Legal status: Pending

Classifications

    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 20/00 Machine learning
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of pedestal operation and maintenance, and discloses a method, an apparatus, a device and a storage medium for optimizing a loss function, which are used for solving the problem of low convergence accuracy of the loss function. The method for optimizing the loss function comprises the following steps: obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating the loss function to be converged in a machine learning model; training the machine learning task with a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer; training the machine learning task with a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer; merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target update weight; and calculating the target update weight of each iteration stage until the convergence of the loss function is completed.

Description

Method, device and equipment for optimizing loss function and storage medium
Technical Field
The present invention relates to the field of pedestal operation and maintenance, and in particular, to a method, an apparatus, a device, and a storage medium for optimizing a loss function.
Background
With the popularization of neural networks in computing, deep learning enables a neural network to learn how to capture data features; because the captured data features differ from the real data features, the loss function needs to be optimized in time. The optimizer is therefore an important tool for optimizing the loss function in a deep learning network. At present, the optimizer used in deep learning is usually stochastic gradient descent (SGD): when the loss function is optimized with SGD, the gradient is descended randomly on small batches of data, and the optimal loss function is obtained through continuous iteration and convergence.
However, in the later stage of optimizing the loss function, SGD is prone to fall into a local minimum, so that the loss function jitters abnormally during convergence and the optimal convergence cannot be achieved; as a result, the convergence accuracy and efficiency of the loss function are low.
Disclosure of Invention
The invention mainly aims to solve the problem of low accuracy of loss function convergence.
The first aspect of the present invention provides a method for optimizing a loss function, including: obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating the loss function to be converged in a machine learning model; training the machine learning task with a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer; training the machine learning task with a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer; merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target update weight; and calculating the target update weight of each iteration stage until the convergence of the loss function is completed.
Optionally, in a first implementation manner of the first aspect of the present invention, the training of the machine learning task with the first optimizer to obtain the first slow weight, where the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer, includes: randomly selecting one sample i_s, i_s ∈ {1, 2, …, n}, from the n training samples of the machine learning task with the first optimizer, where n is an integer greater than 1; using a first preset formula W_{t+1} = W_t - η_t·g_t to calculate the updated first short-time fast weight W_{t+1} of i_s, where in the first preset formula t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate and g_t is the gradient, with g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)), where J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample and Y(i_s) is the output sample; and performing an integrated calculation on the values of k first short-time fast weights to obtain the first slow weight, where the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer, and k ∈ {2, 3, …, n}.
Optionally, in a second implementation manner of the first aspect of the present invention, the performing an integrated calculation on the values of k first short-time fast weights to obtain the first slow weight, where the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer and k ∈ {2, 3, …, n}, includes: obtaining k consecutive values of the first short-time fast weight, where k ∈ {2, 3, …, n}; and calculating the first slow weight according to a second preset formula and the k consecutive values of the first short-time fast weight, where in the second preset formula t is the current time, the first slow weight at time t is computed from the starting point of the first short-time fast weights together with the weight parameter W_t at time t and the weight parameter W_{t+k} at time t+k, and the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer.
Optionally, in a third implementation manner of the first aspect of the present invention, the training of the machine learning task with the second optimizer to obtain the second slow weight, where the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer, includes: randomly selecting one sample i, i ∈ {1, 2, …, n}, from the n training samples of the machine learning task with the second optimizer, where n is an integer greater than 1, and calculating the updated second short-time fast weight W'_{t+1} by using a third preset formula W'_{t+1} = W'_t - η·m̂_t/(√v̂_t + ε), where t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, with m_t = β1·m_{t-1} + (1-β1)·g'_t, m̂_t = m_t/(1 - β1^t), v_t = β2·v_{t-1} + (1-β2)·(g'_t)² and v̂_t = v_t/(1 - β2^t), where m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient g'_t = ΔJ(W_t, i'), where J(W') is the cost function and ΔJ(W_t, i') is the gradient of the cost function of sample i with respect to the weight W at time t; and performing an integrated calculation on the values of k second short-time fast weights to obtain the second slow weight, where the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer, and k ∈ {2, 3, …, n}.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing an integrated calculation on the values of k second short-time fast weights to obtain the second slow weight, where the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer and k ∈ {2, 3, …, n}, includes: obtaining k consecutive values of the second short-time fast weight, where k ∈ {2, 3, …, n}; and calculating the second slow weight according to a fourth preset formula and the k consecutive values of the second short-time fast weight, where in the fourth preset formula t is the current time, the second slow weight at time t is computed from the starting point of the second short-time fast weights together with the weight parameter W'_t at time t and the weight parameter W_{t+k} at time t+k, and the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the merging the first slow weight and the second slow weight according to a preset merging formula to obtain the target update weight includes: extracting the first slow weight and the second slow weight at time t, where t ∈ {0, 1, 2, …, n}, and substituting the first slow weight and the second slow weight at time t into the preset merging formula to obtain the target update weight, where in the preset merging formula the target update weight at time t is obtained from the first slow weight at time t, the second slow weight at time t and a coefficient parameter α, and α is calculated from t, the current update time, and T, the number of iterations of the whole training.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the calculating the target update weight at each iteration stage until the convergence of the loss function is completed includes: acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage; and taking the target updating weight of the second iteration stage as a starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
A second aspect of the present invention provides an apparatus for optimizing a loss function, including: an obtaining module, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model; the first optimization module is used for training the machine learning task by using a first optimizer to obtain a first slow weight, and the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer; the second optimization module is used for training the machine learning task by using a second optimizer to obtain a second slow weight, and the second slow weight is used for indicating the machine learning task to adopt the second optimizer to perform iteration to obtain a result; the merging module is used for merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight; and the iteration module is used for calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
Optionally, in a first implementation manner of the second aspect of the present invention, the first optimization module includes: a first selection unit, configured to randomly select one sample i_s, i_s ∈ {1, 2, …, n}, from the n training samples of the machine learning task by using the first optimizer, where n is an integer greater than 1; a first calculation unit, configured to use a first preset formula W_{t+1} = W_t - η_t·g_t to calculate the updated first short-time fast weight W_{t+1} of i_s, where in the first preset formula t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate and g_t is the gradient, with g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)), where J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample and Y(i_s) is the output sample; and a first integration unit, configured to perform an integrated calculation on the values of k first short-time fast weights to obtain the first slow weight, where the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer, and k ∈ {2, 3, …, n}.
Optionally, in a second implementation manner of the second aspect of the present invention, the first integration unit is specifically configured to: obtain k consecutive values of the first short-time fast weight, where k ∈ {2, 3, …, n}; and calculate the first slow weight according to a second preset formula and the k consecutive values of the first short-time fast weight, where in the second preset formula t is the current time, the first slow weight at time t is computed from the starting point of the first short-time fast weights together with the weight parameter W_t at time t and the weight parameter W_{t+k} at time t+k, and the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer.
Optionally, in a third implementation manner of the second aspect of the present invention, the second optimization module includes: a second selection unit, configured to randomly select one sample i, i ∈ {1, 2, …, n}, from the n training samples of the machine learning task by using the second optimizer, where n is an integer greater than 1; a second calculation unit, configured to calculate the updated second short-time fast weight W'_{t+1} of i by using a third preset formula W'_{t+1} = W'_t - η·m̂_t/(√v̂_t + ε), where t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, with m_t = β1·m_{t-1} + (1-β1)·g'_t, m̂_t = m_t/(1 - β1^t), v_t = β2·v_{t-1} + (1-β2)·(g'_t)² and v̂_t = v_t/(1 - β2^t), where m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient g'_t = ΔJ(W_t, i'), where J(W') is the cost function and ΔJ(W_t, i') is the gradient of the cost function of sample i with respect to the weight W at time t; and a second integration unit, configured to perform an integrated calculation on the values of k second short-time fast weights to obtain the second slow weight, where the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer, and k ∈ {2, 3, …, n}.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the second integration unit is specifically configured to: obtain k consecutive values of the second short-time fast weight, where k ∈ {2, 3, …, n}; and calculate the second slow weight according to a fourth preset formula and the k consecutive values of the second short-time fast weight, where in the fourth preset formula t is the current time, the second slow weight at time t is computed from the starting point of the second short-time fast weights together with the weight parameter W'_t at time t and the weight parameter W_{t+k} at time t+k, and the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the merging module is specifically configured to: extract the first slow weight and the second slow weight at time t, where t ∈ {0, 1, 2, …, n}, and substitute the first slow weight and the second slow weight at time t into the preset merging formula to obtain the target update weight, where in the preset merging formula the target update weight at time t is obtained from the first slow weight at time t, the second slow weight at time t and a coefficient parameter α, and α is calculated from t, the current update time, and T, the number of iterations of the whole training.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the iteration module is specifically configured to: acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage; and taking the target updating weight of the second iteration stage as a starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
A third aspect of the present invention provides a device for optimizing a loss function, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor calls the instructions in the memory to cause the optimization device of the loss function to perform the optimization method of the loss function described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the above-described method of optimization of a loss function.
According to the technical scheme, a machine learning task to be optimized is obtained, and the machine learning task is used for indicating a loss function in a convergence machine learning model; training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer; training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer; merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight; calculating the target update weight for each iteration stage until the loss function convergence is completed. In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a method for optimizing a loss function according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating another embodiment of the method for optimizing the loss function according to the embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of an apparatus for optimizing a loss function according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of the loss function optimization apparatus according to the embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of an optimization apparatus for a loss function in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for optimizing a loss function, wherein a first slow weight calculated by a first optimizer and a second slow weight calculated by a second optimizer are integrated and calculated to obtain a target update weight, so that the calculation time of the calculated weights and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of the method for optimizing a loss function in the embodiment of the present invention includes:
101. obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a convergence machine learning model;
the server obtains a machine learning task to be optimized for indicating a loss function in the converged machine learning model.
It is to be understood that the execution subject of the present invention may be an optimization apparatus of the loss function, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
It should be noted that, in the process of deep learning, each machine learning model has a loss function, and the purpose of deep learning is to minimize that loss function. However, not all machine learning models can find the minimum value of the loss function quickly and accurately, and some loss functions have no minimum value at all, so a general machine learning model uses a convex function as the loss function, which guarantees that a minimum value exists. The most common method for finding the minimum value of a convex function in deep learning is the gradient descent method, and this is the main role of the optimizer in deep learning. The machine learning task here refers to the loss function that needs to be optimized in deep learning; the loss function plays a very important role in deep learning, and many deep learning loss functions are constructed on sample pairs or sample triplets, so the magnitude of the sample space is very large.
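As an illustration of how gradient descent finds the minimum of a convex loss function, here is a minimal sketch in Python; the loss J(w) = (w - 3)^2 and every constant in it are examples chosen for this sketch, not values from the patent:
```python
# Plain gradient descent on the convex loss J(w) = (w - 3)^2, whose minimum is at w = 3.
def grad_J(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0    # initial weight
lr = 0.1   # learning rate
for step in range(100):
    w = w - lr * grad_J(w)  # step against the gradient

print(round(w, 4))  # approaches 3.0, the minimizer of the convex loss
```
Because the loss is convex, the iterates approach the unique minimum regardless of the starting point, which is exactly the property the passage above relies on.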
102. Training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
the server trains the machine learning task by using the first optimizer to obtain a first slow weight used for indicating a result obtained after the machine learning task is iterated by using the first optimizer.
The first optimizer here uses stochastic gradient descent (SGD). SGD has low requirements on the gradient, so the gradient can be calculated quickly, and SGD can still converge in the presence of noise within a certain range. For example, compared with the standard gradient descent method, which traverses all samples, SGD updates the parameters once for each input sample, so the time required for a single parameter update is greatly shortened.
It should be noted that, when SGD performs a loss update for each sample, setting the learning rate in the algorithm is critical: if the learning rate set by the server is small, the convergence speed is slow, and if the learning rate set by the server is large, the loss function jitters abnormally during convergence and the optimal convergence cannot be achieved, so a large amount of training is required to obtain an appropriate learning rate. In addition, in the later stage of optimizing the loss function, SGD is prone to fall into a local minimum and cannot effectively carry out later-stage optimization. Therefore, in this application SGD is adopted as the first optimizer to train large-scale data sets rapidly, which improves the efficiency of optimizing the loss function, and it is combined with the second optimizer to achieve both improved efficiency and improved accuracy.
103. Training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
and the server trains the machine learning task by using the second optimizer to obtain a second slow weight used for indicating a result obtained after the machine learning task is iterated by using the second optimizer.
The second optimizer adopts the adaptive moment estimation algorithm (Adam). Adam belongs to the adaptive learning rate optimization algorithms, which optimize the learning rate in the machine learning model. A traditional optimization algorithm generally sets the learning rate as a constant or adjusts it according to the number of training rounds; this setting ignores other possible changes of the learning rate and thus causes deviations in the loss optimization. The Adam optimizer can dynamically adjust the learning rate of each parameter by using the first moment estimate and the second moment estimate of the gradient, so as to optimize the loss function accurately. The advantage of Adam is that, after offset correction, the learning rate of each iteration lies within a determined range, so the adjusted parameters are relatively stable and the loss function is optimized more accurately.
Adam iterates as follows: in the machine learning task, the server randomly selects one sample i from the n training samples with the second optimizer, where n is an integer greater than 1, and performs a loss update on sample i. The server uses g'_t = ΔJ(W_t, i') to calculate g'_t, the gradient of the cost function with respect to W' after t iterations, where J(W') is the cost function and ΔJ(W_t, i') is the gradient of the cost function with respect to W for the given sample i at time t. The server then calculates the first-order momentum term m_t and the second-order momentum term v_t, where m_t = β1·m_{t-1} + (1-β1)·g'_t with correction value m̂_t = m_t/(1 - β1^t), β1 being the first-order momentum decay coefficient, generally 0.9, and v_t = β2·v_{t-1} + (1-β2)·(g'_t)² with correction value v̂_t = v_t/(1 - β2^t), β2 being the second-order momentum decay coefficient, generally 0.999. After the server has calculated the correction values of m_t and v_t, it calculates the second short-time fast weight according to the third preset formula W'_{t+1} = W'_t - η·m̂_t/(√v̂_t + ε), where η is the initial learning rate, generally 0.01, and ε is a numerical stability quantity, generally 10^-8, which ensures that the denominator of the fraction is not zero. This completes one loss optimization.
104. Merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and the server combines the first slow weight and the second slow weight according to a preset combination formula to obtain a target update weight.
It should be noted that the first slow weight calculated by the server can be understood as the result of the first optimizer optimizing the loss function, and the second slow weight calculated by the server can be understood as the result of the second optimizer optimizing the same loss function. The results of the first optimizer and the second optimizer both contain errors, so the server combines the optimized first slow weight and second slow weight, and the target update weight obtained through the preset merging formula conforms more closely to the true value. In the preset merging formula used by the server, the target update weight is obtained from the first slow weight, the second slow weight and a coefficient parameter α, where, without loss of generality, α conforms to a certain probability distribution, t is the current update time and T is the number of iterations of the whole training. Since the first slow weight and the second slow weight can be obtained through the calculations of steps 102-103, the server can combine the first slow weight with the second slow weight to obtain the target update weight.
105. And calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
The server calculates the target update weights for each iteration stage until the loss function convergence is complete.
It can be understood that the server calculates the target update weight using the results of the k iteration steps and connects the target update weights calculated at the different stages to obtain the result after loss optimization, until the convergence of the loss function is completed. This iterative updating method effectively overcomes the jitter of the error function during gradient updates at the end of training and ensures both the convergence speed and the convergence accuracy.
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
Referring to fig. 2, another embodiment of the method for optimizing a loss function according to the embodiment of the present invention includes:
201. obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a convergence machine learning model;
the server obtains a machine learning task to be optimized for indicating a loss function in the converged machine learning model.
It should be noted that the machine learning task here refers to the loss function that needs to be optimized in deep learning. The loss function plays a very important role in deep metric learning, and many loss functions in deep metric learning are constructed on sample pairs or sample triplets, so the magnitude of the sample space is very large. In the later-stage training of the learning model, the gradient values of the sample pairs and sample triplets are almost 0; if no targeted optimization is performed, the convergence rate of the learning algorithm is very slow and it easily falls into a local optimum, which reduces the accuracy of the loss function.
In the method, the machine learning task is solved by using a gradient descent method, which is the most common algorithm for optimizing a loss function in an optimizer. Generally, the stochastic gradient descent method (SGD) is used: the gradient is descended randomly on small batches of data, and the optimal loss function is obtained through continuous iteration and convergence. Although each SGD update has a certain deviation in direction and each computation batch is small, the more updates are performed, the better the convergence effect of the loss function. Variant optimizers derived from SGD fall into two broad categories, and the optimization method of this application combines an SGD optimizer with an adaptive learning rate mechanism optimizer to obtain the optimal loss function.
202. Training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
the server trains the machine learning task by using the first optimizer to obtain a first slow weight used for indicating a result obtained after the machine learning task is iterated by using the first optimizer. Specifically, the method comprises the following steps:
The server randomly selects one sample i_s, i_s ∈ {1, 2, …, n}, from the n training samples of the machine learning task with the first optimizer, where n is an integer greater than 1, and uses the first preset formula W_{t+1} = W_t - η_t·g_t to calculate the updated first short-time fast weight W_{t+1} of i_s. In the first preset formula, t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate and g_t is the gradient, with g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)), where J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample and Y(i_s) is the output sample. The server then performs an integrated calculation on the values of k first short-time fast weights to obtain the first slow weight, which indicates the result obtained after the machine learning task is iterated with the first optimizer, k ∈ {2, 3, …, n}.
The first optimizer here is the SGD optimizer. SGD has low requirements on the gradient, so the gradient can be calculated quickly, and SGD can tolerate noise within a certain range. Compared with the standard gradient descent method, which traverses all samples, SGD updates the parameters once for each input sample, so the time required per update is greatly shortened.
For example, in the machine learning task the server randomly selects one sample i_s from the n training samples, where n is an integer greater than 1, and performs a loss update for sample i_s: the server uses g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)) to calculate g_t, the current gradient of SGD, where J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample and Y(i_s) is the output sample; the server then uses the first preset formula W_{t+1} = W_t - η_t·g_t together with the calculated g_t to compute the first short-time fast weight, where t is the current time, W_t is the weight parameter of the first optimizer at time t and η_t is the learning rate. This completes one loss optimization.
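As an illustration, the following minimal sketch implements this SGD fast-weight step for a linear model with a squared-error cost; the data, names and hyperparameters are illustrative and not taken from the patent:
```python
import numpy as np

def sgd_fast_step(W, X, Y, eta_t, rng):
    # One SGD fast-weight update W_{t+1} = W_t - eta_t * g_t on a randomly chosen sample i_s.
    i_s = rng.integers(len(X))            # randomly select one sample i_s
    pred = X[i_s] @ W                     # model output for the input sample X(i_s)
    g_t = 2.0 * (pred - Y[i_s]) * X[i_s]  # gradient of the squared-error cost for i_s
    return W - eta_t * g_t                # first short-time fast weight W_{t+1}

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))                         # toy input samples
Y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])         # toy output samples
W = np.zeros(5)
for t in range(200):
    W = sgd_fast_step(W, X, Y, eta_t=0.05, rng=rng)  # k such steps feed the slow weight
```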
The server performs an integrated calculation on the values of the k first short-time fast weights to obtain the first slow weight, where the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer and k ∈ {2, 3, …, n}. Specifically: the server obtains k consecutive values of the first short-time fast weight, where k ∈ {2, 3, …, n}, and calculates the first slow weight according to the second preset formula and the k consecutive values of the first short-time fast weight. In the second preset formula, t is the current time, and the first slow weight at time t is computed from the starting point of the first short-time fast weights together with the weight parameter W_t at time t and the weight parameter W_{t+k} at time t+k; the first slow weight is used for indicating the result obtained after the machine learning task is iterated with the first optimizer.
It is further explained that, after the server performs the training iterations, it obtains the values of the first short-time fast weights updated at different points. After the server has calculated the first short-time fast weight at each moment, the values of k consecutive first short-time fast weights need to be calculated in an integrated way: the k first short-time fast weights are used to calculate the first slow weight, and using several values for the calculation ensures the accuracy of the first slow weight, so that its value fits the whole sequence of first short-time fast weights more closely.
In general, the value of k is 4, so that the calculated value is more suitable for optimization of data, but the value of k may be modified according to actual conditions, and the value of k is not limited in this application.
For example, when k is 4, the server calculates the first slow weight as follows: the server knows the starting point of the first short-time fast weights, and the consecutive first short-time fast weights are W_1, W_2, W_3 and W_4; the server then calculates the first slow weight from these values according to the second preset formula, which yields a first slow weight that fits the first short-time fast weights more closely.
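To make the integration step concrete, here is a small sketch; because the second preset formula is reproduced only as an image in the original publication, the sketch assumes a simple arithmetic mean of the k fast weights as one plausible reading, and should be adjusted if the original formula differs:
```python
import numpy as np

def integrate_slow_weight(fast_weights):
    # fast_weights: list of k consecutive fast-weight vectors, e.g. W_1..W_4 for k = 4.
    # Assumption: the "integrated calculation" is taken here to be their arithmetic mean.
    return np.mean(np.stack(fast_weights), axis=0)

fast = [np.array([1.0, 2.0]), np.array([1.2, 1.8]),
        np.array([0.9, 2.1]), np.array([1.1, 1.9])]  # k = 4 consecutive fast weights
slow = integrate_slow_weight(fast)                    # first slow weight, here [1.05, 1.95]
```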
203. Training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
and the server trains the machine learning task by using the second optimizer to obtain a second slow weight used for indicating a result obtained after the machine learning task is iterated by using the second optimizer. Specifically, the method comprises the following steps:
The server first randomly selects one sample i, i ∈ {1, 2, …, n}, from the n training samples of the machine learning task with the second optimizer, where n is an integer greater than 1, and then uses the third preset formula to calculate the updated second short-time fast weight W'_{t+1}: W'_{t+1} = W'_t - η·m̂_t/(√v̂_t + ε), where t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, with m_t = β1·m_{t-1} + (1-β1)·g'_t, m̂_t = m_t/(1 - β1^t), v_t = β2·v_{t-1} + (1-β2)·(g'_t)² and v̂_t = v_t/(1 - β2^t), where m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient g'_t = ΔJ(W_t, i'), where J(W') is the cost function and ΔJ(W_t, i') is the gradient of the cost function of sample i with respect to the weight W at time t. Finally, the server performs an integrated calculation on the values of k second short-time fast weights to obtain the second slow weight, which indicates the result obtained after the machine learning task is iterated with the second optimizer, k ∈ {2, 3, …, n}.
The second optimizer adopts the adaptive moment estimation algorithm (Adam). Adam belongs to the adaptive learning rate optimization algorithms, which optimize the learning rate in the machine learning model. The Adam optimizer can dynamically adjust the learning rate of each parameter by using the first moment estimate and the second moment estimate of the gradient, so as to optimize the loss function accurately. The advantage of Adam is that, after offset correction, the learning rate of each iteration lies within a determined range, so the adjusted parameters are relatively stable and the loss function is optimized more accurately. Adam iterates as follows: in the machine learning task, the server randomly selects one sample i from the n training samples with the second optimizer, where n is an integer greater than 1, and performs a loss update on sample i. The server uses g'_t = ΔJ(W_t, i') to calculate g'_t, the gradient of the cost function with respect to W' after t iterations, where J(W') is the cost function and ΔJ(W_t, i') is the gradient of the cost function with respect to W for the given sample i at time t. The server then calculates the first-order momentum term m_t = β1·m_{t-1} + (1-β1)·g'_t and its correction value m̂_t = m_t/(1 - β1^t), where β1 is the first-order momentum decay coefficient, generally 0.9, and the second-order momentum term v_t = β2·v_{t-1} + (1-β2)·(g'_t)² and its correction value v̂_t = v_t/(1 - β2^t), where β2 is the second-order momentum decay coefficient, generally 0.999. After calculating the correction values of m_t and v_t, the server calculates the second short-time fast weight according to the third preset formula W'_{t+1} = W'_t - η·m̂_t/(√v̂_t + ε), where η is the initial learning rate, generally 0.01, and ε is a numerical stability quantity, generally 10^-8, which ensures that the denominator of the fraction is not zero. This completes one loss optimization.
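The following sketch performs one Adam step for the second optimizer along the lines described above (bias-corrected first- and second-order momentum terms, with the quoted defaults β1 = 0.9, β2 = 0.999, η = 0.01, ε = 10^-8); the toy gradient used to drive it is illustrative:
```python
import numpy as np

def adam_fast_step(W, grad, m, v, t, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # first-order momentum term m_t
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-order momentum term v_t
    m_hat = m / (1 - beta1 ** t)                  # correction value of m_t
    v_hat = v / (1 - beta2 ** t)                  # correction value of v_t
    W = W - eta * m_hat / (np.sqrt(v_hat) + eps)  # second short-time fast weight W'_{t+1}
    return W, m, v

W = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
target = np.array([0.5, -1.0, 2.0])            # toy optimum
for t in range(1, 501):
    grad = 2.0 * (W - target)                  # toy gradient of ||W - target||^2
    W, m, v = adam_fast_step(W, grad, m, v, t) # W converges towards the toy optimum
```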
The server performs an integrated calculation on the values of the k second short-time fast weights to obtain the second slow weight, where the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer and k ∈ {2, 3, …, n}. Specifically: the server obtains k consecutive values of the second short-time fast weight, where k ∈ {2, 3, …, n}, and then calculates the second slow weight according to the fourth preset formula and the k consecutive values of the second short-time fast weight. In the fourth preset formula, t is the current time, and the second slow weight at time t is computed from the starting point of the second short-time fast weights together with the weight parameter W'_t at time t and the weight parameter W_{t+k} at time t+k; the second slow weight is used for indicating the result obtained after the machine learning task is iterated with the second optimizer.
It should be noted that, after the server performs the training iterations, it obtains the values of the second short-time fast weights updated at different points. After the server has calculated the second short-time fast weight at each moment, the values of k consecutive second short-time fast weights need to be calculated: the k second short-time fast weights are used to calculate the second slow weight, and using several values for the calculation ensures the accuracy of the second slow weight, so that its value fits the whole sequence of second short-time fast weights more closely.
In general, the value of k is 4, so that the calculated value is more suitable for optimization of data, but the value of k may be modified according to actual conditions, and the value of k is not limited in this application.
For example, when k is 4, the server calculates the second slow weight as follows: the server knows the starting point of the second short-time fast weights, and the consecutive second short-time fast weights are W_1, W_2, W_3 and W_4; the server then calculates the second slow weight from these values according to the fourth preset formula, which yields a second slow weight that fits the second short-time fast weights more closely.
204. Merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and the server combines the first slow weight and the second slow weight according to a preset combination formula to obtain a target update weight. Specifically, the method comprises the following steps:
The server extracts the first slow weight and the second slow weight at time t, where t ∈ {0, 1, 2, …, n}, and then substitutes the first slow weight and the second slow weight at time t into the preset merging formula to obtain the target update weight. In the preset merging formula, the target update weight at time t is obtained from the first slow weight at time t, the second slow weight at time t and a coefficient parameter α, where α is calculated from t, the current update time, and T, the number of iterations of the whole training.
It should be noted that the first slow weight calculated by the server can be understood as the result of the first optimizer optimizing the loss function, and the second slow weight calculated by the server can be understood as the result of the second optimizer optimizing the same loss function. The results of the first optimizer and the second optimizer both contain errors, so the server combines the optimized first slow weight and second slow weight, and the target update weight obtained through the preset merging formula conforms more closely to the true value. In the preset merging formula used by the server, α conforms, without loss of generality, to a certain probability distribution, t is the current update time and T is the number of iterations of the whole training. Since the first slow weight and the second slow weight can be obtained through the calculations of step 202 and step 203, the server can combine the first slow weight and the second slow weight to obtain the target update weight.
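As a sketch of this merging step: the preset merging formula and the exact schedule of α are given only as images in the original publication, so the convex combination and the linearly decaying α below are assumptions made purely for illustration:
```python
import numpy as np

def merge_slow_weights(slow_sgd, slow_adam, t, T):
    # Assumed merging rule: a convex combination of the two slow weights with a
    # coefficient alpha that decays linearly over the T training iterations.
    alpha = 1.0 - t / T
    return alpha * slow_sgd + (1.0 - alpha) * slow_adam

target_w = merge_slow_weights(np.array([1.05, 1.95]),  # first slow weight (SGD branch)
                              np.array([0.98, 2.02]),  # second slow weight (Adam branch)
                              t=10, T=100)
```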
205. Acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage;
and the server acquires the target updating weight of the first iteration stage, takes the target updating weight of the first iteration stage as the starting point of the short-time fast weight of the second iteration stage, and calculates the target updating weight of the second iteration stage.
It is understood that the server takes the target update weight calculated in the first iteration stage as the starting point of the short-time fast weights of the second iteration stage, and then calculates the target update weight of the second iteration stage by the method of steps 202-204. The steps by which the server calculates the target update weight in the second iteration stage are the same as those in the first iteration stage, so they are not described here again in detail.
206. And taking the target updating weight of the second iteration stage as the starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
And the server takes the target updating weight of the second iteration stage as the starting point of the short-time fast weight of the third iteration stage, calculates the target updating weight of the third iteration stage and calculates the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
It can be understood that the server calculates the target update weight using the results of the k iteration steps and connects the target update weights calculated at the different stages to obtain the result after loss optimization, until the convergence of the loss function is completed. This iterative updating method effectively overcomes the jitter of the error function during gradient updates at the end of training and ensures both the convergence speed and the convergence accuracy.
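Putting the stages together, the following sketch outlines the overall flow of steps 201 to 206; it reuses the helper functions sketched earlier (sgd_fast_step, adam_fast_step, integrate_slow_weight, merge_slow_weights), inherits their stated assumptions, and uses an illustrative stand-in for the convergence test:
```python
import numpy as np

def optimize_loss(X, Y, k=4, stages=50, tol=1e-6):
    rng = np.random.default_rng(0)
    W = np.zeros(X.shape[1])               # target update weight at the start of a stage
    m = np.zeros_like(W)
    v = np.zeros_like(W)
    step = 0
    for stage in range(1, stages + 1):
        W_sgd, W_adam = W.copy(), W.copy() # both optimizers start from the same point
        sgd_hist, adam_hist = [], []
        for _ in range(k):                 # k fast-weight iterations per stage
            step += 1
            W_sgd = sgd_fast_step(W_sgd, X, Y, eta_t=0.05, rng=rng)
            grad = 2.0 * X.T @ (X @ W_adam - Y) / len(Y)  # illustrative gradient
            W_adam, m, v = adam_fast_step(W_adam, grad, m, v, step)
            sgd_hist.append(W_sgd)
            adam_hist.append(W_adam)
        slow_sgd = integrate_slow_weight(sgd_hist)        # first slow weight
        slow_adam = integrate_slow_weight(adam_hist)      # second slow weight
        W_new = merge_slow_weights(slow_sgd, slow_adam, stage, stages)
        if np.linalg.norm(W_new - W) < tol:               # stand-in convergence test
            return W_new
        W = W_new                          # starting point of the next iteration stage
    return W
```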
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
With reference to fig. 3, the method for optimizing a loss function in an embodiment of the present invention is described above, and an embodiment of an apparatus for optimizing a loss function in an embodiment of the present invention is described below, where:
an obtaining module 301, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model;
the first optimization module 302 is configured to train the machine learning task by using a first optimizer to obtain a first slow weight, where the first slow weight is used to instruct the machine learning task to use the first optimizer to perform iteration to obtain a result;
the second optimization module 303 is configured to train the machine learning task by using a second optimizer to obtain a second slow weight, where the second slow weight is used to instruct the machine learning task to use the second optimizer for iteration to obtain a result;
a combining module 304, configured to combine the first slow weight and the second slow weight according to a preset combining formula, to obtain a target update weight;
an iteration module 305 for calculating the target update weight for each iteration stage until the loss function convergence is completed.
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
Referring to fig. 4, another embodiment of the apparatus for optimizing a loss function according to the embodiment of the present invention includes:
an obtaining module 301, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model;
the first optimization module 302 is configured to train the machine learning task by using a first optimizer to obtain a first slow weight, where the first slow weight is used to indicate the result obtained after the machine learning task is iterated by the first optimizer;
the second optimization module 303 is configured to train the machine learning task by using a second optimizer to obtain a second slow weight, where the second slow weight is used to indicate the result obtained after the machine learning task is iterated by the second optimizer;
a combining module 304, configured to combine the first slow weight and the second slow weight according to a preset combining formula, to obtain a target update weight;
an iteration module 305 for calculating the target update weight for each iteration stage until the loss function convergence is completed.
Optionally, the first optimization module 302 includes:
a first selecting unit 3021, configured to randomly select one sample i_s from the n training samples of the machine learning task by using the first optimizer, where i_s ∈ {1,2,…,n} and n is an integer greater than 1;
a first calculating unit 3022, configured to calculate the first short-time fast weight W_{t+1} obtained after the update on sample i_s by using a first preset formula W_{t+1} = W_t − η_t·g_t, where t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, with g_t = ΔJ(W_t, i_s), i.e. the gradient at time t of the cost function J(W) with respect to the weight W on the input sample X(i_s) and the output sample Y(i_s);
the first integrating unit 3023 is configured to perform integration calculation on the values of the k first short-time fast weights to obtain first slow weights, where the first slow weights are used to indicate results obtained after the machine learning task is iterated by using a first optimizer, and k belongs to {2,3, …, n }.
Optionally, the first integration unit 3023 may be further specifically configured to:
obtaining the values of k consecutive first short-time fast weights, where k ∈ {2,3,…,n};
calculating a first slow weight according to a second preset formula and the values of the k consecutive first short-time fast weights, where the second preset formula (reproduced only as an image in the original publication) relates the first slow weight at time t to the starting point of the first short-time fast weight, the weight parameter W_t at time t, and the weight parameter W_{t+k} at time t+k, t being the current time; the first slow weight is used to indicate the result obtained after the machine learning task is iterated by the first optimizer.
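To make the first optimizer branch above concrete, the following minimal Python sketch runs k stochastic-gradient fast steps from a given starting point and integrates them into a first slow weight. The least-squares cost, the helper name first_optimizer_slow_weight, and the use of a simple average as the integration rule are illustrative assumptions, since the second preset formula is published only as an image.

```python
# A minimal sketch of the first optimizer's inner loop, assuming a least-squares
# cost J(W) = 0.5 * (X[i_s] @ W - Y[i_s])**2 and assuming (as a stand-in for the
# second preset formula) that the k short-time fast weights are integrated by a
# simple average.
import numpy as np

def first_optimizer_slow_weight(W_start, X, Y, k, lr=0.01, rng=None):
    rng = np.random.default_rng(rng)
    W = W_start.copy()
    fast_weights = []
    for _ in range(k):
        i_s = rng.integers(len(X))           # randomly pick one sample i_s
        g = (X[i_s] @ W - Y[i_s]) * X[i_s]   # gradient g_t of the cost on sample i_s
        W = W - lr * g                       # first preset formula: W_{t+1} = W_t - eta_t * g_t
        fast_weights.append(W.copy())
    return np.mean(fast_weights, axis=0)     # assumed integration: average of the k fast weights

# usage (toy): W_slow1 = first_optimizer_slow_weight(np.zeros(3), X, Y, k=5)
```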
Optionally, the second optimization module 303 includes:
a second selecting unit 3031, configured to randomly select, by using a second optimizer, one sample i from n training samples of a machine learning task, where i belongs to {1,2, …, n }, and n is an integer greater than 1;
a second calculating unit 3032, configured to calculate the second short-time fast weight W'_{t+1} obtained after the update on sample i by using a third preset formula, the third preset formula being:
W'_{t+1} = W'_t − η·m̂_t/(√v̂_t + ε),
where t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t and v̂_t are the correction values of the first-order and second-order momentum terms, whose expressions are as follows:
m̂_t = m_t/(1 − β1^t), m_t = β1·m_{t−1} + (1 − β1)·g'_t,
v̂_t = v_t/(1 − β2^t), v_t = β2·v_{t−1} + (1 − β2)·(g'_t)²,
where m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, and the gradient g'_t = ΔJ(W'_t, i) is the gradient at time t of the cost function J(W') with respect to the weight W' on the sample i;
a second integrating unit 3033, configured to perform integration calculation on the values of the k second short-time fast weights to obtain a second slow weight, where the second slow weight is used to indicate a result obtained after the machine learning task is iterated by using a second optimizer, and k belongs to {2,3, …, n }.
Optionally, the second integration unit 3033 may be further specifically configured to:
obtaining the values of k consecutive second short-time fast weights, where k ∈ {2,3,…,n};
calculating a second slow weight according to a fourth preset formula and the values of the k consecutive second short-time fast weights, where the fourth preset formula (reproduced only as an image in the original publication) relates the second slow weight at time t to the starting point of the second short-time fast weight, the weight parameter W'_t at time t, and the weight parameter W'_{t+k} at time t+k, t being the current time; the second slow weight is used to indicate the result obtained after the machine learning task is iterated by the second optimizer.
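The second optimizer branch can be sketched in the same way. The Adam-style update below follows the m_t, v_t and bias-correction expressions given above; the least-squares cost, the helper name second_optimizer_slow_weight, and the averaging used for the slow-weight integration are again illustrative assumptions.

```python
# A minimal sketch of the second optimizer's inner loop: first/second-order momentum
# with bias correction, followed by an assumed average of the k fast weights.
import numpy as np

def second_optimizer_slow_weight(W_start, X, Y, k, lr=0.001,
                                 beta1=0.9, beta2=0.999, eps=1e-8, rng=None):
    rng = np.random.default_rng(rng)
    W = W_start.copy()
    m = np.zeros_like(W)                            # first-order momentum term m_t
    v = np.zeros_like(W)                            # second-order momentum term v_t
    fast_weights = []
    for t in range(1, k + 1):
        i = rng.integers(len(X))
        g = (X[i] @ W - Y[i]) * X[i]                # gradient g'_t on the sampled example
        m = beta1 * m + (1 - beta1) * g             # m_t = beta1*m_{t-1} + (1-beta1)*g'_t
        v = beta2 * v + (1 - beta2) * g ** 2        # v_t = beta2*v_{t-1} + (1-beta2)*g'_t^2
        m_hat = m / (1 - beta1 ** t)                # bias-corrected first-order term
        v_hat = v / (1 - beta2 ** t)                # bias-corrected second-order term
        W = W - lr * m_hat / (np.sqrt(v_hat) + eps) # third preset formula (Adam-style step)
        fast_weights.append(W.copy())
    return np.mean(fast_weights, axis=0)            # assumed integration into the second slow weight
```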
Optionally, the merging module 304 may be further specifically configured to:
extracting the first slow weight and the second slow weight at time t, where t ∈ {0,1,2,…,n};
and substituting the first slow weight and the second slow weight at time t into a preset combination formula to obtain the target update weight, where the preset combination formula (reproduced only as an image in the original publication) combines the first slow weight at time t and the second slow weight at time t into the target update weight at time t through a coefficient parameter α, and the calculation formula of α (likewise reproduced only as an image) depends on the current update time t and the total number T of iterations of the whole training.
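A minimal sketch of the merging step is given below. Because the preset combination formula and the expression for α are published only as images, the convex combination and the linear schedule α = t/T used here are assumptions that rely only on the quantities named in the text (the two slow weights, the current update time t and the total iteration count T).

```python
# A minimal sketch of the merging module; both formulas below are assumptions.
def merge_slow_weights(W_slow1, W_slow2, t, T):
    alpha = t / T                                     # assumed schedule for the coefficient parameter
    return alpha * W_slow1 + (1.0 - alpha) * W_slow2  # assumed convex combination of the two slow weights
```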
Optionally, the iteration module 305 may be further specifically configured to:
acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage;
and taking the target update weight of the second iteration stage as the starting point of the short-time fast weight of the third iteration stage, calculating the target update weight of the third iteration stage, and calculating the target update weights of the remaining iteration stages until the convergence of the loss function is completed.
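The staged iteration performed by the iteration module 305 can be sketched as follows: the target update weight of each stage becomes the starting point of the short-time fast weights of the next stage. The driver below is a sketch that accepts the two optimizer branches and the merging rule as callables (for example, the hypothetical helpers sketched earlier in this document).

```python
import numpy as np

def optimize(W0, X, Y, first_branch, second_branch, merge, k=5, num_stages=20):
    """Chain the iteration stages: each stage's target update weight becomes the
    starting point of the short-time fast weights of the next stage."""
    W = W0.copy()
    for stage in range(1, num_stages + 1):
        W_slow1 = first_branch(W, X, Y, k)               # first slow weight of this stage
        W_slow2 = second_branch(W, X, Y, k)              # second slow weight of this stage
        W = merge(W_slow1, W_slow2, stage, num_stages)   # target update weight of this stage
    return W

# usage (with the hypothetical helpers sketched above, on a toy regression problem):
# rng = np.random.default_rng(0)
# X = rng.normal(size=(100, 3)); W_true = np.array([1.0, -2.0, 0.5]); Y = X @ W_true
# W_star = optimize(np.zeros(3), X, Y,
#                   first_optimizer_slow_weight, second_optimizer_slow_weight,
#                   merge_slow_weights)
```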
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are combined to obtain the target update weight, and the iterative calculation is then carried out until the loss function converges, so that the time spent calculating the weights and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and convergence efficiency of the loss function are improved.
Fig. 3 and 4 describe the optimization apparatus of the loss function in the embodiment of the present invention in detail from the perspective of the modular functional entity, and the optimization device of the loss function in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of an optimization apparatus for loss function according to an embodiment of the present invention, where the optimization apparatus 500 for loss function may generate relatively large differences due to different configurations or performances, and may include one or more processors (CPUs) 510 (e.g., one or more processors) and a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. Memory 520 and storage media 530 may be, among other things, transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations in the optimization apparatus 500 for a loss function. Still further, the processor 510 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the loss function optimization apparatus 500.
The loss function optimization apparatus 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the loss function optimization apparatus configuration shown in fig. 5 does not constitute a limitation of the loss function optimization apparatus, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, which may also be a volatile computer readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the method for optimizing a loss function.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for optimizing a loss function, the method comprising:
obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a convergence machine learning model;
training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
2. The method for optimizing the loss function according to claim 1, wherein the training of the machine learning task by using the first optimizer to obtain the first slow weight, the first slow weight being used for indicating the result obtained after the machine learning task is iterated by using the first optimizer, comprises:
randomly selecting a sample i_s from the n training samples of the machine learning task by using the first optimizer, wherein i_s ∈ {1,2,…,n} and n is an integer greater than 1;
calculating the first short-time fast weight W_{t+1} obtained after the update on the sample i_s by using a first preset formula W_{t+1} = W_t − η_t·g_t, wherein t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, with g_t = ΔJ(W_t, i_s), i.e. the gradient at time t of the cost function J(W) with respect to the weight W on the input sample X(i_s) and the output sample Y(i_s);
and performing integrated calculation on the k first short-time fast weights to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task adopts the first optimizer to iterate, and k belongs to {2,3, …, n }.
3. The method for optimizing a loss function according to claim 2, wherein the performing of the integration calculation on the values of the k first short-time fast weights to obtain the first slow weight, the first slow weight being used for indicating the result obtained after the machine learning task is iterated by using the first optimizer, with k ∈ {2,3,…,n}, comprises:
obtaining the values of k consecutive first short-time fast weights, wherein k ∈ {2,3,…,n};
calculating the first slow weight according to a second preset formula and the values of the k consecutive first short-time fast weights, wherein the second preset formula (reproduced only as an image in the original publication) relates the first slow weight at time t to the starting point of the first short-time fast weight, the weight parameter W_t at time t, and the weight parameter W_{t+k} at time t+k, t being the current time; the first slow weight is used for indicating the result obtained after the machine learning task is iterated by using the first optimizer.
4. The method for optimizing the loss function according to claim 1, wherein the training of the machine learning task by using the second optimizer to obtain the second slow weight, the second slow weight being used for indicating the result obtained after the machine learning task is iterated by using the second optimizer, comprises:
randomly selecting a sample i from n training samples of the machine learning task by using a second optimizer, wherein i belongs to {1,2, …, n }, and n is an integer greater than 1;
calculating the second short-time fast weight W'_{t+1} obtained after the update on the sample i by using a third preset formula, the third preset formula being:
W'_{t+1} = W'_t − η·m̂_t/(√v̂_t + ε),
wherein t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t and v̂_t are the correction values of the first-order and second-order momentum terms, whose expressions are as follows:
m̂_t = m_t/(1 − β1^t), m_t = β1·m_{t−1} + (1 − β1)·g'_t,
v̂_t = v_t/(1 − β2^t), v_t = β2·v_{t−1} + (1 − β2)·(g'_t)²,
wherein m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, and the gradient g'_t = ΔJ(W'_t, i) is the gradient at time t of the cost function J(W') with respect to the weight W' on the sample i;
and performing integrated calculation on the k second short-time fast weights to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task adopts the second optimizer to perform iteration, and k belongs to {2,3, …, n }.
5. The method for optimizing a loss function according to claim 4, wherein the performing of the integration calculation on the values of the k second short-time fast weights to obtain the second slow weight, the second slow weight being used for indicating the result obtained after the machine learning task is iterated by using the second optimizer, with k ∈ {2,3,…,n}, comprises:
obtaining the values of k consecutive second short-time fast weights, wherein k ∈ {2,3,…,n};
calculating the second slow weight according to a fourth preset formula and the values of the k consecutive second short-time fast weights, wherein the fourth preset formula (reproduced only as an image in the original publication) relates the second slow weight at time t to the starting point of the second short-time fast weight, the weight parameter W'_t at time t, and the weight parameter W'_{t+k} at time t+k, t being the current time; the second slow weight is used for indicating the result obtained after the machine learning task is iterated by using the second optimizer.
6. The method of claim 1, wherein the combining the first slow weight and the second slow weight according to a preset combining formula to obtain a target update weight comprises:
extracting the first slow weight and the second slow weight at time t, wherein t ∈ {0,1,2,…,n};
and substituting the first slow weight and the second slow weight at time t into a preset combination formula to obtain the target update weight, wherein the preset combination formula (reproduced only as an image in the original publication) combines the first slow weight at time t and the second slow weight at time t into the target update weight at time t through a coefficient parameter α, and the calculation formula of α (likewise reproduced only as an image) depends on the current update time t and the total number T of iterations of the whole training.
7. The method of claim 1, wherein the calculating the target update weight for each iteration stage until the loss function convergence is complete comprises:
acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage;
and taking the target updating weight of the second iteration stage as a starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
8. An optimization apparatus for a loss function, comprising:
an obtaining module, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model;
the first optimization module is used for training the machine learning task by using a first optimizer to obtain a first slow weight, and the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
the second optimization module is used for training the machine learning task by using a second optimizer to obtain a second slow weight, and the second slow weight is used for indicating the result obtained after the machine learning task is iterated by using the second optimizer;
the merging module is used for merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and the iteration module is used for calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
9. An optimization apparatus for a loss function, comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor calls the instructions in the memory to cause the optimization device of the loss function to perform the optimization method of the loss function according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of optimizing a loss function according to any one of claims 1 to 7.
CN202010405723.5A 2020-05-14 2020-05-14 Method, device and equipment for optimizing loss function and storage medium Pending CN111738408A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010405723.5A CN111738408A (en) 2020-05-14 2020-05-14 Method, device and equipment for optimizing loss function and storage medium
PCT/CN2020/118303 WO2021139237A1 (en) 2020-05-14 2020-09-28 Method and apparatus for loss function optimization, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010405723.5A CN111738408A (en) 2020-05-14 2020-05-14 Method, device and equipment for optimizing loss function and storage medium

Publications (1)

Publication Number Publication Date
CN111738408A true CN111738408A (en) 2020-10-02

Family

ID=72647265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010405723.5A Pending CN111738408A (en) 2020-05-14 2020-05-14 Method, device and equipment for optimizing loss function and storage medium

Country Status (2)

Country Link
CN (1) CN111738408A (en)
WO (1) WO2021139237A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287089A (en) * 2020-11-23 2021-01-29 腾讯科技(深圳)有限公司 Classification model training and automatic question-answering method and device for automatic question-answering system
CN113763501A (en) * 2021-09-08 2021-12-07 上海壁仞智能科技有限公司 Iteration method of image reconstruction model and image reconstruction method
WO2022095432A1 (en) * 2020-11-05 2022-05-12 平安科技(深圳)有限公司 Neural network model training method and apparatus, computer device, and storage medium
CN114647387A (en) * 2022-05-23 2022-06-21 南京道成网络科技有限公司 Cache optimization method suitable for cloud storage
CN116237935A (en) * 2023-02-03 2023-06-09 兰州大学 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463576B (en) * 2021-12-24 2024-04-09 中国科学技术大学 Network training method based on re-weighting strategy
CN114254763B (en) * 2021-12-27 2024-04-05 西安交通大学 Machine learning model repairing method, system, computer equipment and storage medium
CN114692562B (en) * 2022-03-16 2024-05-24 北京理工大学 High-precision hybrid dynamic priority multi-objective optimization method
CN116630398B (en) * 2023-07-21 2024-05-10 浙江大学海南研究院 Optimizer momentum coefficient regulation and control method based on data set concave-convex characteristic

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909142A (en) * 2017-11-14 2018-04-13 深圳先进技术研究院 A kind of parameter optimization method of neutral net, system and electronic equipment
KR102068576B1 (en) * 2018-04-10 2020-01-21 배재대학교 산학협력단 Convolutional neural network based image processing system and method
CN108960318A (en) * 2018-06-28 2018-12-07 武汉市哈哈便利科技有限公司 A kind of commodity recognizer using binocular vision technology for self-service cabinet
CN111028306B (en) * 2019-11-06 2023-07-14 杭州电子科技大学 AR2U-Net neural network-based rapid magnetic resonance imaging method
CN111079896A (en) * 2019-11-15 2020-04-28 苏州浪潮智能科技有限公司 Hyper-parameter self-adaptive adjustment method and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022095432A1 (en) * 2020-11-05 2022-05-12 平安科技(深圳)有限公司 Neural network model training method and apparatus, computer device, and storage medium
CN112287089A (en) * 2020-11-23 2021-01-29 腾讯科技(深圳)有限公司 Classification model training and automatic question-answering method and device for automatic question-answering system
CN112287089B (en) * 2020-11-23 2022-09-20 腾讯科技(深圳)有限公司 Classification model training and automatic question-answering method and device for automatic question-answering system
CN113763501A (en) * 2021-09-08 2021-12-07 上海壁仞智能科技有限公司 Iteration method of image reconstruction model and image reconstruction method
CN113763501B (en) * 2021-09-08 2024-02-27 上海壁仞智能科技有限公司 Iterative method of image reconstruction model and image reconstruction method
CN114647387A (en) * 2022-05-23 2022-06-21 南京道成网络科技有限公司 Cache optimization method suitable for cloud storage
CN116237935A (en) * 2023-02-03 2023-06-09 兰州大学 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium
CN116237935B (en) * 2023-02-03 2023-09-15 兰州大学 Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium

Also Published As

Publication number Publication date
WO2021139237A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111738408A (en) Method, device and equipment for optimizing loss function and storage medium
CN107729322B (en) Word segmentation method and device and sentence vector generation model establishment method and device
CN109947940B (en) Text classification method, device, terminal and storage medium
CN110458287B (en) Parameter updating method, device, terminal and storage medium of neural network optimizer
CN105224959A (en) The training method of order models and device
US8478688B1 (en) Rapid transaction processing
US9967275B1 (en) Efficient detection of network anomalies
CN112101674B (en) Resource allocation matching method, device, equipment and medium based on group intelligent algorithm
CN113434859A (en) Intrusion detection method, device, equipment and storage medium
CN110661727A (en) Data transmission optimization method and device, computer equipment and storage medium
WO2015161782A1 (en) Method and system for mining churn factor causing user churn for network application
CN110851333B (en) Root partition monitoring method and device and monitoring server
CN110880014A (en) Data processing method and device, computer equipment and storage medium
CN112669078A (en) Behavior prediction model training method, device, equipment and storage medium
CN112445690A (en) Information acquisition method and device and electronic equipment
CN106982250A (en) Information-pushing method and device
JP6652398B2 (en) Parameter selection method, parameter selection program and parameter selection device
JP6494258B2 (en) Prediction system, prediction method, and prediction program
CN111461329B (en) Model training method, device, equipment and readable storage medium
JP2016031739A (en) Production control support device and production control support method
JP6203313B2 (en) Feature selection device, feature selection method, and program
Vakili et al. Delayed feedback in kernel bandits
CN114819398A (en) Beidou satellite clock error sequence combination prediction method based on gray Markov chain
CN114912627A (en) Recommendation model training method, system, computer device and storage medium
CN111047016A (en) Model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201002