CN111738408A - Method, device and equipment for optimizing loss function and storage medium - Google Patents
- Publication number
- CN111738408A CN111738408A CN202010405723.5A CN202010405723A CN111738408A CN 111738408 A CN111738408 A CN 111738408A CN 202010405723 A CN202010405723 A CN 202010405723A CN 111738408 A CN111738408 A CN 111738408A
- Authority
- CN
- China
- Prior art keywords
- weight
- slow
- machine learning
- optimizer
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N20/00—Machine learning
- G06N3/08—Learning methods
Abstract
The invention relates to the field of base operation and maintenance, and discloses a method, an apparatus, a device and a storage medium for optimizing a loss function, which are used for solving the problem of low convergence accuracy of loss functions. The optimization method of the loss function comprises the following steps: obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a machine learning model to be converged; training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight indicates the result obtained after the machine learning task is iterated by using the first optimizer; training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight indicates the result obtained after the machine learning task is iterated by using the second optimizer; merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target update weight; and calculating the target update weight of each iteration stage until the convergence of the loss function is completed.
Description
Technical Field
The present invention relates to the field of base operation and maintenance, and in particular to a method, an apparatus, a device, and a storage medium for optimizing a loss function.
Background
With the popularization of neural networks in computing, deep learning enables a neural network to learn how to capture data features; because the captured features differ from the true data features, the loss function needs to be optimized continually. The optimizer is therefore an important tool for optimizing the loss function in deep learning networks. At present, the optimizer used in deep learning is usually stochastic gradient descent (SGD): when the loss function is optimized with SGD, a small batch of data is used to descend the gradient stochastically, and the optimal loss function is obtained through continuous iteration and convergence.
However, in the later stage of optimizing the loss function, SGD is prone to falling into local minima, so that the loss function jitters abnormally during convergence and the optimal convergence cannot be reached; as a result, the convergence accuracy and efficiency of the loss function are low.
Disclosure of Invention
The invention mainly aims to solve the problem of low convergence accuracy of loss functions.
The first aspect of the present invention provides a method for optimizing a loss function, including: obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a convergence machine learning model; training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer; training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer; merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight; and calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
Optionally, in a first implementation manner of the first aspect of the present invention, training the machine learning task by using a first optimizer to obtain a first slow weight, where the first slow weight is used to indicate the result obtained after the machine learning task is iterated by using the first optimizer, includes: randomly selecting a sample i_s, i_s ∈ {1, 2, …, n}, from the n training samples of the machine learning task by using the first optimizer, n being an integer greater than 1; calculating the updated first short-time fast weight W_{t+1} of i_s by using a first preset formula W_{t+1} = W_t − η_t·g_t, where t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, with g_t = ∇J_{i_s}(W_t, X(i_s), Y(i_s)), where J(W) is the cost function, ∇J(W) is its gradient, X(i_s) is the input sample and Y(i_s) is the output sample; and performing an integrated calculation on k first short-time fast weight values to obtain the first slow weight, where the first slow weight is used to indicate the result obtained after the machine learning task is iterated by using the first optimizer, and k ∈ {2, 3, …, n}.
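The first preset formula can be sketched as follows, assuming a scalar weight and, purely for illustration, a least-squares cost J_i(W) = (W·X(i) − Y(i))²/2, since the claim does not fix the cost function:

```python
import random

def sgd_fast_step(W, lr, samples):
    """One first-optimizer iteration: W_{t+1} = W_t - eta_t * g_t."""
    i_s = random.randrange(len(samples))   # random sample i_s from the n samples
    x, y = samples[i_s]
    g = (W * x - y) * x                    # g_t for the illustrative cost J_i(W)
    return W - lr * g
```

With a single sample (1.0, 2.0), starting weight 0.0 and learning rate 0.1, one step moves the weight to 0.2.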
Optionally, in a second implementation manner of the first aspect of the present invention, the integrated calculation of k first short-time fast weight values to obtain the first slow weight, where k ∈ {2, 3, …, n}, includes: obtaining k consecutive values of the first short-time fast weight; and calculating the first slow weight from those k values according to a second preset formula, namely the average

W̃_t = (1/k)·(W_{t+1} + W_{t+2} + … + W_{t+k}),

where t is the current time, W̃_t is the first slow weight at time t, W_t is the weight parameter at time t and the starting point of the first short-time fast weights, and W_{t+k} is the weight parameter at time t+k. The first slow weight is used to indicate the result obtained after the machine learning task is iterated by using the first optimizer.
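Assuming the integrated calculation is a plain average of the k consecutive fast weights accumulated from the starting point (an assumed reading of the second preset formula), the slow-weight computation can be sketched as:

```python
def sgd_slow_weight(W0, lr, grad, k):
    """Run k fast steps from the starting point W0 and average them."""
    W, fast = W0, []
    for _ in range(k):
        W = W - lr * grad(W)   # fast weight W_{t+j}
        fast.append(W)
    return sum(fast) / k       # first slow weight: mean of the k fast weights
```

For example, with grad(W) = W (i.e. J(W) = W²/2), W0 = 1.0, lr = 0.5 and k = 2, the fast weights are 0.5 and 0.25 and the slow weight is 0.375.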
Optionally, in a third implementation manner of the first aspect of the present invention, training the machine learning task by using the second optimizer to obtain a second slow weight, where the second slow weight is used to indicate the result obtained after the machine learning task is iterated by using the second optimizer, includes: randomly selecting a sample i, i ∈ {1, 2, …, n}, from the n training samples of the machine learning task by using the second optimizer, n being an integer greater than 1; calculating the updated second short-time fast weight W′_{t+1} by using a third preset formula:

W′_{t+1} = W′_t − η·m̂_t / (√(v̂_t) + ε),

where t is the current time, W′_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, with

m_t = β1·m_{t−1} + (1 − β1)·g′_t, v_t = β2·v_{t−1} + (1 − β2)·(g′_t)²,
m̂_t = m_t / (1 − β1^t), v̂_t = v_t / (1 − β2^t),

where m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient is g′_t = ∇J(W′_t, i), where J(W′) is the cost function and ∇J(W′_t, i) is the gradient of the cost function of sample i with respect to the weight W′ at time t; and performing an integrated calculation on k second short-time fast weight values to obtain the second slow weight, where the second slow weight is used to indicate the result obtained after the machine learning task is iterated by using the second optimizer, and k ∈ {2, 3, …, n}.
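The third preset formula with its momentum corrections is the standard Adam update; a minimal scalar sketch of one iteration:

```python
import math

def adam_fast_step(W, m, v, g, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One second-optimizer iteration (t counts from 1)."""
    m = b1 * m + (1 - b1) * g        # first-order momentum m_t
    v = b2 * v + (1 - b2) * g * g    # second-order momentum v_t
    m_hat = m / (1 - b1 ** t)        # bias-corrected first-order term
    v_hat = v / (1 - b2 ** t)        # bias-corrected second-order term
    W = W - lr * m_hat / (math.sqrt(v_hat) + eps)
    return W, m, v
```

At t = 1 the bias corrections cancel the decay exactly, so the first step has magnitude close to η regardless of the gradient scale.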
Optionally, in a fourth implementation manner of the first aspect of the present invention, the integrated calculation of k second short-time fast weight values to obtain the second slow weight, where k ∈ {2, 3, …, n}, includes: obtaining k consecutive values of the second short-time fast weight; and calculating the second slow weight from those k values according to a fourth preset formula, namely the average

W̃′_t = (1/k)·(W′_{t+1} + W′_{t+2} + … + W′_{t+k}),

where t is the current time, W̃′_t is the second slow weight at time t, W′_t is the weight parameter at time t and the starting point of the second short-time fast weights, and W′_{t+k} is the weight parameter at time t+k. The second slow weight is used to indicate the result obtained after the machine learning task is iterated by using the second optimizer.
Optionally, in a fifth implementation manner of the first aspect of the present invention, merging the first slow weight and the second slow weight according to a preset merging formula to obtain the target update weight includes: extracting the first slow weight and the second slow weight at time t, t ∈ {0, 1, 2, …, n}; and substituting the first slow weight and the second slow weight at time t into the preset merging formula

W̄_t = α·W̃_t + (1 − α)·W̃′_t,

where W̄_t is the target update weight at time t, W̃_t is the first slow weight at time t, W̃′_t is the second slow weight at time t, and α is a coefficient parameter calculated as α = t/T, with t the current update time and T the number of iterations of the whole training.
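The merging step can be sketched as below; the coefficient α is shown as t/T (current update step over total training iterations), which is an assumption consistent with the description rather than a form fixed by the published formula:

```python
def merge_slow_weights(w1, w2, t, T):
    """Target update weight from the two slow weights at step t of T."""
    alpha = t / T                         # assumed form of the coefficient parameter
    return alpha * w1 + (1 - alpha) * w2  # preset merging formula
```

Halfway through training (t = 1, T = 2) the target update weight is the plain average of the two slow weights.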
Optionally, in a sixth implementation manner of the first aspect of the present invention, the calculating the target update weight at each iteration stage until the convergence of the loss function is completed includes: acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage; and taking the target updating weight of the second iteration stage as a starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
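The chaining of iteration stages described above, where each stage's target update weight becomes the starting point of the next stage's short-time fast weights, can be sketched abstractly (the slow-weight and merge routines are placeholders supplied by the caller):

```python
def optimize_stages(W0, stages, slow1, slow2, merge):
    """Chain iteration stages until the final stage is reached."""
    W = W0
    for t in range(1, stages + 1):
        s1 = slow1(W)                 # first slow weight from this starting point
        s2 = slow2(W)                 # second slow weight from the same point
        W = merge(s1, s2, t, stages)  # target update weight -> next starting point
    return W
```

As a sanity check, if the two placeholder branches pull the weight by equal and opposite amounts and the merge averages them, the weight is a fixed point of every stage.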
A second aspect of the present invention provides an apparatus for optimizing a loss function, including: an obtaining module, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model; the first optimization module is used for training the machine learning task by using a first optimizer to obtain a first slow weight, and the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer; the second optimization module is used for training the machine learning task by using a second optimizer to obtain a second slow weight, and the second slow weight is used for indicating the machine learning task to adopt the second optimizer to perform iteration to obtain a result; the merging module is used for merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight; and the iteration module is used for calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
Optionally, in a first implementation manner of the second aspect of the present invention, the first optimization module includes: a first selection unit, configured to randomly select a sample i_s, i_s ∈ {1, 2, …, n}, from the n training samples of the machine learning task by using a first optimizer, n being an integer greater than 1; a first calculation unit, configured to calculate the updated first short-time fast weight W_{t+1} of i_s by using a first preset formula W_{t+1} = W_t − η_t·g_t, where t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, with g_t = ∇J_{i_s}(W_t, X(i_s), Y(i_s)), where J(W) is the cost function, ∇J(W) is its gradient, X(i_s) is the input sample and Y(i_s) is the output sample; and a first integration unit, configured to perform an integrated calculation on k first short-time fast weight values to obtain the first slow weight, where the first slow weight is used to indicate the result obtained after the machine learning task is iterated by using the first optimizer, and k ∈ {2, 3, …, n}.
Optionally, in a second implementation manner of the second aspect of the present invention, the first integration unit is specifically configured to: obtain k consecutive values of the first short-time fast weight, k ∈ {2, 3, …, n}; and calculate the first slow weight from those k values according to a second preset formula, namely the average

W̃_t = (1/k)·(W_{t+1} + W_{t+2} + … + W_{t+k}),

where t is the current time, W̃_t is the first slow weight at time t, W_t is the weight parameter at time t and the starting point of the first short-time fast weights, and W_{t+k} is the weight parameter at time t+k. The first slow weight is used to indicate the result obtained after the machine learning task is iterated by using the first optimizer.
Optionally, in a third implementation manner of the second aspect of the present invention, the second optimization module includes: a second selection unit, configured to randomly select a sample i, i ∈ {1, 2, …, n}, from the n training samples of the machine learning task by using a second optimizer, n being an integer greater than 1; a second calculation unit, configured to calculate the updated second short-time fast weight W′_{t+1} by using a third preset formula:

W′_{t+1} = W′_t − η·m̂_t / (√(v̂_t) + ε),

where t is the current time, W′_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, with

m_t = β1·m_{t−1} + (1 − β1)·g′_t, v_t = β2·v_{t−1} + (1 − β2)·(g′_t)²,
m̂_t = m_t / (1 − β1^t), v̂_t = v_t / (1 − β2^t),

where m_t is the first-order momentum term, v_t is the second-order momentum term, β1 is the first-order momentum decay coefficient, β2 is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient is g′_t = ∇J(W′_t, i), where J(W′) is the cost function and ∇J(W′_t, i) is the gradient of the cost function of sample i with respect to the weight W′ at time t; and a second integration unit, configured to perform an integrated calculation on k second short-time fast weight values to obtain the second slow weight, where the second slow weight is used to indicate the result obtained after the machine learning task is iterated by using the second optimizer, and k ∈ {2, 3, …, n}.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the second integration unit is specifically configured to: obtain k consecutive values of the second short-time fast weight, k ∈ {2, 3, …, n}; and calculate the second slow weight from those k values according to a fourth preset formula, namely the average

W̃′_t = (1/k)·(W′_{t+1} + W′_{t+2} + … + W′_{t+k}),

where t is the current time, W̃′_t is the second slow weight at time t, W′_t is the weight parameter at time t and the starting point of the second short-time fast weights, and W′_{t+k} is the weight parameter at time t+k. The second slow weight is used to indicate the result obtained after the machine learning task is iterated by using the second optimizer.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the merging module is specifically configured to: extract the first slow weight and the second slow weight at time t, t ∈ {0, 1, 2, …, n}; and substitute the first slow weight and the second slow weight at time t into the preset merging formula

W̄_t = α·W̃_t + (1 − α)·W̃′_t,

where W̄_t is the target update weight at time t, W̃_t is the first slow weight at time t, W̃′_t is the second slow weight at time t, and α is a coefficient parameter calculated as α = t/T, with t the current update time and T the number of iterations of the whole training.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the iteration module is specifically configured to: acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage; and taking the target updating weight of the second iteration stage as a starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
A third aspect of the present invention provides a device for optimizing a loss function, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor calls the instructions in the memory to cause the optimization device of the loss function to perform the optimization method of the loss function described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the above-described method of optimization of a loss function.
According to the technical scheme, a machine learning task to be optimized is obtained, the machine learning task being used to indicate a loss function in a machine learning model to be converged; the machine learning task is trained with a first optimizer to obtain a first slow weight, which indicates the result obtained after the machine learning task is iterated with the first optimizer; the machine learning task is trained with a second optimizer to obtain a second slow weight, which indicates the result obtained after the machine learning task is iterated with the second optimizer; the first slow weight and the second slow weight are merged according to a preset merging formula to obtain a target update weight; and the target update weight is calculated for each iteration stage until the loss function has converged. In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are combined to obtain the target update weight, and iterative calculation is then carried out until the loss function converges, which reduces the time required to calculate the weights and the abnormal jitter of the loss function during convergence, thereby improving the convergence accuracy and efficiency of the loss function.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a method for optimizing a loss function according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating another embodiment of the method for optimizing the loss function according to the embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of an apparatus for optimizing a loss function according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of the loss function optimization apparatus according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of the device for optimizing a loss function in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, an apparatus, a device and a storage medium for optimizing a loss function, in which the first slow weight calculated by a first optimizer and the second slow weight calculated by a second optimizer are combined to obtain a target update weight, which reduces the time required to calculate the weights and the abnormal jitter of the loss function during convergence, thereby improving the convergence accuracy and efficiency of the loss function.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of the method for optimizing a loss function in the embodiment of the present invention includes:
101. obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a convergence machine learning model;
the server obtains a machine learning task to be optimized for indicating a loss function in the converged machine learning model.
It is to be understood that the execution subject of the present invention may be an optimization apparatus of the loss function, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
It should be noted that, in deep learning, every machine learning model has a loss function, and the goal of deep learning is to minimize it. However, not all machine learning models can find the minimum of the loss function quickly and accurately, and some loss functions have no minimum at all. A typical machine learning model therefore uses a convex function as the loss function, since convexity guarantees that a minimum exists, and the most common method for finding the minimum of a convex function in deep learning is gradient descent, which is the main role of the optimizer in deep learning. The machine learning task refers to the loss function that needs to be optimized in deep learning. The loss function plays a very important role in deep learning, and many deep-learning loss functions are constructed over sample pairs or sample triplets, so the magnitude of the sample space is very large.
102. Training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
the server trains the machine learning task by using the first optimizer to obtain a first slow weight used for indicating a result obtained after the machine learning task is iterated by using the first optimizer.
The first optimizer here uses stochastic gradient descent (SGD). SGD has a low per-step cost, so gradients can be computed quickly, and it can converge despite a certain range of introduced noise. For example, compared with the standard gradient descent method, which traverses all samples for each update, SGD updates the parameters from a single input sample, greatly shortening the time required per update.
It should be noted that when SGD performs a loss update on each sample, setting the learning rate in the algorithm is critical: if the server sets the learning rate too small, convergence is slow, and if it sets the learning rate too large, the loss function jitters abnormally during convergence and the optimal convergence cannot be reached, so a large amount of training is required to obtain an appropriate learning rate. In addition, in the later stage of optimizing the loss function, SGD easily falls into local minima and cannot optimize the later stage effectively. For these reasons, this application uses SGD as the first optimizer to train large-scale data sets rapidly, improving the efficiency of optimizing the loss function, and combines it with the second optimizer to achieve both improved efficiency and improved accuracy.
103. Training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
and the server trains the machine learning task by using the second optimizer to obtain a second slow weight used for indicating a result obtained after the machine learning task is iterated by using the second optimizer.
The second optimizer adopts the adaptive moment estimation algorithm (Adam). Adam is an adaptive-learning-rate optimization algorithm, that is, it optimizes the learning rate in the machine learning model. A traditional optimization algorithm usually sets the learning rate to a constant, or adjusts it according to the number of training steps; such settings ignore other ways in which the learning rate could change, introducing bias into the loss optimization. The Adam optimizer dynamically adjusts the learning rate of each parameter by using the first and second moment estimates of the gradient, thereby optimizing the loss function accurately. Adam's advantage is that, after bias correction, the learning rate of each iteration stays within a determined range, so the adjusted parameters are relatively stable and the loss function is optimized more accurately.
Adam iterates as follows: in the machine learning task, the server randomly selects one sample i from the n training samples by using the second optimizer, where n is an integer greater than 1, and updates the loss for sample i. The server computes the gradient g'_t = ΔJ(W'_t, i), where J(W') is the cost function and ΔJ(W'_t, i) is the gradient of the cost function with respect to the weight W' for sample i after t iterations. The server then calculates the first-order momentum term m_t and the second-order momentum term v_t, where the first-order momentum term and its correction value are:
m_t = β₁·m_{t−1} + (1 − β₁)·g'_t,  m̂_t = m_t / (1 − β₁^t)
where β₁ is the first-order momentum decay coefficient, generally taking the value 0.9, and the second-order momentum term and its correction value are:
v_t = β₂·v_{t−1} + (1 − β₂)·(g'_t)²,  v̂_t = v_t / (1 − β₂^t)
where β₂ is the second-order momentum decay coefficient, generally 0.999. After the server calculates the correction values of the first-order momentum term m_t and the second-order momentum term v_t, the server calculates the second short-time fast weight according to the third preset formula:
W'_{t+1} = W'_t − η·m̂_t / (√v̂_t + ε)
where η is the initial learning rate, generally 0.01, and ε is a numerical stability quantity, generally 10⁻⁸, which ensures that the denominator of the fraction is not zero. This completes one loss optimization.
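The Adam iteration above can be sketched as a single update step. This is an illustrative implementation, not the patent's reference code; the function name and the convention of passing the precomputed gradient g as a scalar are assumptions for the example:

```python
import math

def adam_step(w, g, m, v, t, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam iteration: momentum updates, bias correction, weight update.

    w: current weight W'_t; g: gradient g'_t for the sampled example;
    m, v: running first- and second-order momentum terms; t: step count (from 1).
    """
    m = beta1 * m + (1 - beta1) * g            # first-order momentum term m_t
    v = beta2 * v + (1 - beta2) * g * g        # second-order momentum term v_t
    m_hat = m / (1 - beta1 ** t)               # correction value of m_t
    v_hat = v / (1 - beta2 ** t)               # correction value of v_t
    w = w - eta * m_hat / (math.sqrt(v_hat) + eps)  # second short-time fast weight
    return w, m, v
```

The eps term in the denominator is the numerical stability quantity of 10⁻⁸ noted above, keeping the division well defined even when v̂_t is near zero.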
104. Merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and the server combines the first slow weight and the second slow weight according to a preset combination formula to obtain a target update weight.
It should be noted that the first slow weight calculated by the server may be understood as the result of the first optimizer optimizing the loss function, and the second slow weight as the result of the second optimizer optimizing the same loss function. Since each optimizer's result carries its own error, the server combines the two slow weights after optimization, and the target update weight obtained by the preset combination formula conforms more closely to the true value. The preset combination formula used by the server is:
W*_t = α·W̃_t + (1 − α)·W̃'_t
where W*_t is the target update weight at time t, W̃_t is the first slow weight at time t, and W̃'_t is the second slow weight at time t. Without loss of generality, α conforms to a certain probability distribution computed from the current update time t and the number of iterations T of the whole training. Since the server obtains W̃_t and W̃'_t through steps 102-103, the server can combine the first slow weight with the second slow weight to obtain the target update weight.
105. And calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
The server calculates the target update weights for each iteration stage until the loss function convergence is complete.
It can be understood that the server calculates the target update weight from the results of k iteration steps and connects the target update weights calculated at the different stages to obtain the loss-optimized result, until the convergence of the loss function is completed. This iterative updating method effectively suppresses the jitter of the error function during gradient updates near the end of the training data, and ensures both the convergence speed and the accuracy of convergence.
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
Referring to fig. 2, another embodiment of the method for optimizing a loss function according to the embodiment of the present invention includes:
201. obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating a loss function in a convergence machine learning model;
the server obtains a machine learning task to be optimized for indicating a loss function in the converged machine learning model.
It should be noted that the machine learning task here refers to a loss function that needs to be optimized in deep learning. The loss function plays a very important role in deep metric learning, and many deep metric learning loss functions are constructed on sample pairs or sample triplets, so the sample space is very large. In the later stages of training the learning model, the gradient values of the sample pairs and sample triplets are almost 0; without targeted optimization, the convergence rate of the learning algorithm is very slow and easily falls into a local optimum, which reduces the accuracy of the loss function.
In this method, the machine learning task is solved with a gradient descent method, the most common algorithm for optimizing a loss function in an optimizer. Generally, the stochastic gradient descent method (SGD) is used: the gradient is descended stochastically over a small batch of data, and the optimal loss function is obtained through continuous iteration until convergence. Although each SGD update has a certain deviation in direction and each computed batch is small, the more updates are performed, the better the convergence effect of the loss function. The variant optimizers derived from SGD fall into two broad categories: SGD-style optimizers and adaptive learning rate optimizers. The optimization method here combines an SGD optimizer with an adaptive learning rate optimizer to obtain the optimal loss function.
202. Training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
the server trains the machine learning task by using the first optimizer to obtain a first slow weight used for indicating a result obtained after the machine learning task is iterated by using the first optimizer. Specifically, the method comprises the following steps:
The server randomly selects one sample i_s from the n training samples of the machine learning task by using the first optimizer, i_s ∈ {1,2,…,n}, where n is an integer greater than 1. The server uses the first preset formula W_{t+1} = W_t − η_t·g_t to calculate the updated first short-time fast weight W_{t+1} for i_s. In the first preset formula, t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, where g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)), J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample, and Y(i_s) is the output sample. The server then performs an integrated calculation on the values of k first short-time fast weights to obtain the first slow weight, where the first slow weight indicates the result obtained after the machine learning task is iterated with the first optimizer, and k ∈ {2,3,…,n}.
The first optimizer here is the SGD optimizer. SGD places low requirements on the gradient, so the gradient can be calculated quickly, and the noise it introduces stays within a range in which SGD can still converge. For example, compared with the standard gradient descent method, which traverses all samples before each update, updating the parameters once per input sample greatly shortens the time required.
For example: in the machine learning task, the server randomly selects one sample i_s from the n training samples, where n is an integer greater than 1, and updates the loss for sample i_s. The server uses g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)) to calculate g_t, the current gradient of SGD, where J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample, and Y(i_s) is the output sample. The server then uses the first preset formula W_{t+1} = W_t − η_t·g_t and the calculated g_t to compute the first short-time fast weight, where t is the current time, W_t is the weight parameter of the first optimizer at time t, and η_t is the learning rate. This completes one loss optimization.
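The first preset formula is the plain SGD update, which can be sketched in a few lines. The toy quadratic loss J(w) = (w − 3)² and its gradient 2·(w − 3) are illustrative stand-ins for the per-sample cost function, not part of the patent:

```python
def sgd_step(w, grad, lr=0.01):
    """First preset formula: W_{t+1} = W_t - eta_t * g_t."""
    return w - lr * grad

# Toy example: minimize J(w) = (w - 3)^2; the gradient for one "sample" is 2*(w - 3).
w = 0.0
for _ in range(1000):
    w = sgd_step(w, 2 * (w - 3), lr=0.1)
# w is now very close to the minimizer 3
```

Each call corresponds to one loss optimization for one randomly selected sample; repeated calls produce the sequence of first short-time fast weights described above.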
The server performs an integrated calculation on the values of the k first short-time fast weights to obtain the first slow weight, which indicates the result obtained after the machine learning task is iterated with the first optimizer, k ∈ {2,3,…,n}. Specifically: the server obtains the values of k consecutive first short-time fast weights, k ∈ {2,3,…,n}, and calculates the first slow weight according to the second preset formula and the values of the k consecutive first short-time fast weights, the second preset formula being:
W̃_{t+k} = (W̃_t + W_{t+1} + W_{t+2} + … + W_{t+k}) / (k + 1)
where t is the current time, W̃_t is the first slow weight at time t and the starting point of the first short-time fast weights, W_{t+1} through W_{t+k} are the weight parameters at times t+1 through t+k, and W̃_{t+k} is the first slow weight at time t+k, which indicates the result obtained after the machine learning task is iterated with the first optimizer.
It is further described that, after the server performs the training iterations, it obtains the values of the first short-time fast weights updated at different points. After the server calculates the first short-time fast weight at each moment, it integrates the values of k consecutive first short-time fast weights to calculate the first slow weight. Using multiple values in the calculation ensures the accuracy of the first slow weight, so that its value better fits the first short-time fast weights as a whole.
In general, the value of k is 4, so that the calculated value fits the data well; however, the value of k may be modified according to actual conditions, and this application does not limit the value of k.
For example, when k is 4, the server calculates the first slow weight as follows: the server knows that the starting point of the first short-time fast weights is W̃_0 and that the consecutive first short-time fast weights are W_1, W_2, W_3, W_4. Then, according to the second preset formula, the first slow weight is calculated as W̃ = (W̃_0 + W_1 + W_2 + W_3 + W_4) / 5. This results in a first slow weight that more closely fits the first short-time fast weights.
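The "integrated calculation" can be sketched as follows. Note the averaging form (starting point plus the k fast weights, divided by k + 1) is one plausible reading of the second preset formula, whose original figure is not reproduced in this text:

```python
def slow_weight(start, fast_weights):
    """Integrate a starting point with k consecutive short-time fast weights.

    Assumes the 'integrated calculation' is a simple mean of the starting
    point and the k fast weights; the patent's formula figure is not
    reproduced here, so this is an illustrative interpretation.
    """
    return (start + sum(fast_weights)) / (1 + len(fast_weights))

# k = 4, as in the example: starting point W~_0 and fast weights W_1..W_4
w_slow = slow_weight(0.0, [1.0, 2.0, 3.0, 4.0])
```

The same routine serves for both the first slow weight (SGD branch) and the second slow weight (Adam branch), since both integrate k consecutive fast weights from a shared starting point.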
203. Training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
and the server trains the machine learning task by using the second optimizer to obtain a second slow weight used for indicating a result obtained after the machine learning task is iterated by using the second optimizer. Specifically, the method comprises the following steps:
The server first uses the second optimizer to randomly select one sample i from the n training samples of the machine learning task, i ∈ {1,2,…,n}, where n is an integer greater than 1, and then calculates the updated second short-time fast weight W'_{t+1} by using the third preset formula:
W'_{t+1} = W'_t − η·m̂_t / (√v̂_t + ε)
where t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, expressed as follows:
m_t = β₁·m_{t−1} + (1 − β₁)·g'_t,  m̂_t = m_t / (1 − β₁^t)
v_t = β₂·v_{t−1} + (1 − β₂)·(g'_t)²,  v̂_t = v_t / (1 − β₂^t)
where m_t is the first-order momentum term, v_t is the second-order momentum term, β₁ is the first-order momentum decay coefficient, β₂ is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient g'_t = ΔJ(W'_t, i), where J(W') is the cost function and ΔJ(W'_t, i) is the gradient of the cost function for sample i with respect to the weight W' at time t. Finally, the server performs an integrated calculation on the values of k second short-time fast weights to obtain the second slow weight, which indicates the result obtained after the machine learning task is iterated with the second optimizer, k ∈ {2,3,…,n}.
The second optimizer adopts the adaptive moment estimation algorithm (Adam), which belongs to the family of adaptive learning rate optimization algorithms that optimize the learning rate in the machine learning model. The Adam optimizer dynamically adjusts the learning rate of each parameter by using the first moment estimate and the second moment estimate of the gradient, thereby optimizing the loss function accurately. The advantage of Adam is that, after bias correction, the learning rate of each iteration lies within a determined range, so the adjusted parameters are relatively stable and the loss function is optimized more accurately. Adam iterates as follows: in the machine learning task, the server randomly selects one sample i from the n training samples by using the second optimizer, where n is an integer greater than 1, and updates the loss for sample i. The server computes the gradient g'_t = ΔJ(W'_t, i), where J(W') is the cost function and ΔJ(W'_t, i) is the gradient of the cost function with respect to the weight W' for sample i after t iterations. The server then calculates the first-order momentum term m_t and its correction value:
m_t = β₁·m_{t−1} + (1 − β₁)·g'_t,  m̂_t = m_t / (1 − β₁^t)
where β₁ is the first-order momentum decay coefficient, generally taking the value 0.9, and the second-order momentum term v_t and its correction value:
v_t = β₂·v_{t−1} + (1 − β₂)·(g'_t)²,  v̂_t = v_t / (1 − β₂^t)
where β₂ is the second-order momentum decay coefficient, generally 0.999. After the server calculates the correction values of m_t and v_t, it calculates the second short-time fast weight according to the third preset formula:
W'_{t+1} = W'_t − η·m̂_t / (√v̂_t + ε)
where η is the initial learning rate, generally 0.01, and ε is a numerical stability quantity, generally 10⁻⁸, which ensures that the denominator of the fraction is not zero. This completes one loss optimization.
The server performs an integrated calculation on the values of the k second short-time fast weights to obtain the second slow weight, which indicates the result obtained after the machine learning task is iterated with the second optimizer, k ∈ {2,3,…,n}. Specifically: the server obtains the values of k consecutive second short-time fast weights, k ∈ {2,3,…,n}; the server then calculates the second slow weight according to the fourth preset formula and the values of the k consecutive second short-time fast weights, the fourth preset formula being:
W̃'_{t+k} = (W̃'_t + W'_{t+1} + W'_{t+2} + … + W'_{t+k}) / (k + 1)
where t is the current time, W̃'_t is the second slow weight at time t and the starting point of the second short-time fast weights, W'_{t+1} through W'_{t+k} are the weight parameters at times t+1 through t+k, and W̃'_{t+k} is the second slow weight at time t+k, which indicates the result obtained after the machine learning task is iterated with the second optimizer.
It should be noted that, after the server performs the training iterations, it obtains the values of the second short-time fast weights updated at different points. After the server calculates the second short-time fast weight at each moment, it integrates the values of k consecutive second short-time fast weights to calculate the second slow weight. Using multiple values in the calculation ensures the accuracy of the second slow weight, so that its value better fits the second short-time fast weights as a whole.
In general, the value of k is 4, so that the calculated value fits the data well; however, the value of k may be modified according to actual conditions, and this application does not limit the value of k.
For example, when k is 4, the server calculates the second slow weight as follows: the server knows that the starting point of the second short-time fast weights is W̃'_0 and that the consecutive second short-time fast weights are W'_1, W'_2, W'_3, W'_4. Then, according to the fourth preset formula, the second slow weight is calculated as W̃' = (W̃'_0 + W'_1 + W'_2 + W'_3 + W'_4) / 5. This results in a second slow weight that more closely fits the second short-time fast weights.
204. Merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and the server combines the first slow weight and the second slow weight according to a preset combination formula to obtain a target update weight. Specifically, the method comprises the following steps:
The server extracts the first slow weight and the second slow weight at time t, t ∈ {0,1,2,…,n}, and then substitutes the first slow weight and the second slow weight at time t into the preset combination formula to obtain the target update weight, the preset combination formula being:
W*_t = α·W̃_t + (1 − α)·W̃'_t
where W*_t is the target update weight at time t, W̃_t is the first slow weight at time t, W̃'_t is the second slow weight at time t, and α is a coefficient parameter calculated from the current update time t and the total number of training iterations T.
It should be noted that the first slow weight calculated by the server may be understood as the result of the first optimizer optimizing the loss function, and the second slow weight as the result of the second optimizer optimizing the same loss function. Since each optimizer's result carries its own error, the server combines the two slow weights after optimization, and the target update weight obtained by the preset combination formula W*_t = α·W̃_t + (1 − α)·W̃'_t conforms more closely to the true value. Without loss of generality, α conforms to a certain probability distribution, t is the current update time, and T is the number of iterations of the whole training. Since the server obtains W̃_t and W̃'_t through steps 202 and 203, the server can combine the first slow weight with the second slow weight to obtain the target update weight.
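The combination step reduces to a convex mixture of the two slow weights. In this sketch, α = t / T is used purely as a placeholder schedule; the patent's actual α is drawn from a probability distribution defined in a figure not reproduced in this text:

```python
def merge_weights(w_slow_first, w_slow_second, t, T):
    """Preset combination formula: W* = alpha * W1~ + (1 - alpha) * W2~.

    alpha = t / T is an assumed placeholder for the patent's alpha,
    which follows a probability distribution given in an unreproduced figure.
    t: current update time; T: total number of training iterations.
    """
    alpha = t / T
    return alpha * w_slow_first + (1 - alpha) * w_slow_second
```

With this placeholder schedule, early stages weight the Adam-derived slow weight more heavily and later stages shift toward the SGD-derived one; any monotone schedule on t / T fits the same combination structure.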
205. Acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage;
and the server acquires the target updating weight of the first iteration stage, takes the target updating weight of the first iteration stage as the starting point of the short-time fast weight of the second iteration stage, and calculates the target updating weight of the second iteration stage.
It can be understood that the server takes the target update weight calculated in the first iteration stage as the starting point for calculating the short-time fast weights of the second iteration stage, and calculates the target update weight of the second iteration stage by the method of steps 202-204. The steps for calculating the target update weight in the second iteration stage are the same as in the first iteration stage, and are therefore not described here again.
206. And taking the target updating weight of the second iteration stage as the starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
And the server takes the target updating weight of the second iteration stage as the starting point of the short-time fast weight of the third iteration stage, calculates the target updating weight of the third iteration stage and calculates the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
It can be understood that the server calculates the target update weight from the results of k iteration steps and connects the target update weights calculated at the different stages to obtain the loss-optimized result, until the convergence of the loss function is completed. This iterative updating method effectively suppresses the jitter of the error function during gradient updates near the end of the training data, and ensures both the convergence speed and the accuracy of convergence.
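The staged procedure of steps 202-206 can be sketched end to end: each stage runs k SGD fast steps and k Adam fast steps from the same starting point, integrates each branch into a slow weight, merges them, and seeds the next stage with the result. The averaging form of the integration and the fixed α = 0.5 merge coefficient are illustrative assumptions, since the patent's formula figures are not reproduced here:

```python
import math

def run_stage(w0, grad_fn, k=4, lr=0.1, eta=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One iteration stage: k SGD fast steps and k Adam fast steps from w0,
    each averaged into a slow weight, then merged into the target update weight.
    Averaging and alpha = 0.5 are assumed stand-ins for the patent's formulas."""
    # SGD branch: first short-time fast weights (first preset formula)
    w, sgd_fast = w0, []
    for _ in range(k):
        w = w - lr * grad_fn(w)
        sgd_fast.append(w)
    slow_sgd = (w0 + sum(sgd_fast)) / (k + 1)
    # Adam branch: second short-time fast weights (third preset formula)
    w, m, v, adam_fast = w0, 0.0, 0.0, []
    for t in range(1, k + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        w = w - eta * (m / (1 - b1 ** t)) / (math.sqrt(v / (1 - b2 ** t)) + eps)
        adam_fast.append(w)
    slow_adam = (w0 + sum(adam_fast)) / (k + 1)
    # merge the two slow weights into the target update weight
    return 0.5 * slow_sgd + 0.5 * slow_adam

# each stage's target update weight seeds the next stage's fast weights
w = 10.0
for _ in range(200):
    w = run_stage(w, lambda x: 2 * (x - 3))  # toy loss J(w) = (w - 3)^2
```

On the toy quadratic loss the staged weight settles near the minimizer, illustrating how connecting per-stage target update weights drives the loss toward convergence.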
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
With reference to fig. 3, the method for optimizing a loss function in an embodiment of the present invention is described above, and an embodiment of an apparatus for optimizing a loss function in an embodiment of the present invention is described below, where:
an obtaining module 301, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model;
the first optimization module 302 is configured to train the machine learning task by using a first optimizer to obtain a first slow weight, where the first slow weight is used to instruct the machine learning task to use the first optimizer to perform iteration to obtain a result;
the second optimization module 303 is configured to train the machine learning task by using a second optimizer to obtain a second slow weight, where the second slow weight is used to instruct the machine learning task to use the second optimizer for iteration to obtain a result;
a combining module 304, configured to combine the first slow weight and the second slow weight according to a preset combining formula, to obtain a target update weight;
an iteration module 305 for calculating the target update weight for each iteration stage until the loss function convergence is completed.
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
Referring to fig. 4, another embodiment of the apparatus for optimizing a loss function according to the embodiment of the present invention includes:
an obtaining module 301, configured to obtain a machine learning task to be optimized, where the machine learning task is used to indicate a loss function in a converged machine learning model;
the first optimization module 302 is configured to train the machine learning task by using a first optimizer to obtain a first slow weight, where the first slow weight is used to instruct the machine learning task to use the first optimizer to perform iteration to obtain a result;
the second optimization module 303 is configured to train the machine learning task by using a second optimizer to obtain a second slow weight, where the second slow weight is used to instruct the machine learning task to use the second optimizer for iteration to obtain a result;
a combining module 304, configured to combine the first slow weight and the second slow weight according to a preset combining formula, to obtain a target update weight;
an iteration module 305 for calculating the target update weight for each iteration stage until the loss function convergence is completed.
Optionally, the first optimization module 302 includes:
a first selecting unit 3021, configured to randomly select one sample i_s from the n training samples of the machine learning task by using the first optimizer, i_s ∈ {1,2,…,n}, where n is an integer greater than 1;
a first calculating unit 3022, configured to use the first preset formula W_{t+1} = W_t − η_t·g_t to calculate the updated first short-time fast weight W_{t+1} for i_s, where in the first preset formula t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, g_t = ΔJ_{i_s}(W_t, X(i_s), Y(i_s)), where J(W) is the cost function, ΔJ(W) is the gradient, X(i_s) is the input sample, and Y(i_s) is the output sample;
the first integrating unit 3023 is configured to perform integration calculation on the values of the k first short-time fast weights to obtain first slow weights, where the first slow weights are used to indicate results obtained after the machine learning task is iterated by using a first optimizer, and k belongs to {2,3, …, n }.
Optionally, the first integration unit 3023 may be further specifically configured to:
obtaining the values of k consecutive first short-time fast weights, where k ∈ {2,3,…,n};
calculating the first slow weight according to the second preset formula and the values of the k consecutive first short-time fast weights, the second preset formula being:
W̃_{t+k} = (W̃_t + W_{t+1} + … + W_{t+k}) / (k + 1)
where t is the current time, W̃_t is the first slow weight at time t and the starting point of the first short-time fast weights, W_{t+1} through W_{t+k} are the weight parameters at times t+1 through t+k, and W̃_{t+k} is the first slow weight at time t+k, which indicates the result obtained after the machine learning task is iterated with the first optimizer.
Optionally, the second optimization module 303 includes:
a second selecting unit 3031, configured to randomly select, by using a second optimizer, one sample i from n training samples of a machine learning task, where i belongs to {1,2, …, n }, and n is an integer greater than 1;
a second calculating unit 3032, configured to calculate the updated second short-time fast weight W'_{t+1} for i by using the third preset formula:
W'_{t+1} = W'_t − η·m̂_t / (√v̂_t + ε)
where t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t is the correction value of the first-order momentum term, expressed as follows:
m_t = β₁·m_{t−1} + (1 − β₁)·g'_t,  m̂_t = m_t / (1 − β₁^t)
v_t = β₂·v_{t−1} + (1 − β₂)·(g'_t)²,  v̂_t = v_t / (1 − β₂^t)
where m_t is the first-order momentum term, v_t is the second-order momentum term, β₁ is the first-order momentum decay coefficient, β₂ is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, v̂_t is the correction value of v_t, and the gradient g'_t = ΔJ(W'_t, i), where J(W') is the cost function and ΔJ(W'_t, i) is the gradient of the cost function for sample i with respect to the weight W' at time t;
a second integrating unit 3033, configured to perform integration calculation on the values of the k second short-time fast weights to obtain a second slow weight, where the second slow weight is used to indicate a result obtained after the machine learning task is iterated by using a second optimizer, and k belongs to {2,3, …, n }.
Optionally, the second integration unit 3033 may be further specifically configured to:
obtaining the values of k consecutive second short-time fast weights, where k ∈ {2,3,…,n};
calculating the second slow weight according to the fourth preset formula and the values of the k consecutive second short-time fast weights, the fourth preset formula being:
W̃'_{t+k} = (W̃'_t + W'_{t+1} + … + W'_{t+k}) / (k + 1)
where t is the current time, W̃'_t is the second slow weight at time t and the starting point of the second short-time fast weights, W'_{t+1} through W'_{t+k} are the weight parameters at times t+1 through t+k, and W̃'_{t+k} is the second slow weight at time t+k, which indicates the result obtained after the machine learning task is iterated with the second optimizer.
Optionally, the merging module 304 may be further specifically configured to:
extracting a first slow weight and a second slow weight at the time t, wherein the t belongs to {0,1,2, …, n };
and substituting the first slow weight and the second slow weight at time t into the preset combination formula to obtain the target update weight, the preset combination formula being W*_t = α·W̃_t + (1 − α)·W̃'_t, where W*_t is the target update weight at time t, W̃_t is the first slow weight at time t, W̃'_t is the second slow weight at time t, and α is a coefficient parameter calculated from the current update time t and the total number of training iterations T.
Optionally, the iteration module 305 may be further specifically configured to:
acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage;
and taking the target updating weight of the second iteration stage as the starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
In the embodiment of the invention, the first slow weight calculated by the first optimizer and the second slow weight calculated by the second optimizer are integrated and calculated to obtain the target update weight, and finally iterative calculation is carried out until the loss function is converged, so that the calculation time of calculating the weight and the abnormal jitter of the loss function during convergence are reduced, and the convergence accuracy and the convergence efficiency of the loss function are improved.
Fig. 3 and 4 describe the optimization apparatus of the loss function in the embodiment of the present invention in detail from the perspective of the modular functional entity, and the optimization device of the loss function in the embodiment of the present invention is described in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of an optimization apparatus for loss function according to an embodiment of the present invention, where the optimization apparatus 500 for loss function may generate relatively large differences due to different configurations or performances, and may include one or more processors (CPUs) 510 (e.g., one or more processors) and a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. Memory 520 and storage media 530 may be, among other things, transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations in the optimization apparatus 500 for a loss function. Still further, the processor 510 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the loss function optimization apparatus 500.
The loss function optimization apparatus 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the loss function optimization apparatus configuration shown in fig. 5 does not constitute a limitation of the apparatus, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, which may also be a volatile computer readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the method for optimizing a loss function.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for optimizing a loss function, the method comprising:
obtaining a machine learning task to be optimized, wherein the machine learning task is used for indicating convergence of a loss function in a machine learning model;
training the machine learning task by using a first optimizer to obtain a first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
training the machine learning task by using a second optimizer to obtain a second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
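The overall flow of claim 1 can be sketched in Python. The concrete optimizer steps, the averaging used for the "slow weights", and the convex merge with coefficient `alpha` are illustrative assumptions here, not the filing's exact formulas:

```python
import numpy as np

def optimize(w0, first_step, second_step, k, alpha, n_stages):
    """Sketch of the claimed method: per iteration stage, run two
    optimizers for k fast steps from the same starting point, reduce
    each trajectory to a slow weight (mean, assumed), then merge the
    two slow weights into the target update weight for the next stage."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_stages):
        fast1, fast2 = [w], [w]
        for _ in range(k):
            fast1.append(first_step(fast1[-1]))    # first optimizer
            fast2.append(second_step(fast2[-1]))   # second optimizer
        slow1 = np.mean(fast1[1:], axis=0)         # first slow weight
        slow2 = np.mean(fast2[1:], axis=0)         # second slow weight
        w = alpha * slow1 + (1 - alpha) * slow2    # assumed merge formula
    return w
```

With two toy shrinking steps standing in for SGD and Adam, the merged weight contracts toward the minimizer of a quadratic loss, which is the behavior the claim's convergence condition relies on.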
2. The method for optimizing the loss function according to claim 1, wherein the training of the machine learning task by using the first optimizer to obtain a first slow weight, the first slow weight being used for indicating a result obtained after the machine learning task is iterated by using the first optimizer, comprises:
randomly selecting a sample i_s from the n training samples of the machine learning task by using the first optimizer, i_s ∈ {1, 2, …, n}, wherein n is an integer greater than 1;
calculating the first short-time fast weight W_{t+1} obtained after i_s is updated, using a first preset formula W_{t+1} = W_t − η_t·g_t, wherein t is the current time, W_t is the weight parameter of the first optimizer at time t, η_t is the learning rate, and g_t is the gradient, g_t = ΔJ(W_t; X(i_s), Y(i_s)), wherein J(W) is the cost function, ΔJ(W) is its gradient, X(i_s) is the input sample, and Y(i_s) is the output sample;
and performing an integrated calculation on k of the first short-time fast weights to obtain the first slow weight, wherein the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer, and k ∈ {2, 3, …, n}.
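The first preset formula of claim 2, W_{t+1} = W_t − η_t·g_t, is a plain stochastic gradient descent step on one randomly drawn sample. A minimal scalar sketch; the linear model and squared-error cost are assumptions for illustration:

```python
import random

def sgd_fast_step(w, samples, lr):
    """One short-time fast-weight update: W_{t+1} = W_t - eta_t * g_t
    for a single randomly selected sample i_s from the n samples.
    Assumed model: prediction y_hat = w * x with squared-error cost."""
    x, y = random.choice(samples)      # draw i_s from {1, ..., n}
    grad = 2.0 * (w * x - y) * x       # gradient of J at (X(i_s), Y(i_s))
    return w - lr * grad
```

For a single sample (x, y) = (1, 2) starting at w = 0 with lr = 0.1, the gradient is −4 and the updated fast weight is 0.4.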
3. The method for optimizing a loss function according to claim 2, wherein the performing of the integrated calculation on k of the first short-time fast weights to obtain the first slow weight, the first slow weight being used for indicating a result obtained after the machine learning task is iterated by using the first optimizer, k ∈ {2, 3, …, n}, comprises:
obtaining k continuous values of the first short-time fast weight, wherein k ∈ {2, 3, …, n};
calculating the first slow weight according to a second preset formula and the k continuous values of the first short-time fast weight, wherein in the second preset formula t is the current time, the first slow weight at time t starts from the starting point of the first short-time fast weight, W_t is the weight parameter at time t, and W_{t+k} is the weight parameter at time t+k; the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer.
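The "integrated calculation" of claims 3 and 5 reduces k consecutive fast weights to one slow weight. The filing's formula image is not reproduced on this page, so the plain average below is an assumed reading:

```python
def slow_weight(fast_weights):
    """Integrated calculation (assumed): average k consecutive
    short-time fast weights, k >= 2, into one slow weight."""
    k = len(fast_weights)
    assert k >= 2, "claims require k in {2, 3, ..., n}"
    return sum(fast_weights) / k
```

Averaging the trajectory rather than keeping only the last fast weight smooths out the noise of single-sample updates, which is the usual motivation for fast/slow weight schemes.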
4. The method for optimizing the loss function according to claim 1, wherein the training of the machine learning task by using the second optimizer to obtain a second slow weight, the second slow weight being used for indicating a result obtained after the machine learning task is iterated by using the second optimizer, comprises:
randomly selecting a sample i from n training samples of the machine learning task by using a second optimizer, wherein i belongs to {1,2, …, n }, and n is an integer greater than 1;
calculating the second short-time fast weight W'_{t+1} obtained after i is updated, using a third preset formula, the third preset formula being:
W'_{t+1} = W'_t − η·m̂_t/(√v̂_t + ε),
wherein t is the current time, W'_t is the weight parameter of the second optimizer at time t, η is the initial learning rate, ε is a numerical stability quantity, and m̂_t and v̂_t are the correction values of the first-order and second-order momentum terms, whose expressions are as follows:
m_t = β₁·m_{t−1} + (1 − β₁)·g'_t, v_t = β₂·v_{t−1} + (1 − β₂)·(g'_t)², m̂_t = m_t/(1 − β₁^t), v̂_t = v_t/(1 − β₂^t),
wherein t is the current time, m_t is the first-order momentum term, v_t is the second-order momentum term, β₁ is the first-order momentum decay coefficient, β₂ is the second-order momentum decay coefficient, m̂_t is the correction value of m_t, and v̂_t is the correction value of v_t; the gradient g'_t = ΔJ(W'_t, i), wherein J(W') is the cost function and ΔJ(W'_t, i) is the gradient of the cost function with respect to the weight W'_t for sample i at time t;
and performing an integrated calculation on k of the second short-time fast weights to obtain the second slow weight, wherein the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer, and k ∈ {2, 3, …, n}.
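Claim 4's third preset formula, with first- and second-order momentum terms, their decay coefficients β₁ and β₂, bias-correction values, and the numerical stability quantity ε, matches the standard Adam update. A minimal scalar sketch under that reading:

```python
import math

def adam_fast_step(w, grad, state, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """W'_{t+1} = W'_t - eta * m_hat / (sqrt(v_hat) + eps), with
    bias-corrected first/second-order momentum terms (Adam-style)."""
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad        # first-order momentum m_t
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second-order momentum v_t
    m_hat = m / (1 - beta1 ** t)                       # correction value of m_t
    v_hat = v / (1 - beta2 ** t)                       # correction value of v_t
    state.update(t=t, m=m, v=v)
    return w - eta * m_hat / (math.sqrt(v_hat) + eps)
```

On the first step the bias correction makes m̂₁ equal the raw gradient, so with grad = 0.5 the weight moves by approximately η = 0.001.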
5. The method for optimizing a loss function according to claim 4, wherein the performing of the integrated calculation on k of the second short-time fast weights to obtain the second slow weight, the second slow weight being used for indicating a result obtained after the machine learning task is iterated by using the second optimizer, k ∈ {2, 3, …, n}, comprises:
obtaining k continuous values of the second short-time fast weight, wherein k ∈ {2, 3, …, n};
calculating the second slow weight according to a fourth preset formula and the k continuous values of the second short-time fast weight, wherein in the fourth preset formula t is the current time, the second slow weight at time t starts from the starting point of the second short-time fast weight, W'_t is the weight parameter at time t, and W'_{t+k} is the weight parameter at time t+k; the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer.
6. The method of claim 1, wherein the combining the first slow weight and the second slow weight according to a preset combining formula to obtain a target update weight comprises:
extracting the first slow weight and the second slow weight at time t, wherein t ∈ {0, 1, 2, …, n};
substituting the first slow weight and the second slow weight at time t into the preset combination formula to obtain the target update weight, wherein in the preset combination formula the target update weight at time t is obtained from the first slow weight at time t and the second slow weight at time t, and α is a coefficient parameter calculated according to a preset calculation formula.
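The combination formula itself is carried by an image not reproduced on this page; a natural hedged reading, given the single coefficient parameter α, is an α-weighted average of the two slow weights. The convex form below is an assumption:

```python
def merge_slow_weights(slow1, slow2, alpha):
    """Assumed preset combination formula: target update weight as
    an alpha-weighted combination of the two slow weights at time t."""
    assert 0.0 <= alpha <= 1.0, "coefficient parameter assumed in [0, 1]"
    return alpha * slow1 + (1.0 - alpha) * slow2
```

Under this reading, α trades off the SGD-style first optimizer against the adaptive second optimizer at each iteration stage.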
7. The method of claim 1, wherein the calculating the target update weight for each iteration stage until the loss function convergence is complete comprises:
acquiring a target updating weight of a first iteration stage, taking the target updating weight of the first iteration stage as a starting point of a short-time fast weight of a second iteration stage, and calculating the target updating weight of the second iteration stage;
and taking the target updating weight of the second iteration stage as a starting point of the short-time fast weight of the third iteration stage, calculating the target updating weight of the third iteration stage, and calculating the target updating weights of the rest iteration stages until the convergence of the loss function is completed.
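Claim 7 chains the iteration stages: each stage's target update weight becomes the fast-weight starting point of the next stage, continuing until the loss function converges. A sketch of that outer loop, where `stage_fn` stands in for one full two-optimizer stage and the quadratic loss is an assumed stand-in for the real training objective:

```python
def run_stages(w0, stage_fn, max_stages, tol=1e-6):
    """Use stage t's target update weight as the short-time fast-weight
    starting point of stage t+1, stopping when the loss change is < tol."""
    w, prev_loss = w0, float("inf")
    for _ in range(max_stages):
        w = stage_fn(w)                  # one stage -> target update weight
        loss = w ** 2                    # toy loss (assumed) for the stopping test
        if abs(prev_loss - loss) < tol:  # convergence of the loss function
            break
        prev_loss = loss
    return w
```

With a toy stage that halves the weight, the loop drives the weight (and hence the toy loss) toward zero before the stage budget is exhausted.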
8. An optimization apparatus for a loss function, comprising:
an obtaining module, configured to obtain a machine learning task to be optimized, wherein the machine learning task is used for indicating convergence of a loss function in a machine learning model;
the first optimization module is used for training the machine learning task by using a first optimizer to obtain a first slow weight, and the first slow weight is used for indicating a result obtained after the machine learning task is iterated by using the first optimizer;
the second optimization module is used for training the machine learning task by using a second optimizer to obtain a second slow weight, and the second slow weight is used for indicating a result obtained after the machine learning task is iterated by using the second optimizer;
the merging module is used for merging the first slow weight and the second slow weight according to a preset merging formula to obtain a target updating weight;
and the iteration module is used for calculating the target updating weight of each iteration stage until the convergence of the loss function is completed.
9. An optimization apparatus for a loss function, comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor calls the instructions in the memory to cause the optimization device of the loss function to perform the optimization method of the loss function according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of optimizing a loss function according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010405723.5A CN111738408A (en) | 2020-05-14 | 2020-05-14 | Method, device and equipment for optimizing loss function and storage medium |
PCT/CN2020/118303 WO2021139237A1 (en) | 2020-05-14 | 2020-09-28 | Method and apparatus for loss function optimization, device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010405723.5A CN111738408A (en) | 2020-05-14 | 2020-05-14 | Method, device and equipment for optimizing loss function and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111738408A true CN111738408A (en) | 2020-10-02 |
Family
ID=72647265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010405723.5A Pending CN111738408A (en) | 2020-05-14 | 2020-05-14 | Method, device and equipment for optimizing loss function and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111738408A (en) |
WO (1) | WO2021139237A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114463576B (en) * | 2021-12-24 | 2024-04-09 | 中国科学技术大学 | Network training method based on re-weighting strategy |
CN114254763B (en) * | 2021-12-27 | 2024-04-05 | 西安交通大学 | Machine learning model repairing method, system, computer equipment and storage medium |
CN114692562B (en) * | 2022-03-16 | 2024-05-24 | 北京理工大学 | High-precision hybrid dynamic priority multi-objective optimization method |
CN116630398B (en) * | 2023-07-21 | 2024-05-10 | 浙江大学海南研究院 | Optimizer momentum coefficient regulation and control method based on data set concave-convex characteristic |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909142A (en) * | 2017-11-14 | 2018-04-13 | 深圳先进技术研究院 | A kind of parameter optimization method of neutral net, system and electronic equipment |
KR102068576B1 (en) * | 2018-04-10 | 2020-01-21 | 배재대학교 산학협력단 | Convolutional neural network based image processing system and method |
CN108960318A (en) * | 2018-06-28 | 2018-12-07 | 武汉市哈哈便利科技有限公司 | A kind of commodity recognizer using binocular vision technology for self-service cabinet |
CN111028306B (en) * | 2019-11-06 | 2023-07-14 | 杭州电子科技大学 | AR2U-Net neural network-based rapid magnetic resonance imaging method |
CN111079896A (en) * | 2019-11-15 | 2020-04-28 | 苏州浪潮智能科技有限公司 | Hyper-parameter self-adaptive adjustment method and device |
2020
- 2020-05-14 CN CN202010405723.5A patent/CN111738408A/en active Pending
- 2020-09-28 WO PCT/CN2020/118303 patent/WO2021139237A1/en active Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022095432A1 (en) * | 2020-11-05 | 2022-05-12 | 平安科技(深圳)有限公司 | Neural network model training method and apparatus, computer device, and storage medium |
CN112287089A (en) * | 2020-11-23 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Classification model training and automatic question-answering method and device for automatic question-answering system |
CN112287089B (en) * | 2020-11-23 | 2022-09-20 | 腾讯科技(深圳)有限公司 | Classification model training and automatic question-answering method and device for automatic question-answering system |
CN113763501A (en) * | 2021-09-08 | 2021-12-07 | 上海壁仞智能科技有限公司 | Iteration method of image reconstruction model and image reconstruction method |
CN113763501B (en) * | 2021-09-08 | 2024-02-27 | 上海壁仞智能科技有限公司 | Iterative method of image reconstruction model and image reconstruction method |
CN114647387A (en) * | 2022-05-23 | 2022-06-21 | 南京道成网络科技有限公司 | Cache optimization method suitable for cloud storage |
CN116237935A (en) * | 2023-02-03 | 2023-06-09 | 兰州大学 | Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium |
CN116237935B (en) * | 2023-02-03 | 2023-09-15 | 兰州大学 | Mechanical arm collaborative grabbing method, system, mechanical arm and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021139237A1 (en) | 2021-07-15 |
Legal Events

Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| RJ01 | Rejection of invention patent application after publication

Application publication date: 20201002