CN115618935A - Robustness loss function searching method and system for classified task label noise - Google Patents
Robustness loss function searching method and system for classified task label noise
- Publication number
- CN115618935A (application CN202211645114.2A)
- Authority
- CN
- China
- Prior art keywords
- loss function
- noise
- function
- parameters
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Abstract
The invention discloses a robust loss function search method and system for label noise in classification tasks. The system comprises a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer (bilevel) optimization module, a robust loss function construction module and a retraining module. A deep neural network model is selected for the classification task, and parameterized loss functions of different expansion orders are constructed by Taylor expansion. A double-layer optimization algorithm body is constructed and, based on a validation set sample self-selection strategy combined with self-paced learning, outputs the group of hyper-parameters that performs best on the validation set, using either the implicit function theorem or the penalty function idea. The loss function formed from the obtained hyper-parameters is taken as the finally searched loss function robust to label noise and is used to guide retraining of the model. The method thereby achieves fast automatic search of a loss function robust to label noise, and is simple and portable.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a robust loss function search method and system for label noise in classification tasks.
Background
Label noise is ubiquitous in real-world datasets. When noisy labels are present, a deep neural network may over-fit them and harm the target task; in medical diagnosis, for example, over-fitting to noisy labels may seriously affect a doctor's judgment.
Traditional loss functions either lack robustness to label noise, or are robust but converge slowly and train poorly. The conventional categorical cross entropy (CCE) gives misclassified samples a larger contribution to the gradient, so when correct and incorrect labels coexist it tends to fit the noisy labels and is therefore not robust to label noise. The mean absolute error (MAE) loss gives all samples the same contribution to the gradient, which makes it robust to label noise, but it cannot provide effective guidance to the model during training, so convergence is slow and the trained model performs poorly.
Manually designed robust loss functions are typically based on good prior knowledge and introduce hyper-parameters that must be tuned by hand. The generalized cross entropy (GCE) introduces a hyper-parameter q to combine the respective advantages of the mean absolute error and the categorical cross entropy: GCE is equivalent to CCE as q approaches 0 and equivalent to MAE when q = 1. The symmetric cross entropy (SCE), starting from the KL divergence, defines a reverse cross entropy (RCE) that takes the model prediction as the reference distribution, and achieves robustness to label noise through a linear combination of RCE and CCE weighted by the introduced coefficients α and β. However, these loss functions are all designed from good prior knowledge, the design cost is high, and the introduced hyper-parameters make them inflexible to apply across different tasks.
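The two limiting cases of GCE mentioned above can be checked numerically. The following is a minimal sketch, assuming the commonly used form of GCE, (1 - p^q) / q, where p is the probability assigned to the true class; the function names are illustrative and do not come from the patent. MAE is written as 1 - p, which for a one-hot target equals the mean absolute error up to a constant factor.

```python
import math


def cce(p):
    """Categorical cross entropy for the true-class probability p."""
    return -math.log(p)


def mae(p):
    """MAE for a one-hot target, up to a constant factor: 1 - p."""
    return 1.0 - p


def gce(p, q):
    """Generalized cross entropy (1 - p**q) / q, for 0 < q <= 1."""
    return (1.0 - p ** q) / q


p = 0.3
print("q = 1    :", gce(p, 1.0), "vs MAE", mae(p))
print("q -> 0   :", gce(p, 1e-6), "vs CCE", cce(p))
```

Intermediate values of q interpolate between the two behaviours, which is why q has to be tuned per task.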
Traditional automatic loss function search methods, such as those combined with genetic programming, suffer from high computational cost, slow search, a discrete search space and the need for a validation set free of label noise. Following the idea of AutoML, many studies aim to automatically search for a loss function that is robust to label noise. In AutoLoss-Zero, the loss function is decomposed into polynomial combinations of different operators, a search space composed of basic operators is defined, the loss function is represented as a computation graph and constructed through basic operations on that graph, and an evolutionary algorithm is finally used to search for the loss function. Another common approach exploits the merits of existing loss functions: it introduces a group of adjustable parameters and uses a genetic algorithm to search, on a clean validation set, for a group of parameters that resists label noise well. However, because their search spaces are discrete, these methods cannot use gradient descent for fast search, and traditional grid search and genetic programming are computationally expensive and slow.
Therefore, how to provide an efficient and fast robust loss function search method and system for label noise in classification tasks is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a robust loss function search method and system for label noise in classification tasks, which solves the problems of existing methods for designing label-noise-robust loss functions: the high cost of designs based on good prior knowledge, the introduction of hyper-parameters that must be tuned manually, a discrete search space, slow search, and the need for a validation set free of label noise.
In order to achieve the purpose, the invention adopts the following technical scheme:
A robust loss function search method for classification task label noise comprises the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions of different expansion orders according to the deep neural network model and based on Taylor expansion;
S3, dividing the data set of the classification task into a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization and a clean test set for testing;
S4, setting a validation set sample self-selection strategy combined with self-paced learning;
S5, constructing a double-layer optimization algorithm body and, based on the validation set sample self-selection strategy, outputting the group of hyper-parameters that performs best on the validation set by combining the implicit function theorem or the penalty function idea;
S6, forming a new loss function from the obtained hyper-parameters as the finally searched loss function robust to label noise;
S7, retraining the deep neural network: under the guidance of the searched loss function, training a new base model on the training set with added noise and the validation set without added noise, to complete the retraining of the deep neural network.
In practical applications, the classification tasks of different difficulty include simple tasks such as handwritten digit recognition and complex tasks such as large-scale multi-class image classification.
Preferably, the parameterized classification task loss function $L_\theta$ in S2 is:
$$L_\theta(p_t) = \sum_{j=1}^{N} \theta_j (1 - p_t)^j$$
wherein $p_t$ is the probability that the model predicts for the true class, $\theta_j$ is a learnable hyper-parameter whose initial value is the corresponding coefficient of the cross-entropy Taylor expansion, and $N$ is the selected order of the cross-entropy Taylor expansion.
Preferably, the specific content of S4 includes: setting a decay weight in combination with self-paced learning; in each iteration of the outer-layer optimization, selecting the validation samples whose loss is less than or equal to the threshold determined by this weight, so that samples which the model classifies with high confidence are screened out as correctly labeled samples; the number of validation samples selected according to the loss gradually increases as the number of iterations increases.
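Preferably, the inner-layer optimization goal in S5 is, for given loss function parameters $\theta$, to obtain on the training set $D_{train}$ the best response $\omega^*(\theta)$ of the model parameters; the outer-layer optimization goal is, based on the current best response $\omega^*(\theta)$, to obtain on the validation set $D_{val}$ the parameters $\theta^*$ that optimize the metric $M$ used to evaluate the classification performance of the model. Specifically:
$$\theta^* = \arg\min_{\theta} M(\omega^*(\theta); D_{val}) \quad \text{s.t.} \quad \omega^*(\theta) = \arg\min_{\omega} L_\theta(\omega; D_{train})$$
The inner layer and the outer layer are optimized alternately to obtain the optimal loss function parameters:
$$\omega_{t+1} = \omega_t - \eta_\omega \nabla_\omega L_{\theta_t}(\omega_t; D_{train}), \qquad \theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M(\omega^*(\theta_t); D_{val})$$
wherein $\eta_\omega$ is the learning rate for updating the model parameters, $\eta_\theta$ is the learning rate for updating the hyper-parameters, $\omega$ denotes the model parameters, $\theta$ denotes the loss function parameters, and $\nabla_\theta M$ is the hyper-gradient.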
Preferably, S5 comprises:
S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns to classify simple samples correctly;
S52, in the outer-layer optimization, evaluating on the validation set, with a metric function that satisfies the differentiability condition, the performance of the model trained in the presence of noise;
S53, updating the loss function parameters by gradient descent, combining the implicit function theorem or the penalty function idea;
S54, continuing to guide the training of the model on the training set with the newly obtained loss function;
S55, repeating S52 to S54 until the iterations are finished or the algorithm converges;
S56, when the double-layer optimization algorithm ends, outputting the stored group of hyper-parameters that best guides the model to overcome the influence of classification label noise.
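Preferably, the specific content of S53 includes:
(1) Solving the hyper-gradient by combining the implicit function theorem:
by the chain rule for composite functions, the computation of the hyper-gradient can be transformed into the form:
$$\nabla_\theta M = \frac{\partial M(\omega^*(\theta_t); D_{val})}{\partial \theta} + \left(\frac{\partial \omega^*(\theta_t)}{\partial \theta}\right)^{\top} \frac{\partial M(\omega^*(\theta_t); D_{val})}{\partial \omega}$$
wherein $\omega^*(\theta_t)$ is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters $\theta$ at time $t$, $\partial \omega^*(\theta_t)/\partial \theta$ is the gradient of the inner-layer best response with respect to the hyper-parameters, $M$ is the metric function that evaluates how robust the model trained under noisy conditions is to classification label noise, and $D_{val}$ is the validation set; the first term on the right-hand side is identically zero and can be ignored.
Provided the loss function $L_\theta$ has a second derivative with respect to the model parameters $\omega$, the implicit function theorem gives:
$$\frac{\partial \omega^*}{\partial \theta} = -H^{-1}\, \frac{\partial^2 L_\theta}{\partial \omega\, \partial \theta^{\top}}, \qquad H = \frac{\partial^2 L_\theta}{\partial \omega\, \partial \omega^{\top}}$$
wherein $H$ is the Hessian matrix, with the derivatives of $L_\theta$ evaluated on the training set at $\omega^*(\theta_t)$; the inverse of the Hessian matrix is efficiently approximated by a Neumann series truncated after $K$ terms:
$$H^{-1} \approx \sum_{k=0}^{K} (I - H)^k$$
The hyper-gradient formula combined with the implicit function theorem is:
$$\nabla_\theta M \approx -\left(\frac{\partial^2 L_\theta}{\partial \omega\, \partial \theta^{\top}}\right)^{\top} \sum_{k=0}^{K} (I - H)^k\, \frac{\partial M}{\partial \omega}$$
From the resulting hyper-gradient, the loss function parameters are updated with learning rate $\eta_\theta$:
$$\theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M$$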
(2) Constructing an auxiliary function for the inequality-constrained optimization problem by combining the penalty function idea:
the auxiliary function is formed from an objective function and a constraint function, with $\sigma$ and $\varepsilon$ as adjustable hyper-parameters; the loss function parameters $\theta$ and the model parameters $\omega$ appear in it explicitly, so when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters $\theta$ is computed.
Preferably, the specific content of S54 is: after the new loss function parameters are obtained from S53, the model parameters are updated on the training set over a set number of batches of data according to the new loss function, yielding the best response of the model parameters to the new loss function parameters.
Preferably, the specific content of S55 is: a fixed number of training rounds is set, and the parameter search ends once the model has been trained on the training set for that number of rounds; alternatively, the loss function parameter search ends when the average loss of the model on the validation set rises for a set number of consecutive rounds.
A robust loss function search system for classification task label noise comprises a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer optimization module, a robust loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions of different expansion orders according to the deep neural network model and based on Taylor expansion;
the classification task data set division module is used for outputting, from the data set of the classification task, a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization and a clean test set for testing;
the self-paced learning module is used for outputting the configured validation set sample self-selection strategy combined with self-paced learning;
the double-layer optimization module is used for constructing the double-layer optimization algorithm body, invoking the validation set sample self-selection strategy combined with self-paced learning, and outputting the group of hyper-parameters that performs best on the validation set by combining the implicit function theorem or the penalty function idea;
the robust loss function construction module is used for forming a new loss function from the obtained hyper-parameters and outputting the finally searched loss function robust to classification task label noise;
the retraining module is used for training, under the guidance of the searched loss function, a new base model on the training set with added noise and the validation set without added noise, so as to complete the retraining of the deep neural network.
Preferably, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
the inner-layer optimization unit is used for outputting the best response of the model parameters obtained on the training set for given loss function parameters;
the outer-layer optimization unit is used for invoking, based on the current best response, the validation set sample self-selection strategy combined with self-paced learning, and outputting the parameters obtained on the validation set that optimize the metric used to evaluate the classification performance of the model.
According to the above technical solution, compared with the prior art, the invention discloses a robust loss function search method and system for classification task label noise. By means of Taylor expansion, a continuous search space is constructed that is parameterized by a moderate number of variables and can represent a sufficiently wide range of functions, which creates the conditions for gradient-descent-based loss function search. By combining self-paced learning with gradient-descent-based double-layer optimization, the method searches for the loss function in the continuous search space by gradient descent on a validation set that contains label noise, and thus achieves fast automatic search of a loss function robust to label noise. Because the search algorithm is simple and portable, it can easily be deployed in different classification tasks, is flexible to apply, and provides a new approach to overcoming label noise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic diagram of a robustness loss function search method provided by the present invention;
FIG. 2 is a schematic diagram of the double-layer optimization structure used to search the loss function parameters by gradient descent.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a robust loss function search method for classification task label noise, which comprises the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions of different expansion orders according to the deep neural network model and based on Taylor expansion;
S3, dividing the data set of the classification task into a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization and a clean test set for testing;
S4, setting a validation set sample self-selection strategy combined with self-paced learning;
S5, constructing a double-layer optimization algorithm body and, based on the validation set sample self-selection strategy, outputting the group of hyper-parameters that performs best on the validation set by combining the implicit function theorem or the penalty function idea;
S6, forming a new loss function from the obtained hyper-parameters as the finally searched loss function robust to label noise;
S7, retraining the deep neural network: under the guidance of the searched loss function, training a new base model on the training set with added noise and the validation set without added noise, to complete the retraining of the deep neural network.
In practical applications, for an image classification task, residual neural network (ResNet) models of different depths can be selected according to how difficult the provided image data is to learn; for an easily learned image classification data set such as MNIST, a convolutional neural network (CNN) with a few layers can simply be defined as the base model.
In this embodiment, the loss function is parameterized through the Taylor expansion of the conventional categorical cross entropy. For classification problems, sample labels generally use one-hot encoding; on this basis the cross-entropy loss is expanded into a Taylor polynomial of the form:
$$L_{CE} = -\log(p_t) = \sum_{j=1}^{\infty} \frac{1}{j}(1 - p_t)^j$$
where $p_t$ is the probability that the model predicts for the true class. The coefficient $1/j$ of the j-th term of the expanded cross-entropy loss cancels the power $j$ produced when the polynomial term $(1 - p_t)^j$ is differentiated. Evidently, the coefficients of the expansion are not the optimal group of coefficients when label noise is present, so the combination of the first N terms of the expansion is taken as the loss function and the coefficients are set as learnable parameters $\theta_j$, giving the parameterized loss function $L_\theta$.
In order to further implement the above technical solution, the parameterized classification task loss function $L_\theta$ in S2 is:
$$L_\theta(p_t) = \sum_{j=1}^{N} \theta_j (1 - p_t)^j$$
wherein $p_t$ is the probability that the model predicts for the true class, $\theta_j$ is a learnable parameter whose initial value is the corresponding cross-entropy expansion coefficient $1/j$, and $N$ is the selected order of the cross-entropy Taylor expansion.
In this embodiment, the coefficients obtained from the Taylor expansion of the cross-entropy loss are constants, and on differentiation they cancel the effect of the powers; the non-robustness of cross entropy to label noise in a specific task is therefore caused by the unreasonable way these coefficients allocate the gradient contribution of each term. For this reason, this patent treats the coefficients of the Taylor expansion as learnable parameters in order to find a group of coefficients that is robust to label noise.
The first term is the mean absolute error (MAE) loss, which is robust to label noise, showing that existing robust loss functions are special cases within the search space; meanwhile, since polynomials can approximate a wide range of functions, the search space constructed from the Taylor expansion can represent a sufficiently wide range of functions.
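The parameterization described above can be illustrated with a short sketch. It assumes the loss takes the form of a sum over j of theta_j * (1 - p)^j with theta_j initialized to 1/j, as the surrounding text implies; the function names are hypothetical and not taken from the patent.

```python
import math


def taylor_ce_coefficients(n_terms):
    """Coefficients 1/j of the expansion -log(p) = sum_{j>=1} (1 - p)**j / j."""
    return [1.0 / j for j in range(1, n_terms + 1)]


def parameterized_loss(p_true, theta):
    """L_theta(p) = sum_j theta_j * (1 - p)**j, with theta as a list of length N."""
    return sum(t * (1.0 - p_true) ** j for j, t in enumerate(theta, start=1))


def loss_grad_wrt_p(p_true, theta):
    """dL/dp = -sum_j j * theta_j * (1 - p)**(j - 1)."""
    return -sum(j * t * (1.0 - p_true) ** (j - 1)
                for j, t in enumerate(theta, start=1))


theta_init = taylor_ce_coefficients(5)          # N = 5
print(parameterized_loss(0.9, theta_init), -math.log(0.9))

# Keeping only the first term reduces the loss to the MAE-style term 1 - p.
print(parameterized_loss(0.3, [1.0, 0.0, 0.0, 0.0, 0.0]))

# With theta_j = 1/j, every term of the gradient has unit weight:
# the coefficient 1/j cancels the power j.
print(loss_grad_wrt_p(0.5, theta_init))
```

Because the loss is linear in theta, its gradient with respect to each theta_j is simply (1 - p)^j, which is what makes a gradient-based search over the coefficients straightforward.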
In order to further implement the above technical solution, the specific content of S4 includes: setting a decay weight in combination with self-paced learning; in each iteration of the outer-layer optimization, selecting the validation samples whose loss is less than or equal to the threshold determined by this weight for evaluating the model parameters, so that samples which the model classifies with high confidence are screened out as correctly labeled samples and provide supervision information for optimizing the outer-layer problem; the value of the weight is gradually reduced as the number of iterations increases, and as it tends to 0 the validation samples selected according to the loss gradually increase.
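A minimal sketch of such a selection rule follows. The exact threshold formula is not recoverable from the text, so the sketch assumes the classic self-paced learning convention in which a weight k sets the loss threshold 1/k, and k is halved each iteration; `select_confident`, `k` and the halving schedule are all hypothetical.

```python
def select_confident(losses, k):
    """Return indices of validation samples whose loss is <= 1/k.

    Assumption: the threshold is the reciprocal of the decay weight k,
    so a smaller k admits more samples.
    """
    threshold = 1.0 / k
    return [i for i, loss in enumerate(losses) if loss <= threshold]


# Per-sample validation losses under the current model (illustrative values).
losses = [0.05, 0.4, 1.2, 2.5, 0.9]

k = 4.0
for iteration in range(4):
    chosen = select_confident(losses, k)
    print(f"iteration {iteration}: k = {k}, selected = {chosen}")
    k *= 0.5  # decay the weight, which raises the threshold
```

Samples with small loss are treated as likely to be correctly labeled; as the weight decays, harder samples are admitted gradually, matching the behaviour described in S4.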
In order to further implement the above technical solution, the inner-layer optimization goal in S5 is, for given loss function parameters $\theta$, to obtain on the training set $D_{train}$ the best response $\omega^*(\theta)$ of the model parameters; the outer-layer optimization goal is, based on the current best response $\omega^*(\theta)$, to obtain on the validation set $D_{val}$ the parameters $\theta^*$ that optimize the metric $M$ used to evaluate the classification performance of the model. Specifically:
$$\theta^* = \arg\min_{\theta} M(\omega^*(\theta); D_{val}) \quad \text{s.t.} \quad \omega^*(\theta) = \arg\min_{\omega} L_\theta(\omega; D_{train})$$
The inner layer and the outer layer are optimized alternately to obtain the optimal loss function parameters:
$$\omega_{t+1} = \omega_t - \eta_\omega \nabla_\omega L_{\theta_t}(\omega_t; D_{train}), \qquad \theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M(\omega^*(\theta_t); D_{val})$$
wherein $\eta_\omega$ is the learning rate for updating the model parameters, $\eta_\theta$ is the learning rate for updating the hyper-parameters, $\omega$ denotes the model parameters, $\theta$ denotes the loss function parameters, and $\nabla_\theta M$ is the hyper-gradient.
In order to further implement the above technical solution, S5 includes:
S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns to classify simple samples correctly;
in this embodiment, in the early stage of training the model has not yet fitted the mislabeled samples and therefore generalizes well; meanwhile, to ensure that the model can already distinguish simple samples when validation samples are selected by the self-paced learning method, a warm-up stage is set in the double-layer optimization process, during whose initial rounds the network model is trained normally under the guidance of the initialized loss function;
S52, in the outer-layer optimization, evaluating on the validation set, with a metric function that satisfies the differentiability condition, the performance of the model trained in the presence of noise;
in this embodiment, based on the validation samples selected in combination with self-paced learning, the cross entropy, which works well for classification problems, is chosen as the metric function $M$, and the cross-entropy loss of the current model on the selected samples is taken as the validation set metric;
S53, updating the loss function parameters by gradient descent, combining the implicit function theorem or the penalty function idea;
S54, continuing to guide the training of the model on the training set with the newly obtained loss function;
S55, repeating S52 to S54 until the iterations are finished or the algorithm converges;
S56, when the double-layer optimization algorithm ends, outputting the stored group of hyper-parameters that best guides the model to overcome the influence of classification label noise.
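The control flow of S51 to S56 can be sketched on a toy problem. Everything below is an illustrative stand-in, not the patented implementation: the "model" is a single scalar omega, the training loss is 0.5*(omega - theta)^2 + 0.5*omega^2 with one scalar loss parameter theta, the validation metric is a squared error instead of cross entropy, and all function and argument names are hypothetical.

```python
def inner_step(omega, theta, lr):
    """One gradient step on the toy training loss 0.5*(omega-theta)**2 + 0.5*omega**2."""
    grad_omega = (omega - theta) + omega
    return omega - lr * grad_omega


def val_metric(omega, target=1.0):
    """Toy differentiable validation metric (the source uses cross entropy)."""
    return 0.5 * (omega - target) ** 2


def hyper_gradient(omega, target=1.0):
    """Implicit-function-theorem hyper-gradient for the toy problem.

    d2L/d(omega)2 = 2 and d2L/d(omega)d(theta) = -1, so d(omega*)/d(theta) = 0.5.
    """
    return 0.5 * (omega - target)


def search(theta0, warmup_rounds=5, search_rounds=200, batches_per_round=20,
           lr_omega=0.1, lr_theta=0.2):
    omega, theta = 0.0, theta0
    # S51: warm-up under the initialized loss function.
    for _ in range(warmup_rounds * batches_per_round):
        omega = inner_step(omega, theta, lr_omega)
    best_theta, best_val = theta, val_metric(omega)
    for _ in range(search_rounds):
        # S52: evaluate the current model on the validation set.
        val = val_metric(omega)
        if val < best_val:
            best_theta, best_val = theta, val  # stored for S56
        # S53: update the loss function parameter by gradient descent.
        theta -= lr_theta * hyper_gradient(omega)
        # S54: continue training under the new loss function.
        for _ in range(batches_per_round):
            omega = inner_step(omega, theta, lr_omega)
    # S56: return the stored best loss function parameter.
    return best_theta, best_val


best_theta, best_val = search(theta0=0.5)
print(best_theta, best_val)
```

In this toy setting the best response is omega* = theta / 2, so the search should drive theta toward 2, where the validation metric vanishes. S55 is represented here only by a fixed round budget.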
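In order to further implement the above technical solution, the specific content of S53 includes:
(1) Solving the hyper-gradient by combining the implicit function theorem:
by the chain rule for composite functions, the computation of the hyper-gradient can be transformed into the form:
$$\nabla_\theta M = \frac{\partial M(\omega^*(\theta_t); D_{val})}{\partial \theta} + \left(\frac{\partial \omega^*(\theta_t)}{\partial \theta}\right)^{\top} \frac{\partial M(\omega^*(\theta_t); D_{val})}{\partial \omega}$$
wherein $\omega^*(\theta_t)$ is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters $\theta$ at time $t$, $\partial \omega^*(\theta_t)/\partial \theta$ is the gradient of the inner-layer best response with respect to the hyper-parameters, $M$ is the metric function that evaluates how robust the model trained under noisy conditions is to classification label noise, and $D_{val}$ is the validation set; the first term on the right-hand side is identically zero and can be ignored;
in this embodiment, the gradient term directly related to both the model parameters $\omega$ and the loss function parameters $\theta$, namely $\partial L_\theta/\partial \omega$, is the gradient of the training set loss with respect to the model parameters and can be regarded as a function $g(\omega, \theta)$ of $\omega$ and $\theta$; when the parameters $\omega$ in the inner-layer optimization reach the best response, the first-order optimality condition gives $\partial L_\theta/\partial \omega = 0$, i.e. $g(\omega^*(\theta), \theta) = 0$. If the conditions of the implicit function theorem are satisfied and the loss function $L_\theta$ has a second derivative with respect to the model parameters $\omega$, the implicit function theorem gives:
$$\frac{\partial \omega^*}{\partial \theta} = -H^{-1}\, \frac{\partial^2 L_\theta}{\partial \omega\, \partial \theta^{\top}}, \qquad H = \frac{\partial^2 L_\theta}{\partial \omega\, \partial \omega^{\top}}$$
wherein $H$ is the Hessian matrix, with the derivatives of $L_\theta$ evaluated on the training set at $\omega^*(\theta_t)$; the inverse of the Hessian matrix is efficiently approximated by a Neumann series truncated after $K$ terms:
$$H^{-1} \approx \sum_{k=0}^{K} (I - H)^k$$
The hyper-gradient formula combined with the implicit function theorem is:
$$\nabla_\theta M \approx -\left(\frac{\partial^2 L_\theta}{\partial \omega\, \partial \theta^{\top}}\right)^{\top} \sum_{k=0}^{K} (I - H)^k\, \frac{\partial M}{\partial \omega}$$
From the resulting hyper-gradient, the loss function parameters are updated with learning rate $\eta_\theta$:
$$\theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M$$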
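The implicit-function-theorem hyper-gradient with a Neumann-series inverse can be sketched on a small quadratic problem where the exact answer is known. This is an illustration under stated assumptions, not the patented implementation: the toy training loss is 0.5 w^T A w - w^T B theta (so the Hessian is A and the mixed derivative is -B), the toy metric is 0.5 ||w - c||^2, the series used is the sum over k of (I - H)^k (which requires the norm of I - H to be below 1), and all names are hypothetical.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def neumann_inverse_hvp(h, v, n_terms=30):
    """Approximate H^{-1} v by sum_{k=0}^{n_terms} (I - H)^k v."""
    term = list(v)    # (I - H)^0 v
    total = list(v)
    for _ in range(n_terms):
        hv = matvec(h, term)
        term = [t - x for t, x in zip(term, hv)]      # apply (I - H) once more
        total = [s + t for s, t in zip(total, term)]
    return total


# Toy problem: L_train(w, theta) = 0.5 w^T A w - w^T B theta, M(w) = 0.5 ||w - c||^2.
A = [[1.0, 0.2], [0.2, 0.8]]   # Hessian of L_train with respect to w
B = [[1.0, 0.0], [0.5, 1.0]]
c = [1.0, -1.0]


def best_response(theta):
    """Closed-form inner solution of A w = B theta for the 2x2 case."""
    r = matvec(B, theta)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * r[0] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]


def metric(theta):
    w = best_response(theta)
    return 0.5 * sum((wi - ci) ** 2 for wi, ci in zip(w, c))


def hypergradient(theta, n_terms=30):
    w = best_response(theta)
    grad_w_metric = [wi - ci for wi, ci in zip(w, c)]          # dM/dw
    q = neumann_inverse_hvp(A, grad_w_metric, n_terms)         # approx. H^{-1} dM/dw
    mixed = [[-b for b in row] for row in B]                   # d2L/(dw dtheta) = -B
    return [-x for x in matvec(transpose(mixed), q)]           # -(mixed)^T H^{-1} dM/dw


theta = [0.3, -0.2]
print(hypergradient(theta))
```

For a real network the Hessian is never formed explicitly; each application of (I - H) would be replaced by a Hessian-vector product from automatic differentiation, which is what makes the Neumann approximation attractive.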
(2) Constructing an auxiliary function by combining the penalty function idea:
in this embodiment, the first-order optimality condition is satisfied when the parameter ω reaches the optimal response, and the inner-layer optimization objective is recast as a soft constraint by means of the penalty function idea: when the model parameter ω reaches the optimal response, the loss of the model on the training set is sufficiently small. The outer-layer optimization objective is to obtain the parameters that achieve the best metric on the validation set. The two-layer optimization problem is thereby formalized as an inequality-constrained optimization problem:
constructing the auxiliary function gives:
where the first function is the objective function, for which the cross-entropy loss is selected in the specific implementation, so that the searched loss function guides the trained model to the best robustness to classification label noise on the validation set; the second function is the constraint function, which ensures that the loss of the model on the training set is sufficiently small; σ and ε are adjustable hyperparameters; and the loss function parameter θ and the model parameter ω appear explicitly. When the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameter θ is calculated:
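A minimal sketch of the penalty-function idea follows. The exact auxiliary function of the embodiment is not recoverable from the text, so the quadratic exterior penalty used here, the reading of ε as the tolerance on the training loss, and the reading of σ as the penalty weight are all assumptions, as are the function names.

```python
from typing import List


def auxiliary(objective: float, constraint: float,
              sigma: float, epsilon: float) -> float:
    """Assumed form: objective + sigma * max(0, constraint - epsilon)^2.

    `objective` stands for the validation metric and `constraint`
    for the training-set loss; the penalty is zero while the
    training loss stays within the tolerance epsilon.
    """
    violation = max(0.0, constraint - epsilon)
    return objective + sigma * violation ** 2


def auxiliary_grad_theta(grad_objective: List[float],
                         grad_constraint: List[float],
                         constraint: float,
                         sigma: float, epsilon: float) -> List[float]:
    """First-order partial derivative of the assumed auxiliary function
    with respect to the loss function parameters theta."""
    violation = max(0.0, constraint - epsilon)
    return [go + 2.0 * sigma * violation * gc
            for go, gc in zip(grad_objective, grad_constraint)]


# Hypothetical values: the training loss 0.5 violates the tolerance 0.1.
value = auxiliary(objective=1.0, constraint=0.5, sigma=10.0, epsilon=0.1)
grad = auxiliary_grad_theta([0.3, -0.2], [1.0, 2.0], 0.5, 10.0, 0.1)
```

Because θ and ω both appear explicitly in such an auxiliary function, its gradient with respect to θ needs no inverse Hessian, which is the practical difference from the implicit-function-theorem route.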
To further implement the above technical solution, the specific content of S54 is: after the new loss function parameters are obtained in S53, the model parameters are updated iteratively on a set number of batches of training data under the new loss function, giving the optimal response of the model parameters to the new loss function parameters.
To further implement the above technical solution, the specific content of S55 is: a fixed number of training rounds is set; the parameter search process ends when the model has been trained on the training set for that number of rounds, or when the average loss of the model on the validation set rises for a set number of consecutive rounds.
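The stopping rule of S55 can be sketched as a small predicate. The names `max_rounds` and `patience` are assumptions standing in for the fixed number of training rounds and the number of consecutive rising rounds, whose symbols are not preserved in the text.

```python
from typing import List


def should_stop(rounds_done: int, val_losses: List[float],
                max_rounds: int, patience: int) -> bool:
    """Return True when the search should end.

    The search ends after `max_rounds` training rounds, or when the
    average validation loss has risen in each of the last `patience`
    consecutive rounds.
    """
    if rounds_done >= max_rounds:
        return True
    if len(val_losses) <= patience:
        return False  # not enough history to observe `patience` rises
    recent = val_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical history: the validation loss rose in the last two rounds.
stop_now = should_stop(rounds_done=4, val_losses=[1.0, 0.9, 0.95, 1.0],
                       max_rounds=10, patience=2)
```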
The specific embodiment is as follows:
Based on the CIFAR10 dataset, a loss function robust to uniformly distributed label noise is searched for the classification of real objects: airplane, bird, cat, dog, horse, ship, truck, automobile, deer, and frog.
S1, for the real-object classification problem on the CIFAR10 dataset, a common 18-layer or 32-layer residual neural network (ResNet18, ResNet32) is selected as the base model for classification prediction.
S2, the cross-entropy loss, which guides classification well on clean datasets, is expanded as a Taylor series; the first N = 5 terms are kept as the loss function, and the coefficient of each term is taken as a learnable parameter.
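As a sketch of the parameterization in S2, the example below uses the standard expansion of $-\log p$ around $p = 1$, namely $\sum_{n \ge 1} (1 - p)^n / n$, truncated to N = 5 terms. The choice of expansion point is an assumption, since the text does not state it, and the function names are hypothetical. The coefficients start at their Taylor values 1/n and would then be treated as the learnable loss function parameters.

```python
import math
from typing import List


def initial_coefficients(n_terms: int = 5) -> List[float]:
    """Taylor coefficients 1/n of -log(p) expanded around p = 1."""
    return [1.0 / n for n in range(1, n_terms + 1)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def taylor_ce_loss(p_true: float, coeffs: List[float]) -> float:
    """Parameterized loss sum_n coeffs[n-1] * (1 - p_true)^n.

    `p_true` is the predicted probability of the labelled class.
    """
    return sum(c * (1.0 - p_true) ** n for n, c in enumerate(coeffs, start=1))


coeffs = initial_coefficients(5)
probs = softmax([2.0, 0.5, -1.0])         # hypothetical 3-class logits
loss = taylor_ce_loss(probs[0], coeffs)   # sample labelled as class 0
```

With the initial coefficients the truncated loss stays close to cross-entropy for confident predictions, but unlike cross-entropy it remains bounded as the predicted probability of the labelled class approaches 0.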
S3, the 50000 training samples in the CIFAR10 dataset are divided into a training set containing 45000 samples and a validation set containing 5000 samples. Because the original CIFAR10 dataset contains no label noise, uniformly distributed noise at rate p, of the kind assumed to exist in real data, is artificially added to the training set and the validation set; that is, each sample label is randomly changed to a label of another class with probability p.
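The split and the noise injection of S3 can be sketched as follows. The label list is a stand-in for the real CIFAR10 labels, and the noise rate 0.4, the fixed seed, and the function names are assumptions chosen only for illustration.

```python
import random
from typing import List, Tuple


def split_indices(n_samples: int, n_val: int,
                  rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them into (train, validation)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[n_val:], indices[:n_val]


def add_uniform_label_noise(labels: List[int], num_classes: int, p: float,
                            rng: random.Random) -> List[int]:
    """With probability p, replace each label by a different class
    chosen uniformly from the remaining classes."""
    noisy = []
    for y in labels:
        if rng.random() < p:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy


rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
train_idx, val_idx = split_indices(50000, 5000, rng)
clean_labels = [i % 10 for i in range(50000)]  # stand-in for CIFAR10 labels
noisy_train = add_uniform_label_noise(
    [clean_labels[i] for i in train_idx], 10, 0.4, rng)
noisy_val = add_uniform_label_noise(
    [clean_labels[i] for i in val_idx], 10, 0.4, rng)
```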
S4, a degradation weight combined with self-paced learning is set, ensuring that in the outer-layer optimization of the iterative process the number of validation set samples selected according to their loss increases as the number of iterations increases.
S51, the first several rounds of the training process are set as the warm-up phase, during which the parameters of the loss function remain unchanged.
S52, after the warm-up phase ends, outer-layer optimization begins. In each round of training, each time the model parameters have been trained on a set number of batches of training data, the validation set samples whose loss does not exceed the current degradation weight are selected, giving the selected validation subset. Batches of validation data are then sampled from this subset, and the metric function is used to evaluate the current model on the selected samples, giving the outer-layer metric.
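The self-paced selection in S4 and S52 can be sketched as a threshold on the per-sample validation loss that is relaxed over iterations. The geometric schedule, its `initial` and `growth` values, and the function names are assumptions; the text only requires that the number of selected samples grow with the iteration count.

```python
from typing import List


def threshold_schedule(initial: float, growth: float, iteration: int) -> float:
    """Assumed schedule: the loss threshold grows geometrically."""
    return initial * growth ** iteration


def select_easy_samples(losses: List[float], threshold: float) -> List[int]:
    """Indices of validation samples whose loss does not exceed the threshold."""
    return [i for i, loss in enumerate(losses) if loss <= threshold]


# Hypothetical per-sample validation losses of the current model.
val_losses = [0.1, 2.0, 0.5, 0.9]
selected_per_iteration = [
    select_easy_samples(val_losses, threshold_schedule(0.5, 2.0, it))
    for it in range(3)
]
```

Low-loss samples are the ones the model classifies with high confidence, so early iterations evaluate the metric mostly on samples whose labels are likely correct.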
S53, batches of training data are sampled from the training set and the loss of the current model is calculated on them. Combined with the outer-layer metric obtained in S52, the parameters of the current loss function are updated by gradient descent using the loss function parameter updating algorithm based on the implicit function theorem or the penalty function idea, giving a new group of loss function parameters.
S54, under the new loss function, the model is trained on a set number of batches of training data.
S55, steps S52 to S54 are repeated until the set number of training rounds is completed, or until the loss of the model on the validation set rises for a set number of consecutive rounds during training, at which point the loss function parameter search process ends.
S56, when the double-layer optimization algorithm finishes, the group of hyperparameters with the best effect is output, and the loss function formed by this group of parameters is taken as the finally searched loss function robust to label noise.
S6, under the guidance of the searched loss function, a new base model is trained on the noise-added training set and the noise-free validation set, completing the retraining process of the deep neural network. The classification model obtained in this process is robust to label noise.
The robustness loss function search system for classification task label noise comprises a deep neural network model selection module, a loss function parameterization module, a classification task dataset division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module, and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty degree of the classification task;
specifically, according to the difficulty degree of a classification task, selecting base models with different adaptation degrees as basic deep neural network models of the classification task, wherein the input of the base models is the original data of samples, and the output is the classification result of the models on the samples;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
specifically, the traditional loss functions of different classification tasks are expanded as Taylor series, the first N terms of the Taylor polynomial are kept as the loss function, the coefficient of each term is taken as a learnable parameter, and the current coefficient values are used as the initial parameter values, thereby parameterizing the loss function;
the classification task dataset division module is used for outputting, from the dataset of the classification task, a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization, and a clean test set for testing;
the self-paced learning module is used for outputting the validation set sample self-selection strategy combined with self-paced learning;
specifically, when the metric of the model on the validation set is calculated in the double-layer optimization algorithm, the number of small-loss samples selected gradually increases as the number of training rounds increases, so that self-paced learning from easy samples to hard samples is realized on the validation set;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the validation set sample self-selection strategy combined with self-paced learning, and outputting the group of hyperparameters with the best effect on the validation set by means of the implicit function theorem or the penalty function idea;
specifically, with the implicit function theorem, the complex functional relation of the model parameters with respect to the loss function parameters is regarded as an implicit function, the derivative of the model parameters with respect to the loss function parameters is solved using the implicit function theorem, and the gradient of the model's validation set metric with respect to the loss function parameters is then calculated; with the penalty function idea, an unconstrained optimization auxiliary function is constructed from the constraint conditions, an explicit expression of the validation set metric in terms of the loss function parameters is obtained, and the gradient of the metric with respect to the loss function parameters is calculated;
the robustness loss function construction module is used for forming a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to classification task label noise;
and the retraining module is used for training a new base model on the noise-added training set and the noise-free validation set under the guidance of the searched loss function, completing the retraining process of the deep neural network.
In order to further implement the technical scheme, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner layer optimization unit for outputting the optimal response of the model parameters obtained on the training set under the condition of given loss function parameters;
an outer-layer optimization unit for, based on the current optimal response, calling the validation set sample self-selection strategy combined with self-paced learning and outputting the parameters, obtained on the validation set, that optimize the metric used to evaluate the classification performance of the model.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A robustness loss function searching method for classification task label noise, characterized by comprising the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulty;
S2, constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model, based on the Taylor expansion method;
S3, dividing the dataset of the classification task into a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization, and a clean test set for testing;
S4, setting a validation set sample self-selection strategy combined with self-paced learning;
S5, constructing the main body of a double-layer optimization algorithm and, based on the validation set sample self-selection strategy, outputting the group of hyperparameters with the best effect on the validation set by means of the implicit function theorem or the penalty function idea;
S6, forming a new loss function from the obtained hyperparameters as the finally searched loss function robust to label noise;
and S7, retraining the deep neural network: under the guidance of the searched loss function, training a new base model on the noise-added training set and the noise-free validation set to complete the retraining process of the deep neural network.
2. The robustness loss function searching method for classification task label noise according to claim 1, wherein the parameterized classification task loss function in S2 is:
3. The robustness loss function searching method for classification task label noise according to claim 1, wherein the specific content of S4 comprises: setting a degradation weight combined with self-paced learning; in each iteration of the outer-layer optimization process, selecting the samples whose loss is less than or equal to the degradation weight, so that samples the model classifies with high confidence are screened out as samples with correct classification labels; and gradually increasing the number of validation set samples selected according to loss as the number of iterations increases.
4. The robustness loss function searching method for classification task label noise according to claim 1, wherein the inner-layer optimization objective in S5 is to obtain, on the training set, the optimal response of the model parameters under the given loss function parameters; the outer-layer optimization objective is to obtain, on the validation set and based on the current optimal response, the parameters that optimize the metric used to evaluate the classification performance of the model, specifically:
and the inner layer and the outer layer are optimized alternately to obtain the optimal loss function parameters:
5. The robustness loss function searching method for classification task label noise according to claim 1, wherein S5 comprises:
S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns to classify simple samples correctly;
S52, in the outer-layer optimization, evaluating on the validation set the performance of the model trained in the presence of noise, using a metric function that satisfies the differentiability condition;
S53, updating the parameters of the loss function by gradient descent, by means of the implicit function theorem or the penalty function idea;
S54, continuing to guide the training of the model on the training set with the newly obtained loss function;
S55, repeating steps S52 to S54 until the iterations are finished or the algorithm converges;
and S56, when the double-layer optimization algorithm finishes, outputting the stored group of hyperparameters that best guides the model to overcome the influence of classification label noise.
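6. The robustness loss function searching method for classification task label noise according to claim 5, wherein the specific content of S53 includes:
(1) Solving the hypergradient by combining the implicit function theorem:
by the chain rule for differentiating composite functions, the computation of the hypergradient can be transformed into the following form:

$$\nabla_\theta \mathcal{M}(D_{val}; \omega^*(\theta_t)) = \nabla_\omega \mathcal{M}(D_{val}; \omega^*(\theta_t)) \, \nabla_\theta \omega^*(\theta_t)$$

where $\omega^*(\theta_t)$ is the optimal response of the model parameters in the inner-layer optimization task to the hyperparameter $\theta$ at time $t$, $\nabla_\theta \omega^*(\theta_t)$ is the gradient of the inner-layer optimal response with respect to the hyperparameter, $\mathcal{M}$ is the metric function, and $D_{val}$ is the validation set;
assuming the loss function $\mathcal{L}_{train}$ has a second derivative with respect to the model parameter $\omega$, the implicit function theorem gives:

$$\nabla_\theta \omega^*(\theta_t) = -\left[\nabla_\omega^2 \mathcal{L}_{train}(\omega^*(\theta_t), \theta_t)\right]^{-1} \nabla_\theta \nabla_\omega \mathcal{L}_{train}(\omega^*(\theta_t), \theta_t)$$

where $H = \nabla_\omega^2 \mathcal{L}_{train}(\omega^*(\theta_t), \theta_t)$ is the Hessian matrix, and the inverse of the Hessian matrix is approximated efficiently by a Neumann series:

$$H^{-1} = \sum_{j=0}^{\infty} (I - H)^j$$

the hypergradient formula combined with the implicit function theorem is as follows:

$$\nabla_\theta \mathcal{M}(D_{val}; \omega^*(\theta_t)) = -\nabla_\omega \mathcal{M}(D_{val}; \omega^*(\theta_t)) \, H^{-1} \, \nabla_\theta \nabla_\omega \mathcal{L}_{train}(\omega^*(\theta_t), \theta_t)$$

from the resulting hypergradient, the loss function parameters are updated by gradient descent, where $\eta$ denotes the step size:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{M}(D_{val}; \omega^*(\theta_t))$$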
(2) Constructing an auxiliary function of the inequality-constrained optimization problem by combining the penalty function idea:
where the first function is the objective function and the second function is the constraint function, the auxiliary function has the form:
where σ and ε are adjustable hyperparameters, and the loss function parameter θ and the model parameter ω appear explicitly; when the parameters of the loss function are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameter θ is calculated:
7. The robustness loss function searching method for classification task label noise according to claim 5, wherein the specific content of S54 is: after the new loss function parameters are obtained in S53, the model parameters are updated iteratively on a set number of batches of training data under the new loss function, giving the optimal response of the model parameters to the new loss function parameters.
8. The robustness loss function searching method for classification task label noise according to claim 5, wherein the specific content of S55 is: a fixed number of training rounds is set; the parameter search process ends when the model has been trained on the training set for that number of rounds, or when the average loss of the model on the validation set rises for a set number of consecutive rounds.
9. A robustness loss function search system for classification task label noise, based on the robustness loss function searching method for classification task label noise of any one of claims 1 to 8, characterized by comprising a deep neural network model selection module, a loss function parameterization module, a classification task dataset division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module, and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty degree of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
the classification task dataset division module is used for outputting, from the dataset of the classification task, a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization, and a clean test set for testing;
the self-paced learning module is used for outputting the validation set sample self-selection strategy combined with self-paced learning;
the double-layer optimization module is used for constructing the main body of the double-layer optimization algorithm, calling the validation set sample self-selection strategy combined with self-paced learning, and outputting the group of hyperparameters with the best effect on the validation set by means of the implicit function theorem or the penalty function idea;
the robustness loss function construction module is used for forming a new loss function from the obtained hyperparameters and outputting the finally searched loss function that is robust to classification task label noise;
and the retraining module is used for training a new base model on the noise-added training set and the noise-free validation set under the guidance of the searched loss function, completing the retraining process of the deep neural network.
10. The robustness loss function search system for classification task label noise according to claim 9, wherein the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
an inner-layer optimization unit for outputting the optimal response of the model parameters obtained on the training set under the given loss function parameters;
an outer-layer optimization unit for, based on the current optimal response, calling the validation set sample self-selection strategy combined with self-paced learning and outputting the parameters, obtained on the validation set, that optimize the metric used to evaluate the classification performance of the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211645114.2A CN115618935B (en) | 2022-12-21 | 2022-12-21 | Robustness loss function searching method and system for classification task tag noise |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115618935A (en) | 2023-01-17 |
CN115618935B (en) | 2023-05-05 |
Family
ID=84879818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211645114.2A Active CN115618935B (en) | 2022-12-21 | 2022-12-21 | Robustness loss function searching method and system for classification task tag noise |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115618935B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN115618935B (en) | 2023-05-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||