CN115618935A

CN115618935A - Robust loss function search method and system for label noise in classification tasks

Info

Publication number: CN115618935A
Application number: CN202211645114.2A
Authority: CN
Inventors: 邓岳; 杜金阳
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2022-12-21
Filing date: 2022-12-21
Publication date: 2023-01-17
Anticipated expiration: 2042-12-21
Also published as: CN115618935B

Abstract

The invention discloses a robust loss function search method and system for label noise in classification tasks. The system comprises a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a bilevel optimization module, a robust loss function construction module and a retraining module. A deep neural network model is selected for the classification task, and parameterized loss functions with different expansion orders are constructed based on the Taylor expansion method. The main body of a bilevel optimization algorithm is constructed and, on the basis of a validation set sample self-selection strategy combined with self-paced learning, the set of hyper-parameters that performs best on the validation set is output by applying the implicit function theorem or the penalty function idea. The new loss function formed from the obtained hyper-parameters serves as the finally searched loss function robust to label noise and guides the retraining of the model. The method thereby achieves fast automatic search of a loss function robust to label noise, and is simple and portable.

Description

Robust loss function search method and system for label noise in classification tasks

Technical Field

The invention relates to the technical field of deep learning, and in particular to a robust loss function search method and system for label noise in classification tasks.

Background

Label noise is currently ubiquitous in real-world datasets. When noisy labels are present, a deep neural network tends to over-fit them, which harms the target task; in medical diagnosis, for example, over-fitting to noisy labels may seriously affect a doctor's judgment.

Traditional loss functions either are not robust to label noise or are robust but converge slowly and train poorly. The conventional categorical cross entropy (CCE) assigns larger gradient contributions to misclassified samples, so when correct and incorrect labels coexist it is more likely to fit the noisy labels, and it is therefore not robust to label noise. The mean absolute error (MAE) loss treats the gradient contributions of all samples equally and is therefore robust to label noise, but it cannot give the model effective guidance during training, so convergence is slow and the trained model performs poorly.

Manually designed robust loss functions typically rely on strong prior knowledge and introduce hyper-parameters that must be tuned by hand. Generalized cross entropy (GCE) combines the respective advantages of the mean absolute error and the categorical cross entropy by introducing a hyper-parameter q: GCE is equivalent to CCE as q approaches 0 and to MAE when q = 1. Symmetric cross entropy (SCE) starts from the KL divergence, defines a reverse cross entropy (RCE) that takes the model prediction as the reference distribution, and introduces coefficients $\alpha$ and $\beta$ to form a linear combination of RCE and CCE that is robust to label noise. However, these loss function designs all rely on strong prior knowledge, are costly to devise, and the introduced hyper-parameters make them inflexible across different tasks.

Traditional automatic loss function search methods based on genetic programming and similar techniques suffer from high computational cost, slow search, a discrete search space and the need for a validation set free of label noise. Following the idea of AutoML, many studies aim to search automatically for a loss function robust to label noise. In AutoLoss-Zero, a loss function is decomposed into polynomial combinations of different operators, a search space composed of basic operators is defined, the loss function is represented by a computation graph and constructed through basic operations on that graph, and an evolutionary algorithm finally searches for the loss function. Another common approach exploits the merits of existing loss functions by introducing a set of adjustable parameters and using a genetic algorithm to search, on a clean validation set, for a set of parameters that resists label noise well. However, because their search spaces are discrete, these methods cannot exploit fast gradient-descent search, and traditional grid search and genetic programming are computationally expensive and slow.

Therefore, how to provide an efficient and fast robust loss function search method and system for classification task label noise is a problem that those skilled in the art need to solve.

Disclosure of Invention

In view of the above, the invention provides a robust loss function search method and system for classification task label noise, which solves the problems of existing methods for designing loss functions robust to label noise: costly design that relies on strong prior knowledge, hyper-parameters that must be tuned manually, a discrete search space, slow search, and the need for a validation set with noise-free labels.

In order to achieve the purpose, the invention adopts the following technical scheme:

A robust loss function search method for classification task label noise comprises the following steps:

S1, selecting a deep neural network model suited to learning classification tasks of different difficulty;

S2, constructing parameterized classification task loss functions with different expansion orders for the deep neural network model, based on the Taylor expansion method;

S3, dividing the data set of the classification task into a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization and a clean test set for testing;

S4, setting a validation set sample self-selection strategy combined with self-paced learning;

S5, constructing the main body of the bilevel optimization algorithm and, based on the validation set sample self-selection strategy, outputting the set of hyper-parameters that performs best on the validation set by applying the implicit function theorem or the penalty function idea;

S6, forming a new loss function from the obtained hyper-parameters, which serves as the finally searched loss function robust to label noise;

S7, retraining the deep neural network: under the guidance of the searched loss function, training a new base model on the training set with added noise and the validation set without added noise, thereby completing the retraining of the deep neural network.

In practical applications, classification tasks of different difficulty include simple classification tasks such as handwritten digit recognition and complex classification tasks such as large-scale multi-class image classification.

Preferably, the parameterized classification task loss function $L_\theta$ in S2 is:

$$L_\theta(p_y) = \sum_{j=1}^{N} \theta_j\,(1 - p_y)^j$$

where $p_y$ is the probability that the model assigns to the true class, $\theta_j$ is a learnable hyper-parameter whose initial value is the corresponding coefficient of the Taylor expansion of the cross-entropy loss, and $N$ is the chosen order of the Taylor expansion of the cross-entropy loss.

Preferably, the specific content of S4 includes: setting a decay weight in combination with self-paced learning; in each iteration of the outer-layer optimization, selecting the validation samples whose loss does not exceed the threshold determined by the decay weight, so that samples the model classifies with high confidence are screened out and treated as correctly labeled; the number of validation samples selected according to the loss gradually increases as the number of iterations increases.

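Preferably, the inner-layer optimization goal in S5 is, given the loss function parameters $\theta$, to obtain on the training set $D_{train}$ the best response $\omega^*(\theta)$ of the model parameters; the outer-layer optimization goal is, based on the current best response $\omega^*(\theta)$, to obtain on the validation set $D_{val}$ the parameters $\theta^*$ that optimize the metric $M$ evaluating the classification performance of the model. Specifically:

$$\theta^* = \arg\min_{\theta} M\big(\omega^*(\theta); D_{val}\big), \qquad \text{s.t.}\quad \omega^*(\theta) = \arg\min_{\omega} L_\theta(\omega; D_{train})$$

The inner and outer layers are optimized alternately to obtain the optimal loss function parameters:

$$\omega_{t+1} = \omega_t - \eta_\omega \nabla_\omega L_{\theta_t}(\omega_t; D_{train}), \qquad \theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big)$$

where $\eta_\omega$ is the learning rate of the model parameter update, $\eta_\theta$ is the learning rate of the hyper-parameter update, $\omega$ denotes the model parameters, $\theta$ denotes the loss function parameters, and $\nabla_\theta M$ is the hypergradient.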
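The alternating scheme can be illustrated on a toy problem. The sketch below is not the patented implementation: it assumes a scalar model parameter `omega`, a scalar loss parameter `theta`, a hypothetical inner loss 0.5*(omega - theta)^2 whose best response is omega = theta, and a hypothetical validation metric 0.5*(omega - 3)^2. All function names and learning rates are illustrative.

```python
def inner_grad(omega, theta):
    """Gradient of the toy training loss 0.5*(omega - theta)**2 w.r.t. omega."""
    return omega - theta


def hypergradient(omega, target=3.0):
    """Gradient of the toy validation metric 0.5*(omega - target)**2 w.r.t. theta.

    For this toy problem the best response is omega*(theta) = theta,
    so d omega*/d theta = 1 and the chain rule gives (omega - target) * 1.
    """
    return (omega - target) * 1.0


def alternate(omega, theta, lr_model=0.5, lr_hyper=0.2, rounds=200, inner_steps=5):
    """Alternate a few inner (model) steps with one outer (loss parameter) step."""
    for _ in range(rounds):
        for _ in range(inner_steps):
            omega -= lr_model * inner_grad(omega, theta)   # inner-layer update
        theta -= lr_hyper * hypergradient(omega)            # outer-layer update
    return omega, theta


omega_final, theta_final = alternate(omega=0.0, theta=0.0)
```

In this toy setting both parameters converge to the validation optimum at 3. With a real network the inner steps would be mini-batch training steps and the hypergradient would have to be approximated, for example by the two routes described for S53.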

Preferably, S5 comprises:

S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns to classify simple samples correctly;

S52, in the outer-layer optimization, evaluating on the validation set the performance of the model trained under noisy conditions, using a metric function that satisfies the differentiability condition;

S53, updating the parameters of the loss function by gradient descent, applying the implicit function theorem or the penalty function idea;

S54, continuing to guide the training of the model on the training set with the newly obtained loss function;

S55, repeating steps S52 to S54 until the iterations are finished or the algorithm converges;

S56, when the bilevel optimization algorithm ends, outputting the stored set of hyper-parameters that best guides the model in overcoming the influence of classification label noise.

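Preferably, the specific content of S53 includes:

(1) Solving for the hypergradient using the implicit function theorem:

By the chain rule, the computation of the hypergradient can be transformed into the form:

$$\nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big) = \frac{\partial M}{\partial \theta} + \frac{\partial M}{\partial \omega}\,\frac{\partial \omega^*(\theta_t)}{\partial \theta}$$

where $\omega^*(\theta_t)$ is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters $\theta$ at time t, $\partial \omega^*(\theta_t)/\partial \theta$ is the gradient of the inner-layer best response with respect to the hyper-parameters, $M$ is the metric function that evaluates how robust the model trained under noisy conditions is to classification label noise, and $D_{val}$ is the validation set; the first term on the right-hand side is identically zero and can be ignored.

Assuming that the loss function $L_\theta(\omega; D_{train})$ is twice differentiable with respect to the model parameters $\omega$, the implicit function theorem gives:

$$\frac{\partial \omega^*(\theta_t)}{\partial \theta} = -H^{-1}\,\frac{\partial^2 L_\theta}{\partial \omega\,\partial \theta^{\top}}, \qquad H = \frac{\partial^2 L_\theta}{\partial \omega\,\partial \omega^{\top}}$$

where $H$ is the Hessian matrix; the inverse of the Hessian matrix is efficiently approximated by a Neumann series:

$$H^{-1} = \lim_{K \to \infty} \sum_{k=0}^{K} (I - H)^k$$

The hypergradient formula combined with the implicit function theorem is therefore:

$$\nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big) \approx -\frac{\partial M}{\partial \omega}\left[\sum_{k=0}^{K} (I - H)^k\right]\frac{\partial^2 L_\theta}{\partial \omega\,\partial \theta^{\top}}$$

The loss function parameters are then updated with the resulting hypergradient:

$$\theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big)$$

where $\eta_\theta$ is the learning rate of the hyper-parameter update;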

(2) Constructing an auxiliary function for the inequality-constrained optimization problem using the penalty function idea:

$$\min\ f \qquad \text{s.t.}\quad g \le 0$$

where $f$ is the objective function and $g$ is the constraint function. The auxiliary function adds to the objective function a penalty term on violations of the constraint, where σ and ε are adjustable hyper-parameters. The loss function parameters $\theta$ and the model parameters $\omega$ appear explicitly in the auxiliary function, so when the loss function parameters are updated by gradient descent, the first-order partial derivatives of the auxiliary function with respect to the loss function parameters $\theta$ are computed.
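The Neumann-series approximation of the inverse Hessian used in route (1) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the patent's code: it uses a small hand-written 2x2 symmetric positive-definite matrix as the Hessian, and a scaled form of the series, H^{-1} = alpha * sum_k (I - alpha*H)^k, which reduces to the unscaled series when alpha = 1. The scaling factor `alpha` and the function names are assumptions introduced so that the series converges for this matrix.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def neumann_inverse_hvp(hessian, vec, alpha=0.1, terms=200):
    """Approximate H^{-1} vec by alpha * sum_{k=0}^{terms} (I - alpha*H)^k vec.

    Only Hessian-vector products are needed, so H is never inverted.
    The series converges when every eigenvalue of (I - alpha*H) has
    magnitude below 1.
    """
    term = list(vec)    # (I - alpha*H)^0 vec
    total = list(vec)   # running sum of the series
    for _ in range(terms):
        hv = mat_vec(hessian, term)
        term = [t - alpha * h for t, h in zip(term, hv)]   # apply (I - alpha*H)
        total = [s + t for s, t in zip(total, term)]
    return [alpha * s for s in total]


H = [[2.0, 0.5], [0.5, 1.0]]   # toy Hessian (assumed values)
v = [1.0, -1.0]                # stands in for the validation gradient dM/d omega
approx = neumann_inverse_hvp(H, v)
```

In the hypergradient formula the vector `v` would be the gradient of the validation metric with respect to the model parameters, and the result would then be multiplied by the mixed second derivative of the training loss. Truncating the series after a fixed number of terms trades accuracy for cost.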

Preferably, the specific content of S54 is: after the new loss function parameters are obtained from S53, the model parameters are updated iteratively on a set number of batches of training data under the new loss function, giving the best response $\omega^*(\theta)$ of the model parameters to the new loss function parameters.

Preferably, the specific content of S55 is: a fixed number of training rounds is set, and the loss function parameter search ends when the model has been trained on the training set for that number of rounds, or when the average loss of the model on the validation set rises for a set number of consecutive rounds.
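The two stopping conditions of S55 can be expressed as a small helper. This is a minimal sketch; the names `max_rounds` and `patience` are hypothetical stand-ins for the fixed number of training rounds and the number of consecutive rising rounds, whose symbols are not given in the text.

```python
def should_stop(val_losses, max_rounds, patience):
    """Decide whether the loss function parameter search should end.

    val_losses: average validation loss recorded after each completed round.
    Stops when max_rounds rounds are done, or when the validation loss
    has risen in each of the last `patience` rounds.
    """
    if len(val_losses) >= max_rounds:
        return True
    if len(val_losses) <= patience:
        return False
    recent = val_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


rising = should_stop([1.0, 0.8, 0.9, 1.0, 1.1], max_rounds=100, patience=3)
noisy = should_stop([1.0, 0.8, 0.9, 0.85, 0.9], max_rounds=100, patience=3)
budget = should_stop([1.0, 0.9], max_rounds=2, patience=3)
```

Here `rising` and `budget` are true while `noisy` is false, because a single dip resets the run of consecutive increases.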

A robust loss function search system for classification task label noise comprises a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a bilevel optimization module, a robust loss function construction module and a retraining module;

the deep neural network model selection module outputs a deep neural network model according to the difficulty of the classification task;

the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders for the deep neural network model, based on the Taylor expansion method;

the classification task data set division module is used for outputting, from the data set of the classification task, a noisy training set for inner-layer optimization, a noisy validation set for outer-layer optimization and a clean test set for testing;

the self-paced learning module is used for outputting the configured validation set sample self-selection strategy combined with self-paced learning;

the bilevel optimization module is used for constructing the main body of the bilevel optimization algorithm, calling the validation set sample self-selection strategy combined with self-paced learning, and outputting the set of hyper-parameters that performs best on the validation set by applying the implicit function theorem or the penalty function idea;

the robust loss function construction module is used for forming a new loss function from the obtained hyper-parameters and outputting the finally searched loss function robust to classification task label noise;

the retraining module is used for training, under the guidance of the searched loss function, a new base model on the training set with added noise and the validation set without added noise, so as to complete the retraining of the deep neural network.

Preferably, the bilevel optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;

the inner-layer optimization unit is used for outputting the best response $\omega^*(\theta)$ of the model parameters obtained on the training set for given loss function parameters;

the outer-layer optimization unit is used for calling, based on the current best response $\omega^*(\theta)$, the validation set sample self-selection strategy combined with self-paced learning, and for outputting the parameters $\theta^*$ that optimize, on the validation set, the metric evaluating the classification performance of the model.

According to the above technical scheme, compared with the prior art, the invention discloses a robust loss function search method and system for classification task label noise. A continuous search space, parameterized by a moderate number of variables and capable of representing a sufficiently wide range of functions, is constructed by Taylor expansion, which creates the conditions for loss function search based on gradient descent. By combining self-paced learning with gradient-descent-based bilevel optimization, the loss function is searched in this continuous search space by gradient descent on a validation set that itself contains label noise, which achieves fast automatic search of a loss function robust to label noise. Because the search algorithm is simple and portable, it can be easily deployed in different classification tasks and applied flexibly, and it provides a new approach to overcoming label noise.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a schematic diagram of a robustness loss function search method provided by the present invention;

FIG. 2 is a schematic diagram of the bilevel optimization structure that searches the loss function parameters by gradient descent.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a robust loss function search method for classification task label noise, which comprises the following steps:

S6, forming a new loss function from the obtained hyper-parameters, which serves as the finally searched loss function robust to label noise;

In practical applications, for an image classification task, residual neural network (ResNet) models of different depths can be selected according to how difficult the provided image data is to learn; for an easy image classification data set such as MNIST, a convolutional neural network (CNN) with a few layers can simply be defined as the base model.

In this embodiment, the loss function is parameterized by a Taylor expansion of the conventional categorical cross entropy. For classification problems, sample labels generally use one-hot encoding; on this basis, the cross-entropy loss is expanded into a Taylor polynomial of the following form:

$$L_{CE} = -\log p_y = \sum_{j=1}^{\infty} \frac{1}{j}\,(1 - p_y)^j$$

where $p_y$ is the probability that the model assigns to the true class. After differentiation, the coefficient $1/j$ of the j-th term of the expanded cross-entropy loss cancels the power j of the polynomial in $p_y$. The coefficients of the expansion are evidently not the optimal set of coefficients when label noise is present, so the first N terms of the expansion are taken as the loss function and their coefficients are set as learnable parameters $\theta_j$, giving the parameterized loss function $L_\theta(p_y) = \sum_{j=1}^{N} \theta_j\,(1 - p_y)^j$.

In order to further implement the above technical solution, the parameterized classification task loss function $L_\theta$ in S2 is:

$$L_\theta(p_y) = \sum_{j=1}^{N} \theta_j\,(1 - p_y)^j$$

where $p_y$ is the probability that the model assigns to the true class, $\theta_j$ is a learnable parameter whose initial value is the corresponding coefficient of the Taylor expansion of the cross-entropy loss, and $N$ is the chosen order of the Taylor expansion of the cross-entropy loss.

In this embodiment, the coefficients obtained from the Taylor expansion of the cross-entropy loss are constants, and each coefficient is cancelled by the power of its term upon differentiation. The lack of robustness of cross entropy to task-specific label noise therefore arises because these coefficients distribute the contributions to the final gradient unreasonably. For this reason, this patent treats the coefficients of the Taylor expansion as learnable parameters in order to find a set of coefficients that is robust to label noise.
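The cancellation between coefficient and power can be written out explicitly. The short derivation below is a sketch in which $p_y$ denotes the predicted probability of the true class and $\theta_j$ the learnable coefficients; both symbols are assumed here for illustration.

```latex
\begin{aligned}
-\log p_y &= \sum_{j=1}^{\infty} \frac{1}{j}\,(1-p_y)^j, \qquad 0 < p_y \le 1,\\[4pt]
\frac{\partial}{\partial p_y}\left[\frac{1}{j}\,(1-p_y)^j\right] &= -(1-p_y)^{j-1},\\[4pt]
\frac{\partial}{\partial p_y}\sum_{j=1}^{N}\theta_j\,(1-p_y)^j &= -\sum_{j=1}^{N} j\,\theta_j\,(1-p_y)^{j-1}.
\end{aligned}
```

With the cross-entropy coefficients $\theta_j = 1/j$, every order enters the gradient with unit weight. Making $\theta_j$ learnable lets the search reweight how strongly low-confidence samples, for which $1 - p_y$ is large, influence the gradient.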

The first term of the expansion is the mean absolute error (MAE) loss, which is robust to label noise; this shows that existing robust loss functions can appear as special cases in the search space. Moreover, since polynomials can fit arbitrary functions, the search space constructed from the Taylor expansion can represent a sufficiently wide range of functions.
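The parameterized loss and its two special cases can be checked numerically. The following is a minimal sketch in plain Python, assuming the truncated form sum_j theta_j * (1 - p)^j with p the predicted probability of the true class; the function names are illustrative and not taken from the patent.

```python
import math


def taylor_ce_coefficients(order):
    """Coefficients 1/j of the Taylor series of -log(p) about p = 1."""
    return [1.0 / j for j in range(1, order + 1)]


def taylor_loss(p_true, theta):
    """Parameterized loss: sum over j of theta[j-1] * (1 - p_true)**j."""
    q = 1.0 - p_true
    return sum(t * q ** (j + 1) for j, t in enumerate(theta))


def taylor_loss_grad_p(p_true, theta):
    """Derivative of the parameterized loss with respect to p_true."""
    q = 1.0 - p_true
    return -sum(t * (j + 1) * q ** j for j, t in enumerate(theta))


theta_init = taylor_ce_coefficients(5)      # 1, 1/2, 1/3, 1/4, 1/5
first_term = taylor_loss(0.3, [1.0])        # first term alone: 1 - p
approx_ce = taylor_loss(0.9, theta_init)    # close to -log(0.9)
exact_ce = -math.log(0.9)
grad = taylor_loss_grad_p(0.9, theta_init)  # -(1 + q + q^2 + q^3 + q^4), q = 0.1
```

Keeping only the first coefficient gives the MAE-style term 1 - p, while the initial coefficients 1/j reproduce the cross entropy up to truncation error. Any other coefficient vector is a different point in the same continuous search space.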

In order to further implement the above technical solution, the specific content of S4 includes: setting a decay weight in combination with self-paced learning, which is used when evaluating the model parameters $\omega$: samples that the model classifies with high confidence are screened out as correctly labeled samples and provide supervision for the outer-layer optimization problem. The value of the decay weight is gradually reduced as the number of iterations increases; as it tends to 0, the number of validation samples selected according to the loss gradually increases.
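The selection behaviour described above can be sketched as follows. The exact rule is not recoverable from the text, so this example makes two loudly labeled assumptions: the decay weight `lam` shrinks geometrically with the iteration count, and a validation sample is kept when its loss is at most `1 / lam`. Both the schedule and the threshold form are hypothetical; only the qualitative behaviour (smaller weight, more samples selected) comes from the description.

```python
def decayed_weight(lam0, decay, iteration):
    """Hypothetical schedule: the weight shrinks geometrically per iteration."""
    return lam0 * decay ** iteration


def select_samples(losses, lam):
    """Return indices of validation samples whose loss is at most 1 / lam.

    The 1 / lam threshold is an assumption chosen so that a smaller
    decay weight admits more (harder) samples.
    """
    threshold = 1.0 / lam
    return [i for i, loss in enumerate(losses) if loss <= threshold]


val_losses = [0.05, 0.4, 1.2, 2.5, 0.9]
early = select_samples(val_losses, decayed_weight(2.0, 0.5, 0))   # threshold 0.5
later = select_samples(val_losses, decayed_weight(2.0, 0.5, 2))   # threshold 2.0
```

Early in the search only the two lowest-loss samples are used to compute the validation metric; two iterations later four of the five samples pass the threshold, which mirrors the easy-to-hard progression of self-paced learning.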

In order to further implement the above technical solution, the inner-layer optimization goal in S5 is, given the loss function parameters $\theta$, to obtain on the training set $D_{train}$ the best response $\omega^*(\theta)$ of the model parameters; the outer-layer optimization goal is, based on the current best response $\omega^*(\theta)$, to obtain on the validation set $D_{val}$ the parameters $\theta^*$ that optimize the metric $M$ evaluating the classification performance of the model. Specifically:

$$\omega_{t+1} = \omega_t - \eta_\omega \nabla_\omega L_{\theta_t}(\omega_t; D_{train}), \qquad \theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big)$$

where $\eta_\omega$ is the learning rate of the model parameter update, $\eta_\theta$ is the learning rate of the hyper-parameter update, and $\nabla_\theta M$ is the hypergradient.

In order to further implement the above technical solution, S5 includes:

S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns to classify simple samples correctly;

In this embodiment, at the initial stage of training the model has not yet fitted the mislabeled samples and therefore generalizes well. To ensure that the model can already distinguish simple samples when validation samples are selected by the self-paced learning method, a warm-up stage is set in the bilevel optimization process: during the first rounds of training, the initialized loss function guides the training of the network model in the normal way;

S52, in the outer-layer optimization, evaluating on the validation set the performance of the model trained under noisy conditions, using a metric function that satisfies the differentiability condition;

In this embodiment, based on the validation samples selected by self-paced learning, the cross entropy, which works well for classification problems, is chosen as the metric function, and the cross-entropy loss of the current model on the selected samples is taken as the validation set metric;

S56, when the bilevel optimization algorithm ends, outputting the stored set of hyper-parameters that best guides the model in overcoming the influence of classification label noise.

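In order to further implement the above technical solution, the specific content of S53 includes:

(1) Solving for the hypergradient using the implicit function theorem:

$$\nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big) = \frac{\partial M}{\partial \theta} + \frac{\partial M}{\partial \omega}\,\frac{\partial \omega^*(\theta_t)}{\partial \theta}$$

where $\omega^*(\theta_t)$ is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters $\theta$ at time t, $\partial \omega^*(\theta_t)/\partial \theta$ is the gradient of the inner-layer best response with respect to the hyper-parameters, and $D_{val}$ is the validation set;

In this embodiment, the model parameters $\omega$ and the loss function parameters $\theta$ are both directly involved in the gradient term $\partial L_\theta/\partial \omega$, the gradient of the training set loss with respect to the model parameters, which can therefore be regarded as a function of $\omega$ and $\theta$. When the parameters $\omega$ in the inner-layer optimization reach the best response, the first-order optimality condition gives

$$\frac{\partial L_\theta}{\partial \omega}\bigg|_{\omega = \omega^*(\theta)} = 0$$

If the conditions of the implicit function theorem are satisfied, differentiating this identity with respect to $\theta$ gives:

$$\frac{\partial^2 L_\theta}{\partial \omega\,\partial \theta^{\top}} + \frac{\partial^2 L_\theta}{\partial \omega\,\partial \omega^{\top}}\,\frac{\partial \omega^*(\theta)}{\partial \theta} = 0$$

Assuming that the loss function $L_\theta$ is twice differentiable with respect to the model parameters $\omega$, the implicit function theorem gives:

$$\frac{\partial \omega^*(\theta)}{\partial \theta} = -H^{-1}\,\frac{\partial^2 L_\theta}{\partial \omega\,\partial \theta^{\top}}$$

where $H = \partial^2 L_\theta / \partial \omega\,\partial \omega^{\top}$ is the Hessian matrix.

The loss function parameters are then updated with the resulting hypergradient:

$$\theta_{t+1} = \theta_t - \eta_\theta \nabla_\theta M\big(\omega^*(\theta_t); D_{val}\big)$$

where $\eta_\theta$ is the learning rate of the hyper-parameter update;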

(2) Constructing an auxiliary function using the penalty function idea:

In this embodiment, the first-order optimality condition is satisfied when the parameters $\omega$ reach the best response, and the inner-layer optimization objective is turned into a soft constraint using the penalty function idea: when the model parameters $\omega$ reach the best response $\omega^*(\theta)$, the model loss on the training set is sufficiently small. The outer-layer optimization objective is to obtain the parameters $\theta^*$ that give the best metric on the validation set. The bilevel optimization problem is thus formalized as an inequality-constrained optimization problem, namely optimizing the validation metric subject to the training loss being sufficiently small.

An auxiliary function is then constructed from an objective function and a constraint function. The objective function is the validation metric, for which the cross-entropy loss function is chosen in the concrete implementation, so that the model trained under the guidance of the searched loss function obtains the best robustness to classification label noise on the validation set. The constraint function ensures that the loss of the model on the training set is sufficiently small. σ and ε are adjustable hyper-parameters, and the loss function parameters $\theta$ and the model parameters $\omega$ appear explicitly in the auxiliary function, so the loss function parameters are updated by gradient descent using the first-order partial derivatives of the auxiliary function with respect to $\theta$.
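The penalty route can be illustrated with a small numerical sketch. The precise auxiliary function is not recoverable from the text, so the example assumes a standard quadratic exterior penalty, metric + sigma * max(0, train_loss - eps)^2, and reuses the truncated Taylor loss sum_j theta_j * (1 - p)^j as the training loss. The penalty form, the roles given to `sigma` and `eps`, and all function names are assumptions for illustration only. In this toy the validation metric does not depend on theta explicitly, so only the penalty term contributes to the partial derivatives.

```python
def taylor_loss(p_true, theta):
    """Truncated Taylor loss: sum over j of theta[j-1] * (1 - p_true)**j."""
    q = 1.0 - p_true
    return sum(t * q ** (j + 1) for j, t in enumerate(theta))


def auxiliary(metric_value, train_loss, sigma, eps):
    """Assumed auxiliary function: metric plus a quadratic exterior penalty."""
    violation = max(0.0, train_loss - eps)
    return metric_value + sigma * violation ** 2


def auxiliary_grad_theta(train_probs, theta, sigma, eps):
    """First-order partial derivatives of the penalty term with respect to theta."""
    n = len(train_probs)
    mean_loss = sum(taylor_loss(p, theta) for p in train_probs) / n
    violation = max(0.0, mean_loss - eps)
    # d(mean loss)/d(theta_j) is the batch mean of (1 - p)**j
    dloss = [sum((1.0 - p) ** (j + 1) for p in train_probs) / n
             for j in range(len(theta))]
    return [2.0 * sigma * violation * d for d in dloss]


value = auxiliary(metric_value=0.3, train_loss=0.625, sigma=10.0, eps=0.1)
grad = auxiliary_grad_theta([0.5, 0.5], [1.0, 0.5], sigma=10.0, eps=0.1)
inactive = auxiliary_grad_theta([0.99], [1.0], sigma=10.0, eps=0.1)
```

When the training loss already satisfies the constraint, as in `inactive`, the penalty gradient vanishes. In a real implementation the model parameters would also carry gradient from the validation metric, and automatic differentiation would replace the hand-written derivatives.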

In order to further implement the above technical solution, the specific content of S54 is: after the new loss function parameters are obtained from S53, the model is trained iteratively on the training set under the new loss function.

In order to further implement the above technical solution, the specific content of S55 is: a fixed number of training rounds is set, and the loss function parameter search ends when the model has been trained on the training set for that number of rounds, or when the loss of the model on the validation set rises for a set number of consecutive rounds.

The specific embodiment is as follows:

Loss functions robust to uniformly distributed label noise are searched on the CIFAR10 data set, which covers real-world object classes: airplane, bird, cat, dog, horse, ship, truck, automobile, deer and frog.

S1, for the real-world object classification problem on the CIFAR10 data set, selecting a commonly used 18-layer or 32-layer residual neural network (ResNet18, ResNet32) as the base model for classification prediction.

S2, performing a Taylor expansion of the cross-entropy loss, which guides classification well on clean data sets, truncating it to the first 5 terms (N = 5) as the loss function, and taking each coefficient as a learnable parameter.

The initial values of the learnable parameters are the corresponding expansion coefficients, $\theta = (1, 1/2, 1/3, 1/4, 1/5)$.

S3, dividing the 50000 training samples of the CIFAR10 data set into a training set $D_{train}$ containing 45000 samples and a validation set $D_{val}$ containing 5000 samples. Because the original CIFAR10 data set contains no label noise, uniformly distributed noise at rate p, of the kind assumed to exist in real data, is artificially added to the training set and the validation set; that is, each sample label is randomly changed to a label of another class with probability p.
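The noise injection in S3 can be sketched with the standard library alone. The example below is illustrative: it assumes integer class labels 0 to 9, and the function name, the `seed` argument and the synthetic label list are hypothetical stand-ins for the real CIFAR10 labels.

```python
import random


def add_uniform_label_noise(labels, num_classes, p, seed=0):
    """Flip each label to a different, uniformly chosen class with probability p."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < p:
            others = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(others))   # never keeps the original class
        else:
            noisy.append(y)
    return noisy


clean = [i % 10 for i in range(1000)]          # stand-in for real labels
noisy = add_uniform_label_noise(clean, num_classes=10, p=0.4)
flipped = sum(a != b for a, b in zip(clean, noisy))
```

The same function would be applied separately to the training and validation labels, while the test labels are left clean so that the final accuracy measures robustness to the injected noise.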

S4, setting the decay weight combined with self-paced learning, ensuring that in the outer-layer optimization the number of validation samples selected according to the loss increases as the number of iterations increases.

S5, constructing the main body of the bilevel optimization algorithm and setting the total number of training rounds:

S51, setting the first rounds of the training process as the warm-up stage, during which the parameters of the loss function remain unchanged.

S52, after the warm-up stage ends, starting the outer-layer optimization. In each round of training, each time the model parameters $\omega$ have been trained on a set number of batches of training data, samples are selected on the validation set according to the current decay weight and the per-sample loss, giving the selected validation subset. A set number of batches of validation data are then sampled from this subset, and the metric function is used to evaluate the current model on the selected samples, giving the validation metric.

S53, sampling a batch of training data from the training set and computing the loss of the current model; combining it with the outer-layer metric obtained in S52, the parameters of the current loss function are updated by gradient descent, using the loss function parameter update algorithm based on the implicit function theorem or the penalty function idea, to obtain a new set of loss function parameters.

S54, under the new loss function, training the model on a set number of batches of training data.

S55, repeating steps S52 to S54; the loss function parameter search ends when the set number of training rounds is completed, or when the loss of the model on the validation set rises for a set number of consecutive rounds during training.

S56, when the bilevel optimization algorithm ends, outputting the set of hyper-parameters $\theta^*$ with the best effect; the loss $L_{\theta^*}$ formed by this set of parameters serves as the finally searched loss function robust to label noise.

S6, under the guidance of the searched loss function, training a new base model on the training set with added noise and the validation set without added noise, completing the retraining of the deep neural network. The classification model obtained in this way is robust to label noise.

The robustness loss function search system for classified task label noise comprises a deep neural network model selection module, a loss function parameterization module, a classified task data set division module, a self-walking learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;

specifically, according to the difficulty degree of a classification task, selecting base models with different adaptation degrees as basic deep neural network models of the classification task, wherein the input of the base models is the original data of samples, and the output is the classification result of the models on the samples;

specifically, taylor expansion is carried out on traditional loss functions of different classification tasks, the first N terms of a Taylor expansion polynomial are intercepted and used as loss functions, coefficients of all terms are used as learnable parameters, and the current values are used as parameter initial values, so that parameterization of the loss functions is realized;

specifically, when the measurement of the model on the verification set is calculated in the double-layer optimization algorithm, the number of samples with small loss selected by the model can be gradually increased along with the increase of the number of model training rounds, so that the self-learning from simple samples to difficult samples on the verification set is realized;

specifically, a complex function relation of the model parameters with respect to the loss function parameters is regarded as an implicit function, the derivative of the model parameters to the loss function parameters is solved by using an implicit function theorem, and then the gradient of the measurement of the model on the verification set to the loss function is calculated; constructing an unconstrained optimization auxiliary function through constraint conditions, explicitly obtaining an optimization equation of the measurement on the loss function parameters on the verification set, and calculating the gradient of the measurement on the loss function parameters;

the robustness loss function construction module is used for forming a new loss function from the obtained hyper-parameters and outputting it as the finally searched loss function that is robust to classification task label noise;

and the retraining module is used for training a new base model on the noisy training set and the noise-free verification set under the guidance of the searched loss function, completing the retraining of the deep neural network.

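In order to further implement the technical scheme, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;

the inner-layer optimization unit is used for obtaining, given the loss function parameters, the optimal response of the model parameters on the training set;

the outer-layer optimization unit is used for obtaining, based on the current optimal response, the loss function parameters that optimize the metric on the verification set.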
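The interplay of the two units can be sketched as an alternating loop. The toy training loss theta*(w - A)**2 + (w - B)**2, the verification metric (w - C)**2, the constants, the step sizes and all function names below are assumptions chosen for illustration; they are not taken from the original text.

```python
A, B, C = 1.0, 3.0, 1.5  # toy constants (hypothetical)

def train_grad(w, theta):
    """Gradient in w of the toy training loss theta*(w - A)**2 + (w - B)**2."""
    return 2.0 * theta * (w - A) + 2.0 * (w - B)

def hypergradient(w, theta):
    """Gradient in theta of the metric (w - C)**2, evaluated at the current w.

    Uses the implicit function theorem and assumes w is close to the inner optimum.
    """
    dw_dtheta = -(2.0 * (w - A)) / (2.0 * theta + 2.0)
    return 2.0 * (w - C) * dw_dtheta

def bilevel_search(theta=1.0, w=0.0, outer_steps=200, inner_steps=50,
                   inner_lr=0.05, outer_lr=5.0):
    for _ in range(outer_steps):
        # Inner unit: approximate the optimal response of w for the current theta.
        for _ in range(inner_steps):
            w -= inner_lr * train_grad(w, theta)
        # Outer unit: move theta along the negative hyper-gradient.
        theta -= outer_lr * hypergradient(w, theta)
    return theta, w

theta_star, w_star = bilevel_search()
print(theta_star, w_star)
```

In this toy setting the loop settles where the inner optimum coincides with the verification target, which happens at theta = 3 and w = 1.5.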

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A robustness loss function searching method for classification task label noise, characterized by comprising the following steps:

S6, forming a new loss function from the obtained hyper-parameters to serve as the finally searched loss function that is robust to label noise;

and S7, retraining the deep neural network: training a new base model on the noisy training set and the noise-free verification set under the guidance of the searched loss function, completing the retraining of the deep neural network.

2. The robustness loss function searching method for classification task label noise as claimed in claim 1, wherein the parameterized classification task loss function in S2 is a function of the probability that the predicted value of the model is the true value.

3. The robustness loss function searching method for classification task label noise according to claim 1, wherein the specific content of S4 comprises: setting a decay weight for self-paced learning, screening out the samples that the model classifies with high confidence as samples with correct classification labels, and gradually increasing the number of verification set samples selected according to loss as the number of iterations increases.

4. The robustness loss function searching method for classification task label noise as claimed in claim 1, wherein the inner-layer optimization goal in S5 is, given the loss function parameters, to obtain the optimal response of the model parameters on the training set; the outer-layer optimization goal is, based on the current optimal response, to obtain the loss function parameters that optimize the metric evaluating the classification performance of the model on the verification set; and the update formulas involve the learning rate used when updating the model parameters and the hyper-gradient.

5. The classification-task-tag-noise-oriented robustness loss function searching method according to claim 1, wherein S5 comprises:

6. The robustness loss function searching method for classification task label noise according to claim 5, wherein the specific content of S53 includes:

(1) Solving the hyper-gradient by combining the implicit function theorem:

by the chain rule for composite functions, the hyper-gradient is expressed as the product of the derivative of the metric function on the verification set with respect to the model parameters and the derivative of the model parameters with respect to the loss function parameters;

assuming the loss function has a second-order derivative with respect to the model parameters ω, the derivative of the model parameters with respect to the loss function parameters is obtained according to the implicit function theorem;

the loss function parameters are then updated using the resulting hyper-gradient;

(2) Solving by combining the penalty function approach:

an auxiliary function is constructed from the objective function and the constraint function, in which the loss function parameters and the model parameters appear explicitly; when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters is calculated.

7. The robustness loss function searching method for classification task label noise according to claim 5, wherein the specific content of S54 is as follows: after the new loss function parameters are obtained from S53, iteration on the training set continues according to the new loss function.

8. The robustness loss function searching method for classification task label noise according to claim 5, wherein the specific content of S55 is as follows: setting a fixed number of training rounds for training the model on the training set, and ending the loss function parameter search process when a rise is observed over a set number of consecutive rounds.

9. A robustness loss function search system for classification task label noise, based on the robustness loss function searching method for classification task label noise of any one of claims 1 to 8, characterized by comprising a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;

10. The robustness loss function search system for classification task label noise of claim 9, wherein the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;

the inner-layer optimization unit is used for obtaining, given the loss function parameters, the optimal response of the model parameters on the training set;

the outer-layer optimization unit is used for invoking, based on the current optimal response, the verification set sample self-selection strategy combined with self-paced learning, and outputting the loss function parameters that optimize the metric, obtained on the verification set, that evaluates the classification performance of the model.