CN115618935A - Robustness loss function searching method and system for classification task label noise - Google Patents

Robustness loss function searching method and system for classification task label noise

Info

Publication number
CN115618935A
CN115618935A
Authority
CN
China
Prior art keywords
loss function
noise
function
parameters
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211645114.2A
Other languages
Chinese (zh)
Other versions
CN115618935B (en)
Inventor
邓岳
杜金阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202211645114.2A priority Critical patent/CN115618935B/en
Publication of CN115618935A publication Critical patent/CN115618935A/en
Application granted granted Critical
Publication of CN115618935B publication Critical patent/CN115618935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a robustness loss function searching method and system for classification task label noise, comprising a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module. A deep neural network model is selected for the classification task, and parameterized loss functions with different expansion orders are constructed based on a Taylor expansion method; the algorithm body of the double-layer optimization is constructed and, on the basis of a verification-set sample self-selection strategy combined with self-paced learning, a group of hyper-parameters with the best effect on the verification set is output in combination with the implicit function theorem or the penalty function idea; the new loss function formed from the obtained hyper-parameters serves as the finally searched loss function robust to label noise and guides the retraining of the model. Rapid automatic search for a loss function robust to label noise is thus realized, with simplicity and portability.

Description

Robustness loss function searching method and system for classification task label noise
Technical Field
The invention relates to the technical field of deep learning, and in particular to a robustness loss function searching method and system for classification task label noise.
Background
Currently, label noise is ubiquitous in real-world datasets. When noisy labels are present, a deep neural network tends to overfit them, which negatively affects the target task; in the field of medical diagnosis, for example, overfitting to noisy labels can seriously mislead a physician's judgment.
Traditional loss functions either lack robustness to label noise or, when robust, suffer from slow convergence and poor training effect. The conventional categorical cross entropy (CCE) lets misclassified samples contribute more to the gradient, so when both correct and incorrect labels are present it more readily fits the noisy labels and is therefore not robust to label noise. The mean absolute error (MAE) loss gives every sample a consistent gradient contribution and is therefore robust to label noise, but it cannot provide effective guidance to the model during training: it converges slowly and trains the model poorly.
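Written out for a one-hot label, the per-sample gradients make this contrast explicit (a standard derivation added here for illustration, not quoted from the original text):

CCE(p_y) = -log p_y,  with  dCCE/dp_y = -1/p_y  (unbounded as p_y tends to 0)

MAE(p_y) = 2·(1 - p_y),  with  dMAE/dp_y = -2  (identical for every sample)

A mislabeled sample typically has a small p_y and therefore dominates the CCE gradient, whereas MAE weighs every sample equally.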
Manually designed robust loss functions are typically based on good prior knowledge and introduce hyper-parameters that must be adjusted by hand. Generalized Cross Entropy (GCE) introduces a hyper-parameter q based on the idea of combining the respective advantages of the mean absolute error and the categorical cross entropy: GCE approaches CCE as q approaches 0 and is equivalent to MAE when q = 1. Symmetric Cross Entropy (SCE) starts from the KL divergence and defines the Reverse Cross Entropy (RCE), which takes the model prediction as the anchor point; by introducing coefficients α and β, the linear combination α·CCE + β·RCE is robust to label noise. However, these loss function designs depend on good prior knowledge, the design cost is high, and the introduced hyper-parameters make application on different tasks inflexible.
Traditional automatic loss-function search methods based on genetic programming and the like suffer from high computational cost, low search speed, a discrete search space, and the need for a validation set free of label noise. Based on the idea of AutoML, many studies aim to realize automatic search for a loss function robust to label noise. In the AutoLoss-Zero method, the loss function is decomposed into polynomial combinations of different operators, a search space composed of basic operators is defined, a computation graph represents the loss function, the loss function is constructed through basic operations on the graph, and finally an evolutionary algorithm searches for the loss function. Another common approach, considering that existing loss functions already have certain advantages, introduces an adjustable parameter group and uses a genetic algorithm to search, on a clean validation set, for a group of parameters that resist label noise well. However, because the search space is discrete, such methods cannot exploit the fast search afforded by gradient descent, and traditional grid search and genetic programming methods have high computational cost and low speed.
Therefore, how to provide an efficient and fast robustness loss function searching method and system for classification task label noise is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a robustness loss function searching method and system for classification task label noise, which solves the problems of existing approaches to designing loss functions robust to label noise: high cost of designs based on good priors, the introduction of hyper-parameters that must be adjusted manually, a discrete search space, low search speed, and the need for a validation set with noise-free labels.
In order to achieve the purpose, the invention adopts the following technical scheme:
the robustness loss function searching method facing the task label noise comprises the following steps:
s1, selecting a deep neural network model for learning classification tasks with different difficulties;
s2, constructing parameterized classification task loss functions with different expansion orders according to a deep neural network model and based on a Taylor expansion method;
s3, dividing a data set of the classification task into a noise-containing training set for inner-layer optimization, a noise-containing verification set for outer-layer optimization and a clean test set for testing;
s4, setting a verification set sample self-selection strategy combined with self-learning;
s5, constructing a double-layer optimized algorithm main body, and outputting a group of hyper-parameters with the best effect on a verification set by combining a implicit function theorem or a penalty function idea based on a verification set sample self-selection strategy;
s6, forming a new loss function by the obtained super parameters to serve as a final search-obtained loss function with robustness to the label noise;
and S7, retraining the deep neural network, and guiding to train a new basic model on the training set with the noise added and the verification set without the noise added according to the searched loss function to finish the retraining process of the deep neural network.
In practical application, the classification tasks with different difficulties include simple classification tasks such as handwritten number recognition and complex classification tasks such as a large-scale image multi-classification task.
Preferably, the parameterized classification task loss function L_θ in S2 is:

L_θ(p_y) = Σ_{j=1}^{N} θ_j · (1 - p_y)^j

where p_y is the probability assigned by the model to the true class, θ_1, …, θ_N are learnable hyper-parameters whose initial values are the cross-entropy loss expansion coefficients (θ_j = 1/j), and N is the order at which the Taylor expansion of the selected cross-entropy loss is truncated.
Preferably, the specific content of S4 includes: setting a degradation weight λ in combination with self-paced learning; in each iteration of the outer-layer optimization process, selecting the verification samples whose loss does not exceed the current threshold determined by λ, thereby screening out samples whose classification labels the model judges to be correct with high confidence; as the number of iterations increases, the number of verification-set samples selected according to the loss gradually grows.
Preferably, the inner-layer optimization goal in S5 is, given the loss function parameters θ, to obtain the best response ω*(θ) of the model parameters on the training set D_train; the outer-layer optimization goal is, based on the current best response ω*(θ), to obtain on the verification set D_val the optimal parameters θ* for the metric M that evaluates the classification performance of the model, specifically:

θ* = argmin_θ M(ω*(θ), D_val)   s.t.   ω*(θ) = argmin_ω L_θ(ω, D_train)

The inner and outer layers are optimized alternately to obtain the optimal loss function parameters:

ω_{t+1} = ω_t - α · ∇_ω L_{θ_t}(ω_t, D_train)

θ_{t+1} = θ_t - β · ∇_θ M(ω*(θ_t), D_val)

where α is the learning rate for the model parameter update, β is the learning rate for the hyper-parameter update, ω is the model parameters, θ is the loss function parameters, and ∇_θ M(ω*(θ_t), D_val) is the hyper-gradient.
Preferably, S5 comprises:
S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns the ability to correctly classify simple samples;
S52, in the outer-layer optimization, evaluating on the verification set, with a metric function satisfying a differentiability condition, the performance of the model obtained by training in the presence of noise;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, continuing to guide the training of the model on the training set according to the newly obtained loss function;
S55, repeating S52 to S54 until the iteration ends or the algorithm converges;
and S56, when the double-layer optimization algorithm ends, outputting the stored group of hyper-parameters that guides the model best in overcoming the influence of classification label noise.
Preferably, the specific content of S53 includes:
(1) Solving the hyper-gradient in combination with the implicit function theorem:
By the chain rule of differentiation, the computation of the hyper-gradient can be transformed into the form:

∇_θ M(ω*(θ_t), D_val) = ∂M(ω*(θ_t), D_val)/∂θ + (∂ω*(θ_t)/∂θ)^T · ∂M(ω*(θ_t), D_val)/∂ω*

where ω*(θ_t) is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters θ at time t, ∂ω*(θ_t)/∂θ is the gradient of the inner-layer best response with respect to the hyper-parameters, M is the metric function evaluating the robustness, against classification label noise, of the model trained under noisy conditions, and D_val is the verification set; the first term on the right of the equation is identically zero and can be ignored.
Let the loss function L_θ(ω, D_train) have second derivatives with respect to the model parameters ω; then, by the implicit function theorem:

∂ω*(θ)/∂θ = - H^{-1} · ∂²L_θ(ω, D_train)/∂ω∂θ

where H = ∂²L_θ(ω, D_train)/∂ω∂ω^T is the Hessian matrix, whose inverse is efficiently approximated by a Neumann series:

H^{-1} ≈ Σ_{i=0}^{K} (I - H)^i

The hyper-gradient computation formula combined with the implicit function theorem is therefore:

∇_θ M ≈ - (∂M/∂ω*)^T · [Σ_{i=0}^{K} (I - H)^i] · ∂²L_θ(ω, D_train)/∂ω∂θ

and from the resulting hyper-gradient the loss function parameters are updated:

θ_{t+1} = θ_t - β · ∇_θ M
(2) Constructing the auxiliary function of the inequality-constrained optimization problem in combination with the penalty function idea:

min_θ F(θ)   s.t.   G(ω, θ) ≤ ε

where F(θ) = M(ω, D_val) is the objective function and G(ω, θ) = L_θ(ω, D_train) is the constraint function; the auxiliary function has the form:

P(θ, ω) = F(θ) + σ · max(0, G(ω, θ) - ε)

where σ and ε are adjustable hyper-parameters, and the loss function parameters θ and the model parameters ω appear explicitly; when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters θ is computed:

∂P/∂θ = ∂F/∂θ + σ · 1[G(ω, θ) > ε] · ∂G(ω, θ)/∂θ
preferably, the specific content of S54 is: after obtaining the new loss function parameters from S53, iteration is performed on the training set according to the new loss function
Figure 485439DEST_PATH_IMAGE036
Updating the model parameters by batch data to obtain the optimal response of the model parameters to the new loss function parameters
Figure 916421DEST_PATH_IMAGE011
Preferably, the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends after the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the verification set rises for K consecutive rounds.
A robustness loss function search system for classification task label noise comprises a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
the classification task data set division module is used for outputting, based on the data set of the classification task, a noise-containing training set for inner-layer optimization, a noise-containing verification set for outer-layer optimization and a clean test set for testing;
the self-paced learning module is used for outputting the set verification-set sample self-selection strategy combined with self-paced learning;
the double-layer optimization module is used for constructing the algorithm body of the double-layer optimization, calling the verification-set sample self-selection strategy combined with self-paced learning, and outputting a group of hyper-parameters with the best effect on the verification set in combination with the implicit function theorem or the penalty function idea;
the robustness loss function construction module is used for forming a new loss function from the obtained hyper-parameters and outputting the finally searched loss function robust to classification task label noise;
and the retraining module is used for guiding the training of a new base model on the noise-added training set and the verification set without noise added according to the searched loss function, to complete the retraining process of the deep neural network.
Preferably, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
the inner-layer optimization unit is used for outputting, given the loss function parameters, the best response ω*(θ) of the model parameters obtained on the training set;
and the outer-layer optimization unit is used for, based on the current best response ω*(θ), calling the verification-set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters θ* for the metric M, obtained on the verification set, that evaluates the classification performance of the model.
According to the above technical scheme, compared with the prior art, the invention discloses a robustness loss function searching method and system for classification task label noise. A continuous search space, parameterized by a modest number of variables and able to represent a sufficiently wide range of functions, is constructed by means of Taylor expansion, creating the conditions for loss function search based on gradient descent; self-paced learning is combined with gradient-descent-based double-layer optimization, so that the loss function is searched in the continuous search space by gradient descent on a verification set that itself contains label noise. Fast automatic search for a loss function robust to label noise is thus realized; owing to the simplicity and portability of the search algorithm, it can easily be deployed in different classification tasks, is flexible in application, and provides a new idea for overcoming label noise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic diagram of a robustness loss function search method provided by the present invention;
FIG. 2 is a schematic diagram of a double-layer optimization structure for realizing a gradient descent search loss function parameter.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a robustness loss function searching method for classification task label noise, which comprises the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulties;
S2, constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
S3, dividing the data set of the classification task into a noise-containing training set for inner-layer optimization, a noise-containing verification set for outer-layer optimization and a clean test set for testing;
S4, setting a verification-set sample self-selection strategy combined with self-paced learning;
S5, constructing the algorithm body of the double-layer optimization and, based on the verification-set sample self-selection strategy, outputting a group of hyper-parameters with the best effect on the verification set in combination with the implicit function theorem or the penalty function idea;
S6, forming a new loss function from the obtained hyper-parameters as the finally searched loss function robust to label noise;
and S7, retraining the deep neural network: guiding the training of a new base model on the noise-added training set and the verification set without noise added according to the searched loss function, to complete the retraining process of the deep neural network.
In practical application, for an image classification task, residual neural network models (ResNet) of different depths can be selected according to the learning difficulty of the provided image data; for an easy-to-learn image classification data set such as MNIST, a few layers of convolutional neural networks (CNN) can simply be defined as the base model.
In this embodiment, the parameterization of the loss function is implemented by Taylor expansion of the conventional categorical cross entropy. For the classification problem, the sample labels generally use one-hot coding; on this basis, the cross-entropy loss is expanded into a Taylor polynomial, yielding the form:

CE(p_y) = -log(p_y) = Σ_{j=1}^{∞} (1/j) · (1 - p_y)^j

The coefficient 1/j of the j-th term of the expanded cross-entropy loss exactly cancels the power j produced when the polynomial term (1 - p_y)^j is differentiated with respect to p_y. Intuitively, these expansion coefficients are not the optimal group of coefficients when label noise exists; therefore the combination of the first N terms of the expansion polynomial is taken as the loss function, the coefficients are set as learnable parameters θ, and the parameterized loss function L_θ is obtained.
To further implement the above technical solution, the parameterized classification task loss function L_θ in S2 is:

L_θ(p_y) = Σ_{j=1}^{N} θ_j · (1 - p_y)^j

where p_y is the probability assigned by the model to the true class, θ_1, …, θ_N are learnable parameters whose initial values are the cross-entropy loss expansion coefficients (θ_j = 1/j), and N is the order at which the Taylor expansion of the selected cross-entropy loss is truncated.
In this embodiment, since the coefficients obtained after Taylor expansion of the cross-entropy loss are constants and, after differentiation, cancel the powers of the corresponding terms, the non-robustness of cross entropy to label noise stems from an unreasonable distribution of these coefficients over the final gradient contributions; therefore the coefficients of the Taylor expansion are taken as learnable parameters in order to find a group of coefficients robust to label noise.
The first term is the mean absolute error (MAE) loss, which is robust to label noise, showing that existing robust loss functions are themselves points in the search space; meanwhile, since polynomials can fit arbitrary functions, a search space constructed on the basis of the Taylor expansion is able to represent a sufficiently wide range of functions.
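By way of illustration, a minimal PyTorch-style sketch of such a parameterized Taylor loss follows; the class name TaylorCrossEntropy, the default of five terms and every other implementation detail are assumptions for illustration, not the patented implementation.

import torch
import torch.nn.functional as F

class TaylorCrossEntropy(torch.nn.Module):
    # Parameterized loss L_theta(p_y) = sum_{j=1}^{N} theta_j * (1 - p_y)^j,
    # with theta_j initialized to the cross-entropy expansion coefficients 1/j.
    def __init__(self, num_terms: int = 5):
        super().__init__()
        init = torch.tensor([1.0 / j for j in range(1, num_terms + 1)])
        self.theta = torch.nn.Parameter(init)

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(logits, dim=-1)
        p_y = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # probability of the true class
        one_minus = 1.0 - p_y
        loss = sum(self.theta[j] * one_minus ** (j + 1) for j in range(self.theta.numel()))
        return loss.mean()

With num_terms = 1 and theta_1 = 1 this reduces to an MAE-style loss, which matches the observation above that existing robust losses sit inside the search space.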
To further implement the above technical solution, the specific content of S4 includes: setting a degradation weight λ in combination with self-paced learning; in each iteration of the outer-layer optimization process, the verification samples whose loss does not exceed the current threshold determined by λ are selected for evaluating the model parameters ω, so that samples whose classification labels the model judges to be correct with high confidence are screened out to provide supervision information for the outer-layer problem; as the number of iterations increases, the value of λ is gradually reduced, and as λ tends to 0 the number of verification-set samples selected according to the loss gradually grows.
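The verification-sample self-selection described above can be sketched as follows; the helper name, the externally supplied threshold schedule, and the assumption that per_sample_loss returns one loss value per sample are all illustrative rather than part of the patented method.

import torch

def select_validation_samples(model, val_loader, threshold, per_sample_loss):
    # Keep only verification samples whose current loss is below the self-paced threshold.
    selected_inputs, selected_labels = [], []
    model.eval()
    with torch.no_grad():
        for inputs, labels in val_loader:
            logits = model(inputs)
            losses = per_sample_loss(logits, labels)  # one loss value per sample
            keep = losses <= threshold
            selected_inputs.append(inputs[keep])
            selected_labels.append(labels[keep])
    return torch.cat(selected_inputs), torch.cat(selected_labels)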
To further implement the above technical solution, the inner-layer optimization goal in S5 is, given the loss function parameters θ, to obtain the best response ω*(θ) of the model parameters on the training set D_train; the outer-layer optimization goal is, based on the current best response ω*(θ), to obtain on the verification set D_val the optimal parameters θ* for the metric M that evaluates the classification performance of the model, specifically:

θ* = argmin_θ M(ω*(θ), D_val)   s.t.   ω*(θ) = argmin_ω L_θ(ω, D_train)

The inner and outer layers are optimized alternately to obtain the optimal loss function parameters:

ω_{t+1} = ω_t - α · ∇_ω L_{θ_t}(ω_t, D_train)

θ_{t+1} = θ_t - β · ∇_θ M(ω*(θ_t), D_val)

where α is the learning rate for the model parameter update, β is the learning rate for the hyper-parameter update, ω is the model parameters, θ is the loss function parameters, and ∇_θ M(ω*(θ_t), D_val) is the hyper-gradient.
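The alternating scheme can be sketched as follows, reusing the TaylorCrossEntropy sketch above; the hypergradient argument stands for either of the two update rules detailed further below (implicit function theorem or penalty function), and the optimizers, step counts and names are illustrative assumptions.

import torch

def bilevel_search(model, taylor_loss, train_loader, val_loader,
                   hypergradient, inner_steps, alpha, beta, rounds):
    # Inner loop: update model parameters omega on the noisy training set under the
    # current loss parameters theta. Outer loop: update theta from the hyper-gradient
    # of the verification metric.
    model_opt = torch.optim.SGD(model.parameters(), lr=alpha)
    theta_opt = torch.optim.SGD(taylor_loss.parameters(), lr=beta)
    train_iter = iter(train_loader)
    for _ in range(rounds):
        model.train()
        for _ in range(inner_steps):
            try:
                x, y = next(train_iter)
            except StopIteration:
                train_iter = iter(train_loader)
                x, y = next(train_iter)
            model_opt.zero_grad()
            taylor_loss(model(x), y).backward()
            model_opt.step()
        theta_opt.zero_grad()
        grads = hypergradient(model, taylor_loss, train_loader, val_loader)
        for p, g in zip(taylor_loss.parameters(), grads):
            p.grad = g
        theta_opt.step()
    return taylor_loss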
To further implement the above technical solution, S5 comprises:
S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns the ability to correctly classify simple samples;
in this embodiment, in the initial stage of training the model has not yet fitted the wrongly labelled samples and therefore generalizes well; meanwhile, to ensure that the model can already distinguish simple samples when sample data are selected on the verification set by the self-paced learning method, a warm-up stage is arranged in the double-layer optimization process: during the first rounds of training, the initialized loss function L_{θ_0} guides the training of the network model as usual;
S52, in the outer-layer optimization, evaluating on the verification set, with a metric function satisfying a differentiability condition, the performance of the model obtained by training in the presence of noise;
in this embodiment, based on the verification samples selected in combination with self-paced learning, the cross entropy, which works well for classification problems, is chosen as the metric function M, and the cross-entropy loss of the current model on the selected samples is taken as the verification-set metric;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, continuing to guide the training of the model on the training set according to the newly obtained loss function;
S55, repeating S52 to S54 until the iteration ends or the algorithm converges;
and S56, when the double-layer optimization algorithm ends, outputting the stored group of hyper-parameters that guides the model best in overcoming the influence of classification label noise.
To further implement the above technical solution, the specific content of S53 includes:
(1) Solving the hyper-gradient in combination with the implicit function theorem:
By the chain rule of differentiation, the computation of the hyper-gradient can be transformed into the form:

∇_θ M(ω*(θ_t), D_val) = ∂M(ω*(θ_t), D_val)/∂θ + (∂ω*(θ_t)/∂θ)^T · ∂M(ω*(θ_t), D_val)/∂ω*

where ω*(θ_t) is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters θ at time t, ∂ω*(θ_t)/∂θ is the gradient of the inner-layer best response with respect to the hyper-parameters, M is the metric function evaluating the robustness, against classification label noise, of the model trained under noisy conditions, and D_val is the verification set;
in this embodiment, the gradient of the training-set loss with respect to the model parameters, ∇_ω L_θ(ω, D_train), involves both the model parameters ω and the loss function parameters θ and can be regarded as a function g(ω, θ) of ω and θ; when the parameters ω in the inner-layer optimization reach the best response, the first-order optimality condition gives ∇_ω L_θ(ω*, D_train) = 0, i.e. g(ω*, θ) = 0, so the conditions of the implicit function theorem are satisfied, and then:
let the loss function L_θ(ω, D_train) have second derivatives with respect to the model parameters ω; by the implicit function theorem:

∂ω*(θ)/∂θ = - H^{-1} · ∂²L_θ(ω, D_train)/∂ω∂θ

where H = ∂²L_θ(ω, D_train)/∂ω∂ω^T is the Hessian matrix, whose inverse is efficiently approximated by a Neumann series:

H^{-1} ≈ Σ_{i=0}^{K} (I - H)^i

The hyper-gradient computation formula combined with the implicit function theorem is therefore:

∇_θ M ≈ - (∂M/∂ω*)^T · [Σ_{i=0}^{K} (I - H)^i] · ∂²L_θ(ω, D_train)/∂ω∂θ

and from the resulting hyper-gradient the loss function parameters are updated:

θ_{t+1} = θ_t - β · ∇_θ M
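A minimal autograd sketch of this implicit-function-theorem hyper-gradient with a truncated Neumann series follows; the single-batch sampling, the number of Neumann terms and the scaling factor are illustrative assumptions rather than the patented implementation.

import torch
import torch.nn.functional as F

def hypergradient_ift(model, taylor_loss, train_loader, val_loader,
                      neumann_terms=5, scale=0.01):
    # Hyper-gradient of the verification metric w.r.t. the loss parameters theta:
    #   -(dM/domega)^T H^{-1} d^2L/(domega dtheta),
    # with H^{-1} approximated by a truncated (scaled) Neumann series.
    omega = [p for p in model.parameters() if p.requires_grad]
    theta = list(taylor_loss.parameters())

    xv, yv = next(iter(val_loader))
    metric = F.cross_entropy(model(xv), yv)
    v = torch.autograd.grad(metric, omega)            # dM/domega at the current best response

    xt, yt = next(iter(train_loader))
    train_loss = taylor_loss(model(xt), yt)
    g_omega = torch.autograd.grad(train_loss, omega, create_graph=True)

    p = [vi.clone() for vi in v]
    acc = [vi.clone() for vi in v]
    for _ in range(neumann_terms):
        hvp = torch.autograd.grad(g_omega, omega, grad_outputs=p, retain_graph=True)
        p = [pi - scale * h for pi, h in zip(p, hvp)]   # p <- (I - scale*H) p
        acc = [ai + pi for ai, pi in zip(acc, p)]       # accumulate the Neumann series

    mixed = torch.autograd.grad(g_omega, theta, grad_outputs=acc)  # mixed second derivative applied to acc
    return [-scale * m for m in mixed]

A routine of this shape can be passed as the hypergradient argument of the bilevel_search sketch above.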
(2) Constructing the auxiliary function in combination with the penalty function idea:
in this embodiment, when the parameters ω reach the best response, the first-order optimality condition is satisfied, and the inner-layer optimization target is turned into a soft constraint by means of the penalty function idea: when the model parameters ω reach the best response, the loss of the model on the training set is sufficiently small, i.e. L_θ(ω, D_train) ≤ ε; the outer-layer optimization goal is to obtain the parameters θ with the best metric on the verification set, i.e. min_θ M(ω, D_val). The double-layer optimization problem is thus formalized as an inequality-constrained optimization problem:

min_θ F(θ)   s.t.   G(ω, θ) ≤ ε

Constructing the auxiliary function gives:

P(θ, ω) = F(θ) + σ · max(0, G(ω, θ) - ε)

where the objective function F(θ) = M(ω, D_val) is, in the concrete implementation, chosen as the cross-entropy loss function, i.e. the model trained under the guidance of the searched loss function should attain the best robustness to classification label noise on the verification set, and the constraint function G(ω, θ) = L_θ(ω, D_train) ensures that the loss of the model on the training set is sufficiently small; σ and ε are adjustable hyper-parameters, and the loss function parameters θ and the model parameters ω appear explicitly; when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters θ is computed:

∂P/∂θ = ∂F/∂θ + σ · 1[G(ω, θ) > ε] · ∂G(ω, θ)/∂θ
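A minimal sketch of a penalty-based update step under these assumptions follows; the exterior-penalty form max(0, G - ε), the single-batch evaluation and all names are illustrative rather than the exact patented formulation.

import torch
import torch.nn.functional as F

def penalty_theta_step(model, taylor_loss, train_batch, val_batch, theta_opt,
                       sigma=1.0, eps=0.05):
    # Auxiliary objective P = F + sigma * max(0, G - eps), where F is the verification
    # cross entropy and G is the parameterized training loss; only theta is updated here.
    xt, yt = train_batch
    xv, yv = val_batch
    F_val = F.cross_entropy(model(xv), yv)
    G_train = taylor_loss(model(xt), yt)
    auxiliary = F_val + sigma * torch.clamp(G_train - eps, min=0.0)
    grads = torch.autograd.grad(auxiliary, list(taylor_loss.parameters()), allow_unused=True)
    theta_opt.zero_grad()
    for p, g in zip(taylor_loss.parameters(), grads):
        p.grad = torch.zeros_like(p) if g is None else g
    theta_opt.step()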
To further implement the above technical solution, the specific content of S54 is: after the new loss function parameters are obtained from S53, the model parameters are updated on the training set over a fixed number of batches of data according to the new loss function, obtaining the best response ω*(θ) of the model parameters to the new loss function parameters.
To further implement the above technical solution, the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends after the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the verification set rises for K consecutive rounds.
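This stopping rule can be sketched as follows; max_rounds and patience stand in for the fixed round count T and the consecutive-rise count K, and are illustrative names only.

def search_should_stop(round_idx, val_losses, max_rounds, patience):
    # Stop after a fixed number of rounds, or when the average verification-set loss
    # has risen for `patience` consecutive rounds.
    if round_idx >= max_rounds:
        return True
    if len(val_losses) > patience:
        window = val_losses[-(patience + 1):]
        if all(later > earlier for earlier, later in zip(window, window[1:])):
            return True
    return False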
The specific embodiment is as follows:
Loss functions robust to uniformly distributed label noise are searched, based on the CIFAR10 dataset, for the classification of real objects such as airplane, bird, cat, dog, horse, ship, truck, automobile, deer and frog.
S1, for the real-object classification problem based on the CIFAR10 dataset, selecting a common 18-layer or 32-layer residual neural network (ResNet18, ResNet32) as the base model for classification prediction.
S2, performing Taylor expansion on the cross-entropy loss, which guides classification well on clean datasets, taking the first 5 terms (N = 5) as the loss function and treating each coefficient as a learnable parameter:

L_θ(p_y) = Σ_{j=1}^{5} θ_j · (1 - p_y)^j

where θ = (θ_1, …, θ_5) are the learnable parameters, initialized to the expansion coefficients (1, 1/2, 1/3, 1/4, 1/5).
S3, dividing the 50000 training samples of the CIFAR10 dataset into a training set D_train containing 45000 samples and a verification set D_val containing 5000 samples. Meanwhile, since the original CIFAR10 dataset contains no label noise, uniformly distributed noise of percentage p, assumed to exist in real data, is artificially added to the training set and the verification set, i.e. each sample label is randomly converted into one of the other class labels with probability p.
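The symmetric label-noise injection described in S3 can be sketched as follows (NumPy-based; the function name and the fixed seed are illustrative):

import numpy as np

def add_uniform_label_noise(labels, p, num_classes=10, seed=0):
    # With probability p, replace a label with a different class chosen uniformly at random.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy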
S4, setting degradation weight combined with self-learning
λ, ensuring that in the outer-layer optimization of the iterative process the number of verification-set samples selected according to the loss increases as the number of iterations increases.
S5, constructing the algorithm body of the double-layer optimization, with a total of T training rounds.
S51, setting the first several rounds of the training process as the warm-up stage, during which the parameters of the loss function remain unchanged.
S52, after the warm-up stage ends, starting the outer-layer optimization. In each training round, every time the model parameters ω have passed through a fixed number of batches of training data, the verification samples satisfying the selection criterion are selected on the verification set according to the current ω and the loss L_θ, giving the selected subset of the verification set; several batches of verification data are then sampled from this subset, and the metric function M evaluates the current model on the selected samples, giving the verification-set metric M_val.
S53, sampling a batch of training data from the training set, computing the loss L_θ of the current model on this batch and, in combination with the outer-layer metric M_val obtained in S52, updating the parameters of the current loss function by gradient descent through the loss-function-parameter updating algorithm based on the implicit function theorem or the penalty function idea, obtaining a new group of loss function parameters.
S54, under the new loss function, guiding the model to train on a further fixed number of batches of training data.
S55, repeating steps S52 to S54 until the T rounds of training are completed, or until the loss of the model on the verification set rises for a preset number of consecutive rounds during training, at which point the loss function parameter search process ends.
S56, when the double-layer optimization algorithm ends, outputting the group of hyper-parameters θ* with the best effect; the loss L_θ* formed by this group of parameters serves as the loss function robust to label noise obtained by the final search.
And S6, according to the searched loss function, guiding the training of a new base model on the noise-added training set and the noise-free verification set, completing the retraining process of the deep neural network. The classification model obtained in this process has good robustness to label noise.
The robustness loss function search system for classification task label noise comprises a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty of the classification task;
specifically, according to the difficulty of the classification task, base models of different degrees of fit are selected as the basic deep neural network model of the classification task; the input of the base model is the raw data of the samples, and the output is the model's classification result for the samples;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
specifically, Taylor expansion is performed on the traditional loss functions of different classification tasks, the first N terms of the Taylor expansion polynomial are taken as the loss function, the coefficients of the terms are taken as learnable parameters, and their current values are taken as the initial parameter values, thereby realizing the parameterization of the loss function;
the classification task data set division module is used for outputting, based on the data set of the classification task, a noise-containing training set for inner-layer optimization, a noise-containing verification set for outer-layer optimization and a clean test set for testing;
the self-paced learning module is used for outputting the set verification-set sample self-selection strategy combined with self-paced learning;
specifically, when the metric of the model on the verification set is computed in the double-layer optimization algorithm, the number of small-loss samples selected by the model gradually increases with the number of model training rounds, thereby realizing self-paced learning from simple samples to difficult samples on the verification set;
the double-layer optimization module is used for constructing the algorithm body of the double-layer optimization, calling the verification-set sample self-selection strategy combined with self-paced learning, and outputting a group of hyper-parameters with the best effect on the verification set in combination with the implicit function theorem or the penalty function idea;
specifically, the complex functional relation of the model parameters with respect to the loss function parameters is regarded as an implicit function, the derivative of the model parameters with respect to the loss function parameters is solved using the implicit function theorem, and then the gradient, with respect to the loss function parameters, of the metric of the model on the verification set is computed; alternatively, an unconstrained auxiliary optimization function is constructed from the constraint conditions, the optimization equation of the verification-set metric with respect to the loss function parameters is obtained explicitly, and the gradient of the metric with respect to the loss function parameters is computed;
the robustness loss function construction module is used for forming a new loss function from the obtained hyper-parameters and outputting the finally searched loss function robust to classification task label noise;
and the retraining module is used for guiding the training of a new base model on the noise-added training set and the verification set without noise added according to the searched loss function, to complete the retraining process of the deep neural network.
To further implement the above technical scheme, the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
the inner-layer optimization unit is used for outputting, given the loss function parameters, the best response ω*(θ) of the model parameters obtained on the training set;
and the outer-layer optimization unit is used for, based on the current best response ω*(θ), calling the verification-set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters θ* for the metric M, obtained on the verification set, that evaluates the classification performance of the model.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A robustness loss function searching method for classification task label noise, characterized by comprising the following steps:
S1, selecting a deep neural network model for learning classification tasks of different difficulties;
S2, constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
S3, dividing the data set of the classification task into a noise-containing training set for inner-layer optimization, a noise-containing verification set for outer-layer optimization and a clean test set for testing;
S4, setting a verification-set sample self-selection strategy combined with self-paced learning;
S5, constructing the algorithm body of the double-layer optimization and, based on the verification-set sample self-selection strategy, outputting a group of hyper-parameters with the best effect on the verification set in combination with the implicit function theorem or the penalty function idea;
S6, forming a new loss function from the obtained hyper-parameters as the finally searched loss function robust to label noise;
and S7, retraining the deep neural network: guiding the training of a new base model on the noise-added training set and the verification set without noise added according to the searched loss function, to complete the retraining process of the deep neural network.
2. The robustness loss function searching method for classification task label noise according to claim 1, characterized in that the parameterized classification task loss function L_θ in S2 is:

L_θ(p_y) = Σ_{j=1}^{N} θ_j · (1 - p_y)^j

where p_y is the probability assigned by the model to the true class, θ_1, …, θ_N are learnable hyper-parameters whose initial values are the cross-entropy loss expansion coefficients, and N is the order at which the Taylor expansion of the selected cross-entropy loss is truncated.
3. The robustness loss function searching method for classification task label noise according to claim 1, characterized in that the specific content of S4 comprises: setting a degradation weight λ in combination with self-paced learning; in each iteration of the outer-layer optimization process, selecting the verification samples whose loss does not exceed the current threshold determined by λ, thereby screening out samples whose classification labels the model judges to be correct with high confidence; as the number of iterations increases, the number of verification-set samples selected according to the loss gradually grows.
4. The robustness loss function searching method for classification task label noise according to claim 1, characterized in that the inner-layer optimization goal in S5 is, given the loss function parameters θ, to obtain the best response ω*(θ) of the model parameters on the training set D_train, and the outer-layer optimization goal is, based on the current best response ω*(θ), to obtain on the verification set D_val the optimal parameters θ* for the metric M that evaluates the classification performance of the model, specifically:

θ* = argmin_θ M(ω*(θ), D_val)   s.t.   ω*(θ) = argmin_ω L_θ(ω, D_train)

the inner and outer layers being optimized alternately to obtain the optimal loss function parameters:

ω_{t+1} = ω_t - α · ∇_ω L_{θ_t}(ω_t, D_train)

θ_{t+1} = θ_t - β · ∇_θ M(ω*(θ_t), D_val)

where α is the learning rate for the model parameter update, β is the learning rate for the hyper-parameter update, ω is the model parameters, θ is the loss function parameters, and ∇_θ M(ω*(θ_t), D_val) is the hyper-gradient.
5. The robustness loss function searching method for classification task label noise according to claim 1, characterized in that S5 comprises:
S51, setting a warm-up stage in which the deep neural network model is trained under the guidance of the initialized loss function, ensuring that the deep neural network learns the ability to correctly classify simple samples;
S52, in the outer-layer optimization, evaluating on the verification set, with a metric function satisfying a differentiability condition, the performance of the model obtained by training in the presence of noise;
S53, updating the parameters of the loss function by gradient descent in combination with the implicit function theorem or the penalty function idea;
S54, continuing to guide the training of the model on the training set according to the newly obtained loss function;
S55, repeating S52 to S54 until the iteration ends or the algorithm converges;
and S56, when the double-layer optimization algorithm ends, outputting the stored group of hyper-parameters that guides the model best in overcoming the influence of classification label noise.
6. The robustness loss function searching method for classification task label noise according to claim 5, characterized in that the specific content of S53 comprises:
(1) solving the hyper-gradient in combination with the implicit function theorem:
by the chain rule of differentiation, the computation of the hyper-gradient can be transformed into the following form:

∇_θ M(ω*(θ_t), D_val) = ∂M(ω*(θ_t), D_val)/∂θ + (∂ω*(θ_t)/∂θ)^T · ∂M(ω*(θ_t), D_val)/∂ω*

where ω*(θ_t) is the best response of the model parameters in the inner-layer optimization task to the hyper-parameters θ at time t, ∂ω*(θ_t)/∂θ is the gradient of the inner-layer best response with respect to the hyper-parameters, M is the metric function, and D_val is the verification set;
let the loss function L_θ(ω, D_train) have second derivatives with respect to the model parameters ω; then, by the implicit function theorem:

∂ω*(θ)/∂θ = - H^{-1} · ∂²L_θ(ω, D_train)/∂ω∂θ

where H = ∂²L_θ(ω, D_train)/∂ω∂ω^T is the Hessian matrix, whose inverse is efficiently approximated by a Neumann series:

H^{-1} ≈ Σ_{i=0}^{K} (I - H)^i

the hyper-gradient computation formula combined with the implicit function theorem being:

∇_θ M ≈ - (∂M/∂ω*)^T · [Σ_{i=0}^{K} (I - H)^i] · ∂²L_θ(ω, D_train)/∂ω∂θ

and the loss function parameters being updated from the resulting hyper-gradient:

θ_{t+1} = θ_t - β · ∇_θ M

(2) constructing the auxiliary function of the inequality-constrained optimization problem in combination with the penalty function idea:

min_θ F(θ)   s.t.   G(ω, θ) ≤ ε

where F(θ) is the objective function and G(ω, θ) is the constraint function, the auxiliary function having the form:

P(θ, ω) = F(θ) + σ · max(0, G(ω, θ) - ε)

where σ and ε are adjustable hyper-parameters, the loss function parameters θ and the model parameters ω appear explicitly, and, when the loss function parameters are updated by gradient descent, the first-order partial derivative of the auxiliary function with respect to the loss function parameters θ is computed:

∂P/∂θ = ∂F/∂θ + σ · 1[G(ω, θ) > ε] · ∂G(ω, θ)/∂θ
7. The robustness loss function searching method for classification task label noise according to claim 5, characterized in that the specific content of S54 is: after the new loss function parameters are obtained from S53, the model parameters are updated on the training set over a fixed number of batches of data according to the new loss function, obtaining the best response ω*(θ) of the model parameters to the new loss function parameters.
8. The robustness loss function searching method for classification task label noise according to claim 5, characterized in that the specific content of S55 is: a fixed number of training rounds T is set; the parameter search process ends after the model has been trained on the training set for T rounds, or the loss function parameter search process ends when the average loss of the model on the verification set rises for K consecutive rounds.
9. A robustness loss function search system for classification task label noise, based on the robustness loss function searching method for classification task label noise according to any one of claims 1 to 8, characterized by comprising a deep neural network model selection module, a loss function parameterization module, a classification task data set division module, a self-paced learning module, a double-layer optimization module, a robustness loss function construction module and a retraining module;
the deep neural network model selection module outputs a deep neural network model according to the difficulty of the classification task;
the loss function parameterization module is used for constructing parameterized classification task loss functions with different expansion orders according to the deep neural network model and based on a Taylor expansion method;
the classification task data set division module is used for outputting, based on the data set of the classification task, a noise-containing training set for inner-layer optimization, a noise-containing verification set for outer-layer optimization and a clean test set for testing;
the self-paced learning module is used for outputting the set verification-set sample self-selection strategy combined with self-paced learning;
the double-layer optimization module is used for constructing the algorithm body of the double-layer optimization, calling the verification-set sample self-selection strategy combined with self-paced learning, and outputting a group of hyper-parameters with the best effect on the verification set in combination with the implicit function theorem or the penalty function idea;
the robustness loss function construction module is used for forming a new loss function from the obtained hyper-parameters and outputting the finally searched loss function robust to classification task label noise;
and the retraining module is used for guiding the training of a new base model on the noise-added training set and the verification set without noise added according to the searched loss function, to complete the retraining process of the deep neural network.
10. The robustness loss function search system for classification task label noise according to claim 9, characterized in that the double-layer optimization module comprises an inner-layer optimization unit and an outer-layer optimization unit;
the inner-layer optimization unit is used for outputting, given the loss function parameters, the best response ω*(θ) of the model parameters obtained on the training set;
and the outer-layer optimization unit is used for, based on the current best response ω*(θ), calling the verification-set sample self-selection strategy combined with self-paced learning and outputting the optimal parameters θ* for the metric M, obtained on the verification set, that evaluates the classification performance of the model.
CN202211645114.2A 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise Active CN115618935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211645114.2A CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211645114.2A CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Publications (2)

Publication Number Publication Date
CN115618935A true CN115618935A (en) 2023-01-17
CN115618935B CN115618935B (en) 2023-05-05

Family

ID=84879818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211645114.2A Active CN115618935B (en) 2022-12-21 2022-12-21 Robustness loss function searching method and system for classification task tag noise

Country Status (1)

Country Link
CN (1) CN115618935B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446927A (en) * 2016-07-07 2017-02-22 浙江大学 Self-paced reinforcement image classification method and system
CN109242028A (en) * 2018-09-19 2019-01-18 西安电子科技大学 SAR image classification method based on 2D-PCA and convolutional neural networks
CN110110780A (en) * 2019-04-30 2019-08-09 南开大学 A kind of picture classification method based on confrontation neural network and magnanimity noise data
CN112101328A (en) * 2020-11-19 2020-12-18 四川新网银行股份有限公司 Method for identifying and processing label noise in deep learning
US20210241096A1 (en) * 2018-04-22 2021-08-05 Technion Research & Development Foundation Limited System and method for emulating quantization noise for a neural network
CN113537389A (en) * 2021-08-05 2021-10-22 京东科技信息技术有限公司 Robust image classification method and device based on model embedding
CN114201632A (en) * 2022-02-18 2022-03-18 南京航空航天大学 Label noisy data set amplification method for multi-label target detection task
CN114445662A (en) * 2022-01-25 2022-05-06 南京理工大学 Robust image classification method and system based on label embedding


Also Published As

Publication number Publication date
CN115618935B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
Gonzalez et al. Improved training speed, accuracy, and data utilization through loss function optimization
Luketina et al. Scalable gradient-based tuning of continuous regularization hyperparameters
US10832123B2 (en) Compression of deep neural networks with proper use of mask
Garro et al. Designing artificial neural networks using particle swarm optimization algorithms
US20200104688A1 (en) Methods and systems for neural architecture search
US10460236B2 (en) Neural network learning device
CN110909926A (en) TCN-LSTM-based solar photovoltaic power generation prediction method
US20150134578A1 (en) Discriminator, discrimination program, and discrimination method
KR20180120056A (en) Method and system for pre-processing machine learning data
CN111368885B (en) Gas circuit fault diagnosis method for aircraft engine
Salama et al. A novel ant colony algorithm for building neural network topologies
CN112557034B (en) Bearing fault diagnosis method based on PCA _ CNNS
CN114399032A (en) Method and system for predicting metering error of electric energy meter
Moldovan et al. Chicken swarm optimization and deep learning for manufacturing processes
JP7214863B2 (en) Computer architecture for artificial image generation
CN114463540A (en) Segmenting images using neural networks
US20200219008A1 (en) Discrete learning structure
CN108228978B (en) Xgboost time sequence prediction method combined with complementary set empirical mode decomposition
Zubair et al. Performance enhancement of adaptive neural networks based on learning rate
Yamada et al. Weight Features for Predicting Future Model Performance of Deep Neural Networks.
CN111967567A (en) Neural network with layer for solving semi-definite programming
CN115618935A (en) Robustness loss function searching method and system for classified task label noise
CN107229944B (en) Semi-supervised active identification method based on cognitive information particles
CN115238874A (en) Quantization factor searching method and device, computer equipment and storage medium
US20240020531A1 (en) System and Method for Transforming a Trained Artificial Intelligence Model Into a Trustworthy Artificial Intelligence Model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant