CN112598082A - Method and system for predicting generalized error of image identification model based on non-check set - Google Patents

Info

Publication number
CN112598082A
Authority
CN
China
Prior art keywords
training
image recognition
recognition model
output
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110017334.XA
Other languages
Chinese (zh)
Other versions
CN112598082B (en)
Inventor
伍冬睿
张潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202110017334.XA
Publication of CN112598082A
Application granted
Publication of CN112598082B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for predicting the generalization error of an image recognition model without a validation set, belonging to the field of deep learning optimization and generalization. The method comprises the following steps: after each training round, randomly sampling K groups of training pictures and using the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups; applying the parameter updates to obtain K corresponding updated models, and recording the output of each of the K updated models for every training picture; calculating the variance of the outputs for each training picture, and normalizing the variance by the norm of the output to obtain the output relative variance; and using the output relative variance to predict the trend of the generalization error of the image recognition model during training. Because no validation set is needed, all training samples can be used for training, which yields better generalization performance; in addition, the neural network only needs to be trained once, which reduces the energy and hardware cost of repeated training.

Description

Method and system for predicting the generalization error of an image recognition model without a validation set
Technical Field
The invention belongs to the field of deep learning optimization and generalization, and in particular relates to a method and a system for predicting the generalization error of an image recognition model without a validation set.
Background
Machine learning, a current research focus in artificial intelligence, is often used to mine latent relationships in data. In recent years, data-driven machine learning algorithms have achieved excellent performance in fields such as biology, medicine, finance, and the military. As data volume and computing power have grown, deep learning, a class of machine learning algorithms that handles images well, has become a research focus and is widely applied across industries.
Although deep learning performs well on image recognition tasks, many problems remain to be solved and studied. Neural network models for image recognition exhibit complex generalization behavior during training, such as the double descent of the test error reported in the prior art: as the number of training rounds increases, the error of the neural network on the image test set first decreases, then rises because of overfitting, and finally decreases again at some point. Such behavior makes it important to predict the trend of the model's generalization error during training. The most common approach at present is to split off part of the image training set as a validation set, train the image recognition model on the remaining training set, compute the error on the validation set to predict the trend of the test error, and then perform downstream processing such as early stopping based on the predicted trend.
Although using a validation set to predict the generalization error curve during training of an image recognition model is simple and practical, the validation set withholds part of the training pictures, so the predicted generalization error curve is often inconsistent with the curve obtained when all training samples are actually used, which affects subsequent processing such as early stopping. In addition, the reduced number of training pictures caused by splitting off the validation set often degrades generalization performance. The latter problem can be alleviated by training twice: part of the training set is first split off as a validation set, the number of training rounds is determined from the results on the validation set, and the validation set is then merged back so that the model is trained for the same number of rounds on all pictures. However, the increased training cost introduces a new problem of hardware wear and energy consumption, and this procedure still cannot guarantee that the generalization error curves evolve consistently when the number of training pictures differs.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a method and a system for predicting the generalization error of an image recognition model without a validation set, thereby solving the technical problems of the high cost of repeated training and the inaccurate prediction that arise when a validation set is used to predict generalization performance during training of existing image recognition models.
To achieve the above object, in one aspect, the invention provides a method for predicting the generalization error of an image recognition model without a validation set, comprising the following steps:
(1) after each training round, randomly sampling K groups of training pictures, and using the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;
(2) applying the parameter updates to obtain K corresponding updated models, and recording the output of each of the K updated models for every training picture;
(3) calculating the variance of the outputs for each training picture, and normalizing the variance by the norm of the output to obtain the output relative variance; and predicting the trend of the generalization error of the image recognition model during training from the output relative variance.
Further, the parameter update of the image recognition model is a parameter update gradient.
Further, the model optimizer comprises an ADAM optimizer and an SGD optimizer.
Further, the output relative variance RV is expressed as:
Figure BDA0002887424320000031
where n is the number of picture samples, i = 1, 2, …, n, j = 1, 2, …, K, and f denotes the image recognition model.
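The formula itself is published as an image and is not reproduced in this text. As a hedged reading of the verbal description (the per-picture variance of the K outputs, normalized by the output norm and averaged over the n pictures), one plausible form is sketched below. The symbols f_j (the j-th updated model), x_i (the i-th training picture), and \bar{f} (the mean output over the K models), as well as the choice to normalize by the squared norm of the mean output, are assumptions and may differ from the formula in the granted patent.

```latex
\mathrm{RV} \;=\; \frac{1}{n}\sum_{i=1}^{n}
\frac{\dfrac{1}{K}\sum_{j=1}^{K}\bigl\lVert f_j(x_i)-\bar{f}(x_i)\bigr\rVert_2^{2}}
     {\bigl\lVert \bar{f}(x_i)\bigr\rVert_2^{2}},
\qquad
\bar{f}(x_i) \;=\; \frac{1}{K}\sum_{j=1}^{K} f_j(x_i)
```

Under this reading, RV is unchanged when all outputs are multiplied by the same constant, which is the purpose of the normalization step.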
In another aspect, the invention provides a system for predicting the generalization error of an image recognition model without a validation set, comprising:
a first calculation module, configured to randomly sample K groups of training pictures after each training round, and to use the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;
an updating module, configured to apply the parameter updates to obtain K corresponding updated models, and to record the output of each of the K updated models for every training picture;
a second calculation module, configured to calculate the variance of the outputs for each training picture, to normalize the variance by the norm of the output to obtain the output relative variance, and to predict the trend of the generalization error of the image recognition model during training from the output relative variance.
Further, the parameter update of the image recognition model is a parameter update gradient.
Further, the model optimizer comprises an ADAM optimizer and an SGD optimizer.
Further, the output relative variance RV is expressed as:
Figure BDA0002887424320000032
where n is the number of picture samples, i = 1, 2, …, n, j = 1, 2, …, K, and f denotes the image recognition model.
In general, compared with the prior art, the above technical solution conceived by the invention can achieve the following beneficial effects:
The invention predicts the trend of the generalization error of the image recognition model during training from the output relative variance, which can be estimated directly on the training set and allows the trend of the generalization error curve during training to be judged more accurately. Because no validation set is needed, all training pictures can be used for training, which yields better generalization performance; in addition, the neural network only needs to be trained once, which reduces the energy and hardware cost of repeated training.
Drawings
FIG. 1 is a simplified flow chart for calculating the relative variance of the model output according to the invention;
FIG. 2 shows the test error curves of the neural network model VGG16 trained on the CIFAR100 data set under different levels of label noise (i.e., with different proportions of labels randomly perturbed), together with the RV curves calculated on the training set;
FIG. 3 shows the RV curve and the test accuracy curve of ResNet18 models of different widths on the CIFAR10 data set.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to FIG. 1, the invention provides a method for predicting the generalization error of an image recognition model without a validation set, comprising the following steps:
(1) after each training round, randomly sampling K groups of training pictures, and using the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;
(2) applying the parameter updates to obtain K corresponding updated models, and recording the output of each of the K updated models for every training picture;
(3) calculating the variance of the outputs for each training picture, and normalizing the variance by the norm of the output to obtain the output relative variance; and predicting the trend of the generalization error of the image recognition model during training from the output relative variance.
Specifically, take as an example a training data set comprising n samples (e.g., n = 50000 for CIFAR10)
Figure BDA0002887424320000051
After each training round, K training batches, each containing B training samples (e.g., K = 100 to 150, B = 128 or 256), are randomly sampled from the training data set D. The optimizer used to train the model, such as ADAM (learning rate 1e-3 or 1e-4) or SGD (learning rate 1e-2 or 1e-3, momentum 0.9), then computes a model parameter update from each training batch, yielding the K corresponding updated models
Figure BDA0002887424320000052
The relative variance of the outputs of the K models on the training samples is then calculated:
Figure BDA0002887424320000053
Experiments show that the RV value follows the same trend as the generalization performance of the model during training, so the generalization performance of the model can be predicted directly from the RV value without splitting off a validation set.
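The calculation described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented formula itself: the function name `relative_variance`, the `(K, n, C)` array layout, the small `eps` guard, and the choice to normalize by the squared norm of the mean output are all assumptions, since the formula is published only as an image.

```python
import numpy as np


def relative_variance(outputs, eps=1e-12):
    """Relative variance of model outputs (hypothetical helper).

    outputs: array of shape (K, n, C) holding the output of each of the
    K updated models on each of the n training pictures (C values each).
    """
    outputs = np.asarray(outputs, dtype=np.float64)
    mean_out = outputs.mean(axis=0)  # (n, C): mean output over the K models
    # Per-picture variance: mean squared distance of the K outputs from their mean.
    var = ((outputs - mean_out) ** 2).sum(axis=2).mean(axis=0)  # (n,)
    # Assumed normalizer: squared norm of the mean output for that picture.
    norm_sq = (mean_out ** 2).sum(axis=1)  # (n,)
    # Average the normalized variance over the n training pictures.
    return float(np.mean(var / (norm_sq + eps)))


# Toy input with K=2 models, n=1 picture, C=1 output value:
# the two outputs are 1.0 and 3.0, so the variance is 1.0 and the
# squared norm of the mean output is 4.0.
toy = np.array([[[1.0]], [[3.0]]])
rv_toy = relative_variance(toy)
```

In practice this value would be recomputed after every training round and its curve over rounds inspected, in place of a validation error curve.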
Obtaining the RV index requires computing the model parameter update many times, which makes the calculation relatively expensive. A simplified scheme replaces the computed parameter updates with directly sampled random noise (for example, in each layer of the neural network, Gaussian noise with mean 0 and variance equal to 0.001 times the norm of that layer's parameters), which greatly reduces the amount of computation. It should be noted that, although this scheme is simpler to compute, it is not effective on some data sets (e.g., CIFAR100); it is generally valid only for simple data sets with a small number of classes (typically fewer than 20).
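The simplified scheme can be sketched as below, assuming the model parameters are available as a list of NumPy arrays, one per layer. The function name `noisy_copies` and its arguments are hypothetical, and the reading "variance = 0.001 × norm of the layer's parameters" is taken literally from the text; the original wording could also be read differently.

```python
import numpy as np


def noisy_copies(layers, K=8, scale=1e-3, seed=0):
    """Build K perturbed parameter sets by adding per-layer Gaussian noise.

    layers: list of NumPy arrays, one per layer (assumed representation).
    Each layer receives zero-mean noise whose variance is `scale` times
    the norm of that layer's parameters (literal reading of the text).
    """
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(K):
        perturbed = []
        for W in layers:
            variance = scale * np.linalg.norm(W)  # 2-norm of the flattened layer
            noise = rng.normal(0.0, np.sqrt(variance), size=W.shape)
            perturbed.append(W + noise)  # original W is left unchanged
        copies.append(perturbed)
    return copies
```

Each of the K perturbed parameter sets then plays the role of one updated model when the outputs on the training pictures are recorded.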
RV can be used to predict the generalization performance curve of a single model during training, and also to predict how generalization performance changes as the architecture changes gradually. For example, after ResNet18 models of different widths have been trained on CIFAR10 for the same number of rounds, the corresponding change in test accuracy can be predicted by calculating the RV of each model at that round using an SGD optimizer without momentum (learning rate 1e-3). The experimental results show that RV is very highly correlated with accuracy and can, to a certain extent, predict how the generalization performance of ResNet18 changes with width.
FIG. 1 shows a simplified flow chart of the calculation of the output relative variance of the model. Different training batches are sampled from the training data set to calculate the corresponding model parameter updates; the variance of the outputs of the separately updated models at the same training sample is then estimated and normalized by the norm of the output, and its expectation over the training samples gives the output relative variance index. By estimating this index at different training stages and recording its trend during training, the trend of the generalization error can be obtained.
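The first two stages of this flow (sampling K batches, then forming K separately updated models and recording their outputs) can be illustrated on a toy problem. The sketch below swaps the image recognition network for a linear model with a squared-error loss and uses a single plain SGD step per batch; the function name `updated_model_outputs`, the loss, and all default values are assumptions made only for illustration.

```python
import numpy as np


def updated_model_outputs(W, X, Y, K=8, B=16, lr=1e-2, seed=0):
    """Outputs of K one-step-updated copies of a toy linear model.

    W: (d, C) current weights; X: (n, d) training inputs; Y: (n, C) targets.
    Returns an array of shape (K, n, C).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    outputs = np.empty((K, n, W.shape[1]))
    for j in range(K):
        # Sample one training batch of B distinct pictures.
        idx = rng.choice(n, size=B, replace=False)
        Xb, Yb = X[idx], Y[idx]
        # Gradient of (1 / (2B)) * ||Xb @ W - Yb||^2 with respect to W.
        grad = Xb.T @ (Xb @ W - Yb) / B
        # j-th updated model; the base weights W are not modified,
        # so every update starts from the same point.
        W_j = W - lr * grad
        # Record the output of the j-th updated model on every training picture.
        outputs[j] = X @ W_j
    return outputs
```

The resulting `(K, n, C)` array is the input from which the per-picture variance and its normalization are then computed. With a real network, the single SGD step would be replaced by one step of the optimizer used in training (ADAM or SGD), applied to a copy of the current model.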
FIG. 2 shows the test error curves of the neural network model VGG16 trained on the CIFAR100 data set under different levels of label noise (i.e., with different proportions of labels randomly perturbed), together with the RV curves calculated on the training set. The two curves are symmetrical in the vertical direction, and the experimental results show that RV predicts well how the generalization performance of the model changes during training.
FIG. 3 shows the RV and test accuracy of ResNet18 models of different widths on the CIFAR10 data set. The widths range from 0.25 to 2.0 times the width of the original model, and each model was trained for 100 rounds using an ADAM optimizer (learning rate 1e-4). The calculated correlation between RV and test accuracy is -0.94, with a significance-test p-value of 0.0006, which shows that RV predicts the test accuracy of models of different widths well.
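The text does not state which correlation coefficient or significance test was used. As one possible way to run such an analysis, the sketch below computes a Pearson correlation and a two-sided permutation-test p-value; both choices, and the function name `correlation_with_permutation_p`, are assumptions rather than the procedure behind the reported numbers.

```python
import numpy as np


def correlation_with_permutation_p(x, y, n_perm=2000, seed=0):
    """Pearson r between x and y with a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    r = np.corrcoef(x, y)[0, 1]
    # Null distribution: correlation after randomly shuffling y.
    perm_r = np.array(
        [np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p = (np.sum(np.abs(perm_r) >= abs(r)) + 1) / (n_perm + 1)
    return float(r), float(p)
```

Here `x` would hold the RV of each width variant and `y` the matching test accuracy; a strongly negative r with a small p-value would correspond to the kind of result described for FIG. 3.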
In another aspect, the invention provides a system for predicting the generalization error of an image recognition model without a validation set, comprising:
a first calculation module, configured to randomly sample K groups of training pictures after each training round, and to use the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;
an updating module, configured to apply the parameter updates to obtain K corresponding updated models, and to record the output of each of the K updated models for every training picture;
a second calculation module, configured to calculate the variance of the outputs for each training picture, to normalize the variance by the norm of the output to obtain the output relative variance, and to predict the trend of the generalization error of the image recognition model during training from the output relative variance.
The division of modules in the above system is for illustration only; in other embodiments, the system for predicting the generalization error of an image recognition model without a validation set may be divided into different modules as required to implement all or part of its functions.
Those skilled in the art will understand that the foregoing is only a preferred embodiment of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within its scope of protection.

Claims (8)

1. A method for predicting the generalization error of an image recognition model without a validation set, characterized by comprising the following steps:
(1) after each training round, randomly sampling K groups of training pictures, and using the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;
(2) applying the parameter updates to obtain K corresponding updated models, and recording the output of each of the K updated models for every training picture;
(3) calculating the variance of the outputs for each training picture, and normalizing the variance by the norm of the output to obtain the output relative variance; and predicting the trend of the generalization error of the image recognition model during training from the output relative variance.
2. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein the parameter update of the image recognition model is a parameter update gradient.
3. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein the model optimizer comprises an ADAM optimizer and an SGD optimizer.
4. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein the output relative variance RV is expressed as:
Figure FDA0002887424310000011
where n is the number of picture samples, i = 1, 2, …, n, j = 1, 2, …, K, and f denotes the image recognition model.
5. A system for predicting the generalization error of an image recognition model without a validation set, comprising:
a first calculation module, configured to randomly sample K groups of training pictures after each training round, and to use the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;
an updating module, configured to apply the parameter updates to obtain K corresponding updated models, and to record the output of each of the K updated models for every training picture;
a second calculation module, configured to calculate the variance of the outputs for each training picture, to normalize the variance by the norm of the output to obtain the output relative variance, and to predict the trend of the generalization error of the image recognition model during training from the output relative variance.
6. The system for predicting the generalization error of an image recognition model without a validation set according to claim 5, wherein the parameter update of the image recognition model is a parameter update gradient.
7. The system for predicting the generalization error of an image recognition model without a validation set according to claim 5, wherein the model optimizer comprises an ADAM optimizer and an SGD optimizer.
8. The system for predicting the generalization error of an image recognition model without a validation set according to claim 5, wherein the output relative variance RV is expressed as:
Figure FDA0002887424310000021
where n is the number of picture samples, i = 1, 2, …, n, j = 1, 2, …, K, and f denotes the image recognition model.
CN202110017334.XA 2021-01-07 2021-01-07 Method and system for predicting generalized error of image identification model based on non-check set Active CN112598082B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110017334.XA (CN112598082B (en)) | 2021-01-07 | 2021-01-07 | Method and system for predicting generalized error of image identification model based on non-check set

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110017334.XA (CN112598082B (en)) | 2021-01-07 | 2021-01-07 | Method and system for predicting generalized error of image identification model based on non-check set

Publications (2)

Publication Number | Publication Date
CN112598082A (en) | 2021-04-02
CN112598082B (en) | 2022-07-12

Family

ID=75207068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110017334.XA Active CN112598082B (en) 2021-01-07 2021-01-07 Method and system for predicting generalized error of image identification model based on non-check set

Country Status (1)

Country | Link
CN (1) | CN112598082B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103954914A (en) * | 2014-05-16 | 2014-07-30 | Harbin Institute of Technology | Lithium ion battery remaining life direct prediction method based on probability integration
CN106169096A (en) * | 2016-06-24 | 2016-11-30 | Shanxi University | A kind of appraisal procedure of machine learning system learning performance
CN106951959A (en) * | 2017-01-24 | 2017-07-14 | Shanghai Jiao Tong University | Deep neural network optimization method based on learning automaton
US20200327450A1 (en) * | 2019-04-15 | 2020-10-15 | Apple Inc. | Addressing a loss-metric mismatch with adaptive loss alignment
CN112115973A (en) * | 2020-08-18 | 2020-12-22 | Jilin Jianzhu University | Convolutional neural network based image identification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
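Title
Marianthi Markatou et al.: "Analysis of Variance of Cross-Validation Estimators of the Generalization Error", Journal of Machine Learning Research *
高敬阳 et al.: "SWA-Adaboost Algorithm Based on Sample Sampling and Weight Adjustment" (基于样本抽样和权重调整的SWA-Adaboost算法), Computer Engineering (计算机工程) *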

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113361575A (en) * | 2021-05-28 | 2021-09-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Model training method and device and electronic equipment
CN113361575B (en) * | 2021-05-28 | 2023-10-20 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Model training method and device and electronic equipment

Also Published As

Publication number | Publication date
CN112598082B (en) | 2022-07-12

Similar Documents

Publication Publication Date Title
US20200104688A1 (en) Methods and systems for neural architecture search
WO2018196760A1 (en) Ensemble transfer learning
CN111563706A (en) Multivariable logistics freight volume prediction method based on LSTM network
CN113196314B (en) Adapting a predictive model
CN111144542A (en) Oil well productivity prediction method, device and equipment
US20210081798A1 (en) Neural network method and apparatus
CN112232407B (en) Neural network model training method and device for pathological image samples
CN106709588B (en) Prediction model construction method and device and real-time prediction method and device
EP3792841A1 (en) Automated feature generation for machine learning application
US10452961B2 (en) Learning temporal patterns from electronic health records
CN113255573B (en) Pedestrian re-identification method based on mixed cluster center label learning and storage medium
Böhm et al. Uncertainty quantification with generative models
US20240045871A1 (en) Cardinality estimation method and device for skyline query based on deep learning
CN113807900A (en) RF order demand prediction method based on Bayesian optimization
CN116596582A (en) Marketing information prediction method and device based on big data
CN112598082B (en) Method and system for predicting generalized error of image identification model based on non-check set
De Wiljes et al. An adaptive Markov chain Monte Carlo approach to time series clustering of processes with regime transition behavior
CN114065996A (en) Traffic flow prediction method based on variational self-coding learning
Brusa et al. Tempered expectation-maximization algorithm for the estimation of discrete latent variable models
CN114495114B (en) Text sequence recognition model calibration method based on CTC decoder
CN116977091A (en) Method and device for determining individual investment portfolio, electronic equipment and readable storage medium
Nguyen-Duc et al. Deep EHR spotlight: a framework and mechanism to highlight events in electronic health records for explainable predictions
US20240152818A1 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
CN114610899A (en) Representation learning method and system of knowledge graph
CN112884028A (en) System resource adjusting method, device and equipment

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant