CN112598082B - Method and system for predicting the generalization error of an image recognition model without a validation set - Google Patents
Method and system for predicting the generalization error of an image recognition model without a validation set

- Publication number: CN112598082B (application CN202110017334.XA)
- Authority: CN (China)
- Prior art keywords: training; image recognition; output; model
- Prior art date / Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/04 — Physics; Computing; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Physics; Computing; Computing arrangements based on biological models; Neural networks; Learning methods
- Y02T10/40 — Climate change mitigation technologies related to transportation; Road transport of goods or passengers; Internal combustion engine [ICE] based vehicles; Engine management systems
Abstract
The invention discloses a method and a system for predicting the generalization error of an image recognition model without a validation set, belonging to the field of deep learning optimization and generalization and comprising the following steps: after each training round, randomly sample K groups of training pictures and calculate, with the model optimizer, the parameter update of the image recognition model corresponding to each of the K groups; apply the parameter updates to obtain K updated models and record the output of each updated model on every training picture; calculate the variance of the outputs for each training picture and normalize it by the output norm to obtain the output relative variance; use the output relative variance to predict the trend of the generalization error of the image recognition model during training. Because no validation set is needed, all training samples can be put into training, yielding better generalization performance; moreover, the procedure requires only a single training run of the neural network, reducing the energy and hardware cost of repeated training.
Description
Technical Field
The invention belongs to the field of deep learning optimization and generalization, and particularly relates to a method and a system for predicting the generalization error of an image recognition model without a validation set.
Background
Machine learning, as a research hotspot of current artificial intelligence, is often used to mine potential relationships in data. In recent years, data-driven machine learning algorithms have achieved excellent performance in fields such as biology, medicine, finance, and the military. With the growth of data and computational power, deep learning, a machine learning approach that handles images particularly well, has become a current research hotspot and is widely applied across industries.
Although deep learning performs well on image recognition tasks, many problems remain to be solved and studied. A neural network model for image recognition exhibits complex generalization phenomena during training, such as the double descent of test error mentioned in the prior art: as the number of training rounds increases, the error of the neural network on the image test set first decreases, then starts to rise due to overfitting, and finally decreases again at some point. These complex generalization phenomena make predicting the trend of the model's generalization error during training important. The most common approach at present is to split part of the image training set off as a validation set, train the image recognition model on the remaining training set, and compute the error on the validation set to predict the trend of the test error; the predicted trend is then used for downstream processing such as early stopping.
Although predicting the generalization error curve during training with a validation set is simple and practical, the training pictures held out for the validation set mean that the predicted curve often deviates from the actual generalization error curve obtained when all training samples are used, which affects subsequent processing such as early stopping. In addition, the reduction in the number of training pictures caused by splitting off a validation set often degrades generalization performance. The latter can be mitigated by two rounds of training: split part of the training set off as a validation set, determine the number of training rounds from the results on the validation set, then merge the validation set back into the whole training set and train for the same number of rounds on all pictures. However, the increased training cost makes the hardware and energy overhead a new problem, and this procedure still cannot guarantee that the generalization error curves evolve consistently for different numbers of training pictures.
Disclosure of Invention
Aiming at the above defects or improvement requirements of the prior art, the invention provides a method and a system for predicting the generalization error of an image recognition model without a validation set, thereby solving the technical problems of the high cost of repeated training and the inaccurate predictions that arise when a validation set is used to predict generalization performance during the training of an existing image recognition model.
In order to achieve the above object, in one aspect, the present invention provides a method for predicting the generalization error of an image recognition model without a validation set, comprising the following steps:
(1) after each training round, randomly sample K groups of training pictures, and use the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups;
(2) apply the parameter updates to obtain K updated models, and record the output of each updated model on every training picture;
(3) calculate the variance of the K outputs for each training picture and normalize it by the output norm to obtain the output relative variance; use the output relative variance to predict the trend of the generalization error of the image recognition model during training.
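Step (3) can be sketched numerically. The following is a minimal numpy sketch, assuming RV averages, over training pictures, the output variance across the K updated models normalized by the squared output norm of the un-updated model; function and variable names are illustrative, not from the patent:

```python
import numpy as np

def output_relative_variance(outputs_k, outputs_base):
    """Output relative variance (RV).

    outputs_k    : (K, n, C) outputs of the K updated models on the
                   n training pictures (C output dimensions).
    outputs_base : (n, C) outputs of the un-updated model, whose norm
                   is used for normalization.
    """
    mean_out = outputs_k.mean(axis=0)                             # (n, C)
    # per-picture variance: mean squared deviation across the K models
    var = ((outputs_k - mean_out) ** 2).sum(axis=2).mean(axis=0)  # (n,)
    norm_sq = (outputs_base ** 2).sum(axis=1)                     # (n,)
    return float((var / norm_sq).mean())

# Sanity check: if all K updated models agree exactly, RV is (near) zero.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 10))                      # 5 pictures, 10 outputs
identical = np.repeat(base[None, :, :], 4, axis=0)   # K = 4 identical models
rv_zero = output_relative_variance(identical, base)
```

Disagreement among the K updated models drives RV up; perfect agreement drives it to zero, which is the quantity the method tracks over training.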
Further, the parameter update of the image recognition model is the parameter-update gradient.
Further, the model optimizer comprises an ADAM optimizer or an SGD optimizer.
Further, the output relative variance RV is expressed as (the original formula appears only as an image; the expression below is reconstructed from the surrounding definitions):

RV = (1/n) Σ_{i=1}^{n} [ (1/K) Σ_{j=1}^{K} ||f_j(x_i) − f̄(x_i)||² ] / ||f(x_i)||², with f̄(x_i) = (1/K) Σ_{j=1}^{K} f_j(x_i),

where n is the number of picture samples, i = 1, 2, …, n indexes the training pictures, j = 1, 2, …, K indexes the updated models, x_i denotes the i-th training picture, f denotes the image recognition model before the update, and f_j denotes the j-th updated model.
In another aspect, the present invention provides a system for predicting the generalization error of an image recognition model without a validation set, including:
the first calculation module, used for randomly sampling K groups of training pictures after each training round and calculating, with the model optimizer, the parameter update of the image recognition model corresponding to each of the K groups;
the updating module, used for applying the parameter updates to obtain K updated models and recording the output of each updated model on every training picture;
the second calculation module, used for calculating the variance of the K outputs for each training picture and normalizing it by the output norm to obtain the output relative variance; the output relative variance is used to predict the trend of the generalization error of the image recognition model during training.
Further, the parameter update of the image recognition model is the parameter-update gradient.
Further, the model optimizer comprises an ADAM optimizer or an SGD optimizer.
Further, the output relative variance RV is expressed as (the original formula appears only as an image; the expression below is reconstructed from the surrounding definitions):

RV = (1/n) Σ_{i=1}^{n} [ (1/K) Σ_{j=1}^{K} ||f_j(x_i) − f̄(x_i)||² ] / ||f(x_i)||², with f̄(x_i) = (1/K) Σ_{j=1}^{K} f_j(x_i),

where n is the number of picture samples, i = 1, 2, …, n indexes the training pictures, j = 1, 2, …, K indexes the updated models, x_i denotes the i-th training picture, f denotes the image recognition model before the update, and f_j denotes the j-th updated model.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
the invention predicts the variation trend of the generalization error of the image recognition model in the training process by the output relative variance, can directly estimate on a training set, and can more accurately judge the variation trend of the generalization error curve in the training process of the image recognition model. Meanwhile, all training pictures can be put into training because a check set is not needed in the process, so that better generalization performance is obtained; in addition, the process only needs to train one round of neural network, and energy and hardware loss caused by multiple times of training is reduced.
Drawings
FIG. 1 is a simplified flow chart of computing the output relative variance of the model according to the present invention;
FIG. 2 shows the test error curves of the neural network model VGG16 trained on the CIFAR100 dataset under different levels of label noise (i.e., labels randomly perturbed in different proportions), together with the RV curves calculated on the training set;
FIG. 3 shows the RV curves and test accuracy curves corresponding to ResNet18 models of different widths on the CIFAR10 dataset.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, the present invention provides a method for predicting the generalization error of an image recognition model without a validation set, comprising the following steps:
(1) after each training round, randomly sample K groups of training pictures, and use the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups;
(2) apply the parameter updates to obtain K updated models, and record the output of each updated model on every training picture;
(3) calculate the variance of the K outputs for each training picture and normalize it by the output norm to obtain the output relative variance; use the output relative variance to predict the trend of the generalization error of the image recognition model during training.
Specifically, take a training dataset D comprising n samples, used to train a model f (e.g., n = 50000 for CIFAR10). After each training round, randomly sample K training batches of B samples each from D (e.g., K = 100 to 150, B = 128 or 256). Then use the optimizer employed in training, such as ADAM (learning rate 1e-3 or 1e-4) or SGD (learning rate 1e-2 or 1e-3, momentum 0.9), to obtain the model parameter update corresponding to each batch, yielding K updated models f_1, …, f_K. Finally, calculate the relative variance of the outputs of the K updated models on the training samples to obtain RV.
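Under illustrative assumptions (a toy linear softmax model, cross-entropy loss, plain SGD, made-up sizes), the whole sampling-and-update procedure might look as follows; beyond sampling K batches of size B and taking one optimizer step per batch, none of these settings are the patent's own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, C = 200, 8, 3            # toy dataset: n samples, d features, C classes
K, B, lr = 10, 32, 1e-2        # K sampled batches of size B, SGD learning rate

X = rng.normal(size=(n, d))
y = rng.integers(0, C, size=n)
W = rng.normal(scale=0.1, size=(d, C))   # current model parameters

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad(W, Xb, yb):
    """Gradient of the cross-entropy loss for a linear softmax model."""
    p = softmax(Xb @ W)
    p[np.arange(len(yb)), yb] -= 1.0
    return Xb.T @ p / len(yb)

# Steps (1)+(2): one SGD update per sampled batch -> K updated models;
# record each updated model's outputs on all n training samples.
outputs_k = np.empty((K, n, C))
for j in range(K):
    idx = rng.choice(n, size=B, replace=False)
    W_j = W - lr * grad(W, X[idx], y[idx])
    outputs_k[j] = softmax(X @ W_j)

# Step (3): variance across the K models, normalized by the output norm.
base = softmax(X @ W)
var = ((outputs_k - outputs_k.mean(axis=0)) ** 2).sum(axis=2).mean(axis=0)
rv = float((var / (base ** 2).sum(axis=1)).mean())
```

In a real run this computation would be repeated after every training round, and the sequence of `rv` values recorded as the curve to compare against the test error.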
Experiments show that the RV value and the generalization performance of the model follow the same trend during training, so the generalization performance can be predicted directly from the RV value without splitting off a validation set.
Obtaining the RV index requires computing the model parameter update many times, which makes the computation relatively expensive. A simplified scheme replaces the computed parameter updates with directly sampled random noise (e.g., per-layer Gaussian noise with mean 0 and variance 0.001 times the norm of that layer's parameters), greatly reducing the amount of computation. Note that although this scheme is cheaper, it is not effective on some datasets (e.g., CIFAR100); it is generally valid only for simple datasets with a small number of classes (typically fewer than 20).
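A sketch of this simplification, with the noise drawn per layer with variance 0.001 times that layer's parameter norm as the text states; the two-layer parameter list and all sizes are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative two-"layer" linear model: a list of parameter arrays.
layers = [rng.normal(size=(8, 16)), rng.normal(size=(16, 3))]

def perturbed(layers, scale=0.001, rng=rng):
    """Copy of the parameters with per-layer Gaussian noise whose variance
    is `scale` times that layer's parameter norm (replacing the K computed
    gradient updates of the full scheme)."""
    out = []
    for Wl in layers:
        std = np.sqrt(scale * np.linalg.norm(Wl))
        out.append(Wl + rng.normal(scale=std, size=Wl.shape))
    return out

K = 5
models = [perturbed(layers) for _ in range(K)]   # K noise-perturbed models
```

The K perturbed parameter sets then take the place of the K gradient-updated models in the RV computation, so no extra backward passes are needed.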
RV can be used to predict the generalization curve of a single model during training, and also to predict how generalization changes as the architecture is varied gradually. For example, after ResNet18 models of different widths are trained on CIFAR10 for the same number of rounds, the corresponding change in test accuracy can be predicted by computing the RV at the same round with a momentum-free SGD optimizer (learning rate 1e-3). Experimental results show that RV correlates very strongly with accuracy and can, to some extent, predict how the generalization performance of ResNet18 changes with width.
Fig. 1 shows a simplified flow chart of computing the output relative variance. Different training batches are sampled from the training dataset to compute the corresponding model parameter updates; the variance of the updated models' outputs at each training sample is then estimated and normalized by the output norm, and the expectation of this value over the training samples gives the output relative variance index. Estimating this index at different training stages and recording its trend over training yields the trend of the generalization error.
Fig. 2 shows the test error curves of the neural network model VGG16 trained on the CIFAR100 dataset under different levels of label noise (i.e., labels randomly perturbed in different proportions) and the RV curves calculated on the training set. The two sets of curves mirror each other vertically, and the experimental results show that RV predicts the model's generalization curve during training well.
Fig. 3 shows ResNet18 models of different widths on the CIFAR10 dataset with their corresponding RV and test accuracy. The widths were 0.25 to 2.0 times that of the original model, and 100 rounds of training were performed with an ADAM optimizer (learning rate 1e-4). The computed correlation between RV and test accuracy is −0.94 with a significance-test p value of 0.0006, showing that RV predicts the test accuracy of models of different widths well.
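The reported figure is a plain Pearson correlation; the following self-contained sketch computes one on hypothetical numbers (the −0.94 and p = 0.0006 values are the patent's experimental results and are not reproduced here):

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D sequences."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Illustrative stand-ins: as RV falls across widths, test accuracy rises,
# so their Pearson correlation should come out strongly negative.
rv_by_width  = [0.30, 0.22, 0.15, 0.11, 0.08]   # hypothetical RV values
acc_by_width = [0.71, 0.78, 0.84, 0.88, 0.90]   # hypothetical accuracies
r = pearson_r(rv_by_width, acc_by_width)
```

A strongly negative r is what makes RV usable as a proxy: ranking architectures by ascending RV approximately ranks them by descending test accuracy.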
In another aspect, the present invention provides a system for predicting the generalization error of an image recognition model without a validation set, including:
the first calculation module, used for randomly sampling K groups of training pictures after each training round and calculating, with the model optimizer, the parameter update of the image recognition model corresponding to each of the K groups;
the updating module, used for applying the parameter updates to obtain K updated models and recording the output of each updated model on every training picture;
the second calculation module, used for calculating the variance of the K outputs for each training picture and normalizing it by the output norm to obtain the output relative variance; the output relative variance is used to predict the trend of the generalization error of the image recognition model during training.
The division of modules in the above system for predicting the generalization error of an image recognition model without a validation set is for illustration only; in other embodiments, the system may be divided into different modules as required to accomplish all or part of its functions.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A method for predicting the generalization error of an image recognition model without a validation set, characterized by comprising the following steps:
(1) after each training round, randomly sampling K groups of training pictures, and calculating, with a model optimizer, the parameter update of the image recognition model corresponding to each of the K groups;
(2) applying the parameter updates to obtain K updated models, and recording the output of each updated model on every training picture;
(3) calculating the variance of the K outputs for each training picture and normalizing it by the output norm to obtain an output relative variance; predicting the trend of the generalization error of the image recognition model during training with the output relative variance;
the output relative variance RV being expressed as (the original formula appears only as an image; the expression below is reconstructed from the surrounding definitions):

RV = (1/n) Σ_{i=1}^{n} [ (1/K) Σ_{j=1}^{K} ||f_j(x_i) − f̄(x_i)||² ] / ||f(x_i)||², with f̄(x_i) = (1/K) Σ_{j=1}^{K} f_j(x_i),

where n is the number of picture samples, i = 1, 2, …, n indexes the training pictures, j = 1, 2, …, K indexes the updated models, x_i denotes the i-th training picture, f denotes the image recognition model before the update, and f_j denotes the j-th updated model.
2. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein the parameter update of the image recognition model is the parameter-update gradient.
3. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein the model optimizer comprises an ADAM optimizer or an SGD optimizer.
4. A system for predicting the generalization error of an image recognition model without a validation set, comprising:
a first calculation module, used for randomly sampling K groups of training pictures after each training round and calculating, with a model optimizer, the parameter update of the image recognition model corresponding to each of the K groups;
an updating module, used for applying the parameter updates to obtain K updated models and recording the output of each updated model on every training picture;
a second calculation module, used for calculating the variance of the K outputs for each training picture and normalizing it by the output norm to obtain an output relative variance, the output relative variance being used to predict the trend of the generalization error of the image recognition model during training;
the output relative variance RV being expressed as (the original formula appears only as an image; the expression below is reconstructed from the surrounding definitions):

RV = (1/n) Σ_{i=1}^{n} [ (1/K) Σ_{j=1}^{K} ||f_j(x_i) − f̄(x_i)||² ] / ||f(x_i)||², with f̄(x_i) = (1/K) Σ_{j=1}^{K} f_j(x_i),

where n is the number of picture samples, i = 1, 2, …, n indexes the training pictures, j = 1, 2, …, K indexes the updated models, x_i denotes the i-th training picture, f denotes the image recognition model before the update, and f_j denotes the j-th updated model.
5. The system for predicting the generalization error of an image recognition model without a validation set according to claim 4, wherein the parameter update of the image recognition model is the parameter-update gradient.
6. The system for predicting the generalization error of an image recognition model without a validation set according to claim 4, wherein the model optimizer comprises an ADAM optimizer or an SGD optimizer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110017334.XA CN112598082B (en) | 2021-01-07 | 2021-01-07 | Method and system for predicting generalized error of image identification model based on non-check set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112598082A CN112598082A (en) | 2021-04-02 |
CN112598082B true CN112598082B (en) | 2022-07-12 |
Family
ID=75207068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110017334.XA Active CN112598082B (en) | 2021-01-07 | 2021-01-07 | Method and system for predicting generalized error of image identification model based on non-check set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112598082B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113361575B (en) * | 2021-05-28 | 2023-10-20 | 北京百度网讯科技有限公司 | Model training method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103954914A (en) * | 2014-05-16 | 2014-07-30 | 哈尔滨工业大学 | Lithium ion battery remaining life direct prediction method based on probability integration |
CN106169096A (en) * | 2016-06-24 | 2016-11-30 | 山西大学 | A kind of appraisal procedure of machine learning system learning performance |
CN106951959A (en) * | 2017-01-24 | 2017-07-14 | 上海交通大学 | Deep neural network optimization method based on learning automaton |
CN112115973A (en) * | 2020-08-18 | 2020-12-22 | 吉林建筑大学 | Convolutional neural network based image identification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200327450A1 (en) * | 2019-04-15 | 2020-10-15 | Apple Inc. | Addressing a loss-metric mismatch with adaptive loss alignment |
- 2021-01-07: application CN202110017334.XA filed in China; patent CN112598082B granted, status Active
Non-Patent Citations (2)
Title |
---|
Markatou, Marianthi, et al. "Analysis of Variance of Cross-Validation Estimators of the Generalization Error." Journal of Machine Learning Research, vol. 6, Dec. 2005, pp. 1127-1168. *
Gao Jingyang, et al. "SWA-Adaboost Algorithm Based on Sample Sampling and Weight Adjustment." Computer Engineering, vol. 40, no. 9, Sep. 2014, pp. 248-251, 256. *
Also Published As
Publication number | Publication date |
---|---|
CN112598082A (en) | 2021-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200104688A1 (en) | Methods and systems for neural architecture search | |
WO2018196760A1 (en) | Ensemble transfer learning | |
CN111144542A (en) | Oil well productivity prediction method, device and equipment | |
CN112232407B (en) | Neural network model training method and device for pathological image samples | |
CN113255573B (en) | Pedestrian re-identification method based on mixed cluster center label learning and storage medium | |
CN112000808B (en) | Data processing method and device and readable storage medium | |
CN116644755B (en) | Multi-task learning-based few-sample named entity recognition method, device and medium | |
CN113469186B (en) | Cross-domain migration image segmentation method based on small number of point labels | |
Böhm et al. | Uncertainty quantification with generative models | |
US20240045871A1 (en) | Cardinality estimation method and device for skyline query based on deep learning | |
CN113807900A (en) | RF order demand prediction method based on Bayesian optimization | |
CN116596582A (en) | Marketing information prediction method and device based on big data | |
CN112598082B (en) | Method and system for predicting generalized error of image identification model based on non-check set | |
De Wiljes et al. | An adaptive Markov chain Monte Carlo approach to time series clustering of processes with regime transition behavior | |
CN116522143B (en) | Model training method, clustering method, equipment and medium | |
CN116245139B (en) | Training method and device for graph neural network model, event detection method and device | |
CN114495114B (en) | Text sequence recognition model calibration method based on CTC decoder | |
Brusa et al. | Tempered expectation-maximization algorithm for the estimation of discrete latent variable models | |
CN117113086A (en) | Energy storage unit load prediction method, system, electronic equipment and medium | |
CN114610899A (en) | Representation learning method and system of knowledge graph | |
CN114610953A (en) | Data classification method, device, equipment and storage medium | |
CN113656707A (en) | Financing product recommendation method, system, storage medium and equipment | |
CN112884028A (en) | System resource adjusting method, device and equipment | |
CN112200488A (en) | Risk identification model training method and device for business object | |
CN113032553A (en) | Information processing apparatus, information processing method, and computer program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |