CN112598082A

CN112598082A - Method and system for predicting generalized error of image identification model based on non-check set

Info

Publication number: CN112598082A
Application number: CN202110017334.XA
Authority: CN
Inventors: 伍冬睿; 张潇
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-01-07
Filing date: 2021-01-07
Publication date: 2021-04-02
Anticipated expiration: 2041-01-07
Also published as: CN112598082B

Abstract

The invention discloses a method and a system for predicting the generalization error of an image recognition model without a validation set, belonging to the field of deep learning optimization and generalization. The method comprises the following steps: after each training round is finished, randomly sampling K groups of training pictures, and using the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups; obtaining the corresponding K updated models from the parameter updates, and recording the output of each of the K updated models for each training picture; calculating the variance of the outputs for each training picture, and normalizing the variance by the norm of the output to obtain the output relative variance; and using the output relative variance to predict the trend of the generalization error of the image recognition model during training. The method therefore allows all training samples to be used for training, with none held out as a validation set, which yields better generalization performance; in addition, the neural network needs to be trained only once, which reduces the energy and hardware cost of repeated training.

Description

Method and system for predicting the generalization error of an image recognition model without a validation set

Technical Field

The invention belongs to the field of deep learning optimization and generalization, and particularly relates to a method and a system for predicting the generalization error of an image recognition model without a validation set.

Background

Machine learning, a current research focus in artificial intelligence, is often used to mine latent relationships in data. In recent years, data-driven machine learning algorithms have achieved excellent performance in fields such as biology, medicine, finance, and the military. With the growth of available data and computing power, deep learning, a class of machine learning algorithms that handles images well, has itself become a research focus and is widely applied across industries.

Although deep learning performs well on image recognition tasks, many problems remain to be solved and studied. Neural network models for image recognition exhibit complex generalization behavior during training, such as the double descent of the test error reported in the prior art: as the number of training rounds increases, the error of the neural network on the image test set first decreases, then rises because of overfitting, and finally decreases again at some point. Such behavior makes it important to predict the trend of the model's generalization error during training. The most common approach at present is to hold out part of the image training set as a validation set, train the image recognition model on the remaining training set, and calculate the error on the validation set in order to predict the trend of the test error; downstream processing such as early stopping is then performed according to the predicted trend.

Although predicting the generalization error curve during training of an image recognition model from a validation set is simple and practical, the validation set withholds some of the training pictures. The predicted generalization error curve is therefore often inconsistent with the curve that would be obtained by actually training on all training samples, which affects subsequent processing such as early stopping. In addition, the reduction in the number of training pictures caused by holding out a validation set often degrades generalization performance. The latter problem can be alleviated by training twice: part of the training set is first held out as a validation set, the number of training rounds is determined from the results on the validation set, and the validation set is then merged back into the training set so that the model is trained for the same number of rounds on all pictures. However, the added training cost makes hardware and energy consumption a new problem, and this procedure still cannot guarantee that the generalization error curves behave consistently for different numbers of training pictures.

Disclosure of Invention

In view of the above defects or improvement needs of the prior art, the invention provides a method and a system for predicting the generalization error of an image recognition model without a validation set, thereby solving the technical problems of the high cost of repeated training and of inaccurate prediction that arise when a validation set is used to predict generalization performance during training of existing image recognition models.

In order to achieve the above object, in one aspect, the present invention provides a method for predicting the generalization error of an image recognition model without a validation set, comprising the following steps:

(1) after each training round is finished, randomly sampling K groups of training pictures, and using the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;

(2) obtaining the corresponding K updated models from the parameter updates, and recording the output of each of the K updated models for each training picture;

(3) calculating the variance of the outputs for each training picture, and normalizing the variance by the norm of the output to obtain the output relative variance; and predicting the trend of the generalization error of the image recognition model during training from the output relative variance.

Further, the parameter update of the image recognition model is the parameter update gradient.

Further, the model optimizer comprises an ADAM optimizer and an SGD optimizer.

Further, the output relative variance RV is represented as:

where n is the number of picture samples, i = 1, 2, ..., n, j = 1, 2, ..., K, and f denotes the image recognition model.
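The expression itself is not reproduced in this text. As a reading aid only, one form consistent with the verbal description (per-picture variance of the K outputs, normalized by the output norm, then averaged over the n pictures) is sketched below. The symbols x_i (the i-th training picture), f_j (the j-th updated model), and the mean output \bar{f} are notation assumed here, and the choice of the mean squared output norm as the normalizer is likewise an assumption rather than the patent's stated formula.

```latex
\mathrm{RV} = \frac{1}{n}\sum_{i=1}^{n}
\frac{\frac{1}{K}\sum_{j=1}^{K}\left\lVert f_j(x_i) - \bar{f}(x_i)\right\rVert^{2}}
     {\frac{1}{K}\sum_{j=1}^{K}\left\lVert f_j(x_i)\right\rVert^{2}},
\qquad
\bar{f}(x_i) = \frac{1}{K}\sum_{j=1}^{K} f_j(x_i)
```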

In another aspect, the present invention provides a system for predicting the generalization error of an image recognition model without a validation set, including:

a first calculation module, configured to randomly sample K groups of training pictures after each training round is finished, and to use the model optimizer to calculate the parameter update of the image recognition model corresponding to each of the K groups of training pictures;

an updating module, configured to obtain the corresponding K updated models from the parameter updates and to record the output of each of the K updated models for each training picture;

a second calculation module, configured to calculate the variance of the outputs for each training picture, to normalize the variance by the norm of the output to obtain the output relative variance, and to predict the trend of the generalization error of the image recognition model during training from the output relative variance.

Further, the model optimizer comprises an ADAM optimizer and an SGD optimizer.

Further, the output relative variance RV is represented as:

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

The invention predicts the trend of the generalization error of the image recognition model during training from the output relative variance, which can be estimated directly on the training set, and can judge the trend of the generalization error curve during training more accurately. Moreover, because no validation set is needed, all training pictures can be used for training, which yields better generalization performance; in addition, the neural network needs to be trained only once, which reduces the energy and hardware cost of repeated training.

Drawings

FIG. 1 is a simplified flow chart for calculating the relative variance of the model output according to the present invention;

FIG. 2 shows the test error curves of the neural network model VGG16 trained on the CIFAR100 dataset under different levels of label noise (i.e., with different proportions of labels randomly perturbed), together with the RV curves calculated on the training set;

FIG. 3 shows the RV curve and the test accuracy curve for ResNet18 models of different widths on the CIFAR10 dataset.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Referring to FIG. 1, the present invention provides a method for predicting the generalization error of an image recognition model without a validation set, comprising the following steps:

Specifically, consider a training dataset D containing n samples (e.g., n = 50000 for CIFAR10). After each training round, K training batches, each containing B training samples (e.g., K = 100 to 150, B = 128 or 256), are randomly sampled from D. The optimizer used to train the model, such as ADAM (learning rate 1e-3 or 1e-4) or SGD (learning rate 1e-2 or 1e-3, momentum 0.9), is then used to compute the model parameter update for each training batch, yielding the corresponding K updated models.
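The batch-sampling and update step can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not the patented implementation: the model is a linear map f(x) = x @ W trained with mean squared error, the optimizer is plain SGD without momentum, and the function names `sgd_updated_models` and `record_outputs` are hypothetical.

```python
import numpy as np

def sgd_updated_models(W, X, Y, K=100, B=128, lr=1e-2, seed=0):
    """Return K copies of W, each after one plain-SGD step on its own random batch.

    Assumed toy setting: linear model f(x) = x @ W with mean squared error.
    Requires B <= number of rows in X because batches are drawn without replacement.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    updated = []
    for _ in range(K):
        idx = rng.choice(n, size=B, replace=False)  # one random training batch
        Xb, Yb = X[idx], Y[idx]
        grad = 2.0 * Xb.T @ (Xb @ W - Yb) / B       # gradient of the batch MSE w.r.t. W
        updated.append(W - lr * grad)               # apply the update to get one updated model
    return np.stack(updated)                        # shape (K, d, c)

def record_outputs(models, X):
    """Outputs of every updated model on every training sample, shape (K, n, c)."""
    return np.einsum("nd,kdc->knc", X, models)
```

Each of the K models starts from the same current parameters W, so the spread of their outputs reflects only the randomness of the sampled batches.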

The relative variance (RV) of the outputs of the K updated models on the training samples is then calculated. Experiments show that the RV value follows the same trend as the generalization error of the model during training, so the generalization performance of the model can be predicted directly from the RV value without holding out a validation set.
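A minimal NumPy sketch of the RV computation, assuming the outputs of the K updated models have already been recorded in an array of shape (K, n, c). The function name `relative_variance`, the use of the mean squared output norm as the normalizer, and the small `eps` guard are assumptions made for illustration.

```python
import numpy as np

def relative_variance(outputs, eps=1e-12):
    """Output relative variance from recorded outputs.

    outputs: array of shape (K, n, c) for K updated models, n training
    pictures, and c output dimensions.
    """
    mean_out = outputs.mean(axis=0, keepdims=True)              # (1, n, c): mean over the K models
    var = ((outputs - mean_out) ** 2).sum(axis=2).mean(axis=0)  # (n,): output variance per picture
    norm_sq = (outputs ** 2).sum(axis=2).mean(axis=0)           # (n,): mean squared output norm
    return float(np.mean(var / (norm_sq + eps)))                # average over the n pictures
```

Because the variance is divided by the squared output norm, the value is unchanged when all outputs are rescaled by a common factor, which keeps it comparable across training rounds in which the output magnitude grows.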

Obtaining the RV index requires calculating the model parameter update many times, which makes the computation relatively expensive. A simplified scheme replaces the computed parameter updates with directly sampled random noise (for example, in each layer of the neural network, Gaussian noise with mean 0 and variance equal to 0.001 times the norm of that layer's parameters), which greatly reduces the amount of computation. Note that although this scheme is simpler to compute, it is not effective on some datasets (e.g., CIFAR100); it is generally valid only for simple datasets with a small number of classes (typically fewer than 20).
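The noise-based simplification might look like the sketch below, which perturbs each layer with zero-mean Gaussian noise whose variance is 0.001 times the norm of that layer's parameters. Representing a model as a list of per-layer arrays and the name `noise_perturbed_models` are assumptions made for illustration.

```python
import numpy as np

def noise_perturbed_models(layers, K=100, scale=1e-3, seed=0):
    """Return K perturbed copies of a model given as a list of per-layer arrays.

    Each layer receives Gaussian noise with mean 0 and variance
    scale * ||layer parameters||, replacing the optimizer-computed update.
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(K):
        perturbed = []
        for w in layers:
            variance = scale * np.linalg.norm(w)  # 2-norm of the flattened layer
            noise = rng.normal(0.0, np.sqrt(variance), size=w.shape)
            perturbed.append(w + noise)
        models.append(perturbed)
    return models
```

No gradients are computed here, so the cost is K forward passes plus noise sampling; the outputs of the perturbed models can then be fed to the same variance calculation as before.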

RV can be used to predict the generalization performance curve of a single model during training, and also to predict how generalization performance changes as the architecture is gradually varied. For example, after ResNet18 models of different widths have been trained on CIFAR10 for the same number of rounds, the change in their test accuracy can be predicted by calculating the RV of each model at that round using an SGD optimizer without momentum (learning rate 1e-3). The experimental results show that RV is highly correlated with accuracy and can, to a certain extent, predict how the generalization performance of ResNet18 changes with width.

FIG. 1 shows a simplified flow chart for calculating the relative variance of the model output. Different training batches are sampled from the training dataset, and the model parameter update corresponding to each batch is calculated. The variance of the outputs of the resulting updated models at the same training sample is then estimated and normalized by the norm of the output, and the expectation of this value over the training samples gives the output relative variance. Estimating this index at different stages of training and recording how it changes yields the trend of the generalization error.
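One way to turn the recorded per-round RV values into a trend is sketched below. The moving-average smoothing, the window size, and the use of the minimum of the smoothed curve as an early-stopping candidate are all assumptions made for illustration; the text above only states that the trend of the index is recorded.

```python
import numpy as np

def smoothed_trend(rv_per_round, window=3):
    """Moving average of the per-round RV values (length shrinks by window - 1)."""
    rv = np.asarray(rv_per_round, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(rv, kernel, mode="valid")

def candidate_stop_round(rv_per_round, window=3):
    """0-based round index at the centre of the window where the smoothed RV is lowest."""
    smooth = smoothed_trend(rv_per_round, window)
    return int(np.argmin(smooth)) + window // 2
```

For a curve with several local minima, such as one showing double descent, a single argmin picks only the global minimum, so inspecting the whole smoothed curve is more informative than the single index.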

FIG. 2 shows the test error curves of the neural network model VGG16 trained on the CIFAR100 dataset under different levels of label noise (i.e., with different proportions of labels randomly perturbed), together with the RV curves calculated on the training set. The two curves are symmetrical in the vertical direction, and the experimental results show that RV predicts the generalization performance curve of the model during training well.

FIG. 3 shows the RV and test accuracy of ResNet18 models of different widths on the CIFAR10 dataset. The widths range from 0.25 to 2.0 times the width of the original model, and each model was trained for 100 rounds using an ADAM optimizer (learning rate 1e-4). The computed correlation between RV and test accuracy is -0.94, with a significance-test p-value of 0.0006, which shows that RV predicts the test accuracy of models of different widths well.
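A correlation of this kind can be computed as a Pearson coefficient between the per-width RV values and test accuracies. The sketch below assumes Pearson correlation is the measure used, which the text does not state explicitly; the function name `pearson_r` is hypothetical and no experimental data is reproduced here.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()  # centre both sequences
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

A value near -1 means accuracy falls as RV rises. If SciPy is available, `scipy.stats.pearsonr` returns both the coefficient and a p-value for the significance test.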

The division of the system for predicting the generalization error of an image recognition model without a validation set into the modules described above is for illustration only; in other embodiments, the system may be divided into different modules as required to perform all or part of its functions.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for predicting the generalization error of an image recognition model without a validation set, characterized by comprising the following steps:

2. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein the parameter update of said image recognition model is the parameter update gradient.

3. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein said model optimizer comprises an ADAM optimizer and an SGD optimizer.

4. The method for predicting the generalization error of an image recognition model without a validation set according to claim 1, wherein said output relative variance RV is represented as:

5. A system for predicting the generalization error of an image recognition model without a validation set, comprising:

6. The system for predicting the generalization error of an image recognition model without a validation set according to claim 5, wherein the parameter update of said image recognition model is the parameter update gradient.

7. The system for predicting the generalization error of an image recognition model without a validation set according to claim 5, wherein said model optimizer comprises an ADAM optimizer and an SGD optimizer.

8. The system for predicting the generalization error of an image recognition model without a validation set according to claim 5, wherein said output relative variance RV is represented as: