CN114037864A - Method and device for constructing image classification model, electronic equipment and storage medium - Google Patents
- Publication number
- CN114037864A CN114037864A CN202111278779.XA CN202111278779A CN114037864A CN 114037864 A CN114037864 A CN 114037864A CN 202111278779 A CN202111278779 A CN 202111278779A CN 114037864 A CN114037864 A CN 114037864A
- Authority
- CN
- China
- Prior art keywords
- optimized
- image classification
- classification model
- covariance matrix
- current round
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention relates to a method and device for constructing an image classification model, an electronic device, and a storage medium. The method comprises the following steps: augmenting the original training samples to obtain the training samples after the current round of augmentation; acquiring a covariance matrix of the training samples after the current round of augmentation; determining, according to this covariance matrix, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation; optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized that is first optimized in the current round; optimizing the covariance matrix in the upper bound function by using a validation set; and performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that is first optimized in the current round. By optimizing the covariance matrix in the upper bound function with the validation set, the invention improves the performance of the model.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for constructing an image classification model, an electronic device, and a storage medium.
Background
The scheme aims to solve the problem of training neural networks under a long-tailed training distribution. Long-tail phenomena are very common in real life, such as the 80/20 rule in economics or Zipf's law in natural language. In the field of computer vision, the long-tail problem often arises when the distribution of the training samples is long-tailed, that is, the class distribution of the training samples is imbalanced: the head classes account for most of the data while the tail classes account for very little. An image classification model trained on such long-tail-distributed training samples has low accuracy when classifying tail-class images.
In the prior art, resampling techniques are usually used: the training samples of the majority classes are down-sampled and those of the minority classes are up-sampled to obtain a more even training distribution. However, resampling may sacrifice classification performance on some majority classes, and there is some risk of overfitting on the minority classes, because the minority classes naturally contain little data and over-emphasizing that data easily leads to overfitting.
In view of the above-mentioned drawbacks of the prior art, a technical solution that improves generalization performance on the tail categories is needed.
Disclosure of Invention
The invention provides a method and a device for constructing an image classification model, electronic equipment and a storage medium, which are used for solving the defect of poor generalization performance of the image classification model on tail categories in the prior art.
The invention provides a method for constructing an image classification model, which comprises iteratively executing the following steps:
augmenting the original training samples to obtain the training samples after the current round of augmentation;
acquiring a covariance matrix of the training samples after the current round of augmentation;
determining, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized that is first optimized in the current round;
optimizing the covariance matrix in the upper bound function by using a validation set;
and performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that is first optimized in the current round.
According to the construction method of the image classification model provided by the invention, before the first iteration, the method further comprises the following steps:
and training the original image classification model by using the original training sample to obtain the image classification model to be optimized.
According to the method for constructing the image classification model provided by the invention, augmenting the original training samples to obtain the training samples after the current round of augmentation comprises:
extracting feature data of the original training samples according to the image classification model to be optimized after the previous round of secondary optimization, and re-acquiring a covariance matrix of the original training samples according to the feature data of the original training samples;
and obtaining augmented samples of the original training samples according to the feature data of the original training samples and the covariance matrix of the original training samples.
According to the method for constructing the image classification model provided by the invention, determining, according to the covariance matrix of the training samples after the current round of augmentation, the upper bound function of the first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation comprises:
determining the first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation, according to the training samples after the current round of augmentation and their covariance matrix;
and determining the upper bound function of the first loss function by assuming that the training samples after the current round of augmentation are infinitely augmented.
According to the method for constructing the image classification model, which is provided by the invention, the image classification model to be optimized, which is firstly optimized in the current round, is obtained by optimizing the upper bound function as an optimization target, and the method comprises the following steps:
taking the minimized upper bound function as an optimization target, and obtaining a model parameter of the first optimization of the current round;
and updating the image classification model to be optimized according to the model parameters of the current round of first optimization to obtain the image classification model to be optimized of the current round of first optimization.
According to the method for constructing the image classification model provided by the invention, optimizing the covariance matrix in the upper bound function by using the validation set comprises:
acquiring a second loss function, on the validation set, of the image classification model to be optimized that is first optimized in the current round;
and taking the covariance matrix in the second loss function as a hyper-parameter, and taking the minimized second loss function as an optimization target to obtain the optimized covariance matrix corresponding to the second loss function.
According to the method for constructing the image classification model provided by the invention, the secondary optimization is carried out on the image classification model to be optimized, which is firstly optimized in the current round, according to the optimized covariance matrix, and the method comprises the following steps:
replacing the covariance matrix of the training samples after the current round of augmentation in the upper bound function with the optimized covariance matrix;
taking the upper bound function after the covariance matrix is replaced in a minimized mode as an optimization target, and obtaining model parameters of the current round of secondary optimization;
and updating the image classification model to be optimized according to the model parameters of the current round of secondary optimization.
According to the method for constructing the image classification model provided by the invention, after the secondary optimization is performed on the image classification model to be optimized, which is firstly optimized in the current round, according to the optimized covariance matrix, the method further comprises the following steps:
stopping iteration under the condition that the current iteration round reaches a preset number, or under the condition that the difference value between the upper bound function value of the current iteration round and the upper bound function value of the previous iteration round is smaller than a preset threshold value;
and determining the image classification model to be optimized after the secondary optimization in the iteration turn is stopped as a target image classification model.
The invention also provides a device for constructing the image classification model, which comprises the following components:
the sample augmentation module is used for augmenting the training samples after the previous round of augmentation to obtain the training samples after the current round of augmentation; in the first iteration, the original training samples are augmented;
the matrix acquisition module is used for acquiring a covariance matrix of the training samples after the current round of augmentation;
an upper bound function acquisition module, configured to determine, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
the first optimization module is used for optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized that is first optimized in the current round;
the matrix optimization module is used for optimizing the covariance matrix in the upper bound function by using a validation set;
and the secondary optimization module is used for performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that is first optimized in the current round.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize all or part of the steps of the construction method of the image classification model.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements all or part of the steps of the method of constructing an image classification model as described in any one of the above.
According to the method and device for constructing the image classification model, the electronic device, and the storage medium provided by the invention, the training samples are augmented and the proportion of minority-class training samples is increased, improving the prediction performance of the image classification model to be optimized on minority classes; the upper bound function is used as the optimization objective to optimize the target model parameters of the image classification model to be optimized, greatly reducing the time complexity of the model training and optimization processes; and the covariance matrix in the upper bound function is optimized with the class-balanced validation set, followed by secondary optimization of the model, further improving the performance of the image classification model to be optimized.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for constructing an image classification model according to the present invention;
FIG. 2 is a schematic structural diagram of an image classification model building apparatus provided in the present invention;
fig. 3 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes a method, an apparatus, an electronic device, and a storage medium for constructing an image classification model according to the present invention with reference to fig. 1 to 3.
Fig. 1 is a schematic flow chart of a method for constructing an image classification model according to the present invention, as shown in fig. 1, the method includes:
s11, amplifying the original training sample to obtain a training sample after the current round of iterative amplification;
specifically, the method for constructing the image classification model provided by the invention augments the original training samples on the basis of the preliminarily trained image classification model to be optimized and correspondingly performs iterative optimization on the image classification model to be optimized, thereby obtaining the optimized image classification model. The basis for iterative augmentation is the original training samples: different data augmentation directions are explored on top of the original training samples and used to iteratively optimize the image classification model to be optimized. The image classification model to be optimized after the previous round of optimization serves as the basis for each round of iterative optimization.
And in each iteration, data augmentation is performed on the basis of the original training samples to obtain the training samples after the current round of augmentation. A training sample comprises a sample image and a reference classification label corresponding to the sample image; the reference classification label can be represented by the number of the class to which the sample image belongs. Data augmentation methods include, for example: performing operations such as flipping, rotation, and translation on the image in pixel space; or sampling and transforming in the deep feature space of the training samples to generate new features that augment the data. Class weighting can also be combined, giving different weights to training samples of different classes, so that the image classification model pays more attention to the gradients generated by minority-class samples. The invention does not limit the specific manner of data augmentation.
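As a concrete illustration of the pixel-space operations mentioned above, the following is a minimal sketch using torchvision (the library choice and the parameter values are assumptions; the invention does not limit the augmentation manner):

```python
import torchvision.transforms as T

# Pixel-space augmentations: flipping, rotation, and translation,
# as listed above; degrees/translate values are illustrative only.
pixel_space_augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                     # flipping
    T.RandomRotation(degrees=15),                      # rotation
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # translation
    T.ToTensor(),
])
```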
S12, acquiring a covariance matrix of the training samples after the current round of augmentation;
specifically, the training samples after the current round of augmentation include the original training samples and the new samples generated by the current round of augmentation. The covariance matrix describes the correlations among samples within each class of training samples. It can be obtained by extracting features from the training samples after the current round of augmentation using the image classification model to be optimized after the previous round of iterative optimization, and then computing statistics over the resulting feature data to obtain the covariance matrix of the training samples after the current round of augmentation.
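A minimal sketch of this per-class statistic, assuming the deep features have already been extracted with the model to be optimized (the function name and the use of PyTorch are assumptions):

```python
import torch

def per_class_covariance(features: torch.Tensor, labels: torch.Tensor, num_classes: int):
    """Estimate one covariance matrix per class from deep features.

    features: (N, D) feature data extracted from the augmented training samples
    labels:   (N,)   class indices
    Returns a list of (D, D) covariance matrices, one per class.
    """
    d = features.shape[1]
    covariances = []
    for c in range(num_classes):
        feats_c = features[labels == c]
        if feats_c.shape[0] > 1:
            # torch.cov expects variables in rows, observations in columns
            covariances.append(torch.cov(feats_c.T))
        else:
            # degenerate class with <= 1 sample: fall back to zeros
            covariances.append(torch.zeros(d, d))
    return covariances
```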
S13, determining, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
specifically, the loss function represents the difference between the model's prediction output (i.e., the image classification result) and the true value: the larger the loss function, the worse the model's prediction performance, and the smaller the loss function, the better. According to the sample images in the training samples after the current round of augmentation and their reference classification labels, combined with the covariance matrix, a first loss function of the image classification model to be optimized on the training samples after the current round of augmentation can be determined. Further, the image classification model has different loss functions for different samples, and frequently replacing or adding training samples makes the training and optimization process time-consuming. An upper bound function of the loss function of the image classification model to be optimized is therefore determined through derivation; since the loss function is always smaller than the upper bound function, the prediction performance of the image classification model can be known indirectly through the upper bound function.
It should be noted that the training-sample augmentation process can be implemented directly by updating the upper bound function in a data-updating manner, without explicitly generating actual augmented samples or executing a process of training the model to be optimized on them. The upper bound function is used directly as the optimization objective, and the target model parameters of the image classification model to be optimized are determined by iteratively updating the upper bound function, which greatly reduces the time complexity of the model training and optimization processes.
S14, optimizing by taking the upper bound function as an optimization target to obtain a current round of first-time optimized image classification model to be optimized;
specifically, the upper bound function represents the maximum value of the loss function of the image classification model to be optimized on the corresponding training samples. Taking the upper bound function as the optimization objective, it is iteratively updated along with the iterative augmentation of the training samples. When the iterative updating reaches a preset condition, it stops, and the model parameters corresponding to the upper bound function at that point are determined as the model parameters of the current round's first optimization (i.e., the model parameters of the image classification model to be optimized are updated according to the model parameters corresponding to the upper bound function when the iterative updating stops).
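A hedged sketch of this first optimization step, assuming a helper `upper_bound_loss` that implements the surrogate described above (a concrete version is sketched in the preferred embodiment below); `model`, `train_loader`, `covariances`, and `lam` are assumed names:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for images, labels in train_loader:
    # the surrogate replaces explicit augmentation: no new samples are generated
    loss = upper_bound_loss(model, images, labels, covariances, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```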
S15, optimizing the covariance matrix in the upper bound function by using a validation set;
specifically, the original training samples themselves suffer from class imbalance, and the augmented samples inherit this defect to some extent, so the covariance matrix estimated from the augmented samples is not refined enough. Based on the observation that a suitable covariance matrix for data enhancement should minimize the loss function on a balanced validation set, the covariance matrix estimated from the augmented samples is further optimized using the validation set. Although the validation set contains less data than the original training samples, its sample class distribution is relatively balanced and closer to the actual test-sample distribution, so a more accurate covariance matrix can be estimated. Concretely, the covariance matrix can be regarded as a hyper-parameter and optimized by minimizing the loss function of the current round's first-optimized image classification model to be optimized on the validation set.
The invention utilizes the idea of meta-learning: through the knowledge of a small (i.e., low-data) balanced validation set, a covariance matrix that is more informative and better suited to the image classification model to be optimized is learned, and it serves as the basis for iterative sample augmentation and model optimization.
And S16, carrying out secondary optimization on the image classification model to be optimized, which is firstly optimized in the current round, according to the optimized covariance matrix.
Specifically, after the covariance matrix is optimized according to the validation set, the upper bound function can be updated using the optimized covariance matrix, and the image classification model to be optimized is secondarily optimized with reference to steps S13 and S14 again.
It should be noted that the first update performed on the model to be optimized by the current round in step S14 is a pseudo-update implemented on a duplicate of the image classification model to be optimized (i.e., equivalent to an intermediate auxiliary process carried out on a secondary model). Step S16 is the formal update implemented on the image classification model to be optimized after the previous round of secondary optimization (i.e., the main model).
In this embodiment, the training samples are augmented and the proportion of minority-class training samples is increased, improving the prediction performance of the image classification model to be optimized on minority classes; the upper bound function is used as the optimization objective to optimize the target model parameters of the image classification model to be optimized, greatly reducing the time complexity of the model training and optimization processes; and the covariance matrix in the upper bound function is optimized with the class-balanced validation set, followed by secondary optimization of the model, further improving the performance of the image classification model to be optimized.
Based on any of the above embodiments, in an embodiment, before the first iteration, the method further includes:
and training the original image classification model by using the original training sample to obtain the image classification model to be optimized.
Specifically, the image classification model to be optimized is an image classification model that has undergone preliminary training. Before feature data of the original training samples can be extracted with the image classification model to be optimized, the original training samples are used to train the original image classification model and adjust its model parameters, yielding the image classification model to be optimized. The type of the image classification model may be a VGG network, a GoogLeNet network, a residual network (ResNet), and so on; the specific type is not limited. The original image classification model comprises a feature-extraction network and a classifier network. The feature-extraction network can be used to extract features from the training data (i.e., the original training samples) to obtain feature data. The classifier network can be used to process the feature data and output a prediction result; the classification result is expressed as probabilities, and the class with the largest probability is the final predicted label.
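A minimal sketch of this two-part architecture (feature-extraction network plus classifier network), using a ResNet backbone as one of the model types mentioned; torchvision and the feature dimension are assumptions of the sketch:

```python
import torch.nn as nn
import torchvision.models as models

class ImageClassifier(nn.Module):
    """Feature-extraction network followed by a classifier network."""
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # keep everything up to (and including) global pooling; drop the fc head
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.classifier = nn.Linear(512, num_classes)  # 512 = resnet18 feature dim

    def forward(self, x):
        a = self.features(x).flatten(1)  # deep feature data a_i
        return self.classifier(a)        # class scores z_i

# The predicted label is the class with the largest softmax probability:
#   pred = model(x).softmax(dim=1).argmax(dim=1)
```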
In this embodiment, the original training sample is used to train the original image classification model to obtain the image classification model to be optimized, which facilitates subsequent iterative optimization.
Based on any one of the above embodiments, in an embodiment, augmenting the original training samples to obtain the training samples after the current round of augmentation includes:
extracting feature data of the original training samples according to the image classification model to be optimized after the previous round of secondary optimization, and re-acquiring the covariance matrix of the original training samples according to the feature data of the original training samples;
and obtaining augmented samples of the original training samples according to the feature data of the original training samples and the covariance matrix of the original training samples.
Specifically, feature extraction can be performed on the original training samples using the image classification model to be optimized after the previous round of secondary optimization; the extracted feature data are highly abstract deep features. Statistics over the feature data of the original training samples yield the corresponding covariance matrix, which consists of the covariance matrices of the individual classes: the covariance matrix of each class can be computed from the deep features of the training samples of that class, and the covariances in the matrix represent correlations among the corresponding features. On this basis, the covariance matrix can be sampled and transformed and, combined with the extracted feature data, used to augment the original training samples, obtaining the current round's augmented samples. The augmented training samples effectively avoid overfitting on minority classes.
For example, assume that the original training sample set is $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the image in the $i$-th training sample and $y_i$ is its corresponding label, and the entire training data set has $N$ samples. The image classification model to be optimized after the previous round of secondary optimization extracts features from each training sample $x_i$, and the resulting feature data is denoted $a_i$. The covariance matrix of the original training samples can then be expressed as $\Sigma = \{\Sigma_1, \Sigma_2, \ldots, \Sigma_M\}$, where $\Sigma_1, \Sigma_2, \ldots, \Sigma_M$ are the covariance matrices of the individual classes and $M$ is the total number of classes.
And the feature distribution within the covariance matrix is analyzed to obtain its distribution, from which the corresponding data augmentation directions can be known. These augmentation directions correspond to actual sample variations, such as different colors of cars in a "car"-class picture, different viewing directions of the cars, and so forth. The covariance matrix is sampled according to this distribution to generate a transformation vector, which can be used to enhance the feature data and generate new feature data (i.e., the training samples are correspondingly augmented). The feature distribution preferably follows a Gaussian model, which can be written as $\delta_i \sim \mathcal{N}(0, \lambda\Sigma_{y_i})$, where $y_i$ is the classification label corresponding to the $i$-th image (one of the $M$ categories), $\Sigma_{y_i}$ is the covariance matrix of that class, and $\lambda$ is a hyper-parameter used to adjust the data augmentation strength. A transformation vector $\delta_i$ sampled from this Gaussian distribution is used to enhance the feature data $a_i$ corresponding to the $i$-th image.
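A minimal sketch of this sampling step, assuming PyTorch; the small ridge added to the covariance is an implementation assumption to keep the Gaussian well-defined, not something the text specifies:

```python
import torch
from torch.distributions import MultivariateNormal

def semantic_augment(a_i: torch.Tensor, sigma_yi: torch.Tensor, lam: float,
                     eps: float = 1e-6) -> torch.Tensor:
    """Sample a transformation vector from N(0, lam * Sigma_{y_i}) and use it
    to enhance the feature data a_i, as described above."""
    d = a_i.shape[0]
    cov = lam * sigma_yi + eps * torch.eye(d)  # ridge keeps cov positive definite
    delta = MultivariateNormal(torch.zeros(d), covariance_matrix=cov).sample()
    return a_i + delta
```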
In this embodiment, a covariance matrix characterizing the correlations among the feature data of the training samples is obtained from the feature data of the training samples after the previous round of augmentation; the covariance matrix is sampled according to its feature distribution, and the generated transformation vectors are used to augment the training samples after the previous round of augmentation, avoiding overfitting of the image classification model to be optimized on minority classes.
Based on any one of the above embodiments, in an embodiment, determining, according to the covariance matrix of the training samples after the current round of augmentation, the upper bound function of the first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation includes:
determining the first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation, according to the training samples after the current round of augmentation and their covariance matrix;
and determining the upper bound function of the first loss function by assuming that the training samples after the current round of augmentation are infinitely augmented.
Specifically, the loss function represents the difference between the model's prediction output (i.e., the image classification result) and the true value: the larger the loss function, the worse the prediction performance, and the smaller the loss function, the better. Each time the training samples are augmented, a loss function of the image classification model to be optimized on the augmented training samples is obtained. In order to explore all possible data augmentation directions, the covariance matrix of the training samples must be sampled many times according to its feature distribution, augmentation is performed correspondingly many times, and the loss function of the image classification model to be optimized on the corresponding augmented samples is updated. Going further, by assuming infinite sampling and correspondingly infinite training-sample augmentation, the upper bound function of the loss function of the image classification model to be optimized can be determined.
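A sketch of the step from sampling to the closed-form bound, consistent with the ISDA derivation used in the preferred embodiment below (classifier biases are omitted here, an assumption of this sketch): letting the number of augmentations per sample tend to infinity turns the averaged loss into an expectation over the Gaussian, and

$$\mathbb{E}_{\tilde a_i \sim \mathcal{N}(a_i,\,\lambda\Sigma_{y_i})}\left[\log\sum_{c=1}^{M} e^{(w_c - w_{y_i})^{\top}\tilde a_i}\right] \le \log\sum_{c=1}^{M}\mathbb{E}\left[e^{(w_c - w_{y_i})^{\top}\tilde a_i}\right] = \log\sum_{c=1}^{M}\exp\left((w_c - w_{y_i})^{\top}a_i + \frac{\lambda}{2}(w_c - w_{y_i})^{\top}\Sigma_{y_i}(w_c - w_{y_i})\right),$$

where the inequality is Jensen's (the logarithm is concave) and the equality uses the Gaussian moment-generating function; averaging over the $N$ samples gives the upper bound function.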
In this embodiment, the upper bound function of the loss function is determined by assuming that the training samples after the current round of augmentation are infinitely augmented, so that the prediction performance of the image classification model can be conveniently and indirectly assessed.
Based on any one of the above embodiments, in an embodiment, the optimizing with the upper bound function as an optimization target to obtain the current round of first-optimized image classification model to be optimized includes:
taking the minimized upper bound function as an optimization target, and obtaining a model parameter of the first optimization of the current round;
and updating the image classification model to be optimized according to the model parameters of the current round of first optimization to obtain the image classification model to be optimized of the current round of first optimization.
Specifically, directly taking minimization of the loss function as the optimization target makes the optimization of the image classification model to be optimized complex and computationally expensive. Instead, the model parameters of the current round's first optimization are obtained by minimizing the upper bound function of the first loss function of the image classification model to be optimized on the training samples after the current round of augmentation, which makes updating the model parameters convenient.
In the embodiment, the image classification model to be optimized is optimized for the first time conveniently through the upper bound function of the first loss function on the training sample after the current round of augmentation.
Based on any of the above embodiments, in one embodiment, the optimizing the covariance matrix in the upper bound function using the validation set includes:
acquiring a second loss function, on the validation set, of the image classification model to be optimized that is first optimized in the current round;
and taking the covariance matrix in the second loss function as a hyper-parameter, and taking the minimized second loss function as an optimization target to obtain the optimized covariance matrix corresponding to the second loss function.
Specifically, a second loss function, on the validation set, of the image classification model to be optimized that is first optimized in the current round is acquired; the covariance matrix in the second loss function is treated as a hyper-parameter (i.e., as a variable) and updated by differentiation. Using a gradient descent algorithm, with minimization of the second loss function as the optimization target, the covariance matrix in the second loss function is optimized and adjusted to obtain the optimized covariance matrix corresponding to the second loss function.
In this embodiment, the covariance matrix is optimized using the validation set, whose sample class distribution is relatively balanced, so that more accurate model parameters are conveniently obtained and the model parameters of the image classification model to be optimized are improved.
Based on any one of the above embodiments, in an embodiment, the performing, according to the optimized covariance matrix, secondary optimization on the current-round first-optimized image classification model to be optimized includes:
replacing the covariance matrix of the training samples after the current round of augmentation in the upper bound function with the optimized covariance matrix;
taking the upper bound function after the covariance matrix is replaced in a minimized mode as an optimization target, and obtaining model parameters of the current round of secondary optimization;
and updating the image classification model to be optimized according to the model parameters of the current round of secondary optimization.
Specifically, after the optimized covariance matrix is obtained, it replaces the covariance matrix of the training samples after the current round of augmentation in the upper bound function of step S13, and step S14 is executed again: taking minimization of the upper bound function with the replaced covariance matrix as the optimization target, the model parameters of the current round's secondary optimization are obtained, and the image classification model to be optimized is updated according to these model parameters, realizing the secondary optimization of the current round.
In this embodiment, the optimized covariance matrix replaces the covariance matrix of the training samples after the current round of augmentation in the upper bound function; through a single iteration loop, each next optimization is performed on the basis of the current optimal model parameters/covariance matrix, which accelerates the iterative optimization process, reduces computation time, and improves the model performance of the image classification model to be optimized.
Based on any one of the above embodiments, in an embodiment, after performing secondary optimization on the current round of first-optimized image classification model to be optimized according to the optimized covariance matrix, the method further includes:
stopping iteration under the condition that the current iteration round reaches a preset number, or under the condition that the difference value between the upper bound function value of the current iteration round and the upper bound function value of the previous iteration round is smaller than a preset threshold value;
and determining the image classification model to be optimized after the secondary optimization in the iteration turn is stopped as a target image classification model.
Specifically, the iteration may be stopped when the number of iteration rounds reaches a preset number, which may be set according to the performance requirements of the image classification model to be optimized in combination with historical data, for example 800. The iteration may also be stopped when the difference between the upper bound function value of the current iteration and that of the previous iteration is smaller than a preset threshold: such a small difference indicates that little room remains for further optimization while further optimization would consume more time, so the iteration can stop. The image classification model to be optimized after the secondary optimization of the round in which iteration stops is determined as the target image classification model.
In this embodiment, when the iteration number reaches the preset number, or when the difference between the upper bound function value of the current iteration and the upper bound function value of the previous iteration is smaller than the preset threshold, the iteration is stopped, the condition for stopping the iteration is flexibly and accurately set, and the differentiated model building requirement of the user is met.
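A short sketch of this stopping rule (the threshold value and the `run_one_round` helper are assumptions; 800 is the example round count from the text):

```python
max_rounds = 800       # preset number of iteration rounds
threshold = 1e-4       # preset threshold on the bound's change (assumed value)

prev_bound = float("inf")
for t in range(max_rounds):
    bound = run_one_round(t)  # executes S11-S16 for round t, returns the bound value
    if abs(prev_bound - bound) < threshold:
        break                 # little room left for further optimization
    prev_bound = bound
# the model after this round's secondary optimization is the target model
```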
The method for constructing the image classification model provided by the invention is described below with reference to a preferred embodiment.
Assume that the original training sample set is defined as $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the image in the $i$-th training sample, $y_i$ is its corresponding label, and the entire original training data set has $N$ samples.
Training data of the tail categories are extremely scarce, so the performance of the trained image classification model is extremely unbalanced: it performs poorly on the tail categories. In this embodiment, the implicit semantic data augmentation (ISDA) technique is used to augment the training samples of the tail classes and thereby improve the performance of the classifier. To realize semantic data augmentation, ISDA dynamically estimates the covariance matrix of each category, $\Sigma = \{\Sigma_1, \Sigma_2, \ldots, \Sigma_C\}$. ISDA then constructs Gaussian distributions using these covariance matrices and samples from them to augment the data. Through infinite sampling, ISDA derives an optimizable upper bound of the loss function on these augmented training samples:
$$\overline{\mathcal{L}}(\Theta, \Sigma) = \frac{1}{N}\sum_{i=1}^{N}\log\left(\sum_{c=1}^{C}\exp\Big(z_c^{(i)} - z_{y_i}^{(i)} + \frac{\lambda}{2}\big(w_c - w_{y_i}\big)^{\top}\Sigma_{y_i}\big(w_c - w_{y_i}\big)\Big)\right) \qquad (1)$$

where $x_i$ is the image in the $i$-th training sample and $y_i$ is its corresponding label; $\Theta$ are the model parameters of the image classification model to be optimized; $\Sigma$ is the set of covariance matrices of the training-sample feature data; $z_c^{(i)}$ is the $c$-th element of the prediction output for sample $x_i$ (i.e., the score that the prediction belongs to the $c$-th class) and $z_{y_i}^{(i)}$ is the $y_i$-th element (the score of the $y_i$-th class); $\Sigma_{y_i}$ is the covariance matrix of the class of the $i$-th training sample; $w_c$ is the $c$-th column of the last-layer classifier weight matrix of the image classification model to be optimized; $w_c - w_{y_i}$ is the difference between the $c$-th and $y_i$-th columns; and $(w_c - w_{y_i})^{\top}$ is its transpose.
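A hedged sketch of equation (1) as a loss function (following the ISDA formulation; the function and variable names and the per-sample loop are assumptions, and classifier biases are folded into the logits):

```python
import torch

def isda_upper_bound(logits, labels, weight, covariances, lam):
    """Upper bound of equation (1).

    logits:      (N, C) prediction outputs z^(i)
    labels:      (N,)   class indices y_i
    weight:      (C, D) last-layer classifier weight matrix (rows are w_c)
    covariances: list of C (D, D) per-class covariance matrices Sigma_c
    lam:         augmentation-strength hyper-parameter lambda
    """
    N = logits.shape[0]
    losses = []
    for i in range(N):
        y = int(labels[i])
        w_diff = weight - weight[y]                      # rows: w_c - w_{y_i}
        quad = 0.5 * lam * torch.einsum(
            "cd,de,ce->c", w_diff, covariances[y], w_diff)
        adjusted = logits[i] - logits[i, y] + quad       # z_c - z_{y_i} + quadratic term
        losses.append(torch.logsumexp(adjusted, dim=0))  # log sum_c exp(...)
    return torch.stack(losses).mean()
```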
By optimizing this upper bound function, ISDA can efficiently realize an equivalent semantic augmentation process. In general, an image classification model can be constructed using equation (1) while the data enhancement process is completed at the same time. In the long-tail setting, however, the head classes dominate the training set, and equation (1) shows that the effect of the data augmentation depends on the training data. If equation (1) is used directly, the augmentation in fact mainly acts on the largest categories, contrary to our goal.
Therefore, in order to solve this problem, the present application proposes to combine class weights with equation (1), applying different gains to the semantically augmented samples according to the number of samples each class contains, so as to adjust the weight of the minority-class training samples. The class weight is defined as:
$$e_c = \frac{1-\beta}{1-\beta^{n_c}} \qquad (2)$$

where $e_c$ is the preset class weight of the $c$-th class, $n_c$ is the number of training samples in the $c$-th class, and $\beta$ is a hyper-parameter with value range $(0, 1)$.
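Equation (2) in code form, a small sketch (the value β = 0.999 is a typical choice shown here as an assumption):

```python
import torch

def class_balanced_weights(samples_per_class, beta: float = 0.999):
    """e_c = (1 - beta) / (1 - beta ** n_c), with beta in (0, 1)."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float64)
    return (1.0 - beta) / (1.0 - beta ** n)
```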
The optimal network parameters can then be obtained using the class-weight-weighted loss function:

$$\Theta^{*}(\Sigma) = \arg\min_{\Theta}\; \frac{1}{N}\sum_{i=1}^{N} e_{y_i}\log\left(\sum_{c=1}^{C}\exp\Big(z_c^{(i)} - z_{y_i}^{(i)} + \frac{\lambda}{2}\big(w_c - w_{y_i}\big)^{\top}\Sigma_{y_i}\big(w_c - w_{y_i}\big)\Big)\right) \qquad (3)$$

where $\Theta^{*}$ denotes the optimized model parameters obtained by minimizing this loss function.
The model is further improved by considering that the performance of ISDA depends on the estimation of the covariance matrices; under the long-tail problem, ISDA is limited by the scarcity of tail classes, making reasonable covariance estimates hard to obtain. To further address class imbalance and improve the model's performance on minority classes, this embodiment attempts to learn a suitable covariance matrix for data enhancement, leading to better performance. The key idea is: if a suitable covariance matrix is used for data enhancement, the loss function on a balanced validation set should be minimized. This embodiment uses meta-learning to achieve this goal. If the covariance matrix is regarded as a hyper-parameter, its optimal value can actually be searched for using the validation set. In detail, consider a small validation set $D_v = \{(x_i^{v}, y_i^{v})\}_{i=1}^{N_v}$, where $N_v$ is the number of samples in the validation set, which is much smaller than the training set. The optimal covariance matrix can be obtained by minimizing the following loss function:
$$\Sigma^{*} = \arg\min_{\Sigma}\; \frac{1}{N_v}\sum_{i=1}^{N_v} \mathcal{L}_{ce}\big(f(x_i^{v};\, \Theta^{*}(\Sigma)),\; y_i^{v}\big) \qquad (4)$$

where $\Sigma^{*}$ denotes the optimized covariance matrix obtained by minimizing this loss function, $\mathcal{L}_{ce}(\cdot)$ is the cross-entropy loss, $f(x; \Theta)$ denotes the prediction of the model with parameters $\Theta$, $N_v$ is the number of samples in the validation set, $x_i^{v}$ is the sample image of the $i$-th validation sample, and $y_i^{v}$ is its reference classification label. Because the validation set is balanced, this embodiment employs the common CE loss function.
Considering that obtaining the optimal values of both the model parameters and the covariance matrix via equations (3) and (4) would require two nested loops and consume excessive computing resources, this embodiment instead updates both the model parameters and the covariance matrix with an online strategy through a single iteration loop, each time seeking the current optimal solution of one quantity based on the current optimal solution of the other. Given the current iteration round $t$, the current covariance $\Sigma_t$ is obtained statistically. The model parameters are then updated using:

$$\hat{\Theta}_{t+1} = \Theta_t - \alpha\,\nabla_{\Theta}\,\overline{\mathcal{L}}\big(\Theta_t, \Sigma_t\big) \qquad (5)$$

where $\alpha$ is the learning rate of the model parameters, $\Sigma_t$ is the covariance matrix of the training samples after the current round of augmentation, $\Theta_t$ are the model parameters of the image classification model to be optimized after the previous round of secondary optimization, and $\hat{\Theta}_{t+1}$ are the model parameters of the image classification model to be optimized after the current round's first optimization, obtained after one back-propagation step. Then, the covariance matrix is updated using:

$$\Sigma_{t+1} = \Sigma_t - \gamma\,\nabla_{\Sigma}\,\mathcal{L}_{val}\big(\hat{\Theta}_{t+1}(\Sigma_t)\big) \qquad (6)$$

where $\gamma$ is the learning rate of the covariance matrix and $\Sigma_{t+1}$ is the covariance matrix of the current round's first-optimized image classification model to be optimized, optimized according to the validation set. Further, the optimized covariance matrix can be used to further improve the model parameters of the image classification model:

$$\Theta_{t+1} = \Theta_t - \alpha\,\nabla_{\Theta}\,\overline{\mathcal{L}}\big(\Theta_t, \Sigma_{t+1}\big) \qquad (7)$$

where $\Theta_{t+1}$ are the model parameters after the current round's secondary optimization. The updated covariance matrix is learned from the class-balanced validation set, so the optimized covariance matrix helps to learn better model parameters, realizing further improvement of the model and improving the performance of the constructed image classification model on minority classes.
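A hedged sketch of one round of this single-loop strategy, equations (5)-(7). Here `functional_logits(params, x)` (a forward pass with an explicit parameter list) and `weighted_upper_bound` (the class-weighted bound of equation (3)) are assumed helpers; the differentiable pseudo-update is what lets the validation gradient reach the covariance matrices:

```python
import torch
import torch.nn.functional as F

def one_round(params, sigmas, train_batch, val_batch, alpha, gamma, lam, class_weights):
    params = [p.detach().requires_grad_(True) for p in params]
    sigmas = [s.detach().requires_grad_(True) for s in sigmas]
    x, y = train_batch
    xv, yv = val_batch

    # (5) pseudo (first) update of the model parameters on the upper bound;
    # create_graph keeps the update differentiable w.r.t. the sigmas
    loss = weighted_upper_bound(functional_logits(params, x), y, sigmas, lam, class_weights)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    params_hat = [p - alpha * g for p, g in zip(params, grads)]

    # (6) covariance update on the class-balanced validation set (plain CE loss)
    val_loss = F.cross_entropy(functional_logits(params_hat, xv), yv)
    sigma_grads = torch.autograd.grad(val_loss, sigmas)
    new_sigmas = [s.detach() - gamma * g for s, g in zip(sigmas, sigma_grads)]

    # (7) formal (second) update with the optimized covariance matrices
    loss2 = weighted_upper_bound(functional_logits(params, x), y, new_sigmas, lam, class_weights)
    grads2 = torch.autograd.grad(loss2, params)
    new_params = [p.detach() - alpha * g for p, g in zip(params, grads2)]
    return new_params, new_sigmas
```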
For the problem of data augmentation on minority classes, this embodiment mainly uses a semantic augmentation technique in the deep feature space, which depends on the estimation of the per-class feature covariance matrices; if the covariance estimate is inaccurate or carries little information, the data augmentation effect may be poor. This embodiment addresses the problem with an improvement based on meta-learning: the loss function on a class-distribution-balanced validation data set is employed to help learn a better covariance matrix, and the learned covariance matrix is then substituted into the semantic data augmentation process so as to obtain a better augmentation effect. The whole data augmentation training process can be replaced by a loss function, without explicitly generating numerous augmented samples, which has the following advantages:
1. Augmented samples need not be explicitly generated, so the training overhead is low.
2. Meaningful data augmentation directions can be learned automatically, without manually selecting augmentation directions.
3. Only a small amount of validation-set data is needed, without extra data, to obtain a better covariance matrix.
4. Filling in the minority classes through data augmentation can reduce the risk of the model overfitting on the minority classes.
The following describes an image classification model construction apparatus provided by the present invention, and the image classification model construction apparatus described below and the image classification model construction method described above may be referred to in correspondence with each other.
Fig. 2 is a schematic structural diagram of the image classification model construction apparatus provided by the present invention. As shown in fig. 2, the apparatus includes: a sample augmentation module 21, a matrix acquisition module 22, an upper bound function acquisition module 23, a first optimization module 24, a matrix optimization module 25, and a secondary optimization module 26.
The sample augmentation module 21 is used for augmenting the original training sample to obtain the training sample after the current round of augmentation;
the matrix acquisition module 22 is configured to acquire a covariance matrix of the training samples after the current round of augmentation;
an upper bound function acquisition module 23, configured to determine, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
the first optimization module 24 is configured to optimize with the upper bound function as an optimization target to obtain a current round of first optimized to-be-optimized image classification model;
a matrix optimization module 25 for optimizing the covariance matrix in the upper bound function using a validation set;
and the secondary optimization module 26 is configured to perform secondary optimization on the image classification model to be optimized, which is first optimized in the current round, according to the optimized covariance matrix.
In this embodiment, the training samples are augmented and the proportion of minority-class training samples is increased, improving the prediction performance of the image classification model to be optimized on minority classes; the upper bound function is used as the optimization objective to optimize the target model parameters of the image classification model to be optimized, greatly reducing the time complexity of the model training and optimization processes; and the covariance matrix in the upper bound function is optimized with the class-balanced validation set, followed by secondary optimization of the model, further improving the performance of the image classification model to be optimized.
Based on any one of the above embodiments, in an embodiment, the apparatus further includes:
and the model training module is used for training the original image classification model by using the original training sample to obtain the image classification model to be optimized.
In this embodiment, the original training sample is used to train the original image classification model to obtain the image classification model to be optimized, which facilitates subsequent iterative optimization.
In one embodiment, based on any of the above embodiments, the sample augmentation module 21 includes:
the first sample augmentation unit is used for extracting the feature data of the original training samples according to the image classification model to be optimized after the previous round of secondary optimization, and re-acquiring the covariance matrix of the original training samples according to the feature data of the original training samples;
and the second sample augmentation unit is used for obtaining the augmented samples of the original training samples according to the feature data of the original training samples and the covariance matrix of the original training samples.
In this embodiment, a covariance matrix characterizing the correlations among the feature data of the training samples is obtained from the feature data of the training samples after the previous round of augmentation; the covariance matrix is sampled according to its feature distribution, and the generated transformation vectors are used to augment the training samples after the previous round of augmentation, avoiding overfitting of the image classification model to be optimized on minority classes.
Based on any one of the above embodiments, in an embodiment, the upper bound function acquisition module 23 includes:
the first acquisition unit, configured to determine, according to the training samples after the current round of augmentation and their covariance matrix, a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
and the second acquisition unit, configured to determine an upper bound function of the first loss function by assuming that the training samples after the current round of augmentation are augmented an infinite number of times.
In this embodiment, the upper bound function of the loss function is determined by assuming that the training samples after the current round of augmentation are augmented an infinite number of times, so that the prediction performance of the image classification model can be assessed conveniently and indirectly.
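The section does not restate the bound itself, but in the implicit semantic data augmentation literature the expected cross-entropy under infinitely many Gaussian feature augmentations admits a closed-form upper bound obtained via Jensen's inequality; the formula below is one plausible concrete form under those assumptions, not necessarily the patent's exact expression:

```latex
\overline{\mathcal{L}}_{\infty}(W, b)
  = \frac{1}{N} \sum_{i=1}^{N} -\log
    \frac{e^{\, w_{y_i}^{\top} a_i + b_{y_i}}}
         {\sum_{j=1}^{C} e^{\, w_j^{\top} a_i + b_j
            + \frac{\lambda}{2} (w_j - w_{y_i})^{\top} \Sigma_{y_i} (w_j - w_{y_i})}}
```

Here a_i is the extracted feature of sample i, Σ_{y_i} the covariance matrix of its class, and λ the augmentation strength; the quadratic term vanishes for j = y_i, so the bound reduces to ordinary cross-entropy when λ = 0 and can be minimized directly without generating any augmented samples.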
Based on any of the above embodiments, in an embodiment, the first optimization module 24 includes:
the first model optimization unit, configured to obtain the model parameters of the current round's first optimization by taking minimization of the upper bound function as the optimization target;
and the second model optimization unit, configured to update the image classification model to be optimized according to the model parameters of the current round's first optimization, to obtain the image classification model to be optimized after the first optimization of the current round.
In this embodiment, the first optimization of the image classification model to be optimized is carried out conveniently through the upper bound function of the first loss function on the training samples after the current round of augmentation.
Based on any of the above embodiments, in one embodiment, the matrix optimization module 25 includes:
the first matrix optimization unit, configured to acquire a second loss function, on the validation set, of the image classification model to be optimized after the first optimization of the current round;
and the second matrix optimization unit, configured to treat the covariance matrix in the second loss function as a hyperparameter and take minimization of the second loss function as the optimization target, to obtain the optimized covariance matrix corresponding to the second loss function.
In this embodiment, the covariance matrix is optimized using the validation set, whose sample class distribution is comparatively balanced, making it convenient to obtain more accurate model parameters and thereby improve the model parameters of the image classification model to be optimized.
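A minimal illustration (one assumed SGD step; all numbers are toy values) of why the validation loss can be differentiated with respect to the covariance hyperparameter at all: the hyperparameter enters the training objective, a differentiable training step produces updated weights, and the validation loss is evaluated at those updated weights.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)     # model parameter
lam = torch.tensor(0.5, requires_grad=True)   # covariance strength (hyperparameter)

train_loss = (w - 2.0) ** 2 + lam * w ** 2    # stand-in for the upper bound function
g_w, = torch.autograd.grad(train_loss, w, create_graph=True)
w_new = w - 0.1 * g_w                         # differentiable "first optimization"

val_loss = (w_new - 1.5) ** 2                 # stand-in for the second loss function
g_lam, = torch.autograd.grad(val_loss, lam)   # gradient reaches lam through w_new
print(f"d val_loss / d lam = {g_lam.item():.4f}")  # 0.1600: lam is tunable by descent
```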
Based on any of the above embodiments, in an embodiment, the secondary optimization module 26 includes:
the third model optimization unit, configured to replace the covariance matrix of the training samples after the current round of augmentation in the upper bound function with the optimized covariance matrix;
the fourth model optimization unit, configured to obtain the model parameters of the current round's secondary optimization by taking minimization of the upper bound function with the replaced covariance matrix as the optimization target;
and the fifth model optimization unit, configured to update the image classification model to be optimized according to the model parameters of the current round's secondary optimization.
In this embodiment, the optimized covariance matrix replaces the covariance matrix of the training samples after the current round of augmentation in the upper bound function, and within a single iteration round each optimization step builds on the current best model parameters and covariance matrix, which accelerates the iterative optimization process, reduces computation time, and improves the model performance of the image classification model to be optimized.
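A focused sketch of the swap-and-re-minimize step; `upper_bound_loss` is assumed to be the surrogate from the loop sketch above and is passed in explicitly, and `W`, `b` are assumed to be leaf tensors with `requires_grad=True`:

```python
import torch

def secondary_optimize(W, b, x, y, cov_optimized, lam, upper_bound_loss,
                       lr=0.5, steps=5):
    """Replace the training covariance in the bound with the validated one, then
    take a few gradient steps from the current (warm-started) parameters."""
    for _ in range(steps):
        loss = upper_bound_loss(W, b, lam, x, y, cov_optimized)  # replaced covariance
        gW, gb = torch.autograd.grad(loss, (W, b))
        with torch.no_grad():
            W -= lr * gW
            b -= lr * gb
    return W, b, loss.item()  # bound value before the final step
```

Warm-starting from the current parameters rather than re-initializing is the design choice that keeps each round cheap, matching the single-iteration-cycle behavior described above.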
Based on any one of the above embodiments, in an embodiment, the apparatus further includes:
the iteration detection module, configured to stop iterating when the current iteration round reaches a preset number, or when the difference between the upper bound function value of the current iteration round and that of the previous iteration round is smaller than a preset threshold;
and the model determination module, configured to determine the image classification model to be optimized after secondary optimization in the round at which iteration stops as the target image classification model.
In this embodiment, iteration stops when the number of iteration rounds reaches the preset number, or when the difference between the upper bound function values of consecutive rounds falls below the preset threshold; the stopping condition can thus be set flexibly and accurately, meeting users' differing model construction requirements.
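A direct transcription of this stopping rule as a small helper; `max_rounds` and `tol` are assumed defaults standing in for the preset number and preset threshold:

```python
def should_stop(round_idx, bound_history, max_rounds=50, tol=1e-4):
    """Halt when the round count reaches a preset number, or when the upper-bound
    value changes by less than tol between consecutive rounds."""
    if round_idx >= max_rounds:
        return True
    if len(bound_history) >= 2 and abs(bound_history[-1] - bound_history[-2]) < tol:
        return True
    return False

# Example: converged bound values trigger an early stop
print(should_stop(3, [0.92, 0.90135, 0.90130]))  # True: delta < 1e-4
```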
Fig. 3 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform all or part of the steps of the method for constructing an image classification model provided above, the method comprising: augmenting the original training samples to obtain the training samples after the current round of augmentation; acquiring a covariance matrix of the training samples after the current round of augmentation; determining, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation; optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized after the first optimization of the current round; optimizing the covariance matrix in the upper bound function using a validation set; and performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that was first optimized in the current round.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform all or part of the steps of the method for constructing an image classification model provided above, the method comprising: augmenting the original training samples to obtain the training samples after the current round of augmentation; acquiring a covariance matrix of the training samples after the current round of augmentation; determining, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation; optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized after the first optimization of the current round; optimizing the covariance matrix in the upper bound function using a validation set; and performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that was first optimized in the current round.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs all or part of the steps of the method for constructing an image classification model provided above, the method comprising: augmenting the original training samples to obtain the training samples after the current round of augmentation; acquiring a covariance matrix of the training samples after the current round of augmentation; determining, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation; optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized after the first optimization of the current round; optimizing the covariance matrix in the upper bound function using a validation set; and performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that was first optimized in the current round.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (11)
1. A method for constructing an image classification model, characterized by comprising iteratively performing the following steps:
augmenting the original training samples to obtain the training samples after the current round of augmentation;
acquiring a covariance matrix of the training samples after the current round of augmentation;
determining, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized after the first optimization of the current round;
optimizing the covariance matrix in the upper bound function using a validation set;
and performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that was first optimized in the current round.
2. The method of constructing an image classification model according to claim 1, characterized in that before the first iteration, the method further comprises:
training the original image classification model by using the original training samples to obtain the image classification model to be optimized.
3. The method for constructing an image classification model according to claim 1, wherein augmenting the original training samples to obtain the training samples after the current round of augmentation comprises:
extracting feature data of the original training samples using the image classification model to be optimized after the previous round of secondary optimization, and re-obtaining a covariance matrix of the original training samples according to the feature data of the original training samples;
and obtaining augmented samples of the original training samples according to the feature data of the original training samples and the covariance matrix of the original training samples.
4. The method for constructing an image classification model according to claim 1, wherein determining, according to the covariance matrix of the training samples after the current round of augmentation, the upper bound function of the first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation comprises:
determining, according to the training samples after the current round of augmentation and their covariance matrix, a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
and determining an upper bound function of the first loss function by assuming that the training samples after the current round of augmentation are augmented an infinite number of times.
5. The method for constructing an image classification model according to claim 1, wherein optimizing with the upper bound function as the optimization target to obtain the image classification model to be optimized after the first optimization of the current round comprises:
obtaining the model parameters of the current round's first optimization by taking minimization of the upper bound function as the optimization target;
and updating the image classification model to be optimized according to the model parameters of the current round's first optimization, to obtain the image classification model to be optimized after the first optimization of the current round.
6. The method for constructing an image classification model according to claim 1, wherein optimizing the covariance matrix in the upper bound function using the validation set comprises:
acquiring a second loss function, on the validation set, of the image classification model to be optimized after the first optimization of the current round;
and treating the covariance matrix in the second loss function as a hyperparameter and taking minimization of the second loss function as the optimization target, to obtain the optimized covariance matrix corresponding to the second loss function.
7. The method for constructing an image classification model according to claim 1, wherein performing secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that was first optimized in the current round comprises:
replacing the covariance matrix of the training samples after the current round of augmentation in the upper bound function with the optimized covariance matrix;
obtaining the model parameters of the current round's secondary optimization by taking minimization of the upper bound function with the replaced covariance matrix as the optimization target;
and updating the image classification model to be optimized according to the model parameters of the current round's secondary optimization.
8. The method for constructing an image classification model according to any one of claims 1 to 7, wherein after performing secondary optimization on the image classification model to be optimized that was first optimized in the current round according to the optimized covariance matrix, the method further comprises:
stopping iteration when the current iteration round reaches a preset number, or when the difference between the upper bound function value of the current iteration round and that of the previous iteration round is smaller than a preset threshold;
and determining the image classification model to be optimized after secondary optimization in the round at which iteration stops as the target image classification model.
9. An apparatus for constructing an image classification model, comprising:
a sample augmentation module, configured to augment the original training samples to obtain the training samples after the current round of augmentation;
a matrix acquisition module, configured to acquire a covariance matrix of the training samples after the current round of augmentation;
an upper bound function acquisition module, configured to determine, according to the covariance matrix of the training samples after the current round of augmentation, an upper bound function of a first loss function of the image classification model to be optimized after the previous round of secondary optimization on the training samples after the current round of augmentation;
a first optimization module, configured to optimize with the upper bound function as the optimization target to obtain the image classification model to be optimized after the first optimization of the current round;
a matrix optimization module, configured to optimize the covariance matrix in the upper bound function using a validation set;
and a secondary optimization module, configured to perform secondary optimization, according to the optimized covariance matrix, on the image classification model to be optimized that was first optimized in the current round.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements all or part of the steps of the method for constructing an image classification model according to any one of claims 1 to 8 when executing the program.
11. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements all or part of the steps of the method for constructing an image classification model according to any one of claims 1 to 8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111278779.XA (published as CN114037864A) | 2021-10-31 | 2021-10-31 | Method and device for constructing image classification model, electronic equipment and storage medium
Publications (1)

Publication Number | Publication Date
---|---
CN114037864A (en) | 2022-02-11
Family
ID=80142585
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942421A (en) * | 2014-04-09 | 2014-07-23 | 清华大学 | Method for predicting testing data on basis of noise disturbance |
CN109359556A (en) * | 2018-09-21 | 2019-02-19 | 四川长虹电器股份有限公司 | A kind of method for detecting human face and system based on low-power-consumption embedded platform |
CN111126694A (en) * | 2019-12-23 | 2020-05-08 | 北京工商大学 | Time series data prediction method, system, medium and device |
CN111275129A (en) * | 2020-02-17 | 2020-06-12 | 平安科技(深圳)有限公司 | Method and system for selecting image data augmentation strategy |
CN111444878A (en) * | 2020-04-09 | 2020-07-24 | Oppo广东移动通信有限公司 | Video classification method and device and computer readable storage medium |
CN113449613A (en) * | 2021-06-15 | 2021-09-28 | 北京华创智芯科技有限公司 | Multitask long-tail distribution image recognition method, multitask long-tail distribution image recognition system, electronic device and medium |
Non-Patent Citations (1)

Title
---|
LIU Huan; ZHENG Qinghua; LUO Minnan; ZHAO Hongke; XIAO Yang; LV Yanzhang: "Zero-Shot Classification Based on Cross-Domain Adversarial Learning" (in Chinese), Journal of Computer Research and Development, no. 12, 15 December 2019 (2019-12-15) *
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination