CN114022706A - Method, device and equipment for optimizing image classification model and storage medium - Google Patents

Method, device and equipment for optimizing image classification model and storage medium

Info

Publication number
CN114022706A
CN114022706A
Authority
CN
China
Prior art keywords
image classification
classification model
augmentation
optimized
training sample
Prior art date
Legal status
Pending
Application number
CN202111273287.1A
Other languages
Chinese (zh)
Inventor
Cheng Xinjing (程新景)
Li Shuang (李爽)
Current Assignee
International Network Technology Shanghai Co Ltd
Original Assignee
International Network Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by International Network Technology Shanghai Co Ltd filed Critical International Network Technology Shanghai Co Ltd
Priority to CN202111273287.1A
Publication of CN114022706A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a method, apparatus, device and storage medium for optimizing an image classification model. The method comprises the following steps: extracting feature data of original training samples by using the image classification model to be optimized; generating semantic augmentation samples of the original training samples according to the feature data; performing class weighting on the original training samples and the semantic augmentation samples according to preset class weights, and determining the loss function of the image classification model to be optimized on the semantically augmented and class-weighted training samples; determining an upper bound function of the loss function; and determining target model parameters of the image classification model to be optimized by taking the upper bound function as the optimization objective function and performing semantic augmentation and class weighting augmentation through iteration. The method avoids overfitting of the image classification model to be optimized on minority classes, improves the model's prediction performance on minority classes, and greatly reduces the time complexity of the model training and optimization process.

Description

Method, device and equipment for optimizing image classification model and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for optimizing an image classification model.
Background
In the field of image classification, the quality of the training samples directly determines the accuracy of the trained image classification model. In practice, however, the available training samples are often unbalanced across classes, i.e., they present a long-tail distribution. Long-tail distributed training samples induce the image classification model to favor the classification accuracy (prediction performance) of the head classes while neglecting that of the tail classes.
In the prior art, the methods for addressing the low accuracy of a trained image classification model on minority classes caused by long-tail distributed training samples mainly include resampling techniques and data augmentation techniques.
Resampling techniques obtain a more even training distribution by down-sampling the majority classes and up-sampling the minority classes. However, resampling may sacrifice some performance on the majority classes, and carries a risk of overfitting on the minority classes: because the minority classes contain few training samples, over-emphasizing their effect easily leads to overfitting.
Data augmentation techniques address the scarcity of tail-class samples by increasing the training samples with augmentation methods such as rotation, translation, and flipping, enhancing the generalization performance of the neural network. Traditional data augmentation operates directly in pixel space; because pixel space has high complexity, the augmentation results are limited, a good augmentation effect cannot be obtained, and model training is time-consuming.
In view of the above drawbacks of the prior art, a technical solution that improves generalization performance on tail classes with low time complexity is needed.
Disclosure of Invention
The invention provides a method, apparatus, device and storage medium for optimizing an image classification model, which overcome the defects of the prior art that data augmentation results are limited and model training is time-consuming, and provide a technical solution that improves the generalization performance of the image classification model on tail classes with low time complexity.
The invention provides an optimization method of an image classification model, which comprises the following steps:
extracting feature data of an original training sample by using an image classification model to be optimized;
generating a semantic augmentation sample of the original training sample according to the feature data;
according to preset category weight, carrying out category weighting on the original training sample and the semantic augmentation sample, and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation;
determining an upper bound function of the loss function;
and determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
According to the optimization method of the image classification model provided by the invention, before extracting the feature data of the original training sample by using the image classification model to be optimized, the method further comprises the following steps:
training an original image classification model by using an original training sample to obtain the image classification model to be optimized; the image classification model to be optimized comprises a feature extraction network and a classifier network.
According to the optimization method of the image classification model provided by the invention, the generating of the semantic augmentation sample of the original training sample according to the feature data comprises the following steps:
acquiring a covariance matrix of the original training sample according to the feature data;
sampling the covariance matrix according to the feature distribution in the covariance matrix to generate a conversion vector;
and generating a semantic augmentation sample of the original training sample according to the conversion vector and the feature data.
According to the optimization method of the image classification model provided by the invention, the preset class weight is obtained according to the following formula:
$$\epsilon_c = \frac{1-\beta}{1-\beta^{\,n_c}}$$
wherein $\epsilon_c$ is the preset class weight of the c-th class, $n_c$ is the number of samples of the c-th class, and $\beta$ is a hyper-parameter with value range (0, 1).
According to the optimization method of the image classification model provided by the invention, the determining the upper bound function of the loss function comprises the following steps:
and determining an upper bound function of the loss function by assuming a mode of carrying out infinite times of semantic augmentation and category weighted augmentation.
According to the optimization method of the image classification model provided by the invention, the upper bound function is obtained through the following formula:
$$\bar{\mathcal{L}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\epsilon_i\,\ell_{\infty}\big(f(x_i;\theta),\ y_i,\ \Sigma\big)$$
wherein i is the index of a training sample, N is the total number of training samples, $\epsilon_i$ is the class weight corresponding to the i-th training sample, $x_i$ is the sample image in the i-th training sample, $\theta$ is the parameter of the image classification model to be optimized, $f(x_i;\theta)$ is the predicted output of the image classification model to be optimized for the sample image in the i-th training sample, $y_i$ is the reference classification label corresponding to the sample image in the i-th training sample, $\Sigma$ is the covariance matrix of the training samples, and $\ell_{\infty}(\cdot)$ denotes the per-sample loss under infinite semantic augmentation and class weighting augmentation.
According to the optimization method of the image classification model provided by the invention, the target model parameters of the image classification model to be optimized are determined by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration, and the method comprises the following steps:
stopping the iteration to perform the semantic augmentation and the category weighting augmentation under the condition that the times of the semantic augmentation and the category weighting augmentation of the iteration reach preset times or the difference value between the upper bound function value of the current iteration and the upper bound function value of the previous iteration is smaller than a preset threshold value;
and determining the model parameters in the upper bound function of the iteration stopping turn as the target model parameters of the image classification model to be optimized.
The invention also provides an optimization device of the image classification model, which comprises the following components:
the feature acquisition module is used for extracting feature data of an original training sample by utilizing an image classification model to be optimized;
the semantic augmentation module is used for generating a semantic augmentation sample of the original training sample according to the feature data;
the loss function generation module is used for carrying out category weighting on the original training sample and the semantic augmentation sample according to preset category weight and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation;
an upper bound function generation module for determining an upper bound function of the loss function;
and the target parameter acquisition module is used for determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements all or part of the steps of the method for optimizing the image classification model according to any one of the above methods when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs all or part of the steps of a method of optimizing an image classification model as described in any one of the above.
According to the method, apparatus, device and storage medium for optimizing an image classification model provided by the invention, semantic augmentation samples are generated from the feature data of the original training samples, avoiding overfitting of the image classification model to be optimized on minority classes; class weighting augmentation is performed on the original training samples and the semantic augmentation samples according to preset class weights, increasing the proportion of minority-class training samples and improving the model's prediction performance on minority classes; and the upper bound function is used as the optimization objective function, with semantic augmentation and class weighting augmentation performed through iteration to determine the target model parameters of the image classification model to be optimized, greatly reducing the time complexity of the model training and optimization process.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an optimization method of an image classification model according to the present invention;
FIG. 2 is a schematic structural diagram of an image classification model to be optimized according to the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for optimizing an image classification model according to the present invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes an optimization method, apparatus, device and storage medium of an image classification model according to the present invention with reference to fig. 1 to 4.
Fig. 1 is a schematic flow chart of an optimization method of an image classification model provided by the present invention, as shown in fig. 1, the method includes:
s11, extracting feature data of an original training sample by using the image classification model to be optimized;
specifically, the image classification model to be optimized is an image classification model which is subjected to preliminary training, and the invention aims to further optimize the image classification model after the preliminary training and improve the generalization performance of the image classification model on the tail category. The type of the image classification model may be VGG network, google net network, Residual network, ResNet network, etc., and the specific type is not limited.
It will be appreciated that an original training sample includes a sample image and its corresponding reference classification label. Because the pixel-space data of a sample image is of too high a dimension, it is difficult to directly extract abstract information from it and perform data augmentation; therefore, the image classification model to be optimized is used to perform feature extraction on the original training samples, obtaining the feature data of the sample images so that data augmentation can conveniently be performed on the basis of the feature data.
S12, generating a semantic augmentation sample of the original training sample according to the feature data;
specifically, the feature data extracted from the original training sample by the image classification model to be optimized is a depth feature with a high abstraction degree, and on the basis, the feature data can be sampled and converted, and the semantic augmentation sample of the original training sample is obtained by combining the original training sample to perform semantic augmentation. The samples generated by semantic augmentation can effectively avoid the condition of overfitting on a few categories.
S13, according to preset class weights, carrying out class weighting on the original training samples and the semantic augmentation samples, and determining loss functions of the image classification model to be optimized on the training samples after the semantic augmentation and the class weighting augmentation;
specifically, the preset class weight is a weight preset for each class in the training samples, and the size of the class weight is inversely related to the number of samples in the class, that is, a relatively low gain is given to the class with the larger number of training samples, and a relatively high gain is given to the class with the smaller number of training samples. And according to preset class weight, carrying out class weighting on the original training sample and the semantic augmentation sample, and further carrying out class weighting augmentation on the basis of semantic augmentation so as to increase the occupation ratio of a few classes of training samples. It can be understood that, for a plurality of training samples of the same class (i.e. the reference classification labels of the training samples are the same), the preset class weights are the same.
And determining a loss function of the image classification model to be optimized on the training sample after semantic augmentation and category weighting augmentation according to preset category weights. The loss function represents the difference between the prediction output (namely the image classification result) of the model and the actual value, the larger the loss function is, the worse the prediction performance of the model is, and the smaller the loss function is, the better the prediction performance of the model is.
S14, determining an upper bound function of the loss function;
specifically, the image classification model has different loss functions based on different samples. And the training samples are frequently replaced/added, so that the process of training and optimizing the model is time-consuming. By deducing and determining an upper bound function of a loss function of the image classification model to be optimized, wherein the loss function is always smaller than the upper bound function, the prediction performance of the image classification model can be indirectly known through the upper bound function.
And S15, determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
Specifically, the upper bound function represents the maximum value of the loss function of the image classification model to be optimized on the corresponding training sample. And taking the upper bound function as an optimization target function, and performing semantic augmentation and category weighting augmentation through iteration to realize iterative updating of the upper bound function. And when the iterative updating reaches a preset condition, stopping the iterative updating, and determining the model parameters corresponding to the upper bound function when the iterative updating is stopped as the target model parameters of the image classification model to be optimized.
It should be noted that the above semantic augmentation and class weighting augmentation can be implemented directly by updating data within the upper bound function, without explicitly generating actual semantic augmentation samples and class weighting augmentation samples, and without executing a process of training the model to be optimized on those samples. The method directly uses the upper bound function as the optimization objective function and determines the target model parameters of the image classification model to be optimized by iteratively updating the upper bound function, which equivalently realizes the augmentation process through the objective function, reduces the computation overhead, and greatly reduces the time complexity of the model training and optimization process. The basis of each iteration is the original training samples; iterating the semantic augmentation explores different data augmentation directions, and the class weighting augmentation balances the minority classes in the training samples.
In this embodiment, semantic augmentation samples are generated from the feature data of the original training samples, avoiding overfitting of the image classification model to be optimized on minority classes; class weighting augmentation is performed on the original training samples and the semantic augmentation samples according to preset class weights, increasing the proportion of minority-class training samples and improving the model's prediction performance on minority classes; and the upper bound function is used as the optimization objective function, with semantic augmentation and class weighting augmentation performed through iteration to determine the target model parameters, greatly reducing the time complexity of the model training and optimization process.
Based on any one of the above embodiments, in an embodiment, before the extracting the feature data of the original training sample by using the to-be-optimized image classification model, the method further includes:
training an original image classification model by using an original training sample to obtain the image classification model to be optimized; the image classification model to be optimized comprises a feature extraction network and a classifier network.
Specifically, the image classification model to be optimized is an image classification model that has been subjected to preliminary training, and before extracting feature data of an original training sample by using the image classification model to be optimized, the original training sample needs to be used to train the original image classification model to obtain the image classification model to be optimized. Fig. 2 is a schematic structural diagram of an image classification model to be optimized according to the present invention, and as shown in fig. 2, an original image classification model includes a feature extraction network and a classifier network.
The feature extraction network may be configured to perform feature extraction on training data (i.e., the original training samples) to obtain feature data. Because the dimension of the sample image data in the input training data is too high, abstract information is difficult to directly extract, and feature extraction needs to be performed through multi-level dimension reduction. The feature extraction network may include a plurality of convolutional layers, pooling layers, activation function layers, and the like.
The classifier network may be used to process the feature data and output the prediction result. The classifier network may employ a linear classification layer to classify the data. Preferably, the classifier network uses a fully-connected layer followed by a normalized exponential function layer (softmax): the fully-connected layer performs feature weighting on the feature data, the softmax layer presents the classification result as a probability distribution, and the class with the largest probability is the final predicted label.
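For illustration, the following is a minimal PyTorch-style sketch of such a model, split into a feature extraction network and a classifier network; the class and variable names are illustrative assumptions, not part of the disclosure:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ClassificationModel(nn.Module):
    """Feature extraction network + classifier network, as described above."""
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)      # feature extraction network
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                   # strip the original head
        self.feature_extractor = backbone
        self.classifier = nn.Linear(feat_dim, num_classes)  # fully-connected layer

    def forward(self, x):
        a = self.feature_extractor(x)                 # deep feature data a_i
        logits = self.classifier(a)                   # raw class scores
        return a, logits

model = ClassificationModel(num_classes=10)
_, logits = model(torch.randn(4, 3, 224, 224))
pred = logits.softmax(dim=1).argmax(dim=1)            # class with largest probability
```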
In this embodiment, the original training sample is used for training the original image classification model to obtain the to-be-optimized image classification model, so that feature extraction is performed on the original training sample by using the to-be-optimized image classification model subsequently to obtain feature data, and the model is optimized according to the feature data.
Based on any one of the above embodiments, in one embodiment, the structure of the image classification model to be optimized is a ResNet structure.
Specifically, the prior art generally uses a deep convolutional neural network to extract feature information, with the aim of extracting deep features of high abstraction and then using them as the basis for classification. On this basis, the present application preferably uses a ResNet structure. Compared with traditional machine learning, the key characteristics of deep learning are deeper networks, nonlinear transformation, automatic feature extraction, and feature transformation. Among these, nonlinear transformation is a key goal: it maps data into a high-dimensional space in order to better complete the classification task. As networks become deeper, more and more activation functions are introduced, and data is mapped into an increasingly discrete hidden-layer space. Nonlinear transformation greatly improves the data classification capability, but as network depth keeps increasing, the data flow undergoes excessive nonlinear transformation and a linear transformation can no longer be realized. To enable the neural network to also perform linear transformations, ResNet adds a shortcut connection branch to each module, which can balance between linear and nonlinear transformations.
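As an illustration of this design, a minimal residual block sketch (a simplified version of ResNet's basic block, with illustrative names) is shown below:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: the shortcut connection adds the input back to the
    output of the nonlinear branch, so the block can fall back toward a
    (near-)linear mapping when the nonlinear branch contributes little."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # shortcut connection branch
```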
In the embodiment, the structure of the image classification model to be optimized is the ResNet structure, so that balance between linear conversion and nonlinear conversion is achieved, and the model prediction performance of the image classification model to be optimized is improved.
Based on any one of the above embodiments, in an embodiment, the generating a semantic augmented sample of the original training sample according to the feature data includes:
acquiring a covariance matrix of the original training sample according to the feature data;
sampling the covariance matrix according to the feature distribution in the covariance matrix to generate a conversion vector;
and generating a semantic augmentation sample of the original training sample according to the conversion vector and the feature data.
Specifically, statistics are computed over the feature data of the original training samples to obtain their covariance matrix. The covariance matrix of the original training samples is composed of the covariance matrices of the individual classes; the covariance matrix of each class can be computed from the deep features of the training samples of that class, and each covariance entry represents the correlation between the corresponding features. Suppose the original training sample set is
$$D = \{(x_i, y_i)\}_{i=1}^{N}$$
wherein $x_i$ is the image in the i-th training sample, $y_i$ is its corresponding label, and the entire original training data set has N samples. The image classification model to be optimized performs feature extraction on training sample $x_i$, and the obtained feature data is denoted $a_i$. The covariance matrix of the original training samples can then be expressed as $\Sigma = \{\Sigma_1, \Sigma_2, \ldots, \Sigma_M\}$, where $\Sigma_1, \Sigma_2, \ldots, \Sigma_M$ are the covariance matrices of the individual classes and M is the total number of classes.
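A minimal sketch of estimating these per-class covariance matrices from extracted features is given below; it uses a full-batch estimate for clarity (an actual implementation might update the statistics online over mini-batches), and all names are illustrative:

```python
import torch

def class_covariances(features: torch.Tensor, labels: torch.Tensor, num_classes: int):
    """Estimate Sigma_1..Sigma_M from deep features a_i (N, D) and labels (N,)."""
    d = features.shape[1]
    covs = []
    for c in range(num_classes):
        a_c = features[labels == c]                     # (n_c, D) features of class c
        if a_c.shape[0] > 1:
            centered = a_c - a_c.mean(dim=0, keepdim=True)
            covs.append(centered.T @ centered / (a_c.shape[0] - 1))
        else:
            covs.append(torch.zeros(d, d))              # degenerate class: no spread
    return torch.stack(covs)                            # (M, D, D)
```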
The feature distribution in the covariance matrix is analyzed to obtain the distribution, from which the corresponding data augmentation directions can be known. The data augmentation directions correspond to actual sample variations, such as different colors of cars in a picture of the "car" class, different viewing directions of the cars, and so forth. The covariance matrix is sampled according to this distribution to generate a conversion vector, which can be used to enhance the feature data and generate new feature data (i.e., correspondingly performing semantic augmentation on the training sample). The feature distribution preferably uses a Gaussian model, which can be written as
$$\delta_i \sim \mathcal{N}\big(0,\ \lambda\,\Sigma_{y_i}\big)$$
wherein $y_i$ is the classification label corresponding to the i-th image (one of the M classes), $\Sigma_{y_i}$ is the covariance matrix of that class, and $\lambda$ is the hyper-parameter used to adjust the data augmentation strength. The covariance matrix is sampled according to this Gaussian distribution, generating a conversion vector $\delta_i$ for enhancing the feature data $a_i$ corresponding to the i-th image.
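A sketch of this sampling step follows, under the assumption (consistent with the Gaussian model above) that the augmented feature is the original feature plus the sampled conversion vector; the jitter term is an added numerical safeguard, not part of the disclosure:

```python
import torch

def semantic_augment(features, labels, covs, lam: float):
    """Sample delta_i ~ N(0, lam * Sigma_{y_i}) and add it to each feature a_i."""
    d = features.shape[1]
    sigma = lam * covs[labels] + 1e-5 * torch.eye(d)   # jitter keeps Sigma positive definite
    dist = torch.distributions.MultivariateNormal(
        loc=torch.zeros_like(features), covariance_matrix=sigma)
    delta = dist.sample()                              # conversion vectors (N, D)
    return features + delta                            # semantically augmented features
```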
In this embodiment, a covariance matrix representing the correlations among the feature data of the original training samples is obtained from the feature data; the covariance matrix is sampled according to the feature distribution within it, and conversion vectors are generated to semantically augment the original training samples, thereby avoiding overfitting of the image classification model to be optimized on minority classes.
Based on any one of the above embodiments, in an embodiment, the preset category weight is obtained according to the following formula:
$$\epsilon_c = \frac{1-\beta}{1-\beta^{\,n_c}}$$
wherein $\epsilon_c$ is the preset class weight of the c-th class, $n_c$ is the number of training samples in the c-th class, and $\beta$ is a hyper-parameter with value range (0, 1).
Specifically, observing the above formula, it can be seen that a class with more training samples has a smaller preset class weight $\epsilon_c$, i.e., a smaller gain, while a class with fewer training samples has a larger preset class weight $\epsilon_c$, i.e., a larger gain, so the proportion of minority-class training samples increases in the class-weighted training set. The value range of the hyper-parameter $\beta$ is (0, 1), preferably 0.99 or 0.999.
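This weighting is straightforward to compute; a short sketch with an illustrative long-tailed class-count vector:

```python
import torch

def class_weights(samples_per_class, beta: float = 0.999):
    """Preset class weights eps_c = (1 - beta) / (1 - beta ** n_c)."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float)
    return (1.0 - beta) / (1.0 - beta ** n)

# The tail class (50 samples) receives a weight roughly 20x the head class's.
print(class_weights([5000, 500, 50]))  # approx. tensor([0.0010, 0.0025, 0.0205])
```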
In this embodiment, the proportion of minority-class training samples is increased by setting the preset class weights, improving the prediction performance of the image classification model to be optimized on minority classes.
Based on any of the above embodiments, in an embodiment, the determining an upper bound function of the loss function includes:
and determining an upper bound function of the loss function by assuming a mode of carrying out infinite times of semantic augmentation and category weighted augmentation.
Specifically, each round of semantic augmentation and class weighting augmentation yields a loss function of the image classification model to be optimized. To explore all possible data augmentation directions, the covariance matrix of the original training samples must be sampled many times according to its feature distribution, with semantic augmentation and class weighting augmentation performed correspondingly, updating the loss function of the image classification model to be optimized on the resulting augmented samples. By assuming that this sampling, and hence the semantic augmentation and class weighting augmentation, is carried out an infinite number of times, an upper bound function of the loss function of the image classification model to be optimized can be determined.
In the embodiment, the upper bound function of the loss function is determined by assuming an infinite semantic augmentation and category weighting augmentation mode, so that the prediction performance of the image classification model can be conveniently and indirectly known.
Based on any one of the above embodiments, in an embodiment, the upper bound function is obtained by the following formula:
$$\bar{\mathcal{L}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\epsilon_i\,\ell_{\infty}\big(f(x_i;\theta),\ y_i,\ \Sigma\big)$$
wherein i is the index of a training sample, N is the total number of training samples, $\epsilon_i$ is the class weight corresponding to the i-th training sample, $x_i$ is the sample image in the i-th training sample, $\theta$ is the parameter of the image classification model to be optimized, $f(x_i;\theta)$ is the predicted output of the image classification model to be optimized for the sample image in the i-th training sample, $y_i$ is the reference classification label corresponding to the sample image in the i-th training sample, $\Sigma$ is the covariance matrix of the training samples, and $\ell_{\infty}(\cdot)$ denotes the per-sample loss under infinite semantic augmentation and class weighting augmentation.
Correspondingly, the upper bound function is used as an optimization objective function, semantic augmentation and category weighting augmentation are performed through iteration, and objective model parameters of the image classification model to be optimized are determined, and can be expressed as:
$$\theta^{*} = \mathop{\arg\min}_{\theta}\ \bar{\mathcal{L}}(\theta)$$
That is, minimization of the upper bound function is taken as the optimization target, and semantic augmentation and class weighting augmentation are performed on the original training samples by iterating through the upper bound function, thereby determining the optimized model parameters.
In this embodiment, the upper bound function of the loss function is determined, so that the prediction performance of the image classification model can be conveniently and indirectly known, and the image classification model to be optimized can be optimized.
Based on any one of the above embodiments, in an embodiment, the loss function is a cross-entropy loss function, and the upper bound function is obtained by the following formula:
$$\bar{\mathcal{L}}_{\infty}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \epsilon_i \log\left(\sum_{c=1}^{M} \frac{f_c(x_i;\theta)}{f_{y_i}(x_i;\theta)}\, \exp\!\Big(\frac{\lambda}{2}\big(w_c - w_{y_i}\big)^{\top}\Sigma_{y_i}\big(w_c - w_{y_i}\big)\Big)\right)$$
wherein $x_i$ is the image in the i-th training sample and $y_i$ is its corresponding label, $\theta$ is the model parameter of the image classification model to be optimized, $\Sigma$ is the covariance matrix of the feature data of the training samples, $\epsilon_i$ is the class weight corresponding to the i-th training sample, $f_c(x_i;\theta)$ is the c-th element of the prediction output for sample $x_i$ (i.e., the predicted probability of belonging to the c-th class), $f_{y_i}(x_i;\theta)$ is the $y_i$-th element of the prediction output for sample $x_i$ (i.e., the predicted probability of belonging to the $y_i$-th class), $\Sigma_{y_i}$ is the covariance matrix of the class corresponding to the i-th training sample, $w_c$ is the c-th column of the classifier weight matrix of the last layer of the image classification model to be optimized, $w_c - w_{y_i}$ is the difference between the c-th and $y_i$-th columns of that weight matrix, and $(w_c - w_{y_i})^{\top}$ is its transpose.
Specifically, the loss function may be calculated using a cross entropy loss function (Cross Entropy Loss), an exponential loss function (Exponential Loss), a robust loss function (Modified Huber Loss), or the like. The cross entropy loss function not only measures the performance of a multi-class model well but is also easy to differentiate; it is therefore preferably adopted, and the above upper bound of the cross entropy loss of the image classification model to be optimized on the augmented samples is derived from it.
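A sketch of this weighted upper bound as a training loss follows. It relies on the identity that the ratio $f_c/f_{y_i}$ equals $\exp(z_c - z_{y_i})$ for softmax logits $z$, so the bound reduces to a cross entropy over quadratically shifted logits; the classifier bias is omitted for brevity and all names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def weighted_upper_bound_loss(features, labels, W, covs, eps, lam):
    """Class-weighted upper bound of the cross entropy loss under infinite
    semantic augmentation.

    features: (N, D) deep features a_i     W:   (M, D) classifier weights w_c
    covs:     (M, D, D) class covariances  eps: (M,) preset class weights
    """
    logits = features @ W.T                           # (N, M): w_c^T a_i
    diff = W.unsqueeze(0) - W[labels].unsqueeze(1)    # (N, M, D): w_c - w_{y_i}
    sigma = covs[labels]                              # (N, D, D): Sigma_{y_i}
    # quadratic term (w_c - w_{y_i})^T Sigma_{y_i} (w_c - w_{y_i})
    quad = torch.einsum('nmd,nde,nme->nm', diff, sigma, diff)
    aug_logits = logits + 0.5 * lam * quad            # zero shift for c = y_i
    per_sample = F.cross_entropy(aug_logits, labels, reduction='none')
    return (eps[labels] * per_sample).mean()          # weighted upper bound
```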
In the embodiment, the loss function of the image classification model to be optimized on the augmented training sample is accurately and conveniently determined by adopting the cross entropy loss function.
Based on any one of the above embodiments, in an embodiment, the determining target model parameters of the image classification model to be optimized by iteratively performing semantic augmentation and category weighted augmentation with the upper bound function as an optimization target function includes:
stopping the iteration to perform the semantic augmentation and the category weighting augmentation under the condition that the times of the semantic augmentation and the category weighting augmentation of the iteration reach preset times or the difference value between the upper bound function value of the current iteration and the upper bound function value of the previous iteration is smaller than a preset threshold value;
and determining the model parameters in the upper bound function of the iteration stopping turn as the target model parameters of the image classification model to be optimized.
Specifically, the iteration may be stopped when the number of rounds of semantic augmentation and class weighting augmentation reaches a preset number; the preset number may be set according to the performance requirements of the image classification model to be optimized in combination with historical data, for example 1000. The iteration may also be stopped when the difference between the upper bound function value of the current round and that of the previous round is smaller than a preset threshold: such a small difference indicates that little room for further optimization remains while further optimization would still be time-consuming, so the iteration can be stopped at that point. The model parameters in the upper bound function at the round where iteration stops are the target model parameters of the image classification model to be optimized.
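The following sketch wires the previous pieces into such an optimization loop; `max_iters` and `tol` are illustrative stand-ins for the preset number and preset threshold, and the covariance statistics are kept fixed for brevity although they could be re-estimated each round:

```python
import torch

def optimize(model, images, labels, covs, eps, lam,
             max_iters: int = 1000, tol: float = 1e-6):
    """Minimize the upper bound function until either stopping condition fires."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    prev = float('inf')
    for _ in range(max_iters):                        # preset number of iterations
        feats, _ = model(images)                      # extract feature data
        loss = weighted_upper_bound_loss(
            feats, labels, model.classifier.weight, covs, eps, lam)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:             # upper bound barely changed
            break
        prev = loss.item()
    return model.state_dict()                         # target model parameters
```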
In this embodiment, when the number of times of semantic augmentation and category weighting augmentation performed in iteration reaches a preset number, or when a difference between an upper bound function value of a current iteration and an upper bound function value of a previous iteration is smaller than a preset threshold, the semantic augmentation and the category weighting augmentation performed in iteration is stopped, conditions for stopping iteration are flexibly and accurately set, and a user's differentiated model optimization requirements are met.
On the basis of the above embodiments, data augmentation for the long-tail problem may also be combined with classical augmentations such as rotation and translation; fusing different data augmentations can generate more training samples. In addition, training samples may be produced by a trained generator: by training a generator on the existing training data, a data generator is obtained whose output samples can serve as training data. However, this alternative requires training an additional network, which increases the complexity of the scheme.
The following describes the optimization apparatus of the image classification model provided by the present invention, and the optimization apparatus of the image classification model described below and the optimization method of the image classification model described above can be referred to correspondingly.
Fig. 3 is a schematic structural diagram of an apparatus for optimizing an image classification model according to the present invention, as shown in fig. 3, the apparatus includes: the system comprises a feature acquisition module 31, a semantic augmentation module 32, a loss function generation module 33, an upper bound function generation module 34 and a target parameter acquisition module 35.
The feature obtaining module 31 is configured to extract feature data of an original training sample by using an image classification model to be optimized; a semantic augmentation module 32, configured to generate a semantic augmentation sample of the original training sample according to the feature data; the loss function generation module 33 is configured to perform class weighting on the original training sample and the semantic augmentation sample according to a preset class weight, and determine a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the class weighting augmentation; an upper bound function generation module 34 for determining an upper bound function of the loss function; and the target parameter obtaining module 35 is configured to determine target model parameters of the image classification model to be optimized by performing semantic augmentation and category weighting augmentation through iteration by using the upper bound function as an optimization target function.
In this embodiment, semantic augmentation samples are generated from the feature data of the original training samples, avoiding overfitting of the image classification model to be optimized on minority classes; class weighting augmentation is performed on the original training samples and the semantic augmentation samples according to preset class weights, increasing the proportion of minority-class training samples and improving the model's prediction performance on minority classes; and the upper bound function is used as the optimization objective function, with semantic augmentation and class weighting augmentation performed through iteration to determine the target model parameters of the image classification model to be optimized, greatly reducing the time complexity of the model training and optimization process.
Based on any one of the above embodiments, in an embodiment, the apparatus further includes:
the model training module is used for training an original image classification model by using an original training sample to obtain the image classification model to be optimized; the image classification model to be optimized comprises a feature extraction network and a classifier network.
In this embodiment, the original training sample is used for training the original image classification model to obtain the to-be-optimized image classification model, so that feature extraction is performed on the original training sample by using the to-be-optimized image classification model subsequently to obtain feature data, and the model is optimized according to the feature data.
Based on any of the above embodiments, in an embodiment, the semantic expansion module 32 includes:
the first augmentation unit is used for acquiring a covariance matrix of the original training sample according to the feature data;
the second augmentation unit is used for sampling the covariance matrix according to the feature distribution in the covariance matrix to generate a conversion vector;
and the third augmentation unit is used for generating a semantic augmentation sample of the original training sample according to the conversion vector and the feature data.
In this embodiment, a covariance matrix representing the correlations among the feature data of the original training samples is obtained from the feature data; the covariance matrix is sampled according to the feature distribution within it, and conversion vectors are generated to semantically augment the original training samples, thereby avoiding overfitting of the image classification model to be optimized on minority classes.
Based on any of the above embodiments, in an embodiment, the upper bound function generating module 34 is further configured to determine the upper bound function of the loss function by assuming an infinite number of semantic augmentation and category weighted augmentation.
In the embodiment, the upper bound function of the loss function is determined by assuming an infinite semantic augmentation and category weighting augmentation mode, so that the prediction performance of the image classification model can be conveniently and indirectly known.
Based on any one of the above embodiments, in an embodiment, the target parameter obtaining module 35 includes:
the iteration detection unit is used for stopping iteration to carry out semantic augmentation and category weighting augmentation under the condition that the number of times of carrying out semantic augmentation and category weighting augmentation by iteration reaches a preset number of times or the difference value between the upper bound function value of the current iteration and the upper bound function value of the previous iteration is smaller than a preset threshold value;
and the target parameter determining unit is used for determining the model parameters in the upper bound function for stopping iteration turns as the target model parameters of the image classification model to be optimized.
In this embodiment, when the number of times of semantic augmentation and category weighting augmentation performed in iteration reaches a preset number, or when a difference between an upper bound function value of a current iteration and an upper bound function value of a previous iteration is smaller than a preset threshold, the condition for stopping iteration is flexibly and accurately set to satisfy a user's differentiated model optimization requirement.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 are communicated with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform all or a portion of the steps of the above-described methods for optimizing an image classification model, the methods comprising: extracting feature data of an original training sample by using an image classification model to be optimized; generating a semantic augmentation sample of the original training sample according to the feature data; according to preset category weight, carrying out category weighting on the original training sample and the semantic augmentation sample, and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation; determining an upper bound function of the loss function; and determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform all or part of the steps of the method for optimizing an image classification model provided above, the method comprising: extracting feature data of an original training sample by using an image classification model to be optimized; generating a semantic augmentation sample of the original training sample according to the feature data; according to preset category weight, carrying out category weighting on the original training sample and the semantic augmentation sample, and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation; determining an upper bound function of the loss function; and determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform all or part of the steps of the above-provided image classification model optimization method, the method comprising: extracting feature data of an original training sample by using an image classification model to be optimized; generating a semantic augmentation sample of the original training sample according to the feature data; according to preset category weight, carrying out category weighting on the original training sample and the semantic augmentation sample, and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation; determining an upper bound function of the loss function; and determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for optimizing an image classification model, comprising:
extracting feature data of an original training sample by using an image classification model to be optimized;
generating a semantic augmentation sample of the original training sample according to the feature data;
according to preset category weight, carrying out category weighting on the original training sample and the semantic augmentation sample, and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation;
determining an upper bound function of the loss function;
and determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
2. The method for optimizing an image classification model according to claim 1, wherein before extracting feature data of an original training sample by using the image classification model to be optimized, the method further comprises:
training an original image classification model by using an original training sample to obtain the image classification model to be optimized; the image classification model to be optimized comprises a feature extraction network and a classifier network.
3. The method for optimizing an image classification model according to claim 1, wherein the generating semantic augmentation samples of the original training samples according to the feature data comprises:
acquiring a covariance matrix of the original training sample according to the feature data;
sampling the covariance matrix according to the feature distribution in the covariance matrix to generate a conversion vector;
and generating a semantic augmentation sample of the original training sample according to the conversion vector and the feature data.
4. The method for optimizing an image classification model according to claim 1, wherein the preset class weight is obtained according to the following formula:
$$\epsilon_c = \frac{1-\beta}{1-\beta^{\,n_c}}$$
wherein $\epsilon_c$ is the preset class weight of the c-th class, $n_c$ is the number of samples of the c-th class, and $\beta$ is a hyper-parameter with value range (0, 1).
5. The method of optimizing an image classification model according to claim 1, wherein the determining an upper bound function of the loss function comprises:
and determining an upper bound function of the loss function by assuming a mode of carrying out infinite times of semantic augmentation and category weighted augmentation.
6. The method for optimizing an image classification model according to any one of claims 1 to 5, wherein the upper bound function is obtained by the following formula:
$$\bar{\mathcal{L}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\epsilon_i\,\ell_{\infty}\big(f(x_i;\theta),\ y_i,\ \Sigma\big)$$
wherein i is the index of a training sample, N is the total number of training samples, $\epsilon_i$ is the class weight corresponding to the i-th training sample, $x_i$ is the sample image in the i-th training sample, $\theta$ is the parameter of the image classification model to be optimized, $f(x_i;\theta)$ is the predicted output of the image classification model to be optimized for the sample image in the i-th training sample, $y_i$ is the reference classification label corresponding to the sample image in the i-th training sample, $\Sigma$ is the covariance matrix of the training samples, and $\ell_{\infty}(\cdot)$ denotes the per-sample loss under infinite semantic augmentation and class weighting augmentation.
7. The method for optimizing an image classification model according to any one of claims 1 to 5, wherein the determining the target model parameters of the image classification model to be optimized by taking the upper bound function as the optimization target function and iteratively performing semantic augmentation and category weighting augmentation comprises:
stopping the iterative semantic augmentation and category weighting augmentation when the number of iterations of the semantic augmentation and the category weighting augmentation reaches a preset number, or when the difference between the upper bound function value of the current iteration and the upper bound function value of the previous iteration is smaller than a preset threshold;
and determining the model parameters in the upper bound function at the round in which the iteration stops as the target model parameters of the image classification model to be optimized.
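A minimal sketch of claim 7's stopping rule, assuming a standard gradient optimizer; max_iters and tol stand in for the "preset times" and "preset threshold", and loss_fn is a hypothetical callable computing the upper bound on a batch.

import torch

def optimize(model, loader, loss_fn, max_iters=100, tol=1e-4):
    # Iterate semantic augmentation + category-weighted augmentation by
    # minimizing the upper bound function; stop when the iteration count
    # reaches the preset number or the upper-bound value changes by less
    # than the preset threshold between rounds (claim 7).
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    prev = float('inf')
    for it in range(max_iters):                 # preset number of iterations
        total = 0.0
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model, x, y)         # upper bound for this batch
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev - total) < tol:             # difference below threshold
            break
        prev = total
    return model.state_dict()                   # target model parameters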
8. An apparatus for optimizing an image classification model, comprising:
the feature obtaining module is used for extracting feature data of an original training sample by using an image classification model to be optimized;
the semantic augmentation module is used for generating a semantic augmentation sample of the original training sample according to the feature data;
the loss function generation module is used for carrying out category weighting on the original training sample and the semantic augmentation sample according to preset category weights and determining a loss function of the image classification model to be optimized on the training sample after the semantic augmentation and the category weighting augmentation;
an upper bound function generation module for determining an upper bound function of the loss function;
and the target parameter acquisition module is used for determining target model parameters of the image classification model to be optimized by taking the upper bound function as an optimization target function and performing semantic augmentation and category weighting augmentation through iteration.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements all or part of the steps of the method for optimizing an image classification model according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements all or part of the steps of the method for optimizing an image classification model according to any one of claims 1 to 7.
CN202111273287.1A 2021-10-29 2021-10-29 Method, device and equipment for optimizing image classification model and storage medium Pending CN114022706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111273287.1A CN114022706A (en) 2021-10-29 2021-10-29 Method, device and equipment for optimizing image classification model and storage medium

Publications (1)

Publication Number Publication Date
CN114022706A (en) 2022-02-08

Family

ID=80058931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111273287.1A Pending CN114022706A (en) 2021-10-29 2021-10-29 Method, device and equipment for optimizing image classification model and storage medium

Country Status (1)

Country Link
CN (1) CN114022706A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481694A (en) * 2022-09-26 2022-12-16 南京星环智能科技有限公司 Data enhancement method, device, equipment and storage medium for training sample set
CN115481694B (en) * 2022-09-26 2023-09-05 南京星环智能科技有限公司 Data enhancement method, device and equipment for training sample set and storage medium
CN115934484A (en) * 2022-11-29 2023-04-07 广东技术师范大学 Diffusion model data enhancement-based anomaly detection method, storage medium and equipment
CN115934484B (en) * 2022-11-29 2024-02-09 广东技术师范大学 Diffusion model data enhancement-based anomaly detection method, storage medium and apparatus

Similar Documents

Publication Publication Date Title
CN109583501B (en) Method, device, equipment and medium for generating image classification and classification recognition model
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN114022706A (en) Method, device and equipment for optimizing image classification model and storage medium
CN111696046A (en) Watermark removing method and device based on generating type countermeasure network
CN110858269A (en) Criminal name prediction method and device
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111260568B (en) Peak binarization background noise removing method based on multi-discriminator countermeasure network
CN114579743A (en) Attention-based text classification method and device and computer readable medium
CN111414928A (en) Method, device and equipment for generating face image data
CN114444566A (en) Image counterfeiting detection method and device and computer storage medium
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
CN110633735A (en) Progressive depth convolution network image identification method and device based on wavelet transformation
CN112560034B (en) Malicious code sample synthesis method and device based on feedback type deep countermeasure network
US20220301106A1 (en) Training method and apparatus for image processing model, and image processing method and apparatus
CN115953821B (en) Virtual face image generation method and device and electronic equipment
CN110598737A (en) Online learning method, device, equipment and medium of deep learning model
CN115862015A (en) Training method and device of character recognition system, and character recognition method and device
Krupiński et al. Improved two-step binarization of degraded document images based on Gaussian mixture model
CN113627327A (en) Singing voice detection method based on multi-scale time-frequency graph parallel input convolution neural network
CN111914915A (en) Data classifier integration method and device based on support vector machine and storage medium
CN115311595B (en) Video feature extraction method and device and electronic equipment
CN116912920B (en) Expression recognition method and device
Joshi et al. Development of Classification Framework Using Machine Learning and Pattern Recognition System
Xia Fine-tuning for image style recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination