CN113052261A - Image classification loss function design method based on cosine space optimization - Google Patents
Image classification loss function design method based on cosine space optimization
- Publication number
- CN113052261A (application CN202110434753.3A)
- Authority
- CN
- China
- Prior art keywords
- loss function
- class
- expressed
- batch
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method for designing an image classification loss function based on cosine space optimization. Building on the common Additive Margin Softmax (AM-Softmax) loss, it provides a loss function that actively regulates the intra-class distance and the inter-class distance of image classification at the same time. The method uses the AM-Softmax loss function in the first half of model training to widen the inter-class distance; in the second half of training it adds intra-class centers that are dynamically adjusted with each training batch, further compacting the feature vectors of objects of the same class while widening the cosine distances between the feature vectors of objects of different classes. As a result, the model converges faster, similar classes are fully distinguished, and model performance is further improved.
Description
Technical Field
The invention relates to the fields of computer vision and artificial intelligence, and in particular to a method for designing an image classification loss function based on cosine space optimization.
Background
Object classification is a fundamental and important research area in computer vision. With the development of deep learning, current object classification algorithms can solve most simple classification problems: after AlexNet, a series of CNN models such as ResNet, GoogLeNet and EfficientNet emerged, continually refreshing performance on the ImageNet dataset. With the adoption of more complex network structures and the introduction of deep residual connections, the best image classification algorithms now reach 84.4% Top-1 accuracy on ImageNet. However, these models usually carry a huge number of parameters and high computational complexity. For edge and mobile scenarios where the algorithm must be deployed, limited memory and computing power make large networks difficult to use, so the demand for lightweight intelligent networks is growing. To improve the accuracy of lightweight image classification, optimizing the loss function can effectively raise network accuracy without increasing model complexity or data volume, and is therefore an effective way to address the low accuracy of lightweight image classification networks.
The loss functions commonly used in image classification are the sigmoid cross-entropy loss and the softmax cross-entropy loss. Their supervision capability is limited: they can only separate, in Euclidean space, the classification results of objects that differ substantially, and they discriminate weakly between highly similar classes. Introducing loss functions from the face recognition field and further optimizing them for image classification therefore achieves better classification results and has high practical value.
Disclosure of Invention
In view of the above, the present invention provides a method for designing an image classification loss function based on cosine space optimization. The loss function designed by this method improves classification accuracy without increasing the number of network parameters or the amount of training data, and is suitable for lightweight intelligent networks.
In order to achieve the purpose, the invention adopts the following technical scheme:
a design method of an image classification loss function based on cosine space optimization comprises the following steps:
step S1, acquiring a data set, setting hyper-parameters and initializing a deep learning model;
step S2, carrying out multi-batch iterative training on the deep learning model, and sequentially executing steps S21-S23 in each iterative batch;
step S21, calculating the intra-class center of each class of object in the current iteration batch according to the feature vector obtained by the deep learning model in the forward propagation process, and cumulatively updating the intra-class center;
step S22, calculating a cross entropy loss function value and an inter-class loss function value of the current iteration batch;
step S23, judging whether the current iteration batch reaches the preset batch number N;
if not, calculating a first total loss function of the current iteration batch, performing gradient back propagation on the first total loss function, updating model parameters, and returning to the step S2 for a new round of training;
if yes, calculating an intra-class loss function value of the current iteration batch, calculating a second total loss function value by combining the first total loss function and the intra-class loss function value, performing gradient back propagation on the second total loss function value, updating model parameters, and entering step S3;
step S3, judging whether the deep learning model converges,
if not, returning to the step S2 to repeat the iterative training until the model converges;
if the convergence is reached, the model is output.
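The branching in steps S21-S23 amounts to a two-stage training schedule. A minimal Python sketch of the control flow is given below; `compute_losses` and `step` are hypothetical stand-ins for the model's forward and backward passes, and all numeric weights in the usage example are illustrative rather than values prescribed by the patent:

```python
def train_schedule(epochs, n_switch, compute_losses, step, alpha, beta, gamma, eps):
    """Two-stage schedule of steps S2-S3 (control flow only).

    compute_losses(epoch) returns (loss1, loss2, loss_ce) for one epoch;
    step(total) applies the gradient update. Both are caller-supplied
    stand-ins for a real model.
    """
    history = []
    for epoch in range(epochs):
        loss1, loss2, loss_ce = compute_losses(epoch)
        if epoch < n_switch:
            # first stage: AM-Softmax inter-class term + cross-entropy only
            total = alpha * loss1 + gamma * loss_ce
        else:
            # second stage: add the truncated intra-class term
            total = alpha * loss1 + beta * max(loss2 - eps, 0.0) + gamma * loss_ce
        step(total)  # gradient back-propagation, parameter update
        history.append(total)
    return history
```

Usage, with constant dummy losses so the stage switch is visible: `train_schedule(4, 2, lambda e: (1.0, 0.5, 2.0), lambda t: None, alpha=0.3, beta=0.1, gamma=0.6, eps=0.2)` returns a loss that steps up once the intra-class term activates at epoch 2.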
Further, in step S1 the hyper-parameters include: a weighting coefficient α, a weighting coefficient β, a weighting coefficient γ and a compaction coefficient ε, satisfying: γ > α > β;
the hyper-parameters further include the number of batches N used in step S23, with N ∈ (0, Epoch_end), where Epoch_end denotes the last training batch.
Further, the expression of the first total loss function is:
Loss = α·Loss_1 + γ·Loss_cross-entropy   (1)
In formula (1), α and γ are weighting coefficients, Loss_1 is the inter-class loss function, and Loss_cross-entropy is the cross-entropy loss function.
Further, the expression of the inter-class loss function is:
Loss_1 = -(1/n) · Σ_{i=1}^{n} log( e^{s·(cos θ_{y_i} − m)} / ( e^{s·(cos θ_{y_i} − m)} + Σ_{j=1, j≠y_i}^{C} e^{s·cos θ_j} ) )   (2)
In formula (2), n is the number of samples in a batch, s is the scaling factor, m is the distance between the decision boundaries of different classes in cosine space, cos θ_{y_i} is the projection of sample i on its corresponding class y_i, C is the total number of classes, and cos θ_j is the projection of sample i on another class j.
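The inter-class loss of formula (2) can be sketched in plain Python (lists and the `math` module rather than a tensor library, so it is for illustration only; `cos_logits` is a name introduced here for the precomputed cosine similarities between each sample's feature vector and every class weight):

```python
import math

def am_softmax_loss(cos_logits, labels, s=30.0, m=0.35):
    """AM-Softmax inter-class loss, formula (2).

    cos_logits: per-sample lists of cosine similarities to each class
                (values in [-1, 1]); labels: true class indices y_i.
    s is the scale factor, m the additive cosine margin.
    """
    n = len(cos_logits)
    total = 0.0
    for cosines, y in zip(cos_logits, labels):
        # the margin m is subtracted only from the target-class cosine
        target = math.exp(s * (cosines[y] - m))
        others = sum(math.exp(s * c) for j, c in enumerate(cosines) if j != y)
        total += -math.log(target / (target + others))
    return total / n
```

As expected from the formula, a sample whose target cosine clears the margin by a wide gap incurs a much smaller loss than one sitting on the decision boundary.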
Further, the expression of the second total loss function is:
Loss = α·Loss_1 + β·Truc(Loss_2 − ε) + γ·Loss_cross-entropy   (3)
In formula (3), α, β and γ are weighting coefficients, Loss_1 is the inter-class loss function, Loss_cross-entropy is the cross-entropy loss function, Loss_2 is the intra-class loss function, ε is the compaction coefficient that determines how tightly a class is packed in cosine space, and Truc(x) is a piecewise function with the expression:
Truc(x) = { x, x > 0; 0, x ≤ 0 }
further, the expression of the intra-class loss function is as follows:
in the formula (4), CbExpressed as the number of categories in a batch, niExpressed as the number of samples of the ith category in a batch,represented as the projection of the jth sample in the ith class on its corresponding class i, ciDenoted as the intra-class center.
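The truncation and the intra-class term can be sketched together in Python. The squared distance to the class center c_i used in `intra_class_loss` is one plausible reading of the intra-class loss described in the text (the original formula image is not reproduced here), and all argument names are introduced for illustration:

```python
def truc(x):
    # Truc(x): pass positive values through, clamp the rest to zero,
    # so Loss_2 stops contributing once a class is "compact enough"
    return x if x > 0 else 0.0

def intra_class_loss(proj_by_class, centers):
    """Intra-class loss Loss_2: pull each sample's target-class cosine
    toward the running intra-class center c_i.

    proj_by_class: class id -> list of cos(theta_ij) values in the batch;
    centers:       class id -> intra-class center c_i.
    """
    cb = len(proj_by_class)                       # C_b: classes in the batch
    total = 0.0
    for i, projs in proj_by_class.items():
        ni = len(projs)                           # n_i: samples of class i
        total += sum((centers[i] - p) ** 2 for p in projs) / ni
    return total / cb

def total_loss(loss1, loss2, loss_ce, alpha, beta, gamma, eps):
    # formula (3); with beta = 0 it reduces to the first-stage loss of formula (1)
    return alpha * loss1 + beta * truc(loss2 - eps) + gamma * loss_ce
```

For example, `total_loss(1.0, 2.0, 3.0, 0.3, 0.0, 0.6, 0.1)` gives the first-stage value 2.1 because β = 0 disables the intra-class term.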
The invention has the beneficial effects that:
the invention adopts the AM-Softmax loss function in the first half stage of model training, pulls open the inter-class distance, adds the intra-class center which can be dynamically adjusted along with the training batch in the second half stage of the training, further compacts the characteristic vector of the object in the same class, and simultaneously pulls open the cosine distance between the characteristic vectors of the objects in different classes, thereby leading the model to be more quickly converged, fully distinguishing similar classes and further improving the performance of the model.
Drawings
Fig. 1 is a schematic flow chart of a method for designing an image classification loss function based on cosine space optimization according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
This embodiment provides an image classification loss function based on cosine space optimization. The loss function consists of three parts, with the specific expression shown below:
Loss = α·Loss_1 + β·Truc(Loss_2 − ε) + γ·Loss_cross-entropy
In this formula, α is the weighting coefficient of the inter-class loss function, β is the weighting coefficient of the intra-class loss function, γ is the weighting coefficient of the cross-entropy loss function, and the proportions satisfy γ > α > β; Loss_1 is the inter-class loss function, Loss_cross-entropy is the cross-entropy loss function, and Loss_2 is the intra-class loss function.
The hyper-parameter β is a piecewise function: when the training batch of the algorithm model is smaller than the hyper-parameter N, β is 0; when the training batch exceeds N, β is greater than 0.
The value of the hyper-parameter N is determined according to the convergence behaviour of the model. β is enabled only after the model has essentially converged, i.e. the intra-class loss function is then used to further optimize the network, so that it does not hinder convergence while the intra-class centers are still unstable early in training.
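The piecewise behaviour of β can be written as a tiny helper. The positive value 0.1 is an illustrative choice, not one specified by the patent, and the boundary convention at exactly epoch N (≥ vs >) is likewise a sketch-level assumption:

```python
def beta_schedule(epoch, n_switch, beta_value=0.1):
    """Piecewise beta weight: 0 before the switch epoch N (while the
    intra-class centers are still unstable), a fixed positive weight
    from epoch N onward."""
    return 0.0 if epoch < n_switch else beta_value
```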
Specifically, in this embodiment, the expression of the Truc(x) function is:
Truc(x) = { x, x > 0; 0, x ≤ 0 }
This function mainly limits the influence of the intra-class loss function on the overall loss function. The hyper-parameter ε controls how tightly the intra-class loss compacts each class and avoids the model overfitting that excessive compaction would cause, which effectively improves the generalization ability of the model, reduces overfitting to the training-set images, and improves the robustness of the algorithm model.
Specifically, in this embodiment, Loss_cross-entropy above is the common cross-entropy loss function and is the main term for computing the loss.
Specifically, in this embodiment, Loss_1 above is the inter-class loss function, namely the AM-Softmax function, whose expression is:
Loss_1 = -(1/n) · Σ_{i=1}^{n} log( e^{s·(cos θ_{y_i} − m)} / ( e^{s·(cos θ_{y_i} − m)} + Σ_{j=1, j≠y_i}^{C} e^{s·cos θ_j} ) )
Here n is the number of samples in a batch, the hyper-parameter s is the scaling factor, the hyper-parameter m is the distance between the decision boundaries of different classes in cosine space, cos θ_{y_i} is the projection of sample i on its corresponding class y_i, and C is the total number of classes. Used as auxiliary supervision, this loss function guides the decision boundaries of samples of different classes to separate as much as possible in cosine space, effectively improving classification accuracy.
Specifically, in this embodiment, the intra-class loss function Loss_2 above mainly serves to actively constrain the distribution, in cosine space, of the features of objects of the same class. Compared with the original AM-Softmax loss alone, it more effectively helps the features of same-class objects distribute within the decision boundary of their class, so that the network model converges further.
More specifically, in this embodiment, the expression of the intra-class loss function Loss_2 above is:
Loss_2 = (1/C_b) · Σ_{i=1}^{C_b} (1/n_i) · Σ_{j=1}^{n_i} (c_i − cos θ_{i,j})²
Here C_b is the number of classes within a batch, n_i is the number of samples of the i-th class in a batch, cos θ_{i,j} is the projection of the j-th sample of the i-th class on its corresponding class i, and c_i is the intra-class center of the i-th class.
It should be noted that the intra-class loss function adopted in this embodiment requires the network, during training, to continuously update the intra-class centers c_i in cosine space with each training batch, i.e. to iterate cumulatively over all training samples of the preceding epochs. Only when the epoch exceeds N (i.e. β > 0) is c_i used as the intra-class center for computing Loss_2.
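The cumulative update of an intra-class center c_i can be realised as a running mean over all target-class projections seen so far; a running mean is one plausible implementation of the cumulative iteration described above, and the function and argument names are introduced here for illustration:

```python
def update_center(center, count, new_projs):
    """Cumulatively update one intra-class center c_i.

    center:    current running-mean center for the class
    count:     number of samples already absorbed into the mean
    new_projs: target-class cosine projections from the current batch
    Returns the updated (center, count) pair.
    """
    total = center * count + sum(new_projs)
    count += len(new_projs)
    return total / count, count
```

For example, starting from a center of 0.5 built from one sample and absorbing a projection of 0.9 moves the center to 0.7, and further batches keep refining it without storing past samples.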
Example 2
The embodiment provides a method for designing an image classification loss function based on cosine space optimization, which comprises the following steps:
step S1, acquiring a data set, setting hyper-parameters and initializing a deep learning model;
specifically, in this embodiment, the above-mentioned hyper-parameters include: the weighting coefficient alpha, the weighting coefficient beta, the weighting coefficient gamma, the compact coefficient epsilon and the batch number N of the iterative training satisfy the following conditions: gamma > alpha > beta.
Step S2, inputting the acquired data set into the initialized deep learning model, performing multi-batch iterative training on the deep learning model, and sequentially executing steps S21-S23 in each iteration batch;
Step S21, calculating the intra-class center c_i of each class of object in the current iteration batch according to the feature vectors obtained by the deep learning model during forward propagation, and cumulatively updating the intra-class center c_i;
Step S22, calculating a cross entropy loss function value and an inter-class loss function value of the current iteration batch;
Specifically, the above inter-class loss function, i.e. the AM-Softmax function, has the expression:
Loss_1 = -(1/n) · Σ_{i=1}^{n} log( e^{s·(cos θ_{y_i} − m)} / ( e^{s·(cos θ_{y_i} − m)} + Σ_{j=1, j≠y_i}^{C} e^{s·cos θ_j} ) )
where n is the number of samples in a batch, the hyper-parameter s is the scaling factor, the hyper-parameter m is the distance between the decision boundaries of different classes in cosine space, cos θ_{y_i} is the projection of sample i on its corresponding class y_i, C is the total number of classes, and cos θ_j is the projection of sample i on another class j. Used as auxiliary supervision, this loss function guides the decision boundaries of samples of different classes to separate as much as possible in cosine space, effectively improving classification accuracy.
Step S23, judging whether the current iteration batch has reached the preset number N, where N ∈ (0, Epoch_end) and Epoch_end denotes the last training batch; this N is the number of iterative training batches determined in step S1;
if not, calculating a first total loss function of the current iteration batch, performing gradient back propagation on the first total loss function, updating model parameters, and returning to the step S2 for a new round of training;
if yes, calculating an intra-class loss function value of the current iteration batch, calculating a second total loss function value by combining the first total loss function and the intra-class loss function value, performing gradient back propagation on the second total loss function value, updating model parameters, and entering step S3;
Specifically, in this embodiment, the expression of the first total loss function is:
Loss = α·Loss_1 + γ·Loss_cross-entropy
where α and γ are weighting coefficients, Loss_1 is the inter-class loss function described above, and Loss_cross-entropy is the cross-entropy loss function described above.
The expression of the intra-class loss function is:
Loss_2 = (1/C_b) · Σ_{i=1}^{C_b} (1/n_i) · Σ_{j=1}^{n_i} (c_i − cos θ_{i,j})²
where C_b is the number of classes in a batch, n_i is the number of samples of the i-th class in a batch, cos θ_{i,j} is the projection of the j-th sample of the i-th class on its corresponding class i, and c_i is the intra-class center.
The expression of the second total loss function is:
Loss = α·Loss_1 + β·Truc(Loss_2 − ε) + γ·Loss_cross-entropy
where α, β and γ are weighting coefficients, Loss_1 is the inter-class loss function, Loss_cross-entropy is the cross-entropy loss function, Loss_2 is the intra-class loss function, ε is the compaction coefficient that determines how tightly a class is packed in cosine space, and Truc(x) is the piecewise function:
Truc(x) = { x, x > 0; 0, x ≤ 0 }
step S3, judging whether the deep learning model converges,
if not, returning to the step S2 to repeat the iterative training until the model converges;
if the convergence is reached, the model is output.
It should be noted that the intra-class loss function Loss_2 used in this embodiment requires the network, during training, to continuously update the intra-class centers c_i in cosine space with each training batch, i.e. to iterate cumulatively over all training samples of the preceding epochs. Only when the epoch exceeds N (i.e. β > 0) is c_i used as the intra-class center for computing Loss_2.
In summary, the method for designing an image classification loss function based on cosine space optimization comprises two stages, separated by the hyper-parameter N ∈ (0, Epoch_end), where Epoch_end denotes the last training batch. The first stage computes the loss using the cross-entropy loss function Loss_cross-entropy and the inter-class loss function Loss_1; once the training batch exceeds N, the method enters the second stage and adds the intra-class loss function Loss_2.
The first stage adopts the AM-Softmax loss function to widen the inter-class distance; the second stage adds intra-class centers that are dynamically adjusted with each training batch, further compacting the feature vectors of objects of the same class while widening the cosine distances between the feature vectors of objects of different classes, so that the model converges faster, similar classes are fully distinguished, and model performance is further improved.
Matters not described in detail in the invention are well known to those skilled in the art.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (6)
1. A design method of an image classification loss function based on cosine space optimization is characterized by comprising the following steps:
step S1, acquiring a data set, setting hyper-parameters and initializing a deep learning model;
step S2, carrying out multi-batch iterative training on the deep learning model, and sequentially executing steps S21-S23 in each iterative batch;
step S21, calculating the intra-class center of each class of object in the current iteration batch according to the feature vector obtained by the deep learning model in the forward propagation process, and cumulatively updating the intra-class center;
step S22, calculating a cross entropy loss function value and an inter-class loss function value of the current iteration batch;
step S23, judging whether the current iteration batch reaches the preset batch number N;
if not, calculating a first total loss function of the current iteration batch, performing gradient back propagation on the first total loss function, updating model parameters, and returning to the step S2 for a new round of training;
if yes, calculating an intra-class loss function value of the current iteration batch, calculating a second total loss function value by combining the first total loss function and the intra-class loss function value, performing gradient back propagation on the second total loss function value, updating model parameters, and entering step S3;
step S3, judging whether the deep learning model converges,
if not, returning to the step S2 to repeat the iterative training until the model converges;
if the convergence is reached, the model is output.
2. The method for designing an image classification loss function based on cosine space optimization according to claim 1, wherein in step S1 the hyper-parameters include: a weighting coefficient α, a weighting coefficient β, a weighting coefficient γ and a compaction coefficient ε, satisfying: γ > α > β;
the hyper-parameters further include the number of batches N in step S23, with N ∈ (0, Epoch_end), where Epoch_end denotes the last training batch.
3. The method for designing an image classification loss function based on cosine space optimization according to claim 2, wherein the expression of the first total loss function is as follows:
Loss = α·Loss_1 + γ·Loss_cross-entropy   (1)
In formula (1), α and γ are weighting coefficients, Loss_1 is the inter-class loss function, and Loss_cross-entropy is the cross-entropy loss function.
4. The method according to claim 3, wherein the inter-class loss function is expressed as:
Loss_1 = -(1/n) · Σ_{i=1}^{n} log( e^{s·(cos θ_{y_i} − m)} / ( e^{s·(cos θ_{y_i} − m)} + Σ_{j=1, j≠y_i}^{C} e^{s·cos θ_j} ) )   (2)
In formula (2), n is the number of samples in a batch, s is the scaling factor, m is the distance between the decision boundaries of different classes in cosine space, cos θ_{y_i} is the projection of sample i on its corresponding class y_i, C is the total number of classes, and cos θ_j is the projection of sample i on another class j.
5. The method according to claim 4, wherein the second total loss function is expressed as:
Loss = α·Loss_1 + β·Truc(Loss_2 − ε) + γ·Loss_cross-entropy   (3)
In formula (3), α, β and γ are weighting coefficients, Loss_1 is the inter-class loss function, Loss_cross-entropy is the cross-entropy loss function, Loss_2 is the intra-class loss function, ε is the compaction coefficient that determines how tightly a class is packed in cosine space, and Truc(x) is a piecewise function with the expression:
Truc(x) = { x, x > 0; 0, x ≤ 0 }
6. The method according to claim 5, wherein the intra-class loss function is expressed as:
Loss_2 = (1/C_b) · Σ_{i=1}^{C_b} (1/n_i) · Σ_{j=1}^{n_i} (c_i − cos θ_{i,j})²   (4)
In formula (4), C_b is the number of classes in a batch, n_i is the number of samples of the i-th class in a batch, cos θ_{i,j} is the projection of the j-th sample of the i-th class on its corresponding class i, and c_i is the intra-class center.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110434753.3A CN113052261B (en) | 2021-04-22 | 2021-04-22 | Design method of image classification loss function based on cosine space optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052261A true CN113052261A (en) | 2021-06-29 |
CN113052261B CN113052261B (en) | 2024-05-31 |
Family
ID=76520142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110434753.3A Active CN113052261B (en) | 2021-04-22 | 2021-04-22 | Design method of image classification loss function based on cosine space optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052261B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903774A (en) * | 2019-04-12 | 2019-06-18 | 南京大学 | A kind of method for recognizing sound-groove based on angle separation loss function |
CN110222841A (en) * | 2019-06-17 | 2019-09-10 | 苏州思必驰信息科技有限公司 | Neural network training method and device based on spacing loss function |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
Non-Patent Citations (2)
Title |
---|
SUN YIFAN et al.: "Circle Loss: A Unified Perspective of Pair Similarity Optimization", arXiv, 15 June 2020 (2020-06-15), pages 1-10 *
ZHANG QIANG et al.: "CS-Softmax: A Softmax Loss Function Based on Cosine Similarity", Journal of Computer Research and Development, vol. 59, no. 4, 16 April 2021 (2021-04-16), pages 936-949 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705647A (en) * | 2021-08-19 | 2021-11-26 | 电子科技大学 | Dynamic interval-based dual semantic feature extraction method |
CN113705647B (en) * | 2021-08-19 | 2023-04-28 | 电子科技大学 | Dual semantic feature extraction method based on dynamic interval |
CN113763501A (en) * | 2021-09-08 | 2021-12-07 | 上海壁仞智能科技有限公司 | Iteration method of image reconstruction model and image reconstruction method |
CN113763501B (en) * | 2021-09-08 | 2024-02-27 | 上海壁仞智能科技有限公司 | Iterative method of image reconstruction model and image reconstruction method |
CN116310648A (en) * | 2023-03-23 | 2023-06-23 | 北京的卢铭视科技有限公司 | Model training method, face recognition method, electronic device and storage medium |
CN116310648B (en) * | 2023-03-23 | 2023-12-12 | 北京的卢铭视科技有限公司 | Model training method, face recognition method, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113052261B (en) | 2024-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113052261A (en) | Image classification loss function design method based on cosine space optimization | |
CN111461322B (en) | Deep neural network model compression method | |
WO2022160771A1 (en) | Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model | |
CN111079781B (en) | Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition | |
CN106250939B (en) | Handwritten character recognition method based on FPGA + ARM multilayer convolutional neural network | |
CN108985457B (en) | Deep neural network structure design method inspired by optimization algorithm | |
CN110046252B (en) | Medical text grading method based on attention mechanism neural network and knowledge graph | |
CN114841257B (en) | Small sample target detection method based on self-supervision comparison constraint | |
CN110097060B (en) | Open set identification method for trunk image | |
CN112766399B (en) | Self-adaptive neural network training method for image recognition | |
CN111476346A (en) | Deep learning network architecture based on Newton conjugate gradient method | |
CN109190666B (en) | Flower image classification method based on improved deep neural network | |
CN115631393A (en) | Image processing method based on characteristic pyramid and knowledge guided knowledge distillation | |
CN113837376A (en) | Neural network pruning method based on dynamic coding convolution kernel fusion | |
CN112597979B (en) | Face recognition method for updating cosine included angle loss function parameters in real time | |
Yang et al. | Triple-GAN with variable fractional order gradient descent method and mish activation function | |
CN111310807B (en) | Feature subspace and affinity matrix joint learning method based on heterogeneous feature joint self-expression | |
CN110288002B (en) | Image classification method based on sparse orthogonal neural network | |
CN115601578A (en) | Multi-view clustering method and system based on self-walking learning and view weighting | |
An | Xception network for weather image recognition based on transfer learning | |
CN114202694A (en) | Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning | |
CN111178174B (en) | Urine formed component image identification method based on deep convolutional neural network | |
CN113112397A (en) | Image style migration method based on style and content decoupling | |
TW202232431A (en) | Training method for adaptively adjusting mini-batch size of neural network wherein the mini-batch size can be gradually adjusted in real time according to the current situation during training so as to enable the neural network model to obtain better accuracy | |
Zhao et al. | An efficient and flexible automatic search algorithm for convolution network architectures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||