CN113918743A - Model training method for image classification under long-tail distribution scene - Google Patents

Model training method for image classification under long-tail distribution scene

Info

Publication number
CN113918743A
Authority
CN
China
Prior art keywords
loss function
loss
picture
class
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111526779.7A
Other languages
Chinese (zh)
Other versions
CN113918743B (en)
Inventor
高翠芸
高树政
王轩
陈清财
刘川意
廖清
罗文坚
王朝正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202111526779.7A priority Critical patent/CN113918743B/en
Publication of CN113918743A publication Critical patent/CN113918743A/en
Application granted granted Critical
Publication of CN113918743B publication Critical patent/CN113918743B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/54 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention provides a model training method for image classification in a long-tail distribution scene, which comprises the following steps: constructing a first loss function L1, a prototype-normalized cross-entropy classification loss with margins on the angular domain related to the number of samples in each class; constructing a second loss function L2, a minimum-angle-maximization regularization term loss related to the number of samples in each class, which disperses the prototypes of the classes more uniformly; constructing a third loss function L3, a regularization loss on small feature-vector norms that helps the model train effectively; and combining the first loss function L1, the second loss function L2 and the third loss function L3 to obtain the final Loss function Loss. The invention has the beneficial effects that: the method avoids the prior bias of the model caused by imbalanced training data and further improves the generalization of the model on the test set, thereby improving image classification accuracy in the long-tail distribution scene.

Description

Model training method for image classification under long-tail distribution scene
Technical Field
The invention relates to the technical field of image processing, in particular to a model training method for image classification in a long-tail distribution scene.
Background
Deep learning has achieved great success in image classification, but the experimental settings adopted in the prior art are too idealized: every class in the training data contains the same number of samples. In real scenes, however, the number of samples per class tends to follow a long-tail distribution; the classes with large amounts of data are called head classes, and the classes with little data are called tail classes. When the model is tested, the test set contains the same number of samples for every class, because the model is required to classify every class well. In this scenario, the effectiveness of conventional classification methods is greatly compromised. Therefore, solving the classification problem under long-tail distributions is a crucial step toward putting deep learning techniques into practical use.
The common image classification approach based on softmax and cross-entropy loss performs poorly in long-tail scenes. Existing classification methods for long-tail scenes mainly comprise rebalancing methods and two-stage methods: common rebalancing methods easily overfit the training set and therefore generalize poorly, while two-stage methods suffer from inconsistent decision boundaries between training and testing.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a model training method for image classification in a long-tail distribution scene, which solves the problem of poor classification performance on the tail classes of image data in the long-tail distribution scene.
The invention provides a model training method for picture classification in a long-tail distribution scene, characterized in that the method is expressed as a loss function to improve the accuracy of picture classification under end-to-end model training, and the model training method comprises the following steps:
a first loss function construction step: constructing a first loss function L1 for adding prototype normalization and a cross-entropy classification loss with margins on the angular domain related to the number of samples in each class;
a second loss function construction step: constructing a second loss function L2, a minimum-angle-maximization regularization term loss related to the number of samples in each class, so that the prototypes of the classes are dispersed more uniformly;
a third loss function construction step: constructing a third loss function L3, a regularization loss on small feature-vector norms, for helping the effective training of the model;
and a final loss function construction step: combining the first loss function L1, the second loss function L2 and the third loss function L3 to obtain the final Loss function Loss,

Loss = L1 + λ1·L2 + λ2·L3,

wherein λ1 and λ2 are hyper-parameters.
As a further improvement of the invention, the boundary adopted by each class is calculated through θ_y = m / n_y^(1/k), wherein m represents a hyper-parameter that determines the size of the boundary, θ_y represents the boundary angle of the y-th class, k is taken as 4, and n_y represents the number of training samples of the y-th class.
As a further improvement of the present invention, in the first loss function construction step, the classification loss is calculated as:

p(y|x) = exp(s·cos(θ_{w_y,x} + θ_y)) / ( exp(s·cos(θ_{w_y,x} + θ_y)) + Σ_{c≠y} exp(s·cos(θ_{w_c,x})) )

where p(y|x) represents the probability of classifying the feature vector x of the picture into the y-th class, s is a hyper-parameter, x is the extracted feature vector of the picture, c indexes the classes, y is the label class, θ_y represents the boundary angle of the y-th class, θ_{w_y,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the y-th class, and θ_{w_c,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the c-th class;

L1 = -(1/N) Σ_{i=1}^{N} log p(y_i | x_i)

where N represents the total number of training samples, y_i represents the label of the i-th training sample, x_i represents the feature vector of the i-th training picture, and p(y_i | x_i) is the probability, calculated by the formula above, of assigning the feature vector of the i-th training picture to class y_i.
As a further improvement of the present invention, in the second loss function construction step, the minimum-angle-maximization regularization term loss related to the number of samples in each class is calculated as:

[equation image]

where weight represents the weight of each class, w_i represents the normalized prototype vector of the i-th class, w_j represents the normalized prototype vector of the j-th class, C represents the total number of classes, n_i and n_j represent the numbers of training samples of the i-th and j-th classes, and k is taken as 4.
As a further improvement of the present invention, in the third loss function construction step, the regularization loss on the feature-vector norm is calculated as:

[equation image]

wherein g(x) is defined by

[equation image]

N represents the number of training samples, ||·|| represents the length (norm) of a vector, g(x) is a function of the feature vector, and x represents the extracted feature vector of the picture.
The invention also provides a picture classification method, which comprises the following steps:
an input step: inputting the picture into a Loss function Loss of the model training method;
a classification step: classifying the pictures through the Loss function Loss;
an output step: and displaying or storing the classified pictures in a classified manner.
The invention also provides a picture classification system, comprising:
an input module: the method is used for inputting the picture into a Loss function Loss of the model training method;
a classification module: used for classifying the pictures through the Loss function Loss;
an output module: and the system is used for displaying or storing the classified pictures in a classified manner.
The invention also provides a computer-readable storage medium, in which a computer program is stored, the computer program being configured to, when invoked by a processor, perform the steps of the picture classification method according to the invention.
The invention has the beneficial effects that: while keeping the decision boundary the same during training and testing, the method avoids the prior bias of the model caused by imbalanced training data and further improves the generalization of the model on the test set, thereby improving image classification accuracy in the long-tail distribution scene.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention discloses a model training method for image classification in a long-tail distribution scene.
Existing studies indicate that when training on an imbalanced training set, the norms (lengths) of the class prototypes can differ greatly, which makes the model more prone to assign a new sample to a head class at classification time. To solve this problem, the invention normalizes the prototype lengths during both training and testing; to address the problem that the angles between prototypes of different classes become too small during training, it additionally maximizes the minimum angle between prototypes so that the prototypes of all classes are dispersed more uniformly; and finally it adds a regularization loss on the feature-vector norm to help the model train better.
First, the first part of the loss function is described: a softmax cross-entropy classification loss with prototype normalization, which eliminates the prior bias of the model caused by data imbalance, plus a margin (boundary) on the angular domain related to the number of samples in each class. The margin is added to improve the generalization of the model, and its size is related to the per-class sample count: for tail classes with few samples, the probability that test data fall outside the region covered by the training data is higher, so adding a larger margin during training is more beneficial to generalization on the test set. The margin adopted by each class is calculated through θ_y = m / n_y^(1/k). Adding the margin on the angular domain follows prior research from other fields.
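For illustration only, a minimal sketch of this per-class margin computation follows, assuming the boundary takes the form θ_y = m / n_y^(1/4) reconstructed above; the exact expression in the original filing is provided only as an equation image, so both the formula and the helper name class_margins are assumptions.

```python
import torch

def class_margins(class_counts, m=0.5, k=4):
    """Per-class angular margins (sketch): tail classes with few samples get
    larger margins. Assumes theta_y = m / n_y**(1/k), an illustrative reading
    of the patent's image-only formula."""
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    return m / counts.pow(1.0 / k)

# Example: a head class with 5000 samples receives a much smaller margin
# than a tail class with only 10 samples.
print(class_margins([5000, 500, 10]))
```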
p(y|x) = exp(s·cos(θ_{w_y,x} + θ_y)) / ( exp(s·cos(θ_{w_y,x} + θ_y)) + Σ_{c≠y} exp(s·cos(θ_{w_c,x})) )

L1 = -(1/N) Σ_{i=1}^{N} log p(y_i | x_i)
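A minimal PyTorch-style sketch of this first loss term is given below. It assumes the additive-angular-margin form reconstructed above (features and prototypes L2-normalized, the class-dependent margin added to the target-class angle, logits scaled by s); the class name PrototypeMarginLoss and the default values are illustrative, not taken from the filing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeMarginLoss(nn.Module):
    """Sketch of L1: prototype-normalized cross-entropy with per-class angular margins."""

    def __init__(self, feat_dim, num_classes, class_counts, s=30.0, m=0.5, k=4):
        super().__init__()
        # One learnable prototype vector per class; normalized before use.
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        counts = torch.as_tensor(class_counts, dtype=torch.float32)
        self.register_buffer("margins", m / counts.pow(1.0 / k))  # assumed theta_y
        self.s = s

    def forward(self, features, labels):
        x = F.normalize(features, dim=1)               # normalized feature vectors
        w = F.normalize(self.prototypes, dim=1)        # normalized class prototypes
        cosine = (x @ w.t()).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)                     # angles theta_{w_c, x}
        # Add the class-dependent margin theta_y only to the target-class angle.
        one_hot = F.one_hot(labels, theta.size(1)).float()
        logits = self.s * torch.cos(theta + one_hot * self.margins[labels].unsqueeze(1))
        return F.cross_entropy(logits, labels)         # mean of -log p(y_i | x_i)
```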
The second part of the loss function is a regularization term that maximizes the minimum angle between prototypes, weighted in relation to the number of samples in each class, which disperses the prototypes of the individual classes more uniformly. Our experiments show that training with the prototype-normalized loss alone can lead to overly small angles between prototypes of different classes, and existing research shows that a more uniform distribution of the prototypes in the space is more beneficial to the generalization of the model; therefore the minimum-angle-maximization regularization term is also added in the invention.
[equation image]
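The weighted form of this regularizer is given only as an equation image, so the sketch below is an assumption: it spreads the normalized prototypes apart by penalizing the largest pairwise cosine similarity (equivalently, maximizing the smallest pairwise angle), with the per-class sample counts n_i, n_j folded in as a simple (n_i·n_j)^(-1/4) weight. The function name and the weighting form are illustrative.

```python
import torch
import torch.nn.functional as F

def min_angle_regularizer(prototypes, class_counts, k=4):
    """Sketch of L2: penalize the closest (highest cosine similarity) pair of
    normalized class prototypes, weighted by per-class sample counts.
    Both the pairwise form and the weight (n_i * n_j)**(-1/k) are assumptions."""
    w = F.normalize(prototypes, dim=1)                               # (C, d)
    counts = torch.as_tensor(class_counts, dtype=torch.float32, device=w.device)
    cosine = w @ w.t()                                               # pairwise cos(angle)
    weight = (counts.unsqueeze(0) * counts.unsqueeze(1)).pow(-1.0 / k)
    off_diag = ~torch.eye(w.size(0), dtype=torch.bool, device=w.device)
    # Maximizing the minimum angle == minimizing the maximum weighted cosine similarity.
    return (weight * cosine)[off_diag].max()
```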
The third part of the loss function is a regularization penalty on small feature-vector norms. It is easy to observe from the classification loss of the first part that, for a misclassified sample, the loss can be reduced not only by reducing the angle between the sample and the prototype of its true class, but also by shrinking the norm of the sample's feature vector; the latter, however, does not make the sample correctly classified. To avoid this problem, a function g(x) constraining the feature norm of the sample is added:

[equation image]

[equation image]
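Since g(x) itself appears only as an equation image, the sketch below uses one plausible choice, a hinge that activates when the feature norm drops below a threshold r, averaged over the batch; both the hinge form and the threshold r are assumptions made purely for illustration.

```python
import torch

def feature_norm_regularizer(features, r=1.0):
    """Sketch of L3: discourage the model from shrinking feature norms to reduce
    the classification loss. Assumes g(x) = max(0, r - ||x||), an illustrative
    stand-in for the patent's image-only definition."""
    norms = features.norm(dim=1)                 # ||x_i|| for each sample
    return torch.clamp(r - norms, min=0).mean()  # (1/N) * sum_i g(x_i)
```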
Finally, the three parts are combined to form the loss function proposed in the summary of the invention,

Loss = L1 + λ1·L2 + λ2·L3,

wherein λ1 and λ2 are two hyper-parameters.
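Combining the three terms as described, with the weighting Loss = L1 + λ1·L2 + λ2·L3 reconstructed from the stated role of the two hyper-parameters, a hedged sketch building on the illustrative components above (the λ values are examples only) could be:

```python
import torch.nn as nn

class LongTailLoss(nn.Module):
    """Sketch of the final objective Loss = L1 + lambda1 * L2 + lambda2 * L3.
    Relies on the PrototypeMarginLoss, min_angle_regularizer and
    feature_norm_regularizer sketches above; lambda1/lambda2 values are examples."""

    def __init__(self, cls_loss, class_counts, lambda1=0.1, lambda2=0.01):
        super().__init__()
        self.cls_loss = cls_loss            # PrototypeMarginLoss instance (holds prototypes)
        self.class_counts = class_counts
        self.lambda1 = lambda1
        self.lambda2 = lambda2

    def forward(self, features, labels):
        l1 = self.cls_loss(features, labels)
        l2 = min_angle_regularizer(self.cls_loss.prototypes, self.class_counts)
        l3 = feature_norm_regularizer(features)
        return l1 + self.lambda1 * l2 + self.lambda2 * l3
```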
Compared with the prior art, the biggest difference and improvement of the method is that prototype normalization is integrated into the training process and auxiliary regularization terms are proposed to solve the problems that prototype normalization may face. These improvements keep the decision boundary used during model training consistent with the decision boundary used at test time and yield better generalization.
The invention has three hyper-parameters: m, λ1 and λ2. The first hyper-parameter m determines the size of the added margin, while the latter two determine the proportion of the two regularization terms in the overall loss function.
The weight of each class is calculated by the formula:

[equation image]
In summary, the loss function for image classification in the long-tail distribution scene proposed by the invention integrates prototype normalization into the training process and provides auxiliary regularization terms to solve the problems that prototype normalization may face. These improvements keep the decision boundary used during model training consistent with the decision boundary used at test time and yield better generalization. A comparison with other existing results is shown in Table 1:
table 1 comparison of different methods on two long tail datasets
[table image]
As can be seen from Table 1, the model training method of the invention effectively improves the training effect of the model when the training data follow a long-tail distribution.
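To show how these pieces could fit together in end-to-end training, and how classification at test time can use the same normalized prototypes as training (keeping the decision boundary consistent), a final hedged sketch follows; backbone, loader and the components sketched above are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(backbone, criterion, loader, optimizer):
    """One epoch of end-to-end training with the combined loss (sketch).
    `backbone` maps images to feature vectors; `criterion` is the LongTailLoss
    sketch above, which holds the class prototypes."""
    backbone.train()
    for images, labels in loader:
        features = backbone(images)
        loss = criterion(features, labels)   # L1 + lambda1*L2 + lambda2*L3
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

@torch.no_grad()
def predict(backbone, prototypes, images):
    """Classify by largest cosine similarity to the normalized prototypes, so the
    test-time decision rule matches the prototype-normalized training objective."""
    feats = F.normalize(backbone(images), dim=1)
    protos = F.normalize(prototypes, dim=1)
    return (feats @ protos.t()).argmax(dim=1)
```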
As shown in fig. 1, the present invention discloses a method for classifying pictures, which comprises the following steps:
an input step: inputting the picture into a Loss function Loss of a model training method;
a classification step: classifying the pictures through the Loss function Loss;
an output step: and displaying or storing the classified pictures in a classified manner.
The invention also discloses a picture classification system, which comprises:
an input module: the method is used for inputting the picture into a Loss function Loss of the model training method;
a classification module: used for classifying the pictures through the Loss function Loss;
an output module: and the system is used for displaying or storing the classified pictures in a classified manner.
The invention also discloses a computer readable storage medium storing a computer program configured to implement the steps of the picture classification method of the invention when called by a processor.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (8)

1. A model training method for picture classification in a long-tail distribution scene is characterized in that the model training method is expressed as a loss function to improve the accuracy of picture classification under end-to-end model training, and the model training method comprises the following steps:
a first loss function construction step: constructing a first loss function L1 for adding prototype normalization and a cross-entropy classification loss with margins on the angular domain related to the number of samples in each class;
a second loss function construction step: constructing a second loss function L2, a minimum-angle-maximization regularization term loss related to the number of samples in each class, so that the prototype of each class is dispersed more uniformly;
a third loss function construction step: constructing a third loss function L3, a regularization loss on small feature-vector norms, for helping the effective training of the model;
and a final loss function construction step: combining the first loss function L1, the second loss function L2 and the third loss function L3 to obtain the final Loss function Loss,

Loss = L1 + λ1·L2 + λ2·L3,

wherein λ1 and λ2 are hyper-parameters.
2. The model training method of claim 1, wherein the boundary used by each class is calculated through θ_y = m / n_y^(1/k), wherein m represents a hyper-parameter that determines the size of the boundary, θ_y represents the boundary angle of the y-th class, k is 4, and n_y represents the number of training samples of the y-th class.
3. The model training method according to claim 1, wherein in the first loss function construction step, the formula for calculating the classification loss is:

p(y|x) = exp(s·cos(θ_{w_y,x} + θ_y)) / ( exp(s·cos(θ_{w_y,x} + θ_y)) + Σ_{c≠y} exp(s·cos(θ_{w_c,x})) )

where p(y|x) represents the probability of classifying the feature vector x of the picture into the y-th class, s is a hyper-parameter, x is the extracted feature vector of the picture, c indexes the classes, y is the label class, θ_y represents the boundary angle of the y-th class, θ_{w_y,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the y-th class, and θ_{w_c,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the c-th class;

L1 = -(1/N) Σ_{i=1}^{N} log p(y_i | x_i)

where N represents the total number of training samples, y_i represents the label of the i-th training sample, x_i represents the feature vector of the i-th training picture, and p(y_i | x_i) is the probability, calculated by the formula above, of assigning the feature vector of the i-th training picture to class y_i.
4. The model training method according to claim 1, wherein in the second loss function construction step, the formula for calculating the minimum-angle-maximization regularization term loss related to the number of samples in each class is:

[equation image]

where weight represents the weight of each class, w_i represents the normalized prototype vector of the i-th class, w_j represents the normalized prototype vector of the j-th class, C represents the total number of classes, n_i and n_j represent the numbers of training samples of the i-th and j-th classes, and k is taken as 4.
5. The model training method according to claim 1, wherein in the third loss function construction step, the formula for calculating the regularization loss on the feature-vector norm is:

[equation image]

wherein g(x) is defined by

[equation image]

N represents the number of training samples, ||·|| represents the length (norm) of a vector, g(x) is a function of the feature vector, and x represents the extracted feature vector of the picture.
6. A picture classification method is characterized by comprising the following steps:
an input step: inputting the picture into a Loss function Loss of the model training method according to any one of claims 1 to 5;
a classification step: classifying the pictures through the Loss function Loss;
an output step: and displaying or storing the classified pictures in a classified manner.
7. A picture classification system, comprising:
an input module: used for inputting the picture into the Loss function Loss of the model training method according to any one of claims 1 to 5;
a classification module: used for classifying the pictures through the Loss function Loss;
an output module: and the system is used for displaying or storing the classified pictures in a classified manner.
8. A computer-readable storage medium characterized by: the computer-readable storage medium stores a computer program configured to, when invoked by a processor, implement the steps of the picture classification method of claim 6.
CN202111526779.7A 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene Active CN113918743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526779.7A CN113918743B (en) 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111526779.7A CN113918743B (en) 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene

Publications (2)

Publication Number Publication Date
CN113918743A true CN113918743A (en) 2022-01-11
CN113918743B CN113918743B (en) 2022-04-15

Family

ID=79249203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111526779.7A Active CN113918743B (en) 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene

Country Status (1)

Country Link
CN (1) CN113918743B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120049A (en) * 2022-01-27 2022-03-01 南京理工大学 Long tail distribution visual identification method based on prototype classifier learning
CN114821207A (en) * 2022-06-30 2022-07-29 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063707A (en) * 2014-07-14 2014-09-24 金陵科技学院 Color image clustering segmentation method based on multi-scale perception characteristic of human vision
CN111738303A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image identification method based on hierarchical learning
CN112446305A (en) * 2020-11-10 2021-03-05 云南联合视觉科技有限公司 Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN112632320A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving speech classification tail recognition accuracy based on long tail distribution
CN112766143A (en) * 2021-01-15 2021-05-07 湖南大学 Multi-emotion-based face aging processing method and system
CN113657561A (en) * 2021-10-20 2021-11-16 之江实验室 Semi-supervised night image classification method based on multi-task decoupling learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063707A (en) * 2014-07-14 2014-09-24 金陵科技学院 Color image clustering segmentation method based on multi-scale perception characteristic of human vision
CN111738303A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image identification method based on hierarchical learning
CN112446305A (en) * 2020-11-10 2021-03-05 云南联合视觉科技有限公司 Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN112632320A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving speech classification tail recognition accuracy based on long tail distribution
CN112766143A (en) * 2021-01-15 2021-05-07 湖南大学 Multi-emotion-based face aging processing method and system
CN113657561A (en) * 2021-10-20 2021-11-16 之江实验室 Semi-supervised night image classification method based on multi-task decoupling learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAOZHENG WANG et al.: "Label-Aware Distribution Calibration for Long-tailed Classification", arXiv *
陈世鸿 et al.: "Research on the storage model and retrieval algorithms of a trademark database", Journal of Wuhan University (Natural Science Edition) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120049A (en) * 2022-01-27 2022-03-01 南京理工大学 Long tail distribution visual identification method based on prototype classifier learning
CN114120049B (en) * 2022-01-27 2023-08-29 南京理工大学 Long-tail distribution visual identification method based on prototype classifier learning
CN114821207A (en) * 2022-06-30 2022-07-29 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal
CN114821207B (en) * 2022-06-30 2022-11-04 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal

Also Published As

Publication number Publication date
CN113918743B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
Dhar et al. Learning without memorizing
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
Yu et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering
CN113918743B (en) Model training method for image classification under long-tail distribution scene
US20230085401A1 (en) Method of training an image classification model
CN109063719B (en) Image classification method combining structure similarity and class information
US11640527B2 (en) Near-zero-cost differentially private deep learning with teacher ensembles
CN109615014A (en) A kind of data sorting system and method based on the optimization of KL divergence
Paccolat et al. Geometric compression of invariant manifolds in neural networks
US20190065899A1 (en) Distance Metric Learning Using Proxies
CN105894050A (en) Multi-task learning based method for recognizing race and gender through human face image
CN103177265B (en) High-definition image classification method based on kernel function Yu sparse coding
Ackerman et al. Automatically detecting data drift in machine learning classifiers
Gu et al. Class-incremental instance segmentation via multi-teacher networks
Casalino et al. Incremental adaptive semi-supervised fuzzy clustering for data stream classification
CN107480636A (en) Face identification method, system and storage medium based on core Non-negative Matrix Factorization
WO2023088174A1 (en) Target detection method and apparatus
US11645544B2 (en) System and method for continual learning using experience replay
Li et al. Hilbert sinkhorn divergence for optimal transport
Hui et al. Inter-class angular loss for convolutional neural networks
Hong et al. Student-teacher learning from clean inputs to noisy inputs
CN113239866B (en) Face recognition method and system based on space-time feature fusion and sample attention enhancement
Zhang et al. Learning from label proportions by learning with label noise
Zhang et al. Transfer learning from unlabeled data via neural networks
Ouyang et al. Missdiff: Training diffusion models on tabular data with missing values

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant