CN113918743A - Model training method for image classification under long-tail distribution scene - Google Patents
- Publication number
- CN113918743A (application CN202111526779.7A)
- Authority
- CN
- China
- Prior art keywords
- loss function
- loss
- picture
- class
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/54—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
The invention provides a model training method for image classification in a long-tail distribution scene, which comprises the following steps: constructing a first loss function L1, a prototype-normalized cross-entropy classification loss with a margin in the angular domain whose size depends on the number of samples in each class; constructing a second loss function L2, a regularization term loss that maximizes the minimum angle between class prototypes in relation to the per-class sample counts, so that the prototypes of the classes are dispersed more uniformly; constructing a third loss function L3, a regularization loss that penalizes small feature-vector norms to help the model train effectively; and combining the first loss function L1, the second loss function L2, and the third loss function L3 into the final loss function Loss. The beneficial effects of the invention are: the method avoids the prior bias that unbalanced training data induces in the model and further improves the generalization of the model on the test set, thereby improving image classification accuracy in the long-tail distribution scene.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a model training method for image classification in a long-tail distribution scene.
Background
Deep learning has achieved great success in image classification, but the experimental settings chosen in the prior art are too idealized: every class in the training data contains the same number of samples. In real scenes, the number of samples per class tends to follow a long-tail distribution; the classes with large sample counts are called head classes, and the classes with small sample counts are called tail classes. When testing a model, however, the test set contains the same number of samples per class, because the model is expected to learn good classification for every class. In this scenario, the effectiveness of conventional classification methods is greatly compromised. How to solve classification under long-tail distributions is therefore a crucial step in putting deep learning techniques into practical use.
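The head/tail imbalance described above can be made concrete. A common benchmark construction (not from the patent; the exponential-decay profile used by CIFAR-LT-style long-tail benchmarks is an assumption here) sets per-class sample counts decaying geometrically from head to tail:

```python
# Construct a long-tailed per-class sample-count profile.
# n_max: size of the largest (head) class; imbalance: n_max / n_min.
def long_tail_counts(num_classes, n_max, imbalance):
    counts = []
    for c in range(num_classes):
        # exponential decay from head (c = 0) to tail (c = num_classes - 1)
        n_c = n_max * (1.0 / imbalance) ** (c / (num_classes - 1))
        counts.append(int(round(n_c)))
    return counts

counts = long_tail_counts(num_classes=10, n_max=5000, imbalance=100)
print(counts[0], counts[-1])  # head class has 100x the samples of the tail class
```

With 10 classes, a head size of 5000, and an imbalance ratio of 100, the tail class receives only 50 samples, which is exactly the regime in which plain softmax cross-entropy degrades.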
The common image classification approach using softmax and cross-entropy loss performs poorly in long-tail scenes. Existing classification methods for long-tail distribution scenes mainly comprise rebalancing methods and two-stage methods, but common rebalancing methods easily overfit the training set and generalize poorly, while two-stage methods face inconsistent decision boundaries between training and testing.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a model training method for image classification in a long-tail distribution scene, addressing the poor classification performance on the tail classes of image data in such a scene.
The invention provides a model training method for image classification in a long-tail distribution scene, expressed as a loss function that improves image classification accuracy under end-to-end model training, comprising the following steps:
a first loss function construction step: constructing a first loss function L1, a prototype-normalized cross-entropy classification loss with a margin in the angular domain related to the per-class sample counts;
a second loss function construction step: constructing a second loss function L2, a regularization term loss that maximizes the minimum angle between class prototypes in relation to the per-class sample counts, so that the prototypes of the classes are dispersed more uniformly;
a third loss function construction step: constructing a third loss function L3, a regularization loss that penalizes small feature-vector norms to help the model train effectively;
a final loss function construction step: combining the first loss function L1, the second loss function L2, and the third loss function L3 into the final loss function Loss.
As a further improvement of the invention, the margin adopted by each class is computed from m, k, and ny, where m is a hyper-parameter that determines the margin size, θy denotes the margin angle of the y-th class, k takes the value 4, and ny denotes the number of training samples of the y-th class.
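The margin formula itself appears only as an unrendered image in this text. Given that m sets the overall scale, k = 4, and tail classes should receive larger margins, a plausible reconstruction — consistent with LDAM-style class-dependent margins and offered as an assumption, not the patent's verbatim formula — is:

```latex
\theta_y \;=\; \frac{m}{n_y^{1/k}}, \qquad k = 4 .
```

Under this form, a class with 16 samples receives twice the margin of a class with 256 samples, matching the stated intent of giving tail classes larger margins.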
As a further improvement of the present invention, in the first loss function construction step, the classification loss is computed as follows:
p(y|x) denotes the probability of classifying the feature vector x of the image into the y-th class; s is a hyper-parameter; x is the feature vector extracted from the image; c indexes the c-th class; y is a class; θy denotes the margin angle of the y-th class; θwy,x denotes the angle between the extracted feature vector of the image and the prototype vector of the y-th class; and θwc,x denotes the angle between the extracted feature vector of the image and the prototype vector of the c-th class;
N denotes the total number of training samples, yi denotes the label of the i-th training sample, xi denotes the feature vector of the i-th training image, and p(yi|xi) is the probability, computed by the formula above, of assigning the feature vector of the i-th training image to class yi.
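The equations referenced here are unrendered images in the source. Using only the variables defined in the surrounding text, and following the standard angular-margin softmax pattern (an assumption, not the patent's verbatim equations), they can be reconstructed as:

```latex
p(y \mid x) \;=\;
\frac{e^{\,s\cos\left(\theta_{w_y,x} + \theta_y\right)}}
     {e^{\,s\cos\left(\theta_{w_y,x} + \theta_y\right)}
      + \sum_{c \ne y} e^{\,s\cos\theta_{w_c,x}}},
\qquad
L_1 \;=\; -\frac{1}{N}\sum_{i=1}^{N} \log p\!\left(y_i \mid x_i\right).
```

The margin θy enlarges the angular gap the true class must clear, and the scale s sharpens the softmax after prototype and feature normalization.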
As a further improvement of the present invention, in the second loss function construction step, the regularization term loss that maximizes the minimum angle in relation to the number of classes is computed as follows:
weight denotes the per-class weight, wi denotes the normalized prototype vector of the i-th class, wj denotes the normalized prototype vector of the j-th class, C denotes the total number of classes, ni and nj denote the numbers of training samples of the i-th and j-th classes, and k takes the value 4.
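The formula for this term is an unrendered image, so the sketch below is a hedged reconstruction: it assumes the regularizer penalizes the largest pairwise cosine similarity (i.e., the smallest angle) between normalized prototypes, with each pair weighted by its class sizes. The specific weight form (ni·nj)^(1/k) is an assumption made only for illustration.

```python
import math

def cosine(u, v):
    # cosine similarity between two vectors (pure-Python, no dependencies)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hedged sketch of the minimum-angle-maximization term L2: return the
# worst (most similar) weighted prototype pair, which training would
# then minimize, pushing prototypes apart. Weight form is an assumption.
def min_angle_regularizer(prototypes, counts, k=4):
    worst = -float("inf")
    C = len(prototypes)
    for i in range(C):
        for j in range(i + 1, C):
            w = (counts[i] * counts[j]) ** (1.0 / k)
            worst = max(worst, w * cosine(prototypes[i], prototypes[j]))
    return worst
```

For orthogonal prototypes the term is zero; for collinear prototypes it grows with the class sizes, so minimizing it disperses the prototypes more uniformly, as the text requires.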
As a further improvement of the present invention, in the third loss function construction step, the regularization loss that penalizes small feature-vector norms is computed as follows:
N denotes the number of training samples, ||·|| denotes the length (norm) of a vector, g(·) is a function, and x denotes the feature vector extracted from the image.
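The function g is not specified in the available text, so the following is only a hedged sketch: it assumes g is a hinge that penalizes feature norms falling below a threshold tau (both the hinge form and tau are assumptions; the patent states only that L3 penalizes small feature-vector module sizes).

```python
# Hedged sketch of L3: discourage trivially small feature norms, which
# the first-part classification loss could otherwise exploit to shrink
# the loss of misclassified samples without fixing the classification.
def small_norm_penalty(features, tau=1.0):
    total = 0.0
    for x in features:
        norm = sum(a * a for a in x) ** 0.5
        total += max(0.0, tau - norm)  # hinge: zero once ||x|| >= tau
    return total / len(features)
```

A feature of norm 5 incurs no penalty; a feature of norm 0.5 is penalized, so the model cannot "hide" hard samples by collapsing their norms.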
The invention also provides an image classification method, comprising the following steps:
an input step: inputting the image into a model trained with the loss function Loss of the above model training method;
a classification step: classifying the image with the trained model;
an output step: displaying or storing the classified images by class.
The invention also provides an image classification system, comprising:
an input module, configured to input the image into a model trained with the loss function Loss of the above model training method;
a classification module, configured to classify the image with the trained model;
an output module, configured to display or store the classified images by class.
The invention also provides a computer-readable storage medium, in which a computer program is stored, the computer program being configured to, when invoked by a processor, perform the steps of the picture classification method according to the invention.
The beneficial effects of the invention are: while keeping the decision boundary identical between training and testing, the method avoids the prior bias that unbalanced training data induces in the model and further improves the generalization of the model on the test set, thereby improving image classification accuracy in the long-tail distribution scene.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention discloses a model training method for image classification in a long-tail distribution scene.
Existing studies indicate that the module lengths (norms) of the class prototypes can differ greatly when training on an unbalanced training set, which makes the model more prone to assign a new sample to a head class. To solve this problem, the invention adopts prototype-length normalization in both training and testing; to address the small angles that arise between prototypes of different classes during training, it additionally maximizes the minimum inter-prototype angle in relation to the class sample counts so that the prototypes are dispersed more uniformly; and finally, to help the model train better, it adds a regularization loss on the feature-vector norm.
First, the first part of the loss function is described: a softmax cross-entropy classification loss with prototype normalization, which eliminates the prior bias of the model caused by data imbalance, plus a margin in the angular domain related to the number of samples per class. The margin improves the generalization of the model, and its size is tied to the class sample count because, for tail classes with few samples, test data is more likely to fall outside the training distribution, so adding a larger margin during training benefits generalization on the test set. The margin adopted by each class is computed from m, k = 4, and the class sample count ny. Placing the margin in the angular domain follows prior research in related fields.
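The per-class margin rule can be sketched as follows, assuming the LDAM-style form margin = m / ny^(1/k) (the exact image formula is lost; only m, k = 4, and ny are stated in the text, so this form is a reconstruction):

```python
def class_margin(m, n_y, k=4):
    # Larger classes (bigger n_y) receive smaller angular margins;
    # tail classes receive larger margins to aid test-set generalization.
    return m / n_y ** (1.0 / k)

# A tail class with 16 samples gets a margin 2x that of a class with 256.
print(class_margin(0.5, 16), class_margin(0.5, 256))
```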
The second part of the loss function is a regularization term that maximizes the minimum angle between prototypes in relation to the class sample counts, which disperses the class prototypes more uniformly. Our experiments show that training with the prototype-normalized loss alone can leave the angles between prototypes of different classes too small, and existing research indicates that more uniformly distributed prototypes benefit the generalization of the model; the invention therefore adds this minimum-angle-maximization regularization term.
The third part of the loss function is a regularization loss that penalizes small feature-vector norms. From the classification loss in the first part it is easy to observe that, for a misclassified sample, the loss can be reduced not only by shrinking the angle to the correct prototype but also by shrinking the sample's feature norm; the latter does not actually classify the sample correctly. To avoid this problem, a term constraining the feature norm of each sample is added.
Finally, the three parts are combined to form the loss function proposed in the summary of the invention, where λ1 and λ2 are two hyper-parameters.
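The combination step can be sketched end to end. The additive weighting Loss = L1 + λ1·L2 + λ2·L3 is an assumption (the source states only that the three parts are combined with two hyper-parameters), and the angular-margin cross-entropy below follows the standard pattern described in the text rather than the patent's verbatim equation:

```python
import math

# Hedged sketch: cross-entropy on scaled cosine logits, with the margin
# added to the true class's angle (the pattern described in the text).
def margin_softmax_loss(cos_angles, label, margin, s=30.0):
    # cos_angles[c] = cos(theta_{w_c, x}) for each class c
    logits = [s * c for c in cos_angles]
    theta_y = math.acos(max(-1.0, min(1.0, cos_angles[label])))
    logits[label] = s * math.cos(theta_y + margin)  # margin on true class
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]  # -log p(y | x)

# Additive combination of the three parts; the form is an assumption.
def total_loss(l1, l2, l3, lambda1, lambda2):
    return l1 + lambda1 * l2 + lambda2 * l3

l1 = margin_softmax_loss([0.9, 0.1, -0.2], label=0, margin=0.25)
print(total_loss(l1, l2=0.3, l3=0.1, lambda1=0.1, lambda2=0.01))
```

Note that the margin strictly increases the loss of the true class, which is exactly what forces a larger angular gap for tail classes during training.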
Compared with the prior art, the biggest difference and improvement of the method is that prototype normalization is integrated into the training process, and auxiliary regularization terms are proposed to address the problems prototype normalization may face. These improvements keep the decision boundary during training consistent with the decision boundary at test time and yield better generalization.
The invention has three hyper-parameters: m, λ1, and λ2. The first, m, determines the size of the added margin, while λ1 and λ2 determine the proportion of the two regularization terms in the overall loss function.
In summary, the loss function proposed by the invention for image classification in the long-tail distribution scene integrates prototype normalization into the training process and introduces auxiliary regularization terms to address the problems prototype normalization may face. These improvements keep the decision boundary during training consistent with that at test time and yield better generalization. A comparison with other existing results is shown in Table 1:
table 1 comparison of different methods on two long tail datasets
As can be seen from Table 1, the model training method of the invention effectively improves the training effect of the model when the training data follows a long-tail distribution.
As shown in FIG. 1, the invention discloses an image classification method, comprising the following steps:
an input step: inputting the image into a model trained with the loss function Loss of the model training method;
a classification step: classifying the image with the trained model;
an output step: displaying or storing the classified images by class.
The invention also discloses an image classification system, comprising:
an input module, configured to input the image into a model trained with the loss function Loss of the model training method;
a classification module, configured to classify the image with the trained model;
an output module, configured to display or store the classified images by class.
The invention also discloses a computer readable storage medium storing a computer program configured to implement the steps of the picture classification method of the invention when called by a processor.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not limited to these descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions may be made without departing from the concept of the invention, and all shall be deemed to fall within the protection scope of the invention.
Claims (8)
1. A model training method for image classification in a long-tail distribution scene, characterized in that the method is expressed as a loss function that improves image classification accuracy under end-to-end model training, comprising the following steps:
a first loss function construction step: constructing a first loss function L1, a prototype-normalized cross-entropy classification loss with a margin in the angular domain related to the per-class sample counts;
a second loss function construction step: constructing a second loss function L2, a regularization term loss that maximizes the minimum angle between class prototypes in relation to the per-class sample counts, so that the prototypes of the classes are dispersed more uniformly;
a third loss function construction step: constructing a third loss function L3, a regularization loss that penalizes small feature-vector norms to help the model train effectively;
a final loss function construction step: combining the first loss function L1, the second loss function L2, and the third loss function L3 into the final loss function Loss.
2. The model training method of claim 1, wherein the margin adopted by each class is computed from m, k, and ny, where m is a hyper-parameter that determines the margin size, θy denotes the margin angle of the y-th class, k takes the value 4, and ny denotes the number of training samples of the y-th class.
3. The model training method according to claim 1, wherein in the first loss function construction step, the classification loss is computed as follows:
p(y|x) denotes the probability of classifying the feature vector x of the image into the y-th class; s is a hyper-parameter; x is the feature vector extracted from the image; c indexes the c-th class; y is a class; θy denotes the margin angle of the y-th class; θwy,x denotes the angle between the extracted feature vector of the image and the prototype vector of the y-th class; and θwc,x denotes the angle between the extracted feature vector of the image and the prototype vector of the c-th class;
N denotes the total number of training samples, yi denotes the label of the i-th training sample, xi denotes the feature vector of the i-th training image, and p(yi|xi) is the probability, computed by the formula above, of assigning the feature vector of the i-th training image to class yi.
4. The model training method according to claim 1, wherein in the second loss function construction step, the regularization term loss that maximizes the minimum angle in relation to the number of classes is computed as follows:
weight denotes the per-class weight, wi denotes the normalized prototype vector of the i-th class, wj denotes the normalized prototype vector of the j-th class, C denotes the total number of classes, ni and nj denote the numbers of training samples of the i-th and j-th classes, and k takes the value 4.
5. The model training method according to claim 1, wherein in the third loss function construction step, the regularization loss that penalizes small feature-vector norms is computed as follows:
N denotes the number of training samples, ||·|| denotes the length (norm) of a vector, g(·) is a function, and x denotes the feature vector extracted from the image.
6. An image classification method, characterized by comprising the following steps:
an input step: inputting the image into a model trained with the loss function Loss of the model training method according to any one of claims 1 to 5;
a classification step: classifying the image with the trained model;
an output step: displaying or storing the classified images by class.
7. An image classification system, comprising:
an input module, configured to input the image into a model trained with the loss function Loss of the model training method according to any one of claims 1 to 5;
a classification module, configured to classify the image with the trained model;
an output module, configured to display or store the classified images by class.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program configured to implement the steps of the image classification method of claim 6 when invoked by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111526779.7A CN113918743B (en) | 2021-12-15 | 2021-12-15 | Model training method for image classification under long-tail distribution scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113918743A true CN113918743A (en) | 2022-01-11 |
CN113918743B CN113918743B (en) | 2022-04-15 |
Family
ID=79249203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111526779.7A Active CN113918743B (en) | 2021-12-15 | 2021-12-15 | Model training method for image classification under long-tail distribution scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113918743B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114120049A (en) * | 2022-01-27 | 2022-03-01 | 南京理工大学 | Long tail distribution visual identification method based on prototype classifier learning |
CN114821207A (en) * | 2022-06-30 | 2022-07-29 | 浙江凤凰云睿科技有限公司 | Image classification method and device, storage medium and terminal |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063707A (en) * | 2014-07-14 | 2014-09-24 | 金陵科技学院 | Color image clustering segmentation method based on multi-scale perception characteristic of human vision |
CN111738303A (en) * | 2020-05-28 | 2020-10-02 | 华南理工大学 | Long-tail distribution image identification method based on hierarchical learning |
CN112446305A (en) * | 2020-11-10 | 2021-03-05 | 云南联合视觉科技有限公司 | Pedestrian re-identification method based on classification weight equidistant distribution loss model |
CN112632320A (en) * | 2020-12-22 | 2021-04-09 | 天津大学 | Method for improving speech classification tail recognition accuracy based on long tail distribution |
CN112766143A (en) * | 2021-01-15 | 2021-05-07 | 湖南大学 | Multi-emotion-based face aging processing method and system |
CN113657561A (en) * | 2021-10-20 | 2021-11-16 | 之江实验室 | Semi-supervised night image classification method based on multi-task decoupling learning |
Non-Patent Citations (2)
Title |
---|
CHAOZHENG WANG et al.: "Label-Aware Distribution Calibration for Long-tailed Classification", arXiv *
CHEN Shihong et al.: "Research on the storage schema and retrieval algorithms of a trademark database", Journal of Wuhan University (Natural Science Edition) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114120049A (en) * | 2022-01-27 | 2022-03-01 | 南京理工大学 | Long tail distribution visual identification method based on prototype classifier learning |
CN114120049B (en) * | 2022-01-27 | 2023-08-29 | 南京理工大学 | Long-tail distribution visual identification method based on prototype classifier learning |
CN114821207A (en) * | 2022-06-30 | 2022-07-29 | 浙江凤凰云睿科技有限公司 | Image classification method and device, storage medium and terminal |
CN114821207B (en) * | 2022-06-30 | 2022-11-04 | 浙江凤凰云睿科技有限公司 | Image classification method and device, storage medium and terminal |
Also Published As
Publication number | Publication date |
---|---|
CN113918743B (en) | 2022-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dhar et al. | Learning without memorizing | |
CN108564129B (en) | Trajectory data classification method based on generation countermeasure network | |
Yu et al. | Multi-modal factorized bilinear pooling with co-attention learning for visual question answering | |
CN113918743B (en) | Model training method for image classification under long-tail distribution scene | |
US20230085401A1 (en) | Method of training an image classification model | |
CN109063719B (en) | Image classification method combining structure similarity and class information | |
US11640527B2 (en) | Near-zero-cost differentially private deep learning with teacher ensembles | |
CN109615014A (en) | A kind of data sorting system and method based on the optimization of KL divergence | |
Paccolat et al. | Geometric compression of invariant manifolds in neural networks | |
US20190065899A1 (en) | Distance Metric Learning Using Proxies | |
CN105894050A (en) | Multi-task learning based method for recognizing race and gender through human face image | |
CN103177265B (en) | High-definition image classification method based on kernel function Yu sparse coding | |
Ackerman et al. | Automatically detecting data drift in machine learning classifiers | |
Gu et al. | Class-incremental instance segmentation via multi-teacher networks | |
Casalino et al. | Incremental adaptive semi-supervised fuzzy clustering for data stream classification | |
CN107480636A (en) | Face identification method, system and storage medium based on core Non-negative Matrix Factorization | |
WO2023088174A1 (en) | Target detection method and apparatus | |
US11645544B2 (en) | System and method for continual learning using experience replay | |
Li et al. | Hilbert sinkhorn divergence for optimal transport | |
Hui et al. | Inter-class angular loss for convolutional neural networks | |
Hong et al. | Student-teacher learning from clean inputs to noisy inputs | |
CN113239866B (en) | Face recognition method and system based on space-time feature fusion and sample attention enhancement | |
Zhang et al. | Learning from label proportions by learning with label noise | |
Zhang et al. | Transfer learning from unlabeled data via neural networks | |
Ouyang et al. | Missdiff: Training diffusion models on tabular data with missing values |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||