CN113918743A - Model training method for image classification under long-tail distribution scene - Google Patents

Model training method for image classification under long-tail distribution scene

Info

Publication number
CN113918743A
Authority
CN
China
Prior art keywords
loss function
loss
picture
class
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111526779.7A
Other languages
Chinese (zh)
Other versions
CN113918743B (en)
Inventor
高翠芸
高树政
王轩
陈清财
刘川意
廖清
罗文坚
王朝正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202111526779.7A priority Critical patent/CN113918743B/en
Publication of CN113918743A publication Critical patent/CN113918743A/en
Application granted granted Critical
Publication of CN113918743B publication Critical patent/CN113918743B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/54 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention provides a model training method for image classification in a long-tail distribution scene, which comprises the following steps: constructing a first loss function L1, a prototype-normalized cross-entropy classification loss with margins on the angular domain related to the number of samples in each class; constructing a second loss function L2, a minimum-angle-maximization regularization term loss related to the number of samples in each class, which disperses the prototypes of the classes more uniformly; constructing a third loss function L3, a regularization loss on small feature-vector norms that helps the model train effectively; and combining the first loss function L1, the second loss function L2 and the third loss function L3 to obtain the final Loss function Loss. The invention has the beneficial effects that: the method avoids the prior bias of the model caused by imbalanced training data and further improves the generalization of the model on the test set, thereby improving image classification accuracy in the long-tail distribution scene.

Description

Model training method for image classification under long-tail distribution scene
Technical Field
The invention relates to the technical field of image processing, in particular to a model training method for image classification in a long-tail distribution scene.
Background
Deep learning has achieved great success in image classification, but the experimental settings adopted in the prior art are too idealized: every class in the training data contains the same number of samples. In real scenes, however, the number of samples per class tends to follow a long-tail distribution; the classes with large amounts of data are called head classes, and the classes with little data are called tail classes. When the model is tested, the test set contains the same number of samples for every class, because the model is required to classify every class well. In this scenario, the effectiveness of conventional classification methods is greatly compromised. Therefore, solving the classification problem under long-tail distributions is a crucial step toward putting deep learning techniques into practical use.
The common image classification approach based on softmax and cross-entropy loss performs poorly in long-tail scenes. Existing classification methods for long-tail scenes mainly comprise rebalancing methods and two-stage methods: common rebalancing methods easily overfit the training set and therefore generalize poorly, while two-stage methods suffer from inconsistent decision boundaries between training and testing.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a model training method for image classification in a long-tail distribution scene, which solves the problem of poor classification performance on the tail classes of image data in the long-tail distribution scene.
The invention provides a model training method for picture classification in a long-tail distribution scene, characterized in that the method is expressed as a loss function to improve the accuracy of picture classification under end-to-end model training, and the model training method comprises the following steps:
a first loss function construction step: constructing a first loss function L1 for adding prototype normalization and a cross-entropy classification loss with margins on the angular domain related to the number of samples in each class;
a second loss function construction step: constructing a second loss function L2, a minimum-angle-maximization regularization term loss related to the number of samples in each class, so that the prototypes of the classes are dispersed more uniformly;
a third loss function construction step: constructing a third loss function L3, a regularization loss on small feature-vector norms, for helping the effective training of the model;
and a final loss function construction step: combining the first loss function L1, the second loss function L2 and the third loss function L3 to obtain the final Loss function Loss,

Loss = L1 + λ1·L2 + λ2·L3,

wherein λ1 and λ2 are hyper-parameters.
As a further improvement of the invention, the boundary adopted by each class is calculated through θ_y = m / n_y^(1/k), wherein m represents a hyper-parameter that determines the size of the boundary, θ_y represents the boundary angle of the y-th class, k is taken as 4, and n_y represents the number of training samples of the y-th class.
As a further improvement of the present invention, in the first loss function construction step, the classification loss is calculated as:

p(y|x) = exp(s·cos(θ_{w_y,x} + θ_y)) / ( exp(s·cos(θ_{w_y,x} + θ_y)) + Σ_{c≠y} exp(s·cos(θ_{w_c,x})) )

where p(y|x) represents the probability of classifying the feature vector x of the picture into the y-th class, s is a hyper-parameter, x is the extracted feature vector of the picture, c indexes the classes, y is the label class, θ_y represents the boundary angle of the y-th class, θ_{w_y,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the y-th class, and θ_{w_c,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the c-th class;

L1 = -(1/N) Σ_{i=1}^{N} log p(y_i | x_i)

where N represents the total number of training samples, y_i represents the label of the i-th training sample, x_i represents the feature vector of the i-th training picture, and p(y_i | x_i) is the probability, calculated by the formula above, of assigning the feature vector of the i-th training picture to class y_i.
As a further improvement of the present invention, in the second loss function construction step, the minimum-angle-maximization regularization term loss related to the number of samples in each class is calculated as:

[equation image]

where weight represents the weight of each class, w_i represents the normalized prototype vector of the i-th class, w_j represents the normalized prototype vector of the j-th class, C represents the total number of classes, n_i and n_j represent the numbers of training samples of the i-th and j-th classes, and k is taken as 4.
As a further improvement of the present invention, in the third loss function construction step, the regularization loss on the feature-vector norm is calculated as:

[equation image]

wherein g(x) is defined by

[equation image]

N represents the number of training samples, ||·|| represents the length (norm) of a vector, g(x) is a function of the feature vector, and x represents the extracted feature vector of the picture.
The invention also provides a picture classification method, which comprises the following steps:
an input step: inputting the picture into a Loss function Loss of the model training method;
a classification step: classifying the pictures through the Loss function Loss;
an output step: and displaying or storing the classified pictures in a classified manner.
The invention also provides a picture classification system, comprising:
an input module: the method is used for inputting the picture into a Loss function Loss of the model training method;
a classification module: used for classifying the pictures through the Loss function Loss;
an output module: and the system is used for displaying or storing the classified pictures in a classified manner.
The invention also provides a computer-readable storage medium, in which a computer program is stored, the computer program being configured to, when invoked by a processor, perform the steps of the picture classification method according to the invention.
The invention has the beneficial effects that: while keeping the decision boundary the same during training and testing, the method avoids the prior bias of the model caused by imbalanced training data and further improves the generalization of the model on the test set, thereby improving image classification accuracy in the long-tail distribution scene.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention discloses a model training method for image classification in a long-tail distribution scene.
Existing studies indicate that when training on an imbalanced training set, the norms (lengths) of the class prototypes can differ greatly, which makes the model more prone to assign a new sample to a head class at classification time. To solve this problem, the invention normalizes the prototype lengths during both training and testing; to address the problem that the angles between prototypes of different classes become too small during training, it additionally maximizes the minimum angle between prototypes so that the prototypes of all classes are dispersed more uniformly; and finally it adds a regularization loss on the feature-vector norm to help the model train better.
First, the first part of the loss function is described: a softmax cross-entropy classification loss with prototype normalization, which eliminates the prior bias of the model caused by data imbalance, plus a margin (boundary) on the angular domain related to the number of samples in each class. The margin is added to improve the generalization of the model, and its size is related to the per-class sample count: for tail classes with few samples, the probability that test data fall outside the region covered by the training data is higher, so adding a larger margin during training is more beneficial to generalization on the test set. The margin adopted by each class is calculated through θ_y = m / n_y^(1/k). Adding the margin on the angular domain follows prior research from other fields.
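For illustration only, a minimal sketch of this per-class margin computation follows, assuming the boundary takes the form θ_y = m / n_y^(1/4) reconstructed above; the exact expression in the original filing is provided only as an equation image, so both the formula and the helper name class_margins are assumptions.

```python
import torch

def class_margins(class_counts, m=0.5, k=4):
    """Per-class angular margins (sketch): tail classes with few samples get
    larger margins. Assumes theta_y = m / n_y**(1/k), an illustrative reading
    of the patent's image-only formula."""
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    return m / counts.pow(1.0 / k)

# Example: a head class with 5000 samples receives a much smaller margin
# than a tail class with only 10 samples.
print(class_margins([5000, 500, 10]))
```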
p(y|x) = exp(s·cos(θ_{w_y,x} + θ_y)) / ( exp(s·cos(θ_{w_y,x} + θ_y)) + Σ_{c≠y} exp(s·cos(θ_{w_c,x})) )

L1 = -(1/N) Σ_{i=1}^{N} log p(y_i | x_i)
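A minimal PyTorch-style sketch of this first loss term is given below. It assumes the additive-angular-margin form reconstructed above (features and prototypes L2-normalized, the class-dependent margin added to the target-class angle, logits scaled by s); the class name PrototypeMarginLoss and the default values are illustrative, not taken from the filing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeMarginLoss(nn.Module):
    """Sketch of L1: prototype-normalized cross-entropy with per-class angular margins."""

    def __init__(self, feat_dim, num_classes, class_counts, s=30.0, m=0.5, k=4):
        super().__init__()
        # One learnable prototype vector per class; normalized before use.
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        counts = torch.as_tensor(class_counts, dtype=torch.float32)
        self.register_buffer("margins", m / counts.pow(1.0 / k))  # assumed theta_y
        self.s = s

    def forward(self, features, labels):
        x = F.normalize(features, dim=1)               # normalized feature vectors
        w = F.normalize(self.prototypes, dim=1)        # normalized class prototypes
        cosine = (x @ w.t()).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)                     # angles theta_{w_c, x}
        # Add the class-dependent margin theta_y only to the target-class angle.
        one_hot = F.one_hot(labels, theta.size(1)).float()
        logits = self.s * torch.cos(theta + one_hot * self.margins[labels].unsqueeze(1))
        return F.cross_entropy(logits, labels)         # mean of -log p(y_i | x_i)
```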
The second part of the loss function is a regularization term that maximizes the minimum angle between prototypes, weighted in relation to the number of samples in each class, which disperses the prototypes of the individual classes more uniformly. Our experiments show that training with the prototype-normalized loss alone can lead to overly small angles between prototypes of different classes, and existing research shows that a more uniform distribution of the prototypes in the space is more beneficial to the generalization of the model; therefore the minimum-angle-maximization regularization term is also added in the invention.
[equation image]
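The weighted form of this regularizer is given only as an equation image, so the sketch below is an assumption: it spreads the normalized prototypes apart by penalizing the largest pairwise cosine similarity (equivalently, maximizing the smallest pairwise angle), with the per-class sample counts n_i, n_j folded in as a simple (n_i·n_j)^(-1/4) weight. The function name and the weighting form are illustrative.

```python
import torch
import torch.nn.functional as F

def min_angle_regularizer(prototypes, class_counts, k=4):
    """Sketch of L2: penalize the closest (highest cosine similarity) pair of
    normalized class prototypes, weighted by per-class sample counts.
    Both the pairwise form and the weight (n_i * n_j)**(-1/k) are assumptions."""
    w = F.normalize(prototypes, dim=1)                               # (C, d)
    counts = torch.as_tensor(class_counts, dtype=torch.float32, device=w.device)
    cosine = w @ w.t()                                               # pairwise cos(angle)
    weight = (counts.unsqueeze(0) * counts.unsqueeze(1)).pow(-1.0 / k)
    off_diag = ~torch.eye(w.size(0), dtype=torch.bool, device=w.device)
    # Maximizing the minimum angle == minimizing the maximum weighted cosine similarity.
    return (weight * cosine)[off_diag].max()
```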
The third part of the loss function is a regularization penalty on small feature-vector norms. It is easy to observe from the classification loss of the first part that, for a misclassified sample, the loss can be reduced not only by reducing the angle between the sample and the prototype of its true class, but also by shrinking the norm of the sample's feature vector; the latter, however, does not make the sample correctly classified. To avoid this problem, a function g(x) constraining the feature norm of the sample is added:

[equation image]

[equation image]
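Since g(x) itself appears only as an equation image, the sketch below uses one plausible choice, a hinge that activates when the feature norm drops below a threshold r, averaged over the batch; both the hinge form and the threshold r are assumptions made purely for illustration.

```python
import torch

def feature_norm_regularizer(features, r=1.0):
    """Sketch of L3: discourage the model from shrinking feature norms to reduce
    the classification loss. Assumes g(x) = max(0, r - ||x||), an illustrative
    stand-in for the patent's image-only definition."""
    norms = features.norm(dim=1)                 # ||x_i|| for each sample
    return torch.clamp(r - norms, min=0).mean()  # (1/N) * sum_i g(x_i)
```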
Finally, the three parts are combined to form the loss function proposed in the summary of the invention,

Loss = L1 + λ1·L2 + λ2·L3,

wherein λ1 and λ2 are two hyper-parameters.
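Combining the three terms as described, with the weighting Loss = L1 + λ1·L2 + λ2·L3 reconstructed from the stated role of the two hyper-parameters, a hedged sketch building on the illustrative components above (the λ values are examples only) could be:

```python
import torch.nn as nn

class LongTailLoss(nn.Module):
    """Sketch of the final objective Loss = L1 + lambda1 * L2 + lambda2 * L3.
    Relies on the PrototypeMarginLoss, min_angle_regularizer and
    feature_norm_regularizer sketches above; lambda1/lambda2 values are examples."""

    def __init__(self, cls_loss, class_counts, lambda1=0.1, lambda2=0.01):
        super().__init__()
        self.cls_loss = cls_loss            # PrototypeMarginLoss instance (holds prototypes)
        self.class_counts = class_counts
        self.lambda1 = lambda1
        self.lambda2 = lambda2

    def forward(self, features, labels):
        l1 = self.cls_loss(features, labels)
        l2 = min_angle_regularizer(self.cls_loss.prototypes, self.class_counts)
        l3 = feature_norm_regularizer(features)
        return l1 + self.lambda1 * l2 + self.lambda2 * l3
```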
Compared with the prior art, the biggest difference and improvement of the method is that prototype normalization is integrated into the training process and auxiliary regularization terms are proposed to solve the problems that prototype normalization may face. These improvements keep the decision boundary used during model training consistent with the decision boundary used at test time and yield better generalization.
The invention has three hyper-parameters: m, λ1 and λ2. The first hyper-parameter m determines the size of the added margin, while the latter two determine the proportion of the two regularization terms in the overall loss function.
The weight of each class is calculated by the formula:

[equation image]
In summary, the loss function for image classification in the long-tail distribution scene proposed by the invention integrates prototype normalization into the training process and provides auxiliary regularization terms to solve the problems that prototype normalization may face. These improvements keep the decision boundary used during model training consistent with the decision boundary used at test time and yield better generalization. A comparison with other existing results is shown in Table 1:
table 1 comparison of different methods on two long tail datasets
[table image]
As can be seen from Table 1, the model training method of the invention effectively improves the training effect of the model when the training data follow a long-tail distribution.
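To show how these pieces could fit together in end-to-end training, and how classification at test time can use the same normalized prototypes as training (keeping the decision boundary consistent), a final hedged sketch follows; backbone, loader and the components sketched above are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(backbone, criterion, loader, optimizer):
    """One epoch of end-to-end training with the combined loss (sketch).
    `backbone` maps images to feature vectors; `criterion` is the LongTailLoss
    sketch above, which holds the class prototypes."""
    backbone.train()
    for images, labels in loader:
        features = backbone(images)
        loss = criterion(features, labels)   # L1 + lambda1*L2 + lambda2*L3
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

@torch.no_grad()
def predict(backbone, prototypes, images):
    """Classify by largest cosine similarity to the normalized prototypes, so the
    test-time decision rule matches the prototype-normalized training objective."""
    feats = F.normalize(backbone(images), dim=1)
    protos = F.normalize(prototypes, dim=1)
    return (feats @ protos.t()).argmax(dim=1)
```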
As shown in fig. 1, the present invention discloses a method for classifying pictures, which comprises the following steps:
an input step: inputting the picture into a Loss function Loss of a model training method;
a classification step: classifying the pictures through the Loss function Loss;
an output step: and displaying or storing the classified pictures in a classified manner.
The invention also discloses a picture classification system, which comprises:
an input module: the method is used for inputting the picture into a Loss function Loss of the model training method;
a classification module: used for classifying the pictures through the Loss function Loss;
an output module: and the system is used for displaying or storing the classified pictures in a classified manner.
The invention also discloses a computer readable storage medium storing a computer program configured to implement the steps of the picture classification method of the invention when called by a processor.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (8)

1. A model training method for picture classification in a long-tail distribution scene is characterized in that the model training method is expressed as a loss function to improve the accuracy of picture classification under end-to-end model training, and the model training method comprises the following steps:
a first loss function construction step: constructing a first loss function L1 for adding prototype normalization and a cross-entropy classification loss with margins on the angular domain related to the number of samples in each class;
a second loss function construction step: constructing a second loss function L2, a minimum-angle-maximization regularization term loss related to the number of samples in each class, so that the prototype of each class is dispersed more uniformly;
a third loss function construction step: constructing a third loss function L3, a regularization loss on small feature-vector norms, for helping the effective training of the model;
and a final loss function construction step: combining the first loss function L1, the second loss function L2 and the third loss function L3 to obtain the final Loss function Loss,

Loss = L1 + λ1·L2 + λ2·L3,

wherein λ1 and λ2 are hyper-parameters.
2. The model training method of claim 1, wherein the boundary used by each class is calculated through θ_y = m / n_y^(1/k), wherein m represents a hyper-parameter that determines the size of the boundary, θ_y represents the boundary angle of the y-th class, k is 4, and n_y represents the number of training samples of the y-th class.
3. The model training method according to claim 1, wherein in the first loss function construction step, the formula for calculating the classification loss is:

p(y|x) = exp(s·cos(θ_{w_y,x} + θ_y)) / ( exp(s·cos(θ_{w_y,x} + θ_y)) + Σ_{c≠y} exp(s·cos(θ_{w_c,x})) )

where p(y|x) represents the probability of classifying the feature vector x of the picture into the y-th class, s is a hyper-parameter, x is the extracted feature vector of the picture, c indexes the classes, y is the label class, θ_y represents the boundary angle of the y-th class, θ_{w_y,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the y-th class, and θ_{w_c,x} represents the angle between the extracted feature vector of the picture and the prototype vector of the c-th class;

L1 = -(1/N) Σ_{i=1}^{N} log p(y_i | x_i)

where N represents the total number of training samples, y_i represents the label of the i-th training sample, x_i represents the feature vector of the i-th training picture, and p(y_i | x_i) is the probability, calculated by the formula above, of assigning the feature vector of the i-th training picture to class y_i.
4. The model training method according to claim 1, wherein in the second loss function construction step, the formula for calculating the minimum-angle-maximization regularization term loss related to the number of samples in each class is:

[equation image]

where weight represents the weight of each class, w_i represents the normalized prototype vector of the i-th class, w_j represents the normalized prototype vector of the j-th class, C represents the total number of classes, n_i and n_j represent the numbers of training samples of the i-th and j-th classes, and k is taken as 4.
5. The model training method according to claim 1, wherein in the third loss function construction step, the formula for calculating the regularization loss on the feature-vector norm is:

[equation image]

wherein g(x) is defined by

[equation image]

N represents the number of training samples, ||·|| represents the length (norm) of a vector, g(x) is a function of the feature vector, and x represents the extracted feature vector of the picture.
6. A picture classification method is characterized by comprising the following steps:
an input step: inputting the picture into a Loss function Loss of the model training method according to any one of claims 1 to 5;
a classification step: classifying the pictures through the Loss function Loss;
an output step: and displaying or storing the classified pictures in a classified manner.
7. A picture classification system, comprising:
an input module: used for inputting the picture into the Loss function Loss of the model training method according to any one of claims 1 to 5;
a classification module: used for classifying the pictures through the Loss function Loss;
an output module: and the system is used for displaying or storing the classified pictures in a classified manner.
8. A computer-readable storage medium characterized by: the computer-readable storage medium stores a computer program configured to, when invoked by a processor, implement the steps of the picture classification method of claim 6.
CN202111526779.7A 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene Active CN113918743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526779.7A CN113918743B (en) 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111526779.7A CN113918743B (en) 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene

Publications (2)

Publication Number Publication Date
CN113918743A true CN113918743A (en) 2022-01-11
CN113918743B CN113918743B (en) 2022-04-15

Family

ID=79249203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111526779.7A Active CN113918743B (en) 2021-12-15 2021-12-15 Model training method for image classification under long-tail distribution scene

Country Status (1)

Country Link
CN (1) CN113918743B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120049A (en) * 2022-01-27 2022-03-01 南京理工大学 Long tail distribution visual identification method based on prototype classifier learning
CN114821207A (en) * 2022-06-30 2022-07-29 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063707A (en) * 2014-07-14 2014-09-24 金陵科技学院 Color image clustering segmentation method based on multi-scale perception characteristic of human vision
CN111738303A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image identification method based on hierarchical learning
CN112446305A (en) * 2020-11-10 2021-03-05 云南联合视觉科技有限公司 Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN112632320A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving speech classification tail recognition accuracy based on long tail distribution
CN112766143A (en) * 2021-01-15 2021-05-07 湖南大学 Multi-emotion-based face aging processing method and system
CN113657561A (en) * 2021-10-20 2021-11-16 之江实验室 Semi-supervised night image classification method based on multi-task decoupling learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063707A (en) * 2014-07-14 2014-09-24 金陵科技学院 Color image clustering segmentation method based on multi-scale perception characteristic of human vision
CN111738303A (en) * 2020-05-28 2020-10-02 华南理工大学 Long-tail distribution image identification method based on hierarchical learning
CN112446305A (en) * 2020-11-10 2021-03-05 云南联合视觉科技有限公司 Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN112632320A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving speech classification tail recognition accuracy based on long tail distribution
CN112766143A (en) * 2021-01-15 2021-05-07 湖南大学 Multi-emotion-based face aging processing method and system
CN113657561A (en) * 2021-10-20 2021-11-16 之江实验室 Semi-supervised night image classification method based on multi-task decoupling learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAOZHENG WANG et al.: "Label-Aware Distribution Calibration for Long-tailed Classification", arXiv *
陈世鸿 et al.: "Research on the storage model and retrieval algorithms of a trademark database", Journal of Wuhan University (Natural Science Edition) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120049A (en) * 2022-01-27 2022-03-01 南京理工大学 Long tail distribution visual identification method based on prototype classifier learning
CN114120049B (en) * 2022-01-27 2023-08-29 南京理工大学 Long-tail distribution visual identification method based on prototype classifier learning
CN114821207A (en) * 2022-06-30 2022-07-29 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal
CN114821207B (en) * 2022-06-30 2022-11-04 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal

Also Published As

Publication number Publication date
CN113918743B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
Dhar et al. Learning without memorizing
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
Yu et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering
CN113918743B (en) Model training method for image classification under long-tail distribution scene
US20230085401A1 (en) Method of training an image classification model
CN109063719B (en) Image classification method combining structure similarity and class information
US11640527B2 (en) Near-zero-cost differentially private deep learning with teacher ensembles
CN109615014A (en) A kind of data sorting system and method based on the optimization of KL divergence
Paccolat et al. Geometric compression of invariant manifolds in neural networks
US20190065899A1 (en) Distance Metric Learning Using Proxies
CN105894050A (en) Multi-task learning based method for recognizing race and gender through human face image
CN103177265B (en) High-definition image classification method based on kernel function Yu sparse coding
Ackerman et al. Automatically detecting data drift in machine learning classifiers
Gu et al. Class-incremental instance segmentation via multi-teacher networks
Casalino et al. Incremental adaptive semi-supervised fuzzy clustering for data stream classification
CN107480636A (en) Face identification method, system and storage medium based on core Non-negative Matrix Factorization
WO2023088174A1 (en) Target detection method and apparatus
US11645544B2 (en) System and method for continual learning using experience replay
Li et al. Hilbert sinkhorn divergence for optimal transport
Hui et al. Inter-class angular loss for convolutional neural networks
Hong et al. Student-teacher learning from clean inputs to noisy inputs
CN113239866B (en) Face recognition method and system based on space-time feature fusion and sample attention enhancement
Zhang et al. Learning from label proportions by learning with label noise
Zhang et al. Transfer learning from unlabeled data via neural networks
Ouyang et al. Missdiff: Training diffusion models on tabular data with missing values

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant