CN111738301A - Long-tail distribution image data identification method based on two-channel learning - Google Patents

Long-tail distribution image data identification method based on two-channel learning

Info

Publication number
CN111738301A
Authority
CN
China
Prior art keywords
channel
learning
unbalanced
training
small sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010465433.XA
Other languages
Chinese (zh)
Other versions
CN111738301B (en)
Inventor
陈琼
林恩禄
朱戈仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010465433.XA priority Critical patent/CN111738301B/en
Publication of CN111738301A publication Critical patent/CN111738301A/en
Application granted granted Critical
Publication of CN111738301B publication Critical patent/CN111738301B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • Y02T10/40: Climate change mitigation technologies related to transportation; engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a long-tail distribution image data identification method based on dual-channel learning, comprising the following steps: 1) constructing a dual-channel learning model combining unbalanced learning and small sample learning; 2) back-propagating the total dual-channel learning loss to update all parameters of the dual-channel learning model, and saving the optimal model parameters; 3) inputting the test-set image data into the optimal dual-channel learning model to obtain the predicted label of each image. The invention combines unbalanced learning and small sample learning to solve the problem of long-tail distribution image data identification: the unbalanced learning channel improves identification accuracy on unbalanced datasets, the small sample learning channel improves the feature representations the model learns, and the dual-channel total loss makes the model emphasize the unbalanced learning channel in the early stage of training and the small sample learning channel in the later stage, thereby improving the identification accuracy of long-tail distribution image data as a whole.

Description

Long-tail distribution image data identification method based on two-channel learning
Technical Field
The invention relates to the technical fields of unbalanced classification, small sample learning and long-tail distribution image data identification in machine learning, and in particular to a long-tail distribution image data identification method based on dual-channel learning.
Background
Long-tail distribution image data identification generally adopts imbalance-learning techniques, which divide mainly into the data level and the algorithm level. Data-level techniques mainly include down-sampling majority-class samples, up-sampling minority-class samples, or hybrid sampling combining both. However, resampled data cannot reflect the real data distribution: down-sampling discards majority-class samples and thereby loses much valuable information in the dataset, while up-sampling causes over-fitting and brings large computational cost. Algorithm-level techniques mainly readjust the weight of each category through cost-sensitive methods; these alleviate the long-tail identification problem to some extent, but do not fully account for the fact that a large number of tail categories have only a few samples, so the identification accuracy of tail categories remains low. Other feasible ideas include transferring knowledge learned from data-rich head classes to tail classes, designing loss functions suited to long-tail distribution image data identification, and constructing more reasonable long-tail distribution image data identification models.
Data in real life often follow a long-tail distribution; however, research on long-tail distribution image data identification is still at a preliminary stage, existing identification methods all have limitations, and the identification accuracy of tail categories is not well improved.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing an effective, scientific and reasonable long-tail distribution image data identification method based on dual-channel learning. The method combines unbalanced learning and small sample learning to solve the long-tail distribution image data identification problem: the unbalanced learning channel improves the model's identification accuracy on unbalanced datasets, and the small sample learning channel improves the feature representations the model learns, enhancing its ability to identify tail-category image data. The constructed dual-channel total loss function makes the model emphasize the unbalanced learning channel in the early stage of training and the small sample learning channel in the later stage, improving the model's overall identification accuracy on long-tail image data. The method is suitable for unbalanced multi-classification and long-tail distribution image data identification, and is a general method with strong robustness.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a long-tail distribution image data identification method based on dual-channel learning comprises the following steps:
1) constructing a two-channel learning model consisting of an unbalanced learning channel sampler, an unbalanced learning channel network, a small sample learning channel sampler, a small sample learning channel network and a two-channel learning total loss function; dividing a long-tail distribution image data set into a training set, a verification set and a test set; sampling image data and label data from a training set by using an unbalanced learning channel sampler, inputting the data into an unbalanced learning channel network, and calculating the loss of the unbalanced learning channel; sampling image data and label data from a training set by using a small sample learning channel sampler, inputting the data into a small sample learning channel network, and calculating the small sample learning channel loss; then carrying out weighted summation on the unbalanced learning channel loss and the small sample learning channel loss to obtain a two-channel learning total loss;
2) updating all parameters in the dual-channel learning model by utilizing the total loss of the dual-channel learning in a back propagation mode, namely training the dual-channel learning model, and storing the optimal parameters of the dual-channel learning model to obtain the optimal dual-channel learning model;
3) and inputting the image data of the test set to the optimal two-channel learning model to obtain a prediction label of the image, namely a prediction result.
In step 1), the unbalanced learning channel sampler is as follows:
the input data of the unbalanced learning channel is sampled from a uniform sampler, and each sample in the training set is sampled with equal probability and at most once in each training round T; define B as the number of samples sampled per batch, the input data for the samples is represented as { (x)1 imb,y1 imb),...,(xi imb,yi imb),...,(xB imb,yB imb) Where superscript imb is used to identify the unbalanced learning channel, (x)i imb,yi imb) Representing image data and label data of the ith sample, i is more than or equal to 1 and less than or equal to B;
the unbalanced learning channel network is as follows:
the unbalanced learning channel network is based on an unbalanced classification algorithm and transplants that algorithm's network model; it comprises three parts: a feature extractor f_φ, a classifier, and an imbalance loss function L_imb. The feature extractor f_φ extracts the feature representation z_i^imb = f_φ(x_i^imb) of the input (x_i^imb, y_i^imb); the feature z_i^imb is then input to the classifier to obtain the predicted label ŷ_i^imb; finally, the defined imbalance loss function L_imb is used to compute the unbalanced learning channel loss L^imb of the corresponding batch of samples;
The small sample learning channel sampler is as follows:
the input data of the small sample learning channel are sampled by a meta-sampler: in each training round T, the meta-sampler first randomly samples N classes from all classes of the training set, then randomly samples K_S samples and K_Q samples from each of the N classes, used respectively as the support set S = {(x_1^sup, y_1^sup), ..., (x_i^sup, y_i^sup), ..., (x_{N×K_S}^sup, y_{N×K_S}^sup)} and the query set Q = {(x_1^qry, y_1^qry), ..., (x_i^qry, y_i^qry), ..., (x_{N×K_Q}^qry, y_{N×K_Q}^qry)} of the small sample learning channel; the superscripts sup and qry identify the support set and the query set respectively; (x_i^sup, y_i^sup) denotes the image data and label data of the i-th support-set sample, 1 ≤ i ≤ N×K_S, and (x_i^qry, y_i^qry) denotes the image data and label data of the i-th query-set sample, 1 ≤ i ≤ N×K_Q; each batch of data consists of a support set S and a query set Q;
the small sample learning channel network is as follows:
the small sample learning channel network is based on a small sample learning algorithm and transplants that algorithm's network model; it comprises three parts: a feature extractor f_φ, a distance metric d, and a loss function L_fs. The feature extractor of the small sample learning channel network and that of the unbalanced learning channel network use the same network architecture and share weight parameters. The input support-set samples (x_i^sup, y_i^sup) and query-set samples (x_i^qry, y_i^qry) first pass through the feature extractor f_φ, yielding features z_i^sup = f_φ(x_i^sup) and z_i^qry = f_φ(x_i^qry); then, according to the distance metric d, the distances d(z_i^qry, z_j^sup) between query-set and support-set sample features are computed, and the label of the support-set sample closest to a query-set sample is taken as that query-set sample's predicted label ŷ_i^qry; finally, the defined small sample loss function L_fs is used to compute the small sample learning channel loss L^fs;
The two-channel learning total loss function is as follows:
the total loss of the two-channel learning is the weighted sum of the unbalanced learning channel loss and the small sample learning channel loss, as follows:

L_total = α · L^imb + (1 − α) · L^fs
in the formula, α is a hyper-parameter related to the training round T; α decreases parabolically with the number of training rounds T, taking the value 1 at the beginning of training and gradually decreasing to 0 as T increases, so that the dual-channel learning model focuses on the unbalanced learning channel in the early stage of training and on the small sample learning channel in the later stage.
In step 2), when training the two-channel learning model, the maximum number of training rounds T_max, the optimizer type and the initial learning rate are first set. In each round, the data sampled by the uniform sampler are input to the unbalanced learning channel network and the data sampled by the meta-sampler are input to the small sample learning channel network; the unbalanced learning channel loss and the small sample learning channel loss are computed simultaneously and then weighted and summed to obtain the two-channel learning total loss. The total loss, combined with the optimizer, is back-propagated to update the feature-extractor parameters shared by the two channels and the classifier parameters of the unbalanced learning channel. The hyper-parameter α in the two-channel total loss function decreases parabolically with the number of training rounds, set to 1 at the beginning of training and gradually decreasing to 0 as the number of rounds increases, so that the dual-channel learning model emphasizes the unbalanced learning channel in the early stage of training and the small sample learning channel in the later stage;
the performance of the dual-channel model is evaluated with the accuracy and recall of the Many-shot, Medium-shot, Few-shot and Overall categories on the verification set of the long-tail distribution image dataset, where Many-shot categories have more than 100 samples, Medium-shot categories between 20 and 100, Few-shot categories fewer than 20, and Overall refers to all categories of the verification set. When the number of training rounds reaches the set maximum T_max, training terminates and the optimal dual-channel learning model parameters are saved.
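The Many/Medium/Few-shot split used for evaluation reduces to a threshold rule on the per-class training-sample count; a minimal sketch (function name is an illustrative assumption):

```python
def shot_group(n_samples):
    """Evaluation grouping by per-class sample count:
    >100 -> Many-shot, 20..100 -> Medium-shot, <20 -> Few-shot."""
    if n_samples > 100:
        return "Many-shot"
    if n_samples >= 20:
        return "Medium-shot"
    return "Few-shot"
```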
In step 3), inputting the image data of the test set into an optimal two-channel learning model, wherein the output of the last layer of classifier of the unbalanced learning channel network in the model is the final prediction result of the image data of the test set.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention combines an unbalanced learning channel with a small sample learning channel. Compared with using an unbalanced learning method alone, the added small sample learning channel improves the feature representation, enhances intra-class compactness, and improves the dual-channel learning model's ability to identify tail-category image data with scarce samples.
2. The uniform sampler adopted by the unbalanced learning channel preserves the original distribution of the long-tail distribution image dataset, which benefits feature representation learning.
3. The small sample learning channel uses the meta-sampler to meta-sample all classes of the training set of the long-tail distribution image dataset, sampling small amounts of data from different classes in different rounds as meta-tasks, so that the dual-channel learning model learns to adapt to recognition tasks with few samples while making full use of the dataset.
4. The constructed two-channel total loss function is the weighted sum of the unbalanced learning channel loss and the small sample learning channel loss. The dual-channel learning model emphasizes the unbalanced learning channel in the early stage of training to learn a good decision boundary, and emphasizes the small sample learning channel in the later stage; by pulling together same-class samples and pushing apart different-class samples, it gradually corrects the feature representation damaged by unbalanced learning while ensuring that the decision boundary learned by the unbalanced learning channel is not damaged, improving the model's overall identification accuracy on long-tail distribution image data.
5. The long-tail distribution image data identification method based on two-channel learning uses the output of the last classifier layer of the unbalanced learning channel network as the final prediction result. When training the dual-channel learning model, its performance is evaluated with the accuracy and recall of the Many-shot, Medium-shot and Few-shot categories on the verification set of the long-tail distribution image dataset, which better tracks changes in the model's true performance and makes the trained model more reliable.
Drawings
FIG. 1 is a diagram illustrating an example of input data according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a two-channel learning model structure according to the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The Places365 dataset is a large image dataset covering 365 scene categories; each category contains no more than 5000 training pictures, 50 verification pictures and 900 test pictures. The original Places365 dataset is down-sampled according to a Pareto distribution with power-exponent parameter 6; the training set of the resulting long-tail distribution image dataset contains 62500 pictures in total, with at most 4980 and at least 5 pictures per class. The constructed long-tail distribution image dataset Places-LT is shown in FIG. 1. In the verification set of the long-tail distribution image dataset, 20 pictures are sampled per class to track and evaluate the performance of the two-channel learning model. In the test set, 50 pictures are sampled per class to evaluate and compare the dual-channel learning model against other image data identification models.
For the constructed long-tail distribution image dataset, the data preprocessing is as follows: all pictures are first resized to 256 × 256. During training, pictures are randomly cropped to 224 × 224, horizontally flipped with 50% probability, and randomly jittered in brightness, contrast and saturation for augmentation; during verification and testing, images are center-cropped to 224 × 224 without further augmentation.
As shown in fig. 2, the method for identifying long-tail distribution image data based on two-channel learning according to this embodiment includes the following steps:
1) constructing a two-channel learning model consisting of an unbalanced learning channel sampler, an unbalanced learning channel network, a small sample learning channel sampler, a small sample learning channel network and a two-channel learning total loss function, wherein:
unbalanced learning channel sampler: the input data of the unbalanced learning channel are sampled by a uniform sampler. In each training round T, every sample in the training set of the long-tail distribution image dataset is sampled with equal probability and at most once. Define B as the number of samples per batch; B is set to 128 in this embodiment, and the sampled input data are denoted {(x_1^imb, y_1^imb), ..., (x_i^imb, y_i^imb), ..., (x_B^imb, y_B^imb)}, where the superscript imb identifies the unbalanced learning channel and (x_i^imb, y_i^imb) denotes the image data and label data of the i-th sample (1 ≤ i ≤ B).
Unbalanced learning channel network: based on an unbalanced classification algorithm whose network model can be transplanted. In this embodiment, the unbalanced learning channel network adopts the LDAM unbalanced classification network, in which the feature extractor f_φ adopts a ResNet10 residual network, the classifier adopts a fully-connected network, and the imbalance loss function L_imb adopts the LDAM loss. The feature extractor f_φ first extracts the feature representation z_i^imb = f_φ(x_i^imb) of the input (x_i^imb, y_i^imb); the feature z_i^imb is then input to the classifier to obtain the predicted label ŷ_i^imb; finally, the LDAM loss function is used to compute the unbalanced learning channel loss L^imb of the batch of samples.
For a sample x_i^imb of class y_i^imb, denote by z_j the classifier's output (logit) for class j and by z_{y_i^imb} the output for the true class; denote by n_{y_i^imb} the number of training-set samples of class y_i^imb. The hyper-parameter C is set to 0.5, and the LDAM loss function is given by:

L_imb = −log( exp(z_{y_i^imb} − Δ_{y_i^imb}) / ( exp(z_{y_i^imb} − Δ_{y_i^imb}) + Σ_{j ≠ y_i^imb} exp(z_j) ) )

wherein:

Δ_j = C / n_j^{1/4}
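A minimal pure-Python sketch of the LDAM loss for a single sample (the embodiment uses a PyTorch implementation; the function and argument names here are illustrative assumptions):

```python
import math

def ldam_loss(logits, y, class_counts, C=0.5):
    """LDAM loss for one sample: subtract the class-dependent margin
    delta_j = C / n_j**(1/4) from the true-class logit, then apply
    cross-entropy, computed in a numerically stable way."""
    delta = C / class_counts[y] ** 0.25
    shifted = [z - delta if j == y else z for j, z in enumerate(logits)]
    m = max(shifted)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in shifted))
    return log_sum_exp - shifted[y]  # = -log softmax_y(shifted logits)
```

Because the margin grows as the class count shrinks, rarer classes must be separated from the rest by a wider gap before their loss vanishes.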
small sample learning channel sampler: the input data of the small sample learning channel are sampled by a meta-sampler. In each training round T, the meta-sampler first randomly samples N = 5 classes from all classes of the training set of the long-tail distribution image dataset, then randomly samples K_S = 1 sample and K_Q = 1 sample from each of the 5 classes, used respectively as the support set S = {(x_1^sup, y_1^sup), ..., (x_i^sup, y_i^sup), ...} and the query set Q = {(x_1^qry, y_1^qry), ..., (x_i^qry, y_i^qry), ...} of the small sample learning channel, where the superscripts sup and qry identify the support set and the query set; (x_i^sup, y_i^sup) denotes the image data and label data of the i-th support-set sample (1 ≤ i ≤ N×K_S), and (x_i^qry, y_i^qry) denotes the image data and label data of the i-th query-set sample (1 ≤ i ≤ N×K_Q). Each batch of data consists of a support set S and a query set Q.
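The meta-sampling step can be sketched in plain Python (the mapping from class id to sample indices is an assumed layout, and the names are illustrative):

```python
import random

def sample_episode(labels_by_class, n_way=5, k_support=1, k_query=1):
    """One episodic batch: pick n_way classes, then draw
    k_support + k_query disjoint sample indices per class; the first
    k_support go to the support set, the rest to the query set."""
    classes = random.sample(sorted(labels_by_class), n_way)
    support, query = [], []
    for c in classes:
        idx = random.sample(labels_by_class[c], k_support + k_query)
        support += [(i, c) for i in idx[:k_support]]
        query += [(i, c) for i in idx[k_support:]]
    return support, query
```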
Small sample learning channel network: based on a small sample learning algorithm whose network model can be transplanted. In this embodiment, the small sample learning channel adopts a prototype network model; its feature extractor f_φ uses the same ResNet10 network architecture as the feature extractor of the unbalanced learning channel and shares its weight parameters, and the small sample loss function L_fs adopts the cross-entropy loss. The input support-set samples (x_i^sup, y_i^sup) and query-set samples (x_i^qry, y_i^qry) first pass through the feature extractor f_φ to obtain the feature representations z_i^sup = f_φ(x_i^sup) and z_i^qry = f_φ(x_i^qry). After the features of the input batch are extracted, the feature center c_k of each class k is computed over the support-set sample set S_k of that class:

c_k = (1 / |S_k|) Σ_{(x_i^sup, y_i^sup) ∈ S_k} f_φ(x_i^sup)

Then, from the Euclidean distance d(z_i^qry, c_k) between the query-set sample feature z_i^qry and each class feature center c_k, the probability that query-set sample x_i^qry belongs to class k is computed:

p(y = k | x_i^qry) = exp(−d(z_i^qry, c_k)) / Σ_{k'} exp(−d(z_i^qry, c_{k'}))

Finally, the small sample learning channel loss L^fs is computed according to the small sample loss function L_fs:

L^fs = −(1 / (N·K_Q)) Σ_i log p(y = y_i^qry | x_i^qry)
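The prototype computation and distance-softmax classification can be sketched in plain Python, with feature vectors as lists (names are illustrative assumptions):

```python
import math

def prototypes(support):
    """Class prototypes: the mean feature vector per class.
    `support` is a list of (feature_vector, class_id) pairs."""
    sums, counts = {}, {}
    for z, y in support:
        acc = sums.setdefault(y, [0.0] * len(z))
        sums[y] = [a + b for a, b in zip(acc, z)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def proto_probs(z, protos):
    """Softmax over negative squared Euclidean distance to each prototype."""
    d2 = {y: sum((a - b) ** 2 for a, b in zip(z, c)) for y, c in protos.items()}
    m = min(d2.values())
    exps = {y: math.exp(-(v - m)) for y, v in d2.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}
```

The cross-entropy channel loss is then the mean of −log of the probability assigned to each query sample's true class.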
two-channel learning total loss function: the total loss of the two-channel learning is the weighted sum of the unbalanced learning channel loss and the small sample learning channel loss:

L_total = α · L^imb + (1 − α) · L^fs

where α is a hyper-parameter related to the training round T. Defining the total number of training rounds as T_max, α and T satisfy a parabolic decreasing relation: α equals 1 at the beginning of training and decreases to 0 at T = T_max, e.g. α = 1 − (T / T_max)².
2) and (3) updating all parameters in the dual-channel learning model by utilizing the total loss of the dual-channel learning and back propagation, namely training the dual-channel learning model, and storing the optimal parameters of the dual-channel learning model to obtain the optimal dual-channel learning model.
In the process of training the two-channel learning model, the maximum number of training rounds T_max is set; the SGD optimizer is used with the learning rate initialized to 0.1, decayed by a factor of 0.1 when the number of training rounds T reaches 70 and by a further factor of 0.1 at 90. The hyper-parameter α in the two-channel total loss function decreases parabolically with the number of training rounds T, so that the two-channel learning model emphasizes the unbalanced learning channel in the early stage of training and the small sample learning channel in the later stage.
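The step learning-rate schedule described above (decay by 0.1 at rounds 70 and 90) can be sketched as:

```python
def learning_rate(t, base_lr=0.1):
    """SGD step schedule: multiply the base rate by 0.1 at round 70
    and by a further 0.1 at round 90."""
    lr = base_lr
    if t >= 70:
        lr *= 0.1
    if t >= 90:
        lr *= 0.1
    return lr
```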
When training the dual-channel learning model, its performance is evaluated with the accuracy and recall of the Many-shot, Medium-shot, Few-shot and Overall categories on the verification set of the long-tail distribution image dataset. Many-shot categories have more than 100 samples, Medium-shot categories between 20 and 100, and Few-shot categories fewer than 20; Overall refers to all categories of the verification set. When the number of training rounds T reaches the set maximum T_max, training terminates and the optimal dual-channel learning model parameters are saved.
3) And inputting the image data of the test set of the long-tail distribution image data set into the optimal two-channel learning model stored in the last step, wherein the output of the last layer of classifier of the unbalanced learning channel network in the model is the final prediction result of the image data of the test set.
The following table compares the two-channel learning model with other image data recognition models on the Places-LT dataset. Among the compared models, DC-LTR denotes the two-channel learning model; except for the Plain Model, a naive deep convolutional neural network classifier, the other models are current mainstream models for unbalanced or long-tail distribution image datasets. For fair comparison, all models are trained on the Places-LT training set with a ResNet10 network structure, and the Class-Balanced Accuracy and Macro F-measure of the Many-shot, Medium-shot, Few-shot and Overall categories are then computed on the Places-LT test set, where Class-Balanced Accuracy denotes the average per-class recall and Macro F-measure denotes the average per-class accuracy.
TABLE 1 results of comparative experiments on Places-LT data set
(Table 1 is reproduced as an image in the original publication.)
From the experimental results, the Class-Balanced Accuracy and Macro F-measure of the dual-channel learning model DC-LTR on the Few-shot and Overall categories clearly exceed those of the other compared models, showing that the dual-channel learning model improves the identification accuracy of data-scarce tail categories and of long-tail distribution image data as a whole. DC-LTR shows the same advantage on the Medium-shot category; its result on the Many-shot category drops slightly but remains comparable to the other imbalance-based or long-tail identification models, indicating that the model improves tail-category accuracy without harming the identification accuracy of data-rich head categories. The comparison of different models verifies the effectiveness and superiority of the dual-channel learning model.
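Class-Balanced Accuracy, i.e. the mean per-class recall reported above, can be computed as follows (a sketch; the function name is illustrative):

```python
def class_balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: for each class, the fraction of its samples
    predicted correctly, averaged over classes with equal weight."""
    per_class = {}
    for t, p in zip(y_true, y_pred):
        hits, total = per_class.get(t, (0, 0))
        per_class[t] = (hits + (t == p), total + 1)
    return sum(h / n for h, n in per_class.values()) / len(per_class)
```

Weighting every class equally makes the metric sensitive to tail-category performance, unlike plain accuracy, which head categories dominate under a long-tail distribution.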
The model of the invention is implemented in Python 3.7 on the PyTorch deep learning framework; the experiments run on two NVIDIA GeForce GTX 1080Ti GPUs with 22 GB of video memory in total.
The long tail identification method for other data sets is similar to this method.
In conclusion, the invention combines unbalanced learning and small sample learning to solve the long-tail distribution image data identification problem. The unbalanced learning channel corrects the tendency of general algorithms to over-favor head categories while learning a good classification decision boundary, improving the dual-channel learning model's identification accuracy on unbalanced datasets; the small sample learning channel restores the feature-representation capability damaged by the unbalanced learning channel by pulling together same-class samples and pushing apart different-class samples, enhancing the model's recognition of tail-category image data; and the constructed two-channel total loss function makes the model emphasize the unbalanced learning channel in the early stage of training and the small sample learning channel in the later stage, improving the model's overall identification accuracy on long-tail distribution image data. The invention therefore has practical application value and is worth popularizing.
The above-mentioned embodiment is merely a preferred embodiment of the present invention; the scope of the present invention is not limited thereto, and changes made according to the shape and principle of the present invention shall fall within its protection scope.

Claims (4)

1. A long tail distribution image data identification method based on dual-channel learning is characterized by comprising the following steps:
1) constructing a two-channel learning model consisting of an unbalanced learning channel sampler, an unbalanced learning channel network, a small sample learning channel sampler, a small sample learning channel network and a two-channel learning total loss function; dividing a long-tail distribution image data set into a training set, a verification set and a test set; sampling image data and label data from a training set by using an unbalanced learning channel sampler, inputting the data into an unbalanced learning channel network, and calculating the loss of the unbalanced learning channel; sampling image data and label data from a training set by using a small sample learning channel sampler, inputting the data into a small sample learning channel network, and calculating the small sample learning channel loss; then carrying out weighted summation on the unbalanced learning channel loss and the small sample learning channel loss to obtain a two-channel learning total loss;
2) updating all parameters in the dual-channel learning model by utilizing the total loss of the dual-channel learning in a back propagation mode, namely training the dual-channel learning model, and storing the optimal parameters of the dual-channel learning model to obtain the optimal dual-channel learning model;
3) inputting the image data of the test set into the optimal two-channel learning model to obtain the predicted labels of the images, namely the prediction results.
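For illustration, step 1) of claim 1 can be sketched as a single training step. Everything here is a hypothetical stand-in interface (the two samplers and the two channel loss computations are passed in as callables, and the weight alpha is given); the actual model is built on deep-learning components, but the control flow is the same.

```python
from typing import Any, Callable, Tuple

def dual_channel_step(sample_uniform: Callable[[], Tuple[Any, Any]],
                      sample_episode: Callable[[], Tuple[Any, Any]],
                      imb_channel_loss: Callable[[Any, Any], float],
                      fs_channel_loss: Callable[[Any, Any], float],
                      alpha: float) -> float:
    """One dual-channel training step: sample data for each channel,
    compute both channel losses, and return their weighted sum."""
    # Unbalanced learning channel: a uniformly sampled batch.
    x_imb, y_imb = sample_uniform()
    loss_imb = imb_channel_loss(x_imb, y_imb)
    # Small-sample learning channel: a support/query episode.
    support, query = sample_episode()
    loss_fs = fs_channel_loss(support, query)
    # In the real model, this total loss drives one backward pass.
    return alpha * loss_imb + (1.0 - alpha) * loss_fs
```

With alpha = 0.75, a batch loss of 2.0 and an episode loss of 4.0, the step returns 0.75 × 2.0 + 0.25 × 4.0 = 2.5.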
2. The long-tail distribution image data identification method based on two-channel learning as claimed in claim 1, wherein: in step 1), the unbalanced learning channel sampler is as follows:
the input data of the unbalanced learning channel are sampled by a uniform sampler: in each training round T, every sample in the training set is sampled with equal probability and at most once; defining B as the number of samples per batch, the sampled input data are denoted {(x_1^imb, y_1^imb), ..., (x_i^imb, y_i^imb), ..., (x_B^imb, y_B^imb)}, where the superscript imb identifies the unbalanced learning channel, and (x_i^imb, y_i^imb) denotes the image data and label data of the i-th sample, 1 ≤ i ≤ B;
the unbalanced learning channel network is as follows:
the unbalanced learning channel network is based on an unbalanced classification algorithm, transplanting the network model of that algorithm, and comprises three parts: a feature extractor f_φ, a classifier, and an unbalanced loss function L_imb; the feature extractor f_φ extracts the feature representation z_i^imb = f_φ(x_i^imb) of the input data (x_i^imb, y_i^imb); the feature z_i^imb is then input to the classifier to obtain the predicted label ŷ_i^imb; finally, the defined unbalanced loss function L_imb is used to calculate the unbalanced learning channel loss of the corresponding batch of samples;
the small sample learning channel sampler is as follows:
the input data of the small sample learning channel are sampled by a meta-sampler: in each training round T, the meta-sampler first randomly samples N classes from all classes of the training set, and then randomly samples K_S samples and K_Q samples from each of the N classes, which serve respectively as the support set S = {(x_i^sup, y_i^sup)} (1 ≤ i ≤ N×K_S) and the query set Q = {(x_i^qry, y_i^qry)} (1 ≤ i ≤ N×K_Q) of the small sample learning channel; the superscripts sup and qry identify the support set and the query set respectively; (x_i^sup, y_i^sup) denotes the image data and label data of the i-th support-set sample, and (x_i^qry, y_i^qry) denotes the image data and label data of the i-th query-set sample; each batch of data consists of a support set S and a query set Q;
the small sample learning channel network is as follows:
the small sample learning channel network is based on a small sample learning algorithm, transplanting the network model of that algorithm, and comprises three parts: a feature extractor f_φ, a distance metric d, and a loss function L_fs; the feature extractor of the small sample learning channel network and the feature extractor of the unbalanced learning channel network use the same network architecture and share weight parameters; the input support-set sample data (x_i^sup, y_i^sup) and query-set sample data (x_i^qry, y_i^qry) first pass through the feature extractor f_φ to extract the features z_i^sup = f_φ(x_i^sup) and z_i^qry = f_φ(x_i^qry); then, according to the distance metric d, the distance d(z_i^qry, z_i^sup) between each query-set sample feature and the support-set sample features is computed, and the label of the support-set sample closest to a query-set sample is taken as that query-set sample's predicted label ŷ_i^qry; finally, the defined small sample loss function L_fs is used to calculate the small sample learning channel loss;
the two-channel learning total loss function is as follows:
the two-channel learning total loss is the weighted sum of the unbalanced learning channel loss and the small sample learning channel loss:

L_total = α · L_imb + (1 − α) · L_fs

where α is a hyper-parameter related to the training round T; α and the number of training rounds T are in a parabolically decreasing relationship, with α taking the value 1 at the beginning of training and gradually decreasing to 0 as T increases, so that the dual-channel learning model emphasizes the unbalanced learning channel early in training and the small sample learning channel late in training.
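The meta-sampler and the nearest-support prediction rule of the small sample learning channel can be illustrated with a self-contained Python sketch. All names here are hypothetical, and the feature extractor is omitted (features are passed in directly); the claim only fixes the N-class, K_S-support, K_Q-query episode structure and the nearest-support labeling rule.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_support, k_query):
    """Hypothetical meta-sampler: pick N classes, then K_S support and
    K_Q query sample indices from each class (cf. N, K_S, K_Q above)."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = random.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = random.sample(by_class[c], k_support + k_query)
        support += [(i, c) for i in picked[:k_support]]
        query += [(i, c) for i in picked[k_support:]]
    return support, query

def predict_by_nearest_support(z_query, support_feats, support_labels, dist):
    """Predicted label = label of the support feature nearest to the
    query feature under the distance metric d (sketch)."""
    nearest = min(range(len(support_feats)),
                  key=lambda j: dist(z_query, support_feats[j]))
    return support_labels[nearest]
```

For example, with an absolute-difference metric over 1-D features, `predict_by_nearest_support(0.9, [0.0, 1.0], ['a', 'b'], lambda u, v: abs(u - v))` returns `'b'`.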
3. The long-tail distribution image data identification method based on two-channel learning as claimed in claim 1, wherein: in step 2), when training the dual-channel learning model, the maximum number of training rounds T_max, the optimizer type and the initial learning rate are first set; in each round, the data sampled by the uniform sampler are input to the unbalanced learning channel network and the data sampled by the meta-sampler are input to the small sample learning channel network; the unbalanced learning channel loss and the small sample learning channel loss are computed simultaneously, then weighted and summed to obtain the dual-channel learning total loss; the total loss is combined with the optimizer to update, by back propagation, the parameters of the feature extractor shared by the two channels and the parameters of the unbalanced learning channel classifier; the hyper-parameter α in the total loss function and the number of training rounds are in a parabolically decreasing relationship, with α taking the value 1 at the beginning of training and gradually decreasing to 0 as the number of rounds increases, so that the model emphasizes the unbalanced learning channel early in training and the small sample learning channel late in training;
the performance of the dual-channel model is evaluated using the accuracy and recall of the Many-shot, Medium-shot, Few-shot and Overall categories on the validation set of the long-tail distribution image data set, where Many-shot categories have more than 100 samples, Medium-shot categories have between 20 and 100 samples, Few-shot categories have fewer than 20 samples, and the Overall category refers to all categories of the validation set; when the number of training rounds reaches the set maximum T_max, training is terminated and the optimal dual-channel learning model parameters are saved.
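The Many-/Medium-/Few-shot evaluation split described in claim 3 can be sketched as follows. The thresholds (more than 100, 20 to 100, fewer than 20 training samples) come directly from the claim; the function names and the dictionary-based layout are hypothetical.

```python
from collections import Counter

def shot_category(n_train_samples: int) -> str:
    """Bucket a class by its number of training samples, per the claim."""
    if n_train_samples > 100:
        return "many-shot"
    if n_train_samples >= 20:
        return "medium-shot"
    return "few-shot"

def per_category_accuracy(y_true, y_pred, train_counts):
    """Accuracy over the Many-/Medium-/Few-shot buckets and Overall.

    train_counts maps each class label to its training-set size."""
    hits, totals = Counter(), Counter()
    for yt, yp in zip(y_true, y_pred):
        cat = shot_category(train_counts[yt])
        totals[cat] += 1
        totals["overall"] += 1
        if yt == yp:
            hits[cat] += 1
            hits["overall"] += 1
    return {c: hits[c] / totals[c] for c in totals}
```

A model selection loop would track these validation numbers each round and keep the parameters of the best round up to T_max.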
4. The long-tail distribution image data identification method based on two-channel learning as claimed in claim 1, wherein: in step 3), the image data of the test set are input into the optimal dual-channel learning model, and the output of the last classifier layer of the unbalanced learning channel network in the model is the final prediction result for the test-set image data.
CN202010465433.XA 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning Active CN111738301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010465433.XA CN111738301B (en) 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010465433.XA CN111738301B (en) 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning

Publications (2)

Publication Number Publication Date
CN111738301A true CN111738301A (en) 2020-10-02
CN111738301B CN111738301B (en) 2023-06-20

Family

ID=72647933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465433.XA Active CN111738301B (en) 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning

Country Status (1)

Country Link
CN (1) CN111738301B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB823263A (en) * 1956-09-05 1959-11-11 Atomic Energy Authority Uk Improvements in or relating to nuclear particle discriminators
US20190095700A1 (en) * 2017-09-28 2019-03-28 Nec Laboratories America, Inc. Long-tail large scale face recognition by non-linear feature level domain adaption
CN108830416A (en) * 2018-06-13 2018-11-16 四川大学 Ad click rate prediction framework and algorithm based on user behavior
CN109800810A (en) * 2019-01-22 2019-05-24 重庆大学 A kind of few sample learning classifier construction method based on unbalanced data
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
CN110580500A (en) * 2019-08-20 2019-12-17 天津大学 Character interaction-oriented network weight generation few-sample image classification method
CN110633758A (en) * 2019-09-20 2019-12-31 四川长虹电器股份有限公司 Method for detecting and locating cancer region aiming at small sample or sample unbalance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ENLI LIN 等: "Deep reinforcement learning for imbalanced classification" *
CHEN Qiong et al.: "Transfer Learning Classification Algorithms for Imbalanced Data" *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022099600A1 (en) * 2020-11-13 2022-05-19 Intel Corporation Method and system of image hashing object detection for image processing
CN112560904A (en) * 2020-12-01 2021-03-26 中国科学技术大学 Small sample target identification method based on self-adaptive model unknown element learning
CN112632319A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning
CN112632320A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving speech classification tail recognition accuracy based on long tail distribution
CN112632319B (en) * 2020-12-22 2023-04-11 天津大学 Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning
CN113076873A (en) * 2021-04-01 2021-07-06 重庆邮电大学 Crop disease long-tail image identification method based on multi-stage training
CN113095304A (en) * 2021-06-08 2021-07-09 成都考拉悠然科技有限公司 Method for weakening influence of resampling on pedestrian re-identification
CN113449613A (en) * 2021-06-15 2021-09-28 北京华创智芯科技有限公司 Multitask long-tail distribution image recognition method, multitask long-tail distribution image recognition system, electronic device and medium
CN113449613B (en) * 2021-06-15 2024-02-27 北京华创智芯科技有限公司 Multi-task long tail distribution image recognition method, system, electronic equipment and medium
CN113255832A (en) * 2021-06-23 2021-08-13 成都考拉悠然科技有限公司 Method for identifying long tail distribution of double-branch multi-center
CN113255832B (en) * 2021-06-23 2021-10-01 成都考拉悠然科技有限公司 Method for identifying long tail distribution of double-branch multi-center
CN113569960A (en) * 2021-07-29 2021-10-29 北京邮电大学 Small sample image classification method and system based on domain adaptation
CN113569960B (en) * 2021-07-29 2023-12-26 北京邮电大学 Small sample image classification method and system based on domain adaptation
CN114283307B (en) * 2021-12-24 2023-10-27 中国科学技术大学 Network training method based on resampling strategy
CN114283307A (en) * 2021-12-24 2022-04-05 中国科学技术大学 Network training method based on resampling strategy
CN114511887B (en) * 2022-03-31 2022-07-05 北京字节跳动网络技术有限公司 Tissue image identification method and device, readable medium and electronic equipment
CN114511887A (en) * 2022-03-31 2022-05-17 北京字节跳动网络技术有限公司 Tissue image identification method and device, readable medium and electronic equipment
CN114882273A (en) * 2022-04-24 2022-08-09 电子科技大学 Visual identification method, device, equipment and storage medium applied to narrow space
CN114882273B (en) * 2022-04-24 2023-04-18 电子科技大学 Visual identification method, device, equipment and storage medium applied to narrow space
CN114863193A (en) * 2022-07-07 2022-08-05 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization
CN115953631B (en) * 2023-01-30 2023-09-15 南开大学 Long-tail small sample sonar image classification method and system based on deep migration learning
CN115953631A (en) * 2023-01-30 2023-04-11 南开大学 Long-tail small sample sonar image classification method and system based on deep migration learning
CN116203929B (en) * 2023-03-01 2024-01-05 中国矿业大学 Industrial process fault diagnosis method for long tail distribution data

Also Published As

Publication number Publication date
CN111738301B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN111738301A (en) Long-tail distribution image data identification method based on two-channel learning
CN109657584B (en) Improved LeNet-5 fusion network traffic sign identification method for assisting driving
Xiang et al. Fruit image classification based on Mobilenetv2 with transfer learning technique
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN108764317B (en) Residual convolutional neural network image classification method based on multipath feature weighting
CN108121975B (en) Face recognition method combining original data and generated data
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN111985581A (en) Sample-level attention network-based few-sample learning method
CN110942091A (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
CN109344856B (en) Offline signature identification method based on multilayer discriminant feature learning
CN111738303A (en) Long-tail distribution image identification method based on hierarchical learning
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
CN115205594A (en) Long-tail image data classification method based on mixed samples
CN101414365B (en) Vector code quantizer based on particle group
CN112766378A (en) Cross-domain small sample image classification model method focusing on fine-grained identification
CN115984213A (en) Industrial product appearance defect detection method based on deep clustering
CN116452862A (en) Image classification method based on domain generalization learning
CN111462090A (en) Multi-scale image target detection method
CN114898171A (en) Real-time target detection method suitable for embedded platform
Zhang et al. A new JPEG image steganalysis technique combining rich model features and convolutional neural networks
CN113255832B (en) Method for identifying long tail distribution of double-branch multi-center
CN112528077B (en) Video face retrieval method and system based on video embedding
CN113505120A (en) Double-stage noise cleaning method for large-scale face data set
US20140343945A1 (en) Method of visual voice recognition by following-up the local deformations of a set of points of interest of the speaker's mouth
CN115984946A (en) Face recognition model forgetting method and system based on ensemble learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant