CN111738301B - Long-tail distribution image data identification method based on double-channel learning - Google Patents

Long-tail distribution image data identification method based on double-channel learning

Info

Publication number
CN111738301B
CN111738301B (application CN202010465433.XA)
Authority
CN
China
Prior art keywords
learning
channel
unbalanced
small sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010465433.XA
Other languages
Chinese (zh)
Other versions
CN111738301A (en)
Inventor
陈琼
林恩禄
朱戈仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010465433.XA
Publication of CN111738301A
Application granted
Publication of CN111738301B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a long-tail distributed image data recognition method based on two-channel learning, comprising the following steps: 1) construct a two-channel learning model combining unbalanced learning and small sample learning; 2) update all parameters in the two-channel learning model using the two-channel learning total loss, and save the optimal parameters of the two-channel learning model; 3) input the test set image data to the optimal two-channel learning model to obtain the predicted labels of the images. The invention combines unbalanced learning and small sample learning to solve the long-tail distributed image data recognition problem: the unbalanced learning channel improves recognition accuracy on unbalanced datasets, the small sample learning channel improves the feature representation learned by the model, and the model focuses on the unbalanced learning channel in the early training stage and on the small sample learning channel in the later training stage, so that recognition accuracy on long-tail distributed image data is improved overall.

Description

Long-tail distribution image data identification method based on double-channel learning
Technical Field
The invention relates to the technical fields of unbalanced classification, small sample learning and long-tail distribution image data identification in machine learning, in particular to a long-tail distribution image data identification method based on double-channel learning.
Background
Long-tail distributed image data recognition generally adopts techniques from unbalanced learning, which divide mainly into the data level and the algorithm level. Data-level techniques mainly include downsampling the majority classes, upsampling the minority classes, or mixed sampling combining both. However, resampled data cannot reflect the real data distribution: downsampling discards many majority-class samples and thus loses much valuable information in the dataset, while upsampling causes overfitting and heavy computational cost. Algorithm-level techniques mainly readjust the weight of each category through cost-sensitive methods; these alleviate the long-tail recognition problem to some extent, but do not fully account for the fact that a large number of tail categories have very few samples, so tail-category recognition accuracy remains low. In addition, feasible lines of thought are to learn knowledge from the data-rich head classes and transfer it to the tail classes, to design loss functions suited to long-tail distributed image data recognition, and to construct more reasonable long-tail recognition models.
Data in real life often follow a long-tail distribution; however, research on long-tail distributed image data recognition is still at a preliminary stage, and existing methods all have limitations and cannot substantially improve the recognition accuracy of tail categories.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provides an effective, scientific, and reasonable long-tail distributed image data recognition method based on two-channel learning. Unbalanced learning and small sample learning are combined to solve the long-tail recognition problem: the unbalanced learning channel improves the model's recognition accuracy on unbalanced datasets, while the small sample learning channel improves the feature representation learned by the model and enhances its ability to recognize tail-category image data. The constructed two-channel learning total loss function makes the model focus on the unbalanced learning channel in the early training stage and on the small sample learning channel in the later training stage, so that recognition accuracy on long-tail image data improves overall. The proposed method is applicable to both unbalanced multi-class classification and long-tail distributed image data recognition, and is a general method with strong robustness.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a long tail distribution image data identification method based on double-channel learning comprises the following steps:
1) Constructing a two-channel learning model consisting of an unbalanced learning channel sampler, an unbalanced learning channel network, a small sample learning channel sampler, a small sample learning channel network and a two-channel learning total loss function; dividing the long-tail distribution image data set into a training set, a verification set and a test set; sampling image data and label data from a training set by using an unbalanced learning channel sampler, inputting the data into an unbalanced learning channel network, and calculating unbalanced learning channel loss; sampling image data and label data from a training set by using a small sample learning channel sampler, inputting the data into a small sample learning channel network, and calculating small sample learning channel loss; then carrying out weighted summation on the unbalanced learning channel loss and the small sample learning channel loss to obtain a two-channel learning total loss;
2) Using the two-channel learning total loss, update all parameters in the two-channel learning model by back propagation, i.e., train the two-channel learning model, and save the optimal two-channel learning model parameters to obtain the optimal two-channel learning model;
3) Inputting the image data of the test set to an optimal two-channel learning model, and obtaining a prediction label, namely a prediction result, of the image.
In step 1), the case of the unbalanced learning channel sampler is as follows:
the input data of the unbalanced learning channel is sampled from a uniform sampler; in each training round $T$, each sample in the training set is sampled with equal probability and at most once. Define $B$ as the number of samples sampled per batch; the sampled input data is represented as $\{(x_1^{imb}, y_1^{imb}), \ldots, (x_i^{imb}, y_i^{imb}), \ldots, (x_B^{imb}, y_B^{imb})\}$, where the superscript $imb$ identifies the unbalanced learning channel and $(x_i^{imb}, y_i^{imb})$ denotes the image data and label data of the $i$-th sample, $1 \le i \le B$;
the unbalanced learning channel network is as follows:
the unbalanced learning channel network is based on an unbalanced classification algorithmThe network model of the unbalanced classification algorithm is transplanted and comprises a feature extractor f φ Classifier
Figure BDA0002512483940000031
And an unbalance loss function L imb Three parts, the feature extractor f φ For extracting input data (x i imb ,y i imb ) Characteristic representation of +.>
Figure BDA0002512483940000032
The feature representation is then ++>
Figure BDA0002512483940000033
Input to classifier->
Figure BDA0002512483940000034
Get predictive tag->
Figure BDA0002512483940000035
Finally combine the defined unbalance loss function L imb Calculating unbalanced learning channel loss of corresponding batch of samples>
Figure BDA0002512483940000036
The small sample learning channel sampler is as follows:
the input data of the small sample learning channel is sampled from a meta-sampler. In each training round $T$, the meta-sampler first randomly samples $N$ categories among all categories of the training set, and then randomly samples $K_S$ samples and $K_Q$ samples within each of the $N$ categories, which serve respectively as the support set $S = \{(x_i^{sup}, y_i^{sup})\}_{i=1}^{N \times K_S}$ and the query set $Q = \{(x_i^{qry}, y_i^{qry})\}_{i=1}^{N \times K_Q}$ of the small sample learning channel, where the superscripts $sup$ and $qry$ identify the support set and the query set respectively; $(x_i^{sup}, y_i^{sup})$ denotes the image data and label data of the $i$-th sample of the support set, $1 \le i \le N \times K_S$, and $(x_i^{qry}, y_i^{qry})$ denotes the image data and label data of the $i$-th sample of the query set, $1 \le i \le N \times K_Q$; the data of each batch consists of the support set $S$ and the query set $Q$;
the small sample learning channel network is as follows:
the small sample learning channel network is based on a small sample learning algorithm, and a network model of the small sample learning algorithm is transplanted and comprises a feature extractor f φ Distance gauge d and loss function L fs Three parts, wherein the feature extractor adopted by the small sample learning channel network and the feature extractor adopted by the unbalanced learning channel network use the same network architecture and share weight parameters; input support set sample data (x i sup ,y i sup ) And query set sample data (x i qry ,y i qry ) First pass feature extractor f φ Extracting feature z i sup =f φ (x i sup ) And z i qry =f φ (x i qry ) Then, according to the distance scale d, calculating the distance d (x) between the sample characteristics of the query set and the sample characteristics of the support set i qry ,y i sup ) The label of the support set sample closest to the query set sample is the prediction label of the query set sample
Figure BDA0002512483940000041
Finally according to the defined small sample loss function L fs Calculating the small sample learning channel loss->
Figure BDA0002512483940000042
The two-channel learning total loss function is as follows:
the two-channel learning total loss is a weighted sum of the unbalanced learning channel loss and the small sample learning channel loss, as follows:

$$L = \alpha L_{imb} + (1 - \alpha) L_{fs}$$
wherein, alpha is a super parameter related to the training round T, alpha and the training round number T are in parabolic decreasing relation, the value is 1 at the beginning of training, and the value gradually decreases to 0 along with the increase of the training round number T, so that the two-channel learning model focuses on an unbalanced learning channel in the early stage of training and focuses on a small sample learning channel in the later stage of training.
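The weighted combination described above can be sketched in a few lines of Python (a minimal illustration; the function name is ours, not the patent's):

```python
def two_channel_total_loss(loss_imb, loss_fs, alpha):
    """Two-channel total loss: L = alpha * L_imb + (1 - alpha) * L_fs.

    alpha = 1 recovers pure unbalanced-channel training (early rounds);
    alpha = 0 recovers pure small-sample-channel training (late rounds).
    """
    return alpha * loss_imb + (1.0 - alpha) * loss_fs
```

In a real training loop the two arguments would be the framework's loss tensors, so gradients flow into both channel networks through the same weighted sum.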
In step 2), when training the two-channel learning model, a maximum training round number $T_{max}$, an optimizer type, and an initial learning rate are first set. In each round, the data sampled by the uniform sampler is input to the unbalanced learning channel network and the data sampled by the meta-sampler is input to the small sample learning channel network; the unbalanced learning channel loss and the small sample learning channel loss are computed simultaneously and then weighted and summed to obtain the two-channel learning total loss. Combined with the optimizer, back propagation of the total loss updates the feature extractor parameters shared by the two channels and the classifier parameters of the unbalanced learning channel. The hyper-parameter $\alpha$ in the two-channel learning total loss function decreases parabolically with the training round number: it is 1 at the beginning of training and gradually decreases to 0 as the round number grows, so that the two-channel learning model focuses on the unbalanced learning channel early in training and on the small sample learning channel later in training;
the performance of the two-channel model is evaluated using the accuracy and recall of the Many-shot, Medium-shot, Few-shot, and Overall categories in the validation set of the long-tail distributed image dataset, where a Many-shot class has more than 100 samples, a Medium-shot class has between 20 and 100 samples, a Few-shot class has fewer than 20 samples, and the Overall category refers to all classes of the validation set. When the number of training rounds reaches the set maximum $T_{max}$, training is stopped and the optimal two-channel learning model parameters are saved.
In step 3), the image data of the test set is input to the optimal two-channel learning model; the output of the last-layer classifier of the unbalanced learning channel network in the model is the final prediction result for the test set image data.
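The prediction step reduces to taking the argmax of the unbalanced-channel classifier's outputs; a minimal pure-Python sketch (the function name is illustrative):

```python
def predict_labels(logits):
    """Predicted label of each test image: the index of the largest entry
    in the classifier's output row for that image."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]
```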
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Compared with methods that use unbalanced learning alone, the invention combines an unbalanced learning channel with a small sample learning channel; the added small sample learning channel improves the feature representation, enhances intra-class compactness, and strengthens the two-channel learning model's ability to recognize tail-category image data for which data are scarce.
2. The uniform sampler adopted by the unbalanced learning channel can keep the original distribution of the long-tail distribution image dataset, and is beneficial to the representation learning of the characteristics.
3. The meta sampler adopted by the small sample learning channel is used for meta sampling all categories of a training set of a long-tail distributed image dataset, and learning is carried out by taking a small amount of data of different categories as meta tasks in different rounds of sampling, so that the two-channel learning model can learn the self-adaption capability of identifying tasks on a small amount of samples and fully utilize the dataset.
4. The two-channel learning total loss function constructed by the invention is the weighted sum of the unbalanced learning channel loss and the small sample learning channel loss. The two-channel learning model focuses on the unbalanced learning channel in the early training stage so as to learn a good decision boundary, and on the small sample learning channel in the later training stage; by pulling similar samples together and pushing dissimilar samples apart, the feature representation damaged by unbalanced learning is gradually corrected without harming the decision boundary learned by the unbalanced learning channel, so that the recognition accuracy of the two-channel learning model on long-tail distributed image data is improved overall.
5. The long tail distribution image data identification method based on the double-channel learning uses the output of the last layer classifier of the unbalanced learning channel network as a final prediction result. And when the two-channel learning model is trained, the performance of the two-channel learning model is evaluated by using the accuracy and recall rates of the Many-shot class, the Medium-shot class and the Few-shot class in the verification set of the long-tail distributed image dataset, so that the change of the real performance of the model can be tracked better, and the trained model is more reliable.
Drawings
Fig. 1 is a diagram showing an example of input data according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a two-channel learning model structure according to the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples.
The Places365 dataset is a large image dataset covering 365 scene categories; each category contains no more than 5000 training pictures, 50 validation pictures, and 900 test pictures. The original Places365 dataset is downsampled according to a Pareto distribution with power exponent parameter 6; the training set of the resulting long-tail distributed image dataset contains 62500 pictures in total, with at most 4980 and at least 5 pictures per class. The constructed training set Places-LT is shown in figure 1. The validation set of the long-tail distributed image dataset samples 20 pictures per class for tracking and evaluating the performance of the two-channel learning model. The test set samples 50 pictures per class for evaluating and comparing the performance of the two-channel learning model against other image data recognition models.
For the constructed long-tail distributed image dataset, the data preprocessing operations are as follows: all pictures are first resized to 256×256; during training they are randomly cropped to 224×224, flipped horizontally with 50% probability, and randomly jittered in brightness, contrast, and saturation for augmentation; during validation and testing the pictures are center-cropped to 224×224 without further augmentation.
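Assuming the standard torchvision transforms API, the preprocessing above might be composed as follows; the jitter magnitudes (0.4) are not specified in the text and are purely illustrative:

```python
from torchvision import transforms

# Training-time pipeline: resize, random crop, horizontal flip, color jitter.
train_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    # Jitter strengths are assumptions; the patent only says brightness,
    # contrast, and saturation are randomly perturbed.
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])

# Validation/test pipeline: deterministic center crop, no augmentation.
eval_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```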
As shown in fig. 2, the long tail distribution image data identification method based on the dual-channel learning provided by the embodiment includes the following steps:
1) Constructing a two-channel learning model consisting of an unbalanced learning channel sampler, an unbalanced learning channel network, a small sample learning channel sampler, a small sample learning channel network and a two-channel learning total loss function, wherein:
Imbalance learning channel sampler: the input data of the unbalanced learning channel is sampled from a uniform sampler. In each training round $T$, each sample in the training set of the long-tail distributed image dataset is sampled with equal probability and at most once. Defining $B$ as the number of samples sampled per batch ($B$ is set to 128 in this embodiment), the sampled input data is represented as $\{(x_1^{imb}, y_1^{imb}), \ldots, (x_i^{imb}, y_i^{imb}), \ldots, (x_B^{imb}, y_B^{imb})\}$, where the superscript $imb$ identifies the unbalanced learning channel and $(x_i^{imb}, y_i^{imb})$ denotes the image data and label data of the $i$-th sample, $1 \le i \le B$.
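Sampling each training example with equal probability and at most once per round is equivalent to batching a fresh random permutation each round; a sketch with illustrative names:

```python
import random

def uniform_sampler(num_samples, batch_size, seed=None):
    """Yield batches of sample indices for one training round: every sample
    appears with equal probability and at most once, i.e. the batches are
    consecutive slices of a fresh random permutation."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]
```

Calling it again for the next round reshuffles, so long-run sampling stays uniform while the original long-tail class proportions of each batch are preserved.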
Unbalanced learning channel network: based on an unbalanced classification algorithm, whose network model can be transplanted. In this embodiment, the unbalanced learning channel network adopts the LDAM unbalanced classification network, where the feature extractor $f_\phi$ adopts a ResNet10 residual network, the classifier uses a fully-connected layer, and the imbalance loss function $L_{imb}$ uses the LDAM loss. The feature extractor $f_\phi$ first extracts the feature representation $z_i^{imb} = f_\phi(x_i^{imb})$ of the input data $(x_i^{imb}, y_i^{imb})$; the feature representation is then input to the fully-connected classifier to obtain the predicted label $\hat{y}_i^{imb}$; finally, the LDAM loss function is used to calculate the unbalanced learning channel loss of the batch of samples. Denote by $z_{i,k}$ the logit of sample $x_i^{imb}$ for class $k$, write $y = y_i^{imb}$ for its true class, and let $n_y$ be the number of class-$y$ samples in the training set. With the hyper-parameter $C$ set to 0.5, the LDAM loss is:

$$L_{LDAM}(x_i^{imb}, y) = -\log \frac{e^{z_{i,y} - \Delta_y}}{e^{z_{i,y} - \Delta_y} + \sum_{k \neq y} e^{z_{i,k}}}$$

where:

$$\Delta_y = \frac{C}{n_y^{1/4}}$$
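A NumPy sketch of the LDAM loss as described above (the function name and array-based interface are ours; a real implementation would operate on framework tensors and batch statistics):

```python
import numpy as np

def ldam_loss(logits, labels, class_counts, C=0.5):
    """LDAM loss sketch: subtract a per-class margin Delta_y = C / n_y**0.25
    from each sample's true-class logit, then take the mean cross entropy.
    logits: (B, K) array; labels: (B,) ints; class_counts: (K,) train counts.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    margins = C / np.asarray(class_counts, dtype=float) ** 0.25
    shifted = logits.copy()
    shifted[np.arange(len(labels)), labels] -= margins[labels]
    # log-softmax of the margin-shifted logits, evaluated at the true class
    shifted -= shifted.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Rarer classes (smaller `n_y`) get larger margins, so the loss demands a wider gap between their true-class logit and the rest.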
Small sample learning channel sampler: the input data of the small sample learning channel is sampled from a meta-sampler. In each training round $T$, the meta-sampler first randomly samples $N = 5$ categories among all categories of the training set of the long-tail distributed image dataset, and then randomly samples $K_S = 1$ sample and $K_Q = 1$ sample within each of the 5 categories, which serve respectively as the support set $S = \{(x_i^{sup}, y_i^{sup})\}_{i=1}^{N \times K_S}$ and the query set $Q = \{(x_i^{qry}, y_i^{qry})\}_{i=1}^{N \times K_Q}$ of the small sample learning channel, where the superscripts $sup$ and $qry$ identify the support set and the query set respectively; $(x_i^{sup}, y_i^{sup})$ denotes the image data and label data of the $i$-th sample of the support set ($1 \le i \le N \times K_S$), and $(x_i^{qry}, y_i^{qry})$ denotes the image data and label data of the $i$-th sample of the query set ($1 \le i \le N \times K_Q$). The data of each batch consists of the support set $S$ and the query set $Q$.
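The episodic sampling above ($N$-way with $K_S$ support and $K_Q$ query samples per class) can be sketched as follows; `labels_by_class` and the function name are illustrative:

```python
import random

def meta_sample(labels_by_class, n_way=5, k_support=1, k_query=1, seed=None):
    """Sample one episode: pick n_way classes, then k_support + k_query
    sample indices per class, split into support and query sets.
    labels_by_class maps class id -> list of sample indices."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(labels_by_class), n_way)
    support, query = [], []
    for c in classes:
        picks = rng.sample(labels_by_class[c], k_support + k_query)
        support += [(i, c) for i in picks[:k_support]]
        query += [(i, c) for i in picks[k_support:]]
    return support, query
```

Because `rng.sample` draws without replacement within a class, a sample never appears in both the support and query sets of the same episode.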
Small sample learning channel network: based on a small sample learning algorithm, whose network model can be transplanted. In this embodiment, the small sample learning channel adopts a prototypical network model; its feature extractor $f_\phi$ uses the same ResNet10 network architecture as, and shares weight parameters with, the feature extractor of the unbalanced learning channel. The small sample loss function $L_{fs}$ adopts the cross-entropy loss. The input support set sample data $(x_i^{sup}, y_i^{sup})$ and query set sample data $(x_i^{qry}, y_i^{qry})$ first pass through the feature extractor $f_\phi$ to extract the feature representations $z_i^{sup} = f_\phi(x_i^{sup})$ and $z_i^{qry} = f_\phi(x_i^{qry})$. After the features of the input batch are extracted, the feature center $c_k$ of each class-$k$ support sample set $S_k$ is calculated:

$$c_k = \frac{1}{|S_k|} \sum_{(x_i^{sup}, y_i^{sup}) \in S_k} f_\phi(x_i^{sup})$$

Then, according to the Euclidean distance $d(z_i^{qry}, c_k)$ between the query set sample features $z_i^{qry}$ and the class feature centers $c_k$, the probability that query set sample $x_i^{qry}$ belongs to class $k$ is calculated:

$$p(y = k \mid x_i^{qry}) = \frac{\exp(-d(z_i^{qry}, c_k))}{\sum_{k'} \exp(-d(z_i^{qry}, c_{k'}))}$$

Finally, the small sample learning channel loss is calculated according to the small sample loss function $L_{fs}$, the cross entropy over the query set samples:

$$L_{fs} = -\frac{1}{N K_Q} \sum_{i=1}^{N K_Q} \log p(y = y_i^{qry} \mid x_i^{qry})$$
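A NumPy sketch of the prototypical-network computation above, assuming episode labels are relabeled 0..N-1 (the interface and names are ours, not the patent's):

```python
import numpy as np

def prototype_loss(support_feats, support_labels, query_feats, query_labels):
    """Prototypical-network sketch: class centers from the support set,
    softmax over negative Euclidean distances, cross entropy on the query set.
    Episode labels are assumed to be 0..N-1 for the N sampled classes."""
    support_feats = np.asarray(support_feats, dtype=float)
    query_feats = np.asarray(query_feats, dtype=float)
    classes = np.unique(support_labels)
    # One feature center per class: mean of that class's support features.
    centers = np.stack([support_feats[np.asarray(support_labels) == k].mean(axis=0)
                        for k in classes])
    # (Q, N) matrix of Euclidean distances from each query to each center.
    dists = np.linalg.norm(query_feats[:, None, :] - centers[None, :, :], axis=2)
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(query_labels)), query_labels].mean()
```

The gradient of this loss pulls query features toward their own class center and away from the others, which is the "pulling similar samples and pushing dissimilar samples" behavior described in the text.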
Two-channel learning total loss function: the two-channel learning total loss is a weighted sum of the unbalanced learning channel loss and the small sample learning channel loss:

$$L = \alpha L_{imb} + (1 - \alpha) L_{fs}$$

where $\alpha$ is a hyper-parameter related to the training round $T$. Defining the total number of training rounds as $T_{max}$, $\alpha$ decreases parabolically with $T$:

$$\alpha = 1 - \left( \frac{T}{T_{max}} \right)^2$$
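The parabolic weight schedule (α = 1 at the first round, decaying to 0 at the final round) is one line of code; the original image-rendered formula is not fully legible, so the exact parabola below is the natural one satisfying the stated endpoints:

```python
def alpha_schedule(t, t_max):
    """Parabolic decay of the channel weight alpha = 1 - (t / t_max)**2:
    alpha(0) = 1 (pure unbalanced channel), alpha(t_max) = 0 (pure
    small-sample channel), decreasing slowly at first and faster later."""
    return 1.0 - (t / t_max) ** 2
```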
2) Using the two-channel learning total loss, update all parameters in the two-channel learning model by back propagation, i.e., train the two-channel learning model, and save the optimal two-channel learning model parameters to obtain the optimal two-channel learning model.
In the process of training the two-channel learning model, the maximum training round number T max Set to 120, the optimizer uses an SGD optimizer, the learning rate is initialized to 0.1, the learning rate drops by a factor of 0.1 when the training round number T reaches 70, and the learning rate continues to drop by a factor of 0.1 when the training round number reaches 90. The super parameter alpha in the dual-channel learning total loss function and the training round number T are in parabolic decreasing relation, so that the dual-channel learning model is focused on an unbalanced learning channel in the early stage of training and focused on a small sample learning channel in the later stage of training.
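The step schedule described (initial rate 0.1, multiplied by 0.1 at rounds 70 and 90) can be sketched as a small helper; in PyTorch the equivalent would be `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[70, 90], gamma=0.1)`:

```python
def learning_rate(epoch, base_lr=0.1, milestones=(70, 90), gamma=0.1):
    """Step learning-rate schedule from the embodiment: start at base_lr and
    multiply by gamma each time a milestone round is reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```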
And when the two-channel learning model is trained, the accuracy and recall of the Many-shot class, the Medium-shot class, the Few-shot class and the Overall class in the verification set of the long-tail distributed image dataset are used for evaluating the performance of the two-channel model. The number of samples of the Many-shot class is greater than 100, the number of samples of the Medium-shot class is between 20 and 100, the number of samples of the Few-shot class is less than 20, and the overlay class refers to all classes of the verification set. When the training round number T reaches the set maximum round number T max And when the training is stopped, the optimal two-channel learning model parameters are saved.
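The grouping of classes into evaluation categories by training-sample count is a simple threshold rule; a sketch (function name and group labels are ours):

```python
def shot_group(n_samples):
    """Assign a class to its evaluation group by training-sample count:
    Many-shot (>100 samples), Medium-shot (20 to 100), Few-shot (<20)."""
    if n_samples > 100:
        return "many"
    if n_samples >= 20:
        return "medium"
    return "few"
```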
3) The image data of the test set of the long-tail distributed image dataset is input to the optimal two-channel learning model saved in the previous step; the output of the last-layer classifier of the unbalanced learning channel network in the model is the final prediction result for the test set image data.
The following table compares the two-channel learning model with other image data recognition models on the Places-LT dataset. Among the compared models, DC-LTR denotes the two-channel learning model; apart from the Plain Model, which is a plain deep convolutional neural network classifier, the other models are current mainstream models for handling unbalanced or long-tail distributed image datasets. For a fair comparison, all models were trained with the Places-LT training set and the ResNet10 network structure, and Class-Balanced Accuracy and Macro F-measure for the Many-shot, Medium-shot, Few-shot, and Overall categories were then computed on the Places-LT test set, where Class-Balanced Accuracy denotes the average recall per class and Macro F-measure denotes the average F-measure per class.
TABLE 1 results of comparative experiments on Places-LT datasets
The experimental results show that the Class-Balanced Accuracy and Macro F-measure of the two-channel learning model DC-LTR on the Few-shot and Overall categories are clearly superior to those of the other compared models, indicating that the two-channel learning model improves the recognition accuracy of data-scarce tail categories and thereby raises the recognition accuracy of long-tail distributed image data overall. Although DC-LTR's result on the Medium-shot category is slightly lower, its result on the Many-shot category is comparable to the other unbalanced-algorithm-based or long-tail recognition models, showing that improving the recognition accuracy of data-scarce tail categories does not harm the recognition accuracy of data-rich head categories. Comparison with different models verifies the effectiveness and superiority of the two-channel learning model.
The model is implemented in Python 3.7 on the PyTorch deep learning framework; the experiments run on 2 NVIDIA GeForce GTX 1080Ti GPUs with 22 GB of video memory in total.
The long tail identification method of other data sets is similar to this method.
In summary, the invention combines unbalanced learning and small sample learning to solve the long-tail distributed image data recognition problem. The unbalanced learning channel corrects the common bias of general algorithms toward head categories while learning a good classification decision boundary, improving the recognition accuracy of the two-channel learning model on unbalanced datasets; the small sample learning channel restores the feature representation capability damaged by the unbalanced learning channel by pulling similar samples together and pushing dissimilar samples apart, enhancing the model's ability to recognize tail-category image data; and the constructed two-channel learning total loss function makes the model focus on the unbalanced learning channel in the early training stage and on the small sample learning channel in the later training stage, so that recognition accuracy on long-tail distributed image data improves overall. The invention therefore has practical application value and is worth popularizing.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope of protection; variations in shape and principle made under the present invention shall therefore be covered by its scope of protection.

Claims (3)

1. A long tail distribution image data identification method based on double-channel learning is characterized by comprising the following steps:
1) Constructing a two-channel learning model consisting of an unbalanced learning channel sampler, an unbalanced learning channel network, a small sample learning channel sampler, a small sample learning channel network and a two-channel learning total loss function; dividing the long-tail distribution image data set into a training set, a verification set and a test set; sampling image data and label data from a training set by using an unbalanced learning channel sampler, inputting the data into an unbalanced learning channel network, and calculating unbalanced learning channel loss; sampling image data and label data from a training set by using a small sample learning channel sampler, inputting the data into a small sample learning channel network, and calculating small sample learning channel loss; then carrying out weighted summation on the unbalanced learning channel loss and the small sample learning channel loss to obtain a two-channel learning total loss;
the unbalanced learning channel sampler is as follows:
the input data of the unbalanced learning channel is sampled from a uniform sampler, and each sample in the training set is sampled with equal probability and at most once in each training round T; define B as the number of samples sampled per batch, the sampled input data is represented as { (x) 1 imb ,y 1 imb ),...,(x i imb ,y i imb ),...,(x B imb ,y B imb ) A superscript imb identifying an unbalanced learning path, (x) i imb ,y i imb ) Image data and label data representing the ith sample, i is more than or equal to 1 and less than or equal to B;
the unbalanced learning channel network is as follows:
the unbalanced learning channel network is based on an unbalanced classification algorithm whose network model is transplanted, and comprises three parts: a feature extractor f_φ, a classifier, and an unbalance loss function L_imb; the feature extractor f_φ extracts the feature representation z_i^imb = f_φ(x_i^imb) from the input data (x_i^imb, y_i^imb); the feature representation z_i^imb is then input to the classifier to obtain the predictive label ŷ_i^imb; finally, the defined unbalance loss function L_imb is applied to calculate the unbalanced learning channel loss of the corresponding batch of samples;
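The claim leaves the concrete unbalance loss function L_imb open (any unbalanced classification algorithm may be transplanted). One common hypothetical choice is a class-weighted cross-entropy; the sketch below, in plain Python, is an assumption for illustration, not the claimed algorithm:

```python
import math

def weighted_cross_entropy(logits, labels, class_weights):
    # Mean cross-entropy over the batch, with each sample's term scaled
    # by the weight of its true class (larger weights for rarer classes).
    total = 0.0
    for scores, y in zip(logits, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += class_weights[y] * (log_z - scores[y])
    return total / len(labels)
```

With uniform weights this reduces to ordinary cross-entropy; raising a tail class's weight increases its contribution to the channel loss.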
The small sample learning channel sampler is as follows:
the input data of the small sample learning channel are sampled by a meta-sampler: in each training round T, the meta-sampler first randomly samples N categories among all categories of the training set, and then randomly samples K_S samples and K_Q samples in each of the N categories to serve, respectively, as the support set S = {(x_g^sup, y_g^sup)} and the query set Q = {(x_p^qry, y_p^qry)} of the small sample learning channel, where the superscripts sup and qry identify the support set and the query set respectively; (x_g^sup, y_g^sup) are the image data and label data of the g-th support set sample, 1 ≤ g ≤ N×K_S; (x_p^qry, y_p^qry) are the image data and label data of the p-th query set sample, 1 ≤ p ≤ N×K_Q; each batch of data consists of the support set S and the query set Q;
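As a non-claimed sketch, the meta-sampler's episode construction — N categories, then K_S support and K_Q query samples per category, disjoint within a category — can be written as:

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_support, k_query, seed=None):
    # Group training-sample indices by category.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Randomly pick N categories, then K_S + K_Q distinct samples in each.
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_support + k_query)
        support += [(i, c) for i in picked[:k_support]]  # support set S
        query += [(i, c) for i in picked[k_support:]]    # query set Q
    return support, query
```

Each returned pair is (sample index, category label); one call yields one batch, i.e. one support set S and one query set Q.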
the small sample learning channel network is as follows:
the small sample learning channel network is based on a small sample learning algorithm whose network model is transplanted, and comprises three parts: a feature extractor f_φ, a distance metric d, and a loss function L_fs, wherein the feature extractor adopted by the small sample learning channel network and the feature extractor adopted by the unbalanced learning channel network use the same network architecture and share weight parameters; the input support set sample data x_g^sup and query set sample data x_p^qry first pass through the feature extractor f_φ to extract the features z_g^sup = f_φ(x_g^sup) and z_p^qry = f_φ(x_p^qry); then, according to the distance metric d, the distance d(z_p^qry, z_g^sup) between each query set sample feature and each support set sample feature is calculated, and the label of the support set sample closest to a query set sample is taken as that query set sample's predictive label ŷ_p^qry; finally, the small sample learning channel loss is calculated according to the defined small sample loss function L_fs;
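The metric-based prediction step — assign each query sample the label of its nearest support sample — can be sketched with a Euclidean distance as the metric d (the claim admits any distance metric; this choice and the function names are illustrative):

```python
import math

def nearest_support_label(query_feat, support_feats, support_labels):
    # Euclidean distance d between two feature vectors.
    def d(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    # The query sample takes the label of the closest support feature.
    dists = [d(query_feat, s) for s in support_feats]
    return support_labels[dists.index(min(dists))]
```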
The two-channel learning total loss function is as follows:
the two-channel learning total loss is a weighted sum of the unbalanced learning channel loss and the small sample learning channel loss, as follows:

L_total = α · L_imb + (1 − α) · L_fs
wherein α is a hyperparameter related to the training round T; α decreases parabolically with the training round number T, taking the value 1 at the beginning of training and gradually decreasing to 0 as T increases, so that the two-channel learning model focuses on the unbalanced learning channel in the early training stage and on the small sample learning channel in the later training stage;
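The claim fixes only the endpoints (α = 1 at the start of training, 0 at the end) and a parabolic decrease; a concrete sketch, assuming the parabola α = (1 − T/T_max)², which satisfies those constraints but whose exact form is not stated in the claim:

```python
def alpha_schedule(t, t_max):
    # Assumed parabolic decay: 1 at round 0, decreasing to 0 at round t_max.
    return (1.0 - t / t_max) ** 2

def total_loss(l_imb, l_fs, t, t_max):
    # Weighted sum of the two channel losses: early rounds emphasise the
    # unbalanced channel, later rounds the small sample channel.
    a = alpha_schedule(t, t_max)
    return a * l_imb + (1.0 - a) * l_fs
```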
2) Using the total loss of the two-channel learning, and updating all parameters in the two-channel learning model by back propagation, namely training the two-channel learning model, and storing the optimal two-channel learning model parameters to obtain an optimal two-channel learning model;
3) Inputting the image data of the test set to an optimal two-channel learning model, and obtaining a prediction label, namely a prediction result, of the image.
2. The long-tail distribution image data identification method based on two-channel learning as claimed in claim 1, characterized in that: in step 2), when training the two-channel learning model, a maximum training round number T_max, an optimizer type and an initial learning rate are first set; in each round, the data sampled by the uniform sampler are input to the unbalanced learning channel network and the data sampled by the meta-sampler are input to the small sample learning channel network, and the unbalanced learning channel loss and the small sample learning channel loss are calculated simultaneously; the two-channel learning total loss is then calculated as the weighted sum of the unbalanced learning channel loss and the small sample learning channel loss and, combined with the optimizer, is back-propagated to update the parameters of the feature extractor shared by the two channels and of the unbalanced learning channel classifier; the hyperparameter α in the two-channel learning total loss function decreases parabolically with the training round number, being 1 at the beginning of training and gradually decreasing to 0 as the training round number increases, so that the two-channel learning model focuses on the unbalanced learning channel in the early training stage and on the small sample learning channel in the later training stage;
the method comprises the steps of evaluating the performance of a dual-channel model by using the accuracy rate and recall rate of a Many-shot class, a Medium-shot class, a Few-shot class and an Overall class in a verification set of a long-tail distribution image dataset, wherein the number of samples of the Many-shot class is greater than 100, the number of samples of the Medium-shot class is between 20 and 100, the number of samples of the Few-shot class is less than 20, the Overall class refers to all classes of the verification set, and when the training round number reaches a set maximum round number T max And when the training is stopped, the optimal two-channel learning model parameters are saved.
3. The long-tail distribution image data identification method based on two-channel learning as claimed in claim 1, characterized in that: in step 3), the image data of the test set are input into the optimal two-channel learning model, and the output of the last-layer classifier of the unbalanced learning channel network in the model is the final prediction result for the test set image data.
CN202010465433.XA 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning Active CN111738301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010465433.XA CN111738301B (en) 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010465433.XA CN111738301B (en) 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning

Publications (2)

Publication Number Publication Date
CN111738301A CN111738301A (en) 2020-10-02
CN111738301B true CN111738301B (en) 2023-06-20

Family

ID=72647933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465433.XA Active CN111738301B (en) 2020-05-28 2020-05-28 Long-tail distribution image data identification method based on double-channel learning

Country Status (1)

Country Link
CN (1) CN111738301B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022099600A1 (en) * 2020-11-13 2022-05-19 Intel Corporation Method and system of image hashing object detection for image processing
CN112560904A (en) * 2020-12-01 2021-03-26 中国科学技术大学 Small sample target identification method based on self-adaptive model unknown element learning
CN112632319B (en) * 2020-12-22 2023-04-11 天津大学 Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning
CN112632320A (en) * 2020-12-22 2021-04-09 天津大学 Method for improving speech classification tail recognition accuracy based on long tail distribution
CN113076873B (en) * 2021-04-01 2022-02-22 重庆邮电大学 Crop disease long-tail image identification method based on multi-stage training
CN113095304B (en) * 2021-06-08 2021-09-03 成都考拉悠然科技有限公司 Method for weakening influence of resampling on pedestrian re-identification
CN113449613B (en) * 2021-06-15 2024-02-27 北京华创智芯科技有限公司 Multi-task long tail distribution image recognition method, system, electronic equipment and medium
CN113255832B (en) * 2021-06-23 2021-10-01 成都考拉悠然科技有限公司 Method for identifying long tail distribution of double-branch multi-center
CN113569960B (en) * 2021-07-29 2023-12-26 北京邮电大学 Small sample image classification method and system based on domain adaptation
CN114283307B (en) * 2021-12-24 2023-10-27 中国科学技术大学 Network training method based on resampling strategy
CN114511887B (en) * 2022-03-31 2022-07-05 北京字节跳动网络技术有限公司 Tissue image identification method and device, readable medium and electronic equipment
CN114882273B (en) * 2022-04-24 2023-04-18 电子科技大学 Visual identification method, device, equipment and storage medium applied to narrow space
CN114863193B (en) * 2022-07-07 2022-12-02 之江实验室 Long-tail learning image classification and training method and device based on mixed batch normalization
CN115953631B (en) * 2023-01-30 2023-09-15 南开大学 Long-tail small sample sonar image classification method and system based on deep migration learning
CN116203929B (en) * 2023-03-01 2024-01-05 中国矿业大学 Industrial process fault diagnosis method for long tail distribution data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB823263A (en) * 1956-09-05 1959-11-11 Atomic Energy Authority Uk Improvements in or relating to nuclear particle discriminators
CN108830416A (en) * 2018-06-13 2018-11-16 四川大学 Ad click rate prediction framework and algorithm based on user behavior
CN109800810A (en) * 2019-01-22 2019-05-24 重庆大学 A kind of few sample learning classifier construction method based on unbalanced data
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
CN110580500A (en) * 2019-08-20 2019-12-17 天津大学 Character interaction-oriented network weight generation few-sample image classification method
CN110633758A (en) * 2019-09-20 2019-12-31 四川长虹电器股份有限公司 Method for detecting and locating cancer region aiming at small sample or sample unbalance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10740595B2 (en) * 2017-09-28 2020-08-11 Nec Corporation Long-tail large scale face recognition by non-linear feature level domain adaption

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB823263A (en) * 1956-09-05 1959-11-11 Atomic Energy Authority Uk Improvements in or relating to nuclear particle discriminators
CN108830416A (en) * 2018-06-13 2018-11-16 四川大学 Ad click rate prediction framework and algorithm based on user behavior
CN109800810A (en) * 2019-01-22 2019-05-24 重庆大学 A kind of few sample learning classifier construction method based on unbalanced data
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
CN110580500A (en) * 2019-08-20 2019-12-17 天津大学 Character interaction-oriented network weight generation few-sample image classification method
CN110633758A (en) * 2019-09-20 2019-12-31 四川长虹电器股份有限公司 Method for detecting and locating cancer region aiming at small sample or sample unbalance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Enli Lin et al. Deep reinforcement learning for imbalanced classification. Applied Intelligence. 2020, pp. 2488-2502. *
Chen Qiong et al. Transfer learning classification algorithm for imbalanced data. Journal of South China University of Technology. 2018, Vol. 46 (No. 46), pp. 122-130. *

Also Published As

Publication number Publication date
CN111738301A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111738301B (en) Long-tail distribution image data identification method based on double-channel learning
Asif et al. Ensemble knowledge distillation for learning improved and efficient networks
CN109063565B (en) Low-resolution face recognition method and device
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN101968853B (en) Improved immune algorithm based expression recognition method for optimizing support vector machine parameters
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
Hara et al. Towards good practice for action recognition with spatiotemporal 3d convolutions
CN110334243A (en) Audio representation learning method based on multilayer timing pond
CN111738303A (en) Long-tail distribution image identification method based on hierarchical learning
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN112766378A (en) Cross-domain small sample image classification model method focusing on fine-grained identification
CN107832753B (en) Face feature extraction method based on four-value weight and multiple classification
CN111860278A (en) Human behavior recognition algorithm based on deep learning
CN112329536A (en) Single-sample face recognition method based on alternative pair anti-migration learning
CN114882531A (en) Cross-domain pedestrian re-identification method based on deep learning
Tong et al. Automatic error correction for speaker embedding learning with noisy labels
CN114972753A (en) Lightweight semantic segmentation method and system based on context information aggregation and assisted learning
CN109241315B (en) Rapid face retrieval method based on deep learning
CN115100509B (en) Image identification method and system based on multi-branch block-level attention enhancement network
US20140343945A1 (en) Method of visual voice recognition by following-up the local deformations of a set of points of interest of the speaker's mouth
Li et al. Adaptive multi-prototype relation network
US20140343944A1 (en) Method of visual voice recognition with selection of groups of most relevant points of interest
CN113111774B (en) Radar signal modulation mode identification method based on active incremental fine adjustment
CN110750672B (en) Image retrieval method based on deep measurement learning and structure distribution learning loss

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant