Background
With the rapid development of deep convolutional neural networks, image classification performance has become impressive. This progress is inseparable from increasingly rich data sets. In academia, most data sets have an almost uniform distribution of pictures over class labels, but real-world data are not uniform and often follow a long tail distribution, i.e. a few classes, called head classes, account for most of the pictures, while the remaining classes, called tail classes, each account for only a few pictures, as shown in fig. 1.
Popular existing methods for dealing with long tail distributions include resampling and reweighting. The essence of resampling is to weight the sampling frequencies of the different classes inversely to their sample counts: if a class contains more pictures, each of its pictures is given a lower sampling probability, and vice versa. Reweighting is applied to the classification loss, i.e. the loss of a head class is given a lower weight, and the loss of a tail class is given a higher weight.
While both of the above methods improve predictive results, they have the undesirable side effect of compromising, to some extent, the representational ability of the deep features. Existing methods have the following specific defects:
1. When no measure is taken against the long tail distribution problem, a model shows a good classification effect on the head classes and a poor classification effect on the tail classes. The larger the maximum ratio between the number of pictures in the head classes and in the tail classes, the poorer the model's classification and identification effect on the tail classes.
2. When a resampling strategy is used on a long-tail data set, the sampling probability of the head classes is reduced and that of the tail classes is increased. This alleviates the long tail problem but creates another one: because the tail-class pictures are sampled with high probability, the distribution of the data in feature space is changed, which harms the model's identification and classification performance, as shown in fig. 4.
Disclosure of Invention
The invention aims to provide a dual-branch multi-center method for long tail distribution recognition, which can solve the problems caused by a long-tail data set through a dual-branch architecture and a multi-center design.
The invention solves this technical problem by adopting the following technical scheme:
the dual-branch multi-center method for long tail distribution recognition is characterized by comprising the following steps:
step 1, initializing two samplers, wherein one sampler uses default sampling and feeds the obtained pictures to the default branch, and the other samples with a resampling strategy and feeds the obtained pictures to the resampling branch;
step 2, performing data enhancement on the pictures obtained by the two samplers respectively;
step 3, inputting the data-enhanced pictures of the default branch and the resampling branch into their respective deep convolutional neural networks, extracting a high-dimensional feature representation for each picture, and then performing global average pooling on each high-dimensional feature representation in the convolutional layers of the corresponding network to obtain the respective low-dimensional feature representations, wherein the convolutional layers of the two networks share parameters except for the last residual block;
step 4, passing the low-dimensional feature representation of the default branch through a fully connected layer to obtain the probability of belonging to each category, multiplying the low-dimensional feature representation of the resampling branch by a matrix representing the multiple centers to obtain a feature matrix, and then taking the row-wise maximum of the feature matrix to obtain the final probability of belonging to each category;
step 5, calculating the loss of each branch from its per-category probabilities through a loss function;
step 6, multiplying the loss of the default branch by a weight α and the loss of the resampling branch by a weight 1-α, wherein α decays from 1 to 0 during training, adding the two weighted losses to obtain the final loss, and then back-propagating through the deep convolutional neural networks according to this loss and updating the weights;
step 7, iterating continuously until the deep convolutional neural networks converge, so that their recognition accuracy exceeds 90 percent;
and step 8, when an identification task needs to be carried out, inputting the picture, without data enhancement, into the resampling branch of the deep convolutional neural network to obtain the probability that the picture belongs to each category.
Further, initializing the two samplers in step 1 specifically means: the default sampler samples each picture with the same probability and feeds the pictures to the default branch; the resampling sampler first counts the training data set to obtain the number of pictures in each category, then assigns sampling probabilities such that every category is sampled with the same total probability, and feeds the pictures to the resampling branch.
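As a minimal sketch of the resampling strategy just described (pure Python, with an invented toy label list; not the patented implementation itself), the per-picture sampling probabilities can be computed so that every class receives the same total probability mass:

```python
from collections import Counter

def class_balanced_probs(labels):
    """Per-picture sampling probabilities under a class-balanced
    resampling strategy: a picture of a class with n_i pictures gets
    probability 1 / (C * n_i), so each class is sampled with the same
    total probability 1 / C."""
    counts = Counter(labels)            # n_i for each class
    num_classes = len(counts)           # C
    return [1.0 / (num_classes * counts[y]) for y in labels]

# toy long-tailed data set: class 0 is the head, class 2 the tail
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
probs = class_balanced_probs(labels)
```

Here the six head-class pictures together carry the same sampling mass (1/3) as the single tail-class picture, which is exactly the equal-per-class behaviour the step requires.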
Further, in step 2, the data enhancement performed on the pictures obtained by the two samplers includes left-right flipping and/or random cropping and/or random padding operations on the pictures.
Further, in step 3, the dimension of the high-dimensional feature representation is 4096, and the dimension of the low-dimensional feature representation is 64.
Further, in step 4, the dimension of the obtained probability vector equals the number of classes in the training data set, and the dimensions of the matrix representing the multiple centers are (low-dimensional feature size, number of classes in the training set, number of centers).
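The multi-center scoring of step 4 can be sketched as follows (pure Python with toy dimensions invented for illustration; the embodiment uses a 64-dimensional feature and 10 centers per class):

```python
def multicenter_scores(feature, centers):
    """Multiply a low-dimensional feature (length D) by a multi-center
    matrix of shape (D, C, K) and take, for each of the C classes, the
    maximum over its K centers -- the 'row-wise maximum' of the
    resulting (C, K) feature matrix."""
    D = len(centers)
    C = len(centers[0])
    K = len(centers[0][0])
    scores = []
    for c in range(C):
        per_center = [sum(feature[d] * centers[d][c][k] for d in range(D))
                      for k in range(K)]
        scores.append(max(per_center))   # best-matching center wins
    return scores

# toy example: D=2 feature dims, C=2 classes, K=2 centers per class
centers = [[[1.0, 0.0], [0.5, 1.0]],
           [[0.0, 1.0], [1.0, 0.0]]]
scores = multicenter_scores([2.0, 5.0], centers)
```

Keeping several centers per class and scoring against the nearest one is what lets a class whose features were dispersed by resampling still be matched well.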
Further, step 5 specifically comprises: calculating the loss of each branch by passing its per-category probabilities and the label of the corresponding picture through the loss function.
Further, in step 5, the per-category probabilities obtained by the two branches and the labels of the corresponding pictures are each passed through a Cross Entropy loss function to obtain the loss of each branch, wherein the Cross Entropy loss function has the formula:
L = -Σ_{i=1}^{C} y_i · log(p_i)
wherein L represents the value of the Cross Entropy loss function, C represents the total number of categories, y_i indicates the probability that the current picture belongs to the i-th class (1 for the labelled class, 0 otherwise), and p_i represents the model's predicted probability that the current picture belongs to the i-th class.
The method has the following advantages. Firstly, the dual-branch framework uses not only a default sampler but also a resampling sampler, which preliminarily alleviates the influence of the long tail distribution; the advantage of the two branches is that the default branch learns the overall distribution while the resampling branch fine-tunes what the default branch has learned. Secondly, a multi-center classifier architecture is used in the resampling branch, which reduces the influence of the change in data distribution caused by resampling. Experiments prove that, thanks to the dual-branch multi-center framework, the influence of the long tail distribution can be further addressed on top of resampling, a better identification and classification effect is achieved, and the model has better generalization ability.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and embodiments.
The invention provides a dual-branch multi-center method for long tail distribution recognition, which comprises the following steps:
step 1, initializing two samplers, wherein one sampler uses default sampling and feeds the obtained pictures to the default branch, and the other samples with a resampling strategy and feeds the obtained pictures to the resampling branch;
step 2, performing data enhancement on the pictures obtained by the two samplers respectively;
step 3, inputting the pictures of the two branches into their respective corresponding networks, extracting a high-dimensional feature representation for each picture, and performing global average pooling in each branch to obtain the respective low-dimensional feature representations, wherein the convolutional layers of the two networks share parameters except for the last residual block;
step 4, passing the low-dimensional feature representation of the default branch through a fully connected layer to obtain the probability of belonging to each category, multiplying the low-dimensional feature representation of the resampling branch by a matrix representing the multiple centers to obtain a feature matrix, and then taking the row-wise maximum of the feature matrix to obtain the final probability of belonging to each category;
step 5, calculating the loss of each branch from its per-category probability vector through a loss function;
step 6, multiplying the two losses by specific weights, wherein the weight of the default branch decays from 1 to 0 and the weight of the resampling branch grows from 0 to 1, adding them to obtain the final loss, and then back-propagating through the networks according to this loss and updating the weights;
step 7, iterating continuously until the networks converge;
and step 8, when an identification task needs to be carried out, inputting the picture into the resampling branch to obtain the probability that the picture belongs to each category.
The invention illustrates the long tail distribution and the meaning of head and tail classes in fig. 1, and the influence of resampling on the feature space in figs. 4-5, which comprise two parts: the feature space before resampling and the feature space after resampling. Before resampling, the classification surface of the feature space distinguishes the head classes well but not the tail classes. The head classes classify well because they have enough samples to exhibit a rich feature distribution, whereas the small number of tail-class pictures cannot fully represent the tail classes' feature distribution, so their classification effect is poor and misclassification occurs easily. After resampling, both the head classes and the tail classes have enough pictures for the model to learn their feature distributions, so the model classifies both well. However, resampling changes the original feature space distribution, making the head classes and tail classes more dispersed in their respective feature spaces.
Examples
The present embodiment provides a method for identifying a long tail distribution of a dual-branch multi-center, a flowchart of which is shown in fig. 2, wherein the method of the present embodiment includes the following steps:
s1, initializing two samplers, wherein one sampler adopts default sampling, the picture obtained by the sampling is input into a default branch, the other sampler adopts a resampling strategy for sampling, and the picture obtained by the sampling is input into a resampling branch.
Here, the default sampler samples each picture with the same probability. Under the resampling strategy, before the sampling probability of each picture is computed, the training data set is first counted to obtain the number of pictures in each category; the i-th category owns n_i pictures, the size of the largest category is denoted n_max, and the total number of pictures over all categories is N. The resampling strategy assigns each picture of the i-th category the weight n_max / n_i, giving it the sampling probability
p_i = (n_max / n_i) / Σ_{j=1}^{C} n_j · (n_max / n_j) = 1 / (C · n_i),
where C is the number of categories. By this strategy, the total sampling probability of every class is made the same, namely 1/C.
S2, performing data enhancement on the pictures obtained by the two samplers in step S1.
Data enhancement is performed on each sampled picture, including left-right flipping and/or random cropping operations.
S3, inputting the pictures of the two branches into their respective corresponding networks, wherein the parameters of the convolutional layers of the two branches are shared except for the last residual block. Global average pooling is then performed on each branch's high-dimensional features to obtain the low-dimensional feature representations.
In this embodiment, the pictures of the two branches are passed through ResNet-32 convolutional layers to obtain 4096-dimensional high-dimensional feature representations, with the parameters of the convolutional layers shared between the branches except for the last residual block. The ResNet-32 convolutional layers are shown in fig. 3. The high-dimensional features of the two branches are each subjected to global average pooling to obtain the respective low-dimensional feature representations, whose dimension is 64.
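Global average pooling, which collapses each branch's feature maps to the low-dimensional vector, can be sketched as follows (pure Python with toy channel/height/width sizes, not the actual ResNet-32 shapes):

```python
def global_average_pool(feature_map):
    """Collapse a (channels, height, width) feature map to one value
    per channel by averaging over all spatial positions."""
    pooled = []
    for channel in feature_map:
        h = len(channel)
        w = len(channel[0])
        pooled.append(sum(sum(row) for row in channel) / (h * w))
    return pooled

# toy map with 2 channels of size 2x2
vec = global_average_pool([[[1.0, 3.0], [5.0, 7.0]],
                           [[2.0, 2.0], [2.0, 2.0]]])
```

The output length equals the channel count, which is why a 64-channel final residual block yields the 64-dimensional representation described above.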
S4, passing the low-dimensional features of the default branch through a fully connected layer to obtain the probability of each category; multiplying the low-dimensional features of the resampling branch by the multi-center matrix to obtain a multi-center score matrix, and then taking its row-wise maximum to obtain the final probability of belonging to each category.
In the default branch, the obtained 64-dimensional features are passed through a fully connected layer to obtain the vector of probabilities of belonging to each category. In the resampling branch, the obtained 64-dimensional features are multiplied by a matrix of dimensions (64, number of categories, 10) to obtain a matrix of dimensions (number of categories, 10); the row-wise maximum of this matrix then gives the branch's vector of probabilities of belonging to each category.
S5, calculating the loss of each branch from its per-category probability vector.
The probability vectors obtained by the two branches are each passed through the Cross Entropy loss function to obtain the loss of each branch.
The Cross Entropy loss function is formulated as:
L = -Σ_{i=1}^{C} y_i · log(p_i)
wherein L represents the value of the Cross Entropy loss function, C represents the total number of categories, y_i indicates the probability that the current picture belongs to the i-th class (1 for the labelled class, 0 otherwise), and p_i represents the model's predicted probability that the current picture belongs to the i-th class.
S6, multiplying the two losses by specific weights, wherein the weight of the default branch decays from 1 to 0 and the weight of the resampling branch grows from 0 to 1.
The loss of the default branch is multiplied by a weight α, the loss of the resampling branch is multiplied by the weight 1-α, and the two weighted losses are then added to obtain the final loss.
The calculation formula of α is:
α = 1 - (E / E_max)^2
wherein E represents the current iteration round number, and E_max represents the expected maximum number of iteration rounds.
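The loss mixing of step S6 can be sketched as follows; the parabolic form of α used here is an assumption consistent with the stated behaviour (α decays from 1 to 0 as the current round E approaches E_max), not a form confirmed elsewhere in this text:

```python
def alpha(E, E_max):
    """Mixing weight for the default branch: 1 at the start of
    training, 0 at round E_max (assumed parabolic decay)."""
    return 1.0 - (E / E_max) ** 2

def final_loss(loss_default, loss_resample, E, E_max):
    """alpha * default-branch loss + (1 - alpha) * resampling-branch
    loss, so training shifts gradually from the default branch to the
    resampling branch."""
    a = alpha(E, E_max)
    return a * loss_default + (1.0 - a) * loss_resample
```

Early in training the default branch dominates and the network learns the overall distribution; late in training the resampling branch dominates and fine-tunes the tail classes.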
S7, iterating continuously until the model converges well enough and generalizes well.
S8, when an identification task is required, the picture, without data enhancement, is input into the resampling branch to obtain the probability that the picture belongs to each category.
In this embodiment, the accuracy of the trained Resnet-32 model on the CIFAR-10 data set is as follows:
the long tail table in the table indicates the maximum value of the ratio of the number of pictures of the head class to the number of pictures of the tail class. As can be seen from the table, the embodiment shows the stable improvement of the dual-branch multi-center architecture on the long-tail data task, which indicates that the method can improve the recognition effect of the model and has better generalization capability.