CN111126577A - Loss function design method for unbalanced samples - Google Patents

Loss function design method for unbalanced samples

Info

Publication number
CN111126577A
CN111126577A
Authority
CN
China
Prior art keywords
samples
loss function
sample
radius
hypersphere
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010233575.3A
Other languages
Chinese (zh)
Inventor
代笃伟
赵威
申建虎
王博
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Precision Diagnosis Medical Technology Co Ltd
Original Assignee
Beijing Precision Diagnosis Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Precision Diagnosis Medical Technology Co Ltd filed Critical Beijing Precision Diagnosis Medical Technology Co Ltd
Priority to CN202010233575.3A priority Critical patent/CN111126577A/en
Publication of CN111126577A publication Critical patent/CN111126577A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a loss function design method for unbalanced samples, which specifically comprises the following steps: setting the neural network, training data and loss function for deep learning; obtaining samples from the training data and counting the number of individuals contained in each category; adjusting the radius of the hypersphere onto which each category's normalized features are mapped according to that count, so that categories with more samples receive larger radii; and iterating the counting and adjustment steps, calculating the loss function for each batch of samples from the hypersphere radii. The invention adaptively adjusts the radius of the hypersphere assigned to a category according to the number of samples in that category, and can effectively solve the problem of uneven sample distribution.

Description

Loss function design method for unbalanced samples
Technical Field
The invention relates to the technical field of deep learning, in particular to a loss function design method for unbalanced samples.
Background
In deep learning, the most important elements are data, computing power, network structure and the loss function. The loss function evaluates the difference between predicted and actual results and guides the network toward more accurate predictions, so an appropriate loss function must be chosen for each deep learning task.
Classification is one of the most common problems in deep learning, and the Softmax loss is its most basic loss function. Features learned with the traditional Softmax loss still have a large intra-class distance; adding an intra-class distance constraint to the loss yields better classification, which motivated Center_Loss. The idea of Center_Loss is that, for each sample in a batch, the smaller the sum of squared distances between the features and their class center, the better, i.e. the smaller the intra-class distance.
Later studies found that the features learned by Softmax_Loss are not discriminative enough, that Center_Loss makes classes internally compact without making them separable from each other, and that Triplet_Loss increases training time, so A-Softmax_Loss was proposed. In Softmax_Loss, W^T·x = ||W||·||x||·cos θ, so the inner product carries angle information: Softmax gives the learned features an angular distribution. To learn more separable angular features, researchers improved Softmax_Loss by constraining ||W|| = 1 and adding the angle-space constraint cos(t × θ1) > cos(θ2) to the loss. With this loss the learned features show a more pronounced angular distribution, because the decision boundary then depends only on the angle.
The subsequent F-Norm SphereFace improves on A-Softmax_Loss: since only the angle information in the data matters and not the magnitude of the feature vector, it normalizes not only the weights W but also the features x, replacing ||x|| with a feature-scale parameter s = 64. The resulting loss converges more easily and attends to angle information rather than distance information.
Compared with F-Norm SphereFace, the most visible change in CosFace is moving the margin t out of cos(t × θ1) to give cos(θ1) - t. The training process becomes simpler and easier to converge, and model performance also improves markedly.
Although the mapping from the cosine range to the angle range is one-to-one, the two spaces differ: maximizing the classification margin in angle space has a clearer geometric interpretation than in cosine space, and a margin in angle space corresponds directly to arc length on the hypersphere. Angular Margin Loss (ArcFace) was therefore proposed, placing the angular margin t inside the cosine so that cos(θ + t) is smaller than cos(θ) for θ ∈ [0, π - t], which makes the whole classification task stricter. Expanding gives cos(θ + t) = cos θ · cos t - sin θ · sin t; compared with the cos(θ) - t of CosFace, the cos(θ + t) of ArcFace is not only simple in form but also depends dynamically on sin θ, enabling the network to learn more angular features.
The most direct influence of the loss function on a neural network is that its back-propagated gradients drive the parameter updates; different losses make the model emphasize different aspects of the data and extract different characteristic features, so the loss guides network optimization. In classification tasks, optimizing the loss greatly improves the final result, and Softmax_Loss, Center_Loss, SphereFace, CosFace, ArcFace and others have each contributed new solutions that advanced classification in deep learning. However, none of these loss functions specifically handles imbalance in the training samples.
ArcFace is currently the most widely used classification loss function. It maps normalized features onto a hypersphere of radius S and adds a constraint t in angle space, which classifies samples well, increases the inter-class distance and reduces the intra-class distance. Still, none of the loss functions above deals specifically with sample imbalance, while real data to be processed is usually unevenly distributed, so a loss function designed for unbalanced samples is needed.
Disclosure of Invention
The invention provides a loss function design method for unbalanced samples that adaptively adjusts the radius of the hypersphere assigned to each category according to the number of samples the category contains, effectively addressing uneven sample distribution.
The technical scheme of the invention is realized as follows:
a method for designing a loss function for an unbalanced sample specifically comprises the following steps:
step 1, setting a neural network, training data and a loss function corresponding to deep learning;
step 2, obtaining samples from the training data, and counting the number of individuals contained in each category in the samples;
step 3, the normalized features of all categories are adjusted to be mapped to radius numerical values of the hypersphere according to the number of individuals, and the radius numerical values are larger when the number of samples is higher;
and 4, circularly iterating the steps 2 and 3, and calculating a loss function corresponding to each batch of samples according to the radius value of the hypersphere.
As a preferred embodiment of the present invention, step 1 sets the neural network, training data and loss function for deep learning; specifically, the deep learning network is ResNet50, the image data set is ImageNet, the training data is MS1M, and the loss function is ArcFace.
As a preferred embodiment of the present invention, step 2 counts the number of individuals contained in each category of the sample; specifically, the counts are class1_num, class2_num, class3_num, …, classN_num, the sample containing N categories in total.
As a preferred embodiment of the present invention, step 3 specifically comprises the following steps:
step 301, defining a mapping coefficient λ for each category as the cube root of classi_num rounded to one decimal place, where i denotes the ith category:
λi = round(math.pow(classi_num, 1.0/3), 1)
letting λk be the smallest of the mapping coefficients λ, and S be the radius of the hypersphere onto which that category is mapped;
step 302, calculating the radius Si of the hypersphere onto which each remaining category is mapped:
Si = (λi/λk)*S
where λi is the mapping coefficient of the ith category.
As a preferred embodiment of the present invention, the loss function for each batch of samples is

L = -(1/m) · Σ_{j=1..m} log( exp(R_j·cos(θ_yj + t)) / ( exp(R_j·cos(θ_yj + t)) + Σ_{i=1..n, i≠yj} exp(R_i·cos θ_i) ) )

where m is the number of samples in each batch, n is the total number of classes in the whole sample, t is the margin between classes, R_j is the hypersphere radius of the class to which the jth sample belongs, θ_yj is the angle between the feature vector of the jth sample and its corresponding weight vector, and θ_i is the angle between the sample's feature vector and the center of the ith class.
The beneficial effect of the invention is that the radius of the hypersphere assigned to a category is adjusted adaptively according to the number of samples in that category, effectively solving the problem of uneven sample distribution.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a method for designing a loss function for an unbalanced sample according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the present invention provides a loss function design method for unbalanced samples that adaptively adjusts the radius of the hypersphere assigned to each class according to the number of samples in the class, so that classes containing many samples are distributed on a larger hypersphere. The method specifically comprises the following steps:
Step 1, setting the neural network, training data and loss function for deep learning; the settings can be entered through a keyboard, touch screen or other human-computer interaction device.
The invention sets the deep learning network to ResNet50, the image data set to ImageNet, the training data to MS1M, and the loss function to ArcFace. The ImageNet image data set, started in 2009, currently contains 14,197,122 images divided into 21,841 categories and is the most referenced data set in deep learning. The invention uses ImageNet2012, which contains 1000 classes in total. The deep residual network (ResNet), proposed by He et al., greatly advanced the development of deep learning; ResNet50 is selected here in consideration of hardware conditions.
Step 2, obtaining samples from the training data and counting the number of individuals contained in each category; specifically, the counts are class1_num, class2_num, class3_num, …, classN_num, the sample containing N categories in total.
Step 3, adjusting the radius of the hypersphere onto which each category's normalized features are mapped according to the number of individuals, with larger sample counts receiving larger radii;
the step 3 specifically comprises the following steps:
step 301, defining a mapping coefficient λ, wherein the calculation mode is that 1 bit behind a decimal point is reserved for a cubic root of classi _ num, i represents the ith category, and the formula is as follows:
λi = round(math.pow(classi_num, 1.0/3),1)
assuming that λ k is the smallest mapping coefficient in the mapping coefficients λ, and the radius of the hypersphere to which the category needs to be mapped is S;
step 302, calculating the radius Si of the hypersphere to which the remaining categories need to be mapped
Si = (λi/λk)*S
λ i is a mapping coefficient corresponding to the ith class.
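Steps 301 and 302 can be sketched directly from the two formulas above; the function name, the example counts and the base radius S = 64 are illustrative assumptions:

```python
import math

def class_radii(class_counts, base_radius):
    """Map each category's sample count to a hypersphere radius.

    Step 301: lambda_i = round(cube root of classi_num, 1 decimal place).
    Step 302: S_i = (lambda_i / lambda_k) * S, where lambda_k is the
    smallest mapping coefficient and S is the base radius it receives.
    """
    lambdas = [round(math.pow(n, 1.0 / 3), 1) for n in class_counts]
    lam_k = min(lambdas)  # coefficient of the smallest category
    return [(lam / lam_k) * base_radius for lam in lambdas]

radii = class_radii([1000, 8000, 125], base_radius=64)
# lambdas are [10.0, 20.0, 5.0], so radii == [128.0, 256.0, 64.0]
```

Categories with more samples are thus placed on larger hyperspheres, while the cube root and the rounding keep the spread of radii moderate.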
Step 4, iterating steps 2 and 3 and calculating the loss function for each batch of samples from the hypersphere radii.
The loss function for each batch of samples is

L = -(1/m) · Σ_{j=1..m} log( exp(R_j·cos(θ_yj + t)) / ( exp(R_j·cos(θ_yj + t)) + Σ_{i=1..n, i≠yj} exp(R_i·cos θ_i) ) )

where m is the number of samples in each batch, n is the total number of classes in the whole sample, t is the margin between classes, R_j is the hypersphere radius of the class to which the jth sample belongs, θ_yj is the angle between the feature vector of the jth sample and its corresponding weight vector, and θ_i is the angle between the sample's feature vector and the center of the ith class.
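Since the loss formula is rendered only as an image in the source, the sketch below is an assumption reconstructed from the variable definitions above: an ArcFace-style softmax cross-entropy in which the fixed scale is replaced by the per-class radius R_i from step 3. The function and argument names are illustrative.

```python
import math

def batch_loss(cos_theta, labels, radii, t=0.5):
    """ArcFace-style batch loss with per-class hypersphere radii (a sketch).

    cos_theta : m x n nested list; cosine between each sample's feature
                vector and each of the n class weight vectors.
    labels    : length-m list of ground-truth class indices y_j.
    radii     : length-n list of per-class radii R_i from step 3.
    t         : angular margin between classes.
    """
    m = len(cos_theta)
    total = 0.0
    for j in range(m):
        y = labels[j]
        logits = []
        for i, c in enumerate(cos_theta[j]):
            ang = math.acos(max(-1.0, min(1.0, c)))
            if i == y:
                ang += t  # margin added to the target-class angle only
            logits.append(radii[i] * math.cos(ang))
        mx = max(logits)  # subtract the max for numerical stability
        denom = sum(math.exp(z - mx) for z in logits)
        total -= (logits[y] - mx) - math.log(denom)
    return total / m
```

A sample whose target-class cosine is high yields a near-zero loss, while a misclassified sample is penalized heavily, as in standard ArcFace.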
The invention is an improvement based on ArcFace; the model was tested on the ImageNet evaluation and its accuracy (%) compared with ArcFace. The test results are as follows:
[Table: accuracy (%) comparison between the proposed method and ArcFace on ImageNet; rendered as an image in the original document]
According to the test results, the method outperforms ArcFace on the classification problem, improving accuracy on ImageNet by about 1%.
The invention adaptively adjusts the radius of the hypersphere assigned to a category according to the number of samples in that category, effectively solving the problem of uneven sample distribution. It can be applied to face recognition, where it further tunes and optimizes the recognition model and improves recognition accuracy.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A loss function design method for unbalanced samples, characterized by comprising the following steps:
step 1, setting the neural network, training data and loss function for deep learning;
step 2, obtaining samples from the training data and counting the number of individuals contained in each category;
step 3, adjusting the radius of the hypersphere onto which each category's normalized features are mapped according to the number of individuals, with larger sample counts receiving larger radii;
step 4, iterating steps 2 and 3 and calculating the loss function for each batch of samples from the hypersphere radii.
2. The loss function design method for unbalanced samples according to claim 1, characterized in that step 1 sets the neural network, training data and loss function for deep learning; specifically, the deep learning network is ResNet50, the image data set is ImageNet, the training data is MS1M, and the loss function is ArcFace.
3. The loss function design method for unbalanced samples according to claim 1, characterized in that step 2 counts the number of individuals contained in each category of the sample; specifically, the counts are class1_num, class2_num, class3_num, …, classN_num, the sample containing N categories in total.
4. The loss function design method for unbalanced samples according to claim 3, characterized in that step 3 specifically comprises the following steps:
step 301, defining a mapping coefficient λ for each category as the cube root of classi_num rounded to one decimal place, where i denotes the ith category:
λi = round(math.pow(classi_num, 1.0/3), 1)
letting λk be the smallest of the mapping coefficients λ, and S be the radius of the hypersphere onto which that category is mapped;
step 302, calculating the radius Si of the hypersphere onto which each remaining category is mapped:
Si = (λi/λk)*S
where λi is the mapping coefficient of the ith category.
5. The loss function design method for unbalanced samples according to claim 4, characterized in that the loss function for each batch of samples is

L = -(1/m) · Σ_{j=1..m} log( exp(R_j·cos(θ_yj + t)) / ( exp(R_j·cos(θ_yj + t)) + Σ_{i=1..n, i≠yj} exp(R_i·cos θ_i) ) )

where m is the number of samples in each batch, n is the total number of classes in the whole sample, t is the margin between classes, R_j is the hypersphere radius of the class to which the jth sample belongs, θ_yj is the angle between the feature vector of the jth sample and its corresponding weight vector, and θ_i is the angle between the sample's feature vector and the center of the ith class.
CN202010233575.3A 2020-03-30 2020-03-30 Loss function design method for unbalanced samples Pending CN111126577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010233575.3A CN111126577A (en) 2020-03-30 2020-03-30 Loss function design method for unbalanced samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010233575.3A CN111126577A (en) 2020-03-30 2020-03-30 Loss function design method for unbalanced samples

Publications (1)

Publication Number Publication Date
CN111126577A true CN111126577A (en) 2020-05-08

Family

ID=70494039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010233575.3A Pending CN111126577A (en) 2020-03-30 2020-03-30 Loss function design method for unbalanced samples

Country Status (1)

Country Link
CN (1) CN111126577A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679860A (en) * 2015-02-27 2015-06-03 北京航空航天大学 Classifying method for unbalanced data
CN107391569A (en) * 2017-06-16 2017-11-24 阿里巴巴集团控股有限公司 Identification, model training, Risk Identification Method, device and the equipment of data type
CN108846340A (en) * 2018-06-05 2018-11-20 腾讯科技(深圳)有限公司 Face identification method, device and disaggregated model training method, device, storage medium and computer equipment
DE102018009315A1 (en) * 2017-11-27 2019-05-29 Nvidia Corporation Deep learning method for separating reflection and transmission images that are visible on a semi-reflective surface in a computer image of a real world scene
US10429486B1 (en) * 2017-08-18 2019-10-01 DeepSig Inc. Method and system for learned communications signal shaping


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄国宏 et al., "A New RBF Neural Network Classification Algorithm" (一种新的RBF神经元网络分类算法), Computer Simulation (《计算机仿真》) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508