CN111126577A - Loss function design method for unbalanced samples - Google Patents

Loss function design method for unbalanced samples

Info

Publication number
CN111126577A
CN111126577A
Authority
CN
China
Prior art keywords
samples
loss function
sample
radius
hypersphere
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010233575.3A
Other languages
Chinese (zh)
Inventor
代笃伟
赵威
申建虎
王博
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Precision Diagnosis Medical Technology Co Ltd
Original Assignee
Beijing Precision Diagnosis Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Precision Diagnosis Medical Technology Co Ltd filed Critical Beijing Precision Diagnosis Medical Technology Co Ltd
Priority to CN202010233575.3A priority Critical patent/CN111126577A/en
Publication of CN111126577A publication Critical patent/CN111126577A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a loss function design method for unbalanced samples, which specifically comprises the following steps: setting the neural network, training data and loss function for deep learning; obtaining samples from the training data and counting the number of individuals contained in each category; adjusting the radius of the hypersphere onto which each category's normalized features are mapped according to that count, so that categories with more samples receive larger radii; and iterating the counting and adjustment steps, calculating the loss function for each batch of samples from the hypersphere radii. The invention adaptively adjusts the radius of the hypersphere assigned to a category according to the number of samples in that category, and can effectively solve the problem of uneven sample distribution.

Description

Loss function design method for unbalanced samples
Technical Field
The invention relates to the technical field of deep learning, in particular to a loss function design method for unbalanced samples.
Background
In deep learning, the most important elements are data, computing power, network structure and the loss function. The loss function evaluates the difference between predicted and actual results and guides the network toward more accurate predictions, so an appropriate loss function must be chosen for each deep learning task.
Classification is one of the most common problems in deep learning, and the Softmax loss is its most basic loss function. Features learned with the traditional Softmax loss still have a large intra-class distance; adding an intra-class distance constraint to the loss yields better classification, which motivated Center_Loss. The idea of Center_Loss is that, for each sample in a batch, the smaller the sum of squared distances between the features and their class center, the better, i.e. the smaller the intra-class distance.
Later studies found that the features learned by Softmax_Loss are not discriminative enough, that Center_Loss makes classes internally compact without making them separable from each other, and that Triplet_Loss increases training time, so A-Softmax_Loss was proposed. In Softmax_Loss, W^T·x = ||W||·||x||·cos θ, so the inner product carries angle information: Softmax gives the learned features an angular distribution. To learn more separable angular features, researchers improved Softmax_Loss by constraining ||W|| = 1 and adding the angle-space constraint cos(t × θ1) > cos(θ2) to the loss. With this loss the learned features show a more pronounced angular distribution, because the decision boundary then depends only on the angle.
The subsequent F-Norm SphereFace improves on A-Softmax_Loss: since only the angle information in the data matters and not the magnitude of the feature vector, it normalizes not only the weights W but also the features x, replacing ||x|| with a feature-scale parameter s = 64. The resulting loss converges more easily and attends to angle information rather than distance information.
Compared with F-Norm SphereFace, the most visible change in CosFace is moving the margin t out of cos(t × θ1) to give cos(θ1) - t. The training process becomes simpler and easier to converge, and model performance also improves markedly.
Although the mapping from the cosine range to the angle range is one-to-one, the two spaces differ: maximizing the classification margin in angle space has a clearer geometric interpretation than in cosine space, and a margin in angle space corresponds directly to arc length on the hypersphere. Angular Margin Loss (ArcFace) was therefore proposed, placing the angular margin t inside the cosine so that cos(θ + t) is smaller than cos(θ) for θ ∈ [0, π - t], which makes the whole classification task stricter. Expanding gives cos(θ + t) = cos θ · cos t - sin θ · sin t; compared with the cos(θ) - t of CosFace, the cos(θ + t) of ArcFace is not only simple in form but also depends dynamically on sin θ, enabling the network to learn more angular features.
The most direct influence of the loss function on a neural network is that its back-propagated gradients drive the parameter updates; different losses make the model emphasize different aspects of the data and extract different characteristic features, so the loss guides network optimization. In classification tasks, optimizing the loss greatly improves the final result, and Softmax_Loss, Center_Loss, SphereFace, CosFace, ArcFace and others have each contributed new solutions that advanced classification in deep learning. However, none of these loss functions specifically handles imbalance in the training samples.
ArcFace is currently the most widely used classification loss function. It maps normalized features onto a hypersphere of radius S and adds a constraint t in angle space, which classifies samples well, increases the inter-class distance and reduces the intra-class distance. Still, none of the loss functions above deals specifically with sample imbalance, while real data to be processed is usually unevenly distributed, so a loss function designed for unbalanced samples is needed.
Disclosure of Invention
The invention provides a loss function design method for unbalanced samples that adaptively adjusts the radius of the hypersphere assigned to each category according to the number of samples the category contains, effectively addressing uneven sample distribution.
The technical scheme of the invention is realized as follows:
a method for designing a loss function for an unbalanced sample specifically comprises the following steps:
step 1, setting a neural network, training data and a loss function corresponding to deep learning;
step 2, obtaining samples from the training data, and counting the number of individuals contained in each category in the samples;
step 3, the normalized features of all categories are adjusted to be mapped to radius numerical values of the hypersphere according to the number of individuals, and the radius numerical values are larger when the number of samples is higher;
and 4, circularly iterating the steps 2 and 3, and calculating a loss function corresponding to each batch of samples according to the radius value of the hypersphere.
As a preferred embodiment of the present invention, step 1 sets the neural network, training data and loss function for deep learning; specifically, the deep learning network is ResNet50, the image data set is ImageNet, the training data is MS1M, and the loss function is ArcFace.
As a preferred embodiment of the present invention, step 2 counts the number of individuals contained in each category of the sample; specifically, the counts are class1_num, class2_num, class3_num, …, classN_num, the sample containing N categories in total.
As a preferred embodiment of the present invention, step 3 specifically comprises the following steps:
step 301, defining a mapping coefficient λ for each category as the cube root of classi_num rounded to one decimal place, where i denotes the ith category:
λi = round(math.pow(classi_num, 1.0/3), 1)
letting λk be the smallest of the mapping coefficients λ, and S be the radius of the hypersphere onto which that category is mapped;
step 302, calculating the radius Si of the hypersphere onto which each remaining category is mapped:
Si = (λi/λk)*S
where λi is the mapping coefficient of the ith category.
As a preferred embodiment of the present invention, the loss function for each batch of samples is

L = -(1/m) · Σ_{j=1..m} log( exp(R_j·cos(θ_yj + t)) / ( exp(R_j·cos(θ_yj + t)) + Σ_{i=1..n, i≠yj} exp(R_i·cos θ_i) ) )

where m is the number of samples in each batch, n is the total number of classes in the whole sample, t is the margin between classes, R_j is the hypersphere radius of the class to which the jth sample belongs, θ_yj is the angle between the feature vector of the jth sample and its corresponding weight vector, and θ_i is the angle between the sample's feature vector and the center of the ith class.
The beneficial effect of the invention is that the radius of the hypersphere assigned to a category is adjusted adaptively according to the number of samples in that category, effectively solving the problem of uneven sample distribution.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a method for designing a loss function for an unbalanced sample according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the present invention provides a loss function design method for unbalanced samples that adaptively adjusts the radius of the hypersphere assigned to each class according to the number of samples in the class, so that classes containing many samples are distributed on a larger hypersphere. The method specifically comprises the following steps:
Step 1, setting the neural network, training data and loss function for deep learning; the settings can be entered through a keyboard, touch screen or other human-computer interaction device.
The invention sets the deep learning network to ResNet50, the image data set to ImageNet, the training data to MS1M, and the loss function to ArcFace. The ImageNet image data set, started in 2009, currently contains 14,197,122 images divided into 21,841 categories and is the most referenced data set in deep learning. The invention uses ImageNet2012, which contains 1000 classes in total. The deep residual network (ResNet), proposed by He et al., greatly advanced the development of deep learning; ResNet50 is selected here in consideration of hardware conditions.
Step 2, obtaining samples from the training data and counting the number of individuals contained in each category; specifically, the counts are class1_num, class2_num, class3_num, …, classN_num, the sample containing N categories in total.
Step 3, adjusting the radius of the hypersphere onto which each category's normalized features are mapped according to the number of individuals, with larger sample counts receiving larger radii;
the step 3 specifically comprises the following steps:
step 301, defining a mapping coefficient λ, wherein the calculation mode is that 1 bit behind a decimal point is reserved for a cubic root of classi _ num, i represents the ith category, and the formula is as follows:
λi = round(math.pow(classi_num, 1.0/3),1)
assuming that λ k is the smallest mapping coefficient in the mapping coefficients λ, and the radius of the hypersphere to which the category needs to be mapped is S;
step 302, calculating the radius Si of the hypersphere to which the remaining categories need to be mapped
Si = (λi/λk)*S
λ i is a mapping coefficient corresponding to the ith class.
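Steps 301 and 302 can be sketched directly from the two formulas above; the function name, the example counts and the base radius S = 64 are illustrative assumptions:

```python
import math

def class_radii(class_counts, base_radius):
    """Map each category's sample count to a hypersphere radius.

    Step 301: lambda_i = round(cube root of classi_num, 1 decimal place).
    Step 302: S_i = (lambda_i / lambda_k) * S, where lambda_k is the
    smallest mapping coefficient and S is the base radius it receives.
    """
    lambdas = [round(math.pow(n, 1.0 / 3), 1) for n in class_counts]
    lam_k = min(lambdas)  # coefficient of the smallest category
    return [(lam / lam_k) * base_radius for lam in lambdas]

radii = class_radii([1000, 8000, 125], base_radius=64)
# lambdas are [10.0, 20.0, 5.0], so radii == [128.0, 256.0, 64.0]
```

Categories with more samples are thus placed on larger hyperspheres, while the cube root and the rounding keep the spread of radii moderate.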
Step 4, iterating steps 2 and 3 and calculating the loss function for each batch of samples from the hypersphere radii.
The loss function for each batch of samples is

L = -(1/m) · Σ_{j=1..m} log( exp(R_j·cos(θ_yj + t)) / ( exp(R_j·cos(θ_yj + t)) + Σ_{i=1..n, i≠yj} exp(R_i·cos θ_i) ) )

where m is the number of samples in each batch, n is the total number of classes in the whole sample, t is the margin between classes, R_j is the hypersphere radius of the class to which the jth sample belongs, θ_yj is the angle between the feature vector of the jth sample and its corresponding weight vector, and θ_i is the angle between the sample's feature vector and the center of the ith class.
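Since the loss formula is rendered only as an image in the source, the sketch below is an assumption reconstructed from the variable definitions above: an ArcFace-style softmax cross-entropy in which the fixed scale is replaced by the per-class radius R_i from step 3. The function and argument names are illustrative.

```python
import math

def batch_loss(cos_theta, labels, radii, t=0.5):
    """ArcFace-style batch loss with per-class hypersphere radii (a sketch).

    cos_theta : m x n nested list; cosine between each sample's feature
                vector and each of the n class weight vectors.
    labels    : length-m list of ground-truth class indices y_j.
    radii     : length-n list of per-class radii R_i from step 3.
    t         : angular margin between classes.
    """
    m = len(cos_theta)
    total = 0.0
    for j in range(m):
        y = labels[j]
        logits = []
        for i, c in enumerate(cos_theta[j]):
            ang = math.acos(max(-1.0, min(1.0, c)))
            if i == y:
                ang += t  # margin added to the target-class angle only
            logits.append(radii[i] * math.cos(ang))
        mx = max(logits)  # subtract the max for numerical stability
        denom = sum(math.exp(z - mx) for z in logits)
        total -= (logits[y] - mx) - math.log(denom)
    return total / m
```

A sample whose target-class cosine is high yields a near-zero loss, while a misclassified sample is penalized heavily, as in standard ArcFace.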
The invention is an improvement based on ArcFace; the model was tested on the ImageNet evaluation and its accuracy (%) compared with ArcFace. The test results are as follows:
[Table: accuracy (%) comparison between the proposed method and ArcFace on ImageNet; rendered as an image in the original document]
According to the test results, the method outperforms ArcFace on the classification problem, improving accuracy on ImageNet by about 1%.
The invention adaptively adjusts the radius of the hypersphere assigned to a category according to the number of samples in that category, effectively solving the problem of uneven sample distribution. It can be applied to face recognition, where it further tunes and optimizes the recognition model and improves recognition accuracy.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A loss function design method for unbalanced samples, characterized by comprising the following steps:
step 1, setting the neural network, training data and loss function for deep learning;
step 2, obtaining samples from the training data and counting the number of individuals contained in each category;
step 3, adjusting the radius of the hypersphere onto which each category's normalized features are mapped according to the number of individuals, with larger sample counts receiving larger radii;
step 4, iterating steps 2 and 3 and calculating the loss function for each batch of samples from the hypersphere radii.
2. The loss function design method for unbalanced samples according to claim 1, characterized in that step 1 sets the neural network, training data and loss function for deep learning; specifically, the deep learning network is ResNet50, the image data set is ImageNet, the training data is MS1M, and the loss function is ArcFace.
3. The loss function design method for unbalanced samples according to claim 1, characterized in that step 2 counts the number of individuals contained in each category of the sample; specifically, the counts are class1_num, class2_num, class3_num, …, classN_num, the sample containing N categories in total.
4. The loss function design method for unbalanced samples according to claim 3, characterized in that step 3 specifically comprises the following steps:
step 301, defining a mapping coefficient λ for each category as the cube root of classi_num rounded to one decimal place, where i denotes the ith category:
λi = round(math.pow(classi_num, 1.0/3), 1)
letting λk be the smallest of the mapping coefficients λ, and S be the radius of the hypersphere onto which that category is mapped;
step 302, calculating the radius Si of the hypersphere onto which each remaining category is mapped:
Si = (λi/λk)*S
where λi is the mapping coefficient of the ith category.
5. The loss function design method for unbalanced samples according to claim 4, characterized in that the loss function for each batch of samples is

L = -(1/m) · Σ_{j=1..m} log( exp(R_j·cos(θ_yj + t)) / ( exp(R_j·cos(θ_yj + t)) + Σ_{i=1..n, i≠yj} exp(R_i·cos θ_i) ) )

where m is the number of samples in each batch, n is the total number of classes in the whole sample, t is the margin between classes, R_j is the hypersphere radius of the class to which the jth sample belongs, θ_yj is the angle between the feature vector of the jth sample and its corresponding weight vector, and θ_i is the angle between the sample's feature vector and the center of the ith class.
CN202010233575.3A 2020-03-30 2020-03-30 Loss function design method for unbalanced samples Pending CN111126577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010233575.3A CN111126577A (en) 2020-03-30 2020-03-30 Loss function design method for unbalanced samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010233575.3A CN111126577A (en) 2020-03-30 2020-03-30 Loss function design method for unbalanced samples

Publications (1)

Publication Number Publication Date
CN111126577A true CN111126577A (en) 2020-05-08

Family

ID=70494039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010233575.3A Pending CN111126577A (en) 2020-03-30 2020-03-30 Loss function design method for unbalanced samples

Country Status (1)

Country Link
CN (1) CN111126577A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679860A (en) * 2015-02-27 2015-06-03 北京航空航天大学 Classifying method for unbalanced data
CN107391569A (en) * 2017-06-16 2017-11-24 阿里巴巴集团控股有限公司 Identification, model training, Risk Identification Method, device and the equipment of data type
CN108846340A (en) * 2018-06-05 2018-11-20 腾讯科技(深圳)有限公司 Face identification method, device and disaggregated model training method, device, storage medium and computer equipment
DE102018009315A1 (en) * 2017-11-27 2019-05-29 Nvidia Corporation Deep learning method for separating reflection and transmission images that are visible on a semi-reflective surface in a computer image of a real world scene
US10429486B1 (en) * 2017-08-18 2019-10-01 DeepSig Inc. Method and system for learned communications signal shaping


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄国宏 et al., "A New RBF Neural Network Classification Algorithm" (一种新的RBF神经元网络分类算法), Computer Simulation (《计算机仿真》) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508