CN114638336A - Unbalanced learning focusing on strange samples - Google Patents

Unbalanced learning focusing on strange samples

Info

Publication number
CN114638336A
CN114638336A (application CN202111606351.3A)
Authority
CN
China
Prior art keywords
model
sample
samples
loss function
strange
Prior art date
Legal status
Granted
Application number
CN202111606351.3A
Other languages
Chinese (zh)
Other versions
CN114638336B (en)
Inventor
胡祝华 (Hu Zhuhua)
赵瑶池 (Zhao Yaochi)
Current Assignee
Hainan University
Original Assignee
Hainan University
Priority date
Filing date
Publication date
Application filed by Hainan University filed Critical Hainan University
Priority to CN202111606351.3A
Publication of CN114638336A
Application granted
Publication of CN114638336B
Legal status: Active


Classifications

    • G06N3/045 Combinations of networks
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N5/04 Inference or reasoning models
    • Y02T10/40 Engine management systems

Abstract

The imbalance learning method focusing on strange samples can be used for imbalanced training and inference of classification models based on deep neural networks. The method takes the network logit output of a sample as an index of how familiar the model is with that sample, and treats samples with lower logit values as samples unfamiliar (strange) to the model. Specifically, during model training the loss function of the invention is a cost-sensitive loss based on the sample logit value, with an instance-level and a class-level variant. The invention improves the intra-class aggregation characteristics of samples during training and reduces the negative influence of annotation errors on imbalanced learning. During model inference, the invention adopts an offset strategy: the optimal offset parameter of the classifier is first obtained from the logit offset on the validation set, and inference is then performed on the test set with this offset parameter. The offset strategy corrects classification errors caused by the residual bias of the model during inference.

Description

Unbalanced learning focusing on strange samples
Technical Field
The invention relates to machine learning, and in particular to an imbalance learning method focusing on strange samples, which can be used for imbalanced training and inference of classification models based on deep neural networks.
Background
Class imbalance refers to a quantitative imbalance between different classes in the data. Over the past few years, with the continued development of machine learning, deep artificial neural networks have made great progress. Deep neural network models are typically trained on carefully designed data sets, which are usually balance-distributed. In the real world, however, class-imbalanced data is ubiquitous in training scenarios because objects, events and behaviors occur with different frequencies. Without a correction mechanism during training, the imbalance in the data set leads to poor recognition performance of the deep neural network model on minority classes. With the wide application of deep learning, learning high-performance models from class-imbalanced or long-tail-distributed data sets has become an urgent problem.
Existing methods for the class-imbalance problem can be divided into data-based and algorithm-based methods. In data-based approaches, one common way is to resample the training data by over-sampling minority classes, under-sampling majority classes, or a combination of both, but this changes the distribution of the data. Another way is an ensemble of classifiers, each trained on a different sample of the original data set. In recent years, to better learn the distribution of the data set, researchers have also expanded data sets through sample generation, thereby improving model performance. In algorithm-based approaches, cost-sensitive loss functions are widely adopted: by introducing a cost (weight) for each class or each sample, they directly or indirectly emphasize the importance of minority classes and suppress that of majority classes. Compared with data-based methods, such methods are intuitive, easier to implement, and widely used in deep learning. In the widely used cost-sensitive loss functions, the cost is based on the loss or the class probability of the sample. However, these methods only exploit the classification features between classes, discard the aggregation features inside the classes, and introduce labeling errors into the computation of the cost-sensitive loss, which easily leads to suboptimal models. Moreover, in densely labeled tasks, such as image segmentation in computer vision, the labeled contour often deviates from the real contour by several pixels because of the workload and tedium of annotation, producing a large number of wrongly labeled samples near the target contour. With existing cost-sensitive losses based on loss or class probability, these labeling errors enter the computation of the loss and hinder the improvement of model performance during training.
Disclosure of Invention
1. The imbalance learning method focusing on strange samples can be used for imbalanced training and inference of classification models based on deep neural networks. The method takes the network logit output of a sample as an index of how familiar the model is with that sample, and treats samples with lower logit values as samples unfamiliar to the model. Specifically, during model training the loss function is a cost-sensitive loss based on the sample logit value; during model inference, an offset strategy is adopted: the optimal offset parameter of the classifier is first obtained from the logit offset on the validation set, and inference is then performed on the test set with this offset parameter.
The idea of focusing on unfamiliar-sample learning is adopted: the traditional cross-entropy loss function is improved into an instance-level cost-sensitive loss function focusing on strange samples (IFSL), and the traditional class-frequency-balanced cross-entropy loss function (BCE_F, balanced Cross Entropy based on class Frequency) is improved into a class-level cost-sensitive loss function focusing on strange samples (CFSL).
2. The instance-level cost-sensitive loss function IFSL focusing on strange samples is calculated as
[Equation (1): the IFSL loss; rendered as an image in the original.]
where x_i is the input of the ith sample, θ denotes the parameters of the deep network model, p_i is the probability output of the ith sample on the network model, p_it is the probability output for the true class of the ith sample, y_i is the label of the ith sample, f̃_it is the logit output of the ith sample on the network model linearly transformed to a value in [0,255], and μ_t is the logit-based cost weight, defined as:
[Equation (2): the definition of μ_t; rendered as an image in the original.]
where T ∈ (0,1) specifies the proportion of samples treated as strange, and μ ∈ (1,2) specifies the degree of focusing on strange samples.
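Equations (1) and (2) are rendered as images in the original and cannot be recovered verbatim. As a hedged reading consistent with the variable definitions above, IFSL is a cross entropy weighted by μ_t, where μ_t switches between μ and 1 according to whether the rescaled true-class logit f̃_it falls below the strangeness threshold 255·T. The following is a plausible reconstruction, not the patent's exact formulas:

```latex
% Plausible forms of (1)-(2), reconstructed from the surrounding
% variable definitions; the patent's verbatim equations are images.
\mathcal{L}_{\mathrm{IFSL}}(x_i;\theta)
  = -\,\mu_t\big(\tildeف f_{it}\big)\,\log p_{it},
\qquad
\mu_t\big(\tilde f_{it}\big) =
\begin{cases}
\mu, & \tilde f_{it} \le 255\,T \quad (\text{strange sample}),\\
1,   & \tilde f_{it} > 255\,T.
\end{cases}
```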
3. The class-level cost-sensitive loss function CFSL focusing on strange samples is calculated as
[Equation (3): the CFSL loss; rendered as an image in the original.]
where x_i, θ, y_i, f̃_it and μ_t have the same meanings as in point 2 above.
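To make the two variants concrete, the following is a minimal PyTorch sketch under the hedged reading above. The rescaling of logits to [0,255], the thresholding rule for μ_t, and the inverse-class-frequency rebalancing assumed for the BCE_F part of CFSL are assumptions, not the patent's verbatim method:

```python
import torch
import torch.nn.functional as F

def strange_sample_weight(logits, targets, T=0.2, mu=1.6):
    """Logit-based cost weight mu_t (hedged reading of Eq. (2)):
    true-class logits are linearly rescaled to [0, 255]; samples
    below the threshold 255*T count as strange and get weight
    mu > 1, all others get weight 1."""
    f_true = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    f_min, f_max = logits.min(), logits.max()
    f_tilde = 255.0 * (f_true - f_min) / (f_max - f_min + 1e-12)
    return torch.where(f_tilde <= 255.0 * T,
                       torch.full_like(f_tilde, mu),
                       torch.ones_like(f_tilde))

def ifsl_loss(logits, targets, T=0.2, mu=1.6):
    """Instance-level variant: strangeness-weighted cross entropy."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    return (strange_sample_weight(logits, targets, T, mu) * ce).mean()

def cfsl_loss(logits, targets, class_freq, T=0.001, mu=1.5):
    """Class-level variant: the same strangeness weight combined
    with inverse-class-frequency rebalancing (assumed BCE_F part)."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    w_cls = (1.0 / class_freq.float())[targets]   # per-sample class weight
    w_cls = w_cls / w_cls.mean()                  # keep the loss scale stable
    return (strange_sample_weight(logits, targets, T, mu) * w_cls * ce).mean()
```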
4. The model inference procedure using the offset strategy is:
1) First, the optimal offset parameter of the classifier in the model is obtained using equation (4).
[Equation (4): selection of the optimal offset δ*; rendered as an image in the original.]
where H(·) is the performance metric of the model on the validation set, p = {p_0, p_1} is the prediction probability, defined as shown in equation (5), y = {y_0, y_1} is the label value, and b = {b_0, b_1} is the offset function, defined as shown in equation (6).
[Equation (5): the prediction probability computed from the offset logits; rendered as an image in the original.]
where f_c is the logit value of the sample on class c,
[Equation (6): the definition of the offset function b; rendered as an image in the original.]
In equation (4), δ* denotes the optimal offset, that is, the δ that maximizes the model performance on the validation set.
2) The predicted class is then inferred by the model's classifier using equation (7).
[Equation (7): the predicted class C_pre; rendered as an image in the original.]
where C_pre is the predicted class, p_c is the probability of class c, computed as shown in equation (5), and δ is the offset value obtained in step 1).
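Since equations (4)-(7) are rendered as images, their exact form is not recoverable. The following NumPy sketch shows one plausible instantiation for the binary case, assuming the offset b = {0, δ} is added to the minority-class logit before the softmax and δ* is found by a one-dimensional search on the validation set (function and variable names are illustrative):

```python
import numpy as np

def predict_with_offset(logits, delta):
    """Shift the minority-class (class-1) logit by delta before the
    softmax, then take the argmax (hedged reading of Eqs. (5)-(7))."""
    shifted = logits.copy()
    shifted[:, 1] += delta                        # b = {0, delta}, assumed
    e = np.exp(shifted - shifted.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1)

def search_offset(val_logits, val_labels, metric, deltas):
    """Equation (4) as a 1-D grid search: pick the delta that
    maximizes the validation metric H."""
    scores = [metric(val_labels, predict_with_offset(val_logits, d))
              for d in deltas]
    return deltas[int(np.argmax(scores))]

# Usage sketch (names hypothetical): find delta* on the validation
# split, then reuse it on the test split.
# delta_star = search_offset(val_logits, val_labels, f1_score,
#                            np.linspace(-5, 5, 101))
# test_pred = predict_with_offset(test_logits, delta_star)
```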
The invention has the beneficial effects that:
the imbalance learning method focusing on the strange samples is used for imbalance learning and reasoning of the classification model based on the deep neural network, and can improve the aggregation characteristics of the samples in the class and reduce the negative influence of annotation errors on the imbalance learning in the training process, so that the classification performance of the training model is improved. In addition, the inference strategy based on the locality migration can correct the classification error of the model caused by the migration of the model in the inference process.
The existing deep neural network model is usually trained on a balanced distributed data set, and the classification performance is low for subclasses. With the development of the in-place application of deep learning technology in the real world, the unbalanced data ubiquitous in the real world increasingly becomes a bottleneck limiting the deep learning performance and application. Aiming at the problem, the invention improves the existing unbalanced learning method and has wide application prospect.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is the abstract drawing of the invention;
FIG. 2 shows model performance as a function of the offset parameter in Embodiment 1;
FIG. 3 shows the inference results on the test set of models trained with different cost-sensitive loss functions in Embodiment 1;
FIG. 4 shows the salient boxes generated by the models trained with CFSL and BCE_F in Embodiment 2;
FIG. 5 shows the segmentation results using the offset strategy and the conventional strategy in Embodiment 2.
Detailed description of the preferred embodiments
The concept, specific steps and technical effects of the present invention are described clearly and completely below in conjunction with the embodiments and FIGS. 2 to 5, so that the objects, features and effects of the invention can be fully understood. It is to be understood that the described embodiments are merely exemplary; functional, methodological, or structural equivalents or substitutions made by those of ordinary skill in the art based on these embodiments are within the scope of the present invention.
1. The imbalance learning method focusing on strange samples can be used for imbalanced training and inference of classification models based on deep neural networks. The method takes the network logit output of a sample as an index of how familiar the model is with that sample, and treats samples with lower logit values as samples unfamiliar to the model. Specifically, during model training the loss function is a cost-sensitive loss based on the sample logit value; during model inference, an offset strategy is adopted: the optimal offset parameter of the classifier is first obtained from the logit offset on the validation set, and inference is then performed on the test set with this offset parameter.
Since cost-sensitive loss functions divide into instance-level and class-level functions, the cost-sensitive loss focusing on strange-sample learning in the invention likewise has an instance-level and a class-level form. By the idea of focusing on unfamiliar-sample learning, the traditional cross-entropy loss function is improved into the instance-level cost-sensitive loss function focusing on strange samples (IFSL), and the traditional class-frequency-balanced cross-entropy loss function (BCE_F) is improved into the class-level cost-sensitive loss function focusing on strange samples (CFSL).
2. The instance-level cost-sensitive loss function IFSL focusing on strange samples is calculated as
[Equation (1); rendered as an image in the original.]
where x_i is the input of the ith sample, θ denotes the parameters of the deep network model, p_i is the probability output of the ith sample on the network model, p_it is the probability output for the true class of the ith sample, y_i is the label of the ith sample, f̃_it is the logit output of the ith sample on the network model linearly transformed to a value in [0,255], and μ_t is the logit-based cost weight, defined as:
[Equation (2); rendered as an image in the original.]
where T ∈ (0,1) specifies the proportion of samples treated as strange, and μ ∈ (1,2) specifies the degree of focusing on strange samples.
3. The class-level cost-sensitive loss function CFSL focusing on strange samples is calculated as
[Equation (3); rendered as an image in the original.]
where x_i, θ, y_i, f̃_it and μ_t have the same meanings as described in point 2 above.
4. The model inference procedure using the offset strategy is:
1) First, the optimal offset parameter of the classifier in the model is obtained using equation (4).
[Equation (4); rendered as an image in the original.]
where H(·) is the performance metric of the model on the validation set, p = {p_0, p_1} is the prediction probability, defined as shown in equation (5), y = {y_0, y_1} is the label value, and b = {b_0, b_1} is the offset function, defined as shown in equation (6).
[Equation (5); rendered as an image in the original.]
where f_c is the logit value of the sample on class c,
[Equation (6); rendered as an image in the original.]
In equation (4), δ* denotes the optimal offset, that is, the δ that maximizes the model performance on the validation set.
2) The predicted class is then inferred by the model's classifier using equation (7).
[Equation (7); rendered as an image in the original.]
where C_pre is the predicted class, p_c is the probability of class c, computed as shown in equation (5), and δ is the offset value obtained in step 1).
The invention is implemented on two typical vision tasks: a semantic segmentation task, and the salient object detection subtask of salient instance segmentation. Since instance-level and class-level cost-sensitive methods have different application scenarios, the IFSL of this disclosure is implemented on the semantic segmentation task, as described in Embodiment 1, and the CFSL is implemented on the salient object detection subtask of salient instance segmentation, as described in Embodiment 2.
Example 1:
image semantic segmentation is to assign a semantic label to each pixel in an image, and has wide application, such as scene understanding and automatic driving. The image semantic segmentation task in this embodiment is implemented on the PASCAL VOC 2012 augmentent variant. The pascalloc 2012 augmenter is the most popular data set for image semantic segmentation. It contains 10582 images for training and 1449 images for testing ([1] Evargham, M., et al, The past visual object class challenge: A retroactive, Int J ComputVision, 2015.[2] Hariharan, B., et al, magnetic controls from change detectors in 2011International Conference on Computer Vision (CVPR), 2011.). Because the original data set is class balanced, the present embodiment makes a class unbalanced variant of the original data set. It consists of the most challenging classes of chairs and non-chairs, with the frequency ratio of the small class to the large class being about 1: 100.
DeepLabv3+, a popular semantic segmentation network, is used as the base network in the experiments of this embodiment. This embodiment compares IFSL with current state-of-the-art losses, namely CE, BCE_C, BCE_F, FL, PF, BENS, and CFL ([3] Chen, L.C., et al., Encoder-decoder with atrous separable convolution for semantic image segmentation, In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. [4] Fan, R., et al., S4Net: Single stage salient-instance segmentation, In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [5] Lin, T.Y., et al., Focal loss for dense object detection, In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. [6] Du, J., et al., Parameter-free loss for class-imbalanced deep learning in image classification, IEEE Transactions on Neural Networks and Learning Systems, 2021. [7] Cui, Y., et al., Class-balanced loss based on effective number of samples, In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [8] Zhao, Y., et al., Constrained-focal-loss based deep learning for segmentation of spores, IEEE Access, 2019.). Among these methods, BCE_C, BCE_F, PF, CFL, and BENS are class-level methods, and the rest are instance-level methods. For each comparison method, this embodiment searches for the best hyper-parameters to obtain the best performance. MIoU (Mean Intersection over Union between the predicted set and the ground-truth set) is adopted as the performance evaluation index for semantic segmentation.
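For reference, MIoU is the per-class intersection-over-union between the predicted and ground-truth masks, averaged over classes. A minimal sketch of this standard metric (illustrative, not code from the patent):

```python
import numpy as np

def mean_iou(pred, gt, num_classes=2):
    """Mean Intersection over Union, averaged over the classes that
    occur (binary here, matching the chair / non-chair variant)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))
```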
Table 1 shows the evaluation index on the test set of the models obtained by training the base network with IFSL and with the comparison methods. Among all methods, IFSL performs best.
TABLE 1 MIoU comparison (%) of IFSL and comparison methods on PASCAL VOC 2012 Augmented
[Table 1 is rendered as an image in the original.]
This embodiment performed a number of experiments to analyze the effect of the hyper-parameters in IFSL on model performance. The value of T was set to 0.15, 0.20 and 0.25; for each value of T, MIoU was tested with μ = 1.2, 1.4, 1.6 and 1.8. The optimal combinations of T and μ and the corresponding performance are shown in Table 2. The best performance is achieved when T = 0.2 (meaning that 20% of all samples are treated as strange) and μ = 1.6. In addition, a rule can be observed from Table 2: when the value of T is large (the proportion of strange samples is high), μ should be set to a relatively small value (strange samples should be emphasized less). This rule accords with intuition.
TABLE 2 Effect of hyper-parameters in IFSL on model Performance
[Table 2 is rendered as an image in the original.]
Since the data distribution of the PASCAL VOC 2012 Augmented variant is severely imbalanced, the model trained with IFSL still carries a bias, and this embodiment adopts the offset inference strategy to correct its classification errors. The test set is first randomly split into two equal parts, one serving as the validation set and the other as the new test set. The model performance is then tested on the validation set with different values of δ; the MIoU-versus-δ curve is shown in FIG. 2. MIoU is a concave function of δ, and the optimal δ* is obtained according to equation (4) (its value is rendered as an image in the original).
Finally, the conventional inference strategy and the offset inference strategy of the invention are compared on the new test set. With the conventional inference strategy, the MIoU of the model is 70.4%; with the offset strategy, it reaches 71.0%, an improvement of 0.6 percentage points.
FIG. 3 shows the classification results on the test set of models trained with different cost-sensitive loss functions. For CE, some object pixels, especially boundary pixels of objects (see columns 1 and 4), are misclassified as background pixels because they receive insufficient attention during training. For FL, the minority class is under-emphasized, and target pixels are likewise misclassified as background (see columns 1 and 4). These phenomena also occur for CFL and BENS, particularly BENS. For IFSL, only slight misclassifications occur. Furthermore, the segmentation result can be further improved using the offset strategy, as shown in the last row.
Example 2:
salient instance segmentation segments the salient region and identifies each instance, with a wide range of applications, such as accurate image editing and weakly supervised segmentation. Significant instance partitioning includes two consecutive subtasks: salient object detection and salient segmentation. In the saliency segmentation phase, the samples used for training are class-balanced, but in the saliency target detection phase, there is an extreme class imbalance between saliency boxes and non-saliency boxes (class-frequency ratio of about 1: 1000). Furthermore, there are a large number of recommendation boxes that have some significance, but are labeled as non-significance boxes. This makes the distribution of non-salient blocks very complex. The embodiment performs the unbalanced learning of the deep neural network model on the saliency target detection subtask, and improves the classification between the saliency frame and the non-saliency frame by using the CFSL in the invention, thereby improving the saliency instance segmentation.
The salient instance segmentation in this embodiment uses the ILSO dataset ([9] G. Li, et al., Instance-Level Salient Object Segmentation, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.). This embodiment uses S4Net [4] as the base network. Following the standard COCO protocol, mAP (mean average precision) is used to evaluate the proposed method, and mAP0.5 and mAP0.7 are reported.
This embodiment compares CFSL with existing cost-sensitive methods, including BCE_F, FL, OHEM, PF, BENS, and CFL ([4] Fan, R., et al., S4Net: Single stage salient-instance segmentation, In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [6] Du, J., et al., Parameter-free loss for class-imbalanced deep learning in image classification, IEEE Transactions on Neural Networks and Learning Systems, 2021. [7] Cui, Y., et al., Class-balanced loss based on effective number of samples, In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [10] Shrivastava, A., et al., Training region-based object detectors with online hard example mining, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.). These are the most advanced imbalanced-learning methods in object detection and image classification tasks. FL is an instance-level method, OHEM can be considered a combination of instance-level and class-level methods, and the others are class-level methods. For the existing methods, this embodiment searches the entire parameter space for the best hyper-parameters to obtain the best performance.
Because of the extreme class imbalance of the data used to train the saliency detection model (imbalance ratio about 1:1000), the model remains biased even after imbalanced training. This embodiment therefore uses the offset inference strategy during inference and compares it with the conventional inference strategy.
TABLE 3 comparison of the performance of CFSL and existing methods
Method FL OHEM PF BCE_F CFL BENS CFSL
mAP0.5 87.5 86.3 81.7 86.3 85.7 86.6 87.6
mAP0.7 61.5 63.0 58.0 63.6 63.0 64.4 64.5
First, this embodiment quantitatively compares CFSL with the existing methods, as shown in Table 3. Among all methods, CFSL performs best: compared with its base method BCE_F, CFSL improves mAP0.5 and mAP0.7 by 1.3 and 0.9 percentage points, respectively. The advantage of CFSL can be attributed to the improvement of intra-class aggregation characteristics: with CFSL, the model learns better saliency features that are more consistent with human visual perception, so it can score the salient boxes more accurately, which in turn improves the segmentation performance of salient instance segmentation.
Then, this embodiment qualitatively compares CFSL with its base method BCE_F. FIG. 4 shows the salient boxes produced by the models trained with these two loss functions. In each image, the n ground-truth salient boxes (n being the true number of salient objects) are marked in white, and the top-n predicted boxes are marked in green and blue in order of prediction probability, as shown in the first row. The BCE_F method gives high scores to some severely incomplete boxes (see the box in the first image and the blue box in the second image), wrongly judges some non-salient boxes as salient (see images 3 and 4), and misses some salient boxes (see the man in image 3 and the cup in the bottom-right corner of image 4). In contrast, the CFSL method reduces the frequency of these phenomena.
To analyze the effect of the hyper-parameters in CFSL on model performance, this embodiment performed experiments on the ILSO dataset. Through experiments, strange samples were found to come mainly from the majority class. Therefore, the value of the parameter T in equation (2) is set to 0.00075, 0.001, 0.00125 and 0.0015, which makes the number of strange samples approximately equal to the number of minority-class samples and avoids the overwhelming effect of the majority class. In addition, the value of μ in equation (2) is set to 1.2, 1.5 and 1.8.
The optimal combinations of T and μ and the corresponding mAP0.5 and mAP0.7 performance are shown in Table 4. The best performance is achieved when T = 0.001 (i.e., the number of strange samples is approximately equal to the number of minority-class samples) and μ = 1.5. It can also be observed that when the value of T is relatively large, μ should be set to a relatively small value, similar to the experimental results for IFSL. Furthermore, in the best combination of T and μ (see the second data row of Table 4), the value of μ is 1.5, which is also similar to the IFSL results.
TABLE 4 Effect of the hyper-parameters in CFSL on model performance (%)
T Optimal μ mAP0.5 mAP0.7
0.00075 1.5 85.9 64.3
0.00100 1.5 87.6 64.5
0.00125 1.2 86.0 63.8
0.00150 1.2 86.9 64.7
Due to the extreme class imbalance (about 1000:1) in the object detection phase, the model obtained by imbalanced learning still carries a bias, and this embodiment uses the offset strategy during inference to correct its misclassifications. Since the mAP value in salient instance segmentation is calculated over 20 predicted salient instances while the true number of salient instances is always smaller, this embodiment uses the F1-score metric as H in equation (4), obtains the optimal offset δ* on the validation set (its value is rendered as an image in the original), and then uses it on the test set to evaluate the F1-score of the model.
This embodiment compares the segmentation results of the offset strategy and the conventional strategy on the ILSO dataset; the quantitative and qualitative comparisons are shown in Table 5 and FIG. 5. As can be seen from Table 5, the offset strategy of the invention greatly improves the segmentation performance on the test set compared with the conventional inference strategy. As can be seen from FIG. 5, a large number of non-salient boxes are wrongly judged as salient with the conventional inference strategy; with the offset strategy of the invention, such classification errors are greatly reduced and the segmentation result is improved.
TABLE 5 F1-score comparison of inference using the offset strategy and the conventional strategy
[Table 5 is rendered as an image in the original.]
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. The details of the embodiments are not to be interpreted as limiting the scope of the invention, and any obvious changes, such as equivalent alterations, simple substitutions and the like, based on the technical solution of the invention, can be interpreted without departing from the spirit and scope of the invention.

Claims (4)

1. An imbalance learning method focusing on strange samples, usable for imbalanced training and inference of classification models based on deep neural networks, wherein the method takes the network logit output of a sample as an index of how familiar the model is with that sample, and treats samples with lower logit values as samples unfamiliar to the model; specifically, during model training the loss function is a cost-sensitive loss based on the sample logit value, and during model inference an offset strategy is adopted: the optimal offset parameter of the classifier is first obtained from the logit offset on the validation set, and inference is then performed on the test set with this offset parameter.
The idea of focusing on strange-sample learning is adopted: the traditional cross-entropy loss function is improved into an instance-level cost-sensitive loss function focusing on strange samples (IFSL), and the traditional class-frequency-balanced cross-entropy loss function (BCE_F) is improved into a class-level cost-sensitive loss function focusing on strange samples (CFSL).
2. The method of claim 1, wherein the instance-level cost-sensitive loss function IFSL focusing on strange samples is calculated as
[Equation (1); rendered as an image in the original.]
where x_i is the input of the ith sample, θ denotes the parameters of the deep network model, p_i is the probability output of the ith sample on the network model, p_it is the probability output for the true class of the ith sample, y_i is the label of the ith sample, f̃_it is the logit output of the ith sample on the network model linearly transformed to a value in [0,255], and μ_t is the logit-based cost weight, defined as:
[Equation (2); rendered as an image in the original.]
wherein T ∈ (0,1) specifies the proportion of samples treated as strange, and μ ∈ (1,2) specifies the degree of focusing on strange samples.
3. The method of claim 1, wherein the class-level cost-sensitive loss function CFSL focusing on strange samples is calculated as
[Equation (3); rendered as an image in the original.]
where x_i, θ, y_i, f̃_it and μ_t have the meanings given in claim 2.
4. The method of claim 1, wherein the inference step using the offset strategy is:
1) First, the optimal offset parameter of the classifier in the model is obtained using equation (4).
[Equation (4); rendered as an image in the original.]
where H(·) is the performance metric of the model on the validation set, p = {p_0, p_1} is the prediction probability, defined as shown in equation (5), y = {y_0, y_1} is the label value, and b = {b_0, b_1} is the offset function, defined as shown in equation (6).
[Equation (5); rendered as an image in the original.]
where f_c is the logit value of the sample on class c,
[Equation (6); rendered as an image in the original.]
In equation (4), δ* denotes the optimal offset, that is, the δ that maximizes the model performance on the validation set.
2) The predicted class is then inferred by the model's classifier using equation (7).
[Equation (7); rendered as an image in the original.]
where C_pre is the predicted class, p_c is the probability of class c, computed as shown in equation (5), and δ is the offset value obtained in step 1).
CN202111606351.3A 2021-12-26 2021-12-26 Unbalanced learning focused on strange samples Active CN114638336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111606351.3A CN114638336B (en) 2021-12-26 2021-12-26 Unbalanced learning focused on strange samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111606351.3A CN114638336B (en) 2021-12-26 2021-12-26 Unbalanced learning focused on strange samples

Publications (2)

Publication Number Publication Date
CN114638336A true CN114638336A (en) 2022-06-17
CN114638336B CN114638336B (en) 2023-09-22

Family

ID=81946584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111606351.3A Active CN114638336B (en) 2021-12-26 2021-12-26 Unbalanced learning focused on strange samples

Country Status (1)

Country Link
CN (1) CN114638336B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050289089A1 (en) * 2004-06-28 2005-12-29 Naoki Abe Methods for multi-class cost-sensitive learning
CN107545275A (en) * 2017-07-27 2018-01-05 华南理工大学 The unbalanced data Ensemble classifier method that resampling is merged with cost sensitive learning
WO2019033636A1 (en) * 2017-08-16 2019-02-21 哈尔滨工业大学深圳研究生院 Method of using minimized-loss learning to classify imbalanced samples
CN109800810A (en) * 2019-01-22 2019-05-24 重庆大学 A kind of few sample learning classifier construction method based on unbalanced data
US20190266731A1 (en) * 2018-02-26 2019-08-29 Abc Fintech Co., Ltd. Image segmentation method and device
CN110689544A (en) * 2019-09-06 2020-01-14 哈尔滨工程大学 Method for segmenting delicate target of remote sensing image
CN111178897A (en) * 2019-12-18 2020-05-19 浙江大学 Cost-sensitive dynamic clustering method for performing rapid feature learning on unbalanced data
US10970650B1 (en) * 2020-05-18 2021-04-06 King Abdulaziz University AUC-maximized high-accuracy classifier for imbalanced datasets
CN113222035A (en) * 2021-05-20 2021-08-06 浙江大学 Multi-class imbalance fault classification method based on reinforcement learning and knowledge distillation
CN113554653A (en) * 2021-06-07 2021-10-26 之江实验室 Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration
KR20210145923A (en) * 2020-05-26 2021-12-03 동아대학교 산학협력단 A deep-learning system based small learning data-set and method of deep-learning used it


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
SALMAN H. KHAN ET AL.: "Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data", IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pages 3573-3587, DOI: 10.1109/TNNLS.2017.2732482 *
ZEJU LI ET AL.: "Overfitting of Neural Nets Under Class Imbalance: Analysis and Improvements for Segmentation", Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), pages 402-410 *
ZHE WANG ET AL.: "Cost-sensitive Fuzzy Multiple Kernel Learning for imbalanced problem", Neurocomputing, vol. 366, pages 178-193, DOI: 10.1016/j.neucom.2019.06.065 *
PING RUI ET AL.: "Cost-sensitive random forest classification algorithm for highly imbalanced data", Pattern Recognition and Artificial Intelligence, vol. 33, no. 3, pages 249-257 (in Chinese) *
ZHANG CHIMING ET AL.: "Deep-learning-based diagnosis method for common chest lesions", Computer Engineering, vol. 46, no. 7, pages 306-320 (in Chinese) *
ZHU ZEMIN ET AL.: "Remote sensing monitoring of mining-area environment based on channel attention and cost-sensitive improved image segmentation", Mining Research and Development, vol. 41, no. 8, pages 183-191 (in Chinese) *
TAN JIEFAN ET AL.: "Imbalanced image classification method based on convolutional neural network and cost sensitivity", Journal of Computer Applications, vol. 38, no. 7, pages 1862-1871 (in Chinese) *

Also Published As

Publication number Publication date
CN114638336B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
Ru et al. Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
EP3767536A1 (en) Latent code for unsupervised domain adaptation
CN111160469B (en) Active learning method of target detection system
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
US11295240B2 (en) Systems and methods for machine classification and learning that is robust to unknown inputs
CN111428733A (en) Zero sample target detection method and system based on semantic feature space conversion
Zheng et al. Improvement of grayscale image 2D maximum entropy threshold segmentation method
CN111488911B (en) Image entity extraction method based on Mask R-CNN and GAN
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN117011563B (en) Road damage inspection cross-domain detection method and system based on semi-supervised federal learning
WO2015146113A1 (en) Identification dictionary learning system, identification dictionary learning method, and recording medium
CN114299362A (en) Small sample image classification method based on k-means clustering
CN114333062B (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN113033410B (en) Domain generalization pedestrian re-recognition method, system and medium based on automatic data enhancement
CN110705713A (en) Domain specific feature alignment method based on generation of countermeasure network
CN111797935B (en) Semi-supervised depth network picture classification method based on group intelligence
Kumar et al. Unsupervised fusion weight learning in multiple classifier systems
CN114638336A (en) Unbalanced learning focusing on strange samples
CN109063732B (en) Image ranking method and system based on feature interaction and multi-task learning
US11328179B2 (en) Information processing apparatus and information processing method
CN114724015A (en) Target detection method for reducing labeling requirements based on active domain adaptive learning
CN114595695A (en) Self-training model construction method for few-sample intention recognition system
CN114792114A (en) Unsupervised domain adaptation method based on black box under multi-source domain general scene
CN113313213A (en) Data set processing method for accelerating training of target detection algorithm

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant