CN113673592A

CN113673592A - Sample selection method and device and classifier training method and device

Info

Publication number: CN113673592A
Application number: CN202110933655.4A
Authority: CN
Inventors: 谷晓琳; 杨敏; 张燚; 刘科
Original assignee: Beijing Sunwise Space Technology Ltd
Current assignee: Beijing Sunwise Space Technology Ltd
Priority date: 2021-08-15
Filing date: 2021-08-15
Publication date: 2021-11-19

Abstract

The invention provides a sample selection method and a sample selection device, and a classifier training method and a classifier training device, which are applied to a single-stage target detector, wherein the sample selection method comprises the following steps: constructing a cost function considering classification loss and regression loss simultaneously; calculating the cost of all candidate samples by using a cost function according to the prediction information and the target true value output by the detection network; screening candidate samples by using prior information; sorting the screened candidate samples according to the cost before selectionNThe least costly samples are positive samples, and the remainder are negative samples. The classifier training method and the classifier training device distinguish all positive samples according to IoU values of a regression frame and a target frame, improve the contribution of high-quality candidate samples to classification loss, and inhibit the influence of low-quality samples.

Description

Sample selection method and device and classifier training method and device

Technical Field

The invention relates to the field of computer vision, in particular to target detection, and specifically relates to a sample selection method and device and a classifier training method and device.

Background

The current target detection network based on deep learning is mainly divided into a two-stage detector and a single-stage detector. A two-stage detector, such as fast-RCNN, firstly extracts a candidate region through a region generation network, and then sends the candidate region into a detection network for target classification and position regression; the two-stage detector has high detection precision, but has low speed, and is difficult to meet the real-time requirement. The single-stage detector firstly divides an input image into a plurality of grids, selects a proper sample for training by utilizing grid information, directly predicts classification and regression information on the grids, has high detection speed, can detect a target in real time, and is widely applied to various fields.

At present, a single-stage detector mainly comprises four core modules of data preprocessing, network model construction, sample selection and model training. The sample selection mainly selects a proper positive sample for a target and a proper negative sample for a background, and is one of important research contents of a single-stage detector. The sample selection methods for the single-stage detector are largely divided into two, a sample selection method based on IoU loss and a sample selection method based on the center point distance. The IoU loss-based sample selection method comprises the steps of firstly calculating the interaction-over-Union (IoU) value of a candidate sample and a target according to a manually set anchor, and selecting a value larger than a threshold valueT _IOUThe samples of (1) are positive samples, and the rest are negative samples; the sample selection method based on the central point distance selects the distance smaller than the threshold value according to the L1 distance between the grid coordinate point where the candidate sample is located and the central point of the targetT _cThe samples of (1) are positive samples and the rest are negative samples. The two sample selection methods both adopt a many-to-one mode, namely each target corresponds to a plurality of positive samples, and rich supervision information can be provided for model training.

However, the two sample selection methods have the following problems:

(1) the two sample selection methods are based on prior information, loss is calculated according to a preset anchor or a grid coordinate point of an image, prediction information of a model is not considered, and the selected sample is not the optimal sample;

(2) the two sample selection methods only consider regression loss, the target detection integrates two tasks of classification and positioning, the sample selection is carried out only considering the regression loss, and the training of a classifier is not facilitated, as shown in fig. 1;

(3) most current single-stage detectors use a focallloss trained classifier, which considers samples with low classification scores as difficult samples and increases their contribution to classification loss. With many-to-one sample selection, each target corresponds to multiple positive samples, but these positive samples are not all meaningful samples, and all positive samples need to be treated differently as shown in fig. 2.

Disclosure of Invention

In view of the above situation, the present invention provides a sample selection method and apparatus, which constructs a cost function considering both classification loss and regression loss, and selects each sample beforeNAnd the candidate sample with the minimum cost is used as a positive sample, and a classifier training method and a classifier training device are provided at the same time, so that all the positive samples are distinguished according to IoU values of a regression frame and a target frame, the contribution of high-quality candidate samples to classification loss is improved, and the influence of low-quality samples is inhibited.

In order to realize the purpose of the invention, the following scheme is adopted:

a sample selection method based on prediction information is applied to a single-stage target detector and comprises the following steps:

constructing a cost function;

calculating the cost of all candidate samples by using the cost function according to the prediction information and the target truth value output by the detection network of the single-stage target detector;

screening candidate samples by using prior information;

and sorting the screened candidate samples according to the cost, selecting the first N samples with the minimum cost as positive samples, and selecting the rest as negative samples.

Further, the cost function is constructed by adopting a weighted combination of three losses, and the functions of the three losses are as follows: the cross entropy function for classifying loss, the IoU loss function for regression loss, and the L1 loss function for the distance between the center point of the regression box and the target box.

The cost function is:

wherein the content of the first and second substances,λ _locandλ _clsweight coefficients representing regression loss and classification loss respectively,C _locthe error of the regression is represented by,C _clsrepresenting a classification error of the candidate sample with the target frame;

C _locincluding IoU loss, center point L1 distance:

wherein the content of the first and second substances,λ _IoUandλ _L1weighting factors representing IoU loss and the distance of the center point L1 respectively,C _IoUrepresenting IoU loss, by calculating the IoU values of the regression box and the target box for the sample,C _L1representing the distance of the central point L1, and obtaining the distance by calculating the central point of the regression frame of the sample and the target frame;

C _clsthe formula of (1) is:

wherein, inC _clsIn the formula (2), the weight coefficientα _clsTo balance the weights of the target class and the remaining classes,γ _clsis a regulating factor used for regulating the importance of the sample,pclassification information indicative of a prediction of the detected network,i、jrepresenting the target category index.

Further, screening candidate samples by using prior information, reserving the candidate samples with the central points in the target frame, deleting the candidate samples outside the target frame, and adopting a formula:

wherein, Ω represents prior information, and if the grid coordinate point of the candidate sample is in the target frame, Ω is_iE Ω is 1, otherwise infinity.

A sample selection device based on prediction information is applied to a single-stage target detector and comprises:

a construction module for constructing a cost function;

the calculation module is used for calculating the cost of all candidate samples by using the cost function according to the prediction information and the target true value output by the detection network of the single-stage target detector;

the screening module is used for screening the candidate samples by using the prior information;

and the selection module is used for sorting the screened candidate samples according to the cost, selecting the first N samples with the minimum cost as positive samples, and selecting the rest as negative samples.

A classifier training method based on classification and positioning information joint representation is applied to a single-stage target detector and comprises the following steps:

calculating IoU values of a regression box and a target box of the candidate sample according to the regression information of the candidate sample; the candidate samples are positive samples and negative samples selected by the sample selection method based on the prediction information;

for each target, calculating the normalized weight of the sample according to the IoU values of the regression box and the target box of all the candidate samples:

wherein the content of the first and second substances,ω _kweight coefficient representing k-th sample，q _kThe IoU values representing the regression box and the target box for the kth sample,q _lis shown aslThe IoU values of the candidate box and the target box for a sample,Nrepresenting the total number of samples corresponding to the target;

when the positive sample is used for training a classifier, weighting the classification loss of the positive sample by a weight coefficient calculated by the following formula:

wherein, in the formula,α _sfor adjusting the contribution of positive and negative samples to the classification loss,pclassification information indicative of a prediction of the detected network,yis a positive and negative sample label ify=1, then a positive sample is indicated; if it isy=0, then a negative sample is indicated,γ _sis an adjustment factor that adjusts for the contribution of different negative samples to the loss of classification.

A classifier training device based on classification and positioning information joint representation is applied to a single-stage target detector and comprises:

an IoU value module, configured to calculate IoU values of the regression box and the target box of the candidate sample according to the regression information of the candidate sample; the candidate samples are positive samples and negative samples selected by the sample selection method based on the prediction information;

a normalization module for calculating, for each target, a normalization weight for a sample based on IoU values of the regression box and the target box for all of the candidate samples:

wherein the content of the first and second substances,ω _kis shown askThe weight coefficient of each of the samples is,q _kis shown askThe IoU values for the regression box and the target box for each sample,q _lis shown aslThe IoU values of the candidate box and the target box for a sample,Nrepresenting the total number of samples corresponding to the target;

a weighting module, configured to weight a classification loss of the positive sample by a weight coefficient calculated by the following formula when training a classifier using the positive sample:

An electronic device, comprising: at least one processor and memory; wherein the memory stores computer-executable instructions; execution of computer-executable instructions stored in the memory at the at least one processor causes the at least one processor to perform the method for sample selection based on prediction information or the method for classifier training based on joint representation of classification and localization information.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, controls an apparatus in which the storage medium is located to perform the method for selecting a sample based on prediction information or the method for training a classifier based on a joint representation of classification and localization information.

The invention has the beneficial effects that:

1. by using the sample selection method based on the prediction information, the positive sample is selected according to the prediction information output by the detection network and considering the classification loss and the regression loss, so that the stability of sample selection is improved, the convergence speed of the model is increased, and the detection performance of the model is improved.

2. The classifier is trained by using joint representation based on classification and positioning information, classification errors and regression errors are considered in a loss function of the training classifier at the same time, a normalized weight coefficient is calculated through IoU values of a regression frame and a target frame of a sample, the contribution of different samples to classification loss is distinguished by using the normalized weight coefficient, the contribution of high-quality samples to the loss function is improved, and the influence of low-quality samples is restrained. In addition, the classifier is trained by using the joint representation based on the classification and regression information, so that the relevance of classification and positioning is further enhanced, and the performance of the detector is improved.

Drawings

The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.

Fig. 1 is a positive example of a prior art sample selection strategy.

Fig. 2 shows positive samples selected based on a conventional many-to-one sample selection strategy.

Fig. 3 is an ambiguity of sample selection based on IoU loss and center point L1 distance.

Fig. 4 is a flowchart of a sample selection method based on prediction information according to an embodiment of the present application.

Fig. 5 is a block diagram of a sample selection apparatus based on prediction information according to an embodiment of the present application.

Fig. 6 is a flowchart of a classifier training method based on joint representation of classification and positioning information according to an embodiment of the present application.

Fig. 7 is a block diagram of a classifier training device based on joint representation of classification and positioning information according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.

In an aspect of the embodiments of the present application, a sample selection method based on prediction information is provided, which is applied to a single-stage target detector, as shown in fig. 4, and includes the following steps:

firstly, constructing a cost function, and specifically adopting a weighted combination of three losses, wherein the functions of the three losses are as follows: the cross entropy function for classifying loss, the IoU loss function for regression loss, and the L1 loss function for the distance between the center point of the regression box and the target box.

The constructed cost function is as follows:

C _locincluding IoU loss, center point L1 distance:

wherein the content of the first and second substances,λ _IoUandλ _L1weighting factors representing IoU loss and the distance of the center point L1 respectively,C _IoUrepresenting IoU loss, by calculating the IoU values of the regression box and the target box for the sample,C _L1representing the distance of the central point L1, and obtaining the distance by calculating the central point of the regression frame of the sample and the target frame; since there is ambiguity in both the IoU loss and the L1 distance of the center point, the ambiguity of return loss can be better mitigated using a weighted combination of these two losses, as shown in fig. 3. When the IoU value is used to select samples in fig. 3, there are many IoU equal candidate samples, and as shown in fig. 3(a), the IoU values corresponding to the grids with equal anchors in the target frame are equal; and, the samples are selected by using the distance of the central point, as shown in the image of fig. 3(b), the distance loss is equal for the samples with the same central point. When the two are combined to select samples, the number of samples is greatly reducedAnd (5) sense property. As shown in the image of FIG. 3(c), isC _locThree candidate samples that are not equal.

C _clsCalculated by Fcoal Loss, the formula is:

wherein, inC _clsIn the formula (2), the weight coefficientα _clsTo balance the weights of the target class and the remaining classes,γ _clsis a regulating factor used for regulating the importance of the sample,pclassification information indicative of a prediction of the detected network,i、jrepresenting the target category index. The classification penalty not only takes into account the penalty of the target class, but also the impact of the remaining classes. The selected sample is not only less costly for the target class, but also needs to be well distinguishable from other classes.

And then, according to the prediction information and the target truth value obtained by the detection network, calculating the cost of all candidate samples by using the cost function.

Then, screening candidate samples by using prior information; due to the fact that prediction information of the network is inaccurate in the initial training stage, samples selected based on the prediction information are very unstable. In order to further improve the stability of sample selection, candidate samples are screened based on prior information, all candidate samples with a central point in a target frame are reserved, candidate samples outside the target frame are deleted, and a formula is adopted:

wherein, Ω represents prior information, which indicates whether the grid coordinate point of the candidate sample is in the target frame, and Ω represents the grid coordinate point of the candidate sample is in the target frame_mE Ω is 1, otherwise infinity. The cost of candidate samples inside the target box is a finite number, while the cost of candidate samples outside the target box is infinite.

And finally, sorting the screened candidate samples according to the cost, sorting the candidate samples from small to large according to the cost value, and selecting the first N samples with the minimum cost as positive samples and the rest as negative samples.

In another aspect of the embodiments of the present application, a sample selection apparatus based on prediction information is provided, which is applied to a single-stage target detector, as shown in fig. 5, and includes a construction module, a calculation module, a screening module, and a selection module.

The construction module is used for constructing a cost function; specifically, the method is realized by adopting a weighted combination of three losses, and the functions of the three losses are as follows: the cross entropy function for classifying loss, the IoU loss function for regression loss, and the L1 loss function for the distance between the center point of the regression box and the target box.

The cost function constructed by the construction module is as follows:

C _locincluding IoU loss, center point L1 distance:

wherein the content of the first and second substances,λ _IoUandλ _L1weighting factors representing IoU loss and the distance of the center point L1 respectively,C _IoUrepresenting IoU loss, by calculating the IoU values of the regression box and the target box for the sample,C _L1representing the distance of the central point L1, and obtaining the distance by calculating the central point of the regression frame of the sample and the target frame; l1 due to IoU loss and center pointThere is ambiguity in the distance, and as shown in fig. 3, the ambiguity of return loss can be better attenuated using a weighted combination of these two losses. When the IoU value is used to select samples in fig. 3, there are many IoU equal candidate samples, and as shown in fig. 3(a), the IoU values corresponding to the grids with equal anchors in the target frame are equal; and, the samples are selected by using the distance of the central point, as shown in the image of fig. 3(b), the distance loss is equal for the samples with the same central point. When the two are combined to select a sample, the ambiguity is greatly reduced. As shown in the image of FIG. 3(c), isC _locThree candidate samples that are not equal.

C _clsCalculated by Fcoal Loss, the formula is:

And the calculation module is used for calculating the cost of all candidate samples by using the cost function according to the prediction information and the target truth value output by the detection network of the single-stage target detector.

The screening module is used for screening the candidate samples by using the prior information, retaining all the candidate samples with the central points in the target frame, deleting the candidate samples outside the target frame, and adopting a formula:

wherein Ω represents prior information, and the prior information is a candidate sampleIf the grid coordinate point of the candidate sample is in the target frame, and then the omega is set_mE Ω is 1, otherwise infinity. The cost of candidate samples inside the target box is a finite number, while the cost of candidate samples outside the target box is infinite.

And the selection module is used for sorting the screened candidate samples according to the cost, sorting the candidate samples from small to large according to the cost value, and selecting the first N samples with the minimum cost as positive samples and the rest as negative samples.

In another aspect of the embodiments of the present application, a classifier training method based on joint representation of classification and positioning information is provided, which is applied to a single-stage target detector.

Most target detectors use a focallloss trained classifier, with the following specific formula:

wherein, in the formula,yis a sample class label, if it is a positive sample, it isy= 1; if it is a negative sample, theny=0, weight coefficientα _fTo balance the importance of the positive and negative samples,γ _fis an adjustment factor for adjusting the importance between simple samples and difficult samples,prepresenting the predicted classification score. By adopting a many-to-one sample selection method, each target corresponds to a plurality of samples, FcoalLoss is treated equally for each positive sample, and the sample with low classification score is regarded as a difficult sample, so that the contribution to the classification loss is larger. However, for each target, a low score sample is not necessarily a meaningful sample, as shown in FIG. 2.

To this end, the present example proposes a classifier training method based on joint representation of classification and positioning information, as shown in fig. 7, including the following steps:

firstly, according to regression information of a candidate sample, IoU values of a regression frame and a target frame of the candidate sample are calculated; the candidate samples are positive samples and negative samples selected by the prediction information-based sample selection method of the foregoing embodiment;

then, for each target, the normalized weight of the sample is calculated from the IoU values of the regression box and the target box for all the candidate samples:

wherein the content of the first and second substances,ω _kis shown askThe weight coefficient of each of the samples is,q _kthe IoU values representing the regression box and the target box for the kth sample,q _lis shown aslThe IoU values of the candidate box and the target box for a sample,Nrepresenting the total number of samples corresponding to the target;

then, when training the classifier using the positive sample, weighting the classification loss of the positive sample by a weight coefficient calculated by the following formula:

In another aspect of the embodiments of the present application, there is provided a classifier training device based on joint representation of classification and positioning information, applied to a single-stage object detector, as shown in fig. 7, including: IoU value module, normalization module, weighting module.

The IoU value module is used for calculating IoU values of a regression box and a target box of the candidate sample according to the regression information of the candidate sample; the candidate samples are positive samples and negative samples selected by the prediction information-based sample selection method.

The normalization module is used for calculating the normalization weight of the sample according to the IoU values of the candidate box and the target box of all the candidate samples for each target:

wherein the content of the first and second substances,ω _kis shown askThe weight coefficient of each of the samples is,q _kis shown askThe IoU values for the regression box and the target box for each sample,q _lis shown aslThe IoU values of the candidate box and the target box for a sample,Nrepresenting the total number of samples corresponding to the target.

The weighting module is used for weighting the classification loss of the positive sample through a weighting coefficient calculated by the following formula when the classifier is trained by the positive sample:

In another aspect of the embodiments of the present application, an electronic device is provided, which includes: at least one processor and memory; wherein the memory stores computer-executable instructions; execution of the computer-executable instructions stored in the memory by the at least one processor causes the at least one processor to perform the prediction information based sample selection method described in the previous embodiment or perform the classifier training method based on the joint representation of classification and localization information described in the previous embodiment.

In another aspect of the embodiments of the present application, a computer-readable storage medium stores thereon a computer program, which when executed by a processor controls an apparatus in which the storage medium is located to perform the method for selecting a sample based on prediction information according to the previous embodiment or perform the method for training a classifier based on joint representation of classification and localization information according to the previous embodiment.

The foregoing is merely a preferred embodiment of this invention and is not intended to be exhaustive or to limit the invention to the precise form disclosed. It will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention.

Claims

1. A sample selection method based on prediction information is applied to a single-stage target detector and is characterized by comprising the following steps:

constructing a cost function;

screening candidate samples by using prior information;

2. The prediction information-based sample selection method according to claim 1, wherein the cost function is constructed by a weighted combination of three losses, and the functions of the three losses are: the cross entropy function for classifying loss, the IoU loss function for regression loss, and the L1 loss function for the distance between the center point of the regression box and the target box.

3. The prediction information based sample selection method of claim 1, wherein the cost function is:

C _locincluding IoU loss, center point L1 distance:

C _clsthe formula of (1) is:

，

wherein, inC _clsIn the formula (2), the weight coefficientα _clsTo balance the weights of the target class and the remaining classes,γ _clsis a regulating factor used for regulating the importance of the sample,pclassification information indicative of a prediction of the detected network,i、jrepresenting target classesAnd (4) identifying indexes.

4. The method of claim 3, wherein the selecting of the samples based on the prediction information is performed by using the prior information to screen the candidate samples, i.e. the candidate samples with the center point in the target frame are retained, and the candidate samples outside the target frame are deleted, and using the formula:

wherein, Ω represents prior information, which indicates whether the grid point of the candidate sample is in the target frame, and if the grid coordinate point of the candidate sample is in the target frame, then Ω_iE Ω is 1, otherwise infinity.

5. A sample selection apparatus based on prediction information, applied to a single-stage target detector, comprising:

a construction module for constructing a cost function;

6. The prediction information-based sample selection apparatus of claim 5, wherein the construction module constructs the cost function as:

C _locincluding IoU loss, center point L1 distance:

C _clsthe formula of (1) is:

，

7. A classifier training method based on classification and positioning information joint representation is applied to a single-stage target detector and is characterized by comprising the following steps:

calculating IoU values of a regression box and a target box of the candidate sample according to the regression information of the candidate sample; the candidate samples are positive samples and negative samples selected by the sample selection method based on the prediction information according to any one of claims 1 to 4;

for each target, from the IoU values of all the candidate samples, the normalized weight of the sample is calculated:

wherein the content of the first and second substances,ω _irepresents the weight coefficient of the kth sample,q _kthe IoU values representing the regression box and the target box for the kth sample,q _lis shown aslThe IoU values of the candidate box and the target box for a sample,Nrepresenting the total number of samples corresponding to the target;

8. A classifier training device based on classification and positioning information joint representation is applied to a single-stage target detector and is characterized by comprising:

an IoU value module, configured to calculate IoU values of the regression box and the target box of the candidate sample according to the regression information of the candidate sample; the candidate samples are positive samples and negative samples selected by the sample selection method based on the prediction information according to any one of claims 1 to 4;

wherein the content of the first and second substances,ω _krepresents the weight coefficient of the kth sample,q _kthe IoU values representing the regression box and the target box for the kth sample,q _lis shown aslThe IoU values of the candidate box and the target box for a sample,Nrepresenting the total number of samples corresponding to the target;

9. An electronic device, comprising: at least one processor and memory; wherein the memory stores computer-executable instructions; wherein execution of the computer-executable instructions stored in the memory by the at least one processor causes the at least one processor to perform the method for selecting samples based on prediction information according to any one of claims 1-4 or to perform the method for training classifiers based on a joint representation of classification and localization information according to claim 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, controls an apparatus in which the storage medium is located to perform a method for selecting a sample based on prediction information according to any one of claims 1 to 4, or to perform a method for training a classifier based on a joint representation of classification and localization information according to claim 7.