CN112215154A - Mask-based model evaluation method applied to face detection system - Google Patents

Mask-based model evaluation method applied to face detection system

Info

Publication number
CN112215154A
Authority
CN
China
Prior art keywords
face
frames
frame
prediction
labeled
Prior art date
Legal status
Granted
Application number
CN202011091940.8A
Other languages
Chinese (zh)
Other versions
CN112215154B (en)
Inventor
孙家乐
瞿洪桂
袁丽燕
朱海明
高云丽
Current Assignee
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd filed Critical Beijing Sinonet Science and Technology Co Ltd
Priority to CN202011091940.8A priority Critical patent/CN112215154B/en
Publication of CN112215154A publication Critical patent/CN112215154A/en
Application granted granted Critical
Publication of CN112215154B publication Critical patent/CN112215154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of face detection and discloses a mask-based model evaluation method applied to a face detection system. The method comprises: obtaining a face detection model to be evaluated and dividing the labeled frames into labeled face frames and labeled mask frames; performing face detection on the test picture; calculating the IoU values between prediction frames and labeled frames and setting n different IoU thresholds; obtaining m confidence thresholds, screening out prediction frames, matching them with the labeled face frames and the labeled mask frames respectively, and establishing a confusion matrix from the matching results; traversing all the confidence thresholds and obtaining an AP value from the P-R curve; and traversing all the different IoU thresholds to obtain the evaluation index result of the face detection model. The invention distinguishes labeled face frames from labeled mask frames, so that the evaluation criterion of the face detection model focuses more on the faces that matter in real application scenes, effectively improving the accuracy of subsequent face-related tasks.

Description

Mask-based model evaluation method applied to face detection system
Technical Field
The invention relates to the field of face detection, in particular to a mask-based model evaluation method applied to a face detection system.
Background
For the face detection task, a specific face prediction result includes: coordinates and confidence of the bounding box. The face detection model evaluation method judges whether the prediction result is correct or not according to the coincidence degree between the prediction frame and the labeling frame.
In existing face detection model evaluation methods, the Average Precision (AP) of the labeled frames and prediction frames at a single intersection-over-union (IoU) threshold is used as the model quality criterion. In practice, however, a model that achieves a good AP at a single IoU threshold may perform poorly at other IoU thresholds. The degree of overlap between the prediction frame and the labeled frame strongly affects the accuracy of downstream face-related tasks (such as face key point detection, face attribute analysis and face recognition), but existing model evaluation methods do not evaluate this effectively.
Due to factors such as occlusion, pose, lighting and blur, it is hard to judge whether some labeled regions are faces at all, and even a correct detection of such a hard-to-label face frame is of little value when the face detector is used in a real scene. Existing face detection model evaluation methods either treat these hard-to-label face frames in the same way as ordinary labeled frames, which makes it impossible to distinguish how different face detection models perform on the ordinary labeled frames, or treat them as background, in which case a correct detection of such a frame is counted as a false detection, so the evaluation of the model's face predictions becomes inaccurate.
Disclosure of Invention
The invention provides a mask-based model evaluation method applied to a face detection system, aiming to solve the inaccurate evaluation that results when existing face detection model evaluation neither considers different IoU thresholds nor properly handles hard-to-label face frames.
A mask-based model evaluation method applied to a face detection system comprises the following steps:
s1) obtaining a test picture and a face detection model to be evaluated, obtaining all labeling frames in the test picture, and dividing all labeling frames in the test picture into a plurality of labeling face frames and a plurality of labeling mask frames according to environmental factors;
s2) carrying out face detection on the test picture by using the face detection model to be evaluated to obtain all prediction frames in the test picture and the prediction frame information of each prediction frame;
s3) calculating IoU values between all prediction boxes in a test picture and all annotation boxes in the test picture, setting n different IoU thresholds;
s4) setting a confidence threshold value range under the S-th IoU threshold value, and performing equidistant value taking within the confidence threshold value range to obtain m confidence threshold values, where S is 1, 2, …, and n;
s5) screening a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture, wherein i is 1, 2, … and m;
s6) matching the plurality of prediction frames with the confidence degrees larger than the ith confidence degree threshold value in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in the step S1) respectively according to the IoU value to obtain a matching result, establishing a confusion matrix according to the matching result, and calculating the accuracy and the recall rate corresponding to the ith confidence degree threshold value according to the confusion matrix;
s7) repeating the steps S5) to S6) in turn, traversing all confidence level thresholds under the S-th IoU threshold value, obtaining the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, drawing a P-R curve according to the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, and obtaining an AP value corresponding to the S-th IoU threshold value according to the P-R curve;
s8) repeating steps S4) to S7) in turn, traversing all the different IoU thresholds, obtaining a plurality of AP values corresponding to all the different IoU thresholds respectively;
s9) calculating the average of the AP values obtained in step S8), and taking this average as the evaluation index result of the face detection model.
Further, in step S6), matching, according to the IoU value, the plurality of prediction frames with the confidence degrees greater than the ith confidence threshold in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in step S1), respectively, to obtain a matching result, including the following steps:
s61) sorting a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture according to the sequence of the confidence degrees from high to low to obtain a plurality of sorted prediction frames;
s62) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled face frame in the test picture which has not yet been matched successfully, to obtain a first IoU maximum value IoU1_z, wherein the subscript z represents the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully; judging whether this maximum value IoU1_z is greater than the s-th IoU threshold; if so, the jth prediction frame in the sorted prediction frames is successfully matched with the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully, and the method proceeds to step S64); if not, the jth prediction frame in the sorted prediction frames is marked as failing to match a labeled face frame, and the method proceeds to step S63);
s63) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled mask frame in the test picture which has not yet been matched successfully, to obtain a second IoU maximum value IoU2_f, wherein the subscript f represents the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; judging whether the second maximum value IoU2_f is greater than the s-th IoU threshold; if so, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is successfully matched with the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; if not, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is marked as failing to match, and the method proceeds to step S64);
s64) repeating the steps S62) to S63) in turn, traversing the plurality of sequenced prediction boxes according to the sequence from high confidence to low confidence, and obtaining the matching results of the plurality of sequenced prediction boxes.
Further, in step S6), a confusion matrix is built according to the matching result, and the accuracy and recall corresponding to the ith confidence threshold are calculated according to the confusion matrix, including the following steps:
s61) counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
S62) establishing a confusion matrix according to the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
s63) obtaining the accuracy corresponding to the ith confidence threshold as TP_i / (TP_i + FP_i), and obtaining the recall rate corresponding to the ith confidence threshold as TP_i / (TP_i + FN_i).
Further, in step S61), counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames comprises the following steps:
s611) recording the total number of labeled face frames in the test picture as d, recording the total number of the sorted prediction frames as k, and recording that the matching result contains f1 labeled face frames which are successfully matched and f2 labeled mask frames which are successfully matched;
s612) taking the f1 successfully matched labeled face frames as positive detections, to obtain the positive detection number of the labeled face frames TP_i = f1;
S613) summing the total number f1 of successfully matched labeled face frames and the total number f2 of successfully matched labeled mask frames to obtain the total number of successfully matched labeled frames (f1 + f2), and taking the difference between the total number of the sorted prediction frames and the total number of successfully matched labeled frames as the false detection number, to obtain the false detection number of the labeled face frames FP_i = k - f1 - f2;
S614) taking the difference between the total number of labeled face frames among all the labeled frames and the total number of successfully matched labeled face frames as the missed detection number of the labeled face frames, to obtain the missed detection number of the labeled face frames FN_i = d - f1.
Further, in step S7), drawing a P-R curve according to the accuracy and the recall rate respectively corresponding to all the confidence thresholds at the s-th IoU threshold, and obtaining the AP value corresponding to the s-th IoU threshold from the P-R curve, comprises: taking the recall rate as the independent variable and the accuracy as the dependent variable, the abscissa of the P-R curve being the recall rate and the ordinate of the P-R curve being the accuracy, and the AP value being the area enclosed between the P-R curve and the abscissa after the two ends of the curve are connected vertically down to the abscissa.
Further, in step S1), the environmental factors include a face occlusion degree, a face pose, and a picture blurring degree.
Further, in step S2), the prediction box information includes coordinates of the prediction box, a width of the prediction box, a height of the prediction box, and a confidence.
The invention has the beneficial effects that:
the method distinguishes the face labeling frame and the layout labeling frame, and discriminates the face labeling frame and the layout labeling frame when calculating the confusion matrix. Only the positive detection number caused by labeling the face frame is concerned, and the influence of the labeling mask frame which is not concerned by the real application scene on the positive detection number is avoided. Meanwhile, compared with the scheme of 'not distinguishing the two types of marking frames', the number of missed detections caused by marking the layout covering frames is reduced; compared with the scheme of regarding the mark mask frame as the background, the method reduces the false detection number caused by the mark mask frame; and finally, the evaluation standard of the face detection model is more focused on the face concerned by the real application scene.
The regression accuracy of the prediction frame has a large influence on the accuracy of subsequent face-related tasks. The method comprehensively considers the performance of the face detection model at different IoU thresholds (especially high IoU thresholds) and introduces the accuracy and the recall rate into the evaluation process, effectively improving the accuracy of subsequent face-related tasks (such as face key point detection, face attribute analysis and face recognition).
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments are briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flowchart of a mask-based model evaluation method applied to a face detection system according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a P-R curve provided in the first embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, a mask-based model evaluation method applied to a face detection system, as shown in fig. 1, includes the following steps:
s1) obtaining a test picture and a face detection model to be evaluated, obtaining all labeling frames in the test picture, and dividing all labeling frames in the test picture into a plurality of labeling face frames and a plurality of labeling mask frames according to environmental factors; the environmental factors include the degree of face occlusion, face pose and picture blur.
In this embodiment, factors such as occlusion, angle, lighting and blur are considered together, and the labeling information is divided into labeled face frames and labeled mask frames. A test set is first obtained; it contains 13 scenes, 2791 pictures and 25110 labeled frames, and the face quality inside the labeled frames is uneven. Combining the occlusion degree, the face pose and the picture blur degree, labeled frames in which the facial features are hard to distinguish or more than half of the face is invisible are marked as labeled mask frames, and the other labeled frames are marked as labeled face frames. After this split, the labeling information comprises 16308 labeled face frames and 8802 labeled mask frames.
S2) carrying out face detection on the test picture by using the face detection model to be evaluated to obtain all prediction frames in the test picture and the prediction frame information of each prediction frame; the prediction box information includes coordinates of the prediction box, a width of the prediction box, a height of the prediction box, and a confidence.
S3) calculating the IoU values between all prediction frames in the test picture and all labeled frames in the test picture, and setting n different IoU thresholds; the IoU value is the ratio of the intersection area of a single prediction frame and a single labeled frame to their union area. The larger the IoU value, the closer the prediction frame is to the labeled frame and the more accurate the regression of the prediction frame position.
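As an illustrative aid (not part of the claimed method), the IoU computation described in step S3) can be sketched in Python as follows; the function name and the (x, y, width, height) box layout are assumptions made only for this example, since the embodiment merely specifies that a frame has coordinates, a width and a height:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, width, height) tuples."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h                   # intersection area
    union = aw * ah + bw * bh - inter           # union area
    return inter / union if union > 0 else 0.0  # ratio described in step S3)
```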
S4) setting a confidence threshold value range under the S-th IoU threshold value, and performing equidistant value taking within the confidence threshold value range to obtain m confidence threshold values, where S is 1, 2, …, and n.
In this embodiment, the confidence threshold range 0-1 is divided into 1000 equal parts (including 0, excluding 1), and the confusion matrix under each confidence threshold is counted separately. The finer the confidence granularity, the more accurate the evaluation result, but also the larger the amount of computation and the longer the evaluation time. Weighing these factors, this embodiment divides the confidence range into 1000 equal parts.
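A minimal sketch of this equidistant division, assuming NumPy is available (the variable name is illustrative only):

```python
import numpy as np

# 1000 equally spaced confidence thresholds in [0, 1): 0.000, 0.001, ..., 0.999
confidence_thresholds = np.linspace(0.0, 1.0, num=1000, endpoint=False)
```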
S5) screening out the prediction frames whose confidence is greater than the ith confidence threshold in the test picture, where i is 1, 2, …, m. The confidence indicates how likely the current prediction frame is to be a face rather than background, and its value ranges from 0 to 1. When prediction frames are matched with labeled frames, prediction frames with higher confidence are matched first, which avoids the situation where a prediction frame with low confidence is successfully matched with a labeled frame while a prediction frame with high confidence is then counted as a false detection.
S6) according to the IoU values, respectively matching the plurality of prediction frames with confidence greater than the ith confidence threshold in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in step S1) to obtain matching results, establishing a confusion matrix according to the matching results, and calculating the accuracy and the recall rate corresponding to the ith confidence threshold according to the confusion matrix, comprising the following steps:
s61) sorting a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture according to the sequence of the confidence degrees from high to low to obtain a plurality of sorted prediction frames;
s62) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled face frame in the test picture which has not yet been matched successfully, to obtain a first IoU maximum value IoU1_z, wherein the subscript z represents the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully; judging whether this maximum value IoU1_z is greater than the s-th IoU threshold; if so, the jth prediction frame in the sorted prediction frames is successfully matched with the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully, and the method proceeds to step S64); if not, the jth prediction frame in the sorted prediction frames is marked as failing to match a labeled face frame, and the method proceeds to step S63);
in this embodiment, all the label boxes include a field that is matched to be successful, where: marking the matched labeled face frame as 1; marking the matched mark mask frame as-1; and marking the marked face box and the marked layout box which are not matched to be successful as 0. Current step S62), the prediction box only tries to match with the labeled face box that has not been matched successfully.
S63) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled mask frame in the test picture which has not yet been matched successfully, to obtain a second IoU maximum value IoU2_f, wherein the subscript f represents the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; judging whether the second maximum value IoU2_f is greater than the s-th IoU threshold; if so, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is successfully matched with the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; if not, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is marked as failing to match, and the method proceeds to step S64);
current step S63) corresponds to relaxing the criterion of the face detection model for whether the face represented by the callout box is correctly detected. The successfully matched labeling mask frame is not subjected to positive detection, and the undetected labeling mask frame is not subjected to missed detection, so that a transition zone is provided for the face detection model to judge whether the labeling mask frame is detected by the part of the face frame which is difficult to label. And the evaluation result is more focused on a real face recognition application scene, namely whether the face detection model is used for detecting the marked face frame positively or not and detecting the background wrongly or not.
S64) repeating the steps S62) to S63) in turn, traversing the plurality of sequenced prediction boxes according to the sequence from high confidence to low confidence, and obtaining the matching results of the plurality of sequenced prediction boxes.
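For illustration only, steps S61) to S64) can be sketched as the greedy matching routine below. It reuses the iou helper sketched earlier; the function name, argument layout and return values are assumptions rather than part of the patent:

```python
def match_predictions(pred_boxes, face_boxes, mask_boxes, iou_thr):
    """Greedy matching of steps S61)-S64): prediction frames are taken in order of
    decreasing confidence and matched first against not-yet-matched labeled face
    frames, then against not-yet-matched labeled mask frames.
    pred_boxes: list of (box, confidence); face_boxes, mask_boxes: lists of boxes.
    Returns (f1, f2): counts of matched labeled face frames and labeled mask frames."""
    preds = sorted(pred_boxes, key=lambda p: p[1], reverse=True)  # step S61)
    face_used = [False] * len(face_boxes)
    mask_used = [False] * len(mask_boxes)
    f1 = f2 = 0
    for box, _conf in preds:
        # Step S62): best IoU over labeled face frames not yet matched.
        best_z, best_face_iou = -1, 0.0
        for z, gt in enumerate(face_boxes):
            if face_used[z]:
                continue
            v = iou(box, gt)
            if v > best_face_iou:
                best_z, best_face_iou = z, v
        if best_z >= 0 and best_face_iou > iou_thr:
            face_used[best_z] = True
            f1 += 1
            continue  # proceed to the next prediction frame (step S64)
        # Step S63): otherwise try labeled mask frames not yet matched.
        best_f, best_mask_iou = -1, 0.0
        for f, gt in enumerate(mask_boxes):
            if mask_used[f]:
                continue
            v = iou(box, gt)
            if v > best_mask_iou:
                best_f, best_mask_iou = f, v
        if best_f >= 0 and best_mask_iou > iou_thr:
            mask_used[best_f] = True
            f2 += 1
    return f1, f2
```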
In this embodiment, the IoU thresholds are taken as 0.5, 0.55, 0.6, …, 0.95. The performance of the face detector at different IoU thresholds (especially high IoU thresholds) is thus considered together with the influence of the prediction frame regression accuracy on the evaluation criterion.
In step S6), a confusion matrix is established according to the matching result, and the accuracy and recall rate corresponding to the ith confidence threshold are calculated according to the confusion matrix, including the following steps:
s61) counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames, comprising the following steps:
s611) recording the total number of labeled face frames in the test picture as d, recording the total number of the sorted prediction frames as k, and recording that the matching result contains f1 labeled face frames which are successfully matched and f2 labeled mask frames which are successfully matched;
s612) taking the f1 successfully matched labeled face frames as positive detections, to obtain the positive detection number of the labeled face frames TP_i = f1;
S613) summing the total number f1 of successfully matched labeled face frames and the total number f2 of successfully matched labeled mask frames to obtain the total number of successfully matched labeled frames (f1 + f2), and taking the difference between the total number of the sorted prediction frames and the total number of successfully matched labeled frames as the false detection number, to obtain the false detection number of the labeled face frames FP_i = k - f1 - f2;
S614) taking the difference between the total number of labeled face frames among all the labeled frames and the total number of successfully matched labeled face frames as the missed detection number of the labeled face frames, to obtain the missed detection number of the labeled face frames FN_i = d - f1;
S62) establishing a confusion matrix according to the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
s63) obtaining the accuracy corresponding to the ith confidence threshold as TP_i / (TP_i + FP_i), and obtaining the recall rate corresponding to the ith confidence threshold as TP_i / (TP_i + FN_i).
The face detection task is a single-class target detection task: for each proposal frame the model effectively decides between face and background, which is to some extent similar to a binary classification task, so the confusion matrix commonly used in binary classification is introduced. However, unlike the traditional binary classification approach, in which the confusion matrix is established by comparing the predicted values of the model with the true values and counting the results, the invention establishes the confusion matrix from the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames. The method counts the total numbers of positive detections, false detections and missed detections over the test pictures at the different confidence thresholds, calculates the accuracy and the recall rate, and then obtains the AP value at a specific IoU threshold.
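A minimal sketch of this per-threshold bookkeeping, using the quantities d, k, f1 and f2 defined in steps S611) to S614); the function name is illustrative only:

```python
def precision_recall(f1, f2, k, d):
    """Accuracy (precision) and recall rate at one confidence threshold.
    f1: successfully matched labeled face frames, f2: successfully matched labeled mask frames,
    k:  prediction frames with confidence above the threshold,
    d:  total labeled face frames."""
    tp = f1           # positive detections (step S612)
    fp = k - f1 - f2  # false detections (step S613): matched mask frames are excluded
    fn = d - f1       # missed detections (step S614)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```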
S7) repeating the steps S5) to S6) in turn, traversing all confidence thresholds under the S-th IoU threshold, obtaining the accuracy and the recall ratio respectively corresponding to all the confidence thresholds under the S-th IoU threshold, drawing a P-R curve according to the accuracy and the recall ratio respectively corresponding to all the confidence thresholds under the S-th IoU threshold, and obtaining the AP value corresponding to the S-th IoU threshold according to the P-R curve.
In step S7), a P-R curve is drawn according to the accuracy and the recall rate respectively corresponding to all the confidence thresholds at the s-th IoU threshold, and the AP value corresponding to the s-th IoU threshold is obtained from the P-R curve. The recall rate is taken as the independent variable and the accuracy as the dependent variable: the abscissa of the P-R curve is the recall rate, the ordinate of the P-R curve is the accuracy, and the AP value is the area enclosed between the P-R curve and the abscissa after the two ends of the curve are connected vertically down to the abscissa (see fig. 2).
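One simple way to approximate this area numerically is sketched below; the patent does not prescribe a particular integration scheme, so the rectangle rule used here is only an assumption:

```python
def average_precision(precisions, recalls):
    """AP as the area between the P-R curve and the recall axis."""
    points = sorted(zip(recalls, precisions))  # order the (recall, accuracy) points by recall
    ap, prev_r = 0.0, 0.0
    for r, p in points:
        ap += p * (r - prev_r)  # rectangle of width (r - prev_r) and height p
        prev_r = r
    return ap
```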
S8) repeating steps S4) to S7) in turn, traversing all the different IoU thresholds, obtaining a plurality of AP values corresponding to all the different IoU thresholds respectively;
s9) calculating the average of the AP values obtained in step S8), and taking this average as the evaluation index result of the face detection model.
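Putting the pieces together, the evaluation of this embodiment can be sketched end to end as follows; it composes the helpers sketched above, simplifies the test set to a single picture, and the function name is an assumption:

```python
def evaluate_model(predictions, face_boxes, mask_boxes):
    """Sketch of steps S3) to S9) for one test picture.
    predictions: list of (box, confidence) from the model to be evaluated;
    face_boxes, mask_boxes: labeled frames split as in step S1)."""
    iou_thresholds = [0.50 + 0.05 * s for s in range(10)]  # 0.50, 0.55, ..., 0.95
    conf_thresholds = [t / 1000.0 for t in range(1000)]    # step S4): 1000 equal parts of [0, 1)
    ap_values = []
    for iou_thr in iou_thresholds:                         # steps S4) to S7)
        precisions, recalls = [], []
        for conf_thr in conf_thresholds:
            kept = [p for p in predictions if p[1] > conf_thr]                 # step S5)
            f1, f2 = match_predictions(kept, face_boxes, mask_boxes, iou_thr)  # step S6)
            p, r = precision_recall(f1, f2, len(kept), len(face_boxes))
            precisions.append(p)
            recalls.append(r)
        ap_values.append(average_precision(precisions, recalls))              # step S7)
    return sum(ap_values) / len(ap_values)                 # steps S8) and S9): mean AP
```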
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
the method distinguishes the face labeling frame and the layout labeling frame, and discriminates the face labeling frame and the layout labeling frame when calculating the confusion matrix. Only the positive detection number caused by labeling the face frame is concerned, and the influence of the labeling mask frame which is not concerned by the real application scene on the positive detection number is avoided. Meanwhile, compared with the scheme of 'not distinguishing the two types of marking frames', the number of missed detections caused by marking the layout covering frames is reduced; compared with the scheme of regarding the mark mask frame as the background, the method reduces the false detection number caused by the mark mask frame; and finally, the evaluation standard of the face detection model is more focused on the face concerned by the real application scene.
The regression accuracy of the prediction frame has a large influence on the accuracy of subsequent face-related tasks. The method comprehensively considers the performance of the face detection model at different IoU thresholds (especially high IoU thresholds) and introduces the accuracy and the recall rate into the evaluation process, effectively improving the accuracy of subsequent face-related tasks (such as face key point detection, face attribute analysis and face recognition).
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (7)

1. A mask-based model evaluation method applied to a face detection system is characterized by comprising the following steps:
s1), obtaining a test picture and a face detection model to be evaluated, obtaining all labeling frames in the test picture, and dividing all labeling frames in the test picture into a plurality of labeling face frames and a plurality of labeling mask frames according to environmental factors;
s2) carrying out face detection on the test picture by using the face detection model to be evaluated to obtain all prediction frames in the test picture and the prediction frame information of each prediction frame;
s3) calculating IoU values between all prediction boxes in a test picture and all annotation boxes in the test picture, setting n different IoU thresholds;
s4) setting a confidence threshold value range under the S-th IoU threshold value, and performing equidistant value taking within the confidence threshold value range to obtain m confidence threshold values, where S is 1, 2, …, and n;
s5) screening a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture, wherein i is 1, 2, … and m;
s6) matching the plurality of prediction frames with the confidence degrees larger than the ith confidence degree threshold value in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in the step S1) respectively according to the IoU value to obtain a matching result, establishing a confusion matrix according to the matching result, and calculating the accuracy and the recall rate corresponding to the ith confidence degree threshold value according to the confusion matrix;
s7) repeating the steps S5) to S6) in turn, traversing all confidence level thresholds under the S-th IoU threshold value, obtaining the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, drawing a P-R curve according to the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, and obtaining an AP value corresponding to the S-th IoU threshold value according to the P-R curve;
s8) repeating steps S4) to S7) in turn, traversing all the different IoU thresholds, obtaining a plurality of AP values corresponding to all the different IoU thresholds respectively;
s9) calculating the average of the AP values obtained in step S8), and taking this average as the evaluation index result of the face detection model.
2. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S6), a plurality of prediction frames with confidence degrees greater than the ith confidence degree threshold in the test picture are respectively matched with the plurality of labeled face frames and the plurality of labeled mask frames in step S1) according to IoU values, so as to obtain matching results, and the method comprises the following steps:
s61) sorting a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture according to the sequence of the confidence degrees from high to low to obtain a plurality of sorted prediction frames;
s62) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled face frame in the test picture which has not yet been matched successfully, to obtain a first IoU maximum value IoU1_z, wherein the subscript z represents the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully; judging whether this maximum value IoU1_z is greater than the s-th IoU threshold; if so, the jth prediction frame in the sorted prediction frames is successfully matched with the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully, and the method proceeds to step S64); if not, the jth prediction frame in the sorted prediction frames is marked as failing to match a labeled face frame, and the method proceeds to step S63);
s63) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled mask frame in the test picture which has not yet been matched successfully, to obtain a second IoU maximum value IoU2_f, wherein the subscript f represents the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; judging whether the second maximum value IoU2_f is greater than the s-th IoU threshold; if so, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is successfully matched with the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; if not, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is marked as failing to match, and the method proceeds to step S64);
s64) repeating the steps S62) to S63) in turn, traversing the plurality of sequenced prediction boxes according to the sequence from high confidence to low confidence, and obtaining the matching results of the plurality of sequenced prediction boxes.
3. The mask-based model evaluation method applied to the face detection system of claim 2, wherein in step S6), a confusion matrix is established according to the matching result, and the accuracy and recall corresponding to the ith confidence threshold are calculated according to the confusion matrix, comprising the following steps:
s61) counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
S62) establishing a confusion matrix according to the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
s63) obtaining the accuracy corresponding to the ith confidence threshold as TP_i / (TP_i + FP_i), and obtaining the recall rate corresponding to the ith confidence threshold as TP_i / (TP_i + FN_i).
4. The mask-based model evaluation method applied to the face detection system of claim 3, wherein in step S61), counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames comprises the following steps:
s611) recording the total number of labeled face frames in the test picture as d, recording the total number of the sorted prediction frames as k, and recording that the matching result contains f1 labeled face frames which are successfully matched and f2 labeled mask frames which are successfully matched;
s612) taking the f1 successfully matched labeled face frames as positive detections, to obtain the positive detection number of the labeled face frames TP_i = f1;
S613) summing the total number f1 of successfully matched labeled face frames and the total number f2 of successfully matched labeled mask frames to obtain the total number of successfully matched labeled frames (f1 + f2), and taking the difference between the total number of the sorted prediction frames and the total number of successfully matched labeled frames as the false detection number, to obtain the false detection number of the labeled face frames FP_i = k - f1 - f2;
S614) taking the difference between the total number of labeled face frames among all the labeled frames and the total number of successfully matched labeled face frames as the missed detection number of the labeled face frames, to obtain the missed detection number of the labeled face frames FN_i = d - f1.
5. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S7), drawing a P-R curve according to the accuracy and the recall rate respectively corresponding to all the confidence thresholds at the s-th IoU threshold, and obtaining the AP value corresponding to the s-th IoU threshold from the P-R curve, comprises: taking the recall rate as the independent variable and the accuracy as the dependent variable, the abscissa of the P-R curve being the recall rate and the ordinate of the P-R curve being the accuracy, and the AP value being the area enclosed between the P-R curve and the abscissa after the two ends of the curve are connected vertically down to the abscissa.
6. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S1), the environmental factors include a face occlusion degree, a face pose and a picture blurring degree.
7. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S2), the prediction frame information comprises the coordinates of the prediction frame, the width of the prediction frame, the height of the prediction frame and the confidence.
CN202011091940.8A 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system Active CN112215154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011091940.8A CN112215154B (en) 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011091940.8A CN112215154B (en) 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system

Publications (2)

Publication Number Publication Date
CN112215154A true CN112215154A (en) 2021-01-12
CN112215154B CN112215154B (en) 2021-05-25

Family

ID=74053850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011091940.8A Active CN112215154B (en) 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system

Country Status (1)

Country Link
CN (1) CN112215154B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462736A (en) * 2014-08-07 2017-02-22 华为技术有限公司 A processing device and method for face detection
WO2017149315A1 (en) * 2016-03-02 2017-09-08 Holition Limited Locating and augmenting object features in images
US20190180083A1 (en) * 2017-12-11 2019-06-13 Adobe Inc. Depicted Skin Selection
CN111582214A (en) * 2020-05-15 2020-08-25 中国科学院自动化研究所 Twin network-based behavior analysis method, system and device for cage-raised animals

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218899A1 (en) * 2020-04-30 2021-11-04 京东方科技集团股份有限公司 Method for training facial recognition model, and method and apparatus for facial recognition
CN112883946A (en) * 2021-04-29 2021-06-01 南京视察者智能科技有限公司 Adaptive threshold value selection method and face recognition method
CN116129142A (en) * 2023-02-07 2023-05-16 广州市玄武无线科技股份有限公司 Image recognition model testing method and device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN112215154B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN112215154B (en) Mask-based model evaluation method applied to face detection system
CN106934346B (en) A kind of method of target detection performance optimization
CN109882019B (en) Automobile electric tail door opening method based on target detection and motion recognition
US7783106B2 (en) Video segmentation combining similarity analysis and classification
US7729512B2 (en) Stereo image processing to detect moving objects
US8325981B2 (en) Human tracking apparatus, human tracking method, and human tracking processing program
JP5127392B2 (en) Classification boundary determination method and classification boundary determination apparatus
EP3223196A1 (en) A method and a device for generating a confidence measure for an estimation derived from images captured by a camera mounted on a vehicle
CN101344922B (en) Human face detection method and device
CN105975929A (en) Fast pedestrian detection method based on aggregated channel features
US20070047822A1 (en) Learning method for classifiers, apparatus, and program for discriminating targets
CN111524164B (en) Target tracking method and device and electronic equipment
CN106778737A (en) A kind of car plate antidote, device and a kind of video acquisition device
KR20060064974A (en) Apparatus and method for detecting face in image using boost algorithm
CN109902576B (en) Training method and application of head and shoulder image classifier
CN111507232B (en) Stranger identification method and system based on multi-mode multi-strategy fusion
US20110243398A1 (en) Pattern recognition apparatus and pattern recognition method that reduce effects on recognition accuracy, and storage medium
CN108764338B (en) Pedestrian tracking method applied to video analysis
CN115620518B (en) Intersection traffic conflict judging method based on deep learning
CN106599918B (en) vehicle tracking method and system
CN110674680A (en) Living body identification method, living body identification device and storage medium
CN116311063A (en) Personnel fine granularity tracking method and system based on face recognition under monitoring video
CN111325265B (en) Detection method and device for tampered image
KR101991307B1 (en) Electronic device capable of feature vector assignment to a tracklet for multi-object tracking and operating method thereof
CN115565157A (en) Multi-camera multi-target vehicle tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant