CN112215154A - Mask-based model evaluation method applied to face detection system - Google Patents

Mask-based model evaluation method applied to face detection system

Info

Publication number
CN112215154A
Authority
CN
China
Prior art keywords
face
frames
frame
prediction
labeled
Prior art date
Legal status
Granted
Application number
CN202011091940.8A
Other languages
Chinese (zh)
Other versions
CN112215154B (en)
Inventor
孙家乐
瞿洪桂
袁丽燕
朱海明
高云丽
Current Assignee
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd filed Critical Beijing Sinonet Science and Technology Co Ltd
Priority to CN202011091940.8A priority Critical patent/CN112215154B/en
Publication of CN112215154A publication Critical patent/CN112215154A/en
Application granted granted Critical
Publication of CN112215154B publication Critical patent/CN112215154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of face detection and discloses a mask-based model evaluation method applied to a face detection system. The method comprises: obtaining a face detection model to be evaluated and dividing the labeled frames into labeled face frames and labeled mask frames; performing face detection on the test picture; calculating the IoU values between prediction frames and labeled frames and setting n different IoU thresholds; obtaining m confidence thresholds, screening out prediction frames, matching them with the labeled face frames and the labeled mask frames respectively, and establishing a confusion matrix from the matching results; traversing all the confidence thresholds and obtaining an AP value from the P-R curve; and traversing all the different IoU thresholds to obtain the evaluation index result of the face detection model. The invention distinguishes labeled face frames from labeled mask frames, so that the evaluation criterion of the face detection model focuses more on the faces that matter in real application scenes, effectively improving the accuracy of subsequent face-related tasks.

Description

Mask-based model evaluation method applied to face detection system
Technical Field
The invention relates to the field of face detection, in particular to a mask-based model evaluation method applied to a face detection system.
Background
For the face detection task, a specific face prediction result includes: coordinates and confidence of the bounding box. The face detection model evaluation method judges whether the prediction result is correct or not according to the coincidence degree between the prediction frame and the labeling frame.
In existing face detection model evaluation methods, the Average Precision (AP) of the labeled frames and prediction frames at a single intersection-over-union (IoU) threshold is used as the model quality criterion. In practice, however, a model that achieves a good AP at a single IoU threshold may perform poorly at other IoU thresholds. The degree of overlap between the prediction frame and the labeled frame strongly affects the accuracy of downstream face-related tasks (such as face key point detection, face attribute analysis and face recognition), but existing model evaluation methods do not evaluate this effectively.
Due to factors such as occlusion, pose, lighting and blur, it is hard to judge whether some labeled regions are faces at all, and even a correct detection of such a hard-to-label face frame is of little value when the face detector is used in a real scene. Existing face detection model evaluation methods either treat these hard-to-label face frames in the same way as ordinary labeled frames, which makes it impossible to distinguish how different face detection models perform on the ordinary labeled frames, or treat them as background, in which case a correct detection of such a frame is counted as a false detection, so the evaluation of the model's face predictions becomes inaccurate.
Disclosure of Invention
The invention provides a mask-based model evaluation method applied to a face detection system, aiming to solve the inaccurate evaluation that results when existing face detection model evaluation neither considers different IoU thresholds nor properly handles hard-to-label face frames.
A mask-based model evaluation method applied to a face detection system comprises the following steps:
s1) obtaining a test picture and a face detection model to be evaluated, obtaining all labeling frames in the test picture, and dividing all labeling frames in the test picture into a plurality of labeling face frames and a plurality of labeling mask frames according to environmental factors;
s2) carrying out face detection on the test picture by using the face detection model to be evaluated to obtain all prediction frames in the test picture and the prediction frame information of each prediction frame;
s3) calculating IoU values between all prediction boxes in a test picture and all annotation boxes in the test picture, setting n different IoU thresholds;
s4) setting a confidence threshold value range under the S-th IoU threshold value, and performing equidistant value taking within the confidence threshold value range to obtain m confidence threshold values, where S is 1, 2, …, and n;
s5) screening a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture, wherein i is 1, 2, … and m;
s6) matching the plurality of prediction frames with the confidence degrees larger than the ith confidence degree threshold value in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in the step S1) respectively according to the IoU value to obtain a matching result, establishing a confusion matrix according to the matching result, and calculating the accuracy and the recall rate corresponding to the ith confidence degree threshold value according to the confusion matrix;
s7) repeating the steps S5) to S6) in turn, traversing all confidence level thresholds under the S-th IoU threshold value, obtaining the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, drawing a P-R curve according to the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, and obtaining an AP value corresponding to the S-th IoU threshold value according to the P-R curve;
s8) repeating steps S4) to S7) in turn, traversing all the different IoU thresholds, obtaining a plurality of AP values corresponding to all the different IoU thresholds respectively;
s9) calculating the average of the AP values obtained in step S8), and taking this average as the evaluation index result of the face detection model.
Further, in step S6), matching, according to the IoU value, the plurality of prediction frames with the confidence degrees greater than the ith confidence threshold in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in step S1), respectively, to obtain a matching result, including the following steps:
s61) sorting a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture according to the sequence of the confidence degrees from high to low to obtain a plurality of sorted prediction frames;
s62) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled face frame in the test picture which has not yet been matched successfully, to obtain a first IoU maximum value IoU1_z, wherein the subscript z represents the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully; judging whether this maximum value IoU1_z is greater than the s-th IoU threshold; if so, the jth prediction frame in the sorted prediction frames is successfully matched with the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully, and the method proceeds to step S64); if not, the jth prediction frame in the sorted prediction frames is marked as failing to match a labeled face frame, and the method proceeds to step S63);
s63) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled mask frame in the test picture which has not yet been matched successfully, to obtain a second IoU maximum value IoU2_f, wherein the subscript f represents the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; judging whether the second maximum value IoU2_f is greater than the s-th IoU threshold; if so, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is successfully matched with the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; if not, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is marked as failing to match, and the method proceeds to step S64);
s64) repeating the steps S62) to S63) in turn, traversing the plurality of sequenced prediction boxes according to the sequence from high confidence to low confidence, and obtaining the matching results of the plurality of sequenced prediction boxes.
Further, in step S6), a confusion matrix is built according to the matching result, and the accuracy and recall corresponding to the ith confidence threshold are calculated according to the confusion matrix, including the following steps:
s61) counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
S62) establishing a confusion matrix according to the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
s63) obtaining the accuracy corresponding to the ith confidence threshold as TP_i / (TP_i + FP_i), and obtaining the recall rate corresponding to the ith confidence threshold as TP_i / (TP_i + FN_i).
Further, in step S61), counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames comprises the following steps:
s611) recording the total number of labeled face frames in the test picture as d, recording the total number of the sorted prediction frames as k, and recording that the matching result contains f1 labeled face frames which are successfully matched and f2 labeled mask frames which are successfully matched;
s612) taking the f1 successfully matched labeled face frames as positive detections, to obtain the positive detection number of the labeled face frames TP_i = f1;
S613) summing the total number f1 of successfully matched labeled face frames and the total number f2 of successfully matched labeled mask frames to obtain the total number of successfully matched labeled frames (f1 + f2), and taking the difference between the total number of the sorted prediction frames and the total number of successfully matched labeled frames as the false detection number, to obtain the false detection number of the labeled face frames FP_i = k - f1 - f2;
S614) taking the difference between the total number of labeled face frames among all the labeled frames and the total number of successfully matched labeled face frames as the missed detection number of the labeled face frames, to obtain the missed detection number of the labeled face frames FN_i = d - f1.
Further, in step S7), drawing a P-R curve according to the accuracy and the recall rate respectively corresponding to all the confidence thresholds at the s-th IoU threshold, and obtaining the AP value corresponding to the s-th IoU threshold from the P-R curve, comprises: taking the recall rate as the independent variable and the accuracy as the dependent variable, the abscissa of the P-R curve being the recall rate and the ordinate of the P-R curve being the accuracy, and the AP value being the area enclosed between the P-R curve and the abscissa after the two ends of the curve are connected vertically down to the abscissa.
Further, in step S1), the environmental factors include a face occlusion degree, a face pose, and a picture blurring degree.
Further, in step S2), the prediction box information includes coordinates of the prediction box, a width of the prediction box, a height of the prediction box, and a confidence.
The invention has the beneficial effects that:
the method distinguishes the face labeling frame and the layout labeling frame, and discriminates the face labeling frame and the layout labeling frame when calculating the confusion matrix. Only the positive detection number caused by labeling the face frame is concerned, and the influence of the labeling mask frame which is not concerned by the real application scene on the positive detection number is avoided. Meanwhile, compared with the scheme of 'not distinguishing the two types of marking frames', the number of missed detections caused by marking the layout covering frames is reduced; compared with the scheme of regarding the mark mask frame as the background, the method reduces the false detection number caused by the mark mask frame; and finally, the evaluation standard of the face detection model is more focused on the face concerned by the real application scene.
The regression accuracy of the prediction frame has a large influence on the accuracy of subsequent face-related tasks. The method comprehensively considers the performance of the face detection model at different IoU thresholds (especially high IoU thresholds) and introduces the accuracy and the recall rate into the evaluation process, effectively improving the accuracy of subsequent face-related tasks (such as face key point detection, face attribute analysis and face recognition).
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments are briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flowchart of a mask-based model evaluation method applied to a face detection system according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a P-R curve provided in the first embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, a mask-based model evaluation method applied to a face detection system, as shown in fig. 1, includes the following steps:
s1) obtaining a test picture and a face detection model to be evaluated, obtaining all labeling frames in the test picture, and dividing all labeling frames in the test picture into a plurality of labeling face frames and a plurality of labeling mask frames according to environmental factors; the environmental factors include the degree of face occlusion, face pose and picture blur.
In this embodiment, factors such as occlusion, angle, lighting and blur are considered together, and the labeling information is divided into labeled face frames and labeled mask frames. A test set is first obtained; it contains 13 scenes, 2791 pictures and 25110 labeled frames, and the face quality inside the labeled frames is uneven. Combining the occlusion degree, the face pose and the picture blur degree, labeled frames in which the facial features are hard to distinguish or more than half of the face is invisible are marked as labeled mask frames, and the other labeled frames are marked as labeled face frames. After this split, the labeling information comprises 16308 labeled face frames and 8802 labeled mask frames.
S2) carrying out face detection on the test picture by using the face detection model to be evaluated to obtain all prediction frames in the test picture and the prediction frame information of each prediction frame; the prediction box information includes coordinates of the prediction box, a width of the prediction box, a height of the prediction box, and a confidence.
S3) calculating the IoU values between all prediction frames in the test picture and all labeled frames in the test picture, and setting n different IoU thresholds; the IoU value is the ratio of the intersection area of a single prediction frame and a single labeled frame to their union area. The larger the IoU value, the closer the prediction frame is to the labeled frame and the more accurate the regression of the prediction frame position.
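As an illustrative aid (not part of the claimed method), the IoU computation described in step S3) can be sketched in Python as follows; the function name and the (x, y, width, height) box layout are assumptions made only for this example, since the embodiment merely specifies that a frame has coordinates, a width and a height:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, width, height) tuples."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h                   # intersection area
    union = aw * ah + bw * bh - inter           # union area
    return inter / union if union > 0 else 0.0  # ratio described in step S3)
```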
S4) setting a confidence threshold value range under the S-th IoU threshold value, and performing equidistant value taking within the confidence threshold value range to obtain m confidence threshold values, where S is 1, 2, …, and n.
In this embodiment, the confidence threshold range 0-1 is divided into 1000 equal parts (including 0, excluding 1), and the confusion matrix under each confidence threshold is counted separately. The finer the confidence granularity, the more accurate the evaluation result, but also the larger the amount of computation and the longer the evaluation time. Weighing these factors, this embodiment divides the confidence range into 1000 equal parts.
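A minimal sketch of this equidistant division, assuming NumPy is available (the variable name is illustrative only):

```python
import numpy as np

# 1000 equally spaced confidence thresholds in [0, 1): 0.000, 0.001, ..., 0.999
confidence_thresholds = np.linspace(0.0, 1.0, num=1000, endpoint=False)
```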
S5) screening out the prediction frames whose confidence is greater than the ith confidence threshold in the test picture, where i is 1, 2, …, m. The confidence indicates how likely the current prediction frame is to be a face rather than background, and its value ranges from 0 to 1. When prediction frames are matched with labeled frames, prediction frames with higher confidence are matched first, which avoids the situation where a prediction frame with low confidence is successfully matched with a labeled frame while a prediction frame with high confidence is then counted as a false detection.
S6) according to the IoU values, respectively matching the plurality of prediction frames with confidence greater than the ith confidence threshold in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in step S1) to obtain matching results, establishing a confusion matrix according to the matching results, and calculating the accuracy and the recall rate corresponding to the ith confidence threshold according to the confusion matrix, comprising the following steps:
s61) sorting a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture according to the sequence of the confidence degrees from high to low to obtain a plurality of sorted prediction frames;
s62) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled face frame in the test picture which has not yet been matched successfully, to obtain a first IoU maximum value IoU1_z, wherein the subscript z represents the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully; judging whether this maximum value IoU1_z is greater than the s-th IoU threshold; if so, the jth prediction frame in the sorted prediction frames is successfully matched with the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully, and the method proceeds to step S64); if not, the jth prediction frame in the sorted prediction frames is marked as failing to match a labeled face frame, and the method proceeds to step S63);
in this embodiment, all the label boxes include a field that is matched to be successful, where: marking the matched labeled face frame as 1; marking the matched mark mask frame as-1; and marking the marked face box and the marked layout box which are not matched to be successful as 0. Current step S62), the prediction box only tries to match with the labeled face box that has not been matched successfully.
S63) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled mask frame in the test picture which has not yet been matched successfully, to obtain a second IoU maximum value IoU2_f, wherein the subscript f represents the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; judging whether the second maximum value IoU2_f is greater than the s-th IoU threshold; if so, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is successfully matched with the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; if not, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is marked as failing to match, and the method proceeds to step S64);
current step S63) corresponds to relaxing the criterion of the face detection model for whether the face represented by the callout box is correctly detected. The successfully matched labeling mask frame is not subjected to positive detection, and the undetected labeling mask frame is not subjected to missed detection, so that a transition zone is provided for the face detection model to judge whether the labeling mask frame is detected by the part of the face frame which is difficult to label. And the evaluation result is more focused on a real face recognition application scene, namely whether the face detection model is used for detecting the marked face frame positively or not and detecting the background wrongly or not.
S64) repeating the steps S62) to S63) in turn, traversing the plurality of sequenced prediction boxes according to the sequence from high confidence to low confidence, and obtaining the matching results of the plurality of sequenced prediction boxes.
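For illustration only, steps S61) to S64) can be sketched as the greedy matching routine below. It reuses the iou helper sketched earlier; the function name, argument layout and return values are assumptions rather than part of the patent:

```python
def match_predictions(pred_boxes, face_boxes, mask_boxes, iou_thr):
    """Greedy matching of steps S61)-S64): prediction frames are taken in order of
    decreasing confidence and matched first against not-yet-matched labeled face
    frames, then against not-yet-matched labeled mask frames.
    pred_boxes: list of (box, confidence); face_boxes, mask_boxes: lists of boxes.
    Returns (f1, f2): counts of matched labeled face frames and labeled mask frames."""
    preds = sorted(pred_boxes, key=lambda p: p[1], reverse=True)  # step S61)
    face_used = [False] * len(face_boxes)
    mask_used = [False] * len(mask_boxes)
    f1 = f2 = 0
    for box, _conf in preds:
        # Step S62): best IoU over labeled face frames not yet matched.
        best_z, best_face_iou = -1, 0.0
        for z, gt in enumerate(face_boxes):
            if face_used[z]:
                continue
            v = iou(box, gt)
            if v > best_face_iou:
                best_z, best_face_iou = z, v
        if best_z >= 0 and best_face_iou > iou_thr:
            face_used[best_z] = True
            f1 += 1
            continue  # proceed to the next prediction frame (step S64)
        # Step S63): otherwise try labeled mask frames not yet matched.
        best_f, best_mask_iou = -1, 0.0
        for f, gt in enumerate(mask_boxes):
            if mask_used[f]:
                continue
            v = iou(box, gt)
            if v > best_mask_iou:
                best_f, best_mask_iou = f, v
        if best_f >= 0 and best_mask_iou > iou_thr:
            mask_used[best_f] = True
            f2 += 1
    return f1, f2
```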
In this embodiment, the IoU thresholds are taken as 0.5, 0.55, 0.6, …, 0.95. The performance of the face detector at different IoU thresholds (especially high IoU thresholds) is thus considered together with the influence of the prediction frame regression accuracy on the evaluation criterion.
In step S6), a confusion matrix is established according to the matching result, and the accuracy and recall rate corresponding to the ith confidence threshold are calculated according to the confusion matrix, including the following steps:
s61) counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames, comprising the following steps:
s611) recording the total number of labeled face frames in the test picture as d, recording the total number of the sorted prediction frames as k, and recording that the matching result contains f1 labeled face frames which are successfully matched and f2 labeled mask frames which are successfully matched;
s612) taking the f1 successfully matched labeled face frames as positive detections, to obtain the positive detection number of the labeled face frames TP_i = f1;
S613) summing the total number f1 of successfully matched labeled face frames and the total number f2 of successfully matched labeled mask frames to obtain the total number of successfully matched labeled frames (f1 + f2), and taking the difference between the total number of the sorted prediction frames and the total number of successfully matched labeled frames as the false detection number, to obtain the false detection number of the labeled face frames FP_i = k - f1 - f2;
S614) taking the difference between the total number of labeled face frames among all the labeled frames and the total number of successfully matched labeled face frames as the missed detection number of the labeled face frames, to obtain the missed detection number of the labeled face frames FN_i = d - f1;
S62) establishing a confusion matrix according to the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
s63) obtaining the accuracy corresponding to the ith confidence threshold as TP_i / (TP_i + FP_i), and obtaining the recall rate corresponding to the ith confidence threshold as TP_i / (TP_i + FN_i).
The face detection task is a single-class target detection task: for each proposal frame the model effectively decides between face and background, which is to some extent similar to a binary classification task, so the confusion matrix commonly used in binary classification is introduced. However, unlike the traditional binary classification approach, in which the confusion matrix is established by comparing the predicted values of the model with the true values and counting the results, the invention establishes the confusion matrix from the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames. The method counts the total numbers of positive detections, false detections and missed detections over the test pictures at the different confidence thresholds, calculates the accuracy and the recall rate, and then obtains the AP value at a specific IoU threshold.
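A minimal sketch of this per-threshold bookkeeping, using the quantities d, k, f1 and f2 defined in steps S611) to S614); the function name is illustrative only:

```python
def precision_recall(f1, f2, k, d):
    """Accuracy (precision) and recall rate at one confidence threshold.
    f1: successfully matched labeled face frames, f2: successfully matched labeled mask frames,
    k:  prediction frames with confidence above the threshold,
    d:  total labeled face frames."""
    tp = f1           # positive detections (step S612)
    fp = k - f1 - f2  # false detections (step S613): matched mask frames are excluded
    fn = d - f1       # missed detections (step S614)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```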
S7) repeating the steps S5) to S6) in turn, traversing all confidence thresholds under the S-th IoU threshold, obtaining the accuracy and the recall ratio respectively corresponding to all the confidence thresholds under the S-th IoU threshold, drawing a P-R curve according to the accuracy and the recall ratio respectively corresponding to all the confidence thresholds under the S-th IoU threshold, and obtaining the AP value corresponding to the S-th IoU threshold according to the P-R curve.
In step S7), a P-R curve is drawn according to the accuracy and the recall rate respectively corresponding to all the confidence thresholds at the s-th IoU threshold, and the AP value corresponding to the s-th IoU threshold is obtained from the P-R curve. The recall rate is taken as the independent variable and the accuracy as the dependent variable: the abscissa of the P-R curve is the recall rate, the ordinate of the P-R curve is the accuracy, and the AP value is the area enclosed between the P-R curve and the abscissa after the two ends of the curve are connected vertically down to the abscissa (see fig. 2).
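One simple way to approximate this area numerically is sketched below; the patent does not prescribe a particular integration scheme, so the rectangle rule used here is only an assumption:

```python
def average_precision(precisions, recalls):
    """AP as the area between the P-R curve and the recall axis."""
    points = sorted(zip(recalls, precisions))  # order the (recall, accuracy) points by recall
    ap, prev_r = 0.0, 0.0
    for r, p in points:
        ap += p * (r - prev_r)  # rectangle of width (r - prev_r) and height p
        prev_r = r
    return ap
```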
S8) repeating steps S4) to S7) in turn, traversing all the different IoU thresholds, obtaining a plurality of AP values corresponding to all the different IoU thresholds respectively;
s9) calculating the average of the AP values obtained in step S8), and taking this average as the evaluation index result of the face detection model.
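Putting the pieces together, the evaluation of this embodiment can be sketched end to end as follows; it composes the helpers sketched above, simplifies the test set to a single picture, and the function name is an assumption:

```python
def evaluate_model(predictions, face_boxes, mask_boxes):
    """Sketch of steps S3) to S9) for one test picture.
    predictions: list of (box, confidence) from the model to be evaluated;
    face_boxes, mask_boxes: labeled frames split as in step S1)."""
    iou_thresholds = [0.50 + 0.05 * s for s in range(10)]  # 0.50, 0.55, ..., 0.95
    conf_thresholds = [t / 1000.0 for t in range(1000)]    # step S4): 1000 equal parts of [0, 1)
    ap_values = []
    for iou_thr in iou_thresholds:                         # steps S4) to S7)
        precisions, recalls = [], []
        for conf_thr in conf_thresholds:
            kept = [p for p in predictions if p[1] > conf_thr]                 # step S5)
            f1, f2 = match_predictions(kept, face_boxes, mask_boxes, iou_thr)  # step S6)
            p, r = precision_recall(f1, f2, len(kept), len(face_boxes))
            precisions.append(p)
            recalls.append(r)
        ap_values.append(average_precision(precisions, recalls))              # step S7)
    return sum(ap_values) / len(ap_values)                 # steps S8) and S9): mean AP
```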
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
the method distinguishes the face labeling frame and the layout labeling frame, and discriminates the face labeling frame and the layout labeling frame when calculating the confusion matrix. Only the positive detection number caused by labeling the face frame is concerned, and the influence of the labeling mask frame which is not concerned by the real application scene on the positive detection number is avoided. Meanwhile, compared with the scheme of 'not distinguishing the two types of marking frames', the number of missed detections caused by marking the layout covering frames is reduced; compared with the scheme of regarding the mark mask frame as the background, the method reduces the false detection number caused by the mark mask frame; and finally, the evaluation standard of the face detection model is more focused on the face concerned by the real application scene.
The regression accuracy of the prediction frame has a large influence on the accuracy of subsequent face-related tasks. The method comprehensively considers the performance of the face detection model at different IoU thresholds (especially high IoU thresholds) and introduces the accuracy and the recall rate into the evaluation process, effectively improving the accuracy of subsequent face-related tasks (such as face key point detection, face attribute analysis and face recognition).
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (7)

1. A mask-based model evaluation method applied to a face detection system is characterized by comprising the following steps:
s1), obtaining a test picture and a face detection model to be evaluated, obtaining all labeling frames in the test picture, and dividing all labeling frames in the test picture into a plurality of labeling face frames and a plurality of labeling mask frames according to environmental factors;
s2) carrying out face detection on the test picture by using the face detection model to be evaluated to obtain all prediction frames in the test picture and the prediction frame information of each prediction frame;
s3) calculating IoU values between all prediction boxes in a test picture and all annotation boxes in the test picture, setting n different IoU thresholds;
s4) setting a confidence threshold value range under the S-th IoU threshold value, and performing equidistant value taking within the confidence threshold value range to obtain m confidence threshold values, where S is 1, 2, …, and n;
s5) screening a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture, wherein i is 1, 2, … and m;
s6) matching the plurality of prediction frames with the confidence degrees larger than the ith confidence degree threshold value in the test picture with the plurality of labeled face frames and the plurality of labeled mask frames in the step S1) respectively according to the IoU value to obtain a matching result, establishing a confusion matrix according to the matching result, and calculating the accuracy and the recall rate corresponding to the ith confidence degree threshold value according to the confusion matrix;
s7) repeating the steps S5) to S6) in turn, traversing all confidence level thresholds under the S-th IoU threshold value, obtaining the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, drawing a P-R curve according to the accuracy and the recall ratio respectively corresponding to all the confidence level thresholds under the S-th IoU threshold value, and obtaining an AP value corresponding to the S-th IoU threshold value according to the P-R curve;
s8) repeating steps S4) to S7) in turn, traversing all the different IoU thresholds, obtaining a plurality of AP values corresponding to all the different IoU thresholds respectively;
s9) calculating the average of the AP values obtained in step S8), and taking this average as the evaluation index result of the face detection model.
2. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S6), a plurality of prediction frames with confidence degrees greater than the ith confidence degree threshold in the test picture are respectively matched with the plurality of labeled face frames and the plurality of labeled mask frames in step S1) according to IoU values, so as to obtain matching results, and the method comprises the following steps:
s61) sorting a plurality of prediction frames with confidence degrees larger than the ith confidence degree threshold value in the test picture according to the sequence of the confidence degrees from high to low to obtain a plurality of sorted prediction frames;
s62) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled face frame in the test picture which has not yet been matched successfully, to obtain a first IoU maximum value IoU1_z, wherein the subscript z represents the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully; judging whether this maximum value IoU1_z is greater than the s-th IoU threshold; if so, the jth prediction frame in the sorted prediction frames is successfully matched with the z-th labeled face frame among all the labeled face frames in the test picture which have not yet been matched successfully, and the method proceeds to step S64); if not, the jth prediction frame in the sorted prediction frames is marked as failing to match a labeled face frame, and the method proceeds to step S63);
s63) calculating the IoU values between the jth prediction frame in the sorted prediction frames and each labeled mask frame in the test picture which has not yet been matched successfully, to obtain a second IoU maximum value IoU2_f, wherein the subscript f represents the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; judging whether the second maximum value IoU2_f is greater than the s-th IoU threshold; if so, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is successfully matched with the f-th labeled mask frame among all the labeled mask frames in the test picture which have not yet been matched successfully; if not, the jth prediction frame with confidence greater than the ith confidence threshold in the test picture is marked as failing to match, and the method proceeds to step S64);
s64) repeating the steps S62) to S63) in turn, traversing the plurality of sequenced prediction boxes according to the sequence from high confidence to low confidence, and obtaining the matching results of the plurality of sequenced prediction boxes.
3. The mask-based model evaluation method applied to the face detection system of claim 2, wherein in step S6), a confusion matrix is established according to the matching result, and the accuracy and recall corresponding to the ith confidence threshold are calculated according to the confusion matrix, comprising the following steps:
s61) counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
S62) establishing a confusion matrix according to the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames;
s63) obtaining the accuracy corresponding to the ith confidence threshold as TP_i / (TP_i + FP_i), and obtaining the recall rate corresponding to the ith confidence threshold as TP_i / (TP_i + FN_i).
4. The mask-based model evaluation method applied to the face detection system of claim 3, wherein in step S61), counting the positive detection number TP_i of the labeled face frames, the missed detection number FN_i of the labeled face frames and the false detection number FP_i of the labeled face frames comprises the following steps:
s611) recording the total number of labeled face frames in the test picture as d, recording the total number of the sorted prediction frames as k, and recording that the matching result contains f1 labeled face frames which are successfully matched and f2 labeled mask frames which are successfully matched;
s612) taking the f1 successfully matched labeled face frames as positive detections, to obtain the positive detection number of the labeled face frames TP_i = f1;
S613) summing the total number f1 of successfully matched labeled face frames and the total number f2 of successfully matched labeled mask frames to obtain the total number of successfully matched labeled frames (f1 + f2), and taking the difference between the total number of the sorted prediction frames and the total number of successfully matched labeled frames as the false detection number, to obtain the false detection number of the labeled face frames FP_i = k - f1 - f2;
S614) taking the difference between the total number of labeled face frames among all the labeled frames and the total number of successfully matched labeled face frames as the missed detection number of the labeled face frames, to obtain the missed detection number of the labeled face frames FN_i = d - f1.
5. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S7), drawing a P-R curve according to the accuracy and the recall rate respectively corresponding to all the confidence thresholds at the s-th IoU threshold, and obtaining the AP value corresponding to the s-th IoU threshold from the P-R curve, comprises: taking the recall rate as the independent variable and the accuracy as the dependent variable, the abscissa of the P-R curve being the recall rate and the ordinate of the P-R curve being the accuracy, and the AP value being the area enclosed between the P-R curve and the abscissa after the two ends of the curve are connected vertically down to the abscissa.
6. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S1), the environmental factors include a face occlusion degree, a face pose and a picture blurring degree.
7. The mask-based model evaluation method applied to the face detection system of claim 1, wherein in step S2), the prediction frame information comprises the coordinates of the prediction frame, the width of the prediction frame, the height of the prediction frame and the confidence.
CN202011091940.8A 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system Active CN112215154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011091940.8A CN112215154B (en) 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011091940.8A CN112215154B (en) 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system

Publications (2)

Publication Number Publication Date
CN112215154A true CN112215154A (en) 2021-01-12
CN112215154B CN112215154B (en) 2021-05-25

Family

ID=74053850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011091940.8A Active CN112215154B (en) 2020-10-13 2020-10-13 Mask-based model evaluation method applied to face detection system

Country Status (1)

Country Link
CN (1) CN112215154B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462736A (en) * 2014-08-07 2017-02-22 华为技术有限公司 A processing device and method for face detection
WO2017149315A1 (en) * 2016-03-02 2017-09-08 Holition Limited Locating and augmenting object features in images
US20190180083A1 (en) * 2017-12-11 2019-06-13 Adobe Inc. Depicted Skin Selection
CN111582214A (en) * 2020-05-15 2020-08-25 中国科学院自动化研究所 Twin network-based behavior analysis method, system and device for cage-raised animals

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218899A1 (en) * 2020-04-30 2021-11-04 京东方科技集团股份有限公司 Method for training facial recognition model, and method and apparatus for facial recognition
CN112883946A (en) * 2021-04-29 2021-06-01 南京视察者智能科技有限公司 Adaptive threshold value selection method and face recognition method
CN116129142A (en) * 2023-02-07 2023-05-16 广州市玄武无线科技股份有限公司 Image recognition model testing method and device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN112215154B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN112215154B (en) Mask-based model evaluation method applied to face detection system
CN106934346B (en) A kind of method of target detection performance optimization
CN109882019B (en) Automobile electric tail door opening method based on target detection and motion recognition
US7783106B2 (en) Video segmentation combining similarity analysis and classification
US7729512B2 (en) Stereo image processing to detect moving objects
US8325981B2 (en) Human tracking apparatus, human tracking method, and human tracking processing program
JP5127392B2 (en) Classification boundary determination method and classification boundary determination apparatus
EP3223196A1 (en) A method and a device for generating a confidence measure for an estimation derived from images captured by a camera mounted on a vehicle
CN101344922B (en) Human face detection method and device
CN105975929A (en) Fast pedestrian detection method based on aggregated channel features
US20070047822A1 (en) Learning method for classifiers, apparatus, and program for discriminating targets
CN111524164B (en) Target tracking method and device and electronic equipment
CN106778737A (en) A kind of car plate antidote, device and a kind of video acquisition device
KR20060064974A (en) Apparatus and method for detecting face in image using boost algorithm
CN109902576B (en) Training method and application of head and shoulder image classifier
CN111507232B (en) Stranger identification method and system based on multi-mode multi-strategy fusion
US20110243398A1 (en) Pattern recognition apparatus and pattern recognition method that reduce effects on recognition accuracy, and storage medium
CN108764338B (en) Pedestrian tracking method applied to video analysis
CN115620518B (en) Intersection traffic conflict judging method based on deep learning
CN106599918B (en) vehicle tracking method and system
CN110674680A (en) Living body identification method, living body identification device and storage medium
CN116311063A (en) Personnel fine granularity tracking method and system based on face recognition under monitoring video
CN111325265B (en) Detection method and device for tampered image
KR101991307B1 (en) Electronic device capable of feature vector assignment to a tracklet for multi-object tracking and operating method thereof
CN115565157A (en) Multi-camera multi-target vehicle tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant