CN110490170B - Face candidate frame extraction method - Google Patents
- Publication number
- CN110490170B (application CN201910793249.5A)
- Authority
- CN
- China
- Prior art keywords
- frame
- anchor
- image
- face
- anchor frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Abstract
The invention provides a face candidate frame extraction method comprising two main steps: step 1 is a candidate network training process; step 2 is a candidate frame extraction process. In step 1, three types of training data are prepared, namely slice images, labels of the anchor frames corresponding to the slice images, and real-frame regression vectors, and the candidate network is trained with these data as training samples. In step 2, face candidate frames are extracted from the image to be detected based on the candidate network trained in step 1, where the candidate network is a fully convolutional neural network. By combining multiple anchor frames with the image pyramid, the invention reduces the number of image pyramid layers to be processed, greatly improving running speed while maintaining accuracy. Balancing accuracy and real-time performance, the method has good practicability.
Description
[ technical field ]
The invention relates to the technical field of face detection, in particular to a face candidate frame extraction method.
[ background of the invention ]
Face detection is a fundamental step for related applications in the field of computer vision and plays a decisive role in their performance. Face detection is a computer technique for finding information such as the position, size and posture of faces in arbitrary input (such as images and videos); the position and size of a face are the most basic information that face detection must obtain, and they are usually marked by a face candidate frame.
A common current face detection method proceeds as follows: first, perform image pyramid processing on the original image to obtain multiple pyramid layers; second, use a candidate network to extract candidate frames that may contain faces in each pyramid layer; finally, use a one-stage or multi-stage refining network to screen the candidate frames and correct their coordinates, obtaining the final face candidate frames. In this method, extracting face candidate frames requires processing every pyramid layer, so the amount of calculation is large, the time consumption is long, the extraction speed is low, and the practicability is poor.
[ summary of the invention ]
The invention aims to provide a face candidate frame extraction method that effectively reduces the amount of calculation and improves the extraction speed of face candidate frames.
The purpose of the invention is realized by the following technical scheme:
a face candidate frame extraction method mainly comprises two steps: step 1: a candidate network training process; step 2: a candidate frame extraction process; and 2, extracting a face candidate frame of the image to be detected based on the candidate network trained in the step 1.
Step 1: prepare 3 types of training data, which are respectively slice images, labels of the anchor frames corresponding to the slice images, and real-frame regression vectors. The corresponding data are obtained as follows:
(1) Slice image:
Randomly select square slices with side length not less than A from the original images of the training data set, uniformly scale their side lengths to A, and finally obtain slice images of size A × A.
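As an illustration, the slice sampling described above can be sketched as follows. This is a minimal sketch with assumed details: nearest-neighbour rescaling and an H × W × C array layout are illustrative choices, not prescribed by the invention.

```python
import numpy as np

def random_square_slice(image, A=24, rng=None):
    """Randomly crop a square slice with side length >= A from `image`
    (an H x W x C array) and rescale it to A x A."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    side = int(rng.integers(A, min(h, w) + 1))   # random side length >= A
    y = int(rng.integers(0, h - side + 1))       # random top-left corner
    x = int(rng.integers(0, w - side + 1))
    crop = image[y:y + side, x:x + side]
    idx = np.arange(A) * side // A               # nearest-neighbour resize to A x A
    return crop[np.ix_(idx, idx)]
```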
(2) Label of anchor frame corresponding to slice image:
1) Anchor frame corresponding to slice image:
Taking the center point of the slice image as the center, select k square anchor frames with side lengths A·scale^i on the original image corresponding to the slice image, where scale is a scaling factor with a value range of 0 to 1 and i takes the values 0, 1, …, k−1. Here k is a positive integer satisfying scale^(k−1) ≥ 1/2, and the preferred value of k is the maximum value satisfying this condition.
2) The label of the anchor frame:
Calculate the IOU value of each anchor frame with the corresponding real face frame in the original image, IOU = S∩ / S∪, where S∩ is the area of the intersection of the anchor frame and the real face frame, and S∪ is the area of their union. Assign a label to each anchor frame according to its maximum IOU value; the label categories are divided into positive sample labels, partial sample labels and negative sample labels.
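The IOU computation used for labelling can be sketched as below. This is a minimal sketch; the corner-coordinate box format (x1, y1, x2, y2) is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2):
    the intersection area S_int divided by the union area S_union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)        # S_int
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)                 # S_int / S_union
```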
(3) True frame regression vector:
For the anchor frames given positive sample labels and partial sample labels in step (2) above, calculate the real-frame regression vector of the real face frame corresponding to the maximum IOU value relative to the corresponding anchor frame. The calculation formula is:
t_x = (x_g - x_a) / w_a, t_y = (y_g - y_a) / h_a, t_w = log(w_g / w_a), t_h = log(h_g / h_a)
where x_g, y_g are the coordinates of the center point of the real face frame, x_a, y_a are the coordinates of the center point of the corresponding anchor frame, w_g, h_g are the width and height of the real face frame, and w_a, h_a are the width and height of the corresponding anchor frame.
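The regression target just described can be computed as follows. Since the original formula image is not reproduced in the text, this sketch assumes the widely used Faster R-CNN-style parametrization, which is consistent with the symbols listed above.

```python
import math

def regression_vector(gt, anchor):
    """Regression target of a real face frame `gt` relative to an anchor
    frame, both given as (cx, cy, w, h) center/size tuples."""
    xg, yg, wg, hg = gt
    xa, ya, wa, ha = anchor
    return ((xg - xa) / wa,        # horizontal center offset, scaled by anchor width
            (yg - ya) / ha,        # vertical center offset, scaled by anchor height
            math.log(wg / wa),     # log width ratio
            math.log(hg / ha))     # log height ratio
```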
Train the candidate network with the three types of training data obtained above as training samples; training terminates when the iteration count is reached or the loss converges.
Step 2: the extraction process of the candidate frame is as follows:
(a) Carrying out image pyramid processing on an original image to be detected to obtain a multilayer pyramid image;
(b) Starting from the k-th pyramid layer, input every j-th pyramid layer into the trained candidate network to obtain a face confidence feature map and a predicted-frame regression vector feature map, where j is a positive integer less than or equal to k;
(c) Select the pixel points in the face confidence feature map of step (b) whose face confidence is greater than or equal to a threshold T;
(d) According to the pixel points selected in step (c), obtain the anchor frames corresponding to those pixel points on the original image, namely the initially selected anchor frames;
(e) For the initially selected anchor frames obtained in step (d), first remove redundant anchor frames using non-maximum suppression, then apply coordinate correction to the remaining anchor frames using the corresponding predicted-frame regression vectors, finally obtaining the face candidate frames.
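The redundancy removal in step (e) can be sketched with greedy non-maximum suppression. The patent does not specify the NMS variant or its threshold; corner-format boxes and a 0.5 overlap threshold are assumptions for illustration.

```python
import numpy as np

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression. `boxes` is an (N, 4) array of
    (x1, y1, x2, y2); returns indices of the boxes kept."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IOU of the top-scoring box against all remaining boxes
        ix1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        iy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        ix2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        iy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + areas - inter)
        order = rest[overlap < thresh]        # drop redundant boxes
    return keep
```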
The invention uses multiple anchor frames in combination with the image pyramid, so that only every j-th pyramid layer starting from the k-th layer needs to be processed. This effectively reduces the number of image pyramid layers, greatly increases running speed while maintaining accuracy, and balances accuracy with real-time performance, so the method has good practicability.
[ description of the drawings ]
Fig. 1 is a schematic flow chart of a candidate network training process according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a candidate frame extraction process according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a candidate network according to an embodiment of the present invention.
[ detailed description ]
To make the objects, technical solutions and effects of the present invention clearer, the invention is further described below with reference to the accompanying drawings and examples.
The invention provides a face candidate frame extraction method that mainly comprises two steps: step 1 is the candidate network training process, as shown in fig. 1; step 2 is the candidate frame extraction process, as shown in fig. 2. In step 2, face candidate frames are extracted from the image to be detected based on the candidate network trained in step 1.
The candidate network is a fully convolutional neural network, and the specific network structure is shown in fig. 3.
The candidate network training process of this embodiment is specifically as follows:
(1) Randomly select square slices with side length not less than 24 pixels from the original images of the training data set, uniformly scale their side lengths to 24, and finally obtain 24 × 24 slice images as one type of candidate network training input data;
(2) Taking the center point of the slice image obtained in step (1) as the center, obtain k square anchor frames by scaling. The empirical value of the scaling factor scale is 0.7937, and the side lengths of the square anchor frames are 24·scale^i, where i takes the values 0, 1, …, k−1 and k is 4;
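With the embodiment's values (A = 24, scale = 2^(-1/3) ≈ 0.7937, k = 4), the anchor side lengths can be enumerated as follows. The stopping rule scale^(k−1) ≥ 1/2 used here is an inference from these embodiment values, not stated explicitly in this passage.

```python
def anchor_side_lengths(A=24, scale=2 ** (-1 / 3)):
    """Side lengths A * scale**i of the k square anchor frames, taking the
    largest k for which scale**(k - 1) >= 1/2 (inferred from the embodiment)."""
    k = 1
    while scale ** k >= 0.5 - 1e-9:   # would a (k+1)-th anchor still qualify?
        k += 1
    return [A * scale ** i for i in range(k)]
```

With the defaults this yields four side lengths of roughly 24, 19.05, 15.12 and 12 pixels, matching k = 4 in the embodiment.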
(3) Calculate the IOU value of each anchor frame from step (2) with the corresponding real face frame in the original image, using the formula IOU = S∩ / S∪, where S∩ is the area of the intersection of the anchor frame and the real face frame, and S∪ is the area of their union. Assign a label to each anchor frame according to its maximum IOU value;
The labels are assigned as follows: an anchor frame whose maximum IOU value is greater than or equal to 0.5 is given a positive sample label; an anchor frame whose maximum IOU value is less than 0.5 and greater than or equal to 0.3 is given a partial sample label; an anchor frame whose maximum IOU value is less than 0.2 is given a negative sample label; the remaining anchor frames are given no label. Positive sample anchor frames may be marked as 1 and negative sample anchor frames as 0;
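These assignment rules map directly to a small function. This is a sketch; the concrete encodings used for the partial and unlabelled cases are illustrative assumptions.

```python
def anchor_label(max_iou):
    """Label an anchor frame from its maximum IOU with any real face frame:
    1 = positive, 'part' = partial sample, 0 = negative, None = no label."""
    if max_iou >= 0.5:
        return 1
    if max_iou >= 0.3:
        return 'part'
    if max_iou < 0.2:
        return 0
    return None        # 0.2 <= IOU < 0.3: ignored during training
```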
(4) For the anchor frames given positive sample labels and partial sample labels in step (3), calculate the real-frame regression vector of the real face frame corresponding to the maximum IOU value relative to the corresponding anchor frame. The calculation formula is:
t_x = (x_g - x_a) / w_a, t_y = (y_g - y_a) / h_a, t_w = log(w_g / w_a), t_h = log(h_g / h_a)
where x_g, y_g are the coordinates of the center point of the real face frame, x_a, y_a are the coordinates of the center point of the anchor frame, w_g, h_g are the width and height of the real face frame, and w_a, h_a are the width and height of the anchor frame.
(5) The trained candidate network outputs two types of data: face confidence and predicted-frame regression vectors. Accordingly, the face confidence output is trained using the labels of all positive and negative sample anchor frames together with the slice images corresponding to those anchor frames, while the predicted-frame regression output is trained using the real-frame regression vectors of all positive and partial sample anchor frames together with their corresponding slice images. Training terminates when the iteration count is reached or the loss converges, finally yielding the trained candidate network.
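The patent does not give the training losses for the two outputs. One common choice consistent with them, binary cross-entropy on face confidence over positive/negative samples plus squared error on regression vectors over positive/partial samples, can be sketched as below; the exact losses used by the invention are an assumption here.

```python
import numpy as np

def candidate_network_loss(cls_pred, cls_label, reg_pred, reg_target, reg_mask):
    """Two-task loss: binary cross-entropy over confidence predictions plus
    mean squared error over regression vectors of the anchors in `reg_mask`."""
    eps = 1e-7
    p = np.clip(cls_pred, eps, 1 - eps)   # avoid log(0)
    cls_loss = -np.mean(cls_label * np.log(p) + (1 - cls_label) * np.log(1 - p))
    # regression loss only over positive/partial sample anchors
    reg_loss = np.mean((reg_pred[reg_mask] - reg_target[reg_mask]) ** 2)
    return cls_loss + reg_loss
```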
The specific flow of the candidate box extracting process in this embodiment can be described as follows:
the method comprises the steps of obtaining an original image to be detected with a resolution of 1920 x 1080 from an intelligent foreground, conducting image pyramid processing on the original image to be detected to obtain 18 layers of pyramid images, inputting each 3 layers of pyramid images into a trained candidate network from the 4 th layer of pyramid images, obtaining two outputs of a face confidence coefficient feature image and a prediction frame regression vector feature image, selecting pixel points with a face confidence coefficient larger than or equal to 0.6 from the face confidence coefficient feature image, finding out an anchor frame corresponding to the original image to be an initial selection anchor frame, removing redundant anchor frames in the initial selection anchor frame by using a non-maximum value inhibition method, conducting coordinate correction calculation on the residual anchor frames by using corresponding prediction frame regression vectors, and finally obtaining the face candidate frame.
This embodiment combines multiple anchor frames with the image pyramid, reducing the pyramid layers to be processed from the original 18 to 6, which effectively reduces the number of image pyramid layers.
In tests on a typical current personal computer, extracting face candidate frames from a 1920 × 1080 picture takes about 255 ms with the MTCNN face detection method, versus about 33 ms with this embodiment. The extraction speed of face candidate frames is thus greatly improved, and the method has good practicability.
The above embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (5)
1. A face candidate frame extraction method, characterized by mainly comprising two steps: step 1, a candidate network training process; step 2, a candidate frame extraction process; in step 2, face candidate frames are extracted from the image to be detected based on the candidate network trained in step 1;
the step 1 comprises the following steps: a. preparing 3 types of training data, and b, training the candidate network by using the obtained data as a training sample; the training data are respectively a slice image, a label of an anchor frame corresponding to the slice image and a real frame regression vector, and the data acquisition mode is specifically described as follows:
(1) Slice image:
randomly select square slices with side length not less than A from the original images of a training data set, uniformly scale their side lengths to A, and finally obtain slice images of size A × A;
(2) Label of anchor frame corresponding to slice image:
1) Anchor frame corresponding to slice image:
taking the center point of the slice image as the center, select k square anchor frames with side lengths A·scale^i on the original image corresponding to the slice image, where scale is a scaling factor with a value range of 0 to 1, i takes the values 0, 1, …, k−1, and k is a positive integer satisfying scale^(k−1) ≥ 1/2;
2) The label of the anchor frame:
calculate the IOU value of each anchor frame with the corresponding real face frame in the original image, IOU = S∩ / S∪, where S∩ is the area of the intersection of the anchor frame and the real face frame, and S∪ is the area of their union; assign a label to each anchor frame according to its maximum IOU value, the label categories being divided into positive sample labels, partial sample labels and negative sample labels;
(3) True frame regression vector:
for the anchor frames given positive sample labels and partial sample labels in step (2) above, calculate the real-frame regression vector of the real face frame corresponding to the maximum IOU value relative to the corresponding anchor frame, using the formula:
t_x = (x_g - x_a) / w_a, t_y = (y_g - y_a) / h_a, t_w = log(w_g / w_a), t_h = log(h_g / h_a)
where x_g, y_g are the coordinates of the center point of the real face frame, x_a, y_a are the coordinates of the center point of the corresponding anchor frame, w_g, h_g are the width and height of the real face frame, and w_a, h_a are the width and height of the corresponding anchor frame;
the specific process of the step 2 is as follows:
(a) Carrying out image pyramid processing on an original image to be detected to obtain a multilayer pyramid image;
(b) Starting from the k-th pyramid layer, input every j-th pyramid layer into the trained candidate network to obtain a face confidence feature map and a predicted-frame regression vector feature map, where j is a positive integer less than or equal to k;
(c) Select the pixel points in the face confidence feature map of step (b) whose face confidence is greater than or equal to a threshold T;
(d) According to the pixel points selected in step (c), obtain the anchor frames corresponding to those pixel points on the original image, namely the initially selected anchor frames;
(e) For the initially selected anchor frames obtained in step (d), first remove redundant anchor frames using non-maximum suppression, then apply coordinate correction to the remaining anchor frames using the corresponding predicted-frame regression vectors, finally obtaining the face candidate frames.
2. The method as claimed in claim 1, wherein the candidate network is a fully convolutional neural network.
4. The method as claimed in claim 1, wherein the threshold T is 0.6.
5. The method of claim 1, wherein an anchor frame whose maximum IOU value is greater than or equal to 0.5 is assigned a positive sample label, an anchor frame whose maximum IOU value is less than 0.5 and greater than or equal to 0.3 is assigned a partial sample label, an anchor frame whose maximum IOU value is less than 0.2 is assigned a negative sample label, and the remaining anchor frames are not assigned labels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910793249.5A CN110490170B (en) | 2019-08-27 | 2019-08-27 | Face candidate frame extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110490170A | 2019-11-22 |
CN110490170B | 2023-01-06 |
Family
ID=68554292
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910793249.5A Active CN110490170B (en) | 2019-08-27 | 2019-08-27 | Face candidate frame extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110490170B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046792A (en) * | 2019-12-11 | 2020-04-21 | 三一重工股份有限公司 | Face detection method and device, electronic equipment and computer readable storage medium |
CN114299283B (en) * | 2022-03-07 | 2022-05-20 | 科大天工智能装备技术(天津)有限公司 | Image-based displacement measurement method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107871134A (en) * | 2016-09-23 | 2018-04-03 | 北京眼神科技有限公司 | A kind of method for detecting human face and device |
CN109919097A (en) * | 2019-03-08 | 2019-06-21 | 中国科学院自动化研究所 | Face and key point combined detection system, method based on multi-task learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9858496B2 (en) * | 2016-01-20 | 2018-01-02 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
- 2019-08-27: application CN201910793249.5A filed; granted as CN110490170B (status: Active)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||