CN110490170B - Face candidate frame extraction method - Google Patents

Face candidate frame extraction method

Info

Publication number
CN110490170B
Authority
CN
China
Prior art keywords
frame
anchor
image
face
anchor frame
Prior art date
Legal status
Active
Application number
CN201910793249.5A
Other languages
Chinese (zh)
Other versions
CN110490170A (en)
Inventor
刘中秋
陈高曙
祝江威
张伟
唐松鹤
Current Assignee
Miaxis Biometrics Co Ltd
Original Assignee
Miaxis Biometrics Co Ltd
Priority date
Filing date
Publication date
Application filed by Miaxis Biometrics Co Ltd filed Critical Miaxis Biometrics Co Ltd
Priority to CN201910793249.5A
Publication of CN110490170A
Application granted
Publication of CN110490170B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Abstract

The invention provides a face candidate frame extraction method which mainly comprises two steps: step 1 is a candidate network training process; step 2 is a candidate frame extraction process. In step 1, three types of training data are prepared and used as training samples to train the candidate network; the training data are respectively a slice image, the label of the anchor frame corresponding to the slice image, and the real frame regression vector. In step 2, a face candidate frame of the image to be detected is extracted based on the candidate network trained in step 1, wherein the candidate network is a fully convolutional neural network. The invention combines multiple anchor frames with the image pyramid, reduces the number of image pyramid layers, greatly improves the running speed while ensuring accuracy, and takes both accuracy and real-time performance into account, so that it has good practicability.

Description

Face candidate frame extraction method
[ technical field ]
The invention relates to the technical field of face detection, in particular to a face candidate frame extraction method.
[ background of the invention ]
Face detection is a basic step for related applications in the field of computer vision and plays a decisive role in the performance of those subsequent applications. Face detection is a computer technique for finding information such as the position, size and posture of a face in arbitrary input (such as images and videos); among these, the position and size of the face are the most basic information that face detection must obtain, and they are usually marked by a face candidate frame.
A common face detection method at present comprises the following steps: first, image pyramid processing is performed on the original image to obtain multiple layers of pyramid images; second, a candidate network is used to extract candidate frames that may contain faces from each pyramid layer; finally, one or more stages of refining networks are used to screen the candidate frames and correct their coordinates, so as to finally obtain the face candidate frames. In this method, the extraction of face candidate frames requires processing every pyramid layer; the amount of calculation is large, the time consumption is therefore long, and the practicability is poor.
[ summary of the invention ]
The invention aims to provide a face candidate frame extraction method that effectively reduces the amount of calculation and improves the extraction speed of face candidate frames.
The purpose of the invention is realized by the following technical scheme:
a face candidate frame extraction method mainly comprises two steps: step 1: a candidate network training process; step 2: a candidate frame extraction process; and 2, extracting a face candidate frame of the image to be detected based on the candidate network trained in the step 1.
Step 1: preparing 3 types of training data which are respectively a slice image, a label of an anchor frame corresponding to the slice image and a regression vector of a real frame, wherein the corresponding data acquisition mode is as follows:
(1) Slice image:
randomly selecting square slices with side length not less than A from the original image of the training data set, uniformly scaling the side lengths of the square slices to A, and finally obtaining slice images with the size of A x A.
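By way of illustration only, this slice-sampling step could be implemented as follows (a minimal sketch assuming a NumPy/OpenCV pipeline; the function name and the uniform random choice of crop position and side length are assumptions, not details given by the patent):

```python
import cv2
import numpy as np

def sample_slice(original: np.ndarray, A: int = 24) -> np.ndarray:
    """Randomly crop a square slice with side length >= A and rescale it to A x A."""
    h, w = original.shape[:2]
    assert min(h, w) >= A, "the original image must be at least A x A"
    side = np.random.randint(A, min(h, w) + 1)   # square side length not less than A
    y = np.random.randint(0, h - side + 1)       # top-left corner of the crop
    x = np.random.randint(0, w - side + 1)
    crop = original[y:y + side, x:x + side]
    return cv2.resize(crop, (A, A))              # uniformly scale the side length to A
```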
(2) Label of anchor frame corresponding to slice image:
1) Anchor frame corresponding to slice image:
selecting, on the original image corresponding to the slice image and taking the center point of the slice image as the center, k square anchor frames with side lengths A × scale^i respectively; wherein scale is a scaling factor with a value range of 0 to 1; i takes the values 0, 1, …, k−1, and k is a positive integer whose value satisfies the condition
[constraint on k, given as an equation image in the published document]
The preferred value of k is the maximum value satisfying this condition.
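The k anchor side lengths follow directly from A and scale. A small sketch (the closed-form constraint that bounds k is given only as an equation image in the published document and is therefore not reproduced here):

```python
def anchor_side_lengths(A: float, scale: float, k: int) -> list:
    """Side lengths A * scale**i of the k square anchor frames, i = 0, 1, ..., k-1."""
    assert 0.0 < scale < 1.0, "scale is a scaling factor in the range 0-1"
    return [A * scale ** i for i in range(k)]

# With the embodiment's values A = 24, scale = 0.7937 and k = 4 this gives
# approximately [24.0, 19.05, 15.12, 12.00].
```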
2) The label of the anchor frame:
calculating the IOU value of each anchor frame and the corresponding real face frame in the original image:
IOU = S∩ / S∪
wherein S∩ is the area of the intersection of the anchor frame and the real face frame, and S∪ is the area of their union. A label is assigned to each anchor frame according to the maximum IOU value corresponding to it; the label categories are divided into positive sample labels, partial sample labels and negative sample labels.
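A direct IOU computation consistent with this definition (a sketch; the corner-coordinate box representation (x1, y1, x2, y2) is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)            # area of the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                              # area of the union
    return inter / union if union > 0 else 0.0
```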
(3) Real frame regression vector:
for each anchor frame assigned a positive sample label or a partial sample label in step (2) above, calculating the real frame regression vector of the real face frame corresponding to the maximum IOU value relative to the corresponding anchor frame, wherein the calculation formulas are as follows:
[regression vector calculation formulas, given as equation images in the published document]
wherein x_g, y_g are the coordinates of the center point of the real face frame; x_a, y_a are the coordinates of the center point of the corresponding anchor frame; w_g, h_g are the width and height of the real face frame; and w_a, h_a are the width and height of the corresponding anchor frame.
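The exact regression formulas appear only as equation images in the published document. The sketch below assumes the standard anchor-based parameterisation (centre offsets normalised by the anchor size, log-ratios for width and height), which matches the symbols defined above but is not confirmed by the text:

```python
import math

def real_frame_regression_vector(gt, anchor):
    """Regression target of a real face frame relative to an anchor frame.

    Both boxes are given as (cx, cy, w, h): centre coordinates, width and height.
    The log-ratio form for width/height is an assumption.
    """
    x_g, y_g, w_g, h_g = gt
    x_a, y_a, w_a, h_a = anchor
    return ((x_g - x_a) / w_a,      # horizontal centre offset, normalised by anchor width
            (y_g - y_a) / h_a,      # vertical centre offset, normalised by anchor height
            math.log(w_g / w_a),    # width ratio
            math.log(h_g / h_a))    # height ratio
```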
The candidate network is trained by taking the obtained 3 types of training data as training samples; training terminates when the specified number of iterations is reached or the loss converges.
Step 2: the extraction process of the candidate frame is as follows:
(a) Carrying out image pyramid processing on an original image to be detected to obtain a multilayer pyramid image;
(b) Starting from the k-th layer pyramid image, every j layers a pyramid image is input into the trained candidate network for calculation to obtain a face confidence feature map and a predicted-frame regression vector feature map, wherein j is a positive integer less than or equal to k;
(c) Selecting the pixel points whose face confidence is greater than or equal to a threshold T in the face confidence feature map of step (b);
(d) For each pixel point selected in step (c), obtaining the corresponding anchor frame on the original image, namely an initially selected anchor frame;
(e) For the initially selected anchor frames obtained in step (d), first removing redundant anchor frames using non-maximum suppression, and then performing coordinate correction on the remaining anchor frames using the corresponding predicted-frame regression vectors, finally obtaining the face candidate frames.
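Steps (b) to (e) can be sketched end to end as follows. The network interface (net(image) returning an (H, W, k) confidence map and an (H, W, k, 4) regression map), the feature-map stride, the anchor placement at the receptive-field origin, and the exponential decoding of width and height are all assumptions made for illustration, not details given by the patent:

```python
import numpy as np

def nms(boxes, scores, thr):
    """Greedy non-maximum suppression; boxes are rows of (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        overlap = inter / (area_i + areas - inter)
        order = order[1:][overlap <= thr]
    return keep

def extract_candidates(processed_layers, net, anchor_sides, stride=2, T=0.6, nms_thr=0.5):
    """Sketch of steps (b)-(e).

    processed_layers: (layer_scale, image) pairs for the pyramid layers actually processed.
    net(image): assumed to return an (H, W, k) confidence map and an (H, W, k, 4) regression map.
    anchor_sides: the k anchor side lengths defined on the A x A network input.
    """
    anchors, regs, scores = [], [], []
    for layer_scale, image in processed_layers:
        conf, reg = net(image)
        for y, x, a in zip(*np.where(conf >= T)):              # (c) confidence >= threshold T
            side = anchor_sides[a] / layer_scale               # (d) anchor frame mapped to the original image
            cx, cy = x * stride / layer_scale, y * stride / layer_scale
            anchors.append([cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2])
            regs.append(reg[y, x, a])
            scores.append(conf[y, x, a])
    anchors, regs, scores = np.array(anchors), np.array(regs), np.array(scores)
    keep = nms(anchors, scores, nms_thr)                       # (e) remove redundant anchor frames first ...
    candidates = []
    for i in keep:                                             # ... then correct the remaining ones
        x1, y1, x2, y2 = anchors[i]
        w, h = x2 - x1, y2 - y1
        cx, cy = x1 + w / 2, y1 + h / 2
        dx, dy, dw, dh = regs[i]
        cx, cy, w, h = cx + dx * w, cy + dy * h, w * np.exp(dw), h * np.exp(dh)
        candidates.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return candidates
```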
The invention combines multiple anchor frames with the image pyramid, so that only a subset of the initial pyramid images needs to be processed [the fraction is given as an equation image in the published document]. The number of image pyramid layers to process is effectively reduced, the running speed is greatly increased while the accuracy is ensured, and both accuracy and real-time performance are taken into account, so the method has good practicability.
[ description of the drawings ]
Fig. 1 is a schematic flow chart of a candidate network training process according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a candidate frame extraction process according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a candidate network according to an embodiment of the present invention.
[ detailed description ]
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and examples.
The invention provides a face candidate frame extraction method which mainly comprises the following two steps: step 1 is a candidate network training process, as shown in Fig. 1; step 2 is a candidate frame extraction process, as shown in Fig. 2; in step 2, a face candidate frame of the image to be detected is extracted based on the candidate network trained in step 1.
The candidate network is a convolutional neural network, and the specific network structure is shown in fig. 3.
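Fig. 3 is not reproduced here. Purely as an illustration of the kind of network this could be, the sketch below shows a small fully convolutional candidate network with k anchors per location (consistent with claim 2); every kernel size and channel count is an assumption:

```python
import torch
import torch.nn as nn

class CandidateNet(nn.Module):
    """Illustrative fully convolutional candidate network; not the structure of Fig. 3."""
    def __init__(self, k: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, 3), nn.PReLU(),
            nn.MaxPool2d(2),                   # a 24 x 24 slice: 24 -> 22 -> 11
            nn.Conv2d(10, 16, 3), nn.PReLU(),
            nn.MaxPool2d(2),                   # 11 -> 9 -> 4
            nn.Conv2d(16, 32, 3), nn.PReLU(),  # 4 -> 2
            nn.Conv2d(32, 32, 2), nn.PReLU(),  # 2 -> 1 (one prediction per anchor for a training slice)
        )
        self.conf_head = nn.Conv2d(32, k, 1)       # per-anchor face confidence
        self.reg_head = nn.Conv2d(32, 4 * k, 1)    # per-anchor predicted-frame regression vector

    def forward(self, x):
        feat = self.backbone(x)
        return torch.sigmoid(self.conf_head(feat)), self.reg_head(feat)
```

Being fully convolutional, the same network produces dense confidence and regression maps when a whole pyramid image is fed in.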
The candidate network training process of this embodiment is specifically as follows:
(1) Randomly selecting square slices with the side length not less than 24 pixels from an original image of a training data set, uniformly scaling the side lengths of the square slices to 24, and finally obtaining slice images with the size of 24 x 24 pixels as one of candidate network training input data;
(2) Taking the center point of the slice image obtained in step (1) as the center, k square anchor frames are obtained by scaling; the empirical value of the scaling factor scale is 0.7937, so the side lengths of the square anchor frames are respectively 24 × scale^i, where i sequentially takes the values 0, 1, …, k−1, and k is 4;
(3) Calculating the IOU value of each anchor frame in the step (2) and the corresponding real face frame in the original image, wherein the calculation formula is as follows:
IOU = S∩ / S∪
wherein S∩ is the area of the intersection of the anchor frame and the real face frame, and S∪ is the area of their union. A label is assigned to each anchor frame according to the maximum IOU value corresponding to it;
the assignment criteria for the tags are described as follows: giving a positive sample label to the corresponding anchor frame when the maximum IOU value is more than or equal to 0.5, giving a partial sample label to the corresponding anchor frame when the maximum IOU value is less than 0.5 and more than or equal to 0.3, giving a negative sample label to the corresponding anchor frame when the maximum IOU value is less than 0.2, and not giving labels to the rest anchor frames; the positive sample label anchor frame can be marked as 1, and the negative sample label anchor frame can be marked as 0;
(4) For each anchor frame assigned a positive sample label or a partial sample label in step (3), the real frame regression vector of the real face frame corresponding to the maximum IOU value, relative to the corresponding anchor frame, is calculated as follows:
[regression vector calculation formulas, given as equation images in the published document]
wherein x_g, y_g are the coordinates of the center point of the real face frame; x_a, y_a are the coordinates of the center point of the anchor frame; w_g, h_g are the width and height of the real face frame; and w_a, h_a are the width and height of the anchor frame.
(5) The trained candidate network outputs two types of data, namely the face confidence and the predicted-frame regression vector. Therefore, the face confidence output is trained using the labels of all positive sample label and negative sample label anchor frames together with the slice images corresponding to those anchor frames; the predicted-frame regression vector output is trained using the real frame regression vectors of all positive sample label and partial sample label anchor frames together with the slice images corresponding to those anchor frames. Training terminates when the specified number of iterations is reached or the loss converges, finally yielding the trained candidate network.
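One possible reading of this two-output training setup, sketched with PyTorch; the loss functions (binary cross-entropy for the confidence, mean-squared error for the regression), their equal weighting, and the numeric label encoding are all assumptions, since the patent only states which samples feed which output:

```python
import torch
import torch.nn.functional as F

def candidate_net_loss(conf_pred, reg_pred, labels, reg_targets):
    """conf_pred: (N,) predicted face confidences in [0, 1]; reg_pred: (N, 4) predicted
    regression vectors; labels: (N,) with 1 = positive, 0 = negative, -1 = partial,
    -2 = unlabeled; reg_targets: (N, 4) real frame regression vectors."""
    cls_mask = (labels == 1) | (labels == 0)           # confidence: positive and negative samples only
    cls_loss = F.binary_cross_entropy(conf_pred[cls_mask], labels[cls_mask].float())
    reg_mask = (labels == 1) | (labels == -1)          # regression: positive and partial samples only
    reg_loss = F.mse_loss(reg_pred[reg_mask], reg_targets[reg_mask])
    return cls_loss + reg_loss
```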
The specific flow of the candidate frame extraction process in this embodiment can be described as follows:
the method comprises the steps of obtaining an original image to be detected with a resolution of 1920 x 1080 from an intelligent foreground, conducting image pyramid processing on the original image to be detected to obtain 18 layers of pyramid images, inputting each 3 layers of pyramid images into a trained candidate network from the 4 th layer of pyramid images, obtaining two outputs of a face confidence coefficient feature image and a prediction frame regression vector feature image, selecting pixel points with a face confidence coefficient larger than or equal to 0.6 from the face confidence coefficient feature image, finding out an anchor frame corresponding to the original image to be an initial selection anchor frame, removing redundant anchor frames in the initial selection anchor frame by using a non-maximum value inhibition method, conducting coordinate correction calculation on the residual anchor frames by using corresponding prediction frame regression vectors, and finally obtaining the face candidate frame.
This embodiment combines multiple anchor frames with the image pyramid, reducing the pyramid images to be processed from the original 18 layers to 6 layers and effectively reducing the number of image pyramid layers.
Tests show that, when extracting face candidate frames from a 1920 x 1080 picture on an ordinary personal computer, the face candidate frame extraction time of the MTCNN face detection method is about 255 ms, while that of this embodiment is about 33 ms; the extraction speed of the face candidate frame is thus greatly improved, and the method has good practicability.
The above embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (5)

1. A face candidate frame extraction method is characterized by mainly comprising the following two steps: step 1: a candidate network training process; step 2: a candidate frame extraction process; step 2, extracting a face candidate frame of the image to be detected based on the candidate network trained in step 1;
the step 1 comprises the following steps: a. preparing 3 types of training data, and b, training the candidate network by using the obtained data as a training sample; the training data are respectively a slice image, a label of an anchor frame corresponding to the slice image and a real frame regression vector, and the data acquisition mode is specifically described as follows:
(1) Slice image:
randomly selecting square slices with side length not less than A from an original image of a training data set, uniformly scaling the side lengths of the square slices to A, and finally obtaining slice images with the size of A x A;
(2) Label of anchor frame corresponding to slice image:
1) Anchor frame corresponding to slice image:
selecting, on the original image corresponding to the slice image and taking the center point of the slice image as the center, k square anchor frames with side lengths A × scale^i respectively; wherein scale is a scaling factor with a value range of 0 to 1; i takes the values 0, 1, …, k−1, and k is a positive integer satisfying the condition
[constraint on k, given as an equation image in the published document]
2) The label of the anchor frame:
calculating the IOU value of each anchor frame and the corresponding real face frame in the original image:
IOU = S∩ / S∪
wherein S∩ is the area of the intersection of the anchor frame and the real face frame, and S∪ is the area of their union; a label is assigned to each anchor frame according to the maximum IOU value corresponding to it, and the label categories are divided into positive sample labels, partial sample labels and negative sample labels;
(3) Real frame regression vector:
for each anchor frame assigned a positive sample label or a partial sample label in step (2) above, calculating the real frame regression vector of the real face frame corresponding to the maximum IOU value relative to the corresponding anchor frame, wherein the calculation formulas are as follows:
[regression vector calculation formulas, given as equation images in the published document]
wherein x_g, y_g are the coordinates of the center point of the real face frame; x_a, y_a are the coordinates of the center point of the corresponding anchor frame; w_g, h_g are the width and height of the real face frame; and w_a, h_a are the width and height of the corresponding anchor frame;
the specific process of the step 2 is as follows:
(a) Carrying out image pyramid processing on an original image to be detected to obtain a multilayer pyramid image;
(b) Starting from the k-th layer pyramid image, every j layers a pyramid image is input into the trained candidate network for calculation to obtain a face confidence feature map and a predicted-frame regression vector feature map, wherein j is a positive integer less than or equal to k;
(c) Selecting the pixel points whose face confidence is greater than or equal to a threshold T in the face confidence feature map of step (b);
(d) For each pixel point selected in step (c), obtaining the corresponding anchor frame on the original image, namely an initially selected anchor frame;
(e) For the initially selected anchor frames obtained in step (d), first removing redundant anchor frames using non-maximum suppression, and then performing coordinate correction on the remaining anchor frames using the corresponding predicted-frame regression vectors, finally obtaining the face candidate frames.
2. The method as claimed in claim 1, wherein the candidate network is a fully convolutional neural network.
3. The method as claimed in claim 1, wherein k is the maximum value satisfying the condition [given as an equation image in the published document].
4. The method as claimed in claim 1, wherein the threshold T is 0.6.
5. The method of claim 1, wherein an anchor frame whose maximum IOU value is 0.5 or more is assigned a positive sample label; an anchor frame whose maximum IOU value is less than 0.5 and not less than 0.3 is assigned a partial sample label; an anchor frame whose maximum IOU value is 0.2 or less is assigned a negative sample label; and the remaining anchor frames are not assigned labels.
CN201910793249.5A 2019-08-27 2019-08-27 Face candidate frame extraction method Active CN110490170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910793249.5A CN110490170B (en) 2019-08-27 2019-08-27 Face candidate frame extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910793249.5A CN110490170B (en) 2019-08-27 2019-08-27 Face candidate frame extraction method

Publications (2)

Publication Number Publication Date
CN110490170A (en) 2019-11-22
CN110490170B (en) 2023-01-06

Family

ID=68554292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910793249.5A Active CN110490170B (en) 2019-08-27 2019-08-27 Face candidate frame extraction method

Country Status (1)

Country Link
CN (1) CN110490170B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046792A (en) * 2019-12-11 2020-04-21 三一重工股份有限公司 Face detection method and device, electronic equipment and computer readable storage medium
CN114299283B (en) * 2022-03-07 2022-05-20 科大天工智能装备技术(天津)有限公司 Image-based displacement measurement method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871134A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871134A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning

Also Published As

Publication number Publication date
CN110490170A (en) 2019-11-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant