CN110189255B - Face detection method based on two-stage detection - Google Patents

Face detection method based on two-stage detection

Info

Publication number
CN110189255B
Authority
CN
China
Prior art keywords
face
resolution
image
super
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910455695.5A
Other languages
Chinese (zh)
Other versions
CN110189255A (en
Inventor
于力
刘意文
邹见效
杨瞻远
徐红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910455695.5A priority Critical patent/CN110189255B/en
Publication of CN110189255A publication Critical patent/CN110189255A/en
Application granted granted Critical
Publication of CN110189255B publication Critical patent/CN110189255B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution

Abstract

The invention discloses a face detection method based on two-stage detection. A face detection model and a GAN-based super-resolution reconstruction model are first trained separately. The face image to be detected is then input into the face detection model to obtain the coordinate information of each candidate region of a face target and a confidence value that the candidate region belongs to a face; a preliminary judgment is made according to the confidence value, and the face targets still to be determined are input into the generator of the GAN-based super-resolution reconstruction model for further judgment. By adopting two-stage detection, the invention effectively improves the detection rate of low-resolution face images.

Description

Face detection method based on two-stage detection
Technical Field
The invention belongs to the technical field of low-resolution face detection, and particularly relates to a face detection method based on two-stage detection.
Background
The face detection problem originally arose as a sub-problem of face recognition systems and has gradually become an independent subject as research deepened. Current face detection technology draws on machine learning, computer vision, pattern recognition, artificial intelligence, and related fields; it has become the basis of all face image analysis and derivative applications, and it strongly affects the response speed and detection accuracy of the systems built on it. As the application scenarios of face detection keep expanding, input face images that are too small or of too low quality are increasingly encountered, and for such low-resolution face images the accuracy of a face detection system often drops sharply. The problem of detecting low-quality, small-size face images is commonly referred to as low-resolution face detection.
In essence, current face detection algorithms solve a binary classification problem: effective features are extracted from a region to be detected, and those features are used to judge whether a face is present; low-resolution face detection is studied on the same basis. A low-resolution face has three characteristics: little information, heavy noise, and few usable cues. As a result, a candidate region cannot yield enough effective features to represent itself, so conventional methods cannot extract sufficiently expressive features from a low-resolution face. Deep neural networks suffer from an analogous inherent deficiency: the early convolutional layers cannot provide a sufficiently powerful feature map, and the later convolutional layers cannot provide enough features for the low-resolution face region, which makes detecting low-resolution faces very difficult.
To address low-resolution face detection, many excellent scholars have carried out extensive targeted research. Broadly, work at home and abroad concentrates on three directions: finding resolution-robust feature expressions for face regions, designing new classifiers adapted to the characteristics of low-resolution faces, and image super-resolution methods. It should be recognized that research on low-resolution small-face detection is still at an early stage and many problems remain. On the one hand, how to effectively extract the context information of a low-resolution face and integrate it into the detection network still requires further exploration in order to give low-resolution face detectors better performance. On the other hand, a complete face detection system must detect faces at all scales, so a method for low-resolution faces must also preserve detection capability at other scales; in practice, the fusion problems of multi-scale detection leave low-resolution face detection systems with low accuracy or low processing speed, which is a major problem to be solved urgently.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a face detection method based on two-stage detection.
In order to achieve the above purpose, the face detection method based on two-stage detection of the invention comprises the following steps:
s1: acquiring a plurality of face image training samples, wherein each training sample comprises a face image and face target information, and the face image training samples are adopted to train a face detection model;
s2: acquiring a plurality of super-resolution face image reconstruction training samples, wherein each training sample comprises a low-resolution image containing a face and a corresponding high-resolution image, the super-resolution face image reconstruction training samples are adopted to train a super-resolution reconstruction model based on a GAN network, and the super-resolution reconstruction model based on the GAN network comprises a generator G and a discriminator D;
s3: inputting a face image to be detected into the face detection model to obtain the coordinate information of each candidate region of a face target and a confidence value C that the candidate region belongs to a face; presetting confidence thresholds T1 and T2 with 0 < T1 < T2 < 1; for each candidate region, if the corresponding confidence value C ≥ T2, judging that a face target exists in the candidate region and outputting it as a face target region; if T1 ≤ C < T2, taking the candidate region as a face target to be determined; if C < T1, judging that no face target exists in the candidate region and not outputting it;
s4: inputting each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image R; then inputting the image R into the discriminator D, which judges whether R is a qualified super-resolution reconstructed image and whether R contains a face target; if R is a qualified super-resolution reconstructed image and contains a face target, judging that a face target exists in the corresponding candidate region and outputting the candidate region as a face target region, and otherwise judging that no face target exists.
According to the face detection method based on two-stage detection of the invention, a face detection model and a GAN-based super-resolution reconstruction model are first trained separately; the face image to be detected is then input into the face detection model to obtain the coordinate information of each candidate region of a face target and a confidence value that the candidate region belongs to a face; a preliminary judgment is made according to the confidence value, and the face targets to be determined are input into the generator of the GAN-based super-resolution reconstruction model for further judgment. By adopting two-stage detection, the invention effectively improves the detection rate of low-resolution face images.
Drawings
FIG. 1 is a flow chart of an embodiment of a face detection method based on two-stage detection according to the present invention;
FIG. 2 is a schematic diagram of the structure of an R-FCN network;
FIG. 3 is a flowchart of the improved frame regression algorithm in this embodiment;
FIG. 4 is a block diagram of the generator in the SRGAN network;
FIG. 5 is a block diagram of the discriminator in the SRGAN network;
FIG. 6 is a PR graph of three methods in this experimental verification;
FIG. 7 is an exemplary diagram of the detection result of the SFD face detection method in the present experimental verification;
FIG. 8 is an exemplary diagram of the detection results of the R-FCN face detection method in the experimental verification;
FIG. 9 is an exemplary diagram of the test results of the present invention in this experimental verification;
FIG. 10 is a PR graph showing the face detection of a clear detection sample set by three methods in the present experimental verification;
FIG. 11 is a PR graph of face detection on a general fuzzy detection sample set by three methods in the experimental verification;
fig. 12 is a PR curve diagram of face detection performed on a severely blurred detection sample set by three methods in the experimental verification.
Detailed Description
Specific embodiments of the present invention are described below in conjunction with the accompanying drawings so that those skilled in the art can better understand the present invention. It is to be expressly noted that in the following description, a detailed description of known functions and designs will be omitted when it may obscure the subject matter of the present invention.
Examples
Fig. 1 is a flow chart of a specific embodiment of the face detection method based on two-stage detection according to the present invention. As shown in fig. 1, the two-stage detection-based face detection method of the present invention specifically includes the following steps:
s101: training a face detection model:
the method comprises the steps of obtaining a plurality of face image training samples, wherein each training sample comprises a face image and face target information, and training a face detection model by adopting the face image training samples.
S102: training a super-resolution reconstruction model:
the method comprises the steps of obtaining a plurality of super-resolution face image reconstruction training samples, wherein each training sample comprises a low-resolution image containing a face and a corresponding high-resolution image, training a super-resolution reconstruction model based on a GAN (generic adaptive Network) Network by adopting the super-resolution face image reconstruction training samples, and generating the super-resolution reconstruction model based on the GAN Network, and the super-resolution reconstruction model comprises a generator G and a discriminator D.
S103: adopting a face detection model to carry out preliminary detection:
and inputting the face image to be detected into a face detection model to obtain the coordinate information of each candidate region of the face target and a confidence value C of the candidate region belonging to the face. Presetting confidence threshold T 1 And T 2 And 0 < T 1 <T 2 Is less than 1. For each candidate region, if the corresponding confidence value C ≧ T 2 Judging that the candidate region has a face target, outputting the candidate region as a face target region, and if the corresponding confidence value T is detected 1 ≤C<T 2 If not, judging that the candidate area has no human face target, and not outputting.
S104: detecting by adopting a super-resolution reconstruction model:
Each face target to be determined is input into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR; the image SR is then input into the discriminator D, which judges whether SR is a qualified super-resolution reconstructed image and whether SR contains a face target. If SR is a qualified super-resolution reconstructed image and contains a face target, it is judged that a face target exists in the corresponding candidate region, which is output as a face target region; otherwise it is judged that no face target exists in the corresponding candidate region.
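To make the two-stage decision logic of S103 and S104 concrete, the following Python sketch wires the two stages together. It is a minimal sketch, not the patented implementation: face_detector, generator and discriminator are hypothetical callables standing in for the trained R-FCN and SRGAN components, and the threshold values are illustrative.

```python
def two_stage_detect(image, face_detector, generator, discriminator,
                     t1=0.4, t2=0.8):
    """Two-stage face detection: confidence gating plus SR re-check.

    face_detector(image) -> list of ((x1, y1, x2, y2), confidence)
    generator(crop)      -> super-resolution reconstruction of the crop
    discriminator(sr)    -> (is_valid_sr, contains_face) booleans
    All three callables are stand-ins for the trained models;
    `image` is assumed to be an H x W (x C) array with integer box coords.
    """
    assert 0.0 < t1 < t2 < 1.0
    accepted = []
    for box, conf in face_detector(image):
        if conf >= t2:                      # confident: accept directly
            accepted.append(box)
        elif conf >= t1:                    # uncertain: second-stage check
            crop = image[box[1]:box[3], box[0]:box[2]]
            sr = generator(crop)            # super-resolution reconstruction
            is_valid_sr, contains_face = discriminator(sr)
            if is_valid_sr and contains_face:
                accepted.append(box)
        # conf < t1: rejected, nothing output
    return accepted
```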
In the two-stage face detection method of the invention, the super-resolution reconstruction model assists the face detection model by further examining the candidate regions with low confidence, which avoids missed detections and false detections of face targets and improves detection performance.
As the face detection model, any suitable model can be selected as needed; in this embodiment an R-FCN network is selected as the face detection model and improved for low-resolution face detection. The R-FCN network is a modification of the traditional Faster R-CNN structure. Its core idea is to introduce position-sensitive information on top of the RPN (Region Proposal Network) proposed in Faster R-CNN and to move the ROI layer backwards: a position-sensitive score map is used to compute the probability that entities in the image to be detected belong to each category, which greatly improves the detection rate while keeping high localization accuracy. FIG. 2 is a schematic diagram of the structure of an R-FCN network. As shown in FIG. 2, the workflow of the R-FCN can be briefly described as follows:
the image is input into a pre-trained classification network (a network before Conv4 of a ResNet-101 network is used in the figure 2), and corresponding network parameters are fixed. There are 3 branches on the feature map (feature map) obtained on the last convolutional layer of the pre-trained network:
the 1 st branch is to perform RPN operation on the feature map to obtain corresponding candidate region ROI, and the specific method is as follows: anchor boxes (Anchors) are generated on the feature map according to preset parameters, the anchor boxes being a set of regions having different sizes and aspect ratios across the input image. And then identifying an anchor frame containing the foreground, and converting the anchor frame into a target Bounding Box (Bounding Box) by using a Bounding Box regression algorithm so that the Bounding Box can more closely fit the contained foreground object.
The 2nd branch obtains a k × k × (C + 1)-dimensional position-sensitive score map on the feature map for classification.
The 3rd branch obtains a 4 × k × k-dimensional position-sensitive score map on the feature map for regression.
Finally, position-sensitive ROI pooling is performed on the k × k × (C + 1)-dimensional and the 4 × k × k-dimensional position-sensitive score maps respectively to obtain the confidence and position information of each candidate region, and the corresponding category is then obtained by confidence judgment.
In this embodiment, the anchor-frame generation parameters are improved first. In a conventional R-FCN network, anchor frames are generated with three scales and three aspect ratios: the three scales default to {128 × 128, 256 × 256, 512 × 512} and the three aspect ratios to {1:1, 1:2, 2:1}. When the detection target is a small face, small face regions are then easily missed. Therefore, in this embodiment the anchor-frame scales are modified to the five scales {16 × 16, 32 × 32, 128 × 128, 256 × 256, 512 × 512}, and each scale generates three anchor frames with aspect ratios {1:1, 1:2, 2:1}, as sketched below. The two added small scales serve to detect small faces, while the three retained larger scales extract face regions of regular size.
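A sketch of the modified anchor generation might look as follows; the centering convention and the interpretation of the aspect ratios as height-to-width ratios at constant area are assumptions, since the text only specifies the scale and ratio sets.

```python
import numpy as np

# Modified anchor parameters from this embodiment
SCALES = [16, 32, 128, 256, 512]      # anchor side lengths (pixels)
RATIOS = [1.0, 0.5, 2.0]              # assumed height/width ratios 1:1, 1:2, 2:1

def anchors_at(cx, cy):
    """Generate the 15 anchors (5 scales x 3 ratios) centred at (cx, cy)."""
    boxes = []
    for s in SCALES:
        for r in RATIOS:
            w = s / np.sqrt(r)        # keep area ~ s*s while varying the ratio
            h = s * np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)
```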
As for the border regression algorithm, the prior art mostly adopts the NMS (Non-Maximum Suppression) algorithm, whose core idea is to find local maxima and suppress non-maximum values: iteratively, the anchor box with the highest confidence is used to compute the Intersection over Union (IoU) with the other anchor boxes, and boxes with large IoU are filtered out. However, the NMS algorithm has been found to have the following problems:
1) The NMS algorithm forcibly sets the confidence of adjacent, overlapping candidate frames to 0; that is, it directly and crudely deletes candidate frames whose IoU value exceeds the threshold. If a real target to be detected appears in the overlapped area, its detection will very likely fail, increasing the missed-detection rate and lowering the average detection rate.
2) When the NMS algorithm is used for frame regression, the optimal value of the IoU judgment threshold Nt is difficult to determine: setting it too large increases the false detection rate, and setting it too small increases the missed detection rate.
In order to solve the above problems, this embodiment improves the frame regression algorithm on the basis of the NMS algorithm. Fig. 3 is a flowchart of the improved bounding-box regression algorithm in this embodiment. As shown in fig. 3, the specific steps of the improved border regression algorithm are:
s301: initializing data:
The anchor frame set B = {b_1, b_2, …, b_N} is given, where b_n, n = 1, 2, …, N, and N denotes the number of anchor frames containing foreground; the confidence of each anchor frame b_n is denoted s_n. The reserved anchor frame set is initialized as D = ∅.
S302: selecting a current optimal anchor frame:
and selecting the anchor frame with the maximum confidence level from the current anchor frame set B, recording the anchor frame as the current optimal anchor frame B ', adding the current optimal anchor frame B ' into the reserved anchor frame set D, and deleting the current optimal anchor frame B ' from the anchor frame set B.
S303: judging whether the anchor frame set B is empty; if so, the frame regression is finished, and otherwise the algorithm proceeds to step S304.
S304: updating the confidence:
for each anchor frame B in the current anchor frame set B n Calculating the intersection ratio iou (b ', b) of the current optimal anchor frame b' and the current optimal anchor frame b i ) Then each anchor frame b is updated using the following formula n S confidence of n
Figure GDA0003917600010000062
Wherein N is t Is a preset intersection ratio threshold value.
And then returns to step S302.
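Assuming the linear-decay update reconstructed above, the improved frame regression can be sketched in Python as a soft-NMS loop; the box format and the threshold value are illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection over Union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, nt=0.3):
    """Improved border regression: decay confidences instead of deleting."""
    boxes, scores = list(boxes), list(scores)
    kept = []
    while boxes:                        # S302/S303: loop until B is empty
        best = int(np.argmax(scores))   # current optimal anchor frame b'
        b = boxes.pop(best)
        scores.pop(best)
        kept.append(b)
        for n, bn in enumerate(boxes):  # S304: update remaining confidences
            o = iou(b, bn)
            if o >= nt:                 # overlap above threshold: linear decay
                scores[n] *= (1.0 - o)
    return kept
```

Compared with hard NMS, overlapping frames keep a reduced confidence and can still be selected in a later iteration, which addresses problem 1) above.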
As the GAN-based super-resolution reconstruction model, this embodiment employs the SRGAN network. The SRGAN network is a widely used and highly effective super-resolution image reconstruction model trained as a GAN (Generative Adversarial Network); it consists of a generator G and a discriminator D. Fig. 4 is a block diagram of the generator in the SRGAN network. Fig. 5 is a block diagram of the discriminator in the SRGAN network. The core of the generator is a number of residual blocks, each containing two 3 × 3 convolutional layers followed by batch normalization (BN) layers with PReLU as the activation function; two 2× sub-pixel convolutional layers are used to increase the feature size. The discriminator D uses a network structure similar to VGG19 but without max-pooling: it comprises 8 convolutional layers, with the number of features increasing and the feature size decreasing as the network deepens, uses LeakyReLU as the activation function, and finally obtains the probability of a sample being real through two fully connected layers and a final sigmoid activation function.
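The generator building blocks described above can be sketched in PyTorch as follows; the channel counts are assumptions, and the two 2× sub-pixel (PixelShuffle) blocks together give the 4× upscaling used later in this embodiment.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One SRGAN generator residual block: conv-BN-PReLU-conv-BN + skip."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)            # identity skip connection

class Upsample2x(nn.Module):
    """2x sub-pixel upsampling block; two of these give the 4x factor."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(2)  # rearranges channels into 2x space
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))
```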
The existing SRGAN network is difficult to train and runs into problems when the two distributions barely overlap; research shows that these problems are caused by adopting the KL divergence and JS divergence as the measure of distance between the real sample distribution and the generated sample distribution in the traditional SRGAN network. In this embodiment, the EM divergence is adopted to solve these problems. The EM divergence is a symmetric divergence defined as follows:
Let Ω ⊂ R^n be a bounded continuous open set and S the set of all Radon probability distributions on Ω. For some p ≠ 1 and k > 0, the EM divergence is calculated as

$$W_{k,p}(P_r, P_g) = \inf_{f \in C_c^1(\Omega)} \left\{ \mathbb{E}_{x \sim P_r}\big[f(x)\big] - \mathbb{E}_{\tilde{x} \sim P_g}\big[f(\tilde{x})\big] + k\, \mathbb{E}_{\hat{x} \sim P_u}\Big[\big\|\nabla_{\hat{x}} f(\hat{x})\big\|^{p}\Big] \right\}$$

where P_r and P_g denote two different probability distributions, P_u denotes a random probability distribution, inf denotes the infimum, x denotes a sample obeying the distribution P_r, x̃ denotes a sample obeying the distribution P_g, x̂ denotes a random linear combination of the samples x and x̃, P_u denotes the distribution of the samples x̂, k and p each denote a constant, C_c^1(Ω) is the function space of all first-order differentiable functions with compact support on Ω, and ‖·‖ denotes the norm.
The advantage of the EM divergence is that, for two different distributions, it still reflects the distance between them even when they do not overlap. This means meaningful gradients are available at all times during training, so the whole SRGAN network can be trained stably, effectively avoiding the mode collapse caused by vanishing gradients that can occur when training the original SRGAN network. In this embodiment the objective function used in model training is improved on the basis of the EM divergence. The optimization objective function of the SRGAN network improved with the EM divergence is the following minimax problem:
$$\min_G \max_D \; \mathbb{E}_{x \sim P_r}\big[D(x)\big] - \mathbb{E}_{z}\big[D(G(z))\big] - k\, \mathbb{E}_{\hat{x} \sim P_u}\Big[\big\|\nabla_{\hat{x}} D(\hat{x})\big\|^{p}\Big]$$

where x denotes the true high-resolution sample, z denotes the low-resolution sample input to the generator G, G(z) is the super-resolution reconstructed sample generated by the generator G, P_g denotes the probability distribution of the super-resolution reconstructed samples, P_r denotes the probability distribution of the true high-resolution samples, D(x) and D(G(z)) respectively denote the probabilities that the discriminator D judges the high-resolution sample and the super-resolution reconstructed sample to be real samples, E[·] denotes the mathematical expectation, x̂ denotes a random linear combination of the true high-resolution sample x and the super-resolution reconstructed sample G(z), P_u denotes the distribution of the samples x̂, and k and p each denote a constant.
In the training process, the optimization objective function is decomposed into two optimization problems:

1. optimization of the discriminator D:

$$\max_D \; \mathbb{E}_{x \sim P_r}\big[D(x)\big] - \mathbb{E}_{z}\big[D(G(z))\big] - k\, \mathbb{E}_{\hat{x} \sim P_u}\Big[\big\|\nabla_{\hat{x}} D(\hat{x})\big\|^{p}\Big]$$

2. optimization of the generator G:

$$\max_G \; \mathbb{E}_{z}\big[D(G(z))\big]$$
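Assuming the objectives reconstructed above, the two optimization problems can be sketched in PyTorch as follows; the gradient penalty is computed with torch.autograd.grad on a random linear combination of real and generated samples, and the default values of k and p are placeholder assumptions, not values given by the text.

```python
import torch

def discriminator_loss(D, real, fake, k=2.0, p=6.0):
    """EM-divergence discriminator loss: score gap plus gradient penalty.

    Minimizing this maximizes E[D(x)] - E[D(G(z))] - k*E[||grad D(x_hat)||^p].
    k and p are placeholder constants.
    """
    u = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (u * real + (1 - u) * fake).detach().requires_grad_(True)
    d_hat = D(x_hat)
    grad = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)[0]
    penalty = grad.flatten(1).norm(2, dim=1).pow(p).mean()
    return D(fake).mean() - D(real).mean() + k * penalty

def generator_loss(D, fake):
    """Generator loss: maximize the discriminator score on generated samples."""
    return -D(fake).mean()
```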
based on the technical derivation, the invention improves the training method of the SRGAN model to obtain a more advantageous SRGAN model, thereby improving the quality of the super-resolution face image reconstruction result. The specific training method comprises the following steps:
First, a plurality of high-resolution face images I_HR are acquired, and the corresponding low-resolution face images I_LR are obtained through downsampling; each high-resolution face image I_HR and its corresponding low-resolution face image I_LR form a training sample, thereby yielding a training sample set. In this embodiment a Gaussian pyramid is used for downsampling: the original image G0 (layer 0 of the Gaussian pyramid) serves as the bottom layer and is convolved with a 5 × 5 Gaussian kernel; the convolved image is then downsampled (removing the even rows and columns) to obtain the next layer G1; iterating this process completes the 4× downsampling.
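The Gaussian-pyramid downsampling can be sketched with OpenCV, whose cv2.pyrDown performs exactly the 5 × 5 Gaussian blur followed by removal of the even rows and columns; two pyrDown steps give the overall 4× factor. The file path is a placeholder.

```python
import cv2

def downsample_4x(path):
    """Build the low-resolution image I_LR from I_HR via a Gaussian pyramid."""
    g0 = cv2.imread(path)     # layer 0: the high-resolution image I_HR
    g1 = cv2.pyrDown(g0)      # 5x5 Gaussian blur, then drop even rows/columns
    g2 = cv2.pyrDown(g1)      # second step: overall 4x downsampling -> I_LR
    return g2
```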
Then, the SRGAN network is trained with the obtained training sample set. The optimization objective function of the generator G in the training process is:

$$\min_{G} \; -\mathbb{E}_{z}\big[D(G(z))\big]$$

The optimization objective function of the discriminator D is:

$$\min_{D} \; \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x \sim P_r}\big[D(x)\big] + k\, \mathbb{E}_{\hat{x} \sim P_u}\Big[\big\|\nabla_{\hat{x}} D(\hat{x})\big\|^{p}\Big]$$

where x denotes the true high-resolution face image, z denotes the low-resolution face image input to the generator G, G(z) is the super-resolution reconstructed face image generated by the generator G, P_g denotes the probability distribution of the super-resolution reconstructed face images, P_r denotes the probability distribution of the true high-resolution face images, D(x) and D(G(z)) respectively denote the probabilities that the discriminator D judges the high-resolution face image and the super-resolution reconstructed face image to be real face images, E[·] denotes the mathematical expectation, x̂ denotes a random linear combination of the true high-resolution face image x and the super-resolution reconstructed face image G(z), P_u denotes the distribution of the samples x̂, and k and p each denote a constant.
In the SRGAN training process, the generator G first performs super-resolution reconstruction on the low-resolution face image I_LR in each training sample X. Specifically, the generator G upsamples the low-resolution face image I_LR in training sample X to obtain the super-resolution reconstructed face image I_SR. Because this embodiment downsamples the high-resolution face image I_HR by a factor of 4 to obtain the low-resolution face image I_LR, the upsampling factor used to generate the super-resolution reconstructed face image I_SR is also 4.
Then the high-resolution face image I_HR corresponding to the low-resolution face image I_LR, together with the super-resolution reconstructed face image I_SR generated by the generator G, is input into the discriminator D, and the loss function L^SR of the training sample is calculated according to the following formula:

$$L^{SR} = l^{SR}_{X} + l^{SR}_{Gen} + L_{clc}$$

where l^SR_X denotes the content loss function of the training sample, calculated as:

$$l^{SR}_{X} = l^{SR}_{MSE} + l^{SR}_{VGG}$$

where l^SR_MSE denotes the content loss based on the mean square error, calculated as:

$$l^{SR}_{MSE} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \Big( I^{HR}_{x,y} - I^{SR}_{x,y} \Big)^{2}$$

where W denotes the width of the high-resolution face image I_HR, H denotes the height of the high-resolution face image I_HR, r denotes the downsampling factor, I^HR_{x,y} denotes the pixel value of the pixel at coordinates (x, y) in the high-resolution face image I_HR, and I^SR_{x,y} denotes the pixel value of the pixel at coordinates (x, y) in the super-resolution reconstructed face image I_SR.
l^SR_VGG denotes the VGG loss, calculated as:

$$l^{SR}_{VGG} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big( \phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(I^{SR}\big)_{x,y} \Big)^{2}$$

where i denotes the index of a max-pooling layer in the VGG-19 network in the discriminator D, and j denotes the index of a convolutional layer between the i-th and the (i+1)-th max-pooling layers; in the existing VGG-19 network there are 5 max-pooling layers, and the number of convolutional layers between two adjacent max-pooling layers is 2 or 4. φ_{i,j} denotes the feature map obtained from the j-th convolutional layer after the i-th max-pooling layer of the VGG-19 network in the discriminator D, W_{i,j} denotes the width of the feature map φ_{i,j}, and H_{i,j} denotes the height of the feature map φ_{i,j}.
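The VGG loss can be sketched with torchvision's pretrained VGG-19. The specific layer choice below (the activation of conv5_4, a common SRGAN choice) is an assumption, since i and j are left generic in the text, and the inputs are assumed to be ImageNet-normalized tensors.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Feature extractor up to an assumed layer phi_{i,j}; slicing the
# `features` Sequential at index 36 ends just after the activation of
# conv5_4 in VGG-19.
_vgg = vgg19(pretrained=True).features[:36].eval()
for prm in _vgg.parameters():
    prm.requires_grad_(False)

def vgg_loss(sr, hr):
    """Mean squared error between VGG-19 feature maps of I_SR and I_HR."""
    return F.mse_loss(_vgg(sr), _vgg(hr))
```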
l^SR_Gen denotes the adversarial loss. This part of the loss function biases the SRGAN network, by "spoofing" the discriminator, towards producing outputs closer to natural images. With the log term removed by the EM-divergence improvement, it is calculated as:

$$l^{SR}_{Gen} = -D_{\theta_D}\big(G_{\theta_G}(I^{LR})\big)$$

where D_{θ_D}(G_{θ_G}(I^{LR})) denotes the probability assigned by the discriminator D that the super-resolution reconstructed face image generated by the generator (i.e., I_SR) is a true high-resolution face image; the subscripts θ_D and θ_G denote the network parameters of the discriminator D and the generator G respectively, w denotes the dimension index of the network parameters, w = 1, 2, …, W, and W denotes the dimensionality of the network parameters.
In order to better meet the requirement that the super-resolution reconstruction model must detect whether the super-resolution reconstructed image contains a face target, a classification loss L_clc is added when calculating the loss function, calculated as:

$$L_{clc} = -\frac{1}{V} \sum_{v=1}^{V} \Big[ y_{v} \log \hat{y}_{v} + (1 - y_{v}) \log\big(1 - \hat{y}_{v}\big) \Big]$$

where {y_1, y_2, …, y_v, …, y_V} denotes the calibration data indicating whether each labeled region of the high-resolution face image I_HR is a face, ŷ_v denotes the corresponding face probability predicted by the discriminator, V denotes the number of face regions labeled in the high-resolution face image I_HR, and the y_v take values in {0, 1}.
Since the improved optimization objective function in this implementation has no log term, the Adam optimization algorithm can preferably be used to optimize the objective functions of the generator G and the discriminator D, improving training efficiency. For the generator G, the weight w_G is updated in the descending direction with the Adam optimization algorithm:

$$w_G \leftarrow w_G - \alpha \cdot \mathrm{Adam}\left( \nabla_{w_G} \frac{1}{M} \sum_{m=1}^{M} -D\big(G(z_m)\big) \right)$$

where Adam(·) denotes the bias-corrected moment update of the Adam algorithm, ∇_{w_G}(1/M)Σ_{m=1}^{M} −D(G(z_m)) denotes the descending gradient of the weight w_G, z_m denotes the value of the m-th pixel of the super-resolution reconstructed face image I_SR, m = 1, 2, …, M, M denotes the number of pixels, D(G(z_m)) denotes the probability that the discriminator D judges the m-th pixel of the super-resolution reconstructed face image I_SR to be a pixel of the high-resolution face image I_HR, α denotes the learning rate, β_1 denotes the exponential decay rate of the first-moment estimate, and β_2 denotes the exponential decay rate of the second-moment estimate. Typical values of the three Adam parameters are α = 0.00001, β_1 = 0.9 and β_2 = 0.999.
The weight w_D of the discriminator D is updated in the descending direction with the Adam optimization algorithm:

$$w_D \leftarrow w_D - \alpha \cdot \mathrm{Adam}\left( \nabla_{w_D} \frac{1}{M} \sum_{m=1}^{M} \Big[ D\big(G(z_m)\big) - D(x_m) + k \big\| \nabla_{\hat{x}_m} D(\hat{x}_m) \big\|^{p} \Big] \right)$$

where ∇_{w_D}(·) denotes the descending gradient of the weight w_D, x_m denotes the value of the m-th pixel of the high-resolution face image I_HR, D(x_m) denotes the probability that the discriminator D judges the m-th pixel of the high-resolution face image I_HR to be a pixel of I_HR, ∇_{x̂_m}D(x̂_m) denotes the gradient with respect to x̂_m, x̂_m = μ_m x_m + (1 − μ_m) G(z_m) with μ_m = m/M, and D(x̂_m) denotes the probability that the discriminator D judges x̂_m to be a pixel of the high-resolution face image I_HR.
In the present embodiment, the weight w_G of the generator G and the weight w_D of the discriminator D are preferably updated alternately: the parameters of the generator G are first fixed while the parameters of the discriminator D are updated, then the parameters of the discriminator D are fixed while the parameters of the generator G are updated, and so on alternately.
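The alternating scheme can be sketched as a single training step in PyTorch, reusing the discriminator_loss and generator_loss sketches from above; the optimizers would be created with the α, β_1, β_2 values given earlier, e.g. torch.optim.Adam(G.parameters(), lr=1e-5, betas=(0.9, 0.999)), and the data pipeline is a placeholder.

```python
import torch

def train_step(G, D, opt_G, opt_D, lr_batch, hr_batch):
    """One alternating update: first D with G frozen, then G with D fixed."""
    # --- update discriminator D (generator parameters fixed) ---
    with torch.no_grad():
        fake = G(lr_batch)              # I_SR, detached from G's graph
    opt_D.zero_grad()
    loss_d = discriminator_loss(D, hr_batch, fake)
    loss_d.backward()
    opt_D.step()

    # --- update generator G (discriminator effectively fixed) ---
    opt_G.zero_grad()
    loss_g = generator_loss(D, G(lr_batch))
    loss_g.backward()                   # only opt_G steps, so D is unchanged
    opt_G.step()
    return loss_d.item(), loss_g.item()
```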
In order to better illustrate the technical effect of the invention, it was experimentally verified on a group of low-resolution face images. In this experimental verification, the face detection model adopts the R-FCN model with the improved anchor-frame generation parameters and the improved frame regression algorithm of this embodiment, and the GAN-based super-resolution reconstruction model adopts the SRGAN model obtained by the improved training method of this embodiment. The face detection model and the GAN-based super-resolution reconstruction model were trained on the WIDER FACE training sample set; 10 images were randomly extracted from each of 61 classifications, giving 610 detection images in total. For comparison of technical effects, the SFD face detection method and the R-FCN face detection method were selected as comparison methods.
In order to evaluate the technical effects of the face detection method of the invention and of the comparison methods, the PR curve is selected as the evaluation standard. The PR curve is drawn with Precision as the ordinate and Recall as the abscissa.
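For reference, precision, recall and the per-class AP that mAP averages can be computed from ranked detections as in the sketch below; matching detections to ground truth (the usual IoU-based assignment) is assumed to have been done upstream.

```python
import numpy as np

def average_precision(confidences, is_true_positive, num_gt):
    """AP = area under the precision-recall curve of ranked detections."""
    order = np.argsort(-np.asarray(confidences))      # most confident first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)     # TP / detections so far
    recall = cum_tp / max(num_gt, 1)                  # TP / all ground truth
    # rectangle-rule integration of precision over recall increments
    return float(np.sum(precision * np.diff(np.concatenate([[0.0], recall]))))
```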
FIG. 6 is a PR graph of the three methods in this experimental verification. As shown in fig. 6, among the three face detection methods the PR curve of the present invention lies closest to the upper right corner as a whole, and its mAP (Mean Average Precision) value of 0.947 is also the best of the three sets of data.
Fig. 7 is an exemplary diagram of the detection result of the SFD face detection method in this experimental verification. Fig. 8 is an exemplary diagram of the detection result of the R-FCN face detection method in this experimental verification. Fig. 9 is an exemplary diagram of the detection result of the present invention in this experimental verification. Comparing fig. 7 to fig. 9, the present invention detects 14 faces in total and shows better detection performance than the other two methods, which detect 11 and 9 faces respectively.
Face detection was then performed on image samples of different definitions. The blur attribute of each face target is annotated in the WIDER FACE training sample set and divided into three classes: clear, generally blurred, and severely blurred; accordingly, a number of samples were extracted from the image samples of each blur degree to form detection sample sets. Fig. 10 is a PR graph of face detection performed on the clear detection sample set by the three methods in this experimental verification. Fig. 11 is a PR graph of face detection performed on the generally blurred detection sample set. Fig. 12 is a PR graph of face detection performed on the severely blurred detection sample set. As shown in figs. 10 to 12, when the sample definition is high, all three methods detect the face parts well, the differences are small, and the mAP values are very high. On the generally blurred test group, the mAP values of the three algorithms decrease slightly but still exceed 97%, indicating that under general blur all three methods retain very good detection capability and the setting poses little challenge to them; here the invention has some advantage over SFD and R-FCN, but it is not obvious. When the detected samples are severely blurred, the differences between the three methods begin to appear: SFD performs worst, with mAP dropping by about 10 percentage points compared with the clear setting, while the drop of the proposed method is the smallest, about 5 percentage points. In this case the mAP value of the proposed method is about 2 percentage points higher than that of the original R-FCN model, and its PR curve clearly encloses the PR curves of the two comparison methods, showing that, compared with the other two methods, the proposed method has better stability and a higher detection rate at low resolution.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concept are protected, as long as the changes are within the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A face detection method based on two-stage detection is characterized by comprising the following steps:
s1: acquiring a plurality of face image training samples, wherein each training sample comprises a face image and face target information, and training a face detection model by adopting the face image training samples;
s2: acquiring a plurality of super-resolution face image reconstruction training samples, wherein each training sample comprises a low-resolution image containing a face and a corresponding high-resolution image, the super-resolution face image reconstruction training samples are adopted to train a super-resolution reconstruction model based on a GAN network, and the super-resolution reconstruction model based on the GAN network comprises a generator G and a discriminator D;
s3: inputting a face image to be detected into the face detection model to obtain coordinate information of each candidate region of a face target and a confidence value C that the candidate region belongs to a face; presetting confidence thresholds T1 and T2 with 0 < T1 < T2 < 1; for each candidate region, if the corresponding confidence value C ≥ T2, judging that a face target exists in the candidate region and outputting it as a face target region; if T1 ≤ C < T2, taking the candidate region as a face target to be determined; if C < T1, judging that no face target exists in the candidate region and not outputting it;
s4: inputting each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR; then inputting the super-resolution reconstructed image SR into the discriminator D, which judges whether the image SR is a qualified super-resolution reconstructed image and whether it contains a face target; if the image SR is a qualified super-resolution reconstructed image and contains a face target, judging that a face target exists in the corresponding candidate region and outputting the candidate region as a face target region, and otherwise judging that no face target exists in the candidate region.
2. The face detection method of claim 1, wherein the face detection model uses an R-FCN network.
3. The face detection method according to claim 2, wherein the anchor frames in the R-FCN network are generated with the five scales {16 × 16, 32 × 32, 128 × 128, 256 × 256, 512 × 512}, and each scale generates anchor frames with the three aspect ratios {1:1, 1:2, 2:1}.
4. The face detection method of claim 1, wherein the GAN network-based super-resolution reconstruction model adopts an SRGAN network.
5. The face detection method of claim 4, wherein the SRGAN network is trained by the following method:
first, a plurality of high-resolution face images I_HR are acquired, and the corresponding low-resolution face images I_LR are obtained through downsampling; each high-resolution face image I_HR and its corresponding low-resolution face image I_LR form a training sample, thereby obtaining a training sample set;
then, the SRGAN network is trained with the obtained training sample set, wherein the optimization objective function of the generator G in the training process is:

$$\min_{G} \; -\mathbb{E}_{z}\big[D(G(z))\big]$$

and the optimization objective function of the discriminator D is:

$$\min_{D} \; \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x \sim P_r}\big[D(x)\big] + k\, \mathbb{E}_{\hat{x} \sim P_u}\Big[\big\|\nabla_{\hat{x}} D(\hat{x})\big\|^{p}\Big]$$

where x denotes the true high-resolution face image, z denotes the low-resolution face image input to the generator G, G(z) is the super-resolution reconstructed face image generated by the generator G, P_g denotes the probability distribution of the super-resolution reconstructed face images, P_r denotes the probability distribution of the true high-resolution face images, D(x) and D(G(z)) respectively denote the probabilities that the discriminator D judges the high-resolution face image and the super-resolution reconstructed face image to be real face images, E[·] denotes the mathematical expectation, x̂ denotes a random linear combination of the true high-resolution face image x and the super-resolution reconstructed face image G(z), P_u denotes the distribution of the samples x̂, and k and p each denote a constant.
6. The face detection method of claim 5, wherein in the SRGAN network training process the loss function L^SR of the training sample is calculated according to the following formula:

$$L^{SR} = l^{SR}_{X} + l^{SR}_{Gen} + L_{clc}$$

wherein l^SR_X denotes the content loss function of the training sample, l^SR_Gen denotes the adversarial loss, and L_clc denotes the classification loss.
7. The face detection method according to claim 5, wherein in the SRGAN network training process the Adam optimization algorithm is adopted to optimize the objective functions of the generator G and the discriminator D, specifically:

the weight w_G of the generator G is updated in the descending direction with the Adam optimization algorithm:

$$w_G \leftarrow w_G - \alpha \cdot \mathrm{Adam}\left( \nabla_{w_G} \frac{1}{M} \sum_{m=1}^{M} -D\big(G(z_m)\big) \right)$$

wherein ∇_{w_G}(1/M)Σ_{m=1}^{M} −D(G(z_m)) denotes the descending gradient of the weight w_G, z_m denotes the value of the m-th pixel of the super-resolution reconstructed face image I_SR, m = 1, 2, …, M, M denotes the number of pixels, D(G(z_m)) denotes the probability that the discriminator D judges the m-th pixel of the super-resolution reconstructed face image I_SR to be a pixel of the high-resolution face image I_HR, α denotes the learning rate, β_1 denotes the exponential decay rate of the first-moment estimate, and β_2 denotes the exponential decay rate of the second-moment estimate;

the weight w_D of the discriminator D is updated in the descending direction with the Adam optimization algorithm:

$$w_D \leftarrow w_D - \alpha \cdot \mathrm{Adam}\left( \nabla_{w_D} \frac{1}{M} \sum_{m=1}^{M} \Big[ D\big(G(z_m)\big) - D(x_m) + k \big\| \nabla_{\hat{x}_m} D(\hat{x}_m) \big\|^{p} \Big] \right)$$

wherein ∇_{w_D}(·) denotes the descending gradient of the weight w_D, x_m denotes the value of the m-th pixel of the high-resolution face image I_HR, D(x_m) denotes the probability that the discriminator D judges the m-th pixel of the high-resolution face image I_HR to be a pixel of I_HR, ∇_{x̂_m}D(x̂_m) denotes the gradient with respect to x̂_m, x̂_m = μ_m x_m + (1 − μ_m) G(z_m) with μ_m = m/M, and D(x̂_m) denotes the probability that the discriminator D judges x̂_m to be a pixel of the high-resolution face image I_HR.
8. The face detection method of claim 7, wherein, when optimizing the objective functions, the weight w_G of the generator G and the weight w_D of the discriminator D are updated alternately.
CN201910455695.5A 2019-05-29 2019-05-29 Face detection method based on two-stage detection Active CN110189255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910455695.5A CN110189255B (en) 2019-05-29 2019-05-29 Face detection method based on two-stage detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910455695.5A CN110189255B (en) 2019-05-29 2019-05-29 Face detection method based on two-stage detection

Publications (2)

Publication Number Publication Date
CN110189255A CN110189255A (en) 2019-08-30
CN110189255B true CN110189255B (en) 2023-01-17

Family

ID=67718558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910455695.5A Active CN110189255B (en) 2019-05-29 2019-05-29 Face detection method based on two-stage detection

Country Status (1)

Country Link
CN (1) CN110189255B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705498A (en) * 2019-10-12 2020-01-17 北京泰豪信息科技有限公司 Low-resolution face recognition method
CN110705509A (en) * 2019-10-16 2020-01-17 上海眼控科技股份有限公司 Face direction recognition method and device, computer equipment and storage medium
CN110866484B (en) * 2019-11-11 2022-09-09 珠海全志科技股份有限公司 Driver face detection method, computer device and computer readable storage medium
CN111144215B (en) * 2019-11-27 2023-11-24 北京迈格威科技有限公司 Image processing method, device, electronic equipment and storage medium
CN111222420A (en) * 2019-12-24 2020-06-02 重庆市通信产业服务有限公司 FTP protocol-based low-bandwidth-requirement helmet identification method
CN111339950B (en) * 2020-02-27 2024-01-23 北京交通大学 Remote sensing image target detection method
US11610316B2 (en) * 2020-03-06 2023-03-21 Siemens Healthcare Gmbh Method of computing a boundary
CN113836974A (en) * 2020-06-23 2021-12-24 江苏翼视智能科技有限公司 Monitoring video pedestrian detection method based on super-resolution reconstruction
CN112102234B (en) * 2020-08-06 2022-05-20 复旦大学 Ear sclerosis focus detection and diagnosis system based on target detection neural network
CN112418009B (en) * 2020-11-06 2024-03-22 中保车服科技服务股份有限公司 Image quality detection method, terminal equipment and storage medium
CN112437451B (en) * 2020-11-10 2022-08-02 南京大学 Wireless network flow prediction method and device based on generation countermeasure network
CN112288044B (en) * 2020-12-24 2021-07-27 成都索贝数码科技股份有限公司 News picture attribute identification method of multi-scale residual error network based on tree structure
CN113283306B (en) * 2021-04-30 2023-06-23 青岛云智环境数据管理有限公司 Rodent identification analysis method based on deep learning and migration learning
CN114862683B (en) * 2022-07-07 2022-12-09 浪潮电子信息产业股份有限公司 Model generation method, target detection method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
CN108090873A (en) * 2017-12-20 2018-05-29 河北工业大学 Pyramid face image super-resolution reconstruction method based on regression model
CN108229381A (en) * 2017-12-29 2018-06-29 湖南视觉伟业智能科技有限公司 Face image synthesis method, apparatus, storage medium and computer equipment

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5696848A (en) * 1995-03-09 1997-12-09 Eastman Kodak Company System for creating a high resolution image from a sequence of lower resolution motion images
US6937135B2 (en) * 2001-05-30 2005-08-30 Hewlett-Packard Development Company, L.P. Face and environment sensing watch
KR101308946B1 (en) * 2012-02-02 2013-09-24 한국과학기술연구원 Method for reconstructing three dimensional facial shape
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
CN106951867B (en) * 2017-03-22 2019-08-23 成都擎天树科技有限公司 Face identification method, device, system and equipment based on convolutional neural networks
CN106874894B (en) * 2017-03-28 2020-04-14 电子科技大学 Human body target detection method based on regional full convolution neural network
CN107154023B (en) * 2017-05-17 2019-11-05 电子科技大学 Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution
CN107481188A (en) * 2017-06-23 2017-12-15 珠海经济特区远宏科技有限公司 A kind of image super-resolution reconstructing method
EP3438920A1 (en) * 2017-07-31 2019-02-06 Institut Pasteur Method, device, and computer program for improving the reconstruction of dense super-resolution images from diffraction-limited images acquired by single molecule localization microscopy
CN108090417A (en) * 2017-11-27 2018-05-29 上海交通大学 A kind of method for detecting human face based on convolutional neural networks
CN108446617B (en) * 2018-03-09 2022-04-22 华南理工大学 Side face interference resistant rapid human face detection method
CN108805027B (en) * 2018-05-03 2020-03-24 电子科技大学 Face recognition method under low resolution condition
CN108681718B (en) * 2018-05-20 2021-08-06 北京工业大学 Unmanned aerial vehicle low-altitude target accurate detection and identification method
CN109543548A (en) * 2018-10-26 2019-03-29 桂林电子科技大学 A kind of face identification method, device and storage medium
CN109614985B (en) * 2018-11-06 2023-06-20 华南理工大学 Target detection method based on densely connected feature pyramid network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
CN108090873A (en) * 2017-12-20 2018-05-29 河北工业大学 Pyramid face image super-resolution reconstruction method based on regression model
CN108229381A (en) * 2017-12-29 2018-06-29 湖南视觉伟业智能科技有限公司 Face image synthesis method, apparatus, storage medium and computer equipment

Also Published As

Publication number Publication date
CN110189255A (en) 2019-08-30

Similar Documents

Publication Publication Date Title
CN110189255B (en) Face detection method based on two-stage detection
CN110211045B (en) Super-resolution face image reconstruction method based on SRGAN network
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN110738697B (en) Monocular depth estimation method based on deep learning
CN106940816B (en) CT image pulmonary nodule detection system based on 3D full convolution neural network
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN112016507B (en) Super-resolution-based vehicle detection method, device, equipment and storage medium
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
CN109840483B (en) Landslide crack detection and identification method and device
CN111784671A (en) Pathological image focus region detection method based on multi-scale deep learning
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
CN112348770A (en) Bridge crack detection method based on multi-resolution convolution network
CN109815931B (en) Method, device, equipment and storage medium for identifying video object
CN113298718A (en) Single image super-resolution reconstruction method and system
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
CN116994140A (en) Cultivated land extraction method, device, equipment and medium based on remote sensing image
CN115293966A (en) Face image reconstruction method and device and storage medium
CN112329793B (en) Significance detection method based on structure self-adaption and scale self-adaption receptive fields
CN116311004B (en) Video moving target detection method based on sparse optical flow extraction
CN116758411A (en) Ship small target detection method based on remote sensing image pixel-by-pixel processing
CN116665153A (en) Road scene segmentation method based on improved deep bv3+ network model
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
CN110910332B (en) Visual SLAM system dynamic fuzzy processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant