CN110189255B - Face detection method based on two-stage detection - Google Patents
- Publication number
- CN110189255B (application CN201910455695.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a face detection method based on two-stage detection, which comprises the steps of respectively training a face detection model and a super-resolution reconstruction model based on a GAN network, then inputting a face image to be detected into the face detection model to obtain coordinate information of each candidate region of a face target and a confidence value of the candidate region belonging to a face, carrying out primary judgment according to the confidence value, and then inputting the face target to be determined into a generator in the super-resolution reconstruction model based on the GAN network for further judgment. The invention adopts two-stage detection, and can effectively improve the detection rate of the low-resolution face image.
Description
Technical Field
The invention belongs to the technical field of low-resolution face detection, and particularly relates to a face detection method based on two-stage detection.
Background
Face detection was originally posed as a sub-problem of face recognition systems and, as research deepened, gradually became an independent subject. Current face detection technology draws on machine learning, computer vision, pattern recognition, artificial intelligence and related fields. It is the basis of all face image analysis and derivative applications, and it strongly affects the response speed and detection accuracy of those derivative systems. As the application scenarios of face detection have expanded, input face images that are too small or of too low quality have become increasingly common, and the accuracy of a face detection system often drops sharply on such low-resolution images. The detection of low-quality, small-size face images is commonly referred to as low-resolution face detection.
In essence, current face detection algorithms solve a binary classification problem: effective features are extracted from a region to be detected, and those features are used to judge whether a face is present. Low-resolution face detection is studied on the same basis. A low-resolution face has three characteristics: it carries little information, it is noisy, and few tools are available for it. From the viewpoint of feature expression, conventional methods therefore cannot extract enough effective features from a candidate region to represent a low-resolution face. Deep neural networks have an inherent deficiency here as well: the earlier convolutional layers cannot provide sufficiently powerful feature maps, and the later convolutional layers cannot provide enough features of the low-resolution face region, which makes low-resolution faces very difficult to detect.
To address low-resolution face detection, many researchers have carried out targeted work. Overall, research at home and abroad has concentrated on three directions: finding resolution-robust feature representations for face regions, designing new classifiers tailored to the characteristics of low-resolution faces, and applying image super-resolution methods. Research on low-resolution small face detection is nevertheless still at a developing stage, and many problems remain. On one hand, how to effectively extract the context information of a low-resolution face and integrate it into the detection network still needs further exploration in order to improve low-resolution face detectors. On the other hand, a complete face detection system must handle faces at all scales, so the detection of faces at other scales has to be considered when tackling low-resolution faces. In practice, fusing multi-scale detection tends to lower either the accuracy or the processing speed of a low-resolution face detection system, which is a major problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a face detection method based on two-stage detection.
To achieve the above purpose, the face detection method based on two-stage detection comprises the following steps:
S1: Acquire a plurality of face image training samples, each comprising a face image and face target information, and train a face detection model with these samples.
S2: Acquire a plurality of super-resolution face image reconstruction training samples, each comprising a low-resolution image containing a face and the corresponding high-resolution image, and use these samples to train a GAN-based super-resolution reconstruction model, which comprises a generator G and a discriminator D.
S3: Input the face image to be detected into the face detection model to obtain the coordinate information of each candidate region of a face target and the confidence value $C$ that the candidate region belongs to a face. Preset confidence thresholds $T_1$ and $T_2$ with $0 < T_1 < T_2 < 1$. For each candidate region: if $C \ge T_2$, judge that the candidate region contains a face target and output it as a face target region; if $T_1 \le C < T_2$, take the candidate region as a face target to be determined; otherwise, judge that the candidate region contains no face target and do not output it.
S4: Input each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image R, then input R into the discriminator D, which judges whether R is a qualified super-resolution reconstructed image and whether it contains a face target. If R is a qualified super-resolution reconstructed image and contains a face target, judge that the corresponding candidate region contains a face target and output it as a face target region; otherwise, judge that no face target exists.
The invention relates to a face detection method based on two-stage detection, which comprises the steps of firstly respectively training a face detection model and a super-resolution reconstruction model based on a GAN network, then inputting a face image to be detected into the face detection model to obtain coordinate information of each candidate region of a face target and a confidence value that the candidate region belongs to a face, carrying out primary judgment according to the confidence value, and then inputting the face target to be determined into a generator in the super-resolution reconstruction model based on the GAN network for further judgment. The invention adopts two-stage detection, and can effectively improve the detection rate of the low-resolution face image.
Drawings
FIG. 1 is a flow chart of an embodiment of a face detection method based on two-stage detection according to the present invention;
FIG. 2 is a schematic diagram of the structure of an R-FCN network;
FIG. 3 is a flowchart of the improved frame regression algorithm in this embodiment;
FIG. 4 is a block diagram of the generator in the SRGAN network;
FIG. 5 is a block diagram of the discriminator in the SRGAN network;
FIG. 6 is a PR graph of three methods in this experimental verification;
FIG. 7 is an exemplary diagram of the detection result of the SFD face detection method in the present experimental verification;
FIG. 8 is an exemplary diagram of the detection results of the R-FCN face detection method in the experimental verification;
FIG. 9 is an exemplary diagram of the test results of the present invention in this experimental verification;
FIG. 10 is a PR graph showing the face detection of a clear detection sample set by three methods in the present experimental verification;
FIG. 11 is a PR graph of face detection on a general fuzzy detection sample set by three methods in the experimental verification;
FIG. 12 is a PR graph of face detection on a severely blurred detection sample set by three methods in the experimental verification.
Detailed Description
Specific embodiments of the present invention are described below in conjunction with the accompanying drawings so that those skilled in the art can better understand the present invention. It is to be expressly noted that in the following description, a detailed description of known functions and designs will be omitted when it may obscure the subject matter of the present invention.
Examples
Fig. 1 is a flow chart of a specific embodiment of the face detection method based on two-stage detection according to the present invention. As shown in fig. 1, the two-stage detection-based face detection method of the present invention specifically includes the following steps:
S101: Train the face detection model:
A plurality of face image training samples are obtained, each comprising a face image and face target information, and the face detection model is trained with these samples.
S102: Train the super-resolution reconstruction model:
A plurality of super-resolution face image reconstruction training samples are obtained, each comprising a low-resolution image containing a face and the corresponding high-resolution image. These samples are used to train a super-resolution reconstruction model based on a GAN (Generative Adversarial Network), which comprises a generator G and a discriminator D.
S103: Perform preliminary detection with the face detection model:
The face image to be detected is input into the face detection model to obtain the coordinate information of each candidate region of a face target and the confidence value $C$ that the candidate region belongs to a face. Confidence thresholds $T_1$ and $T_2$ are preset with $0 < T_1 < T_2 < 1$. For each candidate region: if $C \ge T_2$, the candidate region is judged to contain a face target and is output as a face target region; if $T_1 \le C < T_2$, the candidate region is taken as a face target to be determined; otherwise, the candidate region is judged to contain no face target and is not output.
S104: Perform detection with the super-resolution reconstruction model:
Each face target to be determined is input into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR. The image SR is then input into the discriminator D, which judges whether SR is a qualified super-resolution reconstructed image and whether it contains a face target. If SR is a qualified super-resolution reconstructed image and contains a face target, the corresponding candidate region is judged to contain a face target and is output as a face target region; otherwise, the corresponding candidate region is judged to contain no face target.
In the face detection method based on two-stage detection, the super-resolution reconstruction model assists the face detection model by further examining the candidate regions with low confidence, so that missed detections and false detections of face targets are reduced and detection performance is improved.
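The routing in steps S103 and S104 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `two_stage_decision`, the box tuple layout, and the `second_stage` callback (a stand-in for the generator G plus discriminator D check) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def two_stage_decision(
    candidates: Sequence[Tuple[Box, float]],
    t1: float,
    t2: float,
    second_stage: Callable[[Box], bool],
) -> List[Box]:
    """Return the candidate regions accepted as face target regions.

    candidates   : (box, confidence C) pairs from the face detection model.
    second_stage : hypothetical stand-in for "reconstruct with G, judge with D";
                   returns True when the region is judged to contain a face.
    """
    if not 0.0 < t1 < t2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < T1 < T2 < 1")
    accepted = []
    for box, conf in candidates:
        if conf >= t2:
            # High confidence: output directly as a face target region.
            accepted.append(box)
        elif conf >= t1:
            # T1 <= C < T2: face target to be determined, ask the second stage.
            if second_stage(box):
                accepted.append(box)
        # C < T1: judged to contain no face target, not output.
    return accepted
```

Only the middle band of confidences pays the cost of super-resolution reconstruction, which is the point of splitting the decision with two thresholds.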
As for the face detection model, a specific model can be selected as needed. In this embodiment, an R-FCN network is selected as the face detection model and is improved for low-resolution face detection to raise the detection effect. The R-FCN network is a modification of the Faster R-CNN structure. Its core design idea is to build on the RPN (Region Proposal Network) proposed in Faster R-CNN, introduce position-sensitive information, move the ROI layer backwards, and use position-sensitive feature maps to compute the probability that each entity in the image to be detected belongs to each category, so that the detection rate can be greatly improved while high localization accuracy is kept. FIG. 2 is a schematic diagram of the structure of an R-FCN network. As shown in FIG. 2, the workflow of the R-FCN can be briefly described as follows:
The image is input into a pre-trained classification network (FIG. 2 uses the part of the ResNet-101 network before Conv4), and the corresponding network parameters are fixed. Three branches operate on the feature map obtained from the last convolutional layer of the pre-trained network:
The 1st branch performs the RPN operation on the feature map to obtain the candidate regions (ROIs). Specifically, anchor boxes (anchors) are generated on the feature map according to preset parameters; the anchor boxes are a set of regions with different sizes and aspect ratios distributed across the input image. The anchor boxes containing foreground are then identified and converted into target bounding boxes by a bounding box regression algorithm, so that each bounding box fits the foreground object it contains more tightly.
The 2nd branch computes from the feature map a position-sensitive score map with K × K × (C + 1) dimensions for classification.
The 3rd branch computes from the feature map a position-sensitive score map with 4 × K dimensions for regression.
Finally, position-sensitive ROI pooling (Position-Sensitive RoI Pooling) is applied to the K × K × (C + 1)-dimensional and the 4 × K-dimensional position-sensitive score maps, respectively, to obtain the confidence and position information of each candidate region, and the corresponding category is then determined from the confidence.
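The classification branch can be illustrated with a small NumPy sketch of position-sensitive ROI pooling. It is a simplified illustration under stated assumptions: the channel layout (bin row, bin column, class), average pooling inside each bin, averaging as the voting rule, the final softmax, and the name `ps_roi_pool` are all assumptions, and the ROI is assumed to lie inside the feature map.

```python
import numpy as np


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Pool a K*K*(C+1)-channel score map over one ROI and vote per class.

    score_maps : array of shape (k*k*(num_classes+1), H, W).
    roi        : (x1, y1, x2, y2) in feature-map coordinates, end exclusive.
    Returns a softmax probability vector of length num_classes + 1.
    """
    n_out = num_classes + 1  # classes plus background
    channels, height, width = score_maps.shape
    if channels != k * k * n_out:
        raise ValueError("expected k*k*(num_classes+1) channels")
    # Assumed channel layout: one group of n_out channels per spatial bin.
    maps = score_maps.reshape(k, k, n_out, height, width)
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, k + 1)  # bin edges along x
    ys = np.linspace(y1, y2, k + 1)  # bin edges along y
    pooled = np.zeros((k, k, n_out))
    for i in range(k):
        for j in range(k):
            r0 = int(np.floor(ys[i]))
            r1 = max(int(np.ceil(ys[i + 1])), r0 + 1)
            c0 = int(np.floor(xs[j]))
            c1 = max(int(np.ceil(xs[j + 1])), c0 + 1)
            # Bin (i, j) reads only from its own channel group.
            pooled[i, j] = maps[i, j, :, r0:r1, c0:c1].mean(axis=(1, 2))
    votes = pooled.mean(axis=(0, 1))  # average the K*K bin scores
    exp = np.exp(votes - votes.max())
    return exp / exp.sum()
```

Each of the K × K bins looks at a different channel group, which is what makes the score position-sensitive: a face scores highly only if its top-left part activates the top-left map, and so on.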
In this embodiment, the generation parameters of the anchor boxes are improved first. In the conventional R-FCN network, anchor boxes are generated with three scales and three aspect ratios, where the three scales default to {128 × 128, 256 × 256, 512 × 512} and the three aspect ratios are {1, 2. When the detection target is a small face, small face regions are likely to be missed. Therefore, in this embodiment the anchor box scales are changed to the five scales {16 × 16, 32 × 32, 128 × 128, 256 × 256, 512 × 512}, and each scale generates anchor boxes with three aspect ratios {1, 1. The two added small scales are used to detect small faces, and the three retained scales are used to extract face regions of regular size.
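A minimal sketch of anchor generation with the five scales of this embodiment is given below. The aspect ratio values are not fully legible in the text, so the ratios (1:1, 1:2, 2:1) used here are an assumption, as are the function name `make_anchors` and the convention of centring each anchor at the origin.

```python
import numpy as np


def make_anchors(scales=(16, 32, 128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return anchors as (x1, y1, x2, y2) rows centred at the origin.

    scales : side lengths; each anchor has area scale * scale.
    ratios : height / width values (ASSUMED: 1:1, 1:2 and 2:1).
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            width = np.sqrt(area / ratio)  # keep the area fixed per scale
            height = width * ratio
            anchors.append((-width / 2, -height / 2, width / 2, height / 2))
    return np.array(anchors)
```

With five scales and three ratios this yields 15 anchors per feature-map location, compared with 9 for the three default scales.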
For the frame regression algorithm, the prior art mostly adopts the NMS (Non-Maximum Suppression) algorithm, whose core idea is to find local maxima and suppress non-maximum values. It iteratively takes the anchor box with the highest confidence, computes its intersection over union (IoU) with the other anchor boxes, and filters out the boxes whose IoU is large. However, the NMS algorithm has been found to have the following problems:
1) The NMS algorithm forcibly sets to 0 the confidence of adjacent candidate boxes that overlap the selected box; that is, it directly and crudely deletes every candidate box whose IoU exceeds the threshold. If a real target to be detected lies in the overlapping area, its detection will very likely fail, which raises the miss rate and lowers the average detection rate.
2) When the NMS algorithm is used for frame regression, the optimal value of the IoU decision threshold $N_t$ is hard to determine: setting it too large increases the false detection rate, and setting it too small increases the missed detection rate.
To solve the above problems, this embodiment improves the frame regression algorithm on the basis of the NMS algorithm. FIG. 3 is a flowchart of the improved frame regression algorithm in this embodiment. As shown in FIG. 3, the improved frame regression algorithm comprises the following steps:
S301: Initialize data:
The data comprise the set of anchor boxes containing the background, $B = \{b_1, b_2, \dots, b_N\}$, where $b_n$ denotes the $n$-th anchor box, $n = 1, 2, \dots, N$, and $N$ is the number of anchor boxes containing the background. The confidence of anchor box $b_n$ is denoted $s_n$. The set of retained anchor boxes $D$ is initialized as empty.
S302: Select the current optimal anchor box:
The anchor box with the highest confidence is selected from the current anchor box set $B$ and recorded as the current optimal anchor box $b'$; $b'$ is added to the retained anchor box set $D$ and deleted from $B$.
S303: Judge whether the anchor box set $B$ is empty. If so, the frame regression ends; otherwise, go to step S304.
S304: Update the confidences:
For each anchor box $b_n$ in the current anchor box set $B$, compute its intersection over union $\mathrm{iou}(b', b_n)$ with the current optimal anchor box $b'$, and then update the confidence $s_n$ of each anchor box $b_n$ using the following formula:
where $N_t$ is a preset IoU threshold.
And then returns to step S302.
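Steps S301 to S304 can be sketched in NumPy as below. The update formula itself is not reproduced in the text, so this sketch assumes a linear decay in the style of Soft-NMS: a confidence is multiplied by (1 - IoU) when the IoU reaches $N_t$ and left unchanged otherwise. That rule, the function names, and the (x1, y1, x2, y2) box layout are assumptions, not the embodiment's confirmed formula.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)


def improved_frame_regression(boxes, scores, nt=0.3):
    """Return (index, updated confidence) pairs in selection order."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(boxes)))  # S301: set B
    kept = []                            # S301: retained set D, empty
    while remaining:
        # S302: move the most confident anchor box from B to D.
        best = max(remaining, key=lambda n: scores[n])
        kept.append((best, float(scores[best])))
        remaining.remove(best)
        if not remaining:                # S303: B is empty, finish
            break
        # S304: decay, rather than zero, the confidence of overlapping boxes.
        idx = np.array(remaining)
        overlaps = iou(boxes[best], boxes[idx])
        decay = np.where(overlaps >= nt, 1.0 - overlaps, 1.0)  # ASSUMED rule
        scores[idx] = scores[idx] * decay
    return kept
```

Because overlapping boxes are down-weighted instead of deleted, a second face that sits inside the overlap region can still survive a later confidence threshold.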
As for the GAN-based super-resolution reconstruction model, the SRGAN network is employed in this embodiment. The SRGAN network is a widely used super-resolution image reconstruction model with excellent results, and it is trained as a GAN (Generative Adversarial Network). The SRGAN network consists of a generator G and a discriminator D. FIG. 4 is a block diagram of the generator in the SRGAN network. FIG. 5 is a block diagram of the discriminator in the SRGAN network. The core of the generator is a series of residual blocks, each containing two 3 × 3 convolutional layers followed by batch normalization (BN) layers, with PReLU as the activation function; two 2× sub-pixel convolutional layers are then used to increase the feature size. The discriminator D uses a network structure similar to VGG19 but performs no max pooling. It contains 8 convolutional layers; as the network deepens, the number of features keeps increasing and the feature size keeps decreasing, with LeakyReLU as the activation function. Finally, two fully connected layers and a final sigmoid activation function yield the probability that the input is a real sample.
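The rearrangement performed by a 2× sub-pixel convolutional layer can be sketched in NumPy as follows. Only the channel-to-space shuffle is shown; the convolution that precedes it is omitted, and the function name `pixel_shuffle` and the channel ordering are assumptions chosen for illustration.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) features into (C, H*r, W*r).

    Output pixel (h*r + i, w*r + j) of channel c is taken from input
    channel c*r*r + i*r + j at position (h, w) (assumed ordering).
    """
    channels, height, width = x.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels // (r * r)
    x = x.reshape(c_out, r, r, height, width)  # (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)             # (c, h, i, w, j)
    return x.reshape(c_out, height * r, width * r)
```

Applying the shuffle twice with r = 2 (with a convolution in between in a real generator) enlarges each spatial side by a factor of 4 in total.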
The existing SRGAN network suffers from models that are difficult to train and from distribution overlap problems. Research shows that these problems arise because the traditional SRGAN network uses the KL divergence and the JS divergence to measure the distance between the real sample distribution and the generated sample distribution. In this embodiment, the EM divergence is adopted to solve these problems. The EM divergence is a symmetric divergence defined as follows:
Let $\Omega \subset \mathbb{R}^n$ be a bounded, connected open set, and let $S$ be the set of all Radon probability distributions on $\Omega$. For some $p \neq 1$ and $k > 0$, the EM divergence is calculated as follows:
where $P_r$ and $P_g$ denote two different probability distributions, $\inf$ denotes the infimum (greatest lower bound), $x$ denotes a sample following the distribution $P_r$, $\tilde{x}$ denotes a sample following the distribution $P_g$, $\hat{x}$ denotes a random linear combination of the samples $x$ and $\tilde{x}$, $P_u$ denotes the distribution of the sample $\hat{x}$, $k$ and $p$ are constants, $C_c^1(\Omega)$ is the space of all first-order differentiable functions on $\Omega$ with compact support, and $\|\cdot\|$ denotes the norm.
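As a hedged illustration of how the symbols listed above can fit together in one expression, the block below writes a divergence of the Wasserstein-divergence type. The notation ($\tilde{x}$ for a sample from $P_g$, $\hat{x}$ for the random linear combination, $C_c^1(\Omega)$ for the function space, and the label $\mathrm{EM}_{k,p}$) is assumed, and the expression is a plausible reconstruction rather than the confirmed formula of this embodiment.

```latex
\mathrm{EM}_{k,p}(P_r, P_g)
  = \inf_{f \in C_c^1(\Omega)}
    \; \mathbb{E}_{x \sim P_r}\big[f(x)\big]
    - \mathbb{E}_{\tilde{x} \sim P_g}\big[f(\tilde{x})\big]
    + k \, \mathbb{E}_{\hat{x} \sim P_u}\big[\lVert \nabla f(\hat{x}) \rVert^{p}\big]
```

The last term penalizes the gradient norm of $f$ at the interpolated samples $\hat{x}$, which is where the constants $k$ and $p$ and the norm enter.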
The advantage of the EM divergence is that, for two different distributions, it still reflects the distance between them even if they do not overlap. Meaningful gradients are therefore available at any time during training, so the whole SRGAN network can be trained stably, and problems such as mode collapse caused by vanishing gradients, which may occur when training the original SRGAN network, are effectively resolved. In this embodiment, the objective function used in model training is improved based on the EM divergence. The min-max optimization objective function of the SRGAN network after the EM-divergence improvement is:
where $x$ denotes a real high-resolution sample, $z$ denotes a low-resolution sample input to the generator G, $G(z)$ is the super-resolution reconstructed sample produced by the generator G, $P_g$ denotes the probability distribution of the super-resolution reconstructed samples, $P_r$ denotes the probability distribution of the real high-resolution samples, $D(x)$ and $D(G(z))$ denote the probabilities, as judged by the discriminator D, that the high-resolution sample and the super-resolution reconstructed sample are real samples, $E[\cdot]$ denotes the mathematical expectation, $\hat{x}$ denotes a random linear combination of the real high-resolution sample $x$ and the super-resolution reconstructed sample $G(z)$, $P_u$ denotes the distribution of the sample $\hat{x}$, and $k$ and $p$ are constants.
In the training process, the optimization objective function is decomposed into two optimization problems:
1. optimization of the discriminator D:
2. optimization of generator G:
Based on the above derivation, the invention improves the training method of the SRGAN model to obtain a better SRGAN model, thereby improving the quality of super-resolution face image reconstruction. The specific training method is as follows:
First, a plurality of high-resolution face images $I_{HR}$ are obtained, and the corresponding low-resolution face images $I_{LR}$ are obtained by downsampling. Each high-resolution face image $I_{HR}$ and its corresponding low-resolution face image $I_{LR}$ form one training sample, which yields a training sample set. In this embodiment, a Gaussian pyramid is used for downsampling: the original image is taken as the bottom-layer image G0 (layer 0 of the Gaussian pyramid) and convolved with a 5 × 5 Gaussian kernel; the convolved image is then downsampled by removing its even rows and columns to obtain the next-layer image G1; and this is iterated until 4× downsampling is complete.
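The Gaussian pyramid downsampling described above can be sketched with NumPy alone. The kernel weights are an assumption (the common separable binomial kernel [1, 4, 6, 4, 1] / 16, whose outer product is a 5 × 5 Gaussian approximation), as are the edge padding, the single-channel input, and the function names.

```python
import numpy as np

# ASSUMED 5-tap kernel; its outer product gives a 5 x 5 Gaussian approximation.
KERNEL_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def pyr_down(image):
    """Blur a 2-D image with the 5 x 5 kernel, then halve each side."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    padded = np.pad(image, 2, mode="edge")
    # Separable convolution: filter along columns, then along rows.
    horiz = sum(KERNEL_1D[i] * padded[:, i:i + width] for i in range(5))
    blurred = sum(KERNEL_1D[i] * horiz[i:i + height, :] for i in range(5))
    # Removing the even rows and columns (counting from 1) keeps the
    # zero-based indices 0, 2, 4, ...
    return blurred[::2, ::2]


def downsample_4x(image):
    """Two pyramid levels (G0 -> G1 -> G2) give 4x downsampling per side."""
    return pyr_down(pyr_down(image))
```

For example, a 64 × 64 high-resolution crop becomes a 16 × 16 low-resolution input for the generator.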
Then, the SRGAN network is trained with the obtained training sample set. During training, the optimization objective function of the generator G is:
The optimization objective function of the discriminator D is:
where $x$ denotes a real high-resolution face image, $z$ denotes a low-resolution face image input to the generator G, $G(z)$ is the super-resolution reconstructed face image produced by the generator G, $P_g$ denotes the probability distribution of the super-resolution reconstructed face images, $P_r$ denotes the probability distribution of the real high-resolution face images, $D(x)$ and $D(G(z))$ denote the probabilities, as judged by the discriminator D, that the high-resolution face image and the super-resolution reconstructed face image are real face images, $E[\cdot]$ denotes the mathematical expectation, $\hat{x}$ denotes a random linear combination of the real high-resolution face image $x$ and the super-resolution reconstructed face image $G(z)$, $P_u$ denotes the distribution of the sample $\hat{x}$, and $k$ and $p$ are constants.
During training of the SRGAN network, the generator G first performs super-resolution reconstruction of the low-resolution face image $I_{LR}$ in each training sample X. Specifically, the generator G upsamples the low-resolution face image $I_{LR}$ in training sample X to obtain the super-resolution reconstructed face image $I_{SR}$. Because this embodiment downsamples the high-resolution face image $I_{HR}$ by a factor of 4 to obtain the low-resolution face image $I_{LR}$, the upsampling factor used to generate the super-resolution reconstructed face image $I_{SR}$ is also 4.
Then the high-resolution face image $I_{HR}$ corresponding to the low-resolution face image $I_{LR}$ and the super-resolution reconstructed face image $I_{SR}$ generated by the generator G are input into the discriminator D, and the loss function $L_{SR}$ of the training sample is calculated according to the following formula:
Here, the content loss function of the training sample is calculated by the following formula:
Here, the content loss function based on the mean square error is calculated by the following formula:
where $W$ denotes the width of the high-resolution face image $I_{HR}$, $H$ denotes the height of the high-resolution face image $I_{HR}$, $r$ denotes the downsampling factor, and the two pixel terms denote the pixel values at coordinates $(x, y)$ in the high-resolution face image $I_{HR}$ and in the super-resolution reconstructed face image $I_{SR}$, respectively.
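A mean-square-error content loss between $I_{HR}$ and $I_{SR}$ can be sketched as below. The exact normalization of the embodiment's formula (which involves $W$, $H$ and $r$) is not reproduced in the text, so this sketch simply averages over all pixels; that choice and the function name `mse_content_loss` are assumptions.

```python
import numpy as np


def mse_content_loss(i_hr, i_sr):
    """Average squared pixel difference between two same-sized images."""
    i_hr = np.asarray(i_hr, dtype=float)
    i_sr = np.asarray(i_sr, dtype=float)
    if i_hr.shape != i_sr.shape:
        raise ValueError("I_HR and I_SR must have the same shape")
    # ASSUMED normalization: mean over every pixel (and channel, if any).
    return float(np.mean((i_hr - i_sr) ** 2))
```

The loss is zero only when the reconstruction matches the high-resolution image pixel for pixel, which is why it is usually combined with feature-based and adversarial terms.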
where $i$ denotes the index of a max pooling layer in the VGG-19 network in the discriminator D, and $j$ denotes the index of a convolutional layer between the $i$-th max pooling layer and the $(i+1)$-th max pooling layer. In the existing VGG-19 network, there are 5 max pooling layers, and the number of convolutional layers between two adjacent max pooling layers is 2 or 4. $\phi_{i,j}$ denotes the feature map obtained by the $j$-th convolutional layer after the $i$-th max pooling layer of the VGG-19 network in the discriminator D, $W_{i,j}$ denotes the width of the feature map $\phi_{i,j}$, and $H_{i,j}$ denotes the height of the feature map $\phi_{i,j}$.
The adversarial loss is the part of the loss function that, by "fooling" the discriminator, biases the SRGAN network towards producing outputs closer to natural images. It is calculated as follows:
where the discriminator term denotes the probability, as judged by the discriminator D, that the super-resolution reconstructed face image generated by the generator (i.e. $I_{SR}$) is a real high-resolution face image; the subscripts $\theta_D$ and $\theta_G$ denote the network parameters of the discriminator D and the generator G, respectively; $w$ denotes the dimension index of the network parameters, $w = 1, 2, \dots, W$; and $W$ denotes the dimension of the network parameters.
Because the super-resolution reconstruction model must also determine whether the super-resolution reconstructed image contains a face target, a classification loss $L_{clc}$ is added when calculating the loss function. It is calculated as follows:
where $\{y_1, y_2, \ldots, y_v, \ldots, y_V\}$ denotes the annotation data indicating whether the regions of the high-resolution face image $I^{HR}$ are faces, V denotes the number of face regions annotated in the high-resolution face image $I^{HR}$, and each $y_v$ takes a value in {0, 1}.
Since the improved optimization objective function in this embodiment has no log term, the Adam optimization algorithm can preferably be used to optimize the objective functions of the generator G and the discriminator D, thereby improving training efficiency. For the generator G, the weight $w_G$ is updated by gradient descent using the Adam optimization algorithm:
where the gradient term denotes the descent gradient of the weight $w_G$; $z_m$ denotes the value of the m-th pixel in the super-resolution reconstructed face image $I^{SR}$, m = 1, 2, …, M, where M denotes the number of pixels; $D(G(z_m))$ denotes the probability with which the discriminator D judges the m-th pixel of the super-resolution reconstructed face image $I^{SR}$ to be a pixel of the high-resolution face image $I^{HR}$; $\alpha$ denotes the learning rate; $\beta_1$ denotes the exponential decay rate of the first-moment estimate; and $\beta_2$ denotes the exponential decay rate of the second-moment estimate. Typical values of the three parameters of the Adam optimization algorithm are $\alpha$ = 0.00001, $\beta_1$ = 0.9 and $\beta_2$ = 0.999.
The weight $w_D$ of the discriminator D is updated by gradient descent using the Adam optimization algorithm:
where the gradient term denotes the descent gradient of the weight $w_D$; $x_m$ denotes the value of the m-th pixel of the high-resolution face image $I^{HR}$; $D(x_m)$ denotes the probability with which the discriminator D judges the m-th pixel of the high-resolution face image $I^{HR}$ to be a pixel of $I^{HR}$; a further gradient term, defined with the coefficient $\mu_m = m/M$, also appears in the update; and the remaining discriminator output denotes the probability with which the discriminator D judges the correspondingly combined pixel value to be a pixel of the high-resolution face image $I^{HR}$.
In this embodiment, the weight $w_G$ of the generator G and the weight $w_D$ of the discriminator D are preferably updated alternately: the parameters of the generator G are first fixed while the parameters of the discriminator D are updated, then the parameters of the discriminator D are fixed while the parameters of the generator G are updated, and so on in alternation.
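The Adam update and the alternating schedule can be sketched as follows. This is a minimal sketch, not the training code of the embodiment: weights are reduced to scalars, the gradient functions `grad_g` and `grad_d` are hypothetical stand-ins for the gradients of the generator and discriminator objectives, and `eps` is the usual small Adam stabiliser, which the text does not mention. The default values of `alpha`, `beta1` and `beta2` are the typical values given above.

```python
import math


def new_state():
    """Fresh Adam state: step count t and moment estimates m and v."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def adam_step(w, grad, state, alpha=0.00001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the scalar weight w and return the new weight."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return w - alpha * m_hat / (math.sqrt(v_hat) + eps)


def train_alternately(w_g, w_d, grad_g, grad_d, steps):
    """Alternate updates: D with G fixed, then G with D fixed."""
    state_g, state_d = new_state(), new_state()
    for _ in range(steps):
        w_d = adam_step(w_d, grad_d(w_g, w_d), state_d)  # generator fixed
        w_g = adam_step(w_g, grad_g(w_g, w_d), state_g)  # discriminator fixed
    return w_g, w_d


# Toy gradients (hypothetical, not the SRGAN objectives): each weight is pulled toward 0.
w_g, w_d = train_alternately(
    1.0, -1.0,
    grad_g=lambda g, d: 2 * g,
    grad_d=lambda g, d: 2 * d,
    steps=3,
)
```

With a gradient of constant sign, each Adam step moves the weight by roughly `alpha`, which is why the toy weights change by about 3e-5 after three rounds.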
To better illustrate the technical effect of the invention, it was verified experimentally on a set of low-resolution face images. In the experimental verification, the face detection model is the R-FCN model of this embodiment with improved anchor-box generation parameters and an improved bounding-box regression algorithm, and the GAN-based super-resolution reconstruction model is the SRGAN model obtained with the improved training method of this embodiment. The face detection model and the GAN-based super-resolution reconstruction model were trained on the WIDER FACE training sample set; 10 images were randomly drawn from each of its 61 categories, giving 610 detection images in total. For comparison, the SFD face detection method and the R-FCN face detection method were selected as comparison methods in this experimental verification.
In order to evaluate the technical effects of the face detection method and the comparison method, a PR curve is selected as an evaluation standard. The PR curve is a curve drawn with Precision (Precision) as the ordinate and Recall (Recall) as the abscissa.
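How the points of a PR curve are obtained from scored detections can be sketched as follows. This is an illustration under assumptions: the text does not say how detections are matched to ground truth or how the average precision is integrated, so the `is_true_positive` flags are taken as given and a plain rectangle sum is assumed; the function names are hypothetical.

```python
def pr_curve(detections, num_ground_truth):
    """Return (recalls, precisions) as the confidence threshold is lowered.

    detections: list of (confidence, is_true_positive) pairs.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls, precisions = [], []
    true_pos = false_pos = 0
    for _, is_true_positive in ranked:
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        recalls.append(true_pos / num_ground_truth)
        precisions.append(true_pos / (true_pos + false_pos))
    return recalls, precisions


def average_precision(recalls, precisions):
    """Area under the PR curve as a sum of precision x recall increments (assumed rule)."""
    area, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


detections = [(0.9, True), (0.8, False), (0.7, True)]
recalls, precisions = pr_curve(detections, num_ground_truth=2)
ap = average_precision(recalls, precisions)
```

A curve that lies closer to the upper-right corner keeps precision high as recall grows, which is what a larger area under the curve reflects.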
Fig. 6 shows the PR curves of the three methods in this experimental verification. As shown in Fig. 6, among the three face detection methods the PR curve of the present invention lies, as a whole, closest to the upper-right corner, and its mAP (mean average precision) of 0.947 is also the best of the three.
Fig. 7 is an example of a detection result of the SFD face detection method in this experimental verification. Fig. 8 is an example of a detection result of the R-FCN face detection method in this experimental verification. Fig. 9 is an example of a detection result of the present invention in this experimental verification. Comparing Fig. 7 to Fig. 9, the present invention detects 14 faces in total, outperforming the other two methods, which detect 11 and 9 faces respectively.
Face detection was then carried out on image samples of different sharpness. The WIDER FACE training sample set annotates a blur attribute for each face target with three levels (clear, generally blurred and severely blurred), so samples were drawn from the images at each blur level to form the detection sample sets. Fig. 10 shows the PR curves of the three methods on the clear detection sample set in this experimental verification. Fig. 11 shows the PR curves of the three methods on the generally blurred detection sample set. Fig. 12 shows the PR curves of the three methods on the severely blurred detection sample set. As shown in Fig. 10 to Fig. 12, when the samples are clear all three methods detect faces well, with little difference between them and very high mAP values. On the generally blurred test group the mAP values of the three algorithms drop slightly but still exceed 97%, which indicates that all three methods retain very good detection capability under general blur and that this condition poses little challenge to them.
The invention also shows some advantage over SFD and R-FCN on generally blurred faces, although the advantage is not pronounced. On severely blurred samples the three methods begin to diverge: SFD performs worst, with an mAP about 10 percentage points lower than on clear samples, while the method of the invention shows the smallest drop, about 5 percentage points. In this case the mAP of the invention is about 2 percentage points higher than that of the original R-FCN model, and its PR curve clearly encloses the PR curves of the two comparison methods. Compared with the other two methods, the invention therefore has better stability and a higher detection rate at low resolution.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.
Claims (8)
1. A face detection method based on two-stage detection is characterized by comprising the following steps:
s1: acquiring a plurality of face image training samples, wherein each training sample comprises a face image and face target information, and training a face detection model by adopting the face image training samples;
s2: acquiring a plurality of super-resolution face image reconstruction training samples, wherein each training sample comprises a low-resolution image containing a face and a corresponding high-resolution image, the super-resolution face image reconstruction training samples are adopted to train a super-resolution reconstruction model based on a GAN network, and the super-resolution reconstruction model based on the GAN network comprises a generator G and a discriminator D;
s3: inputting a face image to be detected into the face detection model to obtain the coordinate information of each candidate region of a face target and the confidence value C that the candidate region is a face; presetting confidence thresholds $T_1$ and $T_2$ with $0 < T_1 < T_2 < 1$; for each candidate region, if the corresponding confidence value $C \geq T_2$, judging that the candidate region contains a face target and outputting it as a face target region; if the corresponding confidence value satisfies $T_1 \leq C < T_2$, taking the candidate region as a face target to be determined; otherwise, judging that the candidate region contains no face target and not outputting it;
s4: inputting each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR, then inputting the super-resolution reconstructed image SR into the discriminator D, which judges whether the image SR is a qualified super-resolution reconstructed image and whether it contains a face target; if the image SR is a qualified super-resolution reconstructed image and contains a face target, judging that a face target exists in the corresponding candidate region and outputting the candidate region as a face target region; otherwise, judging that no face target exists in the candidate region.
2. The face detection method of claim 1, wherein the face detection model uses an R-FCN network.
3. The face detection method according to claim 2, wherein the generated scale of the anchor frame in the R-FCN network includes five scales {16 x 16,32 x 32,128 x 128,256 x 256,512 x 512}, three aspect ratios {1, 1.
4. The face detection method of claim 1, wherein the GAN network-based super-resolution reconstruction model adopts an SRGAN network.
5. The face detection method of claim 4, wherein the SRGAN network is trained by the following method:
firstly, a plurality of high-resolution face images $I^{HR}$ are obtained, and the corresponding low-resolution face images $I^{LR}$ are obtained by down-sampling; each high-resolution face image $I^{HR}$ and its corresponding low-resolution face image $I^{LR}$ form a training sample, thereby obtaining a training sample set;
then, training the SRGAN network by using the obtained training sample set, wherein the optimization objective function of the generator G in the training process is as follows:
the optimized objective function of the discriminator D is:
where x denotes the real high-resolution face image, z denotes the low-resolution face image input to the generator G, G(z) denotes the super-resolution reconstructed face image produced by the generator G, $P_g$ denotes the probability distribution of the super-resolution reconstructed face images, $P_r$ denotes the probability distribution of the real high-resolution face images, D(x) and D(G(z)) denote the probabilities with which the discriminator D judges the high-resolution face image and the super-resolution reconstructed face image, respectively, to be real face images, E[·] denotes the mathematical expectation, the combined sample in the objective is a random linear combination of the real high-resolution face image x and the super-resolution reconstructed face image G(z), and k and p each denote a constant.
6. The face detection method of claim 5, wherein in the SRGAN network training process, the loss function $L^{SR}$ of the training sample is calculated according to the following formula:
7. The face detection method according to claim 5, wherein in the SRGAN network training process, the Adam optimization algorithm is adopted to optimize the objective functions of the generator G and the discriminator D, and the specific method is as follows:
updating the weight $w_G$ of the generator G by gradient descent using the Adam optimization algorithm:
where the gradient term denotes the descent gradient of the weight $w_G$; $z_m$ denotes the value of the m-th pixel in the super-resolution reconstructed face image $I^{SR}$, m = 1, 2, …, M, where M denotes the number of pixels; $D(G(z_m))$ denotes the probability with which the discriminator D judges the m-th pixel of the super-resolution reconstructed face image $I^{SR}$ to be a pixel of the high-resolution face image $I^{HR}$; $\alpha$ denotes the learning rate; $\beta_1$ denotes the exponential decay rate of the first-moment estimate; and $\beta_2$ denotes the exponential decay rate of the second-moment estimate;
updating the weight $w_D$ of the discriminator D by gradient descent using the Adam optimization algorithm:
where the gradient term denotes the descent gradient of the weight $w_D$; $x_m$ denotes the value of the m-th pixel of the high-resolution face image $I^{HR}$; $D(x_m)$ denotes the probability with which the discriminator D judges the m-th pixel of the high-resolution face image $I^{HR}$ to be a pixel of $I^{HR}$; a further gradient term, defined with the coefficient $\mu_m = m/M$, also appears in the update; and the remaining discriminator output denotes the probability with which the discriminator D judges the correspondingly combined pixel value to be a pixel of the high-resolution face image $I^{HR}$.
8. The face detection method of claim 7, wherein, when the objective functions of the generator G and the discriminator D are optimized, the weight $w_G$ of the generator G and the weight $w_D$ of the discriminator D are updated alternately.
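The two-stage decision of steps s3 and s4 can be sketched as follows. This is a minimal illustration, assuming that a pending region is kept only when its reconstruction is judged both qualified and to contain a face; the function names, the stub `generate` and `discriminate` callables and the threshold values 0.3 and 0.9 are hypothetical and are not taken from the claims.

```python
def first_stage(candidates, t1, t2):
    """Split detector candidates (region, confidence) using 0 < t1 < t2 < 1."""
    if not 0 < t1 < t2 < 1:
        raise ValueError("thresholds must satisfy 0 < t1 < t2 < 1")
    accepted, pending = [], []
    for region, confidence in candidates:
        if confidence >= t2:
            accepted.append(region)   # confident face: output directly
        elif confidence >= t1:
            pending.append(region)    # uncertain: defer to the second stage
        # confidence < t1: rejected, not output
    return accepted, pending


def second_stage(pending, generate, discriminate):
    """Keep pending regions whose reconstruction is judged qualified and a face."""
    kept = []
    for region in pending:
        sr_image = generate(region)                      # generator G
        is_qualified, has_face = discriminate(sr_image)  # discriminator D
        if is_qualified and has_face:
            kept.append(region)
    return kept


# Stub models for illustration only; real models would operate on image crops.
candidates = [("a", 0.95), ("b", 0.60), ("c", 0.55), ("d", 0.10)]
accepted, pending = first_stage(candidates, t1=0.3, t2=0.9)
faces = accepted + second_stage(
    pending,
    generate=lambda region: region.upper(),
    discriminate=lambda sr: (True, sr == "B"),
)
```

Only the regions with intermediate confidence pass through the super-resolution model, so the more expensive second stage runs on a subset of the candidates.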
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910455695.5A CN110189255B (en) | 2019-05-29 | 2019-05-29 | Face detection method based on two-stage detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110189255A (en) | 2019-08-30 |
CN110189255B (en) | 2023-01-17 |
Family
ID=67718558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910455695.5A Expired - Fee Related CN110189255B (en) | Face detection method based on two-stage detection | 2019-05-29 | 2019-05-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110189255B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705498A (en) * | 2019-10-12 | 2020-01-17 | 北京泰豪信息科技有限公司 | Low-resolution face recognition method |
CN110705509A (en) * | 2019-10-16 | 2020-01-17 | 上海眼控科技股份有限公司 | Face direction recognition method and device, computer equipment and storage medium |
CN110866484B (en) * | 2019-11-11 | 2022-09-09 | 珠海全志科技股份有限公司 | Driver face detection method, computer device and computer readable storage medium |
CN111144215B (en) * | 2019-11-27 | 2023-11-24 | 北京迈格威科技有限公司 | Image processing method, device, electronic equipment and storage medium |
CN111222420A (en) * | 2019-12-24 | 2020-06-02 | 重庆市通信产业服务有限公司 | FTP protocol-based low-bandwidth-requirement helmet identification method |
CN111339950B (en) * | 2020-02-27 | 2024-01-23 | 北京交通大学 | Remote sensing image target detection method |
US11610316B2 (en) * | 2020-03-06 | 2023-03-21 | Siemens Healthcare Gmbh | Method of computing a boundary |
CN113836974A (en) * | 2020-06-23 | 2021-12-24 | 江苏翼视智能科技有限公司 | Monitoring video pedestrian detection method based on super-resolution reconstruction |
CN112102234B (en) * | 2020-08-06 | 2022-05-20 | 复旦大学 | Ear sclerosis focus detection and diagnosis system based on target detection neural network |
CN112183183A (en) * | 2020-08-13 | 2021-01-05 | 南京众智未来人工智能研究院有限公司 | Target detection method and device and readable storage medium |
CN112418009B (en) * | 2020-11-06 | 2024-03-22 | 中保车服科技服务股份有限公司 | Image quality detection method, terminal equipment and storage medium |
CN112437451B (en) * | 2020-11-10 | 2022-08-02 | 南京大学 | Wireless network flow prediction method and device based on generation countermeasure network |
CN112288044B (en) * | 2020-12-24 | 2021-07-27 | 成都索贝数码科技股份有限公司 | News picture attribute identification method of multi-scale residual error network based on tree structure |
CN113283306B (en) * | 2021-04-30 | 2023-06-23 | 青岛云智环境数据管理有限公司 | Rodent identification analysis method based on deep learning and migration learning |
CN114862683B (en) * | 2022-07-07 | 2022-12-09 | 浪潮电子信息产业股份有限公司 | Model generation method, target detection method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239736A (en) * | 2017-04-28 | 2017-10-10 | 北京智慧眼科技股份有限公司 | Method for detecting human face and detection means based on multitask concatenated convolutional neutral net |
CN108090873A (en) * | 2017-12-20 | 2018-05-29 | 河北工业大学 | Pyramid face image super-resolution reconstruction method based on regression model |
CN108229381A (en) * | 2017-12-29 | 2018-06-29 | 湖南视觉伟业智能科技有限公司 | Face image synthesis method, apparatus, storage medium and computer equipment |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5696848A (en) * | 1995-03-09 | 1997-12-09 | Eastman Kodak Company | System for creating a high resolution image from a sequence of lower resolution motion images |
US6937135B2 (en) * | 2001-05-30 | 2005-08-30 | Hewlett-Packard Development Company, L.P. | Face and environment sensing watch |
KR101308946B1 (en) * | 2012-02-02 | 2013-09-24 | 한국과학기술연구원 | Method for reconstructing three dimensional facial shape |
WO2018053340A1 (en) * | 2016-09-15 | 2018-03-22 | Twitter, Inc. | Super resolution using a generative adversarial network |
CN106951867B (en) * | 2017-03-22 | 2019-08-23 | 成都擎天树科技有限公司 | Face identification method, device, system and equipment based on convolutional neural networks |
CN106874894B (en) * | 2017-03-28 | 2020-04-14 | 电子科技大学 | Human body target detection method based on regional full convolution neural network |
CN107154023B (en) * | 2017-05-17 | 2019-11-05 | 电子科技大学 | Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution |
CN107481188A (en) * | 2017-06-23 | 2017-12-15 | 珠海经济特区远宏科技有限公司 | A kind of image super-resolution reconstructing method |
EP3438920A1 (en) * | 2017-07-31 | 2019-02-06 | Institut Pasteur | Method, device, and computer program for improving the reconstruction of dense super-resolution images from diffraction-limited images acquired by single molecule localization microscopy |
CN108090417A (en) * | 2017-11-27 | 2018-05-29 | 上海交通大学 | A kind of method for detecting human face based on convolutional neural networks |
CN108446617B (en) * | 2018-03-09 | 2022-04-22 | 华南理工大学 | Side face interference resistant rapid human face detection method |
CN108805027B (en) * | 2018-05-03 | 2020-03-24 | 电子科技大学 | Face recognition method under low resolution condition |
CN108681718B (en) * | 2018-05-20 | 2021-08-06 | 北京工业大学 | Unmanned aerial vehicle low-altitude target accurate detection and identification method |
CN109543548A (en) * | 2018-10-26 | 2019-03-29 | 桂林电子科技大学 | A kind of face identification method, device and storage medium |
CN109614985B (en) * | 2018-11-06 | 2023-06-20 | 华南理工大学 | Target detection method based on densely connected feature pyramid network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20230117 |