CN110189255A - Method for detecting human face based on hierarchical detection - Google Patents

Method for detecting human face based on hierarchical detection

Info

Publication number
CN110189255A
Authority
CN
China
Prior art keywords
face
resolution
image
super
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910455695.5A
Other languages
Chinese (zh)
Other versions
CN110189255B (en)
Inventor
于力
刘意文
邹见效
杨瞻远
徐红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910455695.5A
Publication of CN110189255A
Application granted
Publication of CN110189255B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection method based on hierarchical (two-stage) detection. A face detection model and a GAN-based super-resolution reconstruction model are first trained separately. The face image to be detected is then input into the face detection model to obtain the coordinate information of each candidate region of a face target and the confidence value that the candidate region belongs to a face, and a preliminary judgment is made from the confidence value. Each face target that remains to be determined is then input into the generator of the GAN-based super-resolution reconstruction model for further judgment. By using hierarchical detection, the invention can effectively improve the detection rate for low-resolution face images.

Description

Face detection method based on two-stage detection
Technical Field
The invention belongs to the technical field of low-resolution face detection, and particularly relates to a face detection method based on two-stage detection.
Background
Face detection first appeared as a sub-problem of face recognition systems and, as research deepened, gradually became an independent subject. Current face detection technology draws on machine learning, computer vision, pattern recognition, artificial intelligence and related fields; it is the basis of all face image analysis and its derivative applications, and it strongly influences the response speed and detection accuracy of derivative systems. As the application scenarios of face detection keep expanding, input face images that are too small or of too low quality are increasingly encountered, and for such low-resolution face images the accuracy of a face detection system often drops sharply. The detection of low-quality, small-size face images is commonly referred to as low-resolution face detection.
Current face detection algorithms are in essence binary classification: effective features are extracted from a region to be detected, and the features are then used to judge whether a face is present. Low-resolution face detection is studied on the same basis. A low-resolution face has three characteristics: it carries little information, it is noisy, and few tools are available for it. As a result, from the viewpoint of feature expression, conventional methods cannot extract enough effective features from a candidate region to represent a low-resolution face. Deep neural networks have an inherent deficiency as well: the earlier convolutional layers cannot provide sufficiently powerful feature maps, and the later convolutional layers cannot provide enough features of the low-resolution face region, which makes low-resolution faces very difficult to detect.
To solve the low-resolution face detection problem, many researchers have carried out targeted work. Overall, researchers at home and abroad approach the problem in three directions: finding resolution-robust feature representations for face regions, designing new classifiers tailored to the characteristics of low-resolution faces, and applying image super-resolution methods. Research on low-resolution small-face detection is nevertheless still at a developing stage, and many problems remain. On the one hand, how to effectively extract the context information of a low-resolution face and integrate it into the detection network still needs further exploration in order to improve low-resolution face detectors. On the other hand, a complete face detection system is necessarily a full-scale one, so the ability to detect faces of other scales must be taken into account when handling low-resolution faces; in practice, the fusion problem of multi-scale detection leaves low-resolution face detection systems with low accuracy or low processing speed, which is a pressing problem.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a face detection method based on two-stage detection.
In order to achieve the above object, the face detection method based on deep learning of the present invention comprises the following steps:
S1: acquiring a plurality of face image training samples, wherein each training sample comprises a face image and face target information, and training a face detection model with the face image training samples;
S2: acquiring a plurality of super-resolution face image reconstruction training samples, wherein each training sample comprises a low-resolution image containing a face and a corresponding high-resolution image, and training a GAN-based super-resolution reconstruction model with these samples, the GAN-based super-resolution reconstruction model comprising a generator G and a discriminator D;
S3: inputting the face image to be detected into the face detection model to obtain coordinate information of each candidate region of the face target and a confidence value C that the candidate region belongs to a face; presetting confidence thresholds $T_1$ and $T_2$ with $0 < T_1 < T_2 < 1$; for each candidate region, if the corresponding confidence value $C \ge T_2$, judging that the candidate region contains a face target and outputting it as a face target region; if $T_1 \le C < T_2$, taking the candidate region as a face target to be determined; otherwise, judging that the candidate region contains no face target and not outputting it;
S4: inputting each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image R, then inputting the image R into the discriminator D, which judges whether the image R is a qualified super-resolution reconstructed image and whether it contains a face target; if the image R is a qualified super-resolution reconstructed image and contains a face target, judging that a face target exists in the corresponding candidate region and outputting the candidate region as a face target region; otherwise, judging that no face target exists.
The invention relates to a face detection method based on two-stage detection, which comprises the steps of firstly respectively training a face detection model and a super-resolution reconstruction model based on a GAN network, then inputting a face image to be detected into the face detection model to obtain coordinate information of each candidate region of a face target and a confidence value that the candidate region belongs to a face, carrying out primary judgment according to the confidence value, and then inputting the face target to be determined into a generator in the super-resolution reconstruction model based on the GAN network for further judgment. The invention adopts two-stage detection, and can effectively improve the detection rate of the low-resolution face image.
Drawings
FIG. 1 is a flow chart of an embodiment of a face detection method based on two-stage detection according to the present invention;
FIG. 2 is a schematic diagram of the structure of an R-FCN network;
FIG. 3 is a flowchart of an improved frame regression algorithm in the present embodiment;
FIG. 4 is a block diagram of the generator in the SRGAN network;
FIG. 5 is a block diagram of the discriminator in the SRGAN network;
FIG. 6 is a PR graph of three methods in this experimental verification;
FIG. 7 is an exemplary diagram of the detection result of the SFD face detection method in the present experimental verification;
FIG. 8 is an exemplary diagram of the detection results of the R-FCN face detection method in the experimental verification;
FIG. 9 is an exemplary diagram of the test results of the present invention in this experimental verification;
FIG. 10 is a PR graph showing the face detection of a clear detection sample set by three methods in the present experimental verification;
FIG. 11 is a PR graph of face detection on a general fuzzy detection sample set by three methods in the experimental verification;
FIG. 12 is a PR graph of face detection on a severely blurred detection sample set by three methods in the experimental verification.
Detailed Description
Embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
Examples
Fig. 1 is a flowchart of a specific embodiment of a face detection method based on two-stage detection according to the present invention. As shown in fig. 1, the two-stage detection-based face detection method of the present invention specifically includes the following steps:
S101: training a face detection model:
A plurality of face image training samples are obtained, each comprising a face image and face target information, and a face detection model is trained with these samples.
S102: training a super-resolution reconstruction model:
A plurality of super-resolution face image reconstruction training samples are obtained, each comprising a low-resolution image containing a face and a corresponding high-resolution image. These samples are used to train a super-resolution reconstruction model based on a GAN (Generative Adversarial Network), which comprises a generator G and a discriminator D.
S103: preliminary detection with the face detection model:
The face image to be detected is input into the face detection model to obtain the coordinate information of each candidate region of a face target and the confidence value C that the candidate region belongs to a face. Confidence thresholds $T_1$ and $T_2$ are preset, with $0 < T_1 < T_2 < 1$. For each candidate region, if the corresponding confidence value $C \ge T_2$, the candidate region is judged to contain a face target and is output as a face target region; if $T_1 \le C < T_2$, the candidate region is taken as a face target to be determined; otherwise, the candidate region is judged to contain no face target and is not output.
S104: detection with the super-resolution reconstruction model:
Each face target to be determined is input into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR. The image SR is then input into the discriminator D, which judges whether SR is a qualified super-resolution reconstructed image and whether it contains a face target. If SR is a qualified super-resolution reconstructed image and contains a face target, a face target is judged to exist in the corresponding candidate region, which is output as a face target region; otherwise, no face target is judged to exist in the corresponding candidate region.
By adopting the face detection method based on the two-stage detection, the super-resolution reconstruction model is adopted as the assistance of the face detection model, and the candidate area with low reliability is further detected, so that the missing detection and the false detection of the face target are avoided, and the detection performance is improved.
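The routing logic of steps S103 and S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `two_stage_detect`, the box tuple layout, and the `second_stage` callable (standing in for the generator G plus discriminator D check) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def two_stage_detect(
    candidates: Sequence[Tuple[Box, float]],
    t1: float,
    t2: float,
    second_stage: Callable[[Box], bool],
) -> List[Box]:
    """Return the candidate boxes accepted as face target regions.

    `second_stage` is a hypothetical stand-in for S104: it should return
    True when the super-resolution model confirms a face in the region.
    """
    if not 0.0 < t1 < t2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < T1 < T2 < 1")
    faces: List[Box] = []
    for box, confidence in candidates:
        if confidence >= t2:
            # High confidence: accept directly (S103).
            faces.append(box)
        elif confidence >= t1:
            # Uncertain: defer to the super-resolution stage (S104).
            if second_stage(box):
                faces.append(box)
        # confidence < t1: rejected, nothing is output.
    return faces


candidates = [((0, 0, 10, 10), 0.95), ((5, 5, 20, 20), 0.60), ((30, 30, 40, 40), 0.10)]
accepted = two_stage_detect(candidates, t1=0.3, t2=0.8, second_stage=lambda box: True)
```

Only the middle candidate reaches the second stage here; the thresholds 0.3 and 0.8 are illustrative values, not values given in the source.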
A specific face detection model can be selected as needed. In this embodiment, an R-FCN network is selected as the face detection model and is adapted to low-resolution face detection to improve the detection effect. The R-FCN network is a modification of the traditional Faster R-CNN structure. Its core design idea is to build on the RPN (Region Proposal Network) proposed in Faster R-CNN, introduce position-sensitive information, and move the ROI layer backwards, so that position-sensitive feature maps are used to compute the probability that an entity in the image to be detected belongs to each category. This greatly improves the detection rate while keeping high positioning accuracy. FIG. 2 is a schematic diagram of the structure of an R-FCN network. As shown in FIG. 2, the workflow of the R-FCN can be briefly described as follows:
The image is input into a pre-trained classification network (in FIG. 2, the layers before Conv4 of a ResNet-101 network), whose network parameters are fixed. Three branches operate on the feature map obtained from the last convolutional layer of the pre-trained network:
The 1st branch performs the RPN operation on the feature map to obtain the corresponding candidate regions (ROI). Specifically, anchor frames (anchors) are generated on the feature map according to preset parameters; the anchor frames are a set of regions with different sizes and aspect ratios distributed across the input image. The anchor frames containing foreground are then identified and converted into target bounding boxes by a bounding-box regression algorithm, so that each bounding box fits the foreground object it contains more closely.
The 2nd branch obtains from the feature map a position-sensitive score map with K × K × (C + 1) dimensions for classification.
The 3rd branch obtains from the feature map a position-sensitive score map with 4 × K dimensions for regression.
Finally, position-sensitive ROI pooling (Position-Sensitive RoI Pooling) is performed on the K × K × (C + 1)-dimensional and the 4 × K-dimensional position-sensitive score maps respectively to obtain the confidence and position information of each candidate region, and the corresponding category is then obtained by a confidence judgment.
In this embodiment, the anchor frame generation parameters are improved first. In the conventional R-FCN network, anchor frames are generated with three scales and three aspect ratios, by default {128 × 128, 256 × 256, 512 × 512} and {1:1, 1:2, 2:1}, which gives 9 anchor sizes. When the detection target is a small face, small face regions are likely to be missed. In this embodiment the anchor scales are therefore changed to the five scales {16 × 16, 32 × 32, 128 × 128, 256 × 256, 512 × 512}, and each scale generates three anchor frames with aspect ratios {1:1, 1:2, 2:1}, for 15 sizes in total. The two added small scales are used to detect small faces, and the three retained scales are used to extract face regions of regular size.
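The 5 × 3 = 15 anchor shapes can be enumerated with a short sketch. It assumes, as is common in Faster R-CNN style code but not stated in the source, that each ratio is width:height and that the anchor area is kept equal to scale × scale; the function name is hypothetical.

```python
import math
from typing import List, Tuple

# Scales and aspect ratios as listed in the embodiment.
SCALES = [16, 32, 128, 256, 512]
RATIOS = [(1, 1), (1, 2), (2, 1)]  # assumed to mean width : height


def generate_anchor_shapes(
    scales: List[int] = SCALES,
    ratios: List[Tuple[int, int]] = RATIOS,
) -> List[Tuple[float, float]]:
    """Return (width, height) for every scale/ratio pair.

    Assumption: the area of each anchor stays equal to scale * scale,
    so a 1:2 anchor is narrower and taller than the square one.
    """
    shapes = []
    for scale in scales:
        for rw, rh in ratios:
            width = scale * math.sqrt(rw / rh)
            height = scale * math.sqrt(rh / rw)
            shapes.append((width, height))
    return shapes


anchor_shapes = generate_anchor_shapes()
```

In a full detector these shapes would be centred on every feature-map location; that placement step is omitted here.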
For frame regression, the prior art mostly adopts the NMS (Non-Maximum Suppression) algorithm. Its core idea is to find local maxima and suppress non-maxima: the anchor frame with the highest confidence is used iteratively to compute the intersection over union (IoU, the overlap rate between a candidate frame and a reference frame) with the other anchor frames, and frames with a large IoU are filtered out. However, the NMS algorithm has the following problems:
1) The NMS algorithm forcibly sets the confidence of adjacent, overlapping candidate frames to 0; that is, any candidate frame whose IoU exceeds the threshold is deleted outright. If a real target to be detected appears in the overlapping area, its detection is very likely to fail, which increases the miss rate and reduces the average detection rate.
2) When the NMS algorithm is used for frame regression, the optimal value of the IoU threshold $N_t$ is hard to determine: setting it too large increases the false detection rate, and setting it too small increases the missed detection rate.
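For reference, conventional NMS with its hard deletion can be sketched as below. The helper names and the (x1, y1, x2, y2) box layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes: Sequence[Box], scores: Sequence[float], nt: float) -> List[int]:
    """Return indices of kept boxes; boxes with IoU > nt are deleted outright."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Hard suppression: every strongly overlapping box is dropped.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= nt]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept_strict = hard_nms(boxes, scores, nt=0.5)  # second box is deleted
kept_loose = hard_nms(boxes, scores, nt=0.7)   # second box survives
```

The two calls illustrate problem 2): the same overlapping box (IoU of about 0.68 with the best box) is deleted or kept depending only on the choice of threshold.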
In order to solve the above problem, the frame regression algorithm is improved based on the NMS algorithm in the present embodiment. Fig. 3 is a flowchart of the improved bounding box regression algorithm in this embodiment. As shown in fig. 3, the specific steps of the improved border regression algorithm in this embodiment include:
S301: initializing data:
Denote the set of anchor frames containing background as $B = \{b_1, b_2, \dots, b_N\}$, where $b_n$, $n = 1, 2, \dots, N$, is the n-th anchor frame and N is the number of anchor frames containing background, and denote the confidence of each anchor frame as $s_n$. Initialize the set of reserved anchor frames $D = \varnothing$.
S302: selecting the current optimal anchor frame:
Select the anchor frame with the highest confidence from the current anchor frame set B and record it as the current optimal anchor frame $b'$; add $b'$ to the reserved anchor frame set D and delete it from the anchor frame set B.
S303: judging whether the anchor frame set B is empty; if so, frame regression is finished, otherwise go to step S304.
S304: updating the confidence:
For each anchor frame $b_n$ in the current anchor frame set B, compute the intersection over union $\mathrm{iou}(b', b_n)$ between it and the current optimal anchor frame $b'$, then update the confidence $s_n$ of each anchor frame $b_n$ using the following formula:
where $N_t$ is a preset intersection-over-union threshold.
Then return to step S302.
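Steps S301 to S304 can be sketched as follows. The confidence update formula is not reproduced in this text, so the sketch assumes a linear decay, $s_n \leftarrow s_n (1 - \mathrm{iou}(b', b_n))$ when $\mathrm{iou}(b', b_n) \ge N_t$ and $s_n$ unchanged otherwise, as used in Soft-NMS style methods. That decay rule, the function names, and the box layout are assumptions, not the formula of the embodiment.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_suppress(
    boxes: Sequence[Box], scores: Sequence[float], nt: float
) -> List[Tuple[int, float]]:
    """Return (index, updated confidence) pairs in the order boxes enter set D."""
    remaining: Dict[int, float] = {i: float(s) for i, s in enumerate(scores)}  # set B
    reserved: List[Tuple[int, float]] = []                                    # set D (S301)
    while remaining:                                  # S303: stop when B is empty
        best = max(remaining, key=remaining.get)      # S302: current optimal frame
        reserved.append((best, remaining.pop(best)))
        for i in remaining:                           # S304: decay instead of delete
            overlap = iou(boxes[best], boxes[i])
            if overlap >= nt:
                remaining[i] *= 1.0 - overlap         # ASSUMED linear decay rule
    return reserved


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
result = soft_suppress(boxes, [0.9, 0.8, 0.7], nt=0.5)
```

Because overlapping frames are only down-weighted, every frame ends up in D; a final confidence cut-off (a hypothetical extra step) would then decide which frames are output.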
As the GAN-based super-resolution reconstruction model, this embodiment employs the SRGAN network. The SRGAN network is a widely used super-resolution image reconstruction model with excellent results, trained on the basis of a GAN (Generative Adversarial Network). It consists of a generator G and a discriminator D. FIG. 4 is a block diagram of the generator in the SRGAN network. FIG. 5 is a block diagram of the discriminator in the SRGAN network. The core of the generator is a series of residual blocks, each containing two 3 × 3 convolutional layers followed by batch normalization (BN) layers, with PReLU as the activation function; two 2× sub-pixel convolutional layers are used to increase the feature size. The discriminator D adopts a network structure similar to VGG19 but without max pooling. It contains 8 convolutional layers; as the network deepens, the number of features keeps increasing and the feature size keeps decreasing. LeakyReLU is used as the activation function, and two fully connected layers with a final sigmoid activation function yield the probability that the input is a real sample.
The existing SRGAN network is hard to train and suffers from a distribution overlap problem. Research shows that these problems arise because the traditional SRGAN network uses the KL divergence and the JS divergence to measure the distance between the real sample distribution and the generated sample distribution. In this embodiment, the EM divergence is adopted to solve these problems. The EM divergence is a symmetric divergence defined as follows:
Let $\Omega \subset \mathbb{R}^n$ be a bounded, connected open set, and let S be the set of all Radon probability distributions on $\Omega$. For some $p \ne 1$ and $k > 0$, the EM divergence is calculated as follows:
where $P_r$ and $P_g$ denote two different probability distributions, $P_u$ denotes a random probability distribution, inf denotes the infimum, $x$ denotes a sample drawn from $P_r$, $\tilde{x}$ denotes a sample drawn from $P_g$, $\hat{x}$ denotes a random linear combination of the samples $x$ and $\tilde{x}$, $P_u$ is the distribution of the sample $\hat{x}$, $k$ and $p$ are constants, $C_c^1(\Omega)$ is the function space of all first-order differentiable functions with compact support on $\Omega$, and $\lVert \cdot \rVert$ denotes the norm.
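The formula itself is not reproduced in this text. One form that is consistent with every symbol defined above is the Wasserstein divergence used in the WGAN-div literature; it is given here as an assumption about the intended expression, with $\tilde{x}$ denoting a sample from $P_g$ and $\hat{x}$ a random linear combination of $x$ and $\tilde{x}$ distributed according to $P_u$:

```latex
W_{k,p}(P_r, P_g) \;=\; \inf_{f \in C_c^1(\Omega)}
  \; \mathbb{E}_{x \sim P_r}\big[f(x)\big]
  \;-\; \mathbb{E}_{\tilde{x} \sim P_g}\big[f(\tilde{x})\big]
  \;+\; k \, \mathbb{E}_{\hat{x} \sim P_u}\big[\lVert \nabla f(\hat{x}) \rVert^{p}\big]
```

Under this reading, the gradient term penalises steep test functions $f$, which is what keeps the quantity informative even when $P_r$ and $P_g$ do not overlap.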
The advantage of the EM divergence is that, for two different distributions, it still reflects the distance between them even when they do not overlap. This means that meaningful gradients are available at any point in training, so the whole SRGAN network can be trained stably, and problems such as mode collapse caused by vanishing gradients, which may occur when training the original SRGAN network, are effectively resolved. In this embodiment, the objective function used in model training is improved on the basis of the EM divergence. The min-max optimization objective function of the SRGAN network after the EM-divergence improvement is:
where $x$ denotes a real high-resolution sample, $z$ denotes a low-resolution sample input to the generator G, $G(z)$ is the super-resolution reconstructed sample produced by the generator G, $P_g$ denotes the probability distribution of the super-resolution reconstructed samples, $P_r$ denotes the probability distribution of the real high-resolution samples, $D(x)$ and $D(G(z))$ denote the probabilities with which the discriminator D judges the high-resolution sample and the super-resolution reconstructed sample, respectively, to be real samples, $E[\cdot]$ denotes the mathematical expectation, $\hat{x}$ denotes a random linear combination of the real high-resolution sample $x$ and the super-resolution reconstructed sample $G(z)$, $P_u$ denotes the distribution of the sample $\hat{x}$, and $k$ and $p$ are constants.
In the training process, the optimization objective function is decomposed into two optimization problems:
1. optimization of the discriminator D:
2. optimization of generator G:
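The two sub-problems can be illustrated numerically. The formulas are not reproduced in this text, so the sketch below assumes the sign convention of the WGAN-div literature: the discriminator minimises $E[D(x)] - E[D(G(z))] + k\,E[\lVert \nabla D(\hat{x}) \rVert^p]$ and the generator minimises $E[D(G(z))]$. Scalar samples stand in for images, the gradient is taken by central finite differences, and the values of `k` and `p` are illustrative; all names are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence


def discriminator_loss(
    d: Callable[[float], float],
    real: Sequence[float],
    fake: Sequence[float],
    k: float = 2.0,
    p: float = 6.0,
    h: float = 1e-4,
    rng: Optional[random.Random] = None,
) -> float:
    """ASSUMED objective for D: E[D(x)] - E[D(G(z))] + k * E[|grad D(x_hat)|^p]."""
    rng = rng or random.Random(0)
    mean_real = sum(d(x) for x in real) / len(real)
    mean_fake = sum(d(g) for g in fake) / len(fake)
    pairs = list(zip(real, fake))
    penalty = 0.0
    for x, g in pairs:
        u = rng.random()                  # random mixing coefficient
        x_hat = u * x + (1.0 - u) * g     # random linear combination of x and G(z)
        grad = (d(x_hat + h) - d(x_hat - h)) / (2.0 * h)  # finite-difference gradient
        penalty += abs(grad) ** p
    return mean_real - mean_fake + k * penalty / len(pairs)


def generator_loss(d: Callable[[float], float], fake: Sequence[float]) -> float:
    """ASSUMED objective for G under the same convention: E[D(G(z))]."""
    return sum(d(g) for g in fake) / len(fake)


critic = lambda x: 0.5 * x  # toy linear discriminator
d_loss = discriminator_loss(critic, real=[1.0, 3.0], fake=[0.0, 2.0], k=2.0, p=2.0)
g_loss = generator_loss(critic, fake=[0.0, 2.0])
```

Neither loss contains a logarithm, which matches the later remark that the improved objective has no log term.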
Based on the above derivation, the invention improves the training method of the SRGAN model to obtain a more advantageous SRGAN model, thereby improving the quality of the super-resolution face image reconstruction results. The specific training method comprises the following steps:
First, a plurality of high-resolution face images $I^{HR}$ are obtained, and the corresponding low-resolution face images $I^{LR}$ are obtained by down-sampling. Each high-resolution face image $I^{HR}$ and its corresponding low-resolution face image $I^{LR}$ form a training sample, which yields a training sample set. In this embodiment, down-sampling uses a Gaussian pyramid: the original image serves as the bottom-layer image G0 (layer 0 of the Gaussian pyramid); it is convolved with a 5 × 5 Gaussian kernel and then down-sampled (even rows and columns are removed) to obtain the upper-layer image G1, and this is iterated to reach 4-times down-sampling.
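The Gaussian-pyramid down-sampling can be sketched in plain Python. The sketch assumes the common binomial 5 × 5 kernel (outer product of [1, 4, 6, 4, 1] / 16), replicated borders, and that "even rows and columns" are counted from 1, so 0-based indices 0, 2, 4, … are kept; none of these details is stated in the source, and two pyramid levels are used to reach a factor of 4.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values

# Separable 1-D factor of the assumed 5 x 5 Gaussian kernel.
KERNEL_1D = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _clamp(i: int, n: int) -> int:
    """Replicate border pixels by clamping the index into [0, n - 1]."""
    return max(0, min(n - 1, i))


def gaussian_blur(img: Image) -> Image:
    """Convolve with the 5 x 5 kernel as a horizontal then a vertical pass."""
    h, w = len(img), len(img[0])
    tmp = [[sum(KERNEL_1D[t] * img[y][_clamp(x + t - 2, w)] for t in range(5))
            for x in range(w)] for y in range(h)]
    return [[sum(KERNEL_1D[t] * tmp[_clamp(y + t - 2, h)][x] for t in range(5))
             for x in range(w)] for y in range(h)]


def pyr_down(img: Image) -> Image:
    """One pyramid level: blur, then drop every second row and column."""
    blurred = gaussian_blur(img)
    return [row[::2] for row in blurred[::2]]


def downsample_4x(img: Image) -> Image:
    """Two pyramid levels (G0 -> G1 -> G2) give a factor-4 reduction per side."""
    return pyr_down(pyr_down(img))


high_res = [[float(x + y) for x in range(8)] for y in range(8)]
low_res = downsample_4x(high_res)  # 8 x 8 input becomes 2 x 2
```

Pairing each `high_res` image with its `low_res` counterpart gives one training sample in the sense described above.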
Then the SRGAN network is trained with the obtained training sample set. During training, the optimization objective function of the generator G is:
The optimization objective function of the discriminator D is:
where $x$ denotes a real high-resolution face image, $z$ denotes a low-resolution face image input to the generator G, $G(z)$ is the super-resolution reconstructed face image produced by the generator G, $P_g$ denotes the probability distribution of the super-resolution reconstructed face images, $P_r$ denotes the probability distribution of the real high-resolution face images, $D(x)$ and $D(G(z))$ denote the probabilities with which the discriminator D judges the high-resolution face image and the super-resolution reconstructed face image, respectively, to be real face images, $E[\cdot]$ denotes the mathematical expectation, $\hat{x}$ denotes a random linear combination of the real high-resolution face image $x$ and the super-resolution reconstructed face image $G(z)$, $P_u$ denotes the distribution of the sample $\hat{x}$, and $k$ and $p$ are constants.
During training of the SRGAN network, the generator G first performs super-resolution reconstruction on the low-resolution face image $I^{LR}$ in each training sample X. Specifically, the generator G up-samples the low-resolution face image $I^{LR}$ in the training sample X to obtain a super-resolution reconstructed face image $I^{SR}$. Because this embodiment obtains the low-resolution face image $I^{LR}$ by 4-times down-sampling of the high-resolution face image $I^{HR}$, the up-sampling factor used to generate the super-resolution reconstructed face image $I^{SR}$ is also 4.
Then the high-resolution face image $I^{HR}$ corresponding to the low-resolution face image $I^{LR}$ and the super-resolution reconstructed face image $I^{SR}$ generated by the generator G are input into the discriminator D, and the loss function $L_{SR}$ of the training sample is calculated according to the following formula:
The content loss function of the training sample is calculated as follows:
The content loss function based on the mean square error is calculated as follows:
where W denotes the width of the high-resolution face image $I^{HR}$, H denotes the height of the high-resolution face image $I^{HR}$, r denotes the down-sampling factor, $I^{HR}_{x,y}$ denotes the pixel value at coordinates (x, y) in the high-resolution face image $I^{HR}$, and $I^{SR}_{x,y}$ denotes the pixel value at coordinates (x, y) in the super-resolution reconstructed face image $I^{SR}$.
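A mean-square-error content loss over the two images can be sketched as below. The exact normalisation of the formula is not reproduced in this text, so dividing by the pixel count W × H of the high-resolution image is an assumption, and the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def mse_content_loss(hr: Image, sr: Image) -> float:
    """Mean squared pixel difference between I_HR and I_SR.

    Assumption: normalised by W * H of the high-resolution image.
    """
    h, w = len(hr), len(hr[0])
    if len(sr) != h or any(len(row) != w for row in sr):
        raise ValueError("I_HR and I_SR must have the same size")
    total = sum((hr[y][x] - sr[y][x]) ** 2 for y in range(h) for x in range(w))
    return total / (w * h)


hr_image = [[1.0, 2.0], [3.0, 4.0]]
sr_image = [[1.0, 2.0], [3.0, 6.0]]
loss = mse_content_loss(hr_image, sr_image)  # one pixel differs by 2
```

The size check reflects the fact that $I^{SR}$ is produced by 4-times up-sampling and therefore matches $I^{HR}$ pixel for pixel.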
The VGG loss is calculated as follows:
where i denotes the index of a max-pooling layer of the VGG-19 network in the discriminator D, and j denotes the index of a convolutional layer between the i-th and the (i+1)-th max-pooling layers; the existing VGG-19 network has 5 max-pooling layers, with 2 or 4 convolutional layers between two adjacent max-pooling layers. $\phi_{i,j}$ denotes the feature map obtained by the j-th convolutional layer after the i-th max-pooling layer of the VGG-19 network in the discriminator D, $W_{i,j}$ denotes the width of the feature map $\phi_{i,j}$, and $H_{i,j}$ denotes its height.
The adversarial loss is the part of the loss function that, by "fooling" the discriminator, biases the SRGAN network toward outputs closer to natural images. It is calculated as follows:
where $D_{\theta_D}(G_{\theta_G}(I^{LR}))$ denotes the probability with which the discriminator D judges the super-resolution reconstructed face image generated by the generator (i.e. $I^{SR}$) to be a real high-resolution face image, the subscripts $\theta_D$ and $\theta_G$ denote the network parameters of the discriminator D and the generator G respectively, $w = 1, 2, \dots, W$ indexes the dimensions of the network parameters, and W denotes the number of dimensions.
In the invention, the super-resolution reconstruction model must also detect whether the super-resolution reconstructed image contains a face target. To better meet this requirement, a classification loss $L_{clc}$ is added when the loss function is calculated. It is calculated as follows:
where $\{y_1, y_2, \dots, y_v, \dots, y_V\}$ denotes the calibration data indicating whether the high-resolution face image $I^{HR}$ is a face, V denotes the number of face regions marked in the high-resolution face image $I^{HR}$, and each $y_v$ takes a value in {0, 1}.
Since the improved optimization objective function in the implementation has no log term, the Adam optimization algorithm can be optimized to realize the objective function optimization of the generator G and the discriminator, thereby improving the training efficiency. As for the generator G, the weight of the generator G is updated in a descending order by using an Adam optimization algorithmwG
Wherein,represents a weight wGDecreasing gradient of zmRepresentation of super-resolution reconstructed face image ISRThe value of the mth pixel, M being 1,2, …, M representing the number of pixels, D (G (z)m) ) the representation discriminator D judges the super-resolution reconstructed face image ISRThe m-th pixel is a high-resolution face image IHRProbability of middle pixel, α denotes learning rate, β1Exponential decay Rate representing first moment estimate, β2Typical values of the three parameters of the Adam optimization algorithm are α -0.00001, β10.9 and β2=0.999。
For the discriminator D, the weight $w_D$ is updated by gradient descent using the Adam optimization algorithm:
where $\nabla_{w_D}$ denotes the descent gradient of the weight $w_D$; $x_m$ denotes the value of the m-th pixel of the high-resolution face image $I^{HR}$; $D(x_m)$ denotes the probability, as judged by the discriminator D, that the m-th pixel of $I^{HR}$ is a pixel of the high-resolution face image $I^{HR}$; $\nabla_{\hat{x}_m}$ denotes the descent gradient of $\hat{x}_m$; $\mu_m = m/M$; and $D(\hat{x}_m)$ denotes the probability, as judged by the discriminator D, that $\hat{x}_m$ is a pixel of the high-resolution face image $I^{HR}$.
In the present embodiment, the weight $w_G$ of the generator G and the weight $w_D$ of the discriminator D are preferably updated alternately: the parameters of the generator G are first fixed while the parameters of the discriminator D are updated, then the parameters of the discriminator D are fixed while the parameters of the generator G are updated, and so on in alternation.
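The alternating schedule can be sketched as a small loop. The names `alternate_updates`, `step_d` and `step_g` are hypothetical; each step function stands in for one Adam update of one network while the other network's weights are only read, never modified.

```python
def alternate_updates(w_g, w_d, step_d, step_g, num_rounds):
    """Alternately update discriminator and generator weights.

    step_d(w_g, w_d) returns the new w_d with w_g held fixed;
    step_g(w_g, w_d) returns the new w_g with w_d held fixed.
    """
    for _ in range(num_rounds):
        w_d = step_d(w_g, w_d)   # generator fixed, discriminator updated
        w_g = step_g(w_g, w_d)   # discriminator fixed, generator updated
    return w_g, w_d
```

Because each step function returns only its own network's weights, the freezing of the other network is enforced by construction rather than by convention.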
To better illustrate the technical effect of the invention, it was experimentally verified on a set of low-resolution face images. In this verification, the face detection model is the R-FCN model of this embodiment, with improved anchor frame generation parameters and an improved frame regression algorithm, and the GAN-based super-resolution reconstruction model is the SRGAN model obtained with the improved training method of this embodiment. The face detection model and the GAN-based super-resolution reconstruction model are trained on the WIDER FACE training sample set; 10 images are randomly drawn from each of its 61 categories, giving 610 detection images in total. For comparison of technical effects, the SFD face detection method and the R-FCN face detection method are used as comparison methods.
To evaluate the technical effect of the face detection method of the invention and of the comparison methods, the PR curve is used as the evaluation criterion. A PR curve plots precision on the ordinate against recall on the abscissa.
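A PR curve and its average precision (AP) can be computed from scored detections as sketched below. The matching of detections to ground-truth faces is assumed to have been done already, and the all-point interpolation used for AP is one common convention; the evaluation protocol actually used in the experiment is not stated in the text. The function names are illustrative.

```python
def pr_curve(scores, is_true_positive, num_ground_truth):
    """Return (precisions, recalls) over detections sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
        precisions.append(tp / rank)              # correct detections so far / detections so far
        recalls.append(tp / num_ground_truth)     # correct detections so far / all faces
    return precisions, recalls

def average_precision(precisions, recalls):
    """Area under the PR curve using the interpolated precision envelope."""
    ap = 0.0
    prev_recall = 0.0
    for k in range(len(recalls)):
        interp = max(precisions[k:])              # best precision at this recall or higher
        ap += (recalls[k] - prev_recall) * interp
        prev_recall = recalls[k]
    return ap
```

The mAP is then the mean of such AP values over the groups being averaged.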
Fig. 6 shows the PR curves of the three methods in this experimental verification. As shown in Fig. 6, among the three face detection methods, the PR curve of the present invention is closest to the upper right corner overall, and its mAP (Mean Average Precision, i.e. the mean of the AP values) is 0.947, the best of the three.
Fig. 7 shows an example detection result of the SFD face detection method in this experimental verification. Fig. 8 shows an example detection result of the R-FCN face detection method. Fig. 9 shows an example detection result of the present invention. Comparing Figs. 7 to 9, the present invention detects 14 faces in total, whereas the other two methods detect 11 and 9 faces respectively, so the present invention shows better detection performance.
Face detection was then carried out on image samples of different sharpness. The WIDER FACE training sample set annotates a blur attribute for each face target with three levels: clear, normal blur and heavy blur. Samples were accordingly drawn from the image samples at each blur level to form the detection sample sets. Fig. 10 shows the PR curves of the three methods on the clear detection sample set. Fig. 11 shows the PR curves of the three methods on the normal-blur detection sample set. Fig. 12 shows the PR curves of the three methods on the heavy-blur detection sample set. As shown in Figs. 10 to 12, when the samples are clear, all three methods detect the faces well, the differences between them are small, and the mAP values are very high. In the normal-blur test group, the mAP values of the three algorithms drop slightly but still exceed 97%, which indicates that all three methods retain very good detection capability under normal blur and that this level of blur poses little challenge to them.
Under normal blur the present invention has some advantage over SFD and R-FCN, but the advantage is not pronounced. When the detection samples are heavily blurred, differences among the three methods begin to appear. SFD performs worst: its mAP drops by about 10 percentage points relative to the clear case. The present method shows the smallest drop, about 5 percentage points. In this case the mAP of the present method is about 2 percentage points higher than that of the original R-FCN model, and its PR curve clearly encloses the PR curves of the other two comparison methods. Compared with the other two methods, the present invention therefore offers better stability and a higher detection rate at low resolution.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims (9)

1. A face detection method based on two-stage detection, characterized by comprising the following steps:
S1: acquiring a plurality of face image training samples, each training sample comprising a face image and face target information, and training a face detection model with the face image training samples;
S2: acquiring a plurality of super-resolution face image reconstruction training samples, each training sample comprising a low-resolution image containing a face and the corresponding high-resolution image, and training a GAN-based super-resolution reconstruction model with these training samples, the GAN-based super-resolution reconstruction model comprising a generator G and a discriminator D;
S3: inputting a face image to be detected into the face detection model to obtain the coordinate information of each candidate region of a face target and the confidence value C that the candidate region belongs to a face; presetting confidence thresholds $T_1$ and $T_2$ with $0 < T_1 < T_2 < 1$; for each candidate region, if the corresponding confidence value $C \ge T_2$, determining that a face target exists in the candidate region and outputting it as a face target region; if the corresponding confidence value satisfies $T_1 \le C < T_2$, taking the candidate region as a face target to be determined; otherwise, determining that no face target exists in the candidate region and not outputting it;
S4: inputting each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR, then inputting the super-resolution reconstructed image SR into the discriminator D, which judges whether the image SR is a qualified super-resolution reconstructed image and whether it contains a face target; if the image SR is a qualified super-resolution reconstructed image and contains a face target, determining that a face target exists in the corresponding candidate region and outputting it as a face target region; otherwise, determining that no face target exists in the corresponding candidate region.
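The routing of candidate regions in steps S3 and S4 can be sketched as follows. The function names are hypothetical, and `second_stage` is a stand-in callable for the generator G followed by the discriminator D; it is an assumption used only to keep the sketch self-contained.

```python
def route_candidates(candidates, t1, t2):
    """Split candidate regions by confidence C using thresholds 0 < t1 < t2 < 1.

    candidates: list of (box, confidence) pairs.
    Returns (accepted, pending): C >= t2 is accepted directly,
    t1 <= C < t2 is pending for the second stage, C < t1 is discarded.
    """
    if not 0.0 < t1 < t2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < t1 < t2 < 1")
    accepted, pending = [], []
    for box, conf in candidates:
        if conf >= t2:
            accepted.append(box)
        elif conf >= t1:
            pending.append(box)
    return accepted, pending

def two_stage_detect(candidates, t1, t2, second_stage):
    """second_stage(box) -> bool stands in for generator G plus discriminator D."""
    accepted, pending = route_candidates(candidates, t1, t2)
    return accepted + [box for box in pending if second_stage(box)]
```

Only the pending regions pay the cost of super-resolution reconstruction, which is the point of using two thresholds rather than one.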
2. The face detection method of claim 1, wherein the face detection model uses an R-FCN network.
3. The face detection method of claim 2, wherein the anchor frames generated in the R-FCN network use five scales {16 × 16, 32 × 32, 128 × 128, 256 × 256, 512 × 512} and three aspect ratios {1:1, 1:2, 2:1}.
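Five scales and three aspect ratios give 15 anchor shapes per location, as the sketch below enumerates. Keeping the area of each anchor equal to that of its scale, and reading each ratio as width:height, are assumptions borrowed from common region-proposal implementations; the claim itself does not fix either convention.

```python
import math

def generate_anchors(scales=(16, 32, 128, 256, 512),
                     ratios=((1, 1), (1, 2), (2, 1))):
    """Return (width, height) pairs, one per scale and ratio combination."""
    anchors = []
    for s in scales:
        for rw, rh in ratios:
            w = s * math.sqrt(rw / rh)   # width grows with the width share of the ratio
            h = s * math.sqrt(rh / rw)   # height shrinks so that w * h stays s * s
            anchors.append((w, h))
    return anchors
```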
4. The face detection method of claim 2, wherein the border regression algorithm in the R-FCN network comprises the following specific steps:
1) denote the set of anchor frames containing background as $B = \{b_1, b_2, …, b_N\}$, where $b_n$ denotes the n-th anchor frame, n = 1, 2, …, N, N denotes the number of anchor frames containing background, and the confidence of each anchor frame is $s_n$; initialize the set of retained anchor frames $D = \varnothing$;
2) select the anchor frame with the highest confidence from the current anchor frame set B, denote it as the current optimal anchor frame $b'$, add $b'$ to the retained anchor frame set D, and delete $b'$ from the anchor frame set B;
3) determine whether the anchor frame set B is empty; if so, the frame regression ends; otherwise, go to step 4);
4) for each anchor frame $b_n$ in the current anchor frame set B, calculate its intersection-over-union $iou(b', b_n)$ with the current optimal anchor frame $b'$, and then update the confidence $s_n$ of each anchor frame $b_n$ using the following formula:
where $N_t$ is a preset intersection-over-union threshold;
and then returns to step 2).
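Steps 1) to 4) follow the pattern of soft non-maximum suppression and can be sketched as below. The confidence update formula is not reproduced in this text, so the linear decay used here (multiply $s_n$ by $1 - iou$ when the overlap reaches $N_t$) is an assumption, as are the function names and the (x1, y1, x2, y2) box format.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms_linear(boxes, scores, nt):
    """Return retained (box, confidence) pairs in the order they were selected."""
    remaining = list(zip(boxes, scores))     # set B with confidences s_n
    kept = []                                # retained set D, initially empty
    while remaining:                         # step 3: stop when B is empty
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        b_best, s_best = remaining.pop(best) # step 2: move b' from B to D
        kept.append((b_best, s_best))
        updated = []
        for b, s in remaining:               # step 4: decay overlapping boxes
            overlap = iou(b_best, b)
            if overlap >= nt:
                s = s * (1.0 - overlap)      # assumed linear decay
            updated.append((b, s))
        remaining = updated
    return kept
```

Unlike hard suppression, overlapping frames are not deleted; their confidence is lowered, so a later confidence threshold decides whether they survive.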
5. The face detection method of claim 1, wherein the GAN network-based super-resolution reconstruction model adopts an SRGAN network.
6. The face detection method of claim 5, wherein the SRGAN network is trained by the following method:
first, a plurality of high-resolution face images $I^{HR}$ are acquired, and the corresponding low-resolution face images $I^{LR}$ are obtained by down-sampling; each high-resolution face image $I^{HR}$ and its corresponding low-resolution face image $I^{LR}$ form one training sample, thereby yielding a training sample set;
then, training the SRGAN network by using the obtained training sample set, wherein the optimization objective function of the generator G in the training process is as follows:
the optimized objective function of the discriminator D is:
where x denotes a true high-resolution face image, z denotes the low-resolution face image input to the generator G, G(z) is the super-resolution reconstructed face image produced by the generator G, $P_g$ denotes the probability distribution of super-resolution reconstructed face images, $P_r$ denotes the probability distribution of true high-resolution face images, D(x) and D(G(z)) denote the probabilities, as judged by the discriminator D, that the high-resolution face image and the super-resolution reconstructed face image respectively are true face images, E[·] denotes the mathematical expectation, $\hat{x}$ denotes a random linear combination of the true high-resolution face image x and the super-resolution reconstructed face image G(z), and k and p each denote a constant.
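The construction of training samples in the first step of this claim (pairing each high-resolution image with a down-sampled copy) can be sketched as below. The claim does not name the down-sampling method or factor, so the block averaging and the default factor of 4 used here are assumptions, and images are represented as plain nested lists of grayscale values for illustration only.

```python
def downsample(image, factor):
    """Average-pool a grayscale image (list of equal-length rows) by `factor`."""
    h, w = len(image), len(image[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))   # mean of the factor x factor block
        out.append(row)
    return out

def make_training_pairs(hr_images, factor=4):
    """Return a list of (low_resolution, high_resolution) training samples."""
    return [(downsample(hr, factor), hr) for hr in hr_images]
```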
7. The face detection method of claim 6, wherein in the SRGAN network training process, the loss function $L^{SR}$ of a training sample is calculated according to the following formula:
where the loss comprises the content loss function of the training sample, the adversarial loss, and the classification loss $L_{clc}$.
8. The face detection method of claim 6, wherein in the SRGAN network training process, an Adam optimization algorithm is adopted to realize objective function optimization of a generator G and a discriminator D, and the specific method is as follows:
the weight $w_G$ of the generator G is updated by gradient descent using the Adam optimization algorithm:
where $\nabla_{w_G}$ denotes the descent gradient of the weight $w_G$; $z_m$ denotes the value of the m-th pixel of the super-resolution reconstructed face image $I^{SR}$, m = 1, 2, …, M, and M denotes the number of pixels; $D(G(z_m))$ denotes the probability, as judged by the discriminator D, that the m-th pixel of $I^{SR}$ is a pixel of the high-resolution face image $I^{HR}$; α denotes the learning rate, $\beta_1$ the exponential decay rate of the first-moment estimate, and $\beta_2$ the exponential decay rate of the second-moment estimate;
the weight $w_D$ of the discriminator D is updated by gradient descent using the Adam optimization algorithm:
where $\nabla_{w_D}$ denotes the descent gradient of the weight $w_D$; $x_m$ denotes the value of the m-th pixel of the high-resolution face image $I^{HR}$; $D(x_m)$ denotes the probability, as judged by the discriminator D, that the m-th pixel of $I^{HR}$ is a pixel of the high-resolution face image $I^{HR}$; $\nabla_{\hat{x}_m}$ denotes the descent gradient of $\hat{x}_m$; $\mu_m = m/M$; and $D(\hat{x}_m)$ denotes the probability, as judged by the discriminator D, that $\hat{x}_m$ is a pixel of the high-resolution face image $I^{HR}$.
9. The face detection method of claim 8, wherein the weight $w_G$ of the generator G and the weight $w_D$ of the discriminator D are updated alternately during the optimization of the objective functions of the generator G and the discriminator D.
CN201910455695.5A 2019-05-29 2019-05-29 Face detection method based on two-stage detection Expired - Fee Related CN110189255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910455695.5A CN110189255B (en) 2019-05-29 2019-05-29 Face detection method based on two-stage detection

Publications (2)

Publication Number Publication Date
CN110189255A true CN110189255A (en) 2019-08-30
CN110189255B CN110189255B (en) 2023-01-17

Family

ID=67718558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910455695.5A Expired - Fee Related CN110189255B (en) 2019-05-29 2019-05-29 Face detection method based on two-stage detection

Country Status (1)

Country Link
CN (1) CN110189255B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230117