CN110189255A

CN110189255A - Method for detecting human face based on hierarchical detection

Info

Publication number: CN110189255A
Application number: CN201910455695.5A
Authority: CN
Inventors: 于力; 刘意文; 邹见效; 杨瞻远; 徐红兵
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2019-08-30
Anticipated expiration: 2039-05-29
Also published as: CN110189255B

Abstract

The invention discloses a face detection method based on hierarchical detection. A face detection model and a GAN-based super-resolution reconstruction model are first trained separately. The face image to be detected is then input into the face detection model to obtain the coordinate information of each candidate face region and the confidence value that the region is a face, and a preliminary decision is made from the confidence value. Each undetermined face target is then input into the generator of the GAN-based super-resolution reconstruction model for a further decision. By using hierarchical detection, the invention effectively improves the detection rate on low-resolution face images.

Description

Method for detecting human face based on hierarchical detection

Technical field

The invention belongs to the technical field of low-resolution face detection and, more specifically, relates to a face detection method based on hierarchical detection.

Background technique

Face detection first arose as a subproblem of face recognition systems and, as research deepened, gradually became an independent topic. Current face detection technology combines machine learning, computer vision, pattern recognition, artificial intelligence and other fields. It has become the basis of all applications derived from face image analysis and has a significant impact on the response speed and detection accuracy of those applications. As the application scenarios of face detection keep expanding, input face images are increasingly found to be too small or of too low quality for various reasons, and the accuracy of face detection systems often drops sharply on such low-resolution face images. The problem of detecting low-quality and small-size face images is usually referred to as low-resolution face detection.

Current face detection algorithms are all, in essence, binary classification problems. The basic procedure is to extract effective features from the region to be examined and then use these features to decide whether a face is present; low-resolution face detection is studied on the same basis. Low-resolution faces have three characteristics: little information, heavy noise and few usable tools. As a result, not enough effective features can be extracted from a candidate region to represent it. At the level of feature representation, traditional methods cannot extract enough effective features to express a low-resolution face. In deep neural networks, the early convolutional layers cannot provide sufficiently strong feature maps, and the later convolutional layers cannot provide enough features of the low-resolution face region. This inherent deficiency makes detecting low-resolution faces extremely difficult.

To solve the low-resolution face detection problem, many scholars have carried out extensive targeted research. Taken together, work at home and abroad concentrates on three directions: finding a resolution-robust feature representation for the face region, designing new classifiers for the characteristics of low-resolution faces, and image super-resolution methods. It must be recognized that research on low-resolution, small-face detection is still at a developing stage and many problems remain to be solved. On the one hand, how to efficiently extract the contextual information of a low-resolution face and integrate it into the detection network, so as to give low-resolution face detectors better performance, still needs further exploration. On the other hand, a complete face detection system is necessarily a full-scale one, so the ability to detect faces of other scales must be considered when handling low-resolution face detection. In fact, it is precisely this fusion of multi-scale detection that leaves current low-resolution face detection systems with either low accuracy or very slow processing, which is a major problem urgently awaiting a solution.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art and provide a face detection method based on hierarchical detection. The face image is first filtered by a face detection model, and the undetermined samples are then further examined with a GAN-based super-resolution reconstruction model, thereby improving the detection rate on low-resolution face images.

To achieve the above object, the deep-learning-based face detection method of the invention comprises the following steps:

S1: Obtain a number of face image training samples, each comprising an image containing faces and the corresponding face target information, and train the face detection model with these training samples;

S2: Obtain a number of super-resolution face image training samples, each comprising a low-resolution image containing a face and the corresponding high-resolution image, and use them to train the GAN-based super-resolution reconstruction model, which comprises a generator G and a discriminator D;

S3: Input the face image to be detected into the face detection model to obtain the coordinate information of each candidate face region and the confidence value C that the candidate region is a face. Preset confidence thresholds T₁ and T₂ with 0 < T₁ < T₂ < 1. For each candidate region, if its confidence value C ≥ T₂, determine that the candidate region contains a face target and output it as a face target region; if T₁ ≤ C < T₂, take the candidate region as an undetermined face target; otherwise determine that the candidate region contains no face target and do not output it;

S4: Input each undetermined face target into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image R, then input R into the discriminator D, which judges whether R is a qualified super-resolution reconstructed image and whether it contains a face target. If the image R is both a qualified super-resolution reconstructed image and contains a face target, determine that the corresponding candidate region contains a face target and output it as a face target region; otherwise determine that it contains no face target.

In the face detection method based on hierarchical detection of the invention, the face detection model and the GAN-based super-resolution reconstruction model are first trained separately. The face image to be detected is then input into the face detection model to obtain the coordinate information of each candidate face region and the confidence value that the region is a face, and a preliminary decision is made from the confidence value. Each undetermined face target is then input into the generator of the GAN-based super-resolution reconstruction model for a further decision. By using hierarchical detection, the invention effectively improves the detection rate on low-resolution face images.

Brief description of the drawings

Fig. 1 is a flow chart of a specific embodiment of the face detection method based on hierarchical detection of the invention;

Fig. 2 is a structural diagram of the R-FCN network;

Fig. 3 is a flow chart of the improved bounding-box regression algorithm in this embodiment;

Fig. 4 is a structural diagram of the generator in the SRGAN network;

Fig. 5 is a structural diagram of the discriminator in the SRGAN network;

Fig. 6 shows the PR curves of the three methods in this experimental verification;

Fig. 7 shows example detection results of the SFD face detection method in this experimental verification;

Fig. 8 shows example detection results of the R-FCN face detection method in this experimental verification;

Fig. 9 shows example detection results of the invention in this experimental verification;

Fig. 10 shows the PR curves of the three methods for face detection on the clear detection sample set in this experimental verification;

Fig. 11 shows the PR curves of the three methods for face detection on the generally blurred detection sample set in this experimental verification;

Fig. 12 shows the PR curves of the three methods for face detection on the severely blurred detection sample set in this experimental verification.

Specific embodiment

A specific embodiment of the invention is described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. It should be noted in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.

Embodiment

Fig. 1 is a flow chart of a specific embodiment of the face detection method based on hierarchical detection of the invention. As shown in Fig. 1, the method comprises the following specific steps:

S101: Train the face detection model:

Obtain a number of face image training samples, each comprising an image containing faces and the corresponding face target information, and train the face detection model with these training samples.

S102: Train the super-resolution reconstruction model:

Obtain a number of super-resolution face image training samples, each comprising a low-resolution image containing a face and the corresponding high-resolution image, and use them to train the super-resolution reconstruction model based on a GAN (Generative Adversarial Network). This model comprises a generator G and a discriminator D.

S103: Perform preliminary detection with the face detection model:

Input the face image to be detected into the face detection model to obtain the coordinate information of each candidate face region and the confidence value C that the candidate region is a face. Preset confidence thresholds T₁ and T₂ with 0 < T₁ < T₂ < 1. For each candidate region, if its confidence value C ≥ T₂, determine that the candidate region contains a face target and output it as a face target region; if T₁ ≤ C < T₂, take the candidate region as an undetermined face target; otherwise determine that the candidate region contains no face target and do not output it.

S104: Perform detection with the super-resolution reconstruction model:

Input each undetermined face target into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR, then input SR into the discriminator D, which judges whether it is a qualified super-resolution reconstructed image and whether it contains a face target. If the image SR is both a qualified super-resolution reconstructed image and contains a face target, determine that the corresponding candidate region contains a face target and output it as a face target region; otherwise determine that it contains no face target.

With the above face detection method based on hierarchical detection, the super-resolution reconstruction model assists the face detection model by further examining candidate regions whose confidence is not high enough, thereby reducing missed and false detections of face targets and improving detection performance.
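The two-threshold decision of steps S103 and S104 can be sketched as follows. This is only a minimal illustration: the function name `hierarchical_detect`, the callable `second_stage` (standing in for the generator G followed by the discriminator D) and the default threshold values 0.3 and 0.8 are assumptions made for the example and are not specified in this embodiment.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def hierarchical_detect(
    candidates: Sequence[Tuple[Box, float]],
    second_stage: Callable[[Box], bool],
    t1: float = 0.3,  # assumed value for T1
    t2: float = 0.8,  # assumed value for T2
) -> List[Box]:
    """Apply the S103/S104 cascade to (box, confidence) pairs."""
    if not 0.0 < t1 < t2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < T1 < T2 < 1")
    faces: List[Box] = []
    for box, confidence in candidates:
        if confidence >= t2:
            # C >= T2: accepted directly by the face detection model.
            faces.append(box)
        elif confidence >= t1:
            # T1 <= C < T2: undetermined, defer to the second stage.
            if second_stage(box):
                faces.append(box)
        # C < T1: rejected, nothing is output.
    return faces
```

Only the candidates in the middle band reach the second stage, so the cost of super-resolution reconstruction is paid for the uncertain regions alone.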

The face detection model can be chosen as needed. In this embodiment an R-FCN network is selected as the face detection model and is improved for low-resolution face detection in order to raise detection performance. R-FCN is derived from the traditional Faster R-CNN structure. Its core design idea is to build on the RPN (Region Proposal Network) proposed in Faster R-CNN, introduce position-sensitive information, move the ROI layer toward the end of the network, and use position-sensitive feature maps to compute the probability that an entity in the image to be detected belongs to each class, which greatly increases detection speed while maintaining high localization accuracy. Fig. 2 is a structural diagram of the R-FCN network. As shown in Fig. 2, the workflow of R-FCN can be summarized as follows:

The image is input into a pre-trained classification network (Fig. 2 uses the layers of the ResNet-101 network before Conv4), whose network parameters are kept fixed. Three branches are attached to the feature map obtained from the last convolutional layer of the pre-trained network:

The 1st branch performs the RPN operation on this feature map to obtain the candidate regions (ROIs). Specifically, anchor boxes (anchors) are generated on the feature map according to preset parameters; the anchor boxes are a set of regions of different sizes and aspect ratios covering the whole input image. The anchor boxes containing foreground are then identified and converted into target bounding boxes by the bounding-box regression algorithm, so that they fit the enclosed foreground objects more closely.

The 2nd branch obtains a K*K*(C+1)-dimensional position-sensitive score map on this feature map, used for classification.

The 3rd branch obtains a 4*K*K-dimensional position-sensitive score map on this feature map, used for regression.

Finally, position-sensitive ROI pooling (Position-Sensitive RoI Pooling; average pooling is used here) is performed on the K*K*(C+1)-dimensional and the 4*K*K-dimensional position-sensitive score maps respectively to obtain the confidence and position information of each candidate region, and the corresponding class is then determined from the confidence.

In this embodiment the anchor box generation parameters are improved first. The traditional R-FCN network generates anchor boxes with three scales and three aspect ratios; by default the three scales are {128*128, 256*256, 512*512} and the three aspect ratios are {1:1, 1:2, 2:1}, giving 9 sizes. When the detection target is a small face, small face regions are easily missed. In this embodiment the anchor box scales are therefore changed to the five scales {16*16, 32*32, 128*128, 256*256, 512*512}, and each scale again generates anchor boxes with the three aspect ratios {1:1, 1:2, 2:1}, for a total of 15 sizes. The two added small scales are used to detect small faces, and the three retained scales are used to extract face regions of conventional size.
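The 15 anchor sizes described above can be enumerated with a short sketch. Two points are assumptions made for the example rather than statements of this embodiment: each aspect-ratio variant is taken to preserve the area of its square scale (the usual Faster R-CNN convention), and a ratio a:b is read as width:height.

```python
from itertools import product
from math import sqrt

# Five scales (side of the square anchor) and three aspect ratios from the text.
SCALES = [16, 32, 128, 256, 512]
RATIOS = [(1, 1), (1, 2), (2, 1)]  # read here as width:height (assumption)


def anchor_shapes(scales=SCALES, ratios=RATIOS):
    """Return (width, height) for every scale/ratio pair, keeping area = scale^2."""
    shapes = []
    for scale, (rw, rh) in product(scales, ratios):
        area = float(scale * scale)
        width = sqrt(area * rw / rh)
        height = area / width
        shapes.append((width, height))
    return shapes


shapes = anchor_shapes()
print(len(shapes))  # 5 scales x 3 ratios
```

Replacing `SCALES` with the three default scales gives the 9 sizes of the traditional configuration.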

For the bounding-box regression algorithm, the prior art mostly uses the NMS (Non-Maximum Suppression) algorithm, whose core idea is to find local maxima and suppress non-maxima. It works iteratively: the anchor box with the highest confidence is repeatedly compared with the other anchor boxes by computing the intersection over union (IoU, the overlap rate between a candidate box and the calibrated box), and boxes with a large IoU are filtered out. However, research shows that the NMS algorithm has the following problems:

1) The NMS algorithm forcibly sets to 0 the confidence of neighbouring candidate boxes that overlap, i.e. it simply deletes every candidate box whose IoU exceeds the threshold. If a true target to be detected lies in the overlapping region, detection of that target will very probably fail, which raises the miss rate and lowers the average recall.

2) When bounding-box regression is performed with the NMS algorithm, the optimal value of the IoU decision threshold N_t is hard to determine: setting it too large increases the false detection rate, and setting it too small increases the miss rate.
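For reference, the conventional NMS procedure criticized above can be sketched as follows. The box layout (x1, y1, x2, y2), the function names and the default threshold 0.5 are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, n_t=0.5):
    """Conventional NMS: return the indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Every remaining box whose IoU with the best box exceeds N_t is
        # deleted outright, which is the behaviour described in problem 1).
        order = [i for i in order if iou(boxes[best], boxes[i]) <= n_t]
    return keep
```

In this sketch a second face that overlaps a higher-scoring box by more than N_t is discarded regardless of its own confidence, which is how heavily overlapping faces come to be missed.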

To solve the above problems, this embodiment improves the bounding-box regression algorithm on the basis of the NMS algorithm. Fig. 3 is a flow chart of the improved bounding-box regression algorithm in this embodiment. As shown in Fig. 3, the improved algorithm comprises the following specific steps:

S301: Initialize data:

Denote the set of anchor boxes containing background as B={b₁,b₂,…,b_N}, where b_n denotes the n-th anchor box, n=1,2,…,N, and N denotes the number of anchor boxes containing background; denote the confidence of each anchor box as s_n. Initialize the retained anchor box set D as the empty set.

S302: Select the current optimal anchor box:

Select the anchor box with the highest confidence from the current anchor box set B and denote it as the current optimal anchor box b′. Add b′ to the retained anchor box set D and delete it from the anchor box set B.

S303: Judge whether the anchor box set B is empty. If so, the bounding-box regression ends; otherwise go to step S304.

S304: Update the confidences:

For each anchor box b_n in the current anchor box set B, compute its intersection over union iou(b′, b_n) with the current optimal anchor box b′, and then update the confidence s_n of each anchor box b_n with the following formula:

where N_t is the preset IoU threshold.

Then return to step S302.
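The loop of steps S301 to S304 can be sketched as below. The confidence update rule itself is not reproduced in this text, so the sketch plugs in a linear decay, s_n ← s_n·(1 − IoU) when IoU ≥ N_t, purely as an assumed stand-in; the function names, the box layout (x1, y1, x2, y2) and the default N_t are likewise hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def linear_decay(score, overlap, n_t):
    """Assumed update rule (not taken from the text): decay instead of delete."""
    return score * (1.0 - overlap) if overlap >= n_t else score


def soft_suppress(boxes, scores, n_t=0.3, update=linear_decay):
    """Run S301-S304 and return (box, updated confidence) in selection order."""
    remaining = list(range(len(boxes)))   # S301: set B (as indices)
    conf = list(scores)
    retained = []                         # S301: set D starts empty
    while remaining:                      # S303: stop when B is empty
        best = max(remaining, key=lambda n: conf[n])   # S302
        retained.append(best)
        remaining.remove(best)
        for n in remaining:               # S304: update confidences
            conf[n] = update(conf[n], iou(boxes[best], boxes[n]), n_t)
    return [(boxes[n], conf[n]) for n in retained]
```

Because overlapping boxes are down-weighted rather than removed, a final confidence threshold applied to the returned list decides which boxes are output; any other update rule can be passed through the `update` parameter.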

For the super-resolution reconstruction model based on a GAN network, this embodiment uses the SRGAN network. SRGAN is a widely used, high-performing super-resolution image reconstruction model trained on the basis of a GAN (Generative Adversarial Network). It consists of a generator G and a discriminator D. Fig. 4 is a structural diagram of the generator in the SRGAN network, and Fig. 5 is a structural diagram of the discriminator. The core of the generator is its multiple residual blocks; each residual block contains two 3*3 convolutional layers, each followed by a batch normalization (BN) layer, with PReLU as the activation function, and two 2× sub-pixel convolution layers are used to increase the feature size. The discriminator D uses a network structure similar to VGG19 but without max pooling. It contains 8 convolutional layers; as the network deepens, the number of features keeps increasing while the feature size keeps decreasing. LeakyReLU is used as the activation function, and two fully connected layers followed by a final sigmoid activation function yield the probability that the input is a real sample.

The existing SRGAN network is difficult to train and suffers from the distribution-overlap problem. Research shows that these problems stem from the use of the KL divergence and the JS divergence in the traditional SRGAN network as the measure of the distance between the real sample distribution and the generated sample distribution. After study, this embodiment uses the EM divergence to solve the above problems. The EM divergence is a symmetric divergence, defined as follows:

Let Ω ⊂ Rⁿ be a bounded, connected open set and S the set of all Radon probability distributions on Ω. For some p ≠ 1 and k > 0, the EM divergence is computed as follows:

$$W_{k,p}(P_r,P_g)=\inf_{f\in C_c^1(\Omega)}\ \mathbb{E}_{x\sim P_r}\left[f(x)\right]-\mathbb{E}_{\tilde{x}\sim P_g}\left[f(\tilde{x})\right]+k\,\mathbb{E}_{\hat{x}\sim P_u}\left[\lVert\nabla f(\hat{x})\rVert^{p}\right]$$

where P_r and P_g denote two different probability distributions, inf denotes the infimum, x denotes a sample obeying the distribution P_r, x̃ denotes a sample obeying the distribution P_g, x̂ denotes a random linear combination of the samples x and x̃, P_u denotes the probability distribution of the samples x̂, k and p each denote a constant, C_c^1(Ω) is the function space of all first-order differentiable functions with compact support on Ω, and ||·|| denotes the norm.

The advantage of the EM divergence is that, for two different distributions, it still reflects the distance between them even when they have no overlap. This means that meaningful gradients are provided throughout training, so the whole SRGAN network can be trained stably, which effectively solves problems such as mode collapse caused by vanishing gradients that may occur when training the original SRGAN network. In this embodiment, the objective function used in model training is improved on the basis of the EM divergence. The optimization objective function of the minimax problem of the SRGAN network improved with the EM divergence is:

where x denotes a true high-resolution sample, z denotes a low-resolution sample input to the generator G, G(z) is the super-resolution reconstructed sample generated by the generator G, P_g denotes the probability distribution of the super-resolution reconstructed samples, P_r denotes the probability distribution of the true high-resolution samples, D(x) and D(G(z)) denote the probabilities, as judged by the discriminator D, that the high-resolution sample and the super-resolution reconstructed sample respectively are real samples, E[·] denotes the mathematical expectation, x̂ denotes a random linear combination of the true high-resolution sample x and the super-resolution reconstructed sample G(z), P_u denotes the probability distribution of the samples x̂, and k and p each denote a constant.
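The random linear combination of a true high-resolution sample x and a reconstructed sample G(z) mentioned above can be illustrated with a small sketch. Drawing a single coefficient uniformly from [0, 1] per sample pair is an assumption made for the example; the text only states that the combination is random. Samples are represented as flat lists of pixel values, and the function name is hypothetical.

```python
import random


def random_combination(x, g_z, rng=random):
    """Return eps * x + (1 - eps) * G(z) for one random eps in [0, 1)."""
    if len(x) != len(g_z):
        raise ValueError("both samples must have the same number of pixels")
    eps = rng.random()  # assumed: uniform coefficient shared by all pixels
    return [eps * a + (1.0 - eps) * b for a, b in zip(x, g_z)]
```

Every combined sample lies on the line segment between the two inputs, so the distribution P_u of such samples covers the region between the real and the generated distributions.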

During training, the above optimization objective function is decomposed into two optimization problems:

1. Optimization of the discriminator D:

2. Optimization of the generator G:

Based on the above derivation, the invention improves the training method of the SRGAN model to obtain a better-performing SRGAN model and thereby improve the quality of the super-resolution face image results. The specific training method is as follows:

First, a number of high-resolution face images I^HR are obtained, and the corresponding low-resolution face images I^LR are obtained by down-sampling. Each high-resolution face image I^HR and its corresponding low-resolution face image I^LR form one training sample, giving the training sample set. In this embodiment, down-sampling is performed with a Gaussian pyramid: the original image is taken as the bottom image G0 (layer 0 of the Gaussian pyramid) and convolved with a 5*5 Gaussian kernel, the convolved image is then down-sampled (even-numbered rows and columns are removed) to obtain the next layer image G1, and this is iterated until 4× down-sampling is completed.
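The Gaussian pyramid down-sampling described above can be sketched in plain Python for a single-channel image stored as a list of rows. Several details are assumptions made for the example: the 5*5 kernel is the binomial approximation built from [1, 4, 6, 4, 1] and normalized by 256, borders are handled by clamping coordinates, rows and columns are counted from 1 so that removing the even-numbered ones keeps indices 0, 2, 4, …, and "4×" is read as a factor of 4 per side, i.e. two pyramid levels.

```python
KERNEL_1D = [1, 4, 6, 4, 1]  # outer product / 256 gives the assumed 5*5 kernel


def blur5(img):
    """Convolve a 2-D list image with the 5*5 kernel, clamping at the borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy, ky in enumerate(KERNEL_1D):
                for dx, kx in enumerate(KERNEL_1D):
                    acc += ky * kx * px(y + dy - 2, x + dx - 2)
            row.append(acc / 256.0)
        out.append(row)
    return out


def pyr_down(img):
    """One pyramid level: blur, then drop every second row and column."""
    blurred = blur5(img)
    return [row[::2] for row in blurred[::2]]


def downsample_4x(img):
    """Two pyramid levels (G0 -> G1 -> G2), i.e. 4x per side."""
    return pyr_down(pyr_down(img))
```

Applied to an 8*8 image, `downsample_4x` returns a 2*2 image; pairing each I^HR with `downsample_4x(I^HR)` yields one training sample in the sense of this embodiment.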

The SRGAN network is then trained with the obtained training sample set. During training, the optimization objective function of the generator G is:

The optimization objective function of the discriminator D is:

where x denotes a true high-resolution face image, z denotes a low-resolution face image input to the generator G, G(z) is the super-resolution reconstructed face image generated by the generator G, P_g denotes the probability distribution of the super-resolution reconstructed face images, P_r denotes the probability distribution of the true high-resolution face images, D(x) and D(G(z)) denote the probabilities, as judged by the discriminator D, that the high-resolution face image and the super-resolution reconstructed face image respectively are real face images, E[·] denotes the mathematical expectation, x̂ denotes a random linear combination of the true high-resolution face image x and the super-resolution reconstructed face image G(z), P_u denotes the probability distribution of the samples x̂, and k and p each denote a constant.

In the training of the SRGAN network, the generator G first performs super-resolution reconstruction on the low-resolution face image I^LR in each training sample X. Specifically, the generator G up-samples the low-resolution face image I^LR in the training sample X to obtain the super-resolution reconstructed face image I^SR. Since in this embodiment the low-resolution face image I^LR is obtained by 4× down-sampling of the high-resolution face image I^HR, the up-sampling factor used to generate the super-resolution reconstructed face image I^SR is also 4.

Then the high-resolution face image I^HR corresponding to the low-resolution face image I^LR and the super-resolution reconstructed face image I^SR generated by the generator G are input into the discriminator D, and the loss function L_SR of the training sample is computed by the following formula:

where the content loss function of the training sample is computed as follows:

where the content loss function based on the mean squared error is computed as follows:

where W denotes the width of the high-resolution face image I^HR, H denotes the height of the high-resolution face image I^HR, r denotes the down-sampling factor, and the two pixel terms denote the pixel values at coordinate (x, y) in the high-resolution face image I^HR and in the super-resolution reconstructed face image I^SR respectively.

The VGG loss is computed as follows:

where i denotes the index of a max-pooling layer of the VGG-19 network in the discriminator D, and j denotes the index of a convolutional layer between the i-th and the (i+1)-th max-pooling layers. In the existing VGG-19 network there are 5 max-pooling layers, and the number of convolutional layers between two adjacent max-pooling layers is 2 or 4. φ_i,j denotes the feature map obtained by the j-th convolutional layer after the i-th max-pooling layer of the VGG-19 network in the discriminator D, W_i,j denotes the width of the feature map φ_i,j, and H_i,j denotes its height.

The adversarial loss drives the SRGAN network, by "fooling" the discriminator, toward generating outputs closer to natural images. It is computed as follows:

where the discriminator term denotes the probability, as judged by the discriminator D, that the super-resolution reconstructed face image generated by the generator (i.e. I^SR) is a true high-resolution face image; the subscripts θ_D and θ_G denote the network parameters of the discriminator D and the generator G respectively; w denotes the dimension index of the network parameters, w=1,2,…,W, and W denotes the dimension of the network parameters.

Since in the invention the super-resolution reconstruction model must also detect whether the super-resolution reconstructed image contains a face target, a classification loss L_clc is added when computing the loss function to better meet this need. It is computed as follows:

where {y₁,y₂,…,y_v,…,y_V} denotes the calibration data indicating whether each region of the high-resolution face image I^HR is a face, V denotes the number of calibrated face regions in the high-resolution face image I^HR, and each y_v takes a value in {0,1}.

Since the optimization objective function improved in this embodiment contains no log term, the Adam optimization algorithm is preferably used to optimize the objective functions of the generator G and the discriminator D, which improves training efficiency. For the generator G, the weight w_G of the generator G is updated by gradient descent with the Adam optimization algorithm:

where the gradient term denotes the descent gradient of the weight w_G, z_m denotes the value of the m-th pixel in the super-resolution reconstructed face image I^SR, m=1,2,…,M, M denotes the number of pixels, D(G(z_m)) denotes the probability, as judged by the discriminator D, that the m-th pixel of the super-resolution reconstructed face image I^SR is a pixel of the high-resolution face image I^HR, α denotes the learning rate, β₁ denotes the exponential decay rate of the first-moment estimate, and β₂ denotes the exponential decay rate of the second-moment estimate. Typical values of the three parameters of the Adam optimization algorithm are α=0.00001, β₁=0.9 and β₂=0.999.
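One Adam update with the parameter values quoted above can be sketched as follows. This is the standard Adam rule rather than a formula taken from this text; the small constant `eps`, the function name and the flat-list representation of the weights are assumptions made for the example, and the gradient is supplied by the caller because the loss gradient itself is not reproduced here.

```python
from math import sqrt


def adam_step(w, grad, m, v, t, alpha=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam descent step. t is the 1-based step count; m, v are moment estimates."""
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * gi        # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * gi * gi   # second-moment estimate
        m_hat = mi / (1.0 - beta1 ** t)             # bias correction
        v_hat = vi / (1.0 - beta2 ** t)
        new_w.append(wi - alpha * m_hat / (sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v
```

The same function can serve both networks: it is called with the generator gradient to update w_G and with the discriminator gradient to update w_D, each network keeping its own `m`, `v` and step count.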

The weight w_D of the discriminator D is updated by gradient descent with the Adam optimization algorithm:

where the gradient term denotes the descent gradient of the weight w_D, x_m denotes the value of the m-th pixel of the high-resolution face image I^HR, D(x_m) denotes the probability, as judged by the discriminator D, that the m-th pixel of the high-resolution face image I^HR is a pixel of the high-resolution face image I^HR, and μ_m=m/M; the remaining symbols denote the corresponding descent gradient and the probability, as judged by the discriminator D, of being a pixel of the high-resolution face image I^HR.

In this embodiment, the weight w_G of the generator G and the weight w_D of the discriminator D are preferably updated alternately: the parameters of the generator G are first fixed and the parameters of the discriminator D are updated, then the parameters of the discriminator D are fixed and the parameters of the generator G are updated, and so on in alternation.

To better illustrate the technical effect of the invention, the invention is verified experimentally on a set of low-resolution face images. In this experimental verification, the face detection model is the improved R-FCN model of this embodiment, with improved anchor box generation parameters and the improved bounding-box regression algorithm, and the GAN-based super-resolution reconstruction model is the SRGAN model obtained with the improved training method of this embodiment. The face detection model and the GAN-based super-resolution reconstruction model are trained on the Wider Face training sample set, and 10 images are randomly selected from each of its 61 classes, 610 images in total, as the detection images. For comparison, the SFD face detection method and the R-FCN face detection method are chosen as the comparison methods in this experimental verification.

To assess the technical effect of the face detection method of the invention and of the comparison methods, the PR curve was chosen as the evaluation criterion. A PR curve plots precision on the vertical axis against recall on the horizontal axis.
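For readers unfamiliar with the metric, the sketch below shows one common way to trace a PR curve from scored detections and to summarize it as an average precision. The detection scores, the labels, and the rectangle-sum rule for the area are illustrative assumptions; the text does not state which AP computation was used in the experiment.

```python
def pr_curve(scores, labels, num_faces):
    """Return (recall, precision) points, sweeping the score threshold downward.

    scores: detection confidences; labels: 1 if the detection matches a real
    face, 0 if it is a false alarm; num_faces: number of ground-truth faces.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points, tp, fp = [], 0, 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_faces, tp / (tp + fp)))
    return points

def average_precision(points):
    """Area under the PR curve by rectangle summation over recall increments."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Four made-up detections against three ground-truth faces.
points = pr_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_faces=3)
ap = average_precision(points)
```

A curve that stays closer to the upper right corner keeps precision high as recall grows, which yields a larger area and therefore a larger AP.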

Fig. 6 is the PR curve graph of the three methods in this experimental verification. As shown in Fig. 6, among the three face detection methods the PR curve of the invention lies closest overall to the upper right corner, and its mAP (mean average precision, i.e., the mean of the AP (average precision) values) is 0.947, the best of the three sets of data.

Fig. 7 is an example detection result of the SFD face detection method in this experimental verification. Fig. 8 is an example detection result of the R-FCN face detection method in this experimental verification. Fig. 9 is an example detection result of the invention in this experimental verification. Comparing Fig. 7 to Fig. 9 shows that the invention detects 14 faces in total, more than the 11 and 9 faces detected by the other two methods, demonstrating better detection performance.

Face detection was next carried out on image samples of different clarity. The Wider Face training sample set annotates the blur attribute of each face target at three levels: clear, normally blurred, and heavily blurred. Several samples were accordingly drawn from the image samples at each blur level to form detection sample sets. Fig. 10 is the PR curve graph of the three methods performing face detection on the clear detection sample set in this experimental verification. Fig. 11 is the PR curve graph of the three methods performing face detection on the normally blurred detection sample set in this experimental verification. Fig. 12 is the PR curve graph of the three methods performing face detection on the heavily blurred detection sample set in this experimental verification.

As shown in Fig. 10 to Fig. 12, when sample clarity is high, all three methods detect faces well, the gap between them is small, and the mAP values are all very high. In the test group with normal blur, the mAP values of all three algorithms drop slightly but remain above 97%, which shows that under normal blur all three methods retain very good detection capability and are not seriously challenged. At the same time, at normal blur the invention already shows some advantage over SFD and R-FCN, although the advantage is not pronounced. When the detection samples are heavily blurred, a gap between the three methods begins to appear: SFD performs worst, with its mAP dropping by about 10 percentage points compared with the clear detection samples, whereas the invention shows the smallest drop, only about 5 percentage points. In this case the mAP of the invention is about 2 percentage points higher than that of the original R-FCN model, and its PR curve clearly encloses the PR curves of the two comparison methods. Therefore, relative to the other two methods, the invention has better stability and a higher detection rate at low resolution.

Although illustrative specific embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of the specific embodiments. To those of ordinary skill in the art, as long as the various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, these changes are obvious, and all inventions and creations that make use of the inventive concept fall within the scope of protection.

Claims

1. A face detection method based on hierarchical detection, characterized by comprising the following steps:

S1: obtain a number of face image training samples, each training sample comprising an image containing faces and the face target information, and train the face detection model using these face image training samples;

S2: obtain a number of super-resolution face image training samples, each training sample comprising a low-resolution image containing a face and the corresponding high-resolution image, and use the super-resolution face image training samples to train the GAN-based super-resolution reconstruction model, which comprises a generator G and a discriminator D;

S3: input the face image to be detected into the face detection model to obtain the coordinate information of each face target candidate region and the confidence value C that the candidate region is a face; preset confidence thresholds T₁ and T₂ with 0 < T₁ < T₂ < 1; for each candidate region, if the corresponding confidence value C ≥ T₂, determine that a face target exists in the candidate region and output it as a face target region; if T₁ ≤ C < T₂, treat the candidate region as a face target to be determined; otherwise determine that no face target exists in the candidate region and do not output it;

S4: input each face target to be determined into the generator G of the GAN-based super-resolution reconstruction model to generate a super-resolution reconstructed image SR, then input the image SR into the discriminator D, which judges whether it is a qualified super-resolution reconstructed image and whether it contains a face target; if the image SR is both a qualified super-resolution reconstructed image and contains a face target, determine that a face target exists in the corresponding candidate region and output it as a face target region; otherwise determine that no face target exists.
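The two-threshold decision of steps S3 and S4 can be sketched as follows. This is a minimal illustration under stated assumptions: the `second_stage` callable is a hypothetical stand-in for the generator G followed by the discriminator D, and the threshold values and candidate boxes are made up for the example.

```python
def hierarchical_detect(candidates, t1, t2, second_stage):
    """Return the candidate boxes accepted as faces.

    candidates: list of (box, confidence) from the face detection model.
    second_stage: callable(box) -> bool, standing in for generator G followed
    by discriminator D (hypothetical interface).
    """
    if not 0.0 < t1 < t2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < T1 < T2 < 1")
    faces = []
    for box, conf in candidates:
        if conf >= t2:
            faces.append(box)          # confident: output directly
        elif conf >= t1:
            if second_stage(box):      # undetermined: ask the GAN stage
                faces.append(box)
        # conf < t1: rejected, nothing is output
    return faces

candidates = [((0, 0, 40, 40), 0.95), ((50, 10, 70, 30), 0.55),
              ((5, 60, 15, 70), 0.45), ((80, 80, 90, 90), 0.10)]
# Toy second stage: accept only boxes at least 15 pixels wide (illustrative).
accepted = hierarchical_detect(candidates, t1=0.3, t2=0.8,
                               second_stage=lambda b: b[2] - b[0] >= 15)
```

Only candidates in the band between T₁ and T₂ reach the second stage, so the comparatively expensive super-resolution step is run on the uncertain detections alone.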

2. The face detection method according to claim 1, characterized in that the face detection model uses an R-FCN network.

3. The face detection method according to claim 2, characterized in that the anchor boxes in the R-FCN network are generated at five scales {16*16, 32*32, 128*128, 256*256, 512*512} and three aspect ratios {1:1, 1:2, 2:1}.
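The scale and ratio sets above give 5 × 3 = 15 anchor shapes per location. The sketch below enumerates them under two assumptions that the claim does not spell out: each ratio is read as width:height, and each anchor keeps the area of its square scale.

```python
import math

SCALES = [16, 32, 128, 256, 512]      # side lengths of the five scales
RATIOS = [(1, 1), (1, 2), (2, 1)]     # the three aspect ratios

def anchor_shapes(scales=SCALES, ratios=RATIOS):
    """Return (width, height) for every scale/ratio pair.

    Assumption: a ratio is width:height and the anchor area stays scale*scale.
    """
    shapes = []
    for s in scales:
        for rw, rh in ratios:
            w = s * math.sqrt(rw / rh)
            h = s * math.sqrt(rh / rw)
            shapes.append((w, h))
    return shapes

shapes = anchor_shapes()
```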

4. The face detection method according to claim 2, characterized in that the bounding box regression algorithm in the R-FCN network comprises the following specific steps:

1) Denote the set of anchor boxes containing background as B={b₁,b₂,…,b_N}, where b_n denotes the n-th anchor box, n=1,2,…,N, and N is the number of anchor boxes containing background; denote the confidence of each anchor box as s_n. Initialize the retained anchor box set D as empty;

2) Select the anchor box with the highest confidence from the current anchor box set B and denote it as the current optimal anchor box b'; add the current optimal anchor box b' to the retained anchor box set D and delete it from the anchor box set B;

3) Determine whether the anchor box set B is empty; if so, the bounding box regression ends; otherwise proceed to step 4);

4) For each anchor box b_n in the current anchor box set B, calculate its intersection over union iou(b', b_n) with the current optimal anchor box b', then update the confidence s_n of each anchor box b_n using the following formula:

where N_t is a preset intersection-over-union threshold;

Then return to step 2).
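Steps 1) to 4) can be sketched as the loop below. The confidence update formula is not reproduced in this text, so the sketch substitutes an assumed linear decay (multiply s_n by 1 − iou when the overlap reaches N_t, as in linear Soft-NMS); the function names, the box format (x1, y1, x2, y2), and the value N_t = 0.3 are likewise illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def rescore_boxes(boxes, scores, n_t=0.3):
    """Repeatedly keep the best box and decay the confidence of the rest."""
    remaining = list(zip(boxes, scores))   # set B with confidences s_n
    kept = []                              # retained set D
    while remaining:                       # step 3): stop when B is empty
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)   # step 2)
        kept.append((best_box, best_score))
        rescored = []
        for box, score in remaining:       # step 4)
            overlap = iou(best_box, box)
            if overlap >= n_t:
                score *= 1.0 - overlap     # assumed linear decay
            rescored.append((box, score))
        remaining = rescored
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = rescore_boxes(boxes, [0.9, 0.8, 0.7])
```

In this example the second box overlaps the best box heavily, so its confidence is reduced instead of the box being discarded outright, and the isolated third box overtakes it in the retained order.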

5. The face detection method according to claim 1, characterized in that the GAN-based super-resolution reconstruction model uses an SRGAN network.

6. The face detection method according to claim 5, characterized in that the SRGAN network is trained by the following method:

First obtain a number of high-resolution face images I^HR and obtain the corresponding low-resolution face images I^LR by down-sampling; each high-resolution face image I^HR and its corresponding low-resolution face image I^LR constitute one training sample, which yields the training sample set;
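The construction of training pairs can be sketched as follows. The down-sampling method and factor are not specified in the text, so average pooling by a factor of 4 on a grayscale image stored as a list of rows is an assumption made purely for illustration, as are the function names.

```python
def downsample(image, factor=4):
    """Average-pool a grayscale image (list of rows) by an integer factor."""
    h, w = len(image), len(image[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    low = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        low.append(row)
    return low

def make_training_pairs(hr_images, factor=4):
    """Pair each high-resolution image with its down-sampled counterpart."""
    return [(downsample(hr, factor), hr) for hr in hr_images]

# A synthetic 8x8 "image" whose pixel value is its row-major index.
hr = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pairs = make_training_pairs([hr], factor=4)
```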

Then train the SRGAN network using the resulting training sample set; during training, the optimization objective function of the generator G is:

The optimization objective function of the discriminator D is:

where x denotes a real high-resolution face image; z denotes the low-resolution face image input to the generator G; G(z) is the super-resolution reconstructed face image produced by the generator G; P_g denotes the probability distribution of super-resolution reconstructed face images; P_r denotes the probability distribution of real high-resolution face images; D(x) and D(G(z)) denote the probabilities, as judged by the discriminator D, that the high-resolution face image and the super-resolution reconstructed face image, respectively, are real face images; E[·] denotes the mathematical expectation; the interpolated image denotes a random linear combination of the real high-resolution face image x and the super-resolution reconstructed face image G(z); and k and p each denote a constant.
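For orientation only, the symbols defined above match those of the widely used WGAN with gradient penalty (WGAN-GP) formulation. A common form of that objective is given below; it may differ from the formulas of this claim, which are not reproduced in this text. Reading k as the penalty weight and p as the exponent (WGAN-GP uses p = 2), and the symbols \hat{x} and \epsilon, are assumptions.

```latex
\begin{aligned}
L_G &= -\,\mathbb{E}_{z}\big[D(G(z))\big],\\
L_D &= \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big]
      - \mathbb{E}_{x\sim P_r}\big[D(x)\big]
      + k\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2 - 1\big)^{p}\Big],\\
\hat{x} &= \epsilon\,x + (1-\epsilon)\,G(z),\qquad \epsilon\sim U[0,1].
\end{aligned}
```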

7. The face detection method according to claim 6, characterized in that, during training of the SRGAN network, the loss function L_SR of a training sample is calculated according to the following formula:

where the terms are the content loss function of the training sample, the adversarial loss, and the classification loss L_clc.

8. The face detection method according to claim 6, characterized in that, during training of the SRGAN network, the Adam optimization algorithm is used to optimize the objective functions of the generator G and the discriminator D, specifically as follows:

The weight w_G of the generator G is updated by descent using the Adam optimization algorithm:

where the gradient term denotes the descent gradient of the weight w_G; z_m denotes the value of the m-th pixel in the super-resolution reconstructed face image I^SR, m=1,2,…,M, where M is the number of pixels; D(G(z_m)) denotes the probability, as judged by the discriminator D, that the m-th pixel of the super-resolution reconstructed face image I^SR is a pixel of the high-resolution face image I^HR; α denotes the learning rate; β₁ denotes the exponential decay rate of the first-moment estimate; and β₂ denotes the exponential decay rate of the second-moment estimate;

where the gradient term denotes the descent gradient of the weight w_D; x_m denotes the value of the m-th pixel of the high-resolution face image I^HR; D(x_m) denotes the probability, as judged by the discriminator D, that the m-th pixel of the high-resolution face image I^HR is a pixel of the high-resolution face image I^HR; the penalty term involves the descent gradient with respect to the interpolated sample, with μ_m=m/M; and the remaining term denotes the probability, as judged by the discriminator D, that the interpolated sample is a pixel of the high-resolution face image I^HR.

9. The super-resolution face image method according to claim 8, characterized in that, during the objective function optimization of the generator G and the discriminator D, the weight w_G of the generator G and the weight w_D of the discriminator D are updated alternately.