CN108334847B - Deep-learning-based face recognition method for real-world scenes - Google Patents

Deep-learning-based face recognition method for real-world scenes

Info

Publication number
CN108334847B
Authority
CN
China
Prior art keywords
face
network
image
resolution
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810119263.2A
Other languages
Chinese (zh)
Other versions
CN108334847A (en)
Inventor
张永强
丁明理
白延成
李贤
杨光磊
董娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201810119263.2A priority Critical patent/CN108334847B/en
Publication of CN108334847A publication Critical patent/CN108334847A/en
Application granted granted Critical
Publication of CN108334847B publication Critical patent/CN108334847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4076Super resolution, i.e. output image resolution higher than sensor resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The present invention provides a deep-learning-based face recognition method for real-world scenes. Existing face recognition methods for real scenes can each mitigate only a single influencing factor; the proposed method addresses multiple factors such as pose and illumination simultaneously. The method comprises: using an existing face detector to predict the face locations in each image of a training database, and cropping and saving true face and non-face images; down-sampling the face and non-face images to obtain corresponding low-resolution images; building a generative adversarial network comprising a generator and a discriminator, the generator further comprising an up-sampling network and a refinement network; training the generative adversarial network with the high-resolution face and non-face images and the corresponding low-resolution face and non-face images; and marking the face locations in the input image according to the discriminator's score for each face candidate region produced by the existing face detector. The present invention is suitable for face recognition and detection.

Description

Deep-learning-based face recognition method for real-world scenes
Technical field
The present invention relates to the field of face recognition, and in particular to a deep-learning-based face recognition method for real-world scenes.
Background art
With the development of applications such as e-commerce, face recognition has become one of the most promising means of biometric identity verification. These applications require automatic face recognition systems to handle facial images captured in real-world scenes, and the series of problems this raises led researchers to treat face detection as an independent research topic from the outset. In addition, face recognition in real-world scenes has urgent application demands in fields such as security, criminal investigation, and search and rescue.
Face detection has important basic research value in machine vision as well as pressing application demands, and the related techniques continue to evolve. However, most current face detection methods are developed on posed images captured under ideal laboratory conditions. Such posed images share two characteristics: first, the face is relatively large and located at the center of the image; second, the background is clean and simple. In real-world images, by contrast, faces are usually very small and the background is complex, and the images are further affected by factors such as scale, pose, occlusion, expression, make-up, and illumination. Studies have shown that these factors pose a great challenge to face recognition in real scenes and severely degrade detection accuracy. Several face recognition methods for real scenes have been proposed in succession, but each of them can only mitigate a single influencing factor: for example, face recognition based on generative adversarial networks can mitigate the influence of scale but cannot handle other factors such as pose and illumination, while face recognition based on face alignment can mitigate the influence of pose but cannot handle factors such as scale and blur.
Summary of the invention
The purpose of the present invention is to overcome the shortcoming that existing face recognition methods for real scenes can only mitigate a single influencing factor — methods that cannot handle other factors such as pose and illumination, or face-alignment-based methods that mitigate pose but cannot handle factors such as scale and blur — and to propose a deep-learning-based face recognition method for real-world scenes, comprising:
Step 1: establish a training database;
Step 2: use a face detector to predict the face location in each image of the training database, crop the images to obtain first high-resolution face images and first high-resolution non-face images, and process the first high-resolution face images and first high-resolution non-face images to obtain low-resolution face images and low-resolution non-face images;
Step 3: build a generative adversarial network comprising a generator and a discriminator. The input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2, and its output is second high-resolution face images and second high-resolution non-face images. The input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image. The generator further comprises an up-sampling network and a refinement network connected in sequence: the up-sampling network is the input end of the generator, its output serves as the input of the refinement network, and the refinement network is the output end of the generator;
Step 4: train the generative adversarial network with the first high-resolution face images, the first high-resolution non-face images, the low-resolution face images, and the low-resolution non-face images obtained in step 2;
Step 5: input the image to be tested into the face detector to obtain face candidate regions, input the face candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that it is a face, and draw in the input image the regions for which p is greater than a preset threshold.
The beneficial effects of the invention are as follows:
1. The method mitigates the influence of scale, occlusion, expression, make-up, illumination and other factors simultaneously, and realizes face detection in real-world scenes, so that the detection target is no longer limited to a single face against a simple background; in particular, the face detection technique is no longer applicable only to posed images captured in laboratory settings;
2. The method overcomes the low recognition rate of current face detection techniques in real scenes, and promotes the application of deep-learning-based face detection in real-world scenes;
3. The invention introduces a refinement network, which solves the problem that the high-resolution images produced by the generator of a generative adversarial network are blurry, and also overcomes the difficulty that the generated images are affected by the many influencing factors mentioned above;
4. Compared with the baseline face detector, the recognition rate of the proposed deep-learning-based face recognition method for real-world scenes is greatly improved and reaches the current state-of-the-art detection results of 94.4%/93.3%/87.3% (Easy/Medium/Hard).
Description of the drawings
Fig. 1 is an experimental data plot of the effect of each influencing factor on face detection accuracy, where the highest point of each vertical line indicates the accuracy achievable by an embodiment of the present invention, the lowest point indicates the accuracy achievable by conventional methods, and the middle curve indicates the average accuracy achievable by an embodiment of the present invention and the prior art when processing images affected by scale;
Fig. 2 is a schematic diagram of a generative adversarial network in the prior art;
Fig. 3 is a schematic diagram of the deep-learning-based face recognition method for real-world scenes according to an embodiment of the invention, where "1st Branch" refers to the first branch of the face detector and "Kth Branch" refers to the K-th branch of the face detector; "Input" refers to the input; "Conv", "Conv1" ... "Conv5" are convolutional layers with different indices; "Residual Blocks" are the residual convolution blocks of the ResNet network used in an embodiment of the invention; "De-Conv" is a deconvolution layer; "Sigmoid" is the activation function; "LR" is the low-resolution image; "SR" is the high-resolution image generated by the generator from the low-resolution image; "HR" is the real high-resolution image; "Face" is a face image; and "Non-Face" is a non-face image;
Fig. 4 is a flow chart of an embodiment of the invention.
Specific embodiment
Specific embodiment 1: the deep-learning-based face recognition method for real-world scenes of this embodiment, as shown in Fig. 4, comprises:
Step 1 establishes tranining database.Such as can using WIDER FACE database as tranining database, or Picture construction tranining database of the size of facial image between 10 to 30 pixels, sets in this way in WIDER FACE database The benefit set is the more difficult small Face datection problem that can solve face between 10-30 pixel.Present embodiment is also supported User oneself constructs database by acquiring the image of real scene.
Step 2: use the face detector to predict the face location in each image of the training database, crop the images to obtain first high-resolution face images and first high-resolution non-face images, and process the first high-resolution face and non-face images to obtain low-resolution face images and low-resolution non-face images. The face detector may specifically be a deep residual network with the ResNet-50 structure.
Step 2 may further be implemented as follows: use an existing face detector to predict the face location in each image of the training database, obtaining a predetermined number of bounding boxes indicating face locations; crop the images according to the size and position of the bounding boxes to obtain the first high-resolution face images and the first high-resolution non-face images; then down-sample the first high-resolution face and non-face images by a factor of 4 using bilinear interpolation to obtain the low-resolution face images and low-resolution non-face images. "4× down-sampling" can be understood as representing each 4×4 block of pixels by a single pixel, so that, for example, a 32×32-pixel region becomes 8×8 pixels.
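A short sketch of the 4× bilinear down-sampling described above (not from the patent); it uses OpenCV, and the 32×32 patch size is only an example.

```python
# Hedged sketch: 4x bilinear down-sampling of a cropped high-resolution face/non-face patch.
import cv2

def make_low_resolution(hr_patch, factor=4):
    h, w = hr_patch.shape[:2]
    # bilinear interpolation, as stated in step 2
    return cv2.resize(hr_patch, (w // factor, h // factor), interpolation=cv2.INTER_LINEAR)

# Usage: a 32x32 crop becomes an 8x8 low-resolution input for the generator.
```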
Because the images in the training database of step 1 are generally high-resolution, while the following steps require the generator to produce high-resolution images from low-resolution inputs, step 2 processes the high-resolution face and non-face images obtained directly from the database into low-resolution images.
Assume the face location detected by the face detector is a rectangular box; the box can then be represented by a 4-tuple consisting of the horizontal and vertical coordinates of its upper-left corner and lower-right corner, which simultaneously encodes the specific position and the size of the box containing the face. Those skilled in the art will appreciate that the face location may also be expressed in other ways, as long as the size and specific position of the box can be expressed; the present invention imposes no restriction on this.
Step 3: build a generative adversarial network comprising a generator and a discriminator. The input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2, and its output is second high-resolution face images and second high-resolution non-face images. The input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image. The generator further comprises an up-sampling network and a refinement network connected in sequence: the up-sampling network is the input end of the generator, its output serves as the input of the refinement network, and the refinement network is the output end of the generator.
The structure of the up-sampling network is: one convolutional layer with 64 kernels of size 3 and stride 1; eight convolutional layers with 64 kernels of size 3 and stride 1; one convolutional layer with 64 kernels of size 3 and stride 1; one deconvolution layer with 256 kernels of size 3 and stride 2; one deconvolution layer with 256 kernels of size 3 and stride 3; and one convolutional layer with 3 kernels of size 1 and stride 1.
The structure of the refinement network is: one convolutional layer with 64 kernels of size 3 and stride 1; eight convolutional layers with 64 kernels of size 3 and stride 1; one convolutional layer with 64 kernels of size 3 and stride 1; one deconvolution layer with 256 kernels of size 3 and stride 2; one deconvolution layer with 256 kernels of size 3 and stride 3; and one convolutional layer with 3 kernels of size 3 and stride 1.
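A minimal PyTorch sketch of the generator described above, not taken from the patent. Layer counts and channel widths follow the text; activations, padding, batch normalization, and the residual form of the eight repeated blocks are assumptions. The stride-2/stride-3 deconvolutions listed above are replaced here by two stride-2 deconvolutions to match the stated 4× up-sampling factor, and the refinement branch is kept at stride 1 throughout, as the later description of Table 1 states.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """One of the eight repeated 64-channel, 3x3, stride-1 convolution blocks (residual form assumed)."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Up-sampling network: 4x super-resolution (two stride-2 deconvolutions assumed).
        self.upsample = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            *[ResBlock(64) for _ in range(8)],
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 256, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 3, 1, stride=1, padding=0),   # final 1x1 convolution, 3 output channels
        )
        # Refinement network: same spatial resolution in and out; de-blurs the coarse output.
        self.refine = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            *[ResBlock(64) for _ in range(8)],
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, stride=1, padding=1),    # final 3x3 convolution, 3 output channels
        )

    def forward(self, lr):
        sr = self.upsample(lr)          # coarse 4x super-resolved image (often blurry)
        sr_refined = self.refine(sr)    # refined, sharper image at the same resolution
        return sr, sr_refined
```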
The structure of the discriminator is: one convolutional layer with 64 kernels of size 3 and stride 2; one convolutional layer with 128 kernels of size 3 and stride 2; one convolutional layer with 256 kernels of size 3 and stride 2; one convolutional layer with 512 kernels of size 3 and stride 2; one convolutional layer with 512 kernels of size 3 and stride 1; and two parallel convolutional layers, namely a first convolutional layer fc_GAN for distinguishing whether the input image is a real high-resolution image or a high-resolution image synthesized by the generator, and a second convolutional layer fc_clc for judging whether the input image is a face.
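A hedged PyTorch sketch of the two-headed discriminator described above (not from the patent). The five convolutional layers follow the text; the LeakyReLU activations, the global average pooling before the two parallel 1×1 heads, and the sigmoid outputs are assumptions, and the VGG19 backbone mentioned later in the embodiment is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 2), (256, 512, 2), (512, 512, 1)]
        layers = []
        for c_in, c_out, stride in cfg:
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.features = nn.Sequential(*layers)
        self.fc_gan = nn.Conv2d(512, 1, kernel_size=1)   # real vs. generated high-resolution image
        self.fc_clc = nn.Conv2d(512, 1, kernel_size=1)   # face vs. non-face

    def forward(self, img):
        f = self.features(img)
        f = F.adaptive_avg_pool2d(f, 1)                      # collapse spatial dimensions
        p_face = torch.sigmoid(self.fc_clc(f)).flatten(1)    # p1: probability the input is a face
        p_real = torch.sigmoid(self.fc_gan(f)).flatten(1)    # p2: probability the input is real
        return p_face, p_real
```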
The loss function used by the generative adversarial network combines an adversarial loss, a pixel-wise loss, the loss function of the refinement network, and a classification loss;
wherein θ and ω are respectively the network parameters of the discriminator and the generator; D_θ(·) and G_ω(·) are respectively the functions computed by the discriminator and the generator; I^LR and I^HR are respectively the input low-resolution image and the corresponding high-resolution image; y_i is the label of the input image, with y_i = 1 and y_i = 0 respectively denoting that the input image is a face and a non-face; α and β are the weighting coefficients of the adversarial loss, the pixel-wise loss and the classification loss in the objective function; N is the total number of training samples; ω1 is the network parameter of the up-sampling network, ω2 is the network parameter of the refinement network, G_ω1(·) is the function computed by the up-sampling network, and G_ω2(·) is the function computed by the refinement network.
Step 4: train the generative adversarial network with the first high-resolution face images, the first high-resolution non-face images, the low-resolution face images, and the low-resolution non-face images obtained in step 2.
Step 5: input the image to be tested into the face detector to obtain face candidate regions, input the face candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that it is a face, and draw in the input image the regions for which p is greater than a preset threshold. The probability p here is the probability computed at test time, whereas p1 and p2 in step 3 are probabilities produced during training; their meanings differ.
For example, after an image is input into the face detector, the detector crops face images and records the coordinates of each face in the original input image. The output probability p1 corresponds to the location of one face image and can be recorded as a five-tuple (x1, y1, x2, y2, p1), where x1 and y1 may be the upper-left coordinates of the rectangle containing the face and x2 and y2 the lower-right coordinates. Whether p1 in the five-tuple satisfies a threshold condition is then checked; if it does, the face location is marked in the original input image according to the coordinates x1, y1, x2, y2.
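A small sketch of the thresholding and drawing step just described (not from the patent); the threshold value 0.5 is only an example, and OpenCV is assumed for drawing.

```python
# Hedged sketch: mark detections whose face probability exceeds a preset threshold,
# using the five-tuple (x1, y1, x2, y2, p) representation described above.
import cv2

def draw_faces(image, detections, threshold=0.5):
    """detections: list of (x1, y1, x2, y2, p)."""
    for x1, y1, x2, y2, p in detections:
        if p > threshold:
            cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    return image
```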
The design idea and principle of the invention are further described below:
The present invention mainly addresses the deficiencies of existing face detection techniques: it solves the problem that deep-learning-based face detection can only mitigate a single influencing factor among the many present in real-world scenes, overcomes the low recognition rate of existing face detection techniques in real scenes, and provides a deep-learning-based face recognition method for real-world scenes. The method of the invention simultaneously overcomes the influence of scale, pose, occlusion, expression, make-up, illumination and other factors on facial images, and realizes face detection in real-world scenes, so that the detection target is no longer limited to a single face against a simple background; in particular, the face detection technique is no longer applicable only to posed images captured in laboratory settings. On the basis of GAN-based face recognition, the method introduces a refinement network (Refinement Network), which solves the problem that the high-resolution images produced by the generator of the GAN are blurry, and also overcomes the difficulty that the generated images are affected by the many influencing factors mentioned above.
The present invention takes images (video frames) captured in real-world scenes as its research object. In real scenes, besides being affected by scale, pose, occlusion, expression, make-up, illumination and other factors, facial images have the following characteristics: first, the capture device is far from the face, so faces are extremely small; second, fast motion of the camera or the person makes the captured images blurry. The most representative existing solution to these problems is GAN-based face recognition, but the high-resolution images generated by a GAN are usually blurry, for the following reasons: first, the objective of the generator is usually the least-squares error between the pixels of the generated image and the input image, which makes the generated image overly smooth; second, when the target face is too small (around 10×10 pixels), the images generated by the GAN usually lack detail; third, when the input image itself is distorted (e.g. fast motion causes image distortion), the GAN cannot generate a correspondingly clear high-resolution image.
To address the above problems of GAN-based face recognition, the present invention introduces a refinement network into the generator of the GAN; the structure of the refinement network is shown in Table 1. The main roles of the refinement network are: first, it de-blurs the blurry high-resolution images generated by the GAN; second, when the input image is affected by occlusion, illumination, make-up, expression and other factors, the refinement network can remove occlusion and illumination effects, remove make-up, and neutralize expression. For images severely blurred or affected by illumination, make-up, expression and other factors, a correspondingly clear image can be obtained after processing by the refinement network, and face discrimination on these clear, unaffected images becomes much simpler.
In addition, the present invention introduces the loss function of the refinement network into the objective function of the GAN, so that the high-resolution images produced by the generator contain more detail and are clearer, making it easier for the discriminator to determine whether the input image is a face. In GAN-based face recognition, the objective function is composed of an adversarial loss (adversarial loss), a pixel-wise loss (pixel-wise loss) and a classification loss (classification loss), and can be expressed as:
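A hedged reconstruction of this objective, written in terms of the three component losses and consistent with the symbol definitions in the following paragraph; the exact form, and in particular which terms carry the weights α and β, is an assumption here:

$$
\mathcal{L}_{adv}=\frac{1}{N}\sum_{i=1}^{N}\Big[\log D_{\theta}\big(I_i^{HR}\big)+\log\Big(1-D_{\theta}\big(G_{\omega}(I_i^{LR})\big)\Big)\Big],
\qquad
\mathcal{L}_{pix}=\frac{1}{N}\sum_{i=1}^{N}\big\lVert G_{\omega}\big(I_i^{LR}\big)-I_i^{HR}\big\rVert_2^{2},
$$

$$
\mathcal{L}_{clc}=\frac{1}{N}\sum_{i=1}^{N}\Big[\ell\big(y_i,\,D_{\theta}^{clc}\big(G_{\omega}(I_i^{LR})\big)\big)+\ell\big(y_i,\,D_{\theta}^{clc}\big(I_i^{HR}\big)\big)\Big],
\qquad
\mathcal{L}=\alpha\,\mathcal{L}_{adv}+\mathcal{L}_{pix}+\beta\,\mathcal{L}_{clc},
$$

where $D_{\theta}^{clc}$ denotes the face-classification head (fc_clc) of the discriminator, $D_{\theta}$ its real/fake head (fc_GAN), and $\ell$ the binary cross-entropy; the discriminator parameters $\theta$ maximize the adversarial term while the generator parameters $\omega$ minimize it, $\omega$ also minimizes the pixel-wise and classification terms, and $\theta$ minimizes the classification term.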
In this objective, the first term is the adversarial loss (adversarial loss), the second term is the pixel-wise loss (pixel-wise loss), and the third term is the classification loss (classification loss), where θ and ω are respectively the network parameters of the discriminator and the generator, D_θ(·) and G_ω(·) are respectively the functions computed by the discriminator and the generator, I^LR and I^HR are respectively the input low-resolution image and the corresponding high-resolution image, y_i is the label of the input image (y_i = 1 and y_i = 0 respectively denote a face and a non-face), α and β are the weighting coefficients of the adversarial, pixel-wise and classification losses in the objective function, and N is the total number of training samples. To make the images produced by the generator clearer while mitigating the influence of occlusion, expression, make-up, illumination and other factors, the present invention modifies the objective function of the GAN by introducing the loss function of the refinement network into the objective function of the whole network. The new objective can be expressed as:
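A hedged reconstruction of the modified objective, consistent with the definitions in the next paragraph (the precise arrangement in the patent's formula is an assumption): the generator is the composition $G_{\omega_2}\!\circ G_{\omega_1}$ of the up-sampling and refinement networks, the pixel-wise term is computed on the coarse output of the up-sampling network, and a refinement-network term is added for the refined output:

$$
\mathcal{L}_{ref}=\frac{1}{N}\sum_{i=1}^{N}\big\lVert G_{\omega_2}\big(G_{\omega_1}(I_i^{LR})\big)-I_i^{HR}\big\rVert_2^{2},
\qquad
\mathcal{L}'=\alpha\,\mathcal{L}_{adv}+\frac{1}{N}\sum_{i=1}^{N}\big\lVert G_{\omega_1}\big(I_i^{LR}\big)-I_i^{HR}\big\rVert_2^{2}+\mathcal{L}_{ref}+\beta\,\mathcal{L}_{clc},
$$

so that both the coarse output of the up-sampling network and the refined output are driven toward the real high-resolution image, while the adversarial and classification terms are evaluated on the refined output.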
In the new objective, the added term is the loss function of the refinement network, where ω1 is the network parameter of the up-sampling network, ω2 is the network parameter of the refinement network, G_ω1(·) is the function computed by the up-sampling network, and G_ω2(·) is the function computed by the refinement network.
In summary, the present invention introduces a refinement network into GAN-based face recognition and proposes a novel face detection framework of "generative adversarial network + refinement network". In addition, considering the deficiencies of existing GAN-based face detection techniques and the practical demands of face detection in real scenes, the present invention modifies the objective function of the existing GAN-based face detection method. In the present invention, the input of the GAN is a low-resolution image affected by various factors in a real-world scene; the output of the generator is a blurry high-resolution image still affected by those factors; this high-resolution image is in turn the input of the refinement network, whose output is a clear high-resolution image free of those influences. The deep-learning-based face recognition method for real-world scenes of the present invention eliminates the influence of scale, occlusion, expression, make-up, illumination and other factors on facial images in real scenes, improves face detection accuracy in real scenes, solves the problem that existing deep-learning face detection techniques are unsuitable for real scenes, promotes the development of face detection in real-world scenes, and plays a certain role in moving face detection from the laboratory to practical application.
Table 1: network structure of the refinement network, where "Conv" denotes a convolutional layer and "×8" denotes 8 identical convolutional layers.
Embodiment
The present invention is further explained below with reference to a specific embodiment, shown in Fig. 3. First, training samples are prepared according to actual needs (the present invention uses the existing WIDER FACE database), and a face detector is trained with the prepared training samples; the present invention directly adopts an existing MB-FCN face detector. The trained face detector then predicts the face locations in each image of the training set, and face and non-face (background) images are cropped according to the predicted locations; the resulting face and non-face images serve as training samples for the generative adversarial network. Finally, these cropped face and non-face images are used to train the GAN, in which the generator learns to produce a corresponding high-resolution image from the low-resolution input, the refinement network further learns, on top of the high-resolution image produced by the generator, to obtain a clear high-resolution image free of pose, occlusion, expression, make-up, illumination and other influences, and the discriminator produces a more accurate face detection result from the clear high-resolution image ultimately generated. Each part is described in detail below:
First, prepare the training samples. Training images can be collected according to actual needs and a corresponding database constructed, or an existing public face detection database such as WIDER FACE or FDDB can be used. To facilitate comparison with other existing methods, the present invention uses the images in the WIDER FACE data set as training and test samples. WIDER FACE is a face detection benchmark database whose images are selected from the previously published WIDER data set and were captured in real-world scenes; many of the faces are extremely small (between 10 and 30 pixels) and are also affected by occlusion, expression, make-up, illumination and other factors, so face detection in these real scenes poses a huge challenge to existing face detection methods. The WIDER FACE data set contains 32,203 images and 393,703 face images, and the whole data set is organized around 61 event types; for each event type, 40% of the data are randomly selected as the training set, 10% as the validation set, and 50% as the test set. In addition, WIDER FACE divides all images into three classes according to face size (50/30/10), namely Easy/Medium/Hard. The present invention mainly overcomes the difficulties that those influencing factors cause for face detection in real scenes, and thereby improves face detection accuracy in real scenes.
Train the face detector. A face detector is trained with the prepared training samples; its role is to crop training samples for the subsequent generative adversarial network, and its quality directly affects the quality of the GAN training samples. The face detector here can be any existing face detector, and the present invention further uses its face detection accuracy as the baseline (Baseline) against which the improvement is measured. Since this face detector is not the focus of the invention, a ready-made face detector (MB-FCN) is used, whose backbone network is ResNet-50. To detect faces at multiple scales, the MB-FCN detector has multiple output branches, each of which handles face recognition within a certain scale range. In addition, to detect tiny faces, the MB-FCN detector uses feature fusion, i.e. it fuses the shallow features of the lower convolutional layers (which contain abundant detail information) with the deep features of the higher layers (which contain abundant semantic information). In the present invention, the specific procedure of using the MB-FCN face detector to crop training samples for the GAN is as follows: for training samples, the MB-FCN face detector predicts face location information for each image in the WIDER FACE training set, and 600 regions that may contain faces are cropped from each image and saved; these saved images serve as the training samples of the GAN. For test samples, each image in the test set is likewise processed with the MB-FCN face detector, and 600 regions that may contain faces are cropped from each image and saved; these saved images are finally passed through the generator of the GAN to obtain corresponding high-resolution (4× up-sampled) images, then through the refinement network to obtain clearer images, and finally the discriminator determines whether these high-resolution images are face images, thereby realizing face detection in real-world scenes.
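A hedged sketch of the test-time pipeline just described (detector proposals, 4× super-resolution, refinement, discriminator score). `mb_fcn_detector.propose_regions`, `generator` and `discriminator` are placeholders for trained models with assumed interfaces; none of these names come from the patent, and the threshold is an example.

```python
# Hedged sketch of the inference pipeline: 600 candidate regions per image are cropped by the
# detector, super-resolved and refined by the generator, then scored by the discriminator.
def detect_faces(image, mb_fcn_detector, generator, discriminator, threshold=0.5):
    results = []
    for (x1, y1, x2, y2), crop in mb_fcn_detector.propose_regions(image, top_k=600):
        sr, sr_refined = generator(crop)        # coarse and refined high-resolution images
        p_face, _ = discriminator(sr_refined)   # p1: probability the region is a face
        if float(p_face) > threshold:
            results.append((x1, y1, x2, y2, float(p_face)))
    return results
```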
Construct the GAN training samples. For the regions cropped above that may contain faces, the present invention first computes, for each cropped region, the overlap (intersection over union, IoU) with the manually annotated ground-truth bounding boxes; a region is labeled a positive sample (face) if its IoU is greater than 0.5, and a negative sample (non-face, i.e. background) if its IoU is less than 0.45. With this procedure the present invention obtains 1,075,968 positive samples and 1,626,328 negative samples. Since the GAN in the present invention performs 4× up-sampling, corresponding low-resolution and high-resolution images are needed as training samples. Here, the images cropped by the MB-FCN detector are used as high-resolution images, and the images obtained by down-sampling them by a factor of 4 with bilinear interpolation (bi-linear interpolation) are used as the corresponding low-resolution images.
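A short sketch of the IoU-based labeling described above (not from the patent); boxes are assumed to be (x1, y1, x2, y2) tuples, and the handling of regions between the two thresholds is an assumption.

```python
# Hedged sketch: label cropped regions against ground-truth boxes using the 0.5 / 0.45 IoU thresholds.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def label_region(region, gt_boxes):
    best = max((iou(region, gt) for gt in gt_boxes), default=0.0)
    if best > 0.5:
        return 1      # positive sample (face)
    if best < 0.45:
        return 0      # negative sample (background)
    return None       # ambiguous overlap: discarded (an assumption; the patent does not say)
```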
Generator.The effect of generator is to generate corresponding high-resolution according to the training study of the image of the low resolution of input Rate image, so that discriminator, which passes through the high-definition picture generated, more easily determines the low-resolution image inputted Classification (face/non-face).Generator is a deep learning network in the present invention, wherein including two deconvolution networks, often 2 times of up-samplings of a deconvolution network implementations, the resolution ratio of the output image of generator network entire in this way will be input picture 4 times.
Optimize network.The main function of optimization network has: first, the more fuzzy image that generator generates passes through optimization The processing of network becomes relatively sharp, plays the role of deblurring;Second, optimization network can reduce block, expression, dressing, The influence of the factors such as illumination, reach block, expression is neutralized, is gone dressing, is gone the effects of illumination.As shown in table 1, optimize network Also a deep learning network, whole network include 8 convolution modules and 4 convolutional layers with residual error form, all volumes The convolution step-length of lamination is all 1, so the resolution ratio of optimization network output image and the resolution ratio of generator output image are phases With, the output image for only optimizing network is relatively sharp compared with the output image of generator.
Discriminator.The master network structure of discriminator uses VGG19 network.It is excessive in order to avoid being brought by convolutional calculation Down-sampling operation, the present invention eliminate the pond layer (max-pooling layer) in " conv5 ".In addition, identifying to realize Device can differentiate that input picture is true high-definition picture or the image (true/false) for having generator to synthesize and sentences simultaneously Disconnected input picture is face (face/non-face), and present invention removes the full articulamentums of whole in VGG19 network, i.e., Fc6, fc7, fc8, and replace with two parallel volume bases, respectively fcGAN, fcclc.Wherein fcGANEffect be differentiate input Image is true high-definition picture or the high-definition picture (true/false) for having generator synthesis, fcclcEffect be sentence Disconnected input picture is face (face/non-face).
Train the GAN. With the constructed generator, refinement network and discriminator network structures and the labeled positive and negative training samples, the GAN can be trained. The present invention trains the whole GAN by letting the generator network + refinement network and the discriminator network play against each other and be optimized alternately. The generator network + refinement network randomly samples from the low-resolution sample set as input, and its output must imitate the real samples in the high-resolution sample set as closely as possible. The input of the discriminator network is either a real high-resolution sample or the synthesized high-resolution image output by the generator + refinement network; its purpose is to distinguish the output of the generator + refinement network from the real samples as well as possible while also discriminating whether the high-resolution image is a face, whereas the generator + refinement network tries to fool the discriminator as much as possible. The two groups of networks compete with each other and continuously adjust their parameters; the final goal is that the discriminator cannot tell whether the output of the generator + refinement network is real, so that the generator + refinement network can produce clear high-resolution images while the discriminator can accurately determine whether the input image is a face. In the present invention, the network parameters of the generator network and the refinement network are trained from scratch: the convolution kernel parameters (weights) are initialized with a Gaussian distribution with standard deviation 0.02, and the biases are initialized to 0. To prevent the generator + refinement network from getting stuck in a local optimum, the present invention first trains a generator network + refinement network using the least-squares error between the pixels of the input and output images as the objective, and then uses the trained model to initialize the network parameters of the generator + refinement network. The network parameters of the discriminator are initialized with a model pre-trained on the ImageNet data set; the newly added layers fc_GAN and fc_clc are initialized with a Gaussian distribution with standard deviation 0.1, and their biases are initialized to 0. In addition, the present invention introduces the loss function of the refinement network into the objective function so that the output image of the generator + refinement network is clearer, making it easier for the discriminator to distinguish real / fake and face / non-face. When training the whole network, each mini-batch contains 64 images with a positive-to-negative sample ratio of 1:1, the total number of iterations is 6 epochs, the learning rate of the first 3 epochs is 0.0001, and the learning rate of the last 3 epochs is 0.00001.
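A hedged sketch of the initialization and learning-rate schedule stated above (Gaussian std 0.02 for generator/refinement weights, std 0.1 for the newly added discriminator heads, zero biases; 6 epochs at learning rate 1e-4 then 1e-5). The optimizer choice (Adam) is an assumption; the patent does not name one.

```python
import torch
import torch.nn as nn

def init_generator_weights(m):
    """Gaussian(std=0.02) weights and zero biases for generator/refinement layers."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def init_new_head_weights(m):
    """Gaussian(std=0.1) weights and zero biases for the new fc_GAN / fc_clc heads."""
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.1)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def make_optimizers(generator, discriminator):
    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    return g_opt, d_opt

def adjust_learning_rate(optimizer, epoch):
    lr = 1e-4 if epoch < 3 else 1e-5      # first 3 epochs vs. last 3 epochs
    for group in optimizer.param_groups:
        group['lr'] = lr
```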
The face detection network trained through the above steps is not subject to the limitation of existing face detection techniques, which are easily affected by scale, occlusion, expression, make-up, illumination and other factors and therefore have low accuracy, and it realizes face detection in real-world scenes. Experiments show that the deep-learning-based face recognition method for real-world scenes of the present invention locates faces accurately and detects them efficiently. Table 2 gives the comparative experimental results, where mAP is the mean average precision, an index for assessing the quality of the trained network. As can be seen from the comparison, the proposed method improves the recognition rate greatly over the baseline (Baseline) face detector and reaches the current state-of-the-art detection results of 94.4%/93.3%/87.3%. Meanwhile, an ablation experiment (without refinement network vs. ours) shows that without the refinement network the face detection recognition rate drops by 0.4%/0.4%/1.0% on Easy/Medium/Hard respectively, which demonstrates that the refinement network plays an important role in face detection in real-world scenes.
Table 2: comparison of experimental results.
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, and all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (6)

1. A deep-learning-based face recognition method for real-world scenes, comprising:
step 1: establishing a training database;
step 2: using a face detector to predict the face location in each image of the training database, cropping the images to obtain first high-resolution face images and first high-resolution non-face images, and processing the first high-resolution face images and the first high-resolution non-face images to obtain corresponding low-resolution face images and low-resolution non-face images;
step 3: building a generative adversarial network comprising a generator and a discriminator, wherein the input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2 and its output is second high-resolution face images and second high-resolution non-face images; the input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image; the generator further comprises an up-sampling network and a refinement network connected in sequence, the up-sampling network being the input end of the generator, the output of the up-sampling network serving as the input of the refinement network, and the refinement network being the output end of the generator;
wherein the network structure of the discriminator is:
one convolutional layer with 64 kernels of size 3 and stride 2;
one convolutional layer with 128 kernels of size 3 and stride 2;
one convolutional layer with 256 kernels of size 3 and stride 2;
one convolutional layer with 512 kernels of size 3 and stride 2;
one convolutional layer with 512 kernels of size 3 and stride 1; and
two parallel convolutional layers, namely a first convolutional layer fc_GAN for distinguishing whether the input image is a real high-resolution image or a high-resolution image synthesized by the generator, and a second convolutional layer fc_clc for judging whether the input image is a face;
step 4: training the generative adversarial network with the first high-resolution face images, the first high-resolution non-face images, the low-resolution face images and the low-resolution non-face images obtained in step 2;
step 5: inputting the image to be tested into the face detector to obtain face candidate regions, inputting the face candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that it is a face, and drawing in the input image the regions for which p is greater than a preset threshold;
characterized in that the loss function of the generative adversarial network combines an adversarial loss, a pixel-wise loss, the loss function of the refinement network, and a classification loss;
wherein θ and ω are respectively the network parameters of the discriminator and the generator; D_θ(·) and G_ω(·) are respectively the functions computed by the discriminator and the generator; I^LR and I^HR are respectively the input low-resolution image and the corresponding high-resolution image; y_i is the label of the input image, with y_i = 1 and y_i = 0 respectively denoting that the input image is a face and a non-face; α and β are the weighting coefficients of the adversarial loss, the pixel-wise loss and the classification loss in the objective function; N is the total number of training samples; ω1 is the network parameter of the up-sampling network; ω2 is the network parameter of the refinement network; G_ω1(·) is the function computed by the up-sampling network; and G_ω2(·) is the function computed by the refinement network.
2. The deep-learning-based face recognition method for real-world scenes according to claim 1, characterized in that in step 1 the WIDER FACE database is used as the training database.
3. The deep-learning-based face recognition method for real-world scenes according to claim 1, characterized in that in step 1 the training database is constructed from images in the WIDER FACE database whose face sizes are between 10 and 30 pixels.
4. The deep-learning-based face recognition method according to any one of claims 1 to 3, characterized in that step 2 specifically comprises:
using the face detector to predict the face location in each image of the training database, obtaining a predetermined number of bounding boxes indicating face locations, and cropping the images according to the size and position of the bounding boxes to obtain the first high-resolution face images and the first high-resolution non-face images;
down-sampling the first high-resolution face images and the first high-resolution non-face images by a factor of 4 using bilinear interpolation to obtain the corresponding low-resolution face images and low-resolution non-face images.
5. The deep-learning-based face recognition method according to claim 4, characterized in that the face detector in step 2 is a deep residual network with the ResNet-50 structure.
6. The deep-learning-based face recognition method according to claim 1, characterized in that in step 3 the network structure of the up-sampling network is:
one convolutional layer with 64 kernels of size 3 and stride 1;
eight convolutional layers with 64 kernels of size 3 and stride 1;
one convolutional layer with 64 kernels of size 3 and stride 1;
one deconvolution layer with 256 kernels of size 3 and stride 2;
one deconvolution layer with 256 kernels of size 3 and stride 3; and
one convolutional layer with 3 kernels of size 1 and stride 1;
and the network structure of the refinement network is:
one convolutional layer with 64 kernels of size 3 and stride 1;
eight convolutional layers with 64 kernels of size 3 and stride 1;
one convolutional layer with 64 kernels of size 3 and stride 1;
one deconvolution layer with 256 kernels of size 3 and stride 2;
one deconvolution layer with 256 kernels of size 3 and stride 3; and
one convolutional layer with 3 kernels of size 3 and stride 1.
CN201810119263.2A 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes Active CN108334847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810119263.2A CN108334847B (en) 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810119263.2A CN108334847B (en) 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes

Publications (2)

Publication Number Publication Date
CN108334847A CN108334847A (en) 2018-07-27
CN108334847B true CN108334847B (en) 2019-10-22

Family

ID=62928509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810119263.2A Active CN108334847B (en) 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes

Country Status (1)

Country Link
CN (1) CN108334847B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11263525B2 (en) 2017-10-26 2022-03-01 Nvidia Corporation Progressive modification of neural networks
US11250329B2 (en) 2017-10-26 2022-02-15 Nvidia Corporation Progressive modification of generative adversarial neural networks
CN109359559B (en) * 2018-09-27 2021-11-12 天津师范大学 Pedestrian re-identification method based on dynamic shielding sample
CN111222505A (en) * 2018-11-25 2020-06-02 杭州凝眸智能科技有限公司 Method and system for accurately detecting micro target
CN109685863A (en) * 2018-12-11 2019-04-26 帝工(杭州)科技产业有限公司 A method of rebuilding medicine breast image
CN109784349B (en) * 2018-12-25 2021-02-19 东软集团股份有限公司 Image target detection model establishing method, device, storage medium and program product
CN109753946A (en) * 2019-01-23 2019-05-14 哈尔滨工业大学 A kind of real scene pedestrian's small target detection network and detection method based on the supervision of body key point
CN109815893B (en) * 2019-01-23 2021-03-26 中山大学 Color face image illumination domain normalization method based on cyclic generation countermeasure network
CN110197205B (en) * 2019-05-09 2022-04-22 三峡大学 Image identification method of multi-feature-source residual error network
CN111242837B (en) * 2020-01-03 2023-05-12 杭州电子科技大学 Face anonymity privacy protection method based on generation countermeasure network
CN111414888A (en) * 2020-03-31 2020-07-14 杭州博雅鸿图视频技术有限公司 Low-resolution face recognition method, system, device and storage medium
CN111951373B (en) * 2020-06-30 2024-02-13 重庆灵翎互娱科技有限公司 Face image processing method and equipment
CN112069993B (en) * 2020-09-04 2024-02-13 西安西图之光智能科技有限公司 Dense face detection method and system based on five-sense organ mask constraint and storage medium
CN112288044B (en) * 2020-12-24 2021-07-27 成都索贝数码科技股份有限公司 News picture attribute identification method of multi-scale residual error network based on tree structure
CN113221626B (en) * 2021-03-04 2023-10-20 北京联合大学 Human body posture estimation method based on Non-local high-resolution network
CN113705341A (en) * 2021-07-16 2021-11-26 国家石油天然气管网集团有限公司 Small-scale face detection method based on generation countermeasure network
CN113553961B (en) * 2021-07-27 2023-09-05 北京京东尚科信息技术有限公司 Training method and device of face recognition model, electronic equipment and storage medium
CN113470027B (en) * 2021-09-03 2022-03-25 广东电网有限责任公司惠州供电局 Insulating sheath identification method, device, system and medium based on generation countermeasure
CN113688799B (en) * 2021-09-30 2022-10-04 合肥工业大学 Facial expression recognition method for generating confrontation network based on improved deep convolution
CN114359113A (en) * 2022-03-15 2022-04-15 天津市电子计算机研究所有限公司 Detection method and application system of face image reconstruction and restoration method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN107194380A (en) * 2017-07-03 2017-09-22 上海荷福人工智能科技(集团)有限公司 The depth convolutional network and learning method of a kind of complex scene human face identification
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292813B (en) * 2017-05-17 2019-10-22 浙江大学 A kind of multi-pose Face generation method based on generation confrontation network
CN107341517B (en) * 2017-07-07 2020-08-11 哈尔滨工业大学 Multi-scale small object detection method based on deep learning inter-level feature fusion
CN107577985B (en) * 2017-07-18 2019-10-15 南京邮电大学 The implementation method of the face head portrait cartooning of confrontation network is generated based on circulation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN107194380A (en) * 2017-07-03 2017-09-22 上海荷福人工智能科技(集团)有限公司 The depth convolutional network and learning method of a kind of complex scene human face identification
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network;Christian Ledig 等;《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》;20171109;全文 *
Development of face recognition based on generative adversarial networks; 张卫 et al.; 《电子世界》 (Electronics World); 20171023; full text *

Also Published As

Publication number Publication date
CN108334847A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
CN108334847B (en) Deep-learning-based face recognition method for real-world scenes
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN107154023B (en) Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution
CN105069746B (en) Video real-time face replacement method and its system based on local affine invariant and color transfer technology
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN110175613A (en) Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN110059694A (en) The intelligent identification Method of lteral data under power industry complex scene
CN109241871A (en) A kind of public domain stream of people's tracking based on video data
CN108564049A (en) A kind of fast face detection recognition method based on deep learning
CN106446930A (en) Deep convolutional neural network-based robot working scene identification method
CN110189255A (en) Method for detecting human face based on hierarchical detection
CN109753946A (en) A kind of real scene pedestrian's small target detection network and detection method based on the supervision of body key point
CN110232387A (en) A kind of heterologous image matching method based on KAZE-HOG algorithm
Hsu et al. Deep hierarchical network with line segment learning for quantitative analysis of facial palsy
CN109766873A (en) A kind of pedestrian mixing deformable convolution recognition methods again
CN114758362B (en) Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN109272487A (en) The quantity statistics method of crowd in a kind of public domain based on video
CN109584162A (en) A method of based on the image super-resolution reconstruct for generating network
CN110287806A (en) A kind of traffic sign recognition method based on improvement SSD network
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN106446890A (en) Candidate area extraction method based on window scoring and superpixel segmentation
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant