CN108334847B - Deep-learning-based face recognition method for real-world scenes - Google Patents

Deep-learning-based face recognition method for real-world scenes

Info

Publication number
CN108334847B
Authority
CN
China
Prior art keywords
face
network
image
resolution
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810119263.2A
Other languages
Chinese (zh)
Other versions
CN108334847A (en)
Inventor
张永强
丁明理
白延成
李贤
杨光磊
董娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201810119263.2A priority Critical patent/CN108334847B/en
Publication of CN108334847A publication Critical patent/CN108334847A/en
Application granted granted Critical
Publication of CN108334847B publication Critical patent/CN108334847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4076Super resolution, i.e. output image resolution higher than sensor resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The present invention provides a deep-learning-based face recognition method for real-world scenes. Existing face recognition methods for real scenes can each mitigate only a single influencing factor; the proposed method addresses multiple factors such as pose and illumination simultaneously. The method comprises: using an existing face detector to predict the face locations in each image of a training database, and cropping and saving true face and non-face images; down-sampling the face and non-face images to obtain corresponding low-resolution images; building a generative adversarial network comprising a generator and a discriminator, the generator further comprising an up-sampling network and a refinement network; training the generative adversarial network with the high-resolution face and non-face images and the corresponding low-resolution face and non-face images; and marking the face locations in the input image according to the discriminator's score for each face candidate region produced by the existing face detector. The present invention is suitable for face recognition and detection.

Description

Deep-learning-based face recognition method for real-world scenes
Technical field
The present invention relates to the field of face recognition, and in particular to a deep-learning-based face recognition method for real-world scenes.
Background art
With the development of applications such as e-commerce, face recognition has become one of the most promising means of biometric identity verification. These applications require automatic face recognition systems to handle facial images captured in real-world scenes, and the series of problems this raises led researchers to treat face detection as an independent research topic from the outset. In addition, face recognition in real-world scenes has urgent application demands in fields such as security, criminal investigation, and search and rescue.
Face detection has important basic research value in machine vision as well as pressing application demands, and the related techniques continue to evolve. However, most current face detection methods are developed on posed images captured under ideal laboratory conditions. Such posed images share two characteristics: first, the face is relatively large and located at the center of the image; second, the background is clean and simple. In real-world images, by contrast, faces are usually very small and the background is complex, and the images are further affected by factors such as scale, pose, occlusion, expression, make-up, and illumination. Studies have shown that these factors pose a great challenge to face recognition in real scenes and severely degrade detection accuracy. Several face recognition methods for real scenes have been proposed in succession, but each of them can only mitigate a single influencing factor: for example, face recognition based on generative adversarial networks can mitigate the influence of scale but cannot handle other factors such as pose and illumination, while face recognition based on face alignment can mitigate the influence of pose but cannot handle factors such as scale and blur.
Summary of the invention
The purpose of the present invention is to overcome the shortcoming that existing face recognition methods for real scenes can only mitigate a single influencing factor — methods that cannot handle other factors such as pose and illumination, or face-alignment-based methods that mitigate pose but cannot handle factors such as scale and blur — and to propose a deep-learning-based face recognition method for real-world scenes, comprising:
Step 1: establish a training database;
Step 2: use a face detector to predict the face location in each image of the training database, crop the images to obtain first high-resolution face images and first high-resolution non-face images, and process the first high-resolution face images and first high-resolution non-face images to obtain low-resolution face images and low-resolution non-face images;
Step 3: build a generative adversarial network comprising a generator and a discriminator. The input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2, and its output is second high-resolution face images and second high-resolution non-face images. The input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image. The generator further comprises an up-sampling network and a refinement network connected in sequence: the up-sampling network is the input end of the generator, its output serves as the input of the refinement network, and the refinement network is the output end of the generator;
Step 4: train the generative adversarial network with the first high-resolution face images, the first high-resolution non-face images, the low-resolution face images, and the low-resolution non-face images obtained in step 2;
Step 5: input the image to be tested into the face detector to obtain face candidate regions, input the face candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that it is a face, and draw in the input image the regions for which p is greater than a preset threshold.
The beneficial effects of the invention are as follows:
1. The method mitigates the influence of scale, occlusion, expression, make-up, illumination and other factors simultaneously, and realizes face detection in real-world scenes, so that the detection target is no longer limited to a single face against a simple background; in particular, the face detection technique is no longer applicable only to posed images captured in laboratory settings;
2. The method overcomes the low recognition rate of current face detection techniques in real scenes, and promotes the application of deep-learning-based face detection in real-world scenes;
3. The invention introduces a refinement network, which solves the problem that the high-resolution images produced by the generator of a generative adversarial network are blurry, and also overcomes the difficulty that the generated images are affected by the many influencing factors mentioned above;
4. Compared with the baseline face detector, the recognition rate of the proposed deep-learning-based face recognition method for real-world scenes is greatly improved and reaches the current state-of-the-art detection results of 94.4%/93.3%/87.3% (Easy/Medium/Hard).
Description of the drawings
Fig. 1 is an experimental data plot of the effect of each influencing factor on face detection accuracy, where the highest point of each vertical line indicates the accuracy achievable by an embodiment of the present invention, the lowest point indicates the accuracy achievable by conventional methods, and the middle curve indicates the average accuracy achievable by an embodiment of the present invention and the prior art when processing images affected by scale;
Fig. 2 is a schematic diagram of a generative adversarial network in the prior art;
Fig. 3 is a schematic diagram of the deep-learning-based face recognition method for real-world scenes according to an embodiment of the invention, where "1st Branch" refers to the first branch of the face detector and "Kth Branch" refers to the K-th branch of the face detector; "Input" refers to the input; "Conv", "Conv1" ... "Conv5" are convolutional layers with different indices; "Residual Blocks" are the residual convolution blocks of the ResNet network used in an embodiment of the invention; "De-Conv" is a deconvolution layer; "Sigmoid" is the activation function; "LR" is the low-resolution image; "SR" is the high-resolution image generated by the generator from the low-resolution image; "HR" is the real high-resolution image; "Face" is a face image; and "Non-Face" is a non-face image;
Fig. 4 is a flow chart of an embodiment of the invention.
Specific embodiment
Specific embodiment 1: the deep-learning-based face recognition method for real-world scenes of this embodiment, as shown in Fig. 4, comprises:
Step 1 establishes tranining database.Such as can using WIDER FACE database as tranining database, or Picture construction tranining database of the size of facial image between 10 to 30 pixels, sets in this way in WIDER FACE database The benefit set is the more difficult small Face datection problem that can solve face between 10-30 pixel.Present embodiment is also supported User oneself constructs database by acquiring the image of real scene.
Step 2: use the face detector to predict the face location in each image of the training database, crop the images to obtain first high-resolution face images and first high-resolution non-face images, and process the first high-resolution face and non-face images to obtain low-resolution face images and low-resolution non-face images. The face detector may specifically be a deep residual network with the ResNet-50 structure.
Step 2 may further be implemented as follows: use an existing face detector to predict the face location in each image of the training database, obtaining a predetermined number of bounding boxes indicating face locations; crop the images according to the size and position of the bounding boxes to obtain the first high-resolution face images and the first high-resolution non-face images; then down-sample the first high-resolution face and non-face images by a factor of 4 using bilinear interpolation to obtain the low-resolution face images and low-resolution non-face images. "4× down-sampling" can be understood as representing each 4×4 block of pixels by a single pixel, so that, for example, a 32×32-pixel region becomes 8×8 pixels.
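A short sketch of the 4× bilinear down-sampling described above (not from the patent); it uses OpenCV, and the 32×32 patch size is only an example.

```python
# Hedged sketch: 4x bilinear down-sampling of a cropped high-resolution face/non-face patch.
import cv2

def make_low_resolution(hr_patch, factor=4):
    h, w = hr_patch.shape[:2]
    # bilinear interpolation, as stated in step 2
    return cv2.resize(hr_patch, (w // factor, h // factor), interpolation=cv2.INTER_LINEAR)

# Usage: a 32x32 crop becomes an 8x8 low-resolution input for the generator.
```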
Because the images in the training database of step 1 are generally high-resolution, while the following steps require the generator to produce high-resolution images from low-resolution inputs, step 2 processes the high-resolution face and non-face images obtained directly from the database into low-resolution images.
Assume the face location detected by the face detector is a rectangular box; the box can then be represented by a 4-tuple consisting of the horizontal and vertical coordinates of its upper-left corner and lower-right corner, which simultaneously encodes the specific position and the size of the box containing the face. Those skilled in the art will appreciate that the face location may also be expressed in other ways, as long as the size and specific position of the box can be expressed; the present invention imposes no restriction on this.
Step 3: build a generative adversarial network comprising a generator and a discriminator. The input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2, and its output is second high-resolution face images and second high-resolution non-face images. The input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image. The generator further comprises an up-sampling network and a refinement network connected in sequence: the up-sampling network is the input end of the generator, its output serves as the input of the refinement network, and the refinement network is the output end of the generator.
The structure of the up-sampling network is: one convolutional layer with 64 kernels of size 3 and stride 1; eight convolutional layers with 64 kernels of size 3 and stride 1; one convolutional layer with 64 kernels of size 3 and stride 1; one deconvolution layer with 256 kernels of size 3 and stride 2; one deconvolution layer with 256 kernels of size 3 and stride 3; and one convolutional layer with 3 kernels of size 1 and stride 1.
The structure of the refinement network is: one convolutional layer with 64 kernels of size 3 and stride 1; eight convolutional layers with 64 kernels of size 3 and stride 1; one convolutional layer with 64 kernels of size 3 and stride 1; one deconvolution layer with 256 kernels of size 3 and stride 2; one deconvolution layer with 256 kernels of size 3 and stride 3; and one convolutional layer with 3 kernels of size 3 and stride 1.
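A minimal PyTorch sketch of the generator described above, not taken from the patent. Layer counts and channel widths follow the text; activations, padding, batch normalization, and the residual form of the eight repeated blocks are assumptions. The stride-2/stride-3 deconvolutions listed above are replaced here by two stride-2 deconvolutions to match the stated 4× up-sampling factor, and the refinement branch is kept at stride 1 throughout, as the later description of Table 1 states.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """One of the eight repeated 64-channel, 3x3, stride-1 convolution blocks (residual form assumed)."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Up-sampling network: 4x super-resolution (two stride-2 deconvolutions assumed).
        self.upsample = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            *[ResBlock(64) for _ in range(8)],
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 256, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 3, 1, stride=1, padding=0),   # final 1x1 convolution, 3 output channels
        )
        # Refinement network: same spatial resolution in and out; de-blurs the coarse output.
        self.refine = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            *[ResBlock(64) for _ in range(8)],
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, stride=1, padding=1),    # final 3x3 convolution, 3 output channels
        )

    def forward(self, lr):
        sr = self.upsample(lr)          # coarse 4x super-resolved image (often blurry)
        sr_refined = self.refine(sr)    # refined, sharper image at the same resolution
        return sr, sr_refined
```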
The structure of the discriminator is: one convolutional layer with 64 kernels of size 3 and stride 2; one convolutional layer with 128 kernels of size 3 and stride 2; one convolutional layer with 256 kernels of size 3 and stride 2; one convolutional layer with 512 kernels of size 3 and stride 2; one convolutional layer with 512 kernels of size 3 and stride 1; and two parallel convolutional layers, namely a first convolutional layer fc_GAN for distinguishing whether the input image is a real high-resolution image or a high-resolution image synthesized by the generator, and a second convolutional layer fc_clc for judging whether the input image is a face.
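A hedged PyTorch sketch of the two-headed discriminator described above (not from the patent). The five convolutional layers follow the text; the LeakyReLU activations, the global average pooling before the two parallel 1×1 heads, and the sigmoid outputs are assumptions, and the VGG19 backbone mentioned later in the embodiment is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 2), (256, 512, 2), (512, 512, 1)]
        layers = []
        for c_in, c_out, stride in cfg:
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.features = nn.Sequential(*layers)
        self.fc_gan = nn.Conv2d(512, 1, kernel_size=1)   # real vs. generated high-resolution image
        self.fc_clc = nn.Conv2d(512, 1, kernel_size=1)   # face vs. non-face

    def forward(self, img):
        f = self.features(img)
        f = F.adaptive_avg_pool2d(f, 1)                      # collapse spatial dimensions
        p_face = torch.sigmoid(self.fc_clc(f)).flatten(1)    # p1: probability the input is a face
        p_real = torch.sigmoid(self.fc_gan(f)).flatten(1)    # p2: probability the input is real
        return p_face, p_real
```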
The loss function used by the generative adversarial network combines an adversarial loss, a pixel-wise loss, the loss function of the refinement network, and a classification loss;
wherein θ and ω are respectively the network parameters of the discriminator and the generator; D_θ(·) and G_ω(·) are respectively the functions computed by the discriminator and the generator; I^LR and I^HR are respectively the input low-resolution image and the corresponding high-resolution image; y_i is the label of the input image, with y_i = 1 and y_i = 0 respectively denoting that the input image is a face and a non-face; α and β are the weighting coefficients of the adversarial loss, the pixel-wise loss and the classification loss in the objective function; N is the total number of training samples; ω1 is the network parameter of the up-sampling network, ω2 is the network parameter of the refinement network, G_ω1(·) is the function computed by the up-sampling network, and G_ω2(·) is the function computed by the refinement network.
Step 4: train the generative adversarial network with the first high-resolution face images, the first high-resolution non-face images, the low-resolution face images, and the low-resolution non-face images obtained in step 2.
Step 5: input the image to be tested into the face detector to obtain face candidate regions, input the face candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that it is a face, and draw in the input image the regions for which p is greater than a preset threshold. The probability p here is the probability computed at test time, whereas p1 and p2 in step 3 are probabilities produced during training; their meanings differ.
For example, after an image is input into the face detector, the detector crops face images and records the coordinates of each face in the original input image. The output probability p1 corresponds to the location of one face image and can be recorded as a five-tuple (x1, y1, x2, y2, p1), where x1 and y1 may be the upper-left coordinates of the rectangle containing the face and x2 and y2 the lower-right coordinates. Whether p1 in the five-tuple satisfies a threshold condition is then checked; if it does, the face location is marked in the original input image according to the coordinates x1, y1, x2, y2.
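A small sketch of the thresholding and drawing step just described (not from the patent); the threshold value 0.5 is only an example, and OpenCV is assumed for drawing.

```python
# Hedged sketch: mark detections whose face probability exceeds a preset threshold,
# using the five-tuple (x1, y1, x2, y2, p) representation described above.
import cv2

def draw_faces(image, detections, threshold=0.5):
    """detections: list of (x1, y1, x2, y2, p)."""
    for x1, y1, x2, y2, p in detections:
        if p > threshold:
            cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    return image
```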
The design idea and principle of the invention are further described below:
The present invention mainly addresses the deficiencies of existing face detection techniques: it solves the problem that deep-learning-based face detection can only mitigate a single influencing factor among the many present in real-world scenes, overcomes the low recognition rate of existing face detection techniques in real scenes, and provides a deep-learning-based face recognition method for real-world scenes. The method of the invention simultaneously overcomes the influence of scale, pose, occlusion, expression, make-up, illumination and other factors on facial images, and realizes face detection in real-world scenes, so that the detection target is no longer limited to a single face against a simple background; in particular, the face detection technique is no longer applicable only to posed images captured in laboratory settings. On the basis of GAN-based face recognition, the method introduces a refinement network (Refinement Network), which solves the problem that the high-resolution images produced by the generator of the GAN are blurry, and also overcomes the difficulty that the generated images are affected by the many influencing factors mentioned above.
The present invention takes images (video frames) captured in real-world scenes as its research object. In real scenes, besides being affected by scale, pose, occlusion, expression, make-up, illumination and other factors, facial images have the following characteristics: first, the capture device is far from the face, so faces are extremely small; second, fast motion of the camera or the person makes the captured images blurry. The most representative existing solution to these problems is GAN-based face recognition, but the high-resolution images generated by a GAN are usually blurry, for the following reasons: first, the objective of the generator is usually the least-squares error between the pixels of the generated image and the input image, which makes the generated image overly smooth; second, when the target face is too small (around 10×10 pixels), the images generated by the GAN usually lack detail; third, when the input image itself is distorted (e.g. fast motion causes image distortion), the GAN cannot generate a correspondingly clear high-resolution image.
To address the above problems of GAN-based face recognition, the present invention introduces a refinement network into the generator of the GAN; the structure of the refinement network is shown in Table 1. The main roles of the refinement network are: first, it de-blurs the blurry high-resolution images generated by the GAN; second, when the input image is affected by occlusion, illumination, make-up, expression and other factors, the refinement network can remove occlusion and illumination effects, remove make-up, and neutralize expression. For images severely blurred or affected by illumination, make-up, expression and other factors, a correspondingly clear image can be obtained after processing by the refinement network, and face discrimination on these clear, unaffected images becomes much simpler.
In addition, the present invention introduces the loss function of the refinement network into the objective function of the GAN, so that the high-resolution images produced by the generator contain more detail and are clearer, making it easier for the discriminator to determine whether the input image is a face. In GAN-based face recognition, the objective function is composed of an adversarial loss (adversarial loss), a pixel-wise loss (pixel-wise loss) and a classification loss (classification loss), and can be expressed as:
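A hedged reconstruction of this objective, written in terms of the three component losses and consistent with the symbol definitions in the following paragraph; the exact form, and in particular which terms carry the weights α and β, is an assumption here:

$$
\mathcal{L}_{adv}=\frac{1}{N}\sum_{i=1}^{N}\Big[\log D_{\theta}\big(I_i^{HR}\big)+\log\Big(1-D_{\theta}\big(G_{\omega}(I_i^{LR})\big)\Big)\Big],
\qquad
\mathcal{L}_{pix}=\frac{1}{N}\sum_{i=1}^{N}\big\lVert G_{\omega}\big(I_i^{LR}\big)-I_i^{HR}\big\rVert_2^{2},
$$

$$
\mathcal{L}_{clc}=\frac{1}{N}\sum_{i=1}^{N}\Big[\ell\big(y_i,\,D_{\theta}^{clc}\big(G_{\omega}(I_i^{LR})\big)\big)+\ell\big(y_i,\,D_{\theta}^{clc}\big(I_i^{HR}\big)\big)\Big],
\qquad
\mathcal{L}=\alpha\,\mathcal{L}_{adv}+\mathcal{L}_{pix}+\beta\,\mathcal{L}_{clc},
$$

where $D_{\theta}^{clc}$ denotes the face-classification head (fc_clc) of the discriminator, $D_{\theta}$ its real/fake head (fc_GAN), and $\ell$ the binary cross-entropy; the discriminator parameters $\theta$ maximize the adversarial term while the generator parameters $\omega$ minimize it, $\omega$ also minimizes the pixel-wise and classification terms, and $\theta$ minimizes the classification term.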
In this objective, the first term is the adversarial loss (adversarial loss), the second term is the pixel-wise loss (pixel-wise loss), and the third term is the classification loss (classification loss), where θ and ω are respectively the network parameters of the discriminator and the generator, D_θ(·) and G_ω(·) are respectively the functions computed by the discriminator and the generator, I^LR and I^HR are respectively the input low-resolution image and the corresponding high-resolution image, y_i is the label of the input image (y_i = 1 and y_i = 0 respectively denote a face and a non-face), α and β are the weighting coefficients of the adversarial, pixel-wise and classification losses in the objective function, and N is the total number of training samples. To make the images produced by the generator clearer while mitigating the influence of occlusion, expression, make-up, illumination and other factors, the present invention modifies the objective function of the GAN by introducing the loss function of the refinement network into the objective function of the whole network. The new objective can be expressed as:
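A hedged reconstruction of the modified objective, consistent with the definitions in the next paragraph (the precise arrangement in the patent's formula is an assumption): the generator is the composition $G_{\omega_2}\!\circ G_{\omega_1}$ of the up-sampling and refinement networks, the pixel-wise term is computed on the coarse output of the up-sampling network, and a refinement-network term is added for the refined output:

$$
\mathcal{L}_{ref}=\frac{1}{N}\sum_{i=1}^{N}\big\lVert G_{\omega_2}\big(G_{\omega_1}(I_i^{LR})\big)-I_i^{HR}\big\rVert_2^{2},
\qquad
\mathcal{L}'=\alpha\,\mathcal{L}_{adv}+\frac{1}{N}\sum_{i=1}^{N}\big\lVert G_{\omega_1}\big(I_i^{LR}\big)-I_i^{HR}\big\rVert_2^{2}+\mathcal{L}_{ref}+\beta\,\mathcal{L}_{clc},
$$

so that both the coarse output of the up-sampling network and the refined output are driven toward the real high-resolution image, while the adversarial and classification terms are evaluated on the refined output.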
In the new objective, the added term is the loss function of the refinement network, where ω1 is the network parameter of the up-sampling network, ω2 is the network parameter of the refinement network, G_ω1(·) is the function computed by the up-sampling network, and G_ω2(·) is the function computed by the refinement network.
In summary, the present invention introduces a refinement network into GAN-based face recognition and proposes a novel face detection framework of "generative adversarial network + refinement network". In addition, considering the deficiencies of existing GAN-based face detection techniques and the practical demands of face detection in real scenes, the present invention modifies the objective function of the existing GAN-based face detection method. In the present invention, the input of the GAN is a low-resolution image affected by various factors in a real-world scene; the output of the generator is a blurry high-resolution image still affected by those factors; this high-resolution image is in turn the input of the refinement network, whose output is a clear high-resolution image free of those influences. The deep-learning-based face recognition method for real-world scenes of the present invention eliminates the influence of scale, occlusion, expression, make-up, illumination and other factors on facial images in real scenes, improves face detection accuracy in real scenes, solves the problem that existing deep-learning face detection techniques are unsuitable for real scenes, promotes the development of face detection in real-world scenes, and plays a certain role in moving face detection from the laboratory to practical application.
Table 1: network structure of the refinement network, where "Conv" denotes a convolutional layer and "×8" denotes 8 identical convolutional layers.
Embodiment
The present invention is further explained below with reference to a specific embodiment, shown in Fig. 3. First, training samples are prepared according to actual needs (the present invention uses the existing WIDER FACE database), and a face detector is trained with the prepared training samples; the present invention directly adopts an existing MB-FCN face detector. The trained face detector then predicts the face locations in each image of the training set, and face and non-face (background) images are cropped according to the predicted locations; the resulting face and non-face images serve as training samples for the generative adversarial network. Finally, these cropped face and non-face images are used to train the GAN, in which the generator learns to produce a corresponding high-resolution image from the low-resolution input, the refinement network further learns, on top of the high-resolution image produced by the generator, to obtain a clear high-resolution image free of pose, occlusion, expression, make-up, illumination and other influences, and the discriminator produces a more accurate face detection result from the clear high-resolution image ultimately generated. Each part is described in detail below:
First, prepare the training samples. Training images can be collected according to actual needs and a corresponding database constructed, or an existing public face detection database such as WIDER FACE or FDDB can be used. To facilitate comparison with other existing methods, the present invention uses the images in the WIDER FACE data set as training and test samples. WIDER FACE is a face detection benchmark database whose images are selected from the previously published WIDER data set and were captured in real-world scenes; many of the faces are extremely small (between 10 and 30 pixels) and are also affected by occlusion, expression, make-up, illumination and other factors, so face detection in these real scenes poses a huge challenge to existing face detection methods. The WIDER FACE data set contains 32,203 images and 393,703 face images, and the whole data set is organized around 61 event types; for each event type, 40% of the data are randomly selected as the training set, 10% as the validation set, and 50% as the test set. In addition, WIDER FACE divides all images into three classes according to face size (50/30/10), namely Easy/Medium/Hard. The present invention mainly overcomes the difficulties that those influencing factors cause for face detection in real scenes, and thereby improves face detection accuracy in real scenes.
Train the face detector. A face detector is trained with the prepared training samples; its role is to crop training samples for the subsequent generative adversarial network, and its quality directly affects the quality of the GAN training samples. The face detector here can be any existing face detector, and the present invention further uses its face detection accuracy as the baseline (Baseline) against which the improvement is measured. Since this face detector is not the focus of the invention, a ready-made face detector (MB-FCN) is used, whose backbone network is ResNet-50. To detect faces at multiple scales, the MB-FCN detector has multiple output branches, each of which handles face recognition within a certain scale range. In addition, to detect tiny faces, the MB-FCN detector uses feature fusion, i.e. it fuses the shallow features of the lower convolutional layers (which contain abundant detail information) with the deep features of the higher layers (which contain abundant semantic information). In the present invention, the specific procedure of using the MB-FCN face detector to crop training samples for the GAN is as follows: for training samples, the MB-FCN face detector predicts face location information for each image in the WIDER FACE training set, and 600 regions that may contain faces are cropped from each image and saved; these saved images serve as the training samples of the GAN. For test samples, each image in the test set is likewise processed with the MB-FCN face detector, and 600 regions that may contain faces are cropped from each image and saved; these saved images are finally passed through the generator of the GAN to obtain corresponding high-resolution (4× up-sampled) images, then through the refinement network to obtain clearer images, and finally the discriminator determines whether these high-resolution images are face images, thereby realizing face detection in real-world scenes.
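A hedged sketch of the test-time pipeline just described (detector proposals, 4× super-resolution, refinement, discriminator score). `mb_fcn_detector.propose_regions`, `generator` and `discriminator` are placeholders for trained models with assumed interfaces; none of these names come from the patent, and the threshold is an example.

```python
# Hedged sketch of the inference pipeline: 600 candidate regions per image are cropped by the
# detector, super-resolved and refined by the generator, then scored by the discriminator.
def detect_faces(image, mb_fcn_detector, generator, discriminator, threshold=0.5):
    results = []
    for (x1, y1, x2, y2), crop in mb_fcn_detector.propose_regions(image, top_k=600):
        sr, sr_refined = generator(crop)        # coarse and refined high-resolution images
        p_face, _ = discriminator(sr_refined)   # p1: probability the region is a face
        if float(p_face) > threshold:
            results.append((x1, y1, x2, y2, float(p_face)))
    return results
```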
Construct the GAN training samples. For the regions cropped above that may contain faces, the present invention first computes, for each cropped region, the overlap (intersection over union, IoU) with the manually annotated ground-truth bounding boxes; a region is labeled a positive sample (face) if its IoU is greater than 0.5, and a negative sample (non-face, i.e. background) if its IoU is less than 0.45. With this procedure the present invention obtains 1,075,968 positive samples and 1,626,328 negative samples. Since the GAN in the present invention performs 4× up-sampling, corresponding low-resolution and high-resolution images are needed as training samples. Here, the images cropped by the MB-FCN detector are used as high-resolution images, and the images obtained by down-sampling them by a factor of 4 with bilinear interpolation (bi-linear interpolation) are used as the corresponding low-resolution images.
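A short sketch of the IoU-based labeling described above (not from the patent); boxes are assumed to be (x1, y1, x2, y2) tuples, and the handling of regions between the two thresholds is an assumption.

```python
# Hedged sketch: label cropped regions against ground-truth boxes using the 0.5 / 0.45 IoU thresholds.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def label_region(region, gt_boxes):
    best = max((iou(region, gt) for gt in gt_boxes), default=0.0)
    if best > 0.5:
        return 1      # positive sample (face)
    if best < 0.45:
        return 0      # negative sample (background)
    return None       # ambiguous overlap: discarded (an assumption; the patent does not say)
```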
Generator.The effect of generator is to generate corresponding high-resolution according to the training study of the image of the low resolution of input Rate image, so that discriminator, which passes through the high-definition picture generated, more easily determines the low-resolution image inputted Classification (face/non-face).Generator is a deep learning network in the present invention, wherein including two deconvolution networks, often 2 times of up-samplings of a deconvolution network implementations, the resolution ratio of the output image of generator network entire in this way will be input picture 4 times.
Optimize network.The main function of optimization network has: first, the more fuzzy image that generator generates passes through optimization The processing of network becomes relatively sharp, plays the role of deblurring;Second, optimization network can reduce block, expression, dressing, The influence of the factors such as illumination, reach block, expression is neutralized, is gone dressing, is gone the effects of illumination.As shown in table 1, optimize network Also a deep learning network, whole network include 8 convolution modules and 4 convolutional layers with residual error form, all volumes The convolution step-length of lamination is all 1, so the resolution ratio of optimization network output image and the resolution ratio of generator output image are phases With, the output image for only optimizing network is relatively sharp compared with the output image of generator.
Discriminator.The master network structure of discriminator uses VGG19 network.It is excessive in order to avoid being brought by convolutional calculation Down-sampling operation, the present invention eliminate the pond layer (max-pooling layer) in " conv5 ".In addition, identifying to realize Device can differentiate that input picture is true high-definition picture or the image (true/false) for having generator to synthesize and sentences simultaneously Disconnected input picture is face (face/non-face), and present invention removes the full articulamentums of whole in VGG19 network, i.e., Fc6, fc7, fc8, and replace with two parallel volume bases, respectively fcGAN, fcclc.Wherein fcGANEffect be differentiate input Image is true high-definition picture or the high-definition picture (true/false) for having generator synthesis, fcclcEffect be sentence Disconnected input picture is face (face/non-face).
Train the GAN. With the constructed generator, refinement network and discriminator network structures and the labeled positive and negative training samples, the GAN can be trained. The present invention trains the whole GAN by letting the generator network + refinement network and the discriminator network play against each other and be optimized alternately. The generator network + refinement network randomly samples from the low-resolution sample set as input, and its output must imitate the real samples in the high-resolution sample set as closely as possible. The input of the discriminator network is either a real high-resolution sample or the synthesized high-resolution image output by the generator + refinement network; its purpose is to distinguish the output of the generator + refinement network from the real samples as well as possible while also discriminating whether the high-resolution image is a face, whereas the generator + refinement network tries to fool the discriminator as much as possible. The two groups of networks compete with each other and continuously adjust their parameters; the final goal is that the discriminator cannot tell whether the output of the generator + refinement network is real, so that the generator + refinement network can produce clear high-resolution images while the discriminator can accurately determine whether the input image is a face. In the present invention, the network parameters of the generator network and the refinement network are trained from scratch: the convolution kernel parameters (weights) are initialized with a Gaussian distribution with standard deviation 0.02, and the biases are initialized to 0. To prevent the generator + refinement network from getting stuck in a local optimum, the present invention first trains a generator network + refinement network using the least-squares error between the pixels of the input and output images as the objective, and then uses the trained model to initialize the network parameters of the generator + refinement network. The network parameters of the discriminator are initialized with a model pre-trained on the ImageNet data set; the newly added layers fc_GAN and fc_clc are initialized with a Gaussian distribution with standard deviation 0.1, and their biases are initialized to 0. In addition, the present invention introduces the loss function of the refinement network into the objective function so that the output image of the generator + refinement network is clearer, making it easier for the discriminator to distinguish real / fake and face / non-face. When training the whole network, each mini-batch contains 64 images with a positive-to-negative sample ratio of 1:1, the total number of iterations is 6 epochs, the learning rate of the first 3 epochs is 0.0001, and the learning rate of the last 3 epochs is 0.00001.
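A hedged sketch of the initialization and learning-rate schedule stated above (Gaussian std 0.02 for generator/refinement weights, std 0.1 for the newly added discriminator heads, zero biases; 6 epochs at learning rate 1e-4 then 1e-5). The optimizer choice (Adam) is an assumption; the patent does not name one.

```python
import torch
import torch.nn as nn

def init_generator_weights(m):
    """Gaussian(std=0.02) weights and zero biases for generator/refinement layers."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def init_new_head_weights(m):
    """Gaussian(std=0.1) weights and zero biases for the new fc_GAN / fc_clc heads."""
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.1)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def make_optimizers(generator, discriminator):
    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    return g_opt, d_opt

def adjust_learning_rate(optimizer, epoch):
    lr = 1e-4 if epoch < 3 else 1e-5      # first 3 epochs vs. last 3 epochs
    for group in optimizer.param_groups:
        group['lr'] = lr
```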
The face detection network trained through the above steps is not subject to the limitation of existing face detection techniques, which are easily affected by scale, occlusion, expression, make-up, illumination and other factors and therefore have low accuracy, and it realizes face detection in real-world scenes. Experiments show that the deep-learning-based face recognition method for real-world scenes of the present invention locates faces accurately and detects them efficiently. Table 2 gives the comparative experimental results, where mAP is the mean average precision, an index for assessing the quality of the trained network. As can be seen from the comparison, the proposed method improves the recognition rate greatly over the baseline (Baseline) face detector and reaches the current state-of-the-art detection results of 94.4%/93.3%/87.3%. Meanwhile, an ablation experiment (without refinement network vs. ours) shows that without the refinement network the face detection recognition rate drops by 0.4%/0.4%/1.0% on Easy/Medium/Hard respectively, which demonstrates that the refinement network plays an important role in face detection in real-world scenes.
Table 2: comparison of experimental results.
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, and all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (6)

1. A deep-learning-based face recognition method for real-world scenes, comprising:
step 1: establishing a training database;
step 2: using a face detector to predict the face location in each image of the training database, cropping the images to obtain first high-resolution face images and first high-resolution non-face images, and processing the first high-resolution face images and the first high-resolution non-face images to obtain corresponding low-resolution face images and low-resolution non-face images;
step 3: building a generative adversarial network comprising a generator and a discriminator, wherein the input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2 and its output is second high-resolution face images and second high-resolution non-face images; the input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image; the generator further comprises an up-sampling network and a refinement network connected in sequence, the up-sampling network being the input end of the generator, the output of the up-sampling network serving as the input of the refinement network, and the refinement network being the output end of the generator;
wherein the network structure of the discriminator is:
one convolutional layer with 64 kernels of size 3 and stride 2;
one convolutional layer with 128 kernels of size 3 and stride 2;
one convolutional layer with 256 kernels of size 3 and stride 2;
one convolutional layer with 512 kernels of size 3 and stride 2;
one convolutional layer with 512 kernels of size 3 and stride 1; and
two parallel convolutional layers, namely a first convolutional layer fc_GAN for distinguishing whether the input image is a real high-resolution image or a high-resolution image synthesized by the generator, and a second convolutional layer fc_clc for judging whether the input image is a face;
step 4: training the generative adversarial network with the first high-resolution face images, the first high-resolution non-face images, the low-resolution face images and the low-resolution non-face images obtained in step 2;
step 5: inputting the image to be tested into the face detector to obtain face candidate regions, inputting the face candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that it is a face, and drawing in the input image the regions for which p is greater than a preset threshold;
characterized in that the loss function of the generative adversarial network combines an adversarial loss, a pixel-wise loss, the loss function of the refinement network, and a classification loss;
wherein θ and ω are respectively the network parameters of the discriminator and the generator; D_θ(·) and G_ω(·) are respectively the functions computed by the discriminator and the generator; I^LR and I^HR are respectively the input low-resolution image and the corresponding high-resolution image; y_i is the label of the input image, with y_i = 1 and y_i = 0 respectively denoting that the input image is a face and a non-face; α and β are the weighting coefficients of the adversarial loss, the pixel-wise loss and the classification loss in the objective function; N is the total number of training samples; ω1 is the network parameter of the up-sampling network; ω2 is the network parameter of the refinement network; G_ω1(·) is the function computed by the up-sampling network; and G_ω2(·) is the function computed by the refinement network.
2. The deep-learning-based face recognition method for real-world scenes according to claim 1, characterized in that in step 1 the WIDER FACE database is used as the training database.
3. The deep-learning-based face recognition method for real-world scenes according to claim 1, characterized in that in step 1 the training database is constructed from images in the WIDER FACE database whose face sizes are between 10 and 30 pixels.
4. The deep-learning-based face recognition method according to any one of claims 1 to 3, characterized in that step 2 specifically comprises:
using the face detector to predict the face location in each image of the training database, obtaining a predetermined number of bounding boxes indicating face locations, and cropping the images according to the size and position of the bounding boxes to obtain the first high-resolution face images and the first high-resolution non-face images;
down-sampling the first high-resolution face images and the first high-resolution non-face images by a factor of 4 using bilinear interpolation to obtain the corresponding low-resolution face images and low-resolution non-face images.
5. The deep-learning-based face recognition method according to claim 4, characterized in that the face detector in step 2 is a deep residual network with the ResNet-50 structure.
6. The deep-learning-based face recognition method according to claim 1, characterized in that in step 3 the network structure of the up-sampling network is:
one convolutional layer with 64 kernels of size 3 and stride 1;
eight convolutional layers with 64 kernels of size 3 and stride 1;
one convolutional layer with 64 kernels of size 3 and stride 1;
one deconvolution layer with 256 kernels of size 3 and stride 2;
one deconvolution layer with 256 kernels of size 3 and stride 3; and
one convolutional layer with 3 kernels of size 1 and stride 1;
and the network structure of the refinement network is:
one convolutional layer with 64 kernels of size 3 and stride 1;
eight convolutional layers with 64 kernels of size 3 and stride 1;
one convolutional layer with 64 kernels of size 3 and stride 1;
one deconvolution layer with 256 kernels of size 3 and stride 2;
one deconvolution layer with 256 kernels of size 3 and stride 3; and
one convolutional layer with 3 kernels of size 3 and stride 1.
CN201810119263.2A 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes Active CN108334847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810119263.2A CN108334847B (en) 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810119263.2A CN108334847B (en) 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes

Publications (2)

Publication Number Publication Date
CN108334847A CN108334847A (en) 2018-07-27
CN108334847B true CN108334847B (en) 2019-10-22

Family

ID=62928509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810119263.2A Active CN108334847B (en) 2018-02-06 2018-02-06 Deep-learning-based face recognition method for real-world scenes

Country Status (1)

Country Link
CN (1) CN108334847B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11263525B2 (en) 2017-10-26 2022-03-01 Nvidia Corporation Progressive modification of neural networks
US11250329B2 (en) 2017-10-26 2022-02-15 Nvidia Corporation Progressive modification of generative adversarial neural networks
CN109359559B (en) * 2018-09-27 2021-11-12 天津师范大学 Pedestrian re-identification method based on dynamic shielding sample
CN111222505A (en) * 2018-11-25 2020-06-02 杭州凝眸智能科技有限公司 Method and system for accurately detecting micro target
CN109685863A (en) * 2018-12-11 2019-04-26 帝工(杭州)科技产业有限公司 A method of rebuilding medicine breast image
CN109784349B (en) * 2018-12-25 2021-02-19 东软集团股份有限公司 Image target detection model establishing method, device, storage medium and program product
CN109753946A (en) * 2019-01-23 2019-05-14 哈尔滨工业大学 A kind of real scene pedestrian's small target detection network and detection method based on the supervision of body key point
CN109815893B (en) * 2019-01-23 2021-03-26 中山大学 Color face image illumination domain normalization method based on cyclic generation countermeasure network
CN110197205B (en) * 2019-05-09 2022-04-22 三峡大学 Image identification method of multi-feature-source residual error network
CN111242837B (en) * 2020-01-03 2023-05-12 杭州电子科技大学 Face anonymity privacy protection method based on generation countermeasure network
CN111414888A (en) * 2020-03-31 2020-07-14 杭州博雅鸿图视频技术有限公司 Low-resolution face recognition method, system, device and storage medium
CN111951373B (en) * 2020-06-30 2024-02-13 重庆灵翎互娱科技有限公司 Face image processing method and equipment
CN112069993B (en) * 2020-09-04 2024-02-13 西安西图之光智能科技有限公司 Dense face detection method and system based on five-sense organ mask constraint and storage medium
CN112288044B (en) * 2020-12-24 2021-07-27 成都索贝数码科技股份有限公司 News picture attribute identification method of multi-scale residual error network based on tree structure
CN113221626B (en) * 2021-03-04 2023-10-20 北京联合大学 Human body posture estimation method based on Non-local high-resolution network
CN113705341A (en) * 2021-07-16 2021-11-26 国家石油天然气管网集团有限公司 Small-scale face detection method based on generation countermeasure network
CN113553961B (en) * 2021-07-27 2023-09-05 北京京东尚科信息技术有限公司 Training method and device of face recognition model, electronic equipment and storage medium
CN113470027B (en) * 2021-09-03 2022-03-25 广东电网有限责任公司惠州供电局 Insulating sheath identification method, device, system and medium based on generation countermeasure
CN113688799B (en) * 2021-09-30 2022-10-04 合肥工业大学 Facial expression recognition method for generating confrontation network based on improved deep convolution
CN114359113A (en) * 2022-03-15 2022-04-15 天津市电子计算机研究所有限公司 Detection method and application system of face image reconstruction and restoration method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN107194380A (en) * 2017-07-03 2017-09-22 上海荷福人工智能科技(集团)有限公司 The depth convolutional network and learning method of a kind of complex scene human face identification
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292813B (en) * 2017-05-17 2019-10-22 浙江大学 A kind of multi-pose Face generation method based on generation confrontation network
CN107341517B (en) * 2017-07-07 2020-08-11 哈尔滨工业大学 Multi-scale small object detection method based on deep learning inter-level feature fusion
CN107577985B (en) * 2017-07-18 2019-10-15 南京邮电大学 The implementation method of the face head portrait cartooning of confrontation network is generated based on circulation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN107194380A (en) * 2017-07-03 2017-09-22 上海荷福人工智能科技(集团)有限公司 The depth convolutional network and learning method of a kind of complex scene human face identification
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network;Christian Ledig 等;《2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》;20171109;全文 *
Development of face recognition based on generative adversarial networks; 张卫 et al.; 《电子世界》 (Electronics World); 20171023; full text *

Also Published As

Publication number Publication date
CN108334847A (en) 2018-07-27

Similar Documents

Publication Publication Date Title
CN108334847B (en) Deep-learning-based face recognition method for real-world scenes
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN107154023B (en) Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution
CN105069746B (en) Video real-time face replacement method and its system based on local affine invariant and color transfer technology
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN110175613A (en) Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN110059694A (en) The intelligent identification Method of lteral data under power industry complex scene
CN109241871A (en) A kind of public domain stream of people's tracking based on video data
CN108564049A (en) A kind of fast face detection recognition method based on deep learning
CN106446930A (en) Deep convolutional neural network-based robot working scene identification method
CN110189255A (en) Method for detecting human face based on hierarchical detection
CN109753946A (en) A kind of real scene pedestrian's small target detection network and detection method based on the supervision of body key point
CN110232387A (en) A kind of heterologous image matching method based on KAZE-HOG algorithm
Hsu et al. Deep hierarchical network with line segment learning for quantitative analysis of facial palsy
CN109766873A (en) A kind of pedestrian mixing deformable convolution recognition methods again
CN114758362B (en) Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN109272487A (en) The quantity statistics method of crowd in a kind of public domain based on video
CN109584162A (en) A method of based on the image super-resolution reconstruct for generating network
CN110287806A (en) A kind of traffic sign recognition method based on improvement SSD network
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN106446890A (en) Candidate area extraction method based on window scoring and superpixel segmentation
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant