CN108334847B - Face recognition method based on deep learning in real scenes - Google Patents
- Publication number
- CN108334847B CN108334847B CN201810119263.2A CN201810119263A CN108334847B CN 108334847 B CN108334847 B CN 108334847B CN 201810119263 A CN201810119263 A CN 201810119263A CN 108334847 B CN108334847 B CN 108334847B
- Authority
- CN
- China
- Prior art keywords
- face
- network
- image
- resolution
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4053—Super resolution, i.e. output image resolution higher than sensor resolution
- G06T3/4076—Super resolution, i.e. output image resolution higher than sensor resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
The present invention provides a face recognition method based on deep learning in real scenes. It addresses the limitation that existing face recognition methods for real scenes can only handle a single influencing factor, and instead mitigates the influence of multiple factors such as pose and illumination. The method comprises: predicting the face location in each picture of a training database using an existing face detector, and cropping and saving real face images and non-face images; down-sampling the face images and non-face images to obtain corresponding low-resolution images; building a generative adversarial network comprising a generator and a discriminator, where the generator further comprises an up-sampling network and a refinement network; training the generative adversarial network with the high-resolution face and non-face images and the corresponding low-resolution face and non-face images; and marking the face locations in the input image according to the discriminator's scores on the face candidate regions obtained from the existing face detector. The present invention is suitable for face recognition and detection.
Description
Technical field
The present invention relates to the field of face recognition, and in particular to a face recognition method based on deep learning for real scenes.
Background art
With the development of applications such as e-commerce, face recognition has become one of the most promising means of biometric identity verification. Such applications require an automatic face recognition system to be able to recognize facial images captured in real scenes, and the series of problems this raises made face detection attract researchers' attention early on as an independent topic. In addition, face recognition in real scenes has urgent application demands in numerous areas such as security, criminal investigation, and search and rescue.
Face detection has important basic research value in the field of machine vision as well as pressing application demands, and the related techniques keep evolving. However, most current face detection methods operate on posed images taken under ideal laboratory conditions. Such posed images share two characteristics: first, the face is large and located at the center of the image; second, the background is clean and simple. In images from real scenes, by contrast, faces are usually extremely small and the background is complex, while the images may also be affected by factors such as scale, pose, occlusion, expression, make-up, and illumination. Related studies have shown that these factors pose a great challenge to face recognition methods in real scenes and severely degrade the accuracy of face detection. To address these problems, a number of face recognition methods for real scenes have been proposed, but each of these methods can only handle a single influencing factor. For example, face recognition techniques based on generative adversarial networks can mitigate the influence of scale, but cannot handle other factors such as pose and illumination; face recognition methods based on face alignment can mitigate the influence of pose, but cannot handle other factors such as scale and blur.
Summary of the invention
The purpose of the present invention is to overcome the shortcoming that existing face recognition methods for real scenes can only handle a single influencing factor: methods based on generative adversarial networks cannot handle other factors such as pose and illumination, while methods based on face alignment can mitigate the influence of pose but cannot handle factors such as scale and blur. The invention therefore proposes a face recognition method based on deep learning for real scenes, comprising:
Step 1: establishing a training database;
Step 2: predicting the face location of each image in the training database with a face detector, cropping the images to obtain first high-resolution face images and first high-resolution non-face images, and processing the first high-resolution face images and first high-resolution non-face images to obtain low-resolution face images and low-resolution non-face images;
Step 3: building a generative adversarial network comprising a generator and a discriminator. The input of the generator is the low-resolution face and non-face images obtained in Step 2, and its output is second high-resolution face images and second high-resolution non-face images. The input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image. The generator further comprises a sequentially connected up-sampling network and refinement network: the up-sampling network is the input end of the generator, the output of the up-sampling network serves as the input of the refinement network, and the refinement network is the output end of the generator;
Step 4: training the generative adversarial network with the first high-resolution face images, first high-resolution non-face images, low-resolution face images, and low-resolution non-face images obtained in Step 2;
Step 5: inputting the image to be tested into the face detector to obtain face candidate regions, inputting the candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that its image is a face, and drawing in the input image the regions where p is greater than a preset threshold.
The beneficial effects of the invention are as follows:
1. It can simultaneously mitigate the influence of factors such as scale, occlusion, expression, make-up, and illumination, realizing face detection in real scenes, so that the test object is no longer limited to a single face against a simple background; in particular, the face detection technique is no longer restricted to posed images taken in laboratory scenes.
2. It overcomes the low recognition rate of current face detection techniques in real scenes, promoting the application of deep-learning-based face detection methods in real scenes.
3. The present invention introduces a refinement network, which solves the problem that the high-resolution images produced by the generator in a generative adversarial network are blurry, and also overcomes the difficulty that the images produced by the generator remain affected by the many influencing factors mentioned above.
4. Compared with the baseline face detector, the recognition rate of the proposed method is greatly improved, reaching the current best detection results of 94.4%/93.3%/87.3%.
Brief description of the drawings
Fig. 1 is an experimental plot of the influence of each factor on face detection accuracy, in which the highest point of each vertical line indicates the face recognition accuracy achievable by one embodiment of the present invention, the lowest point indicates the accuracy achievable with conventional methods, and the middle curve indicates, when processing images affected by scale, the mean effect achievable by one embodiment of the present invention and the prior art;
Fig. 2 is a schematic diagram of a generative adversarial network in the prior art;
Fig. 3 is a schematic diagram of the face recognition method based on deep learning in real scenes according to one embodiment of the invention, in which "The 1st Branch" refers to the 1st branch of the face detector and "The Kth Branch" refers to the K-th branch of the face detector; "Input" refers to the input; "Conv", "Conv1" ... "Conv5" are convolutional layers with different serial numbers; "Residual Blocks" are convolutional layers of the ResNet network used by one embodiment of the invention; "De-Conv" is a deconvolution layer; "Sigmoid" is the activation function; "LR" is the low-resolution image; "SR" is the high-resolution image generated by the generator from the low-resolution image; "HR" is the high-resolution real image; "Face" is a face image; "Non-Face" is a non-face image;
Fig. 4 is a flow chart of one embodiment of the invention.
Specific embodiment
Specific embodiment 1: the face recognition method based on deep learning in real scenes of this embodiment, as shown in Fig. 4, comprises:
Step 1: establishing a training database. For example, the WIDER FACE database can be used as the training database, or a training database can be built from the images in WIDER FACE whose face sizes are between 10 and 30 pixels; the benefit of this setting is that it addresses the more difficult problem of detecting small faces between 10 and 30 pixels. This embodiment also supports users building their own database by collecting images of real scenes.
Step 2: predicting the face location of each image in the training database with a face detector, and cropping to obtain first high-resolution face images and first high-resolution non-face images; then processing the first high-resolution face images and first high-resolution non-face images to obtain low-resolution face images and low-resolution non-face images. The face detector may specifically use a deep residual network with the ResNet-50 structure.
Step 2 may further be: using an existing face detector to predict the face location of each image in the training database, obtaining a predetermined number of marker boxes indicating face locations, and cropping the image according to the size and location of the marker boxes to obtain the first high-resolution face images and first high-resolution non-face images; then down-sampling the first high-resolution face images and first high-resolution non-face images by a factor of 4 using bilinear interpolation to obtain the low-resolution face images and low-resolution non-face images. "4x down-sampling" can be understood as representing a 32*32 region of pixels with 8*8 pixels.
Because the images in the training database of Step 1 are generally high-resolution, and the generator must produce high-resolution images from low-resolution inputs in the subsequent steps, Step 2 needs to process the high-resolution face and non-face images obtained directly from the database into low-resolution images.
Assuming the face location detected by the face detector is a rectangular box, the box can be represented by a 4-tuple composed of the horizontal and vertical coordinates of its upper-left corner and of its lower-right corner, which simultaneously expresses the position and the size of the box containing the face. Those skilled in the art can also conceive of other ways to express the face location; as long as the size and the position of the box can be expressed, the present invention imposes no restriction.
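The cropping and 4x bilinear down-sampling described above can be sketched as follows. This is a minimal illustration in plain Python on grayscale images stored as lists of rows; the helper names are illustrative, not from the patent.

```python
# Sketch of Step 2: crop a face region given a (x1, y1, x2, y2) box, then
# 4x-down-sample it with bilinear interpolation.

def crop(image, box):
    """Crop with a 4-tuple box: upper-left (x1, y1) and lower-right (x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def bilinear_downsample(image, factor=4):
    """Down-sample by `factor` in each dimension with bilinear interpolation."""
    h, w = len(image), len(image[0])
    nh, nw = h // factor, w // factor
    out = []
    for j in range(nh):
        # Map output pixel centres back into the input grid.
        y = (j + 0.5) * factor - 0.5
        y0 = min(max(int(y), 0), h - 1)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for i in range(nw):
            x = (i + 0.5) * factor - 0.5
            x0 = min(max(int(x), 0), w - 1)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bot = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

A 32*32 crop fed through `bilinear_downsample` with the default factor yields the 8*8 low-resolution counterpart used to train the generator.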
Step 3: building a generative adversarial network comprising a generator and a discriminator. The input of the generator is the low-resolution face and non-face images obtained in Step 2, and its output is second high-resolution face images and second high-resolution non-face images. The input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image is a face image, and the second output is the probability p2 that the input image is a real image. The generator further comprises a sequentially connected up-sampling network and refinement network: the up-sampling network is the input end of the generator, the output of the up-sampling network serves as the input of the refinement network, and the refinement network is the output end of the generator.
The network structure of the up-sampling network is: 1 convolutional layer with 64 kernels, kernel size 3, and stride 1; 8 convolutional layers with 64 kernels, kernel size 3, and stride 1; 1 convolutional layer with 64 kernels, kernel size 3, and stride 1; 1 deconvolution layer with 256 kernels, kernel size 3, and stride 2; 1 deconvolution layer with 256 kernels, kernel size 3, and stride 3; and 1 convolutional layer with 3 kernels, kernel size 1, and stride 1.
The network structure of the refinement network is: 1 convolutional layer with 64 kernels, kernel size 3, and stride 1; 8 convolutional layers with 64 kernels, kernel size 3, and stride 1; 1 convolutional layer with 64 kernels, kernel size 3, and stride 1; 1 deconvolution layer with 256 kernels, kernel size 3, and stride 2; 1 deconvolution layer with 256 kernels, kernel size 3, and stride 3; and 1 convolutional layer with 3 kernels, kernel size 3, and stride 1.
The network structure of the discriminator is: 1 convolutional layer with 64 kernels, kernel size 3, and stride 2; 1 convolutional layer with 128 kernels, kernel size 3, and stride 2; 1 convolutional layer with 256 kernels, kernel size 3, and stride 2; 1 convolutional layer with 512 kernels, kernel size 3, and stride 2; 1 convolutional layer with 512 kernels, kernel size 3, and stride 1; and 2 parallel convolutional layers, namely a first convolutional layer fcGAN for distinguishing whether the input image is a true high-resolution image or a high-resolution image synthesized by the generator, and a second convolutional layer fcclc for judging whether the input image is a face.
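As a quick sanity check on the layer lists above, the standard output-size formulas for convolution and deconvolution layers can be sketched in plain Python. Padding 1 and output padding 1 are assumptions, since the patent does not state padding values.

```python
# Spatial-size bookkeeping for the conv/deconv layers listed above
# (kernel size 3, padding 1 assumed throughout).

def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=3, stride=2, pad=1, out_pad=1):
    """Spatial output size of a deconvolution (transposed convolution) layer."""
    return (size - 1) * stride - 2 * pad + kernel + out_pad

def discriminator_feature_size(size):
    """The discriminator applies four stride-2 convs, then one stride-1 conv."""
    for stride in (2, 2, 2, 2, 1):
        size = conv_out(size, stride=stride)
    return size
```

With these assumptions, each stride-2 convolution halves the feature map (e.g. a 64-pixel input shrinks to 4 pixels after the discriminator's stack), and each stride-2 deconvolution doubles it.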
The loss function used by the generative adversarial network is
L(θ, ω1, ω2) = L_adv(θ, ω1, ω2) + α·L_pix(ω1, ω2) + β·L_cls(θ, ω1, ω2) + L_ref(ω2),
where L_adv denotes the adversarial loss function, L_pix denotes the pixel-level loss function, L_ref denotes the loss function of the refinement network, and L_cls denotes the classification loss function.
Here θ and ω are the network parameters of the discriminator and the generator respectively, and D_θ(·) and G_ω(·) are the function mappings of the discriminator and the generator respectively; I^LR and I^HR are the input low-resolution image and the corresponding high-resolution image respectively; y_i is the label of the input image, where y_i = 1 and y_i = 0 respectively indicate that the input image is a face and a non-face; α and β are the weight coefficients of the adversarial loss, pixel-level loss, and classification loss in the objective function; N is the total number of training samples; ω1 is the network parameter of the up-sampling network and ω2 is the network parameter of the refinement network, with G_ω1(·) the function mapping of the up-sampling network and G_ω2(·) the function mapping of the refinement network.
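Of the components above, the pixel-level loss and the weighted combination are the most mechanical; a minimal sketch in plain Python follows. The helper names and the flattened-image representation are illustrative, not from the patent.

```python
# Pixel-level loss: mean squared error between the generator's output
# G(I_LR) and the ground-truth high-resolution image I_HR, averaged over
# N training samples (images here are flat lists of pixel values).

def pixel_loss(generated_batch, target_batch):
    """Mean squared pixel error over a batch of flattened images."""
    n = len(generated_batch)
    total = 0.0
    for gen, tgt in zip(generated_batch, target_batch):
        total += sum((g - t) ** 2 for g, t in zip(gen, tgt)) / len(gen)
    return total / n

def total_objective(adv, pix, cls, alpha, beta):
    """Weighted combination mirroring L = L_adv + alpha*L_pix + beta*L_cls."""
    return adv + alpha * pix + beta * cls
```

The refinement-network loss term is added to this combination in the same weighted-sum fashion when the full objective of the invention is used.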
Step 4: training the generative adversarial network with the first high-resolution face images, first high-resolution non-face images, low-resolution face images, and low-resolution non-face images obtained in Step 2.
Step 5: inputting the image to be tested into the face detector to obtain face candidate regions, inputting the candidate regions into the trained generative adversarial network to obtain, for each candidate region, the probability p that its image is a face, and drawing in the input image the regions where p is greater than a preset threshold. The probability p here is the probability produced during actual testing, while p1 and p2 in Step 3 are probabilities produced during training; their meanings differ.
For example, after an image is input into the face detector, the detector crops the face images and records their coordinate information in the original input image. Each output probability p1 corresponds to the position of one face image, which can be recorded with a 5-tuple (x1, y1, x2, y2, p1), where x1 and y1 may be the upper-left coordinates and x2 and y2 the lower-right coordinates of the rectangular box containing the face. It is then determined whether p1 in the 5-tuple meets a certain threshold condition; if it does, the face location is marked in the original input image according to the coordinate information x1, y1, x2, y2.
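The 5-tuple thresholding just described can be sketched in a few lines. The threshold value is illustrative; the patent leaves it as a free parameter.

```python
# Keep only the candidate boxes whose face probability exceeds the preset
# threshold; the surviving (x1, y1, x2, y2) boxes are drawn on the input image.

def filter_detections(candidates, threshold=0.5):
    """candidates: list of (x1, y1, x2, y2, p) tuples; keep boxes with p > threshold."""
    return [(x1, y1, x2, y2) for (x1, y1, x2, y2, p) in candidates if p > threshold]
```

Lowering the threshold trades precision for recall, which is why it is left configurable.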
The design approach and principle of the invention are further described below.
The present invention is mainly directed at the deficiencies of existing face detection techniques: it solves the problem that deep-learning-based face detection can only mitigate one of the many influencing factors present in real scenes, overcomes the low recognition rate of existing face detection techniques in real scenes, and provides a face recognition method based on deep learning for real scenes. The method can simultaneously overcome the influence of factors such as scale, pose, occlusion, expression, make-up, and illumination on facial images, realizing face detection in real scenes, so that the test object is no longer limited to a single face against a simple background; in particular, the face detection technique is no longer restricted to posed images taken in laboratory scenes. On the basis of face recognition techniques built on generative adversarial networks, the method introduces a refinement network (Refinement Network), which solves the problem that the high-resolution images produced by the generator in a generative adversarial network are blurry, and also overcomes the difficulty that the images produced by the generator remain affected by the many influencing factors mentioned above.
The present invention takes images (video frames) in real scenes as its research object. In real scenes, besides being affected by factors such as scale, pose, occlusion, expression, make-up, and illumination, facial images have the following characteristics: first, the image capture device is far from the face, so the face is extremely small; second, quick movement of the camera or of the person makes the captured image relatively blurry. For these problems, the most representative current solution is the face recognition technique based on generative adversarial networks, but the high-resolution images produced by a generative adversarial network are usually relatively blurry, for the following reasons: first, since the objective function of the generator is usually the least-squares error between the pixels of the generated image and the input image, the generated image is overly smooth; second, when the target face is too small (10*10 pixels), the image generated by the generative adversarial network usually lacks detail information; third, when the input image is itself distorted (e.g., quick movement causes image distortion), the generative adversarial network cannot generate a correspondingly clear high-resolution image.
To address the above problems of face recognition techniques based on generative adversarial networks, the present invention introduces a refinement network into the generator of the generative adversarial network; the structure of the refinement network is shown in Table 1. The main roles of the refinement network are: first, for the blurry high-resolution images produced by the generative adversarial network, the refinement network performs deblurring; second, when the input image is affected by factors such as occlusion, illumination, make-up, and expression, the refinement network can remove occlusion, remove illumination effects, remove make-up, and neutralize expression. For images severely blurred by factors such as illumination, make-up, and expression, a correspondingly clear image can be obtained after processing by the refinement network, and performing face discrimination on these clear, unaffected images becomes much simpler.
In addition, the present invention introduces the loss function of the refinement network into the objective function of the generative adversarial network, so that the high-resolution images produced by the generator contain more detail information and are clearer, making it easier for the discriminator to determine whether the input image is a face image. In face recognition techniques based on generative adversarial networks, the objective function consists of an adversarial loss (adversarial loss), a pixel-wise loss (pixel-wise loss), and a classification loss (classification loss), and can be expressed as
L(θ, ω) = L_adv(θ, ω) + α·L_pix(ω) + β·L_cls(θ, ω),
where the first term is the adversarial loss, the second term is the pixel-wise loss, and the third term is the classification loss; θ and ω are the network parameters of the discriminator and generator respectively; D_θ(·) and G_ω(·) are the function mappings of the discriminator and generator respectively; I^LR and I^HR are the input low-resolution image and the corresponding high-resolution image respectively; y_i is the label of the input image (y_i = 1 and y_i = 0 respectively indicate that the input image is a face and a non-face); α and β are the weight coefficients of the adversarial loss, pixel-wise loss, and classification loss in the objective function; and N is the total number of training samples. To make the images produced by the generator network clearer, while mitigating the influence of factors such as occlusion, expression, make-up, and illumination, the present invention modifies the objective function of the generative adversarial network by introducing the loss function of the refinement network into the objective function of the whole network; the new objective function can be expressed as
L(θ, ω1, ω2) = L_adv(θ, ω1, ω2) + α·L_pix(ω1, ω2) + β·L_cls(θ, ω1, ω2) + L_ref(ω2),
where L_ref is the loss function of the refinement network, ω1 is the network parameter of the up-sampling network, ω2 is the network parameter of the refinement network, G_ω1(·) is the function mapping of the up-sampling network, and G_ω2(·) is the function mapping of the refinement network.
In summary, the present invention introduces a refinement network into the face recognition technique based on generative adversarial networks, proposing a novel face detection framework of "generative adversarial network + refinement network". In addition, taking into account the deficiencies of existing face detection techniques based on generative adversarial networks and the actual demands of face detection in real scenes, the present invention modifies the objective function of the existing face detection method based on generative adversarial networks. In the present invention, the input of the generative adversarial network is a low-resolution image affected by the various influencing factors of a real scene; the output of the generator is a blurry high-resolution image still affected by those influencing factors, and this high-resolution image serves as the input of the refinement network, whose output is a clear high-resolution image free of those influences. The face recognition method based on deep learning in real scenes of the present invention eliminates the influence of factors such as scale, occlusion, expression, make-up, and illumination on facial images in real scenes, improves the accuracy of face detection in real scenes, solves the problem that existing deep-learning face detection techniques are not suited to face detection in real scenes, promotes the development of face detection techniques for real scenes, and plays a certain role in pushing face detection technology from the laboratory toward practical application.
Table 1 is the structure chart of the refinement network, where "Conv" refers to a convolutional layer and "x8" refers to 8 identical convolutional layers.
<embodiment>
The present invention is further explained with specific embodiments. As shown in Fig. 3, training samples are first prepared according to actual demand (the present invention uses the existing WIDER FACE database), then a face detector is trained with the prepared training samples; the present invention directly adopts an existing MB-FCN face detector. The trained face detector then predicts the face location in each image of the training set, and face and non-face (background) images are cropped according to the predicted face location information; the obtained face and non-face images serve as the training samples of the generative adversarial network. Finally, these cropped face and non-face images are used as training samples to train the generative adversarial network, in which the generator learns to produce a corresponding high-resolution image from the low-resolution input image, the refinement network further learns, on the basis of the high-resolution image produced by the generating network, to obtain a clear high-resolution image unaffected by factors such as pose, occlusion, expression, make-up, and illumination, and the discriminator gives a more accurate face detection result based on the finally generated clear high-resolution image. Each part is described in detail below.
First, the training samples are prepared. Training sample images can be collected according to actual needs to build a corresponding database, or an existing public face detection database such as WIDER FACE or FDDB can be selected. For convenient comparison with other existing methods, the present invention uses the images in the WIDER FACE data set as training and test samples. WIDER FACE is a face detection benchmark database whose images are selected from the published WIDER data set and were captured in real scenes; many of the faces are extremely small (between 10 and 30 pixels), and these faces are also affected by factors such as occlusion, expression, make-up, and illumination, so face detection in such real scenes poses a huge challenge to existing face detection methods. The WIDER FACE data set contains 32203 images and 393703 face images, and the whole data set is organized around 61 event types; for each event type, 40% of the data are randomly selected as the training set, 10% as the validation set, and 50% as the test set. Meanwhile, the WIDER FACE data set divides all images into three classes according to the size of the face images (50/30/10): Easy, Medium, and Hard. The present invention mainly overcomes the difficulty that those influencing factors cause for face detection in real scenes, thereby improving the accuracy of face detection in real scenes.
Next, a face detector is trained on the prepared training samples. Its role is to crop training samples for the subsequent generative adversarial network, so its quality directly affects the quality of those samples. Any existing face detector can be used here; the present invention also uses this detector as a baseline against which the improvement in face detection accuracy is measured. Since this face detector is not the focus of the invention, a ready-made detector (MB-FCN) with a ResNet-50 backbone network is used. To detect faces at multiple scales, the MB-FCN detector has multiple output branches, each of which handles face recognition within a certain scale range. In addition, to detect tiny faces, the MB-FCN detector uses feature fusion, i.e., shallow features from the lower convolutional layers (containing rich detail information) are fused with deep features from the higher layers (containing rich semantic information). In the present invention, the MB-FCN detector generates training samples for the generative adversarial network as follows: for the training samples, the detector predicts face location information for each image in the WIDER FACE training set, and 600 regions that may contain faces are cropped from each image and saved; these saved patches become the training samples of the generative adversarial network. For the test samples, each image in the test set is likewise processed with the MB-FCN detector, and 600 regions that may contain faces are cropped from each image and saved. These saved patches are finally passed through the generator of the generative adversarial network to obtain the corresponding high-resolution (4x upsampled) images, then through the optimization network to obtain clearer images, and finally the discriminator decides whether these high-resolution images are face images, thereby realizing face detection in real scenes.
Construct the training samples of the generative adversarial network. For each of the possibly-face regions cropped above, the present invention first computes the overlap (intersection over union, IoU) between the cropped region and the manually annotated ground-truth bounding boxes: if the IoU is greater than 0.5 the patch is labeled a positive sample (face), and if the IoU is less than 0.45 it is labeled a negative sample (non-face, i.e., background). Using this method, 1,075,968 positive samples and 1,626,328 negative samples were obtained. Since the generative adversarial network in the present invention realizes 4x upsampling, corresponding pairs of low-resolution and high-resolution images are needed as training samples. Here, the patches cropped by the MB-FCN detector serve as the high-resolution images, and the images obtained by downsampling these patches by a factor of 4 with bilinear interpolation serve as the corresponding low-resolution images.
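The labeling step above can be sketched in plain Python/NumPy. The IoU thresholds (0.5 / 0.45) and the 4x downsampling factor come from the text; the box format, the helper names, and the use of block averaging in place of true bilinear interpolation are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def label_candidate(box, ground_truths):
    """1 = positive (face), 0 = negative (background), None = discarded.

    Thresholds follow the text: IoU > 0.5 is positive, IoU < 0.45 is
    negative; candidates in between are left unlabeled.
    """
    best = max((iou(box, gt) for gt in ground_truths), default=0.0)
    if best > 0.5:
        return 1
    if best < 0.45:
        return 0
    return None

def downsample4(patch):
    """4x downsampling by 4x4 block averaging -- a simple stand-in for
    the bilinear interpolation used in the patent (illustrative only)."""
    h, w = patch.shape[:2]
    return patch[:h - h % 4, :w - w % 4].reshape(
        h // 4, 4, w // 4, 4, -1).mean(axis=(1, 3))
```

A high-resolution patch and its `downsample4` output then form one low/high-resolution training pair for the generative adversarial network.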
Generator.The effect of generator is to generate corresponding high-resolution according to the training study of the image of the low resolution of input
Rate image, so that discriminator, which passes through the high-definition picture generated, more easily determines the low-resolution image inputted
Classification (face/non-face).Generator is a deep learning network in the present invention, wherein including two deconvolution networks, often
2 times of up-samplings of a deconvolution network implementations, the resolution ratio of the output image of generator network entire in this way will be input picture
4 times.
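The overall 4x factor can be checked with the standard transposed-convolution output-size relation. Kernel size 3 and stride 2 match the deconvolution layers listed in the claims; the padding and output-padding values are assumptions chosen so that each layer exactly doubles the spatial resolution — a sketch, not the patent's exact configuration.

```python
def deconv_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Spatial output size of a transposed convolution (PyTorch convention):
    out = (in - 1) * stride - 2 * padding + kernel + output_padding."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

low = 16                    # e.g. a 16x16 low-resolution face patch
mid = deconv_out(low)       # first deconvolution sub-network: 2x
high = deconv_out(mid)      # second deconvolution sub-network: 2x again
assert high == 4 * low      # the generator's overall 4x upsampling
```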
Optimization network. The optimization network serves two main functions: first, the relatively blurry image produced by the generator becomes clearer after passing through it, i.e., it performs deblurring; second, it reduces the influence of factors such as occlusion, expression, make-up, and illumination, achieving effects such as de-occlusion, expression neutralization, make-up removal, and illumination removal. As shown in Table 1, the optimization network is also a deep learning network; it contains 8 convolution modules in residual form and 4 convolutional layers, and the stride of every convolutional layer is 1, so the resolution of the optimization network's output image is the same as that of the generator's output image — the optimization network's output is simply clearer.
Discriminator.The master network structure of discriminator uses VGG19 network.It is excessive in order to avoid being brought by convolutional calculation
Down-sampling operation, the present invention eliminate the pond layer (max-pooling layer) in " conv5 ".In addition, identifying to realize
Device can differentiate that input picture is true high-definition picture or the image (true/false) for having generator to synthesize and sentences simultaneously
Disconnected input picture is face (face/non-face), and present invention removes the full articulamentums of whole in VGG19 network, i.e.,
Fc6, fc7, fc8, and replace with two parallel volume bases, respectively fcGAN, fcclc.Wherein fcGANEffect be differentiate input
Image is true high-definition picture or the high-definition picture (true/false) for having generator synthesis, fcclcEffect be sentence
Disconnected input picture is face (face/non-face).
Train the generative adversarial network. With the generator, optimization network, and discriminator built and the positive and negative training samples labeled, the generative adversarial network can be trained. The present invention trains the whole network by letting the generator-plus-optimization network and the discriminator network play against each other under alternating optimization. The generator-plus-optimization network randomly samples from the low-resolution sample set as input, and its output must imitate the real samples in the high-resolution sample set as closely as possible. The input of the discriminator network is either a real high-resolution sample or a synthetic high-resolution image output by the generator-plus-optimization network; its purpose is to distinguish that output from the real samples as well as possible while also determining whether the high-resolution image is a face, whereas the generator-plus-optimization network tries as much as possible to fool the discriminator. The two networks compete and continuously adjust their parameters; the ultimate goal is that the discriminator cannot judge whether the output of the generator-plus-optimization network is real, so that the generator-plus-optimization network produces clear high-resolution images while the discriminator accurately determines whether the input image is a face. In the present invention, the parameters of the generator and optimization networks are trained from scratch: convolution kernel parameters (weights) are initialized from a Gaussian distribution with standard deviation 0.02, and biases are initialized to 0. To prevent the generator-plus-optimization network from getting stuck in a local optimum, the present invention first trains a generator-plus-optimization network using the pixel-wise least-squares error between the input and output images as the objective function, and then uses the trained model to initialize the network parameters of the generator-plus-optimization network. The parameters of the discriminator are initialized from a model pre-trained on the ImageNet data set; the newly added layers fc_GAN and fc_clc are initialized from a Gaussian distribution with standard deviation 0.1, with biases initialized to 0. In addition, the present invention introduces the loss function of the optimization network into the objective function so that the output image of the generator-plus-optimization network is clearer, making it easier for the discriminator to distinguish real/fake and face/non-face. When training the whole network, each mini-batch contains 64 images with a 1:1 ratio of positive to negative samples; training runs for 6 epochs in total, with a learning rate of 0.0001 for the first 3 epochs and 0.00001 for the last 3 epochs.
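The training schedule described above (mini-batches of 64 with a 1:1 positive/negative split, 6 epochs, learning rate dropping from 1e-4 to 1e-5 after epoch 3) can be sketched as a small helper; the function and constant names are illustrative.

```python
def learning_rate(epoch, switch=3, base=1e-4, decayed=1e-5):
    """Learning-rate schedule from the text: 1e-4 for the first 3 epochs
    (zero-indexed epochs 0-2), 1e-5 for the last 3."""
    return base if epoch < switch else decayed

BATCH_SIZE = 64
POS_PER_BATCH = BATCH_SIZE // 2   # 1:1 positive / negative ratio

schedule = [learning_rate(e) for e in range(6)]
```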
The face detection network trained through the above steps is not subject to the low accuracy that existing face detection techniques suffer under the influence of factors such as scale, occlusion, expression, make-up, and illumination, and realizes face detection in real scenes. Experiments show that the proposed "face identification method based on deep learning under a real scene" locates faces accurately and detects them efficiently. Table 2 gives the comparative experimental results, where mAP is the mean Average Precision, an index for assessing the quality of the trained network. The comparison shows that the proposed method greatly improves on the recognition rate of the baseline face detector, reaching the currently best detection results of 94.4%/93.3%/87.3%. Meanwhile, a comparative experiment (without refinement network vs. ours) shows that without the optimization network the face detection rate drops by 0.4%/0.4%/1.0% on Easy/Medium/Hard respectively, which proves that the optimization network plays an important role in face detection in real scenes.
Table 2: comparative experimental results
The present invention may also have various other embodiments. Without departing from the spirit and substance of the present invention, those skilled in the art may make various corresponding changes and modifications in accordance with the present invention, but all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.
Claims (6)
1. A face identification method based on deep learning under a real scene, comprising:
Step 1: establishing a training database;
Step 2: using a face detector to predict the face location in each image of the training database, and cropping to obtain first high-resolution face images and first high-resolution non-face images; and processing the first high-resolution face images and first high-resolution non-face images to obtain corresponding low-resolution face images and low-resolution non-face images;
Step 3: constructing a generative adversarial network comprising a generator and a discriminator; wherein the input of the generator is the low-resolution face images and low-resolution non-face images obtained in step 2, and its output is second high-resolution face images and second high-resolution non-face images; the input of the discriminator is the first high-resolution face images, the first high-resolution non-face images, the second high-resolution face images, and the second high-resolution non-face images; the first output of the discriminator is the probability p1 that the input image belongs to a face image, and the second output is the probability p2 that the input image is a real image; wherein the generator further comprises a sequentially connected up-sampling network and optimization network, the up-sampling network being the input end of the generator, the output of the up-sampling network serving as the input of the optimization network, and the optimization network being the output end of the generator;
the network structure of the discriminator is:
1 convolutional layer with 64 convolution kernels, kernel size 3, and stride 2;
1 convolutional layer with 128 convolution kernels, kernel size 3, and stride 2;
1 convolutional layer with 256 convolution kernels, kernel size 3, and stride 2;
1 convolutional layer with 512 convolution kernels, kernel size 3, and stride 2;
1 convolutional layer with 512 convolution kernels, kernel size 3, and stride 1; and
2 parallel convolutional layers, namely a first convolutional layer fc_GAN for determining whether the input image is a real high-resolution image or a high-resolution image synthesized by the generator, and a second convolutional layer fc_clc for determining whether the input image is a face;
Step 4: training the generative adversarial network using the first high-resolution face images, first high-resolution non-face images, low-resolution face images, and low-resolution non-face images obtained in step 2;
Step 5: inputting an image to be tested into the face detector to obtain face candidate regions, inputting the face candidate regions into the trained generative adversarial network to obtain the probability p that the image of each candidate region is a face, and drawing in the input image the regions where p is greater than a preset threshold;
characterized in that the loss function of the generative adversarial network combines an adversarial loss function, a pixel-level loss function, the loss function of the optimization network, and a classification loss function;
wherein θ and ω are the network parameters of the discriminator and the generator respectively, and D_θ(·) and G_ω(·) are the functions of the discriminator and the generator respectively; the inputs are the low-resolution image and the corresponding high-resolution image respectively; y_i is the label of the input image, with y_i = 1 and y_i = 0 representing that the input image is a face and a non-face respectively; α and β are the weighting coefficients of the adversarial loss function, the pixel-level loss function, and the classification loss function in the objective function; N is the total number of training samples; ω1 is the network parameter of the up-sampling network, ω2 is the network parameter of the optimization network, and G_ω1(·) and G_ω2(·) are the functions of the up-sampling network and the optimization network respectively.
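The loss-function formulas are elided in this text. A plausible reconstruction of the objective, consistent with the terms named in claim 1 but an assumption rather than the patent's verbatim equation (it follows the usual SRGAN-style combination of adversarial, pixel-wise, refinement, and classification terms), is:

```latex
\max_{\theta}\min_{\omega_1,\omega_2}\;
\frac{1}{N}\sum_{i=1}^{N}\Big[
  \alpha\big(\log D_{\theta}(I_i^{HR})
    + \log\big(1 - D_{\theta}(G_{\omega}(I_i^{LR}))\big)\big)
  + \big\|G_{\omega_1}(I_i^{LR}) - I_i^{HR}\big\|_2^{2}
  + \big\|G_{\omega_2}\big(G_{\omega_1}(I_i^{LR})\big) - I_i^{HR}\big\|_2^{2}
  + \beta\, L_{\mathrm{clc}}\big(y_i,\, D_{\theta}(G_{\omega}(I_i^{LR}))\big)
\Big]
```

where the four terms correspond, in order, to the adversarial, pixel-level, optimization-network, and classification losses, and L_clc would be a binary cross-entropy over the face/non-face label y_i; the symbols I_i^{LR}, I_i^{HR}, and L_clc are notational assumptions.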
2. The face identification method based on deep learning under a real scene according to claim 1, characterized in that in step 1 the WIDER FACE database is used as the training database.
3. The face identification method based on deep learning under a real scene according to claim 1, characterized in that in step 1 the training database is constructed from images in the WIDER FACE database in which the face size is between 10 and 30 pixels.
4. The face identification method based on deep learning according to any one of claims 1 to 3, characterized in that step 2 specifically comprises:
using the face detector to predict the face location in each image of the training database, obtaining a predetermined number of marking boxes indicating face locations, and cropping according to the size and position of the marking boxes in the image to obtain the first high-resolution face images and the first high-resolution non-face images;
down-sampling the first high-resolution face images and the first high-resolution non-face images by a factor of 4 using bilinear interpolation to obtain the corresponding low-resolution face images and low-resolution non-face images.
5. The face identification method based on deep learning according to claim 4, characterized in that the face detector in step 2 is a deep residual network with a ResNet-50 structure.
6. The face identification method based on deep learning according to claim 1, characterized in that in step 3, the network structure of the up-sampling network is:
1 convolutional layer with 64 convolution kernels, kernel size 3, and stride 1;
8 convolutional layers with 64 convolution kernels, kernel size 3, and stride 1;
1 convolutional layer with 64 convolution kernels, kernel size 3, and stride 1;
1 deconvolutional layer with 256 convolution kernels, kernel size 3, and stride 2;
1 deconvolutional layer with 256 convolution kernels, kernel size 3, and stride 3; and
1 convolutional layer with 3 convolution kernels, kernel size 1, and stride 1;
and the network structure of the optimization network is:
1 convolutional layer with 64 convolution kernels, kernel size 3, and stride 1;
8 convolutional layers with 64 convolution kernels, kernel size 3, and stride 1;
1 convolutional layer with 64 convolution kernels, kernel size 3, and stride 1;
1 deconvolutional layer with 256 convolution kernels, kernel size 3, and stride 2;
1 deconvolutional layer with 256 convolution kernels, kernel size 3, and stride 3; and
1 convolutional layer with 3 convolution kernels, kernel size 3, and stride 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810119263.2A CN108334847B (en) | 2018-02-06 | 2018-02-06 | A kind of face identification method based on deep learning under real scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108334847A CN108334847A (en) | 2018-07-27 |
CN108334847B true CN108334847B (en) | 2019-10-22 |
Family
ID=62928509
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810119263.2A Active CN108334847B (en) | 2018-02-06 | 2018-02-06 | A kind of face identification method based on deep learning under real scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334847B (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11263525B2 (en) | 2017-10-26 | 2022-03-01 | Nvidia Corporation | Progressive modification of neural networks |
US11250329B2 (en) | 2017-10-26 | 2022-02-15 | Nvidia Corporation | Progressive modification of generative adversarial neural networks |
CN109359559B (en) * | 2018-09-27 | 2021-11-12 | 天津师范大学 | Pedestrian re-identification method based on dynamic shielding sample |
CN111222505A (en) * | 2018-11-25 | 2020-06-02 | 杭州凝眸智能科技有限公司 | Method and system for accurately detecting micro target |
CN109685863A (en) * | 2018-12-11 | 2019-04-26 | 帝工(杭州)科技产业有限公司 | A method of rebuilding medicine breast image |
CN109784349B (en) * | 2018-12-25 | 2021-02-19 | 东软集团股份有限公司 | Image target detection model establishing method, device, storage medium and program product |
CN109753946A (en) * | 2019-01-23 | 2019-05-14 | 哈尔滨工业大学 | A kind of real scene pedestrian's small target deteection network and detection method based on the supervision of body key point |
CN109815893B (en) * | 2019-01-23 | 2021-03-26 | 中山大学 | Color face image illumination domain normalization method based on cyclic generation countermeasure network |
CN110197205B (en) * | 2019-05-09 | 2022-04-22 | 三峡大学 | Image identification method of multi-feature-source residual error network |
CN111242837B (en) * | 2020-01-03 | 2023-05-12 | 杭州电子科技大学 | Face anonymity privacy protection method based on generation countermeasure network |
CN111414888A (en) * | 2020-03-31 | 2020-07-14 | 杭州博雅鸿图视频技术有限公司 | Low-resolution face recognition method, system, device and storage medium |
CN111951373B (en) * | 2020-06-30 | 2024-02-13 | 重庆灵翎互娱科技有限公司 | Face image processing method and equipment |
CN112069993B (en) * | 2020-09-04 | 2024-02-13 | 西安西图之光智能科技有限公司 | Dense face detection method and system based on five-sense organ mask constraint and storage medium |
CN112288044B (en) * | 2020-12-24 | 2021-07-27 | 成都索贝数码科技股份有限公司 | News picture attribute identification method of multi-scale residual error network based on tree structure |
CN113221626B (en) * | 2021-03-04 | 2023-10-20 | 北京联合大学 | Human body posture estimation method based on Non-local high-resolution network |
CN113705341A (en) * | 2021-07-16 | 2021-11-26 | 国家石油天然气管网集团有限公司 | Small-scale face detection method based on generation countermeasure network |
CN113553961B (en) * | 2021-07-27 | 2023-09-05 | 北京京东尚科信息技术有限公司 | Training method and device of face recognition model, electronic equipment and storage medium |
CN113470027B (en) * | 2021-09-03 | 2022-03-25 | 广东电网有限责任公司惠州供电局 | Insulating sheath identification method, device, system and medium based on generation countermeasure |
CN113688799B (en) * | 2021-09-30 | 2022-10-04 | 合肥工业大学 | Facial expression recognition method for generating confrontation network based on improved deep convolution |
CN114359113A (en) * | 2022-03-15 | 2022-04-15 | 天津市电子计算机研究所有限公司 | Detection method and application system of face image reconstruction and restoration method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107154023A (en) * | 2017-05-17 | 2017-09-12 | 电子科技大学 | Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution |
CN107194380A (en) * | 2017-07-03 | 2017-09-22 | 上海荷福人工智能科技(集团)有限公司 | The depth convolutional network and learning method of a kind of complex scene human face identification |
CN107423701A (en) * | 2017-07-17 | 2017-12-01 | 北京智慧眼科技股份有限公司 | The non-supervisory feature learning method and device of face based on production confrontation network |
CN107437077A (en) * | 2017-08-04 | 2017-12-05 | 深圳市唯特视科技有限公司 | A kind of method that rotation face based on generation confrontation network represents study |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292813B (en) * | 2017-05-17 | 2019-10-22 | 浙江大学 | A kind of multi-pose Face generation method based on generation confrontation network |
CN107341517B (en) * | 2017-07-07 | 2020-08-11 | 哈尔滨工业大学 | Multi-scale small object detection method based on deep learning inter-level feature fusion |
CN107577985B (en) * | 2017-07-18 | 2019-10-15 | 南京邮电大学 | The implementation method of the face head portrait cartooning of confrontation network is generated based on circulation |
2018-02-06: CN CN201810119263.2A patent/CN108334847B/en, status Active
Non-Patent Citations (2)
Title |
---|
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network; Christian Ledig et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017-11-09; full text *
Face Recognition Development Based on Generative Adversarial Networks (基于生成式对抗网络的人脸识别开发); Zhang Wei et al.; Electronic World (电子世界); 2017-10-23; full text *
Also Published As
Publication number | Publication date |
---|---|
CN108334847A (en) | 2018-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108334847B (en) | A kind of face identification method based on deep learning under real scene | |
CN108334848A (en) | A kind of small face identification method based on generation confrontation network | |
CN107154023B (en) | Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution | |
CN105069746B (en) | Video real-time face replacement method and its system based on local affine invariant and color transfer technology | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN110287960A (en) | The detection recognition method of curve text in natural scene image | |
CN110175613A (en) | Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models | |
CN111898406B (en) | Face detection method based on focus loss and multitask cascade | |
CN109711288A (en) | Remote sensing ship detecting method based on feature pyramid and distance restraint FCN | |
CN110059694A (en) | The intelligent identification Method of lteral data under power industry complex scene | |
CN109241871A (en) | A kind of public domain stream of people's tracking based on video data | |
CN108564049A (en) | A kind of fast face detection recognition method based on deep learning | |
CN106446930A (en) | Deep convolutional neural network-based robot working scene identification method | |
CN110189255A (en) | Method for detecting human face based on hierarchical detection | |
CN109753946A (en) | A kind of real scene pedestrian's small target deteection network and detection method based on the supervision of body key point | |
CN110232387A (en) | A kind of heterologous image matching method based on KAZE-HOG algorithm | |
Hsu et al. | Deep hierarchical network with line segment learning for quantitative analysis of facial palsy | |
CN109766873A (en) | A kind of pedestrian mixing deformable convolution recognition methods again | |
CN114758362B (en) | Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding | |
CN109272487A (en) | The quantity statistics method of crowd in a kind of public domain based on video | |
CN109584162A (en) | A method of based on the image super-resolution reconstruct for generating network | |
CN110287806A (en) | A kind of traffic sign recognition method based on improvement SSD network | |
CN114926747A (en) | Remote sensing image directional target detection method based on multi-feature aggregation and interaction | |
CN106446890A (en) | Candidate area extraction method based on window scoring and superpixel segmentation | |
CN107767416A (en) | The recognition methods of pedestrian's direction in a kind of low-resolution image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||