CN109614866A - Face detection method based on cascaded deep convolutional neural networks - Google Patents
- Publication number
- CN109614866A (application CN201811326169.0A)
- Authority
- CN
- China
- Prior art keywords
- face
- training sample
- loss function
- layer
- neural networks
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06V40/168: Feature extraction; Face representation
- G06F18/2148: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
- G06F18/24: Classification techniques
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06V40/172: Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a face detection method based on cascaded deep convolutional neural networks, comprising: establishing n stages of deep convolutional networks; padding the training samples and performing convolution; taking some or all of the resulting feature maps as input to the next convolutional layer and performing convolution; repeating the previous step up to the (n+1)th convolutional layer; serializing the feature maps output by the (n+1)th convolutional layer into a high-dimensional vector and fully connecting that vector to the nodes of the fully connected layer; outputting face box coordinates and a face quality assessment score; obtaining a face detection loss function, an image quality assessment loss function and a total loss function; computing the loss of each training sample and updating the sample weights; and training the cascaded deep convolutional neural network. The invention solves face detection and face image quality assessment simultaneously, and improves performance while also increasing processing speed.
Description
Technical field
The present invention relates to the field of face detection, and in particular to a face detection method based on cascaded deep convolutional neural networks.
Background art
Face detection (Face Detection) is the process of searching an input image with some strategy to determine whether it contains faces and, if so, returning the position and size of each face. Face detection is the basis of face recognition and its first step, and the quality of the face images found at the detection stage directly affects recognition accuracy. In the traditional face recognition pipeline, face detection and face image quality assessment are performed as two separate stages: faces are detected first, and the quality of the detected face images is assessed afterwards. This approach has two shortcomings. First, it consumes more processing time, and in many application scenarios processing time is a critical metric. Second, face detection itself needs image quality as a basis for its decisions, so the two tasks are correlated; handling them separately discards that correlation.
Face detection methods can be divided into four major categories:
1) Knowledge-based methods.
These methods encode researchers' knowledge of faces into recognition rules. For example, a face in an image usually has two mutually symmetric eyes, a nose and a mouth, and the relationships between these features can be described by their distances and relative positions. The difficulty is that human knowledge is hard to convert into well-defined rules: if the rules are too detailed, the detection rate is low; if they are too general, the false detection rate is high.
2) Invariant-feature methods.
These methods rest on the observation that people can easily recognize faces under different poses, viewing angles and lighting conditions, so an image can be assumed to contain invariant features that do not change with pose, angle or illumination. The method first extracts facial features such as eyebrows, eyes, nose and mouth, and then builds a statistical model describing the relationships among these features to confirm the presence of a face. The problem is that when the image contains illumination changes, noise or occlusion, the extracted features may become unreliable.
3) Template matching methods.
These methods predefine a standard face pattern and detect faces by computing the correlation between regions of the input image and that standard pattern. They are simple to implement but cannot effectively handle variations in scale, pose and shape.
4) Machine-learning-based methods.
These methods learn knowledge of faces and non-faces from training images and use it to detect faces. Whereas in template matching the face template is predefined by experts, here the patterns are learned from images. The open questions for such methods are how to choose the features and the machine learning algorithm.
Image quality assessment (Image Quality Assessment, IQA) is one of the basic techniques in image processing. It evaluates how good an image is, or how distorted it is, by analysing the image's characteristics. Image quality assessment plays an important role in image applications such as face authentication (Face Authentication): a person registered in a face database may fail verification because the image captured by the camera is of poor quality (for example blurred, skewed or distorted). Image quality assessment methods are either subjective or objective. Subjective methods are divided into absolute and relative evaluation. Objective methods are divided into three types: full-reference (Full-Reference), reduced-reference (Reduced-Reference) and no-reference (No-Reference).
In 1989, LeCun invented the convolutional neural network LeNet, used mainly for handwritten digit recognition; it achieved good results but attracted little attention. In 2006, Geoffrey Hinton proposed the deep belief network (Deep Belief Net), which offered a solution to the vanishing gradient problem in training deep networks: the weights are initialized by unsupervised pre-training and then fine-tuned by supervised training. In 2011, the ReLU activation function was applied to deep networks, effectively mitigating the vanishing gradient problem. In 2012, regularization techniques and Dropout made deep learning algorithms more stable and better performing. In the same year, to demonstrate the potential of deep learning, Hinton's group entered the ImageNet image recognition competition with AlexNet, an 8-layer convolutional neural network, and won. Revitalized by deep learning, neural networks again became a focal technology in artificial intelligence and information fields, and deep neural networks have become an effective general-purpose learning framework.
For face detection to be practical, it must meet high requirements for both accuracy and speed, and neither can be sacrificed. Through years of sustained effort by researchers, a large number of face detection methods have emerged, mainly of three kinds:
1) Face detection based on cascade classifiers, such as the Adaboost-based face detection algorithm proposed by Paul Viola and Michael Jones in 2001.
2) Face detection based on DPM (deformable part models), an algorithm that divides a face into several parts and detects each part.
3) Methods based on deep neural networks, such as DDFD (Deep Dense Face Detector) and R-CNN.
The Viola-Jones method is fast, but its performance is not good enough. The DPM method is slow, and its performance is not the best either. Methods based on deep neural networks achieve the best performance, with speed in the middle of the three classes. Deep neural networks, however, have deep structures, which leads to an enormous number of training parameters and requires a large amount of processing time.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide a face detection method based on cascaded deep convolutional neural networks that solves face detection and face image quality assessment simultaneously and improves performance while also increasing processing speed.
The technical solution adopted by the present invention to solve this technical problem is a face detection method based on cascaded deep convolutional neural networks, comprising the following steps:
A) Establish n stages of deep convolutional networks. The n-th stage deep convolutional network comprises n+1 convolutional layers, one fully connected layer, one face box output layer and one face quality assessment output layer. Each convolutional layer contains multiple square convolution kernels and rectangular convolution kernels; the face box output layer has multiple nodes, and the face quality assessment output layer has multiple nodes; n is an integer and n ≥ 1.
B) Select a number of input images as training samples; pad the edges of the current training sample with rows and columns to obtain a padded image; and convolve the padded image with each convolution kernel in the first convolutional layer to obtain multiple feature maps.
C) Take some or all of these feature maps as input to the next convolutional layer, and convolve them with each convolution kernel in that layer to obtain multiple corresponding feature maps.
D) Repeat step C) up to the (n+1)th convolutional layer.
E) Serialize the feature maps output by the (n+1)th convolutional layer into a high-dimensional vector, and fully connect this vector to the nodes of the fully connected layer.
F) Output face box coordinates through the face box output layer, and output a face quality assessment score through the face quality assessment output layer.
G) From the face box coordinates, compute the bounding-box coordinate offsets to obtain the face detection loss function.
H) Represent the image quality assessment loss function with the Softmax loss function. (Softmax is a normalized exponential function commonly used in neural networks to turn network outputs into a probability distribution; cross entropy (Cross Entropy) measures the distance between two probability distributions, and the smaller the cross entropy, the closer the two distributions are. Combining Softmax with Cross Entropy gives the Softmax loss.)
I) Weight and sum the face detection loss function and the image quality assessment loss function to obtain the loss function of the current training sample.
J) Sum the loss functions of all training samples to obtain the total loss function.
K) After one stage of deep convolutional network has been trained, use the trained network to compute the loss function of each training sample; increase the weights of training samples with large losses and decrease the weights of those with small losses, completing the update of each training sample's weight.
L) Following the learning procedure of steps A) to K), chain multiple stages of deep convolutional networks together and train them into a cascaded deep convolutional neural network; use this cascaded network for face detection, removing non-face windows.
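The cascade inference in step L) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each trained stage is modelled as a hypothetical callable that returns a refined box and one of six quality grades, and, following the labelling scheme of the embodiment, grade F is treated as non-face. Windows rejected by an early, shallow stage never reach the deeper stages, which is the usual source of a cascade's speed advantage.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidate-window type: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]

# Hypothetical stage interface: a trained stage maps a window to a refined box
# and a quality grade "A".."F", where "F" means non-face.
Stage = Callable[[Box], Tuple[Box, str]]


def cascade_detect(windows: Sequence[Box], stages: Sequence[Stage]) -> List[Tuple[Box, str]]:
    """Run windows through the stages in order, discarding non-face windows early."""
    survivors: List[Tuple[Box, str]] = [(w, "") for w in windows]
    for stage in stages:
        kept: List[Tuple[Box, str]] = []
        for box, _ in survivors:
            refined, grade = stage(box)
            if grade != "F":  # non-face windows never reach later (deeper) stages
                kept.append((refined, grade))
        survivors = kept
    return survivors
```

Each surviving window carries the box and grade produced by the last stage it passed.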
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, in the n-th stage deep convolutional network the fully connected layer has $64 \times 2^{n-1}$ nodes. Counting leftward from the fully connected layer, the convolution kernels of the m-th convolutional layer have sizes $(1+2m) \times (1+2m)$, $(3+2m) \times (1+2m)$ and $(1+2m) \times (3+2m)$, each with $16 \times 2^{n-m}$ channels, where m is an integer and $0 < m < n+2$.
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, in step B), let the size of the current training sample be $I_y \times I_x$ and the size of the convolution kernel be $k_y \times k_x$. The numbers of rows and columns padded with zeros at the edges of the current training sample are $P_y = (k_y - 1)/2$ and $P_x = (k_x - 1)/2$ respectively, and the output feature map has size $F_y \times F_x$, with the following relations:
$$F_y = I_y + 2P_y - k_y + 1 = I_y$$
$$F_x = I_x + 2P_x - k_x + 1 = I_x$$
where $I_y$ and $I_x$ are the numbers of rows and columns of the current training sample, $k_y$ and $k_x$ are the numbers of rows and columns of the convolution kernel (both odd), and $F_y$ and $F_x$ are the numbers of rows and columns of the feature map.
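As a quick check of these relations, the sketch below computes the per-side zero padding and the resulting feature-map size for a given kernel. The function names are illustrative, and a stride of 1 is assumed, which is what the formulas imply.

```python
from typing import Tuple


def same_padding(k_y: int, k_x: int) -> Tuple[int, int]:
    """Zero rows/columns to add on each side so a stride-1 convolution keeps the input size."""
    if k_y % 2 == 0 or k_x % 2 == 0:
        raise ValueError("F = I only holds for odd kernel dimensions")
    return (k_y - 1) // 2, (k_x - 1) // 2


def output_size(i_y: int, i_x: int, k_y: int, k_x: int) -> Tuple[int, int]:
    """Evaluate F_y = I_y + 2*P_y - k_y + 1 and F_x = I_x + 2*P_x - k_x + 1."""
    p_y, p_x = same_padding(k_y, k_x)
    return i_y + 2 * p_y - k_y + 1, i_x + 2 * p_x - k_x + 1
```

For example, a 5 × 3 kernel needs padding (2, 1), and `output_size(5, 5, 5, 3)` gives (5, 5), so square and rectangular kernels yield feature maps of the same size.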
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, the high-dimensional vector has length $3 \times 16 \times 2^{n-1} \times F_y \times F_x$, and the number of connections in the full connection between the high-dimensional vector and the nodes of the fully connected layer is $3 \times 16 \times 2^{n-1} \times F_y \times F_x \times 64 \times 2^{n-1}$.
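The two counts above can be evaluated with a small helper. This sketch is illustrative only: `fc_dimensions` is a hypothetical name, and $F_y$ and $F_x$ are passed in explicitly because the final feature-map size depends on details such as pooling that the text does not fix.

```python
from typing import Tuple


def fc_dimensions(n: int, f_y: int, f_x: int) -> Tuple[int, int]:
    """Return (flattened vector length, number of full connections) for stage n."""
    if n < 1:
        raise ValueError("stage index n must be >= 1")
    maps_per_shape = 16 * 2 ** (n - 1)          # channels of the last conv layer (m = 1)
    vec_len = 3 * maps_per_shape * f_y * f_x    # three kernel shapes per layer
    fc_nodes = 64 * 2 ** (n - 1)                # fully connected layer width
    return vec_len, vec_len * fc_nodes
```

For instance, `fc_dimensions(1, 12, 12)` returns (6912, 442368): the connection count grows with the square of $2^{n-1}$, which is one reason to keep each stage shallow.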
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, in step G), the face box coordinates are given by the top-left corner coordinates and the bottom-right corner coordinates of the box. The face detection loss function $L_i(Face)$ is calculated from the offsets between the face box coordinates Face and the standard (labelled) values of the training sample's top-left x-coordinate, top-left y-coordinate, bottom-right x-coordinate and bottom-right y-coordinate.
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, in step H), the image quality assessment loss function is:
$$L_i(IQ) = -\sum_{k} y_k \log s_k$$
where $L_i(IQ)$ is the image quality assessment loss function, IQ denotes image quality, $y_k$ is the expected value for the training sample, and $s_k$ is the output of the Softmax.
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, in step I), the loss function of the training sample is:
$$L_i = W_i\left(\alpha L_i(Face) + \beta L_i(IQ)\right)$$
where $L_i$ is the loss function of the current training sample, $W_i$ is the weight of the current training sample, $\alpha$ is the weight of the face detection loss function in the total loss function, $\beta$ is the weight of the image quality assessment loss function in the total loss function, and $0 \le \alpha, \beta \le 1$.
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, the total loss function is:
$$L = \sum_{i=1}^{N} L_i$$
where L is the total loss function and N is the number of training samples.
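Steps I) and J) amount to a weighted sum followed by a plain sum. A minimal sketch, with hypothetical function names and the per-sample losses assumed to be already computed:

```python
import math
from typing import Sequence


def sample_loss(w_i: float, face_loss: float, iq_loss: float, alpha: float, beta: float) -> float:
    """L_i = W_i * (alpha * L_i(Face) + beta * L_i(IQ))."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    return w_i * (alpha * face_loss + beta * iq_loss)


def total_loss(weights: Sequence[float], face_losses: Sequence[float],
               iq_losses: Sequence[float], alpha: float, beta: float) -> float:
    """L = sum over i of L_i."""
    if not (len(weights) == len(face_losses) == len(iq_losses)):
        raise ValueError("one weight and one loss pair per training sample")
    return math.fsum(sample_loss(w, lf, lq, alpha, beta)
                     for w, lf, lq in zip(weights, face_losses, iq_losses))
```

Because $W_i$ multiplies the whole per-sample term, a sample's influence on the total loss is controlled by its weight, which is what the reweighting in step K) exploits.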
In the face detection method based on cascaded deep convolutional neural networks according to the present invention, in step K), for the first-stage deep convolutional network, assuming there are N training samples, the weight of each training sample is initialized to 1/N, that is:
$$W_i = \frac{1}{N}$$
From the second stage onward, the initial weights of the training samples are the updated weights from the previous stage's deep convolutional network. Each training sample's weight is then updated according to its loss, where $W_i^{new}$ is the updated weight of the training sample, Z is an intermediate variable used to normalize the weights, and $1 \le i \le N$.
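The text does not reproduce the update formula itself, so the sketch below rests on an assumption: it uses an AdaBoost-style exponential rule, $W_i^{new} = W_i \exp(L_i)/Z$, which raises the relative weight of high-loss samples and lowers that of low-loss ones, with Z normalizing the weights to sum to 1. It is one plausible instance of the described behaviour, not the patented formula; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def initial_weights(num_samples: int) -> List[float]:
    """First stage: W_i = 1/N for every training sample."""
    return [1.0 / num_samples] * num_samples


def update_weights(weights: Sequence[float], losses: Sequence[float]) -> List[float]:
    """ASSUMED AdaBoost-style rule: W_i_new = W_i * exp(L_i) / Z."""
    top = max(losses)  # shift exponents for numerical stability; the shift cancels in Z
    scaled = [w * math.exp(l - top) for w, l in zip(weights, losses)]
    z = sum(scaled)    # Z: normalizer so the new weights sum to 1
    return [s / z for s in scaled]
```

The weights returned after training one stage would serve as the initial weights of the next stage, as the text describes.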
Implementing the face detection method based on cascaded deep convolutional neural networks of the present invention has the following beneficial effects. By reducing the depth of each individual convolutional network and cascading multiple deep convolutional networks into a cascaded deep convolutional neural network, the method improves performance while also increasing processing speed. Convolution kernels with a rectangular structure are introduced into the deep convolutional networks to detect rich edge, line and structural features, helping to distinguish faces from non-faces. The loss function of a training sample is a linear superposition of the face detection loss function and the image quality loss function. During learning, the weights of the training samples are adjusted automatically according to the size of their losses: samples with large losses receive larger weights, and samples with small losses receive smaller weights. With this weight adjustment the network learns in a targeted way and converges faster, balancing speed and performance. Because the method assesses face image quality while detecting faces, the present invention solves face detection and face image quality assessment simultaneously, and improves performance while also increasing processing speed.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of one embodiment of the face detection method based on cascaded deep convolutional neural networks of the present invention;
Fig. 2 is a schematic structural diagram of the first-stage deep convolutional network in this embodiment;
Fig. 3 is a schematic structural diagram of the second-stage deep convolutional network in this embodiment;
Fig. 4 is a schematic diagram of same convolution in this embodiment;
Fig. 5 is a flow chart of the learning process in this embodiment;
Fig. 6 is a schematic diagram of face detection by the cascaded deep convolutional neural network in this embodiment.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In this embodiment of the face detection method based on cascaded deep convolutional neural networks, the flow chart of the method is shown in Fig. 1. As shown in Fig. 1, the method comprises the following steps:
Step S01: establish n stages of deep convolutional networks.
Specifically, the n-th stage deep convolutional network comprises n+1 convolutional layers, one fully connected layer, one face box output layer and one face quality assessment output layer, where n is an integer and n ≥ 1. In the n-th stage network, the fully connected layer has $64 \times 2^{n-1}$ nodes. Counting leftward from the fully connected layer, the convolution kernels of the m-th convolutional layer have sizes $(1+2m) \times (1+2m)$, $(3+2m) \times (1+2m)$ and $(1+2m) \times (3+2m)$, each with $16 \times 2^{n-m}$ channels, where m is an integer and $0 < m < n+2$.
When n = 1, the structure of the first-stage deep convolutional network is shown in Fig. 2, where Image denotes the input image. The first-stage network has two convolutional layers: in Fig. 2, 203 denotes the first convolutional layer and 204 the second. When n = 2, the structure of the second-stage deep convolutional network is shown in Fig. 3; it has three convolutional layers. Each additional stage adds one convolutional layer; in general, the n-th stage deep convolutional network has n+1 convolutional layers.
Each convolutional layer contains rectangular convolution kernels in addition to square ones; that is, each layer contains multiple square and rectangular convolution kernels. In Fig. 2, 201 denotes rectangular convolution kernels and 202 square convolution kernels. In Fig. 2 the fully connected layer has 64 nodes, denoted 205. The kernels of the first convolutional layer are 5 × 5, 7 × 5 and 5 × 7, each with 8 channels; the kernels of the second convolutional layer are 3 × 3, 5 × 3 and 3 × 5, each with 16 channels. Each convolutional layer is followed by an activation function and a pooling layer, which are omitted from the figure for brevity.
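The layer rules above can be checked against the Fig. 2 example with a short sketch. The function name and output layout are hypothetical; kernel sizes are given as (rows, columns), matching the $(3+2m) \times (1+2m)$ notation, and only the kernel shapes, channel counts and fully connected width described in the text are modelled (activation and pooling layers are omitted, as in the figure).

```python
from typing import Dict, List, Tuple


def stage_config(n: int) -> Tuple[List[Dict[str, object]], int]:
    """Conv-layer kernels/channels (input side first) and FC width for stage n."""
    if n < 1:
        raise ValueError("stage index n must be >= 1")
    layers = []
    # m counts leftward from the FC layer, so the first conv layer has m = n + 1
    for m in range(n + 1, 0, -1):
        a = 1 + 2 * m
        layers.append({
            "kernels": [(a, a), (a + 2, a), (a, a + 2)],  # (rows, cols)
            "channels": 8 * 2 ** (n + 1 - m),             # equals 16 * 2**(n - m)
        })
    return layers, 64 * 2 ** (n - 1)
```

For n = 1 this reproduces Fig. 2 (5 × 5, 7 × 5, 5 × 7 with 8 channels, then 3 × 3, 5 × 3, 3 × 5 with 16 channels, and 64 fully connected nodes). The channel expression is rewritten as `8 * 2 ** (n + 1 - m)` only so that the first layer, where $n - m = -1$, stays an integer.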
Inspired by the Viola-Jones face detection method based on cascade classifiers, the present invention reduces the depth of each individual deep convolutional network and arranges multiple deep convolutional networks in cascade, which increases processing speed and also improves performance.
Specifically, a deep convolutional network learns features from data automatically through convolution. Convolution is essentially filtering: it enhances signal features and reduces noise, and different convolution kernels extract different features from an image. A deep convolutional network unifies feature extraction and classification: by learning from a large number of face and non-face samples, it can efficiently extract features that discriminate well between faces and non-faces, and thereby perform face detection. In deep convolutional networks, convolution kernels are usually square, for example 3x3 or 5x5. Another key to the success of the Viola-Jones face detector is its use of Haar-like features, which compute gray-level differences between rectangular regions; different feature prototypes reflect different edge, line and structural features. Haar-like features can extract features that discriminate effectively between faces and non-faces. Inspired by this, the present invention introduces rectangular convolution kernels into the deep convolutional network to detect rich edge, line and structural features.
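For readers unfamiliar with Haar-like features, the sketch below shows a two-rectangle feature computed with an integral image, the standard Viola-Jones technique the text refers to. It is background illustration only, not part of the claimed method, and the function names are hypothetical. The feature is the gray-level sum of the left half of a window minus that of the right half, so it responds strongly to vertical edges, the kind of structure the rectangular kernels are meant to capture.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (one extra zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], y: int, x: int, h: int, w: int) -> int:
    """Sum of the h x w rectangle with top-left corner (y, x), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii: List[List[int]], y: int, x: int, h: int, w: int) -> int:
    """Left-half sum minus right-half sum: large for vertical light/dark edges."""
    if w % 2:
        raise ValueError("window width must be even")
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

On a window that is bright on the left and dark on the right the feature is large, while on a uniform window it is zero.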
Step S02: select a number of input images as training samples, pad the edges of the current training sample with rows and columns to obtain a padded image, and convolve the padded image with each convolution kernel in the first convolutional layer to obtain multiple feature maps.
Specifically, when convolution is performed in a deep convolutional network, the edges of the image data usually need to be padded with zeros (padding). There are three common padding modes: no padding (No padding), half padding (Half Padding) and full padding (Full padding), corresponding respectively to valid convolution (Valid convolution), same convolution (Same convolution) and full convolution (full convolution). Because the convolution kernels in the present invention include rectangular ones, same convolution (Same convolution) is used so that the feature maps (feature map) produced by the convolution all have a uniform size.
In this step, let the size of the current training sample be $I_y \times I_x$ and the size of the convolution kernel be $k_y \times k_x$. With half padding, the numbers of rows and columns padded with zeros at the edges of the current training sample are $P_y = (k_y - 1)/2$ and $P_x = (k_x - 1)/2$ respectively, and the output feature map has size $F_y \times F_x$, with the following relations:
$$\begin{aligned} F_y &= I_y + 2P_y - k_y + 1 = I_y \\ F_x &= I_x + 2P_x - k_x + 1 = I_x \end{aligned} \qquad (1)$$
where $I_y$ and $I_x$ are the numbers of rows and columns of the current training sample, $k_y$ and $k_x$ are the numbers of rows and columns of the convolution kernel, and $F_y$ and $F_x$ are the numbers of rows and columns of the feature map. In formula (1), the second equality holds when the kernel dimensions $k_y$ and $k_x$ are both odd, because only then are $P_y$ and $P_x$ whole numbers of rows and columns. Formula (1) therefore shows that, as long as the kernel dimensions are odd, same convolution produces an output image of the same size as the input image, regardless of the kernel's actual dimensions. For example, in Fig. 4 the input image is 5 × 5, i.e. $I_y = I_x = 5$, and the rectangular convolution kernel is 5 × 3, i.e. $k_y = 5$, $k_x = 3$. With half padding, the edges of the input image are padded with $P_y = (5 - 1)/2 = 2$ rows and $P_x = (3 - 1)/2 = 1$ column of zeros on each side, and formula (1) gives an output image of the same size as the input image. Once the padding mode is fixed, the convolution values of a rectangular kernel are computed in the same way as for a square kernel.
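The Fig. 4 example can be reproduced with a plain-Python same convolution. This is an illustrative sketch, not the patent's implementation: it assumes stride 1 and computes cross-correlation (no kernel flip), the usual convention in CNN layers, with the zero padding applied implicitly by skipping out-of-range positions. The function name is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def same_conv2d(img: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> Matrix:
    """Stride-1 'same' convolution with implicit zero padding.

    Implemented as cross-correlation (no kernel flip), the usual convention in CNN layers.
    """
    k_y, k_x = len(kernel), len(kernel[0])
    if k_y % 2 == 0 or k_x % 2 == 0:
        raise ValueError("odd kernel dimensions required for same-size output")
    p_y, p_x = (k_y - 1) // 2, (k_x - 1) // 2
    i_y, i_x = len(img), len(img[0])
    out = [[0.0] * i_x for _ in range(i_y)]
    for r in range(i_y):
        for c in range(i_x):
            acc = 0.0
            for u in range(k_y):
                for v in range(k_x):
                    rr, cc = r + u - p_y, c + v - p_x
                    # positions outside the image read the zero padding, i.e. add nothing
                    if 0 <= rr < i_y and 0 <= cc < i_x:
                        acc += img[rr][cc] * kernel[u][v]
            out[r][c] = acc
    return out
```

With a 5 × 5 input and a 5 × 3 kernel the output is again 5 × 5, confirming that the rectangular kernel needs no special handling once the padding is chosen.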
Step S03: take some or all of the feature maps as input to the next convolutional layer, and convolve them with each convolution kernel in that layer to obtain multiple corresponding feature maps.
Step S04: repeat step S03 up to the (n+1)th convolutional layer.
Step S05: serialize the feature maps output by the (n+1)th convolutional layer into a high-dimensional vector, and fully connect the vector to the nodes of the fully connected layer. The n-th stage deep convolutional network has n+1 convolutional layers in total; the input of the first convolutional layer is the original image, and the input of each subsequent convolutional layer is the feature maps output by the previous one. Layer n+2 is the fully connected layer, whose $64 \times 2^{n-1}$ nodes are fully connected to the preceding convolutional layer.
In this step, the feature maps output by the (n+1)th convolutional layer are serialized into a high-dimensional vector of length $3 \times 16 \times 2^{n-1} \times F_y \times F_x$, which is fully connected (Full connect) to the nodes of the fully connected layer with $3 \times 16 \times 2^{n-1} \times F_y \times F_x \times 64 \times 2^{n-1}$ connections. The fully connected layer is followed by two output layers: the face box output layer Face box and the face quality assessment output layer Image Quality. The face box output layer Face box produces the face box; it has 4 nodes (a node of a neural network is equivalent to a neuron, which receives stimuli and produces an output) and outputs the face box coordinates. In Fig. 2, 206 denotes the 4 nodes of the face box output layer Face box. The face quality assessment output layer Image Quality produces the face quality assessment; it has 6 nodes and outputs the face quality assessment score. In Fig. 2, 207 denotes the 6 nodes of the face quality assessment output layer Image Quality.
Step S06: output the face box coordinates through the face box output layer, and output the face quality assessment score through the face quality assessment output layer.
Step S07: from the face box coordinates, compute the bounding-box coordinate offsets to obtain the face detection loss function.
Specifically, the face bounding box in the i-th training sample is determined by its top-left corner coordinates and its bottom-right corner coordinates. The face detection loss function is calculated using the following formula:
Wherein, L_i(Face) is the face detection loss function and Face denotes the face box coordinates. The remaining terms are the ground-truth (standard) values of the training sample's top-left x-coordinate, top-left y-coordinate, bottom-right x-coordinate and bottom-right y-coordinate, respectively. Note that the difference between the output of the nodes of the face box output layer Face box and the desired output is the error, and the face detection loss function is a function that grows with this error.
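Formula (2) itself is not reproduced in this text, so the sketch below only shows the idea of a loss built from corner-coordinate offsets. Squared error is an assumed stand-in, not the patent's formula. The function name and the coordinate order (top-left x, top-left y, bottom-right x, bottom-right y) are also hypothetical.

```python
from typing import Sequence


def face_box_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical face detection loss: sum of squared corner-coordinate offsets.

    pred and target are (top-left x, top-left y, bottom-right x, bottom-right y).
    Squared error is an assumed stand-in for formula (2), which is not reproduced.
    """
    if len(pred) != 4 or len(target) != 4:
        raise ValueError("expected 4 box coordinates")
    offsets = [p - t for p, t in zip(pred, target)]  # bounding-box coordinate offsets
    return sum(d * d for d in offsets)
```

For instance, `face_box_loss((10, 12, 50, 60), (11, 12, 48, 60))` returns 5. The loss grows with the offsets, which matches the text's statement that the loss grows with the error.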
Step S08: express the image quality evaluation loss function as a Softmax loss function.
Specifically, the face image quality of the training samples is labeled in 6 grades: 80-100 points is grade A, 60-80 points is grade B, 40-60 points is grade C, 20-40 points is grade D, 0-20 points is grade E, and non-faces are assigned grade F. The image quality evaluation loss function is as follows:
Wherein, L_i(IQ) is the image quality evaluation loss function, IQ denotes image quality, y_k is the desired value for the training sample, and s_k is the output value of the Softmax. Note that the difference between the output of the nodes of the face quality evaluation output layer Image Quality and the desired output is the error, and the image quality evaluation loss function is a function that grows with this error.
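The six-grade labeling and the Softmax loss can be sketched as follows. The sketch makes two assumptions. First, it uses the common cross-entropy form -Σ_k y_k log s_k with a one-hot y_k, because the patent's formula (3) is not reproduced here. Second, the source's score ranges share their endpoints, so this sketch assigns an endpoint score to the higher grade. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

GRADES = ("A", "B", "C", "D", "E", "F")  # F is the non-face class


def quality_grade(score: Optional[float]) -> str:
    """Map a 0-100 quality score to grades A-E; None (non-face) maps to F.

    Endpoints shared by two ranges (80, 60, 40, 20) go to the higher grade,
    which is an assumption of this sketch.
    """
    if score is None:
        return "F"
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    for grade, lower in zip(GRADES[:4], (80, 60, 40, 20)):
        if score >= lower:
            return grade
    return "E"  # 0 <= score < 20


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax over the 6 Image Quality output nodes (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_loss(logits: Sequence[float], grade: str) -> float:
    """Assumed cross-entropy form of the Softmax loss: -sum_k y_k * log(s_k),
    with y a one-hot vector for the labeled grade and s the Softmax output."""
    if len(logits) != len(GRADES) or grade not in GRADES:
        raise ValueError("expected 6 logits and a grade in A-F")
    s = softmax(logits)
    y = [1.0 if g == grade else 0.0 for g in GRADES]
    return -sum(yk * math.log(sk) for yk, sk in zip(y, s) if yk > 0)
```

With all six outputs equal, every grade receives probability 1/6, so the loss for any label is log 6.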
There are three families of image quality evaluation. Full-reference methods compare a distorted image with a reference image pixel by pixel to evaluate the distorted image. Reduced-reference methods extract effective features from both the original image and the distorted image, then compare these features to evaluate the distorted image. No-reference methods have no information about any reference image. They typically evaluate the distorted image from image statistics such as edge strength, noise level, blur and entropy. Image quality ultimately depends on the perception of the observer, and the goal of objective evaluation methods is to make objective results agree with human subjective assessment as closely as possible. Following this criterion, the present invention uses machine learning within a deep neural network framework. A deep convolutional network suited to image quality evaluation is trained from a large number of labeled training samples and a suitable loss function.
Step S09: take the weighted sum of the face detection loss function and the image quality evaluation loss function to obtain the loss function of the current training sample.
Because face detection and image quality evaluation are performed simultaneously, the loss function of the current training sample combines the two. Combining formulas (2) and (3) gives the loss function of a training sample:
$$L_i = W_i\left(\alpha L_i(\mathrm{Face}) + \beta L_i(\mathrm{IQ})\right) \qquad (4)$$
Wherein, L_i is the loss function of the current training sample and W_i is the weight of the current training sample. α is the weight of the face detection loss function in the total loss function, β is the weight of the image quality evaluation loss function in the total loss function, and 0 ≤ α, β ≤ 1. The values of α and β can be determined experimentally, or they can be treated as learnable parameters and determined by network training.
Step S10: sum the loss functions of all training samples to obtain the total loss function.
The total loss function is:
$$L = \sum_{i=1}^{N} L_i \qquad (5)$$
Wherein, L is the total loss function and N is the number of training samples.
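Formulas (4) and (5) translate directly into code. The sketch below reads formula (5) as the plain sum over samples, as Step S10 describes. α and β are passed in explicitly because the text says they are found experimentally or by training. The function names and the example numbers are hypothetical.

```python
from typing import Sequence


def sample_loss(w_i: float, face_loss: float, iq_loss: float,
                alpha: float, beta: float) -> float:
    """Formula (4): L_i = W_i * (alpha * L_i(Face) + beta * L_i(IQ))."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    return w_i * (alpha * face_loss + beta * iq_loss)


def total_loss(weights: Sequence[float], face_losses: Sequence[float],
               iq_losses: Sequence[float], alpha: float, beta: float) -> float:
    """Formula (5) as read here: the total loss L is the sum of all per-sample L_i."""
    if not len(weights) == len(face_losses) == len(iq_losses):
        raise ValueError("one weight and two losses are needed per sample")
    return sum(sample_loss(w, f, q, alpha, beta)
               for w, f, q in zip(weights, face_losses, iq_losses))
```

With two equally weighted samples, `total_loss([0.5, 0.5], [2.0, 4.0], [1.0, 3.0], alpha=0.5, beta=0.5)` evaluates to 0.75 + 1.75 = 2.5. Because each sample's weight W_i multiplies its loss, re-weighting samples in Step S11 directly changes how much each sample contributes to the next stage's objective.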
Step S11: after one stage of the deep convolutional network has been trained, use the trained network to calculate the loss function of each training sample. Increase the weights of training samples with a large loss and decrease the weights of training samples with a small loss, thereby updating the weight of every training sample.
Specifically, the basic idea of the AdaBoost learning algorithm is to decrease the weights of samples that the classifier classifies correctly and increase the weights of samples it misclassifies. Subsequent learning then concentrates on the harder training samples. Inspired by this, the present invention adopts a similar but different strategy. Fig. 5 is a flowchart of the learning process. For the first-stage deep convolutional network, assume there are N training samples. The weight of each training sample is initialized to 1/N, that is:
$$W_i = \frac{1}{N} \qquad (6)$$
Wherein, 1 ≤ i ≤ N.
From the second-stage deep convolutional network onward, the initial weights of the training samples are the updated weights from the previous-stage network. The "condition met" in Fig. 5 means that the total loss function of formula (5) falls below a predetermined value or that the number of iterations reaches its limit. After a deep convolutional network has been trained, the trained network is used to compute the loss function L_i of each training sample. The weights of the training samples are then updated as follows:
Wherein, W_i^new is the updated weight of the training sample, Z is an intermediate variable used to normalize the weights, and 1 ≤ i ≤ N.
After the update, training samples with a large loss L_i have increased weights and training samples with a small loss L_i have decreased weights, so subsequent deep convolutional networks can learn these hard training samples better. Note that this weight update method differs from the method used by Viola-Jones in the AdaBoost face detection algorithm in two respects. First, the update schedule differs: Viola-Jones updates the weights after each base classifier is trained, whereas the present invention updates the weights after each deep convolutional network is trained. Second, the weight update formula differs.
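The weight schedule can be sketched under a loudly stated assumption. Formula (6) is implemented as given. Formula (7) is not reproduced in this text, and the patent says it differs from the Viola-Jones rule. The sketch therefore substitutes a hypothetical rule, W_i·exp(L_i)/Z. This rule only mimics the described behavior: samples with a larger loss gain weight, and Z normalizes the weights to sum to 1. It is not the patent's update formula, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def init_weights(n: int) -> List[float]:
    """Formula (6): for the first stage, each of the N samples gets weight 1/N."""
    return [1.0 / n] * n


def update_weights(weights: Sequence[float], losses: Sequence[float]) -> List[float]:
    """HYPOTHETICAL rule W_i * exp(L_i) / Z; NOT the patent's formula (7).

    It only mimics the described behavior: samples with larger loss L_i gain
    relative weight, and Z renormalizes the weights so that they sum to 1.
    """
    if len(weights) != len(losses):
        raise ValueError("one loss is needed per weight")
    l_max = max(losses)  # shifting by max(L) avoids overflow and cancels in Z
    raw = [w * math.exp(l - l_max) for w, l in zip(weights, losses)]
    z = sum(raw)  # normalization variable Z
    return [r / z for r in raw]
```

In use, stage 1 starts from `init_weights(N)`. After each stage is trained, the per-sample losses from that network feed `update_weights`, and the result becomes the initial weights of the next stage, as the text describes.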
Step S12: using the learning procedure of steps S01 to S11, connect multiple stages of deep convolutional networks in series and train them into a cascade deep convolutional neural network. Use the cascade deep convolutional neural network to perform face detection and remove non-face windows.
Specifically, a series of cascaded deep convolutional neural networks can be trained with the learning procedure of steps S01 to S11. Fig. 6 is a schematic diagram of face detection with the cascade deep convolutional neural network. The networks at the front of the cascade are simple and fast. Their purpose is to remove a large number of non-face windows with little computation. Classifiers further back are more complex and more accurate, but they require more computation. The number of cascade stages depends on the system's requirements for recognition accuracy and recognition speed. This cascade excludes non-face windows quickly and saves time for face detection and face image quality evaluation in the more promising regions. The result is higher recognition accuracy and speed.
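The early-rejection logic of the cascade can be sketched independently of the networks themselves. In this sketch, each trained stage is reduced to a hypothetical scoring callable with a hypothetical rejection threshold. The patent does not specify how a stage decides to reject a window. Window coordinates are likewise illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int, int, int]      # hypothetical (x, y, width, height) window
StageScore = Callable[[Window], float]  # hypothetical: higher = more face-like


def cascade_detect(windows: Sequence[Window], stages: Sequence[StageScore],
                   thresholds: Sequence[float]) -> List[Window]:
    """Run candidate windows through the stages in order (cheap stages first).

    A window rejected by any stage is dropped at once, so later, costlier
    stages only ever see the windows that earlier stages let through.
    """
    if len(stages) != len(thresholds):
        raise ValueError("one threshold is needed per stage")
    survivors = list(windows)
    for stage, threshold in zip(stages, thresholds):
        survivors = [w for w in survivors if stage(w) >= threshold]
        if not survivors:  # everything rejected: skip the remaining stages
            break
    return survivors
```

The saving comes from the filter order. If a cheap first stage rejects most windows, the expensive later stages, which also produce the face box and quality score, run only on the few surviving windows.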
In summary, the face detection method based on cascade deep convolutional neural networks of the present invention performs face detection and image quality evaluation simultaneously and balances speed against performance.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A face detection method based on cascade deep convolutional neural networks, characterized by comprising the following steps:
A) establishing an n-th stage deep convolutional network, wherein the n-th stage deep convolutional network comprises n convolutional layers, a fully connected layer, a face box output layer and a face quality evaluation output layer; each convolutional layer contains multiple square convolution kernels and rectangular convolution kernels; the face box output layer has multiple nodes; the face quality evaluation output layer has multiple nodes; and n is an integer with n ≥ 1;
B) selecting several input images as training samples, padding the edges of the current training sample with rows and columns to obtain a padded image, and convolving the padded image with each convolution kernel in the first convolutional layer to obtain multiple feature maps;
C) taking some or all of the feature maps as the input of the next convolutional layer, and convolving them with each convolution kernel in that layer to obtain multiple corresponding feature maps;
D) repeating step C) up to the (n+1)-th convolutional layer;
E) flattening the feature maps output by the (n+1)-th convolutional layer into a high-dimensional vector, and fully connecting the high-dimensional vector to the nodes of the fully connected layer;
F) outputting face box coordinates through the face box output layer, and outputting a face quality evaluation score through the face quality evaluation output layer;
G) calculating bounding box coordinate offsets from the face box coordinates to obtain a face detection loss function;
H) expressing an image quality evaluation loss function as a Softmax loss function;
I) taking the weighted sum of the face detection loss function and the image quality evaluation loss function to obtain the loss function of the current training sample;
J) summing the loss functions of all training samples to obtain a total loss function;
K) after one stage of the deep convolutional network has been trained, calculating the loss function of each training sample with the trained deep convolutional network, increasing the weights of training samples with a large loss, and decreasing the weights of training samples with a small loss, thereby updating the weight of every training sample;
L) using the learning procedure of steps A) to K), connecting multiple stages of deep convolutional networks in series and training them into a cascade deep convolutional neural network, and using the cascade deep convolutional neural network to perform face detection and remove non-face windows.
2. The face detection method based on cascade deep convolutional neural networks according to claim 1, characterized in that, in the n-th stage deep convolutional network, the fully connected layer has 64×2^(n-1) nodes; counting leftward from the fully connected layer, the convolution kernels of the m-th convolutional layer have sizes (1+2m)×(1+2m), (3+2m)×(1+2m) and (1+2m)×(3+2m), each with 16×2^(n-m) channels, where m is an integer and 0 < m < n+2.
3. The face detection method based on cascade deep convolutional neural networks according to claim 2, characterized in that, in step B), the size of the current training sample is I_y×I_x, the size of a convolution kernel is k_y×k_x, the numbers of rows and columns of zeros padded at the edges of the current training sample are P_y = (k_y - 1)/2 and P_x = (k_x - 1)/2, respectively, and the output feature map has size F_y×F_x, with the following relationships:
$$F_y = I_y + 2P_y - k_y + 1 = I_y$$
$$F_x = I_x + 2P_x - k_x + 1 = I_x$$
Wherein, I_y is the number of rows of the current training sample, I_x is the number of columns of the current training sample, k_y is the number of rows of the convolution kernel, k_x is the number of columns of the convolution kernel, k_y and k_x are both odd, F_y is the number of rows of the feature map, and F_x is the number of columns of the feature map.
4. The face detection method based on cascade deep convolutional neural networks according to claim 3, characterized in that the length of the high-dimensional vector is 3×16×2^(n-1)×F_y×F_x, and the number of connections between the high-dimensional vector and the nodes of the fully connected layer is 3×16×2^(n-1)×F_y×F_x×64×2^(n-1).
5. The face detection method based on cascade deep convolutional neural networks according to any one of claims 1 to 4, characterized in that, in step G), with the top-left corner coordinates and the bottom-right corner coordinates taken from the face box coordinates, the face detection loss function is calculated using the following formula:
Wherein, L_i(Face) is the face detection loss function and Face denotes the face box coordinates. The remaining terms are the ground-truth (standard) values of the training sample's top-left x-coordinate, top-left y-coordinate, bottom-right x-coordinate and bottom-right y-coordinate, respectively.
6. The face detection method based on cascade deep convolutional neural networks according to claim 5, characterized in that, in step H), the image quality evaluation loss function is as follows:
Wherein, L_i(IQ) is the image quality evaluation loss function, IQ denotes image quality, y_k is the desired value for the training sample, and s_k is the output value of the Softmax.
7. The face detection method based on cascade deep convolutional neural networks according to claim 6, characterized in that, in step I), the loss function of the training sample is:
$$L_i = W_i\left(\alpha L_i(\mathrm{Face}) + \beta L_i(\mathrm{IQ})\right)$$
Wherein, L_i is the loss function of the current training sample, W_i is the weight of the current training sample, α is the weight of the face detection loss function in the total loss function, β is the weight of the image quality evaluation loss function in the total loss function, and 0 ≤ α, β ≤ 1.
8. The face detection method based on cascade deep convolutional neural networks according to claim 7, characterized in that the total loss function is:
$$L = \sum_{i=1}^{N} L_i$$
Wherein, L is the total loss function and N is the number of training samples.
9. The face detection method based on cascade deep convolutional neural networks according to claim 1, characterized in that, in step K), for the first-stage deep convolutional network, assuming there are N training samples, the weight of each training sample is initialized to 1/N, that is:
$$W_i = \frac{1}{N}$$
From the second-stage deep convolutional network onward, the initial weights of the training samples are the updated weights from the previous-stage network, and the updated weight of a training sample is as follows:
Wherein, W_i^new is the updated weight of the training sample, Z is an intermediate variable used to normalize the weights, and 1 ≤ i ≤ N.
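Claims 2 and 3 together fix the kernel shapes, channel counts and zero padding of every convolutional layer. The sketch below enumerates them for a given stage n. Two points are assumptions of this sketch. First, each of the three kernel shapes is given the full 16×2^(n-m) channels; claim 4's vector length is consistent with this for the last layer. Second, convolutions use stride 1, which is implied by F = I + 2P - k + 1. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def same_padding(k_y: int, k_x: int) -> Tuple[int, int]:
    """Zero rows/columns P_y, P_x so that F = I + 2P - k + 1 = I (claim 3)."""
    if k_y % 2 == 0 or k_x % 2 == 0:
        raise ValueError("claim 3 requires odd kernel sizes")
    return (k_y - 1) // 2, (k_x - 1) // 2


def output_size(i_y: int, i_x: int, k_y: int, k_x: int) -> Tuple[int, int]:
    """Feature-map size F_y x F_x after zero padding and a stride-1 convolution."""
    p_y, p_x = same_padding(k_y, k_x)
    return i_y + 2 * p_y - k_y + 1, i_x + 2 * p_x - k_x + 1


def conv_layer_specs(n: int) -> List[Dict]:
    """Convolutional layers of the n-th stage network per claim 2, input side first.

    m counts leftward from the fully connected layer (0 < m < n + 2), so the
    first convolutional layer has m = n + 1 and the last one has m = 1.
    """
    if n < 1:
        raise ValueError("stage index n must be >= 1")
    specs = []
    for m in range(n + 1, 0, -1):
        a = 1 + 2 * m
        kernels = [(a, a), (a + 2, a), (a, a + 2)]  # (k_y, k_x): one square, two rectangular
        channels = (16 << (n - m + 1)) // 2          # 16 * 2^(n-m); equals 8 when m = n + 1
        specs.append({
            "m": m,
            "kernels": kernels,
            "channels_per_kernel_shape": channels,
            "feature_maps": 3 * channels,  # assumes every kernel shape has that many channels
            "padding": [same_padding(ky, kx) for ky, kx in kernels],
        })
    return specs
```

For n = 1, the sketch yields a first layer with 5×5, 7×5 and 5×7 kernels (8 channels each) and a last layer with 3×3, 5×3 and 3×5 kernels (16 channels each). The last layer gives 48 = 3×16×2^0 feature maps, which matches the vector length in claim 4.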
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811326169.0A CN109614866A (en) | 2018-11-08 | 2018-11-08 | Method for detecting human face based on cascade deep convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109614866A (en) | 2019-04-12 |
Family ID: 66003406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811326169.0A Withdrawn CN109614866A (en) | 2018-11-08 | 2018-11-08 | Method for detecting human face based on cascade deep convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109614866A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110673A (en) * | 2019-05-10 | 2019-08-09 | 杭州电子科技大学 | A kind of face identification method based on two-way 2DPCA and cascade feedforward neural network |
WO2021120316A1 (en) * | 2019-12-17 | 2021-06-24 | Tcl华星光电技术有限公司 | Image processing method and apparatus, electronic device, and computer-readable storage medium |
US11348211B2 (en) | 2019-12-17 | 2022-05-31 | Tcl China Star Optoelectronics Technology Co., Ltd. | Image processing method, device, electronic apparatus and computer readable storage medium |
CN111507271A (en) * | 2020-04-20 | 2020-08-07 | 北京理工大学 | Airborne photoelectric video target intelligent detection and identification method |
CN111507271B (en) * | 2020-04-20 | 2021-01-12 | 北京理工大学 | Airborne photoelectric video target intelligent detection and identification method |
CN111814553A (en) * | 2020-06-08 | 2020-10-23 | 浙江大华技术股份有限公司 | Face detection method, model training method and related device |
CN111814553B (en) * | 2020-06-08 | 2023-07-11 | 浙江大华技术股份有限公司 | Face detection method, training method of model and related devices thereof |
CN112861659A (en) * | 2021-01-22 | 2021-05-28 | 平安科技(深圳)有限公司 | Image model training method and device, electronic equipment and storage medium |
WO2022156061A1 (en) * | 2021-01-22 | 2022-07-28 | 平安科技(深圳)有限公司 | Image model training method and apparatus, electronic device, and storage medium |
CN112861659B (en) * | 2021-01-22 | 2023-07-14 | 平安科技(深圳)有限公司 | Image model training method and device, electronic equipment and storage medium |
CN112819085A (en) * | 2021-02-10 | 2021-05-18 | 中国银联股份有限公司 | Model optimization method and device based on machine learning and storage medium |
CN112819085B (en) * | 2021-02-10 | 2023-10-24 | 中国银联股份有限公司 | Model optimization method, device and storage medium based on machine learning |
Similar Documents
Publication | Title |
---|---|
CN109614866A (en) | Method for detecting human face based on cascade deep convolutional neural networks | |
CN110348319B (en) | Face anti-counterfeiting method based on face depth information and edge image fusion | |
Cheng et al. | Exploiting effective facial patches for robust gender recognition | |
CN103605972B (en) | Non-restricted environment face verification method based on block depth neural network | |
CN106599797B (en) | A kind of infrared face recognition method based on local parallel neural network | |
CN104866810B (en) | A kind of face identification method of depth convolutional neural networks | |
Zahisham et al. | Food recognition with resnet-50 | |
CN102663413B (en) | Multi-gesture and cross-age oriented face image authentication method | |
CN107506740A (en) | A kind of Human bodys' response method based on Three dimensional convolution neutral net and transfer learning model | |
CN108564049A (en) | A kind of fast face detection recognition method based on deep learning | |
CN107463920A (en) | A kind of face identification method for eliminating partial occlusion thing and influenceing | |
CN109255375A (en) | Panoramic picture method for checking object based on deep learning | |
CN106651915B (en) | The method for tracking target of multi-scale expression based on convolutional neural networks | |
CN107871101A (en) | A kind of method for detecting human face and device | |
CN110348399A (en) | EO-1 hyperion intelligent method for classifying based on prototype study mechanism and multidimensional residual error network | |
CN108108760A (en) | A kind of fast human face recognition | |
CN104680545B (en) | There is the detection method of well-marked target in optical imagery | |
CN109344856B (en) | Offline signature identification method based on multilayer discriminant feature learning | |
CN108681735A (en) | Optical character recognition method based on convolutional neural networks deep learning model | |
CN102156871A (en) | Image classification method based on category correlated codebook and classifier voting strategy | |
CN105654122B (en) | Based on the matched spatial pyramid object identification method of kernel function | |
CN107545243A (en) | Yellow race's face identification method based on depth convolution model | |
CN105138951B (en) | Human face portrait-photo array the method represented based on graph model | |
CN107818299A (en) | Face recognition algorithms based on fusion HOG features and depth belief network | |
CN113537173B (en) | Face image authenticity identification method based on face patch mapping |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190412 |