CN109614866A - Method for detecting human face based on cascade deep convolutional neural networks - Google Patents

Method for detecting human face based on cascade deep convolutional neural networks

Info

Publication number
CN109614866A
Authority
CN
China
Prior art keywords
face
training sample
loss function
layer
neural networks
Prior art date
Legal status
Withdrawn
Application number
CN201811326169.0A
Other languages
Chinese (zh)
Inventor
温峻峰
江志伟
李鑫
杜海江
夏欢
谢巍
张浪文
翁冠碧
Current Assignee
Zhongke Skynet (guangdong) Technology Co Ltd
Original Assignee
Zhongke Skynet (guangdong) Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhongke Skynet (guangdong) Technology Co Ltd filed Critical Zhongke Skynet (guangdong) Technology Co Ltd
Priority to CN201811326169.0A priority Critical patent/CN109614866A/en
Publication of CN109614866A publication Critical patent/CN109614866A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection method based on cascaded deep convolutional neural networks, comprising: establishing n stages of deep convolutional networks; padding the training samples and performing convolution operations; using some or all of the resulting feature maps as the input of the next convolutional layer and performing convolution operations; repeating the previous step up to the (n+1)-th convolutional layer; serializing the feature maps output by the (n+1)-th convolutional layer into a high-dimensional vector and fully connecting it to the nodes of the fully connected layer; outputting the face box coordinates and the face quality evaluation score; obtaining the face detection loss function, the image quality evaluation loss function and the total loss function; obtaining the loss function of each training sample and updating the sample weights; and training the cascaded deep convolutional neural network. The present invention can solve the problems of face detection and face image quality evaluation simultaneously, and can improve performance while increasing processing speed.

Description

Method for detecting human face based on cascade deep convolutional neural networks
Technical field
The present invention relates to the field of face detection, and in particular to a face detection method based on cascaded deep convolutional neural networks.
Background technique
Face detection refers to the process of searching an input image with a given strategy to determine whether it contains any faces and, if so, returning the position and size of each face. Face detection is the foundation of face recognition and the first step in a face recognition pipeline. The quality of the face images found in the detection stage directly affects the accuracy of face recognition. In a traditional face recognition pipeline, face detection and face image quality evaluation are performed as two separate tasks: the face image is detected first, and its quality is then evaluated. This approach has two shortcomings. First, it consumes more processing time, and in many vision applications processing time is a critical metric. Second, face detection itself needs to use image quality as a basis for judgment, so the two tasks are correlated; handling them separately isolates the correlation between them.
Face detection methods can be divided into four major classes:
1) Knowledge-based methods.
These methods encode researchers' knowledge about faces into recognition rules. For example, a face in an image usually has two symmetric eyes, a nose and a mouth, and the correlations between these features can be described by their distances and positional relationships. The problem with this approach is that human knowledge is difficult to convert into clearly defined rules: if the rules are too detailed, the recognition rate will be low; conversely, if the rules are too general, the false alarm rate will be high.
2) Invariant-feature methods.
These methods are based on the observation that people can easily recognize faces under different poses, angles and illumination conditions, so it can be assumed that the image contains invariant features that do not change with pose, angle or illumination. Such methods first extract facial features such as eyebrows, eyes, nose and mouth, and then build a statistical model describing the relationships between the features to confirm the existence of a face. The problem is that when the image contains illumination changes, noise or occlusion, the extracted features may become unreliable.
3) Template matching methods.
These methods pre-define a standard face pattern and detect faces by computing the correlation between regions of the input image and the standard pattern. They are simple to implement, but cannot effectively handle variations in scale, pose and shape.
4) Machine-learning-based methods.
These methods learn face and non-face knowledge from training images and use it to detect faces. In template matching the face template is predefined by experts, whereas here the pattern is learned from images. The problem with such methods is how to select the features and the machine learning algorithm.
Image Quality Assessment (IQA) is one of the basic techniques in image processing. It evaluates the quality or degree of distortion of an image by analyzing its characteristics. Image quality assessment plays an important role in applications such as face authentication: a person registered in a face database may fail verification simply because the image acquired by the camera is of poor quality (for example, blurred, skewed or distorted). Image quality assessment includes subjective and objective methods. Subjective methods can be divided into absolute evaluation and relative evaluation. Objective methods can be divided into full-reference, reduced-reference and no-reference types.
In 1989, LeCun invented the convolutional neural network LeNet, which was mainly used for handwritten digit recognition; it achieved good results but did not attract wide attention. In 2006, Geoffrey Hinton proposed the Deep Belief Net and gave a solution to the vanishing gradient problem in training deep networks: the weights are initialized by unsupervised pre-training and then fine-tuned by supervised training. In 2011, the ReLU activation function was applied to deep networks and proved effective in suppressing the vanishing gradient problem. In 2012, regularization and Dropout techniques made deep learning algorithms more stable and better performing. In the same year, to demonstrate the potential of deep learning, Hinton's group entered the ImageNet image recognition competition with the 8-layer AlexNet convolutional neural network and won the championship. Neural networks, revitalized by deep learning, again became a focal technique of artificial intelligence and information processing, and deep neural networks became a general learning framework that works well.
To reach a practical level, face detection must meet high requirements on both accuracy and speed; neither can be dispensed with. After many years of continuous effort by researchers, a large number of face detection methods have appeared, which can be broadly divided into three kinds:
1) Face detection based on cascade classifiers, such as the Adaboost-based face detection algorithm proposed by Paul Viola and Michael Jones in 2001.
2) Face detection based on DPM (deformable part models), an algorithm that divides the face into several parts for detection.
3) Methods based on deep neural networks, such as DDFD (Deep Dense Face Detector) and R-CNN.
The advantage of the Viola-Jones method is its speed, but its performance is not good enough. The DPM method is slow and its performance is not the best either. Methods based on deep neural networks achieve the best performance, with speed at a medium level among the three classes. Deep neural networks are characterized by deep network structures, which bring an enormous number of training parameters and consume a large amount of processing time.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide a face detection method based on cascaded deep convolutional neural networks that can solve the problems of face detection and face image quality evaluation simultaneously, and can improve performance while increasing processing speed.
The technical solution adopted by the present invention to solve the technical problem is to construct a face detection method based on cascaded deep convolutional neural networks, comprising the following steps:
A) establishing n stages of deep convolutional networks; the n-th stage deep convolutional network comprises n+1 convolutional layers, a fully connected layer, a face box output layer and a face quality evaluation output layer; each convolutional layer contains multiple square convolution kernels and rectangular convolution kernels; the face box output layer is provided with multiple nodes, and the face quality evaluation output layer is provided with multiple nodes; n is an integer and n ≥ 1;
B) selecting several input images as training samples, padding the edges of the current training sample with additional rows and columns to obtain a padded image, and performing convolution operations between the padded image and each convolution kernel in the first convolutional layer to obtain multiple feature maps;
C) using some or all of the feature maps as the input of the next convolutional layer, and performing convolution operations between them and each convolution kernel in the next convolutional layer to obtain multiple corresponding feature maps;
D) repeating step C) until the (n+1)-th convolutional layer is reached;
E) serializing the feature maps output by the (n+1)-th convolutional layer into a high-dimensional vector, and fully connecting the high-dimensional vector to the nodes of the fully connected layer;
F) outputting the face box coordinates through the face box output layer, and outputting the face quality evaluation score through the face quality evaluation output layer;
G) calculating the bounding box coordinate offsets from the face box coordinates to obtain the face detection loss function;
H) expressing the image quality evaluation loss function with the Softmax loss function (Softmax is the normalized exponential function commonly used in neural networks to turn the network output into a probability distribution; cross entropy describes the distance between two probability distributions, and the smaller the cross entropy, the closer the two distributions are; combining Softmax with cross entropy gives the Softmax loss);
I) weighting and summing the face detection loss function and the image quality evaluation loss function to obtain the loss function of the current training sample;
J) summing the loss functions of all training samples to obtain the total loss function;
K) after a given stage of the deep convolutional network has been trained, calculating the loss function of each training sample with the trained deep convolutional network, increasing the weights of the training samples with large loss functions and decreasing the weights of the training samples with small loss functions, thereby completing the update of the weight of each training sample;
L) stringing the multiple stages of deep convolutional networks together through the learning procedure of steps A) to K) to train the cascaded deep convolutional neural network, and using the cascaded deep convolutional neural network to perform face detection and remove non-face windows.
In the face detection method based on cascaded deep convolutional neural networks of the present invention, in the n-th stage deep convolutional network, the number of nodes of the fully connected layer is 64×2^(n-1); counting leftwards from the fully connected layer, the convolution kernels of the m-th convolutional layer are of sizes (1+2m)×(1+2m), (3+2m)×(1+2m) and (1+2m)×(3+2m) respectively, each with 16×2^(n-m) channels, where m is an integer and 0 < m < n+2.
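As an illustration of the sizing rule above, the following Python sketch enumerates the kernel shapes and channel counts of each convolutional layer of the n-th stage network; the function name and the printed layout are illustrative and not part of the invention.

```python
def stage_layer_spec(n):
    """Kernel shapes and channel counts for the n-th stage deep convolutional network.

    Layers are counted leftwards from the fully connected layer (m = 1 is the layer
    closest to the fully connected layer, m = n + 1 is the first convolutional layer).
    """
    spec = []
    for m in range(1, n + 2):                       # 0 < m < n + 2
        kernels = [(1 + 2 * m, 1 + 2 * m),          # square kernel
                   (3 + 2 * m, 1 + 2 * m),          # tall rectangular kernel
                   (1 + 2 * m, 3 + 2 * m)]          # wide rectangular kernel
        channels = int(16 * 2 ** (n - m))           # channels per kernel shape
        spec.append({"m": m, "kernels": kernels, "channels": channels})
    fc_nodes = 64 * 2 ** (n - 1)                    # nodes of the fully connected layer
    return spec, fc_nodes

if __name__ == "__main__":
    spec, fc = stage_layer_spec(1)
    # For n = 1 this reproduces the sizes of Fig. 2: the first convolutional layer
    # (m = 2) uses 5x5, 7x5 and 5x7 kernels with 8 channels each, the second
    # (m = 1) uses 3x3, 5x3 and 3x5 kernels with 16 channels each, and the
    # fully connected layer has 64 nodes.
    for layer in reversed(spec):
        print(layer)
    print("fully connected nodes:", fc)
```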
In the face detection method based on cascaded deep convolutional neural networks of the present invention, in step B), let the size of the current training sample be I_y×I_x and the size of a convolution kernel be k_y×k_x; the numbers of rows and columns of zero padding at the edges of the current training sample are P_y and P_x respectively, and the size of the output feature map is F_y×F_x, with the following relationships:
F_y = I_y + 2P_y - k_y + 1 = I_y
F_x = I_x + 2P_x - k_x + 1 = I_x
where I_y is the number of rows of the current training sample, I_x is the number of columns of the current training sample, k_y is the number of rows of the convolution kernel, k_x is the number of columns of the convolution kernel, k_y and k_x are odd numbers, F_y is the number of rows of the feature map, and F_x is the number of columns of the feature map.
In the face detection method based on cascaded deep convolutional neural networks of the present invention, the length of the high-dimensional vector is 3×16×2^(n-1)×F_y×F_x, and the number of connections between the high-dimensional vector and the nodes of the fully connected layer is 3×16×2^(n-1)×F_y×F_x×64×2^(n-1).
In the face detection method based on cascaded deep convolutional neural networks of the present invention, in step G), the face box coordinates consist of the top-left corner coordinates and the bottom-right corner coordinates, and the face detection loss function is calculated with the following formula:
where L_i(Face) is the face detection loss function, Face denotes the face box coordinates, and the remaining terms are the standard values of the top-left abscissa, the top-left ordinate, the bottom-right abscissa and the bottom-right ordinate of the training sample.
In the face detection method based on cascaded deep convolutional neural networks of the present invention, in step H), the image quality evaluation loss function is as follows:
where L_i(IQ) is the image quality evaluation loss function, IQ denotes image quality, y_k is the desired value of the training sample, and s_k is the output value of the Softmax.
In the face detection method based on cascaded deep convolutional neural networks of the present invention, in step I), the loss function of the training sample is:
L_i = W_i(αL_i(Face) + βL_i(IQ))
where L_i is the loss function of the current training sample, W_i is the weight of the current training sample, α is the weight of the face detection loss function in the total loss function, β is the weight of the image quality evaluation loss function in the total loss function, and 0 ≤ α, β ≤ 1.
In the face detection method based on cascaded deep convolutional neural networks of the present invention, the total loss function is as follows:
where L is the total loss function.
In the face detection method based on cascaded deep convolutional neural networks of the present invention, in step K), for the first-stage deep convolutional network, assuming the number of training samples is N, the weight of each training sample is initialized to 1/N, that is:
W_i = 1/N
Starting from the second-stage deep convolutional network, the initial values of the training sample weights come from the updated weights of the previous-stage deep convolutional network; the updated weight of a training sample is as follows:
where W_i^new is the updated weight of the training sample, Z is an intermediate variable for normalizing the weights, and 1 ≤ i ≤ N.
Implementing the face detection method based on cascaded deep convolutional neural networks of the present invention has the following beneficial effects. By reducing the depth of each individual convolutional network and cascading multiple deep convolutional networks to form a cascaded deep convolutional neural network, processing speed and performance can be improved at the same time. Rectangular convolution kernels are introduced into the deep convolutional networks to detect rich edge, line and structural features in order to distinguish faces from non-faces. The loss function of a training sample is a linear superposition of the face detection loss function and the image quality loss function. During learning, the weight of each training sample is adjusted automatically according to the size of its loss function: samples with large loss functions have their weights increased, and samples with small loss functions have their weights decreased. With such weight adjustment, the network can learn in a targeted way and converge more quickly, taking both speed and performance into account, and the quality of the face image is evaluated at the same time as the face is detected. Therefore, the present invention can solve the problems of face detection and face image quality evaluation simultaneously, and can improve performance while increasing processing speed.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of the face detection method based on cascaded deep convolutional neural networks of the present invention;
Fig. 2 is a schematic structural diagram of the first-stage deep convolutional network in the embodiment;
Fig. 3 is a schematic structural diagram of the second-stage deep convolutional network in the embodiment;
Fig. 4 is a schematic diagram of same convolution in the embodiment;
Fig. 5 is a flowchart of the learning process in the embodiment;
Fig. 6 is a schematic diagram of face detection with the cascaded deep convolutional neural networks of the embodiment.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In this embodiment of the face detection method based on cascaded deep convolutional neural networks of the present invention, the flowchart of the method is shown in Fig. 1. As shown in Fig. 1, the face detection method based on cascaded deep convolutional neural networks comprises the following steps:
Step S01: establish n stages of deep convolutional networks.
Specifically, the n-th stage deep convolutional network comprises n+1 convolutional layers, a fully connected layer, a face box output layer and a face quality evaluation output layer, where n is an integer and n ≥ 1. In the n-th stage deep convolutional network, the number of nodes of the fully connected layer is 64×2^(n-1); counting leftwards from the fully connected layer, the convolution kernels of the m-th convolutional layer are of sizes (1+2m)×(1+2m), (3+2m)×(1+2m) and (1+2m)×(3+2m) respectively, each with 16×2^(n-m) channels, where m is an integer and 0 < m < n+2.
When n = 1, the structure of the first-stage deep convolutional network is shown in Fig. 2, where Image denotes the input image. The first-stage deep convolutional network has two convolutional layers; in Fig. 2, 203 denotes the first convolutional layer and 204 denotes the second convolutional layer. When n = 2, the structure of the second-stage deep convolutional network is shown in Fig. 3; the second-stage deep convolutional network has three convolutional layers. Each additional stage of the deep convolutional network adds one more convolutional layer. The specific rule is: the n-th stage deep convolutional network has n+1 convolutional layers.
Each convolutional layer contains not only square convolution kernels but also rectangular convolution kernels; in Fig. 2, 201 denotes a rectangular convolution kernel and 202 denotes a square convolution kernel. For example, in Fig. 2 the fully connected layer has 64 nodes (205 denotes the 64 nodes of the fully connected layer); the convolution kernels of the first convolutional layer are of sizes 5×5, 7×5 and 5×7, each with 8 channels, and the convolution kernels of the second convolutional layer are of sizes 3×3, 5×3 and 3×5, each with 16 channels. Each convolutional layer is followed by an activation function and a pooling layer, which are not drawn in the figure for brevity.
Inspired by the Viola-Jones face detection method based on cascade classifiers, the present invention reduces the depth of each individual deep convolutional network and cascades multiple deep convolutional networks, which improves processing speed and performance at the same time.
Specifically, a deep convolutional network can automatically learn features from data through convolution operations. A convolution operation is essentially a filter: convolution can enhance signal characteristics and reduce noise, and different convolution kernels can extract different features from an image. A deep convolutional network unifies feature extraction and classification: given a large number of face and non-face training samples, it can efficiently extract features with high face/non-face discriminability and thereby realize face detection. In deep convolutional networks the convolution kernels are usually square, for example 3×3 or 5×5. Another key factor in the success of the Viola-Jones face detection method is the use of Haar-like features, which compute gray-level differences within rectangular regions; different feature prototypes reflect different edge, line and structural features, so Haar-like features can extract highly discriminative face/non-face features. Inspired by this, the present invention introduces rectangular convolution kernels into the deep convolutional networks to detect rich edge, line and structural features.
Step S02: select several input images as training samples, pad the edges of the current training sample with additional rows and columns to obtain a padded image, and perform convolution operations between the padded image and each convolution kernel in the first convolutional layer to obtain multiple feature maps.
Specifically, in a deep convolutional network the edges of the image data generally need to be padded with zero values before the convolution operation. There are usually three padding modes: no padding, half padding and full padding, and the corresponding convolution operations are valid convolution, same convolution and full convolution. Since the convolution kernels in the present invention include rectangular ones, same convolution is used so that the feature maps after the convolution operation have a unified size.
In this step, let the size of the current training sample be I_y×I_x and the size of a convolution kernel be k_y×k_x. Using the half-padding mode, the numbers of rows and columns of zero padding at the edges of the current training sample are P_y = (k_y-1)/2 and P_x = (k_x-1)/2 respectively, and the size of the output feature map is F_y×F_x, with the following relationships:
F_y = I_y + 2P_y - k_y + 1 = I_y    (1)
F_x = I_x + 2P_x - k_x + 1 = I_x
where I_y is the number of rows of the current training sample, I_x is the number of columns of the current training sample, k_y is the number of rows of the convolution kernel, k_x is the number of columns of the convolution kernel, F_y is the number of rows of the feature map, and F_x is the number of columns of the feature map. In formula (1), the second equality holds on the condition that the kernel dimensions k_y and k_x are odd. As the formula shows, as long as the kernel dimensions are odd, the output image has the same size as the input image under same convolution, regardless of the actual kernel size. For example, in Fig. 4 the size of the input image is 5×5, i.e. I_y = I_x = 5, and the rectangular kernel size is 5×3, i.e. k_y = 5, k_x = 3. With half padding at the edges of the input image, the numbers of rows and columns of zero padding are P_y = 2 and P_x = 1 respectively, and according to formula (1) the output image has the same size as the input image. Once the padding mode is determined, the convolution value of a rectangular kernel is computed in the same way as that of a square kernel.
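The half-padding relationship in formula (1) can be checked numerically. The short Python sketch below (helper names are illustrative) computes the zero-padding amounts and the output size of a same convolution for an odd kernel dimension, reproducing the 5×5 input with 5×3 kernel example of Fig. 4.

```python
def same_padding(k):
    """Rows/columns of zero padding for an odd kernel dimension k (half padding)."""
    assert k % 2 == 1, "symmetric same padding requires an odd kernel size"
    return (k - 1) // 2

def output_size(i, k):
    """Feature-map size F = I + 2*P - k + 1 for input size I and kernel size k."""
    p = same_padding(k)
    return i + 2 * p - k + 1

if __name__ == "__main__":
    # Example of Fig. 4: 5x5 input image, 5x3 rectangular kernel.
    Iy, Ix, ky, kx = 5, 5, 5, 3
    Py, Px = same_padding(ky), same_padding(kx)      # 2 rows and 1 column of zeros
    Fy, Fx = output_size(Iy, ky), output_size(Ix, kx)
    print(Py, Px)      # 2 1
    print(Fy, Fx)      # 5 5 -> the output keeps the input size
```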
Step S03: use some or all of the feature maps as the input of the next convolutional layer, and perform convolution operations between them and each convolution kernel in the next convolutional layer to obtain multiple corresponding feature maps.
Step S04: repeat step S03 until the (n+1)-th convolutional layer is reached.
Step S05: serialize the feature maps output by the (n+1)-th convolutional layer into a high-dimensional vector, and fully connect the high-dimensional vector to the nodes of the fully connected layer. The n-th stage deep convolutional network has n+1 convolutional layers in total; the input of the first convolutional layer is the original image, and the input of each subsequent convolutional layer is the feature maps output by the previous convolutional layer. The (n+2)-th layer is the fully connected layer, which has 64×2^(n-1) nodes and is fully connected to the preceding convolutional layer.
In this step, the feature maps output by the (n+1)-th convolutional layer are serialized into a high-dimensional vector of length 3×16×2^(n-1)×F_y×F_x, which is fully connected to the nodes of the fully connected layer with 3×16×2^(n-1)×F_y×F_x×64×2^(n-1) connections. After the fully connected layer there are two output layers: the face box output layer (Face box) and the face quality evaluation output layer (Image Quality). The face box output layer is used to obtain the face box; it has 4 nodes (a node of the neural network is equivalent to a neuron that receives stimuli and produces an output) and outputs the face box coordinates; 206 in Fig. 2 denotes the 4 nodes of the face box output layer. The face quality evaluation output layer has 6 nodes and is used to obtain the face quality evaluation; it outputs the face quality evaluation score; 207 in Fig. 2 denotes the 6 nodes of the face quality evaluation output layer.
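As a concrete reading of the first-stage network of Fig. 2, the following PyTorch sketch builds two convolutional layers with mixed square and rectangular kernels (same padding), a 64-node fully connected layer, a 4-node face box head and a 6-node quality head. It is a minimal sketch under stated assumptions: the input is a 3-channel 24×24 window, ReLU activations are used, and the pooling layers mentioned above are omitted so that the flattened length matches the 3×16×2^(n-1)×F_y×F_x figure; all class and variable names are illustrative and not fixed by the patent.

```python
import torch
import torch.nn as nn

class MixedKernelConv(nn.Module):
    """One convolutional layer made of a square and two rectangular kernel banks.

    Each bank uses 'same' padding (odd kernel sides), so all three outputs keep
    the input height/width and can be concatenated along the channel axis.
    """
    def __init__(self, in_ch, out_ch, square, tall, wide):
        super().__init__()
        def conv(k):
            pad = ((k[0] - 1) // 2, (k[1] - 1) // 2)   # half padding per dimension
            return nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=pad)
        self.square, self.tall, self.wide = conv(square), conv(tall), conv(wide)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(torch.cat([self.square(x), self.tall(x), self.wide(x)], dim=1))

class FirstStageNet(nn.Module):
    """First-stage network of Fig. 2: two mixed-kernel conv layers, FC-64, two heads."""
    def __init__(self, height=24, width=24):
        super().__init__()
        # Layer 1: 5x5, 7x5, 5x7 kernels, 8 channels per bank (24 maps in total).
        self.conv1 = MixedKernelConv(3, 8, (5, 5), (7, 5), (5, 7))
        # Layer 2: 3x3, 5x3, 3x5 kernels, 16 channels per bank (48 maps in total).
        self.conv2 = MixedKernelConv(3 * 8, 16, (3, 3), (5, 3), (3, 5))
        # Serialized feature maps: 3 * 16 * Fy * Fx values, fully connected to 64 nodes.
        self.fc = nn.Linear(3 * 16 * height * width, 64)
        self.face_box = nn.Linear(64, 4)        # face box output layer, 4 nodes
        self.quality = nn.Linear(64, 6)         # face quality output layer, 6 grades

    def forward(self, x):
        feats = self.conv2(self.conv1(x))
        vec = torch.relu(self.fc(feats.flatten(start_dim=1)))
        return self.face_box(vec), self.quality(vec)

if __name__ == "__main__":
    net = FirstStageNet(height=24, width=24)
    box, quality = net(torch.randn(2, 3, 24, 24))   # batch of two 24x24 RGB windows
    print(box.shape, quality.shape)                 # torch.Size([2, 4]) torch.Size([2, 6])
```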
Step S06: output the face box coordinates through the face box output layer, and output the face quality evaluation score through the face quality evaluation output layer.
Step S07: calculate the bounding box coordinate offsets from the face box coordinates to obtain the face detection loss function.
Specifically, in the i-th training sample the face bounding box is determined by its top-left corner coordinates and bottom-right corner coordinates. The face detection loss function is calculated with the following formula:
where L_i(Face) is the face detection loss function, Face denotes the face box coordinates, and the remaining terms are the standard values of the top-left abscissa, the top-left ordinate, the bottom-right abscissa and the bottom-right ordinate of the training sample. It should be noted that the difference between the output of the nodes of the face box output layer and the desired output is the error, and the face detection loss function is a function proportional to this error.
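The exact form of the face detection loss formula is not reproduced in this text. A common choice consistent with the description (a loss that grows with the deviation between the predicted and standard corner coordinates) is the sum of squared coordinate errors; the sketch below uses that assumption and is not necessarily the precise formula of the patent.

```python
import torch

def face_box_loss(pred, target):
    """Assumed face detection loss: sum of squared errors over the four corner
    coordinates (x1, y1, x2, y2) of the face bounding box, averaged over the batch.

    pred, target: tensors of shape (batch, 4) holding predicted and standard values.
    """
    return ((pred - target) ** 2).sum(dim=1).mean()

if __name__ == "__main__":
    pred = torch.tensor([[10.0, 12.0, 50.0, 60.0]])
    gt = torch.tensor([[11.0, 12.0, 48.0, 61.0]])
    print(face_box_loss(pred, gt))   # tensor(6.)
```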
Step S08: express the image quality evaluation loss function with the Softmax loss function.
Specifically, in the training samples the face image quality is labeled with 6 grades: a score of 80-100 is grade A, 60-80 is grade B, 40-60 is grade C, 20-40 is grade D, 0-20 is grade E, and non-face samples are grade F. The image quality evaluation loss function is as follows:
L_i(IQ) = -Σ_k y_k·log(s_k)    (3)
where L_i(IQ) is the image quality evaluation loss function, IQ denotes image quality, y_k is the desired value of the training sample, and s_k is the output value of the Softmax. It should be noted that the difference between the output of the nodes of the face quality evaluation output layer and the desired output is the error, and the image quality evaluation loss function is a function proportional to this error.
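Since the quality head has 6 nodes (grades A to F) and the loss is the Softmax cross entropy, the grading and the loss can be sketched as follows in Python. The score-to-grade mapping follows the thresholds above; the helper names and the use of integer class indices are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

GRADES = ["A", "B", "C", "D", "E", "F"]   # F = non-face

def score_to_grade(score, is_face=True):
    """Map a 0-100 quality score to one of the six grade indices used as labels."""
    if not is_face:
        return GRADES.index("F")
    thresholds = [(80, "A"), (60, "B"), (40, "C"), (20, "D"), (0, "E")]
    for low, grade in thresholds:
        if score >= low:
            return GRADES.index(grade)

def quality_loss(logits, grade_indices):
    """Softmax cross entropy L_i(IQ) = -sum_k y_k * log(s_k) with one-hot targets y."""
    return F.cross_entropy(logits, grade_indices)

if __name__ == "__main__":
    logits = torch.randn(3, 6)                       # raw outputs of the 6-node head
    labels = torch.tensor([score_to_grade(85), score_to_grade(45),
                           score_to_grade(0, is_face=False)])
    print(labels)                                    # tensor([0, 2, 5])
    print(quality_loss(logits, labels))
```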
In image quality evaluation, full-reference methods compare the distorted image with the reference image pixel by pixel to obtain an evaluation of the distorted image. Reduced-reference methods extract effective features from both the original image and the distorted image and then compare the features to obtain the evaluation. No-reference methods have no information about any reference image; they are generally based on image statistics such as edge strength, noise level, blurriness and entropy to obtain the evaluation of the distorted image. Image quality evaluation ultimately depends on the perception of the observer, and the goal of objective evaluation methods is to make the objective results as consistent as possible with human subjective evaluation. Based on this criterion, within a deep neural network framework, the present invention uses machine learning with a large number of labeled training samples and a reasonable loss function to train a deep convolutional network suitable for image quality evaluation.
Step S09: weight and sum the face detection loss function and the image quality evaluation loss function to obtain the loss function of the current training sample.
Since face detection and image quality evaluation are performed simultaneously, the loss function of the current training sample is the superposition of the two. Combining formulas (2) and (3), the loss function of a training sample is:
L_i = W_i(αL_i(Face) + βL_i(IQ))    (4)
where L_i is the loss function of the current training sample, W_i is the weight of the current training sample, α is the weight of the face detection loss function in the total loss function, β is the weight of the image quality evaluation loss function in the total loss function, and 0 ≤ α, β ≤ 1. The values of α and β can be determined by experiment, or treated as learnable parameters determined by network training.
Step S10: sum the loss functions of all training samples to obtain the total loss function.
The total loss function is:
L = Σ_i L_i    (5)
where L is the total loss function.
Step S11: after a given stage of the deep convolutional network has been trained, calculate the loss function of each training sample with the trained deep convolutional network, increase the weights of the training samples with large loss functions and decrease the weights of the training samples with small loss functions, thereby completing the update of the weight of each training sample.
Specifically, the basic idea of the AdaBoost learning algorithm is that when the classifier classifies certain samples correctly, the weights of those samples are decreased, and when it misclassifies them, their weights are increased, so that in subsequent learning the algorithm concentrates on the more difficult training samples. Inspired by this, the present invention adopts a similar but different strategy. Fig. 5 is the flowchart of the learning process. For the first-stage deep convolutional network, assuming the number of training samples is N, the weight of each training sample is initialized to 1/N, that is:
W_i = 1/N    (6)
where 1 ≤ i ≤ N.
Starting from the second-stage deep convolutional network, the initial values of the training sample weights come from the updated weights of the previous-stage deep convolutional network. The "condition met" block in Fig. 5 means that the total loss function of formula (5) is less than a predetermined value or the number of iterations reaches a limit. After one deep convolutional network has been trained, the loss function L_i of each training sample is computed with the trained network, and the weight of each training sample is then updated as follows:
where W_i^new is the updated weight of the training sample, Z is an intermediate variable for normalizing the weights, and 1 ≤ i ≤ N.
After the training sample weights are updated, the weights of training samples with large loss functions L_i are increased and the weights of training samples with small loss functions L_i are decreased, so that the subsequent deep convolutional networks can better learn these difficult training samples. It should be noted that the weight update method here differs from that of the Viola-Jones AdaBoost face detection algorithm in two respects: first, the update timing is different, since Viola-Jones updates the weights after each basic classifier is trained, while the present invention updates the weights after each deep convolutional network is trained; second, the weight update formula is different.
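The normalization used in the weight update formula is not reproduced in this text, so the sketch below uses one plausible reading: the new weight of a sample is proportional to its current weight times its loss, with Z chosen so that the weights sum to one. This is an assumption, labeled as such; only the qualitative behaviour (a larger loss gives a larger weight) is taken from the description.

```python
import torch

def update_sample_weights(weights, losses, eps=1e-12):
    """Assumed re-weighting after one stage: W_i_new = W_i * L_i / Z, with
    Z = sum_j W_j * L_j so that the updated weights sum to 1. Samples with a
    large loss get a larger weight, samples with a small loss a smaller one.
    """
    unnormalized = weights * losses
    z = unnormalized.sum() + eps          # intermediate normalization variable Z
    return unnormalized / z

if __name__ == "__main__":
    n = 4
    w = torch.full((n,), 1.0 / n)                         # W_i initialised to 1/N
    losses = torch.tensor([0.1, 0.4, 0.2, 1.3])           # per-sample losses L_i
    w_new = update_sample_weights(w, losses)
    print(w_new, w_new.sum())                             # hardest sample gets most weight
```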
Step S12: string the multiple stages of deep convolutional networks together through the learning procedure of steps S01 to S11 to train the cascaded deep convolutional neural network, and use the cascaded deep convolutional neural network to perform face detection and remove non-face windows.
Specifically, a series of cascaded deep convolutional neural networks can be trained through the learning procedure of steps S01 to S11; Fig. 6 is a schematic diagram of face detection with the cascaded deep convolutional neural networks. The networks at the front of the cascade are simple and fast, with the purpose of removing a large number of non-face windows with little computation; the classifiers further back are more complex and have higher recognition accuracy, but require more computation. The number of cascade stages depends on the system requirements for recognition rate and recognition speed. With this cascading approach, non-face windows can be excluded quickly, so that the saved time can be spent on the more promising face regions for detection and face image quality evaluation, achieving higher recognition accuracy and speed.
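The cascaded rejection of non-face windows can be sketched as follows: each stage scores a batch of candidate windows, and a window is discarded as soon as one stage classifies it as non-face (quality grade F in this sketch), so only promising windows reach the later, more expensive stages. The stage interface, the grade-F rejection rule and the absence of window resizing between stages are illustrative assumptions, not a prescription of the patent.

```python
import torch

NON_FACE_GRADE = 5   # index of grade F (non-face) in the 6-node quality head

def detect_faces(stages, windows):
    """Run candidate windows through the cascade of trained stage networks.

    stages:  list of networks, each mapping a window batch to (box, quality_logits);
             early stages are small and fast, later stages larger and more accurate.
    windows: tensor of shape (num_windows, 3, H, W) with candidate image windows.
    Returns the indices, face boxes and quality logits of the windows kept by every stage.
    """
    keep = torch.arange(windows.shape[0])
    boxes = quality = None
    for stage in stages:
        if keep.numel() == 0:
            break                                     # everything already rejected
        boxes, quality = stage(windows[keep])
        grades = quality.argmax(dim=1)
        survivors = grades != NON_FACE_GRADE          # reject non-face windows early
        keep, boxes, quality = keep[survivors], boxes[survivors], quality[survivors]
    return keep, boxes, quality
```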
In short, the face detection method based on cascaded deep convolutional neural networks of the present invention can perform face detection and image quality evaluation simultaneously, and achieves a balance between speed and performance.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. A face detection method based on cascaded deep convolutional neural networks, characterized by comprising the following steps:
A) establishing n stages of deep convolutional networks; the n-th stage deep convolutional network comprises n+1 convolutional layers, a fully connected layer, a face box output layer and a face quality evaluation output layer; each convolutional layer contains multiple square convolution kernels and rectangular convolution kernels; the face box output layer is provided with multiple nodes, and the face quality evaluation output layer is provided with multiple nodes; n is an integer and n ≥ 1;
B) selecting several input images as training samples, padding the edges of the current training sample with additional rows and columns to obtain a padded image, and performing convolution operations between the padded image and each convolution kernel in the first convolutional layer to obtain multiple feature maps;
C) using some or all of the feature maps as the input of the next convolutional layer, and performing convolution operations between them and each convolution kernel in the next convolutional layer to obtain multiple corresponding feature maps;
D) repeating step C) until the (n+1)-th convolutional layer is reached;
E) serializing the feature maps output by the (n+1)-th convolutional layer into a high-dimensional vector, and fully connecting the high-dimensional vector to the nodes of the fully connected layer;
F) outputting the face box coordinates through the face box output layer, and outputting the face quality evaluation score through the face quality evaluation output layer;
G) calculating the bounding box coordinate offsets from the face box coordinates to obtain a face detection loss function;
H) expressing an image quality evaluation loss function with the Softmax loss function;
I) weighting and summing the face detection loss function and the image quality evaluation loss function to obtain the loss function of the current training sample;
J) summing the loss functions of all training samples to obtain a total loss function;
K) after a given stage of the deep convolutional network has been trained, calculating the loss function of each training sample with the trained deep convolutional network, increasing the weights of the training samples with large loss functions and decreasing the weights of the training samples with small loss functions, thereby completing the update of the weight of each training sample;
L) stringing the multiple stages of deep convolutional networks together through the learning procedure of steps A) to K) to train the cascaded deep convolutional neural network, and using the cascaded deep convolutional neural network to perform face detection and remove non-face windows.
2. The face detection method based on cascaded deep convolutional neural networks according to claim 1, characterized in that, in the n-th stage deep convolutional network, the number of nodes of the fully connected layer is 64×2^(n-1); counting leftwards from the fully connected layer, the convolution kernels of the m-th convolutional layer are of sizes (1+2m)×(1+2m), (3+2m)×(1+2m) and (1+2m)×(3+2m) respectively, each with 16×2^(n-m) channels, where m is an integer and 0 < m < n+2.
3. The face detection method based on cascaded deep convolutional neural networks according to claim 2, characterized in that, in step B), let the size of the current training sample be I_y×I_x and the size of a convolution kernel be k_y×k_x; the numbers of rows and columns of zero padding at the edges of the current training sample are P_y and P_x respectively, and the size of the output feature map is F_y×F_x, with the following relationships:
F_y = I_y + 2P_y - k_y + 1 = I_y
F_x = I_x + 2P_x - k_x + 1 = I_x
where I_y is the number of rows of the current training sample, I_x is the number of columns of the current training sample, k_y is the number of rows of the convolution kernel, k_x is the number of columns of the convolution kernel, k_y and k_x are odd numbers, F_y is the number of rows of the feature map, and F_x is the number of columns of the feature map.
4. The face detection method based on cascaded deep convolutional neural networks according to claim 3, characterized in that the length of the high-dimensional vector is 3×16×2^(n-1)×F_y×F_x, and the number of connections between the high-dimensional vector and the nodes of the fully connected layer is 3×16×2^(n-1)×F_y×F_x×64×2^(n-1).
5. The face detection method based on cascaded deep convolutional neural networks according to any one of claims 1 to 4, characterized in that, in step G), the face box coordinates consist of the top-left corner coordinates and the bottom-right corner coordinates, and the face detection loss function is calculated with the following formula:
where L_i(Face) is the face detection loss function, Face denotes the face box coordinates, and the remaining terms are the standard values of the top-left abscissa, the top-left ordinate, the bottom-right abscissa and the bottom-right ordinate of the training sample.
6. The face detection method based on cascaded deep convolutional neural networks according to claim 5, characterized in that, in step H), the image quality evaluation loss function is as follows:
where L_i(IQ) is the image quality evaluation loss function, IQ denotes image quality, y_k is the desired value of the training sample, and s_k is the output value of the Softmax.
7. The face detection method based on cascaded deep convolutional neural networks according to claim 6, characterized in that, in step I), the loss function of the training sample is:
L_i = W_i(αL_i(Face) + βL_i(IQ))
where L_i is the loss function of the current training sample, W_i is the weight of the current training sample, α is the weight of the face detection loss function in the total loss function, β is the weight of the image quality evaluation loss function in the total loss function, and 0 ≤ α, β ≤ 1.
8. The face detection method based on cascaded deep convolutional neural networks according to claim 7, characterized in that the total loss function is as follows:
where L is the total loss function.
9. The face detection method based on cascaded deep convolutional neural networks according to claim 1, characterized in that, in step K), for the first-stage deep convolutional network, assuming the number of training samples is N, the weight of each training sample is initialized to 1/N, that is:
W_i = 1/N
Starting from the second-stage deep convolutional network, the initial values of the training sample weights come from the updated weights of the previous-stage deep convolutional network; the updated weight of a training sample is as follows:
where W_i^new is the updated weight of the training sample, Z is an intermediate variable for normalizing the weights, and 1 ≤ i ≤ N.
CN201811326169.0A 2018-11-08 2018-11-08 Method for detecting human face based on cascade deep convolutional neural networks Withdrawn CN109614866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811326169.0A CN109614866A (en) 2018-11-08 2018-11-08 Method for detecting human face based on cascade deep convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811326169.0A CN109614866A (en) 2018-11-08 2018-11-08 Method for detecting human face based on cascade deep convolutional neural networks

Publications (1)

Publication Number Publication Date
CN109614866A true CN109614866A (en) 2019-04-12

Family

ID=66003406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811326169.0A Withdrawn CN109614866A (en) 2018-11-08 2018-11-08 Method for detecting human face based on cascade deep convolutional neural networks

Country Status (1)

Country Link
CN (1) CN109614866A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110673A (en) * 2019-05-10 2019-08-09 杭州电子科技大学 A kind of face identification method based on two-way 2DPCA and cascade feedforward neural network
WO2021120316A1 (en) * 2019-12-17 2021-06-24 Tcl华星光电技术有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
US11348211B2 (en) 2019-12-17 2022-05-31 Tcl China Star Optoelectronics Technology Co., Ltd. Image processing method, device, electronic apparatus and computer readable storage medium
CN111507271A (en) * 2020-04-20 2020-08-07 北京理工大学 Airborne photoelectric video target intelligent detection and identification method
CN111507271B (en) * 2020-04-20 2021-01-12 北京理工大学 Airborne photoelectric video target intelligent detection and identification method
CN111814553A (en) * 2020-06-08 2020-10-23 浙江大华技术股份有限公司 Face detection method, model training method and related device
CN111814553B (en) * 2020-06-08 2023-07-11 浙江大华技术股份有限公司 Face detection method, training method of model and related devices thereof
CN112861659A (en) * 2021-01-22 2021-05-28 平安科技(深圳)有限公司 Image model training method and device, electronic equipment and storage medium
WO2022156061A1 (en) * 2021-01-22 2022-07-28 平安科技(深圳)有限公司 Image model training method and apparatus, electronic device, and storage medium
CN112861659B (en) * 2021-01-22 2023-07-14 平安科技(深圳)有限公司 Image model training method and device, electronic equipment and storage medium
CN112819085A (en) * 2021-02-10 2021-05-18 中国银联股份有限公司 Model optimization method and device based on machine learning and storage medium
CN112819085B (en) * 2021-02-10 2023-10-24 中国银联股份有限公司 Model optimization method, device and storage medium based on machine learning

Similar Documents

Publication Publication Date Title
CN109614866A (en) Method for detecting human face based on cascade deep convolutional neural networks
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
Cheng et al. Exploiting effective facial patches for robust gender recognition
CN103605972B (en) Non-restricted environment face verification method based on block depth neural network
CN106599797B (en) A kind of infrared face recognition method based on local parallel neural network
CN104866810B (en) A kind of face identification method of depth convolutional neural networks
Zahisham et al. Food recognition with resnet-50
CN102663413B (en) Multi-gesture and cross-age oriented face image authentication method
CN107506740A (en) 2017-12-22 A kind of Human bodys' response method based on Three dimensional convolution neutral net and transfer learning model
CN108564049A (en) A kind of fast face detection recognition method based on deep learning
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN109255375A (en) Panoramic picture method for checking object based on deep learning
CN106651915B (en) The method for tracking target of multi-scale expression based on convolutional neural networks
CN107871101A (en) A kind of method for detecting human face and device
CN110348399A (en) EO-1 hyperion intelligent method for classifying based on prototype study mechanism and multidimensional residual error network
CN108108760A (en) A kind of fast human face recognition
CN104680545B (en) There is the detection method of well-marked target in optical imagery
CN109344856B (en) Offline signature identification method based on multilayer discriminant feature learning
CN108681735A (en) Optical character recognition method based on convolutional neural networks deep learning model
CN102156871A (en) Image classification method based on category correlated codebook and classifier voting strategy
CN105654122B (en) Based on the matched spatial pyramid object identification method of kernel function
CN107545243A (en) 2018-01-05 Yellow race's face identification method based on depth convolution model
CN105138951B (en) Human face portrait-photo array the method represented based on graph model
CN107818299A (en) Face recognition algorithms based on fusion HOG features and depth belief network
CN113537173B (en) Face image authenticity identification method based on face patch mapping

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190412

WW01 Invention patent application withdrawn after publication