Summary of the invention
Embodiments of the present invention provide a face detection method and device that can process images of arbitrary size, detect frontal faces and profile faces simultaneously, and improve detection speed.
In view of this, a first aspect of the present invention provides a face detection method, including:
Obtaining a candidate face image by means of a first deep convolutional network, the first deep convolutional network being a fully convolutional network used for preliminary detection;
Computing on the candidate face image by means of a second deep convolutional network to obtain a reliability value of the candidate face image, the second deep convolutional network being a deep convolutional network used for verification;
If the reliability value of the candidate face image is greater than a preset threshold, determining the candidate face image to be a final face image.
Optionally:
Obtaining the candidate face image by means of the first deep convolutional network includes:
Generating a face heat map by means of the first deep convolutional network;
Determining local hottest points in the face heat map, and taking the local hottest points as candidate face positions;
Obtaining the candidate face image according to the candidate face positions.
Optionally:
Before the candidate face image is obtained according to the candidate face positions, the method includes:
Judging whether the candidate face positions overlap;
If so, merging the overlapping candidate face positions.
Optionally:
Before the candidate face image is obtained by means of the first deep convolutional network, the method includes:
Generating the first deep convolutional network;
Collecting face images and non-face images, and using the face images and non-face images as training samples;
Training the first deep convolutional network with the training samples.
Optionally:
The second deep convolutional network consists of multiple deep convolutional networks, and computing on the candidate face image by means of the second deep convolutional network to obtain the reliability value of the candidate face image includes:
Computing on the candidate face image with each of the multiple deep convolutional networks to obtain multiple reliability values of the candidate face image;
Obtaining the reliability value of the candidate face image from the multiple reliability values.
Optionally:
The first deep convolutional network includes multiple layers, in order: a first input layer, a first convolutional layer, a first output layer, a first max pooling layer, a second output layer, a first activation function layer, a second convolutional layer, a third output layer, a second activation function layer, a third convolutional layer and a fourth output layer; the second deep convolutional network includes multiple layers, in order: a second input layer, a fourth convolutional layer, a fifth output layer, a second max pooling layer, a sixth output layer, a third activation function layer, a fifth convolutional layer, a seventh output layer, a third max pooling layer, an eighth output layer, a fourth activation function layer, a fully connected layer and a ninth output layer.
A second aspect of the present invention provides a face detection device, including:
An acquisition module, configured to obtain a candidate face image by means of a first deep convolutional network, the first deep convolutional network being a fully convolutional network used for preliminary detection;
A first processing module, configured to compute on the candidate face image by means of a second deep convolutional network to obtain a reliability value of the candidate face image, the second deep convolutional network being a preset deep convolutional network used for verification;
A determination module, configured to determine the candidate face image to be a final face image if the reliability value of the candidate face image is greater than a preset threshold.
Optionally:
The acquisition module includes:
A generation unit, configured to generate a face heat map by means of the first deep convolutional network;
A first processing unit, configured to determine local hottest points in the face heat map and to take the local hottest points as candidate face positions;
An acquiring unit, configured to obtain the candidate face image according to the candidate face positions.
Optionally:
The device further includes:
A judging module, configured to judge whether the candidate face positions overlap;
A second processing module, configured to merge the overlapping candidate face positions if the judging module judges that the candidate face positions overlap.
Optionally:
The device further includes:
A generation module, configured to generate the first deep convolutional network;
A third processing module, configured to collect face images and non-face images and to use the face images and non-face images as training samples;
A training module, configured to train the first deep convolutional network with the training samples.
Optionally:
The second deep convolutional network consists of multiple deep convolutional networks, and the first processing module includes:
A computing unit, configured to compute on the candidate face image with each of the multiple deep convolutional networks to obtain multiple reliability values of the candidate face image;
A second processing unit, configured to obtain the reliability value of the candidate face image from the multiple reliability values.
Optionally:
The first deep convolutional network includes multiple layers, in order: a first input layer, a first convolutional layer, a first output layer, a first max pooling layer, a second output layer, a first activation function layer, a second convolutional layer, a third output layer, a second activation function layer, a third convolutional layer and a fourth output layer; the second deep convolutional network includes multiple layers, in order: a second input layer, a fourth convolutional layer, a fifth output layer, a second max pooling layer, a sixth output layer, a third activation function layer, a fifth convolutional layer, a seventh output layer, a third max pooling layer, an eighth output layer, a fourth activation function layer, a fully connected layer and a ninth output layer.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages: the network used for preliminary detection is a fully convolutional network, so the present invention can process images of arbitrary size and improve face detection speed; in addition, the candidate face image of the present invention is not restricted to a frontal face image or a profile face image, so the present invention can also detect frontal faces and profile faces simultaneously.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings of the present invention are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Referring to Fig. 1, an embodiment of the face detection method in the embodiments of the present invention includes:
101. Obtaining a candidate face image by means of a first deep convolutional network, the first deep convolutional network being a preset fully convolutional network used for preliminary detection;
In this embodiment, a deep convolutional network is typically applied only to images of a fixed size, for classification or recognition. The first deep convolutional network of the present invention is a fully convolutional network; by using the network structure of a fully convolutional network, it can be applied to images of arbitrary size.
Optionally, in some embodiments of the present invention, obtaining the candidate face image by means of the first deep convolutional network is specifically:
Generating a face heat map by means of the first deep convolutional network;
Determining local hottest points in the face heat map, and taking the local hottest points as candidate face positions;
Obtaining the candidate face image according to the candidate face positions.
It should be noted that the first deep convolutional network may be a small network that is used to generate the face heat map; local hottest points are then found in the face heat map and taken as candidate face positions, and the candidate face image is cropped from the original image according to the candidate face positions. For example, if the original image is a photo showing a little girl and a desk, the face image of the little girl can be cropped from the photo by means of the first deep convolutional network.
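The search for local hottest points can be illustrated with a minimal Python sketch. It is not taken from the original text: the function names, the 3 × 3 neighbourhood, the score threshold, and the default stride of 4 and window of 32 (chosen to suit a network with a single 4 × 4 pooling stage and a 32 × 32 input window) are all assumptions made for illustration.

```python
def local_hot_spots(heat, threshold=0.5):
    """Return (row, col, score) for each heat-map cell that is at least
    `threshold` and not smaller than any of its (up to 8) neighbours.
    The neighbourhood size and threshold are illustrative assumptions."""
    rows, cols = len(heat), len(heat[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            v = heat[r][c]
            if v < threshold:
                continue
            neighbours = [
                heat[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbours):
                spots.append((r, c, v))
    return spots


def heat_to_box(r, c, stride=4, window=32):
    """Map a heat-map cell to an (x, y, w, h) window in the original image.
    stride and window are assumed values, not fixed by the text."""
    return (c * stride, r * stride, window, window)


heat = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.3],
    [0.1, 0.3, 0.2],
]
spots = local_hot_spots(heat)
boxes = [heat_to_box(r, c) for r, c, _ in spots]
```

Each returned box can then be used to crop a candidate face image from the original image.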
Further optionally, in some embodiments of the present invention, before the candidate face image is obtained according to the candidate face positions, the method includes:
Judging whether the candidate face positions overlap;
If so, merging the overlapping candidate face positions.
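The text does not specify how overlap is measured or how overlapping positions are merged. The following Python sketch shows one possible approach under stated assumptions: overlap is measured by intersection over union (IoU), candidates are grouped greedily in descending score order, and each group is merged by averaging its boxes. The names `iou`, `merge_overlapping` and the `min_iou` threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_overlapping(candidates, min_iou=0.3):
    """candidates: list of ((x, y, w, h), score).
    Greedy grouping: each candidate joins the first group whose
    highest-scoring box it overlaps, otherwise it starts a new group.
    Each group is merged into its average box with the group's best score."""
    groups = []
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        for group in groups:
            if iou(group[0][0], box) >= min_iou:
                group.append((box, score))
                break
        else:
            groups.append([(box, score)])
    merged = []
    for group in groups:
        n = len(group)
        avg = tuple(sum(b[i] for b, _ in group) / n for i in range(4))
        merged.append((avg, group[0][1]))
    return merged


candidates = [
    ((0, 0, 32, 32), 0.9),
    ((4, 0, 32, 32), 0.8),
    ((100, 100, 32, 32), 0.7),
]
merged = merge_overlapping(candidates)
```

Here the first two candidates overlap strongly and collapse into one position, while the third is kept separately.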
Optionally, in some embodiments of the present invention, before the candidate face image is obtained by means of the first deep convolutional network, the method includes:
Generating the first deep convolutional network;
Collecting face images and non-face images, and using the face images and non-face images as training samples;
Training the first deep convolutional network with the training samples.
It should be noted that the present invention collects a large number of face and non-face images as training samples for training the first deep convolutional network. Usually a face is located within a larger image, and its position and size are marked precisely by a rectangular region, which is called the face frame. A face image is cropped after random perturbation is added to the size and position of the face frame, and is then randomly rotated so that faces in different orientations can be detected; repeating this yields multiple training samples from one face image. Usually more than 10,000 original face images are needed, and more than 100,000 training samples can be extracted from them, so that the first deep convolutional network learns better features.
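The random perturbation of the face frame can be sketched in Python as follows. This is only an illustration: the perturbation ranges (`max_shift`, `max_scale`, `max_angle`), the uniform distribution and the function name are assumptions, and the actual cropping and rotation of pixels is left to whatever image library is in use.

```python
import random


def jitter_face_frame(frame, n, max_shift=0.1, max_scale=0.1,
                      max_angle=30.0, seed=0):
    """Generate n perturbed copies of a face frame (x, y, w, h).
    Each sample is ((x, y, w, h), angle_in_degrees). The ranges are
    illustrative assumptions; the text only says the perturbation
    and rotation are random."""
    x, y, w, h = frame
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        scale = 1.0 + rng.uniform(-max_scale, max_scale)
        dx = rng.uniform(-max_shift, max_shift) * w
        dy = rng.uniform(-max_shift, max_shift) * h
        angle = rng.uniform(-max_angle, max_angle)
        new_w, new_h = w * scale, h * scale
        # Keep the resized frame centred on the shifted centre point.
        cx, cy = x + w / 2 + dx, y + h / 2 + dy
        samples.append(((cx - new_w / 2, cy - new_h / 2, new_w, new_h), angle))
    return samples


# Ten perturbed samples per original face matches the ratio in the text
# (more than 10,000 originals giving more than 100,000 samples).
samples = jitter_face_frame((100, 100, 50, 50), n=10)
```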
Non-face images can be cropped at random from pictures that contain no faces, such as scenery pictures and pictures of objects. The number of non-face images is far larger than the number of face images, for example more than 1,000,000.
The first deep convolutional network is trained with existing neural network tools and the training samples, which determines the parameters of the first deep convolutional network. If a cross-entropy loss function is used when training the first deep convolutional network, the final convolution needs two convolution kernels; in use after training is complete, however, only the output of the first convolution kernel is used, so the result of the second convolution kernel need not be computed and the second kernel can be removed.
It should be noted that the second deep convolutional network is similar to the first deep convolutional network and can also be trained with training samples, which is not repeated here.
Optionally, in some embodiments of the present invention, the first deep convolutional network includes multiple layers, in order: a first input layer, a first convolutional layer, a first output layer, a first max pooling layer, a second output layer, a first activation function layer, a second convolutional layer, a third output layer, a second activation function layer, a third convolutional layer and a fourth output layer. To make the first deep convolutional network easier to understand, it is described in detail below:
Referring to Fig. 2, Fig. 2 shows a preset fully convolutional network used for preliminary detection, that is, the first deep convolutional network. The first input layer receives a 3-channel (H × W) image of arbitrary size. The first convolutional layer convolves it with 32 convolution kernels of size 5 × 5, and the first output layer yields a 32-channel output image whose height and width are each reduced by 4 pixels, that is, (H-4) × (W-4). The first max pooling layer then performs one 4 × 4 max pooling pass, which reduces the height and the width each to 1/4 of their previous values, so that the second output layer yields a 32-channel image of size ((H-4)/4) × ((W-4)/4); the first activation function layer then applies the ReLU activation function. Next, the second convolutional layer convolves with 64 convolution kernels of size 7 × 7, and the third output layer yields a 64-channel output image of size ((H-4)/4-6) × ((W-4)/4-6); the second activation function layer again applies the ReLU activation function. Finally, the third convolutional layer convolves with 1 convolution kernel of size 1 × 1, and the fourth output layer yields a 1-channel image of size ((H-4)/4-6) × ((W-4)/4-6), which is the final face probability map, that is, the heat map. The local maximum points on this heat map are possible faces.
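The layer sizes above can be checked with a short Python sketch. It assumes unpadded convolutions with stride 1, a pooling stride equal to the pooling window, and rounding down when a size is not divisible by 4; none of these details is stated explicitly in the text, and the function names are hypothetical.

```python
def conv_out(size, kernel):
    """Output size of an unpadded, stride-1 convolution (assumed)."""
    return size - kernel + 1


def pool_out(size, window):
    """Output size of non-overlapping pooling, rounded down (assumed)."""
    return size // window


def first_network_heatmap_shape(h, w):
    """Heat-map (height, width) for an H x W input, following Fig. 2:
    5x5 convolution, 4x4 max pooling, 7x7 convolution, 1x1 convolution."""
    shape = []
    for size in (h, w):
        size = conv_out(size, 5)
        size = pool_out(size, 4)
        size = conv_out(size, 7)
        size = conv_out(size, 1)
        shape.append(size)
    return tuple(shape)


def receptive_field(layers):
    """Input window seen by one output cell.
    layers: list of (kernel, stride) pairs from input to output."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


fig2_layers = [(5, 1), (4, 4), (7, 1), (1, 1)]
single_window = first_network_heatmap_shape(32, 32)
larger_image = first_network_heatmap_shape(484, 644)
window = receptive_field(fig2_layers)
```

Under these assumptions a 32 × 32 input produces exactly one heat-map value, and each heat-map cell corresponds to a 32 × 32 window of the input.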
This fully convolutional network is equivalent to mapping a small 32 × 32 pixel block to a probability value, so it can detect faces of only one scale (32 × 32). To detect faces of other scales, the original image must be scaled and then detected again; the number of scalings and the scaling ratios are determined by the range of face sizes to be detected.
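One way to derive the scaling ratios from a range of face sizes is sketched below in Python. The geometric step between successive face sizes and the function name are assumptions made for illustration; the text only states that the number of scalings and the ratios depend on the range of face sizes to be detected.

```python
def pyramid_scales(min_face, max_face, window=32, step=1.5):
    """Scaling ratios to apply to the original image so that faces from
    min_face to max_face pixels are brought to the detector window size.
    `step` is the assumed ratio between successive face sizes covered."""
    scales = []
    face = float(min_face)
    while face <= max_face:
        scales.append(window / face)
        face *= step
    return scales


# Faces of 32, 64 and 128 pixels map to ratios 1.0, 0.5 and 0.25.
scales = pyramid_scales(32, 128, step=2.0)
```

A smaller step covers the size range more densely at the cost of more detection passes.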
Where higher speed is required, the size of this fully convolutional network can be reduced further: for example, the input image can be a single-channel grayscale image, or the number of convolution kernels can be reduced, for example to 4 kernels in the first convolution and 16 kernels in the second, which greatly speeds up processing.
First, the first deep convolutional network used for preliminary detection is a fully convolutional network, so that images of arbitrary size can be processed; second, it is required to be fast, while its precision may be slightly lower; its reliability can usually reach 99.3% or more.
102. Computing on the candidate face image by means of a second deep convolutional network to obtain a reliability value of the candidate face image, the second deep convolutional network being a preset deep convolutional network used for verification;
In this embodiment, the second deep convolutional network used for verification does not need to meet a very strict speed requirement, but it needs higher reliability; generally, the reliability value needs to reach 99.7% or more. One or more second deep convolutional networks may be designed for verification. Where there are several, for greater reliability the deep convolutional networks differ considerably in structure or in convolution kernel design, so that they complement one another and can extract different features from the image. In addition, the second deep convolutional network used for verification can take an image of fixed size as input, so it does not need to be a fully convolutional network. To achieve higher reliability, the second deep convolutional network may have more layers and more image channels.
Fig. 3 is a schematic diagram of a typical second deep convolutional network. The second deep convolutional network in Fig. 3 may include multiple layers, in order: a second input layer, a fourth convolutional layer, a fifth output layer, a second max pooling layer, a sixth output layer, a third activation function layer, a fifth convolutional layer, a seventh output layer, a third max pooling layer, an eighth output layer, a fourth activation function layer, a fully connected layer and a ninth output layer. Specifically, the second input layer receives a 3-channel (32 × 32) image. The fourth convolutional layer convolves it with 32 convolution kernels of size 11 × 11, and the fifth output layer yields a 32-channel (22 × 22) output image. The second max pooling layer then performs one 2 × 2 max pooling pass, and the sixth output layer yields a 32-channel (11 × 11) output image; the third activation function layer then applies the ReLU activation function. Next, the fifth convolutional layer convolves with 64 convolution kernels of size 3 × 3, and the seventh output layer yields a 64-channel (9 × 9) output image. The third max pooling layer performs one 3 × 3 max pooling pass, and the eighth output layer yields a 64-channel (3 × 3) output image; the fourth activation function layer again applies the ReLU activation function. The fully connected layer takes 576 input values (64 × 3 × 3) and produces 2 output values, that is, its parameter matrix is 576 × 2. The ninth output layer finally yields two values, which can be used to calculate the face and non-face probabilities; the face probability is the one finally taken.
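The sizes in Fig. 3 and the conversion of the two output values into a face probability can be sketched in Python as follows. The size arithmetic follows the text; the use of a softmax, and the convention that the first output corresponds to "face", are assumptions, since the text only says that the two values can be used to calculate the probabilities.

```python
import math


def second_network_feature_count(size=32):
    """Number of inputs to the fully connected layer for a size x size
    input, following Fig. 3 (unpadded convolutions, non-overlapping
    pooling assumed)."""
    size = size - 11 + 1      # 11x11 convolution: 32 -> 22
    size //= 2                # 2x2 max pooling:   22 -> 11
    size = size - 3 + 1       # 3x3 convolution:   11 -> 9
    size //= 3                # 3x3 max pooling:    9 -> 3
    return 64 * size * size   # 64 channels of 3 x 3


def face_probability(face_value, non_face_value):
    """Softmax over the two output values (an assumed choice); returns
    the probability assigned to 'face'."""
    m = max(face_value, non_face_value)  # subtract max for stability
    e_face = math.exp(face_value - m)
    e_non = math.exp(non_face_value - m)
    return e_face / (e_face + e_non)


features = second_network_feature_count()
p_equal = face_probability(0.0, 0.0)
p_confident = face_probability(5.0, -5.0)
```

The feature count reproduces the 576 inputs of the fully connected layer stated in the text.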
Optionally, in some embodiments of the present invention, if the second deep convolutional network consists of multiple deep convolutional networks, step 102 is specifically:
Computing on the candidate face image with each of the multiple deep convolutional networks to obtain multiple reliability values of the candidate face image;
Obtaining the reliability value of the candidate face image from the multiple reliability values.
Specifically, the multiple reliability values may be averaged and the mean taken as the reliability value of the candidate face image; or the maximum of the multiple reliability values may be taken as the reliability value of the candidate face image; or the multiple reliability values may be weighted and the weighted value taken as the reliability value of the candidate face image. Other methods may also be used, which is not limited here.
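The three combination rules named above can be written as a small Python helper. The function name, the `method` strings and the normalisation of the weights in the weighted case are assumptions made for illustration.

```python
def combine_reliability(values, method="mean", weights=None):
    """Combine the reliability values produced by several verification
    networks into one value: mean, maximum, or weighted value.
    Weights are normalised by their sum (an assumed convention)."""
    if not values:
        raise ValueError("at least one reliability value is required")
    if method == "mean":
        return sum(values) / len(values)
    if method == "max":
        return max(values)
    if method == "weighted":
        if weights is None or len(weights) != len(values):
            raise ValueError("one weight per reliability value is required")
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)
    raise ValueError("unknown method: " + method)


values = [0.9, 0.7]
mean_value = combine_reliability(values, "mean")
max_value = combine_reliability(values, "max")
weighted_value = combine_reliability(values, "weighted", weights=[3, 1])
```

Taking the maximum favours recall, while the mean or a weighted value requires broader agreement among the networks.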
103. If the reliability value of the candidate face image is greater than the preset threshold, determining the candidate face image to be a final face image.
In this embodiment, the reliability value represents the reliability of the candidate face image. The preset threshold may be 99.7% or another reasonable value, which is not limited here.
In this embodiment, the network used for preliminary detection is a fully convolutional network, so the present invention can process images of arbitrary size and improve face detection speed; in addition, the candidate face image of the present invention is not restricted to a frontal face image or a profile face image, so the present invention can also detect frontal faces and profile faces simultaneously.
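Steps 101 to 103 can be tied together in a minimal Python sketch. The networks themselves are represented by plain callables supplied by the caller; `propose`, `verifiers` and `detect_faces` are hypothetical names, and averaging the verifier outputs is only one of the combination rules the text allows.

```python
def detect_faces(image, propose, verifiers, threshold=0.997):
    """Two-stage detection sketch.
    propose(image)   -> iterable of candidate face images (step 101)
    verifiers        -> list of callables, each returning a reliability
                        value in [0, 1] for one candidate (step 102)
    threshold        -> preset threshold (step 103); 0.997 follows the
                        99.7% example value in the text."""
    finals = []
    for candidate in propose(image):
        scores = [verify(candidate) for verify in verifiers]
        reliability = sum(scores) / len(scores)  # mean, one allowed rule
        if reliability > threshold:
            finals.append((candidate, reliability))
    return finals


# Stand-in callables for illustration only; real networks go here.
stub_propose = lambda image: ["candidate_a", "candidate_b"]
stub_verifiers = [
    lambda c: 0.999 if c == "candidate_a" else 0.50,
    lambda c: 0.998 if c == "candidate_a" else 0.60,
]
result = detect_faces(None, stub_propose, stub_verifiers)
```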
Referring to Fig. 4, an embodiment of the face detection device in the embodiments of the present invention includes:
An acquisition module 201, configured to obtain a candidate face image by means of a first deep convolutional network, the first deep convolutional network being a preset fully convolutional network used for preliminary detection;
A first processing module 202, configured to compute on the candidate face image by means of a second deep convolutional network to obtain a reliability value of the candidate face image, the second deep convolutional network being a preset deep convolutional network used for verification;
A determination module 203, configured to determine the candidate face image to be a final face image if the reliability value of the candidate face image is greater than a preset threshold.
In this embodiment, the network used for preliminary detection is a fully convolutional network, so the present invention can process images of arbitrary size and improve face detection speed; in addition, the candidate face image of the present invention is not restricted to a frontal face image or a profile face image, so the present invention can also detect frontal faces and profile faces simultaneously.
Optionally, the acquisition module 201 includes:
A generation unit, configured to generate a face heat map by means of the first deep convolutional network;
A first processing unit, configured to determine local hottest points in the face heat map and to take the local hottest points as candidate face positions;
An acquiring unit, configured to obtain the candidate face image according to the candidate face positions.
Further, the device also includes:
A judging module, configured to judge whether the candidate face positions overlap;
A second processing module, configured to merge the overlapping candidate face positions if the judging module judges that the candidate face positions overlap.
Optionally, the device also includes:
A generation module, configured to generate the first deep convolutional network;
A third processing module, configured to collect face images and non-face images and to use the face images and non-face images as training samples;
A training module, configured to train the first deep convolutional network with the training samples.
Further, if the second deep convolutional network consists of multiple deep convolutional networks, the first processing module 202 includes:
A computing unit, configured to compute on the candidate face image with each of the multiple deep convolutional networks to obtain multiple reliability values of the candidate face image;
A second processing unit, configured to obtain the reliability value of the candidate face image from the multiple reliability values.
Specifically, the multiple reliability values may be averaged and the mean taken as the reliability value of the candidate face image; or the maximum of the multiple reliability values may be taken as the reliability value of the candidate face image; or the multiple reliability values may be weighted and the weighted value taken as the reliability value of the candidate face image. Other methods may also be used, which is not limited here.
Optionally, the first deep convolutional network includes multiple layers, in order: a first input layer, a first convolutional layer, a first output layer, a first max pooling layer, a second output layer, a first activation function layer, a second convolutional layer, a third output layer, a second activation function layer, a third convolutional layer and a fourth output layer; the second deep convolutional network includes multiple layers, in order: a second input layer, a fourth convolutional layer, a fifth output layer, a second max pooling layer, a sixth output layer, a third activation function layer, a fifth convolutional layer, a seventh output layer, a third max pooling layer, an eighth output layer, a fourth activation function layer, a fully connected layer and a ninth output layer.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.