CN110263731A - A single-stage face detection system - Google Patents

A single-stage face detection system

Info

Publication number
CN110263731A
Authority
CN
China
Prior art keywords
convolution module
depth
face
module
crop box
Legal status
Granted
Application number
CN201910550738.8A
Other languages
Chinese (zh)
Other versions
CN110263731B (en)
Inventor
徐杰
田野
罗堡文
廖静茹
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN201910550738.8A
Publication of CN110263731A
Application granted
Publication of CN110263731B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a single-stage face detection system. The invention proposes YOMO, a real-time face detection network built from depthwise separable convolutions. It contains multiple top-down feature fusion structures, and each detection module is responsible only for detecting faces within its corresponding scale range. By using a random cropping strategy that better suits the multi-scale detection structure, the invention allows each detection module to be trained on a sufficient number of samples. The proposed ellipse regressor substantially improves detection recall under the ContROC evaluation criterion. While remaining highly competitive in detection accuracy, the proposed YOMO model runs at 51 FPS on 544 × 544 images, and the model occupies only 21 MB of memory.

Description

A single-stage face detection system
Technical field
The present invention relates to the field of face detection, and in particular to a single-stage face detection system.
Background art
Face detection is a key component of people-centric smart cities and underpins technologies such as identity recognition, personalized services, pedestrian detection and tracking, and crowd counting. Although it has been studied extensively, face detection in unconstrained scenes remains an open research problem because of the many challenges involved.
Early face detection focused on manually designing effective features and building efficient classifiers on top of them. However, this approach generally yields suboptimal detection models, and detection accuracy may drop considerably when the application scenario changes. In recent years, deep learning has been applied to face detection with remarkable success, but building a real-time face detection model with high accuracy in unconstrained scenes remains a considerable challenge.
Faster R-CNN replaces the sliding window with a region proposal algorithm and integrates candidate box generation, feature extraction, bounding-box regression and classification into a single network; it is the fastest and most accurate model in the R-CNN series. However, because the region proposal network generates many face candidate boxes and the complex network structure incurs a large computational cost, it cannot achieve real-time face detection.
Another class of face detection methods, such as YOLO, converts detection into a regression problem. These methods have no proposal network and regress face boxes directly from the feature maps of the feature extraction network, which makes detection faster, but detection accuracy leaves room for improvement. To improve accuracy, SSD uses multi-scale feature maps from different layers to jointly predict the classes and positions of boxes. Predicting from multi-layer features helps detect faces of different scales, but none of the stages is specifically trained to handle faces in a particular scale range; that is, during training, faces of all scales produce loss in every detection module. In contrast, each detection module of YOMO is trained only on faces within its own scale range.
To address small-face detection in single-stage methods, HR uses an image pyramid and trains several separate single-scale detectors, each responsible for faces of a particular scale. At test time, however, the image must be rescaled to multiple scales and each rescaled image passes through a very deep network, so this multi-pass, single-scale detector is computationally very expensive.
Single-shot multi-scale detectors such as S3FD instead detect faces using the multi-scale features of a deep convolutional network and need only a single forward pass of the image at test time. However, S3FD has the same problem as SSD: each feature map of a different scale is used for prediction independently, so when the low-level layers predict small faces, the lack of semantic features leaves S3FD's small-face detection unsatisfactory.
Summary of the invention
To address the above shortcomings of the prior art, the invention provides a single-stage face detection system that solves the problem of unsatisfactory detection performance in face detection systems.
To achieve the above object, the invention adopts the following technical solution: a single-stage face detection system, comprising, connected in sequence from left to right, a conventional convolution module conv0; depthwise separable convolution modules conv1 through conv14; a deconvolution layer conv15; depthwise separable convolution modules conv16 and conv17; a deconvolution layer conv18; and depthwise separable convolution modules conv19 and conv20;
the output of depthwise separable convolution module conv14 is connected to detection module det-32, the output of depthwise separable convolution module conv17 is connected to detection module det-16, and the output of depthwise separable convolution module conv20 is connected to detection module det-8;
the output of depthwise separable convolution module conv11 is connected to the input of depthwise separable convolution module conv16; the output of conv16 is fused with the output of deconvolution layer conv15, and the fused features are fed to the input of depthwise separable convolution module conv17. The output of depthwise separable convolution module conv5 is connected to the input of depthwise separable convolution module conv19; the output of conv19 is fused with the output of deconvolution layer conv18, and the fused features are fed to the input of depthwise separable convolution module conv20.
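The connection pattern above can be checked by tracking feature-map sizes. The sketch below is illustrative only: the per-layer strides in `BACKBONE_STRIDES` are a hypothetical MobileNet-like assignment (the text only fixes the cumulative strides 32, 16 and 8 at conv14, conv17 and conv20), conv16 and conv19 are assumed to keep their input size, and each deconvolution is assumed to upsample by 2. The function name `feature_map_sizes` is ours.

```python
# Hypothetical per-layer strides (an assumption): the text only fixes the
# cumulative strides 32, 16 and 8 at conv14, conv17 and conv20.
BACKBONE_STRIDES = [
    ("conv0", 2), ("conv1", 1), ("conv2", 2), ("conv3", 1), ("conv4", 2),
    ("conv5", 1), ("conv6", 2), ("conv7", 1), ("conv8", 1), ("conv9", 1),
    ("conv10", 1), ("conv11", 1), ("conv12", 2), ("conv13", 1), ("conv14", 1),
]


def feature_map_sizes(input_size=544):
    """Track the side length of each module's output for a square input."""
    sizes = {}
    side = input_size
    for name, stride in BACKBONE_STRIDES:
        side //= stride
        sizes[name] = side
    # First top-down fusion: conv15 (deconvolution) upsamples conv14 by 2,
    # conv16 processes the conv11 output, and the two are fused into conv17.
    sizes["conv15"] = sizes["conv14"] * 2
    sizes["conv16"] = sizes["conv11"]
    if sizes["conv15"] != sizes["conv16"]:
        raise ValueError("conv15 and conv16 outputs must match to be fused")
    sizes["conv17"] = sizes["conv16"]
    # Second top-down fusion: conv18 upsamples conv17 by 2,
    # conv19 processes the conv5 output, and the two are fused into conv20.
    sizes["conv18"] = sizes["conv17"] * 2
    sizes["conv19"] = sizes["conv5"]
    if sizes["conv18"] != sizes["conv19"]:
        raise ValueError("conv18 and conv19 outputs must match to be fused")
    sizes["conv20"] = sizes["conv19"]
    return sizes


sizes = feature_map_sizes(544)
for module, layer in (("det-32", "conv14"), ("det-16", "conv17"), ("det-8", "conv20")):
    print(module, layer, sizes[layer], "x", sizes[layer], "stride", 544 // sizes[layer])
```

For a 544 × 544 input this gives 17 × 17, 34 × 34 and 68 × 68 maps for det-32, det-16 and det-8. The size checks make explicit why the lateral inputs come from conv11 and conv5: those are the layers whose resolution matches the upsampled top-down features.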
Further, the conventional convolution module conv0 comprises, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer and a LeakyReLU activation layer.
Further, the input image of the conventional convolution module conv0 is cropped to a crop box $SelectCrop_{bbox}$ chosen by the random cropping algorithm below, and the cropped image is used for training. The specific steps are as follows:
S1. Generate a number of square crop boxes $Sampled_{bboxes}$ (aspect ratio 1) by random cropping and crop the original image accordingly. Rescale each cropped image to the input size required by the network, scale the valid ground-truth boxes in each crop box by the same ratio, and count the number of faces of each scale according to the face scale ranges:
$$Num_{ic} = \sum_{k=1}^{K} \mathbb{1}\left(MinScale_c \le bbox_k \le MaxScale_c\right), \quad i = 1, \dots, M, \; c = 1, \dots, N$$
where $Num_{ic}$ is the number of faces of scale class c in the i-th crop box; N = 3 is the number of face scale classes, namely small, medium and large faces; M is the total number of crop boxes; $\mathbb{1}(\cdot)$ is the indicator function, equal to 1 when its condition holds and 0 otherwise; $MinScale_c$ and $MaxScale_c$ are the lower and upper bounds of scale class c; $bbox_k$ is the side length of the k-th box in the crop box; and K is the total number of such boxes in the generated crop box;
S2. For each crop box, sort the face scale classes in descending order of their face counts:
$$S_{i1} \ge S_{i2} \ge \dots \ge S_{iN}$$
where i is the crop box index and $S_{ic}$ denotes one of the N face scale classes of the i-th crop box;
S3. Count the number of faces of each scale class used so far in actual network training, and sort the face scale classes in ascending order of this count:
$$A_1 \le A_2 \le \dots \le A_N$$
where $A_c$ denotes one of the N face scale classes;
S4. Among the M face scale class sequences of the crop boxes $Sampled_{bboxes}$, search for crop boxes satisfying $S_{ic} = A_c$ for every c, and randomly select one of them as $SelectCrop_{bbox}$;
S5. If no crop box satisfies step S4, search the M face scale class sequences of $Sampled_{bboxes}$ for crop boxes satisfying $S_{i1} = A_1$ and $S_{iN} = A_N$, and randomly select one of them as $SelectCrop_{bbox}$;
S6. If no crop box satisfies step S5, randomly select one crop box from $Sampled_{bboxes}$ as $SelectCrop_{bbox}$;
S7. Add the per-class face counts $Num_{sc}$ of $SelectCrop_{bbox}$ to the per-class face counts used in actual training:
$$A_c \leftarrow A_c' + Num_{sc}, \quad c = 1, \dots, N$$
where $A_c'$ is the number of class-c faces accumulated up to the previous training step and s is the index of the selected crop box.
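Steps S1 to S7 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: each candidate crop is represented only by the side lengths of its (already rescaled) ground-truth faces, the scale bins are taken from Table 1, ties in the sorting are broken by class index, and the names `count_faces` and `select_crop` are hypothetical.

```python
import random

# Table 1 scale ranges in pixels (inclusive): small, medium, large faces.
SCALE_BINS = [(10, 39), (40, 99), (100, 350)]


def count_faces(face_sides, bins=SCALE_BINS):
    """Num_ic: faces of each scale class in one crop (side lengths after rescaling)."""
    return [sum(lo <= s <= hi for s in face_sides) for lo, hi in bins]


def select_crop(crops, totals, rng=random):
    """Steps S2-S7: pick a crop index and add its counts to `totals` (A_c) in place.

    `crops` is a list of candidate crops, each given as the side lengths of the
    ground-truth faces it contains after rescaling to the network input size.
    """
    counts = [count_faces(faces) for faces in crops]
    n = len(totals)
    # S3: classes ordered from least to most frequently seen so far.
    ascending = sorted(range(n), key=lambda c: totals[c])
    # S2: for each crop, classes ordered from most to least frequent in it.
    descending = [sorted(range(n), key=lambda c: -cnt[c]) for cnt in counts]

    full = [i for i, d in enumerate(descending) if d == ascending]          # S4
    ends = [i for i, d in enumerate(descending)
            if d[0] == ascending[0] and d[-1] == ascending[-1]]             # S5
    pool = full or ends or list(range(len(crops)))                          # S6
    s = rng.choice(pool)
    for c in range(n):                                                      # S7
        totals[c] += counts[s][c]
    return s
```

For example, with accumulated counts `[5, 0, 3]` (medium faces least seen, small faces most seen), a crop containing mostly medium faces and no small faces is preferred, which is how the strategy keeps every detection module supplied with training samples.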
Further, the depthwise separable convolution modules conv1 through conv14, conv16, conv17, conv19 and conv20 have the same structure, comprising, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer, a LeakyReLU activation layer, a 1 × 1 convolutional layer, a BatchNorm layer and a LeakyReLU activation layer.
Further, the depthwise separable convolution modules conv14, conv17 and conv20 each have 1024 output channels.
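The saving that motivates depthwise separable modules can be shown with simple weight counts. The sketch assumes, as is standard for depthwise separable convolution, that the 3 × 3 layer applies one filter per input channel; biases are omitted because BatchNorm follows each convolution. The 1024-to-1024 layer is an illustrative choice, not a figure from the text.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of an ordinary k x k convolution (no bias: BatchNorm follows)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """k x k depthwise conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


std = standard_conv_params(1024, 1024)
sep = depthwise_separable_params(1024, 1024)
print(std, sep, round(std / sep, 2))  # 9437184 1057792 8.92
```

At this width the separable module needs roughly one ninth of the weights of a plain 3 × 3 convolution, which is consistent with the small model footprint reported in the abstract.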
Further, detection module det-32 is used for large-face detection, detection module det-16 for medium-face detection, and detection module det-8 for small-face detection.
Further, detection modules det-32, det-16 and det-8 each comprise a conventional convolutional layer and an output layer;
the conventional convolutional layer has 18 output channels;
The center coordinates and side lengths of the prediction box in the output layer are computed as:
$$b_x = \sigma(t_x) + C_x, \quad b_y = \sigma(t_y) + C_y$$
$$b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h}$$
where $(b_x, b_y)$ are the center coordinates of the prediction box; $b_w$ and $b_h$ are its width and height; $t_x$ and $t_y$ are the offsets of the horizontal and vertical coordinates of the prediction box center; $t_w$ and $t_h$ are the width and height offsets; $(C_x, C_y)$ are the top-left coordinates of the grid cell containing the anchor; $\sigma(\cdot)$ is the sigmoid function; and $p_w$ and $p_h$ are the width and height of the anchor.
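The decoding step can be sketched as follows. The center formula is the one given above; the width and height use the YOLOv2-style exponential form $p_w e^{t_w}$, which is an assumption here. All quantities are taken to be in grid units, and `stride` (8, 16 or 32 for the three modules) converts them to input-image pixels. The function name `decode_box` is hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, cell, anchor, stride=1):
    """Decode raw outputs t = (tx, ty, tw, th) for the anchor in grid cell (Cx, Cy).

    Centre: b_x = sigmoid(t_x) + C_x, b_y = sigmoid(t_y) + C_y (as in the text).
    Size:   b_w = p_w * exp(t_w), b_h = p_h * exp(t_h) (assumed YOLOv2-style form).
    Everything is in grid units; multiplying by `stride` converts to pixels.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    bw = pw * math.exp(tw) * stride
    bh = ph * math.exp(th) * stride
    return bx, by, bw, bh


print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 5), anchor=(2.0, 1.5), stride=8))
# (28.0, 44.0, 16.0, 12.0)
```

Because the sigmoid confines the predicted center offset to (0, 1), each prediction stays inside the grid cell that owns its anchor.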
Further, the outputs of detection modules det-32, det-16 and det-8 are each connected to an ellipse regressor, which converts the prediction boxes of the output layer into elliptical boxes. The elliptical box is computed as:
$$Y = XW + \varepsilon$$
where Y is the coordinate vector of the elliptical box, comprising the semi-major axis $r_a$, the semi-minor axis $r_b$, the angle $\theta$, and the center coordinates $c_x$ and $c_y$; X is the coordinate vector of the output-layer prediction box, comprising the center coordinates $b_x$ and $b_y$, the width $b_w$ and the height $b_h$; W is the regression coefficient matrix; and $\varepsilon$ is the random error;
The regression coefficient matrix W is computed as:
$$W = \arg\min_{W} J(X'W, Y')$$
where $J(\cdot)$ denotes the mean squared error function, X' is the normalized coordinate vector of the prediction box, and Y' is the normalized coordinate vector of the ground-truth box, obtained as:
$$X' = \frac{X - U_X}{\sigma_X}, \quad Y' = \frac{Y - U_Y}{\sigma_Y}$$
where $U_X$ and $\sigma_X$ are the mean and standard deviation of the prediction-box coordinate vector X, and $U_Y$ and $\sigma_Y$ are the mean and standard deviation of the ground-truth coordinate vector Y.
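A minimal sketch of fitting and applying the ellipse regressor, assuming NumPy. The text specifies only the mean-squared-error objective on normalized coordinates; solving it with `numpy.linalg.lstsq` is our choice of solver, and `fit_ellipse_regressor` and `predict_ellipse` are hypothetical names. Because both X' and Y' are zero-mean after normalization, no separate intercept term is needed.

```python
import numpy as np


def fit_ellipse_regressor(X, Y):
    """X: (n, 4) boxes [bx, by, bw, bh]; Y: (n, 5) ellipses [ra, rb, theta, cx, cy].

    Returns W (4 x 5) and the normalisation statistics (U_X, sigma_X, U_Y, sigma_Y).
    """
    ux, sx = X.mean(axis=0), X.std(axis=0)
    uy, sy = Y.mean(axis=0), Y.std(axis=0)
    Xn = (X - ux) / sx
    Yn = (Y - uy) / sy
    # Least-squares solution of min_W ||Xn W - Yn||^2, i.e. the MSE objective J.
    W, *_ = np.linalg.lstsq(Xn, Yn, rcond=None)
    return W, (ux, sx, uy, sy)


def predict_ellipse(X, W, stats):
    """Map prediction boxes to ellipses, undoing the normalisation of Y."""
    ux, sx, uy, sy = stats
    return ((X - ux) / sx) @ W * sy + uy
```

In use, X and Y would hold the matched prediction boxes and FDDB ground-truth ellipses; the same statistics from training must be reused at prediction time.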
The beneficial effects of the invention are as follows:
1. The invention proposes YOMO, a real-time face detection network built from depthwise separable convolutions and containing multiple top-down feature fusion structures; each detection module is responsible only for detecting faces within its corresponding scale range.
2. By using a random cropping strategy that better suits the multi-scale detection structure, the invention allows each detection module to be trained on a sufficient number of samples.
3. The proposed ellipse regressor substantially improves detection recall under the ContROC evaluation criterion.
4. The YOMO model remains highly competitive in detection accuracy while running at 51 FPS on 544 × 544 images.
Brief description of the drawings
Fig. 1 is a structural diagram of the invention;
Fig. 2 shows the evaluation results of the invention on the FDDB dataset;
Fig. 3 shows visualization results of the invention on the WIDER FACE and FDDB datasets.
Detailed description of the embodiments
Specific embodiments of the invention are described below to help those skilled in the art understand the invention. It should be clear that the invention is not limited to the scope of the specific embodiments. For those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that use the inventive concept are protected.
As shown in Fig. 1, a single-stage face detection system comprises, connected in sequence from left to right, a conventional convolution module conv0; depthwise separable convolution modules conv1 through conv14; a deconvolution layer conv15; depthwise separable convolution modules conv16 and conv17; a deconvolution layer conv18; and depthwise separable convolution modules conv19 and conv20;
the output of depthwise separable convolution module conv14 is connected to detection module det-32, the output of depthwise separable convolution module conv17 is connected to detection module det-16, and the output of depthwise separable convolution module conv20 is connected to detection module det-8;
the output of depthwise separable convolution module conv11 is connected to the input of depthwise separable convolution module conv16; the output of conv16 is fused with the output of deconvolution layer conv15, and the fused features are fed to the input of depthwise separable convolution module conv17. The output of depthwise separable convolution module conv5 is connected to the input of depthwise separable convolution module conv19; the output of conv19 is fused with the output of deconvolution layer conv18, and the fused features are fed to the input of depthwise separable convolution module conv20.
Relative to the original image, the output feature maps of conv14, conv17 and conv20 have down-sampling strides of 32, 16 and 8, respectively. Detection module det-32 is used for large-face detection, det-16 for medium-face detection, and det-8 for small-face detection; the face scale range each detection module is responsible for is shown in Table 1.
Table 1 Face scale ranges handled by each detection module
Scale bound          det-8 (small faces)   det-16 (medium faces)   det-32 (large faces)
Minimum MinScale     10                    40                      100
Maximum MaxScale     39                    99                      350
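Table 1 can be read as a lookup from face size to detection module. In the sketch below, `face_side` is assumed to be a single integer side length measured at the network input resolution (the table does not say how a non-square face is reduced to one number), and `responsible_module` is a hypothetical name. Sizes outside 10 to 350 fall into no module.

```python
# Table 1: face scale range (inclusive) handled by each detection module.
MODULE_RANGES = {"det-8": (10, 39), "det-16": (40, 99), "det-32": (100, 350)}


def responsible_module(face_side):
    """Return the detection module whose Table 1 range contains face_side, else None."""
    for module, (lo, hi) in MODULE_RANGES.items():
        if lo <= face_side <= hi:
            return module
    return None


print([responsible_module(s) for s in (8, 24, 40, 150, 400)])
# [None, 'det-8', 'det-16', 'det-32', None]
```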
The network is trained with the RMSProp gradient optimization method using the parameter settings in Table 2. Placing the three detection modules on layers with different strides strengthens the model's multi-scale detection ability. During training, the loss function of each detection module is a multi-task loss with 5 parts. To make each detection module responsible only for faces within its own scale range, during back-propagation the detection branch containing the anchor with the largest IoU with the ground-truth box is found, and only that anchor produces a box regression loss. To make training more effective, each ground-truth box is matched to the single anchor with the highest IoU.
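The branch assignment described above can be sketched as a global arg-max over all anchors of all three modules. Two assumptions are made: the IoU between a ground-truth box and an anchor is computed on shapes only (both placed on the same center, as in YOLOv2-style anchor matching), and the anchor sizes in `ANCHORS` are hypothetical placeholders, since the text states only that the real values come from clustering on the training set.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes placed on the same centre, so only width and height matter."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def assign_to_anchor(gt_wh, anchors_by_module):
    """Return (module, anchor index) of the single highest-IoU anchor over all modules.

    Only this anchor, and therefore only this module's branch, would receive the
    box-regression loss for the ground-truth box.
    """
    best, best_iou = None, -1.0
    for module, anchors in anchors_by_module.items():
        for j, anchor_wh in enumerate(anchors):
            iou = shape_iou(gt_wh, anchor_wh)
            if iou > best_iou:
                best, best_iou = (module, j), iou
    return best


# Hypothetical anchor sizes in pixels (the real ones come from clustering).
ANCHORS = {
    "det-8": [(12, 14), (20, 24), (32, 36)],
    "det-16": [(45, 52), (65, 75), (90, 100)],
    "det-32": [(120, 140), (180, 210), (280, 320)],
}

print(assign_to_anchor((22, 25), ANCHORS))    # ('det-8', 1)
print(assign_to_anchor((200, 230), ANCHORS))  # ('det-32', 1)
```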
Table 2 Training configuration parameters
base_lr   step_value   gamma   batch_size   iter_size   type      weight_decay   max_iter
0.001     40000        0.1     9            3           RMSProp   0.00005        200000
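Two quantities follow directly from Table 2 under Caffe solver semantics: the effective batch size, since Caffe accumulates gradients over `iter_size` passes, and the learning rate at a given iteration. The schedule below assumes the Caffe `multistep` policy with `step_value` as its only step value; the table does not name the policy, so treat this as an assumption. Function names are ours.

```python
# Table 2 values; lr_policy "multistep" with stepvalue = step_value is an assumption.
SOLVER = {
    "base_lr": 0.001, "step_values": [40000], "gamma": 0.1,
    "batch_size": 9, "iter_size": 3, "weight_decay": 0.00005, "max_iter": 200000,
}


def learning_rate(iteration, cfg=SOLVER):
    """Multiply base_lr by gamma once for every step value already reached."""
    passed = sum(iteration >= s for s in cfg["step_values"])
    return cfg["base_lr"] * cfg["gamma"] ** passed


def effective_batch_size(cfg=SOLVER):
    """Caffe accumulates gradients over iter_size passes of batch_size images each."""
    return cfg["batch_size"] * cfg["iter_size"]


print(effective_batch_size())                      # 27
print(learning_rate(0), learning_rate(50000))
```

Under this reading each weight update sees 27 images, and the learning rate drops tenfold once, at iteration 40000.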
The multi-task loss function of YOMO consists of 5 parts: the non-target loss, the anchor pre-training loss, the target localization loss, the target confidence loss and the target classification loss, as shown in formula (3).
Here W and H are the width and height of the feature map, A is the number of anchors, and t is the iteration count. 1(x) is the indicator function, equal to 1 when x is true and 0 otherwise. $\lambda_{noobj}$, $\lambda_{prior}$, $\lambda_{coord}$, $\lambda_{obj}$ and $\lambda_{class}$ are the weights of the individual tasks: the non-target loss weight, the anchor pre-training loss weight, the coordinate loss weight, the target loss weight and the classification loss weight, respectively. $b_r$ denotes the 4 coordinate offsets predicted by the network, and $prior_r$ denotes the 4 coordinates of the anchor, namely the box center x, center y, box width w and box height h. When the IoU between a prediction box and every ground-truth box is less than or equal to the threshold Thresh, the region of the input image corresponding to that prediction box is non-target, i.e. background, and its predicted confidence is $b_o$. To make the network adapt to the anchors as quickly as possible, the anchor pre-training loss is introduced in the early stage of training; in the YOMO model, the first epoch is defined as the early stage of training.
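Formula (3) itself is not reproduced here, so the sketch below illustrates only two pieces of the text that are stated explicitly: the background rule (IoU ≤ Thresh with every ground-truth box) and the weighted combination of the five terms, with the anchor pre-training term active only during the first epoch. The individual loss values are placeholders, the weights are the ones given in the embodiment, and the function names are hypothetical.

```python
# Loss weights from the embodiment: lambda_noobj, lambda_prior, lambda_coord,
# lambda_obj, lambda_class.
WEIGHTS = {"noobj": 1.0, "prior": 1.0, "coord": 1.0, "obj": 5.0, "class": 1.0}


def is_background(ious_with_gts, thresh):
    """A prediction is non-target (background) if IoU <= thresh for every GT box."""
    return all(iou <= thresh for iou in ious_with_gts)


def total_loss(parts, epoch, weights=WEIGHTS):
    """Weighted sum of the five already-computed loss terms.

    The anchor pre-training term only contributes during the first epoch
    (epoch index 0), which YOMO treats as the early training stage.
    """
    loss = 0.0
    for name, value in parts.items():
        if name == "prior" and epoch > 0:
            continue
        loss += weights[name] * value
    return loss


parts = {"noobj": 0.5, "prior": 0.2, "coord": 0.3, "obj": 0.1, "class": 0.05}
print(total_loss(parts, epoch=0), total_loss(parts, epoch=1))  # about 1.55 and 1.35
```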
The conventional convolution module conv0 comprises, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer and a LeakyReLU activation layer.
The input image of the conventional convolution module conv0 is cropped to a crop box $SelectCrop_{bbox}$ chosen by the random cropping algorithm below, and the cropped image is used for training. The specific steps are as follows:
S1. Generate a number of square crop boxes $Sampled_{bboxes}$ (aspect ratio 1) by random cropping and crop the original image accordingly. Rescale each cropped image to the input size required by the network, scale the valid ground-truth boxes in each crop box by the same ratio, and count the number of faces of each scale according to the face scale ranges:
$$Num_{ic} = \sum_{k=1}^{K} \mathbb{1}\left(MinScale_c \le bbox_k \le MaxScale_c\right), \quad i = 1, \dots, M, \; c = 1, \dots, N$$
where $Num_{ic}$ is the number of faces of scale class c in the i-th crop box; N = 3 is the number of face scale classes, namely small, medium and large faces; M is the total number of crop boxes; $\mathbb{1}(\cdot)$ is the indicator function, equal to 1 when its condition holds and 0 otherwise; $MinScale_c$ and $MaxScale_c$ are the lower and upper bounds of scale class c; $bbox_k$ is the side length of the k-th box in the crop box; and K is the total number of such boxes in the generated crop box;
S2. For each crop box, sort the face scale classes in descending order of their face counts:
$$S_{i1} \ge S_{i2} \ge \dots \ge S_{iN}$$
where i is the crop box index and $S_{ic}$ denotes one of the N face scale classes of the i-th crop box;
S3. Count the number of faces of each scale class used so far in actual network training, and sort the face scale classes in ascending order of this count:
$$A_1 \le A_2 \le \dots \le A_N$$
where $A_c$ denotes one of the N face scale classes;
S4. Among the M face scale class sequences of the crop boxes $Sampled_{bboxes}$, search for crop boxes satisfying $S_{ic} = A_c$ for every c, and randomly select one of them as $SelectCrop_{bbox}$;
S5. If no crop box satisfies step S4, search the M face scale class sequences of $Sampled_{bboxes}$ for crop boxes satisfying $S_{i1} = A_1$ and $S_{iN} = A_N$, and randomly select one of them as $SelectCrop_{bbox}$;
S6. If no crop box satisfies step S5, randomly select one crop box from $Sampled_{bboxes}$ as $SelectCrop_{bbox}$;
S7. Add the per-class face counts $Num_{sc}$ of $SelectCrop_{bbox}$ to the per-class face counts used in actual training:
$$A_c \leftarrow A_c' + Num_{sc}, \quad c = 1, \dots, N$$
where $A_c'$ is the number of class-c faces accumulated up to the previous training step and s is the index of the selected crop box.
The depthwise separable convolution modules conv1 through conv14, conv16, conv17, conv19 and conv20 have the same structure, comprising, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer, a LeakyReLU activation layer, a 1 × 1 convolutional layer, a BatchNorm layer and a LeakyReLU activation layer.
The depthwise separable convolution modules conv14, conv17 and conv20 each have 1024 output channels.
Detection modules det-32, det-16 and det-8 each comprise a conventional convolutional layer and an output layer;
The number of output channels of the conventional convolutional layer is computed as:
$$num_{output} = (num_{coordinate} + num_{confidence} + num_{classes}) \times num_{Anchors}$$
where coordinate, confidence, classes and Anchors denote the box coordinates, the confidence, the classes and the anchors, respectively. More anchors give better detection accuracy but slow down training and testing. Since YOMO has 3 detection modules responsible for faces of 3 scales, $num_{Anchors} = 3$ is used to balance speed and accuracy. The conventional convolutional layer in each detection module therefore has 18 output channels.
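The channel arithmetic can be made concrete. With 4 box coordinates, 1 confidence and 1 class (face) per anchor, 3 anchors give 18 channels; these per-anchor counts are implied by the stated total of 18. The helper names are ours, and the 17 × 17 map assumes a 544 × 544 input at stride 32.

```python
def head_channels(num_coordinate=4, num_confidence=1, num_classes=1, num_anchors=3):
    """num_output = (num_coordinate + num_confidence + num_classes) * num_Anchors."""
    return (num_coordinate + num_confidence + num_classes) * num_anchors


def head_output_shape(feature_side, **kwargs):
    """(channels, height, width) of one detection module's output map."""
    return (head_channels(**kwargs), feature_side, feature_side)


print(head_channels())          # 18
print(head_output_shape(17))    # (18, 17, 17) for det-32 on a 544 x 544 input
```

Each of the 18 channels at a grid position therefore holds one of the 6 values (4 offsets, confidence, face score) for one of the 3 anchors.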
The center coordinates and side lengths of the prediction box in the output layer are computed as:
$$b_x = \sigma(t_x) + C_x, \quad b_y = \sigma(t_y) + C_y$$
$$b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h}$$
where $(b_x, b_y)$ are the center coordinates of the prediction box; $b_w$ and $b_h$ are its width and height; $t_x$ and $t_y$ are the offsets of the horizontal and vertical coordinates of the prediction box center; $t_w$ and $t_h$ are the width and height offsets; $(C_x, C_y)$ are the top-left coordinates of the grid cell containing the anchor; $\sigma(\cdot)$ is the sigmoid function; and $p_w$ and $p_h$ are the width and height of the anchor.
The outputs of detection modules det-32, det-16 and det-8 are each connected to an ellipse regressor, which converts the prediction boxes of the output layer into elliptical boxes. The elliptical box is computed as:
$$Y = XW + \varepsilon$$
where Y is the coordinate vector of the elliptical box, comprising the semi-major axis $r_a$, the semi-minor axis $r_b$, the angle $\theta$, and the center coordinates $c_x$ and $c_y$; X is the coordinate vector of the output-layer prediction box, comprising the center coordinates $b_x$ and $b_y$, the width $b_w$ and the height $b_h$; W is the regression coefficient matrix; and $\varepsilon$ is the random error;
The regression coefficient matrix W is computed as:
$$W = \arg\min_{W} J(X'W, Y')$$
where $J(\cdot)$ denotes the mean squared error function, X' is the normalized coordinate vector of the prediction box, and Y' is the normalized coordinate vector of the ground-truth box, obtained as:
$$X' = \frac{X - U_X}{\sigma_X}, \quad Y' = \frac{Y - U_Y}{\sigma_Y}$$
where $U_X$ and $\sigma_X$ are the mean and standard deviation of the prediction-box coordinate vector X, and $U_Y$ and $\sigma_Y$ are the mean and standard deviation of the ground-truth coordinate vector Y.
When training the ellipse regressor, the way prediction boxes are matched to ground-truth boxes is critical. In practice, each ground-truth box in every FDDB image is matched to the prediction box with the highest IoU with it, and only the ground-truth boxes and their matched prediction boxes are used in training.
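The matching step can be sketched as follows. The text does not say how IoU is computed between an elliptical ground truth and a rectangular prediction; this sketch assumes the ellipse is first replaced by its axis-aligned bounding rectangle, with θ taken as the angle in radians from the x-axis to the semi-major axis. Boxes are (x1, y1, x2, y2), and `ellipse_to_box`, `box_iou` and `match_pairs` are hypothetical names.

```python
import math


def ellipse_to_box(ra, rb, theta, cx, cy):
    """Axis-aligned box (x1, y1, x2, y2) enclosing a rotated ellipse.

    theta is taken as the angle (radians) from the x-axis to the semi-major axis ra.
    """
    hw = math.hypot(ra * math.cos(theta), rb * math.sin(theta))
    hh = math.hypot(ra * math.sin(theta), rb * math.cos(theta))
    return cx - hw, cy - hh, cx + hw, cy + hh


def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_pairs(ellipses, boxes):
    """For each GT ellipse (ra, rb, theta, cx, cy), the index of the best-IoU box."""
    matches = []
    for e in ellipses:
        gt_box = ellipse_to_box(*e)
        matches.append(max(range(len(boxes)), key=lambda j: box_iou(gt_box, boxes[j])))
    return matches


boxes = [(0, 0, 10, 10), (31, 41, 69, 59), (100, 100, 150, 150)]
print(match_pairs([(20, 10, 0.0, 50, 50)], boxes))  # [1]
```

The resulting (prediction box, ellipse) pairs are what the least-squares fit of W would consume.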
The experimental environment is 64-bit Ubuntu 14.04 LTS with 16 GB of memory and an Intel Core i7-7700K CPU with 8 logical cores and a single-core frequency of 4.20 GHz. All models are implemented in the Caffe framework and trained on a single GPU, an NVIDIA GeForce GTX 1080 Ti.
The feature extraction network of the YOMO model is pre-trained on ImageNet and then fine-tuned for 200K iterations. The other training parameter settings are shown in Table 2. The anchor with the largest IoU with a ground-truth box is a positive example, and anchors with IoU < 0.3 are considered background. Considering detection speed and the face scale ranges, each detection module uses 3 anchors, whose values are obtained by clustering on the training set. The loss weights are $\lambda_{noobj} = 1$, $\lambda_{prior} = 1$, $\lambda_{coord} = 1$, $\lambda_{obj} = 5$ and $\lambda_{class} = 1$. The NMS threshold of each detection module is set to 0.7 during training and 0.45 during testing. The training images of all models in the invention are scaled to 544 × 544.
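The effect of the two NMS thresholds can be illustrated with standard greedy non-maximum suppression. The text gives only the thresholds, so the greedy variant below is an assumption; boxes are (x1, y1, x2, y2) and the function names are ours.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, thresh):
    """Greedy NMS: visit boxes by descending score and keep a box only if its IoU
    with every box already kept is <= thresh. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.45))  # [0, 2]: the overlapping box (IoU about 0.68) is removed
print(nms(boxes, scores, 0.7))   # [0, 1, 2]: the looser training threshold keeps it
```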
WIDER FACE is a face detection benchmark whose images are collected from the Internet and have complex backgrounds. The dataset contains 32,203 images with 393,703 annotated faces, which vary widely in scale, pose and occlusion. The images are grouped into 61 event classes, and each class is split into training, validation and test sets in the proportions 40%, 10% and 50%. All models in the invention are trained on the training set.
The images in the FDDB dataset are collected from the Faces in the Wild dataset; FDDB contains 2,845 images with 5,171 faces. It is fairly challenging, including occlusion, difficult poses, low resolution and defocus, and contains both grayscale and color images. Unlike other face detection datasets, its annotated regions are ellipses rather than rectangles. All models in the invention are tested on the FDDB dataset.
In the FDDB tests, every image is scaled with its aspect ratio preserved and embedded in a 544 × 544 black background so that it is not distorted. As shown in Fig. 2(a) and 2(b), YOMO is compared with MTCNN, ScaleFace, HR, HR-ER, ICC-CNN and FANet under DiscROC and ContROC, respectively.
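The aspect-preserving embedding can be sketched as a scale factor plus an offset. Centering the image on the black canvas is an assumption (the text says only that the image is embedded in a black 544 × 544 background), and the function names are hypothetical. `to_original` maps detections on the canvas back to original image coordinates for evaluation.

```python
def letterbox_params(width, height, target=544):
    """Scale factor, resized size and (x, y) offset that fit an image into a
    target x target canvas without distortion; the image is assumed centred."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    offset = ((target - new_w) // 2, (target - new_h) // 2)
    return scale, (new_w, new_h), offset


def to_original(x, y, scale, offset):
    """Map a point on the target x target canvas back to original image coordinates."""
    return (x - offset[0]) / scale, (y - offset[1]) / scale


scale, size, offset = letterbox_params(450, 300)
print(size, offset)  # (544, 363) (0, 90)
```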
In Fig. 2, YOMO-Fit denotes the detection results of YOMO after the ellipse regressor. The FDDB evaluation shows that, with the number of false positives fixed at 1000, YOMO-Fit reaches recall rates of 97.7% under DiscROC and 83.6% under ContROC, second only to FANet. HR-ER uses FDDB as training data through 10-fold cross-validation. Even so, its DiscROC recall is only equal to that of YOMO, and its ContROC recall is 4.9% lower than that of YOMO-Fit. Notably, the ellipse regressor raises YOMO's recall under DiscROC and ContROC by 0.1% and 8.6%, respectively.
Fig. 3(a) and 3(b) show visualized detection results on individual images from the WIDER FACE and FDDB datasets, respectively. In Fig. 3(a), the rectangles are the prediction boxes of the YOMO model. In Fig. 3(b), the rectangles are YOMO's prediction boxes and the ellipses are the ground-truth boxes.

Claims (8)

1. A single-step face detection system, characterized by comprising the following modules, connected in sequence from left to right: a conventional convolution module conv0; depthwise separable convolution modules conv1, conv2, conv3, conv4, conv5, conv6, conv7, conv8, conv9, conv10, conv11, conv12, conv13 and conv14; a deconvolution layer conv15; depthwise separable convolution modules conv16 and conv17; a deconvolution layer conv18; and depthwise separable convolution modules conv19 and conv20;
The output of depthwise separable convolution module conv14 is connected to detection module det-32, the output of depthwise separable convolution module conv17 is connected to detection module det-16, and the output of depthwise separable convolution module conv20 is connected to detection module det-8;
The output of depthwise separable convolution module conv11 is connected to the input of depthwise separable convolution module conv16. The output of conv16 is feature-fused with the output of deconvolution layer conv15, and the fused features are fed to the input of depthwise separable convolution module conv17. The output of depthwise separable convolution module conv5 is connected to the input of depthwise separable convolution module conv19. The output of conv19 is feature-fused with the output of deconvolution layer conv18, and the fused features are fed to the input of depthwise separable convolution module conv20.
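The wiring in claim 1 can be checked with a small bookkeeping sketch. The output strides below are assumptions: 8 for conv5, 16 for conv11, 32 for conv14, and 2× upsampling in each deconvolution layer. They are inferred from the names det-8, det-16 and det-32 and from the requirement that fused feature maps share one spatial size. The claim itself does not state them. The sketch takes the 544 × 544 input and the 3 anchors per detection module from the description, and computes each head's grid size and the total number of anchor predictions.

```python
INPUT_SIZE = 544      # input resolution used in the description
ANCHORS_PER_CELL = 3  # each detection module uses 3 anchors

# Output stride of each relevant module (assumed, see lead-in).
STRIDE = {
    "conv5": 8, "conv11": 16, "conv14": 32,
    "conv15": 16,  # deconvolution applied to conv14's output
    "conv16": 16, "conv17": 16,
    "conv18": 8,   # deconvolution applied to conv17's output
    "conv19": 8, "conv20": 8,
}

# (lateral source, lateral module, deconvolution layer, consumer of fused features)
FUSIONS = [
    ("conv11", "conv16", "conv15", "conv17"),
    ("conv5", "conv19", "conv18", "conv20"),
]

# Detection head -> module whose output feeds it (claim 1).
HEADS = {"det-32": "conv14", "det-16": "conv17", "det-8": "conv20"}

def check_fusions() -> None:
    """Fused branches must share one spatial size for feature fusion to be defined."""
    for nodes in FUSIONS:
        strides = {STRIDE[n] for n in nodes}
        if len(strides) != 1:
            raise ValueError(f"stride mismatch in fusion {nodes}: {strides}")

def grid_sizes(input_size: int = INPUT_SIZE) -> dict[str, int]:
    """Side length of each head's output grid."""
    return {head: input_size // STRIDE[node] for head, node in HEADS.items()}

def total_predictions(input_size: int = INPUT_SIZE) -> int:
    """Number of anchor predictions summed over the three heads."""
    return sum(g * g * ANCHORS_PER_CELL for g in grid_sizes(input_size).values())
```

Under these assumptions the three heads produce 17 × 17, 34 × 34 and 68 × 68 grids. That gives 3 × (289 + 1156 + 4624) = 18207 anchor predictions per image.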
2. The single-step face detection system according to claim 1, characterized in that the conventional convolution module conv0 comprises a 3 × 3 convolutional layer, a BatchNorm layer and a LeakyReLU activation layer, connected in sequence from top to bottom.
3. The single-step face detection system according to claim 1, characterized in that the input images of the conventional convolution module conv0 are cropped for training with a crop box SelectCrop_bbox, which is selected by the medium-softbbox random cropping algorithm in the following specific steps:
S1: generate several crop boxes Sampled_bboxes with an aspect ratio of 1 using a random cropping algorithm, and crop the original image to obtain the cropped images. Scale each cropped image to the input size required by the network, scale the valid ground-truth boxes in each crop box by the same ratio, and count the number of faces of each scale according to the face scale ranges. The counting formula is:
In the above formula, Num_ic is the number of faces of the c-th scale class in the i-th crop box. N is the number of face scale classes, N = 3, namely small-scale, medium-scale and large-scale faces. M is the total number of crop boxes. 1(·) is the indicator function, which equals 1 when its condition is true and 0 otherwise. MinScale_c and MaxScale_c are the lower and upper boundaries of the c-th face scale class. bbox_k is the side length of the crop box, and K is the total number of crop boxes generated;
S2: for each crop box, sort the face scale classes in descending order of the number of faces of each class in that crop box:
$$S_{i1} \geq S_{i2} \geq \cdots \geq S_{iN}$$
In the above formula, i is the crop box index, and S_ic is one of the N face scale classes in the i-th crop box;
S3: count the number of faces of each scale used so far in actual network training, and sort the face scale classes in ascending order of these counts:
$$A_1 \leq A_2 \leq \cdots \leq A_N$$
In the above formula, A_c is one of the N face scale classes;
S4: among the M face-scale class sequences of the crop boxes Sampled_bboxes, search for crop boxes that satisfy S_ic = A_c for all c, and randomly select one crop box that satisfies the condition as SelectCrop_bbox;
S5: if no crop box satisfying step S4 is found, search the M face-scale class sequences of Sampled_bboxes for crop boxes that satisfy S_i1 = A_1 and S_iN = A_N, and randomly select one crop box that satisfies the condition as SelectCrop_bbox;
S6: if no crop box satisfying step S5 is found, randomly select one crop box from Sampled_bboxes as SelectCrop_bbox;
S7: add the number of faces of each scale class in SelectCrop_bbox, Num_sc, to the counts of each face scale accumulated during actual training. That is, for each class c, the updated count equals the count obtained after the previous training step plus Num_sc, where s denotes the index of the selected crop box.
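Steps S2 to S7 can be sketched in plain Python as below. This is an illustrative reading, not the patented implementation. It assumes the per-class face counts of every candidate crop (Num_ic from step S1) have already been computed. It reads S_ic = A_c as a position-by-position match of the two class orderings and breaks ties between equal counts by class index. The function names are hypothetical.

```python
import random

def ranking_desc(counts: list[int]) -> list[int]:
    """S2: class indices ordered from most to fewest faces (ties by class index)."""
    return sorted(range(len(counts)), key=lambda c: -counts[c])

def ranking_asc(counts: list[int]) -> list[int]:
    """S3: class indices ordered from fewest to most faces seen in training so far."""
    return sorted(range(len(counts)), key=lambda c: counts[c])

def select_crop(crop_counts: list[list[int]], train_counts: list[int],
                rng: random.Random) -> tuple[int, list[int]]:
    """Pick a crop index s and return it with the updated training counts (S4-S7).

    crop_counts[i][c] is Num_ic, the number of class-c faces in crop i (from S1).
    """
    target = ranking_asc(train_counts)                 # A_1 ... A_N
    rankings = [ranking_desc(c) for c in crop_counts]  # S_i1 ... S_iN per crop
    # S4: every position of the crop's ranking matches the target ranking.
    full = [i for i, r in enumerate(rankings) if r == target]
    # S5: only the first and last positions have to match.
    ends = [i for i, r in enumerate(rankings)
            if r[0] == target[0] and r[-1] == target[-1]]
    # S6: fall back to any crop.
    pool = full or ends or list(range(len(crop_counts)))
    s = rng.choice(pool)
    # S7: accumulate the selected crop's per-class face counts.
    updated = [t + n for t, n in zip(train_counts, crop_counts[s])]
    return s, updated
```

The target ordering puts the class seen least often in training first, so a crop dominated by that class is preferred. This is how the procedure rebalances the face-scale distribution over training. In this sketch, with N = 3, matching the first and last positions of two orderings forces the middle position to match too. S5 therefore only adds candidates when there are more than three scale classes.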
4. The single-step face detection system according to claim 1, characterized in that the depthwise separable convolution modules conv1, conv2, conv3, conv4, conv5, conv6, conv7, conv8, conv9, conv10, conv11, conv12, conv13, conv14, conv16, conv17, conv19 and conv20 have the same structure. Each comprises, connected in sequence from top to bottom, a 3 × 3 convolutional layer, a BatchNorm layer, a LeakyReLU activation layer, a 1 × 1 convolutional layer, a BatchNorm layer and a LeakyReLU activation layer.
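The module structure in claim 4 can be illustrated with a NumPy forward pass: a 3 × 3 convolution, BatchNorm and LeakyReLU, followed by a 1 × 1 convolution, BatchNorm and LeakyReLU. The sketch makes several assumptions that the claim does not state: the 3 × 3 layer is a per-channel (depthwise) convolution with zero padding 1, BatchNorm runs in inference mode, the LeakyReLU slope is 0.1, and the stride is configurable. The function names are hypothetical.

```python
import numpy as np

def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm over the channel axis of a (C, H, W) map."""
    g, b, m, v = (a[:, None, None] for a in (gamma, beta, mean, var))
    return g * (x - m) / np.sqrt(v + eps) + b

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def depthwise_conv3x3(x, k, stride=1):
    """Per-channel 3x3 cross-correlation, zero padding 1. x: (C, H, W), k: (C, 3, 3)."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out_h, out_w = (h - 1) // stride + 1, (w - 1) // stride + 1
    out = np.zeros((x.shape[0], out_h, out_w))
    for i in range(3):
        for j in range(3):
            patch = xp[:, i:i + stride * out_h:stride, j:j + stride * out_w:stride]
            out += k[:, i, j][:, None, None] * patch
    return out

def pointwise_conv1x1(x, w):
    """1x1 convolution mixing channels. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def separable_module(x, p, stride=1):
    """3x3 depthwise -> BN -> LeakyReLU -> 1x1 pointwise -> BN -> LeakyReLU."""
    y = leaky_relu(batch_norm(depthwise_conv3x3(x, p["dw"], stride), *p["bn1"]))
    y = pointwise_conv1x1(y, p["pw"])
    return leaky_relu(batch_norm(y, *p["bn2"]))
```

The split is what makes the module cheap. For C_in input and C_out output channels it needs 9·C_in + C_in·C_out convolution weights, instead of 9·C_in·C_out for a full 3 × 3 convolution.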
5. The single-step face detection system according to claim 1, characterized in that each of the depthwise separable convolution modules conv14, conv17 and conv20 has 1024 output channels.
6. The single-step face detection system according to claim 1, characterized in that detection module det-32 is used for large-scale face detection, detection module det-16 is used for medium-scale face detection, and detection module det-8 is used for small-scale face detection.
7. The single-step face detection system according to claim 1, characterized in that each of the detection modules det-32, det-16 and det-8 comprises a conventional convolutional layer and an output layer;
the conventional convolutional layer has 18 output channels;
the center coordinates and side lengths of the prediction box in the output layer are calculated as:
$$b_x = \sigma(t_x) + C_x, \qquad b_y = \sigma(t_y) + C_y$$
In the above formula, (b_x, b_y) are the center coordinates of the prediction box. b_w and b_h are the width and height of the prediction box. t_x and t_y are the offsets of the abscissa and ordinate of the prediction box center. (C_x, C_y) are the top-left coordinates of the grid cell containing the anchor, σ(·) is the sigmoid function, and p_w and p_h are the width and height of the anchor.
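The decoding in claim 7 can be sketched as follows. The center formulas are taken from the claim. The remaining parts are assumptions that follow the YOLOv2/YOLOv3 convention, because the claim does not spell them out: the width and height formulas b_w = p_w·exp(t_w) and b_h = p_h·exp(t_h), the scaling of grid coordinates to pixels by the head's stride, and the split of the 18 output channels into 3 anchors × (t_x, t_y, t_w, t_h, objectness, face score). The function names are hypothetical.

```python
import math

NUM_ANCHORS = 3
VALUES_PER_ANCHOR = 6  # assumed: tx, ty, tw, th, objectness, face score

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def split_cell(channels: list[float]) -> list[list[float]]:
    """Split the 18 output values at one grid cell into per-anchor groups."""
    if len(channels) != NUM_ANCHORS * VALUES_PER_ANCHOR:
        raise ValueError("expected 18 channels per cell")
    return [channels[a * VALUES_PER_ANCHOR:(a + 1) * VALUES_PER_ANCHOR]
            for a in range(NUM_ANCHORS)]

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Return (bx, by, bw, bh) in input-image pixels.

    bx, by follow the claim in grid units and are then scaled by the stride;
    bw = pw * exp(tw), bh = ph * exp(th) is the assumed YOLO-style size decoding,
    with the anchor size (pw, ph) given in pixels.
    """
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    return bx, by, pw * math.exp(tw), ph * math.exp(th)
```

With zero offsets, for example, a prediction in grid cell (2, 3) of the stride-8 head decodes to a box centered at (20, 28) pixels with exactly the anchor's size.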
8. The single-step face detection system according to claim 7, characterized in that the outputs of the detection modules det-32, det-16 and det-8 are each connected to an ellipse regressor. The ellipse regressor converts the prediction boxes of the output layer into elliptical ground-truth boxes, calculated as:
$$Y = XW + \varepsilon$$
In the above formula, Y is the coordinate vector of the elliptical ground-truth box, comprising the semi-major axis r_a, the semi-minor axis r_b, the angle θ, the center abscissa c_x and the center ordinate c_y. X is the coordinate vector of the output-layer prediction box, comprising the center coordinates b_x and b_y and the width b_w and height b_h of the prediction box. W is the regression coefficient matrix, and ε is the random error;
where the regression coefficient matrix W is calculated as:
In the above formula, J(·) denotes the mean squared error function, and X′ and Y′ are the standardized coordinate vectors of the prediction box and the ground-truth box, respectively;
In the above formula, U_X and σ_X are the mean and standard deviation of the prediction-box coordinate vector X, and U_Y and σ_Y are the mean and standard deviation of the ground-truth-box coordinate vector Y.
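The ellipse regressor of claim 8 amounts to a linear least-squares fit between standardized box coordinates and standardized ellipse coordinates. The sketch below assumes column-wise standardization, X′ = (X − U_X)/σ_X and Y′ = (Y − U_Y)/σ_Y. It uses `numpy.linalg.lstsq` to find the W that minimizes the mean squared error of X′W against Y′. The class name and the training-pair format (one row per matched prediction and ground-truth ellipse) are hypothetical.

```python
import numpy as np

class EllipseRegressor:
    """Maps box vectors X = (bx, by, bw, bh) to ellipse vectors Y = (ra, rb, theta, cx, cy)."""

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "EllipseRegressor":
        self.ux, self.sx = X.mean(axis=0), X.std(axis=0)  # U_X, sigma_X
        self.uy, self.sy = Y.mean(axis=0), Y.std(axis=0)  # U_Y, sigma_Y
        Xs = (X - self.ux) / self.sx                      # X'
        Ys = (Y - self.uy) / self.sy                      # Y'
        # W minimizes the mean squared error between X'W and Y'.
        self.W, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        Ys = ((X - self.ux) / self.sx) @ self.W
        return Ys * self.sy + self.uy  # undo the standardization of Y
```

Because both sides are centered, no intercept term is needed. Predictions are mapped back to pixel units by undoing the standardization of Y.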
CN201910550738.8A 2019-06-24 2019-06-24 Single step human face detection system Active CN110263731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910550738.8A CN110263731B (en) 2019-06-24 2019-06-24 Single step human face detection system

Publications (2)

Publication Number Publication Date
CN110263731A true CN110263731A (en) 2019-09-20
CN110263731B CN110263731B (en) 2021-03-16

Family

ID=67920979

Country Status (1)

Country Link
CN (1) CN110263731B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866833A (en) * 2015-05-29 2015-08-26 中国科学院上海高等研究院 Video stream face detection method and apparatus thereof
US9392257B2 (en) * 2011-11-28 2016-07-12 Sony Corporation Image processing device and method, recording medium, and program
CN106599797A (en) * 2016-11-24 2017-04-26 北京航空航天大学 Infrared face identification method based on local parallel nerve network
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN108182397A (en) * 2017-12-26 2018-06-19 王华锋 A kind of multiple dimensioned face verification method of multi-pose
CN108564030A (en) * 2018-04-12 2018-09-21 广州飒特红外股份有限公司 Classifier training method and apparatus towards vehicle-mounted thermal imaging pedestrian detection
CN108647649A (en) * 2018-05-14 2018-10-12 中国科学技术大学 The detection method of abnormal behaviour in a kind of video
CN108664893A (en) * 2018-04-03 2018-10-16 福州海景科技开发有限公司 A kind of method for detecting human face and storage medium
WO2018213841A1 (en) * 2017-05-19 2018-11-22 Google Llc Multi-task multi-modal machine learning model
CN109101899A (en) * 2018-07-23 2018-12-28 北京飞搜科技有限公司 A kind of method for detecting human face and system based on convolutional neural networks
CN109272487A (en) * 2018-08-16 2019-01-25 北京此时此地信息科技有限公司 The quantity statistics method of crowd in a kind of public domain based on video
CN109284670A (en) * 2018-08-01 2019-01-29 清华大学 A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A kind of image small target detecting method combined based on hierarchical detection
WO2019079895A1 (en) * 2017-10-24 2019-05-02 Modiface Inc. System and method for image processing using deep neural networks
CN109711384A (en) * 2019-01-09 2019-05-03 江苏星云网格信息技术有限公司 A kind of face identification method based on depth convolutional neural networks
CN109753927A (en) * 2019-01-02 2019-05-14 腾讯科技(深圳)有限公司 A kind of method for detecting human face and device
CN109815886A (en) * 2019-01-21 2019-05-28 南京邮电大学 A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BARRET ZOPH ET.AL.: "Learning Transferable Architectures for Scalable Image Recognition", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
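林鹏 (Lin Peng): "Research and Implementation of Face Detection Based on the Adaboost Algorithm" (基于Adaboost算法的人脸检测研究及实现), China Master's Theses Full-text Database, Information Science and Technology *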

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807385A (en) * 2019-10-24 2020-02-18 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium
CN110807385B (en) * 2019-10-24 2024-01-12 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN111401292A (en) * 2020-03-25 2020-07-10 成都东方天呈智能科技有限公司 Face recognition network construction method fusing infrared image training
CN111489332A (en) * 2020-03-31 2020-08-04 成都数之联科技有限公司 Multi-scale IOF random cutting data enhancement method for target detection
CN112699826A (en) * 2021-01-05 2021-04-23 风变科技(深圳)有限公司 Face detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110263731B (en) 2021-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant