CN112949499A - Improved MTCNN face detection method based on ShuffleNet - Google Patents


Info

Publication number
CN112949499A
Authority
CN
China
Prior art keywords
net
convolution
shufflenet
point
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110242262.9A
Other languages
Chinese (zh)
Inventor
徐成
秦振
刘宏哲
徐冰心
潘卫国
代松银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Union University
Original Assignee
Beijing Union University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Union University filed Critical Beijing Union University
Priority to CN202110242262.9A
Publication of CN112949499A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses an improved MTCNN face detection method based on ShuffleNet, comprising the following steps. First, the image is transformed at different scales to construct an image pyramid, so that faces of different sizes can be detected. In the first stage, P-Net generates candidate face-region bounding boxes from the original picture. In the second stage, R-Net takes the original picture and the bounding boxes generated by P-Net as input and produces corrected, more accurate bounding boxes. In the third stage, the original picture and the bounding boxes output by R-Net are fed to O-Net, which generates the final face-region bounding boxes. The model is improved with the channel-shuffle idea and the point-by-point group convolution technique from ShuffleNet: it is based on MTCNN, and channel shuffling is applied during the convolution operations, so that the network can detect faces quickly and accurately.

Description

Improved MTCNN face detection method based on ShuffleNet
Technical Field
The invention relates to the field of deep-learning object detection, and in particular to an MTCNN face detection method improved based on ShuffleNet.
Background
The rapid growth in the number of motor vehicles has brought great convenience to people's lives and travel, but the road traffic accidents that come with it cause enormous losses of life, property, and national economic output in every country each year, and fatigued driving is one of the main causes of traffic accidents. If facial fatigue can be recognized efficiently, the driver's real-time facial state can be monitored to warn of and prevent fatigued driving, reducing the likelihood of traffic accidents; such a system has potential economic value and broad application prospects.
Existing driver fatigue detection methods suffer from a series of problems. Methods based on physiological parameters require the driver to wear invasive experimental equipment, which not only affects comfort but also interferes with actual driving. Methods based on the driver's operating behavior are strongly affected by individual factors such as driving habits and proficiency, and suffer from poor robustness and low detection accuracy. Methods based on vehicle operating parameters place requirements on the driving environment and are not robust on unstructured roads. Methods based on facial behavior are non-invasive, low-cost, and offer good real-time performance, but are strongly affected by the driving environment and individual differences.
In recent years, deep learning has developed continuously and achieved major breakthroughs, with convolutional neural networks extracting target features automatically. Thanks to the strong feature extraction capability of convolutional neural networks, the accuracy of face detection algorithms has improved greatly, their robustness is stronger, and they can adapt to more complex recognition scenes.
The introduction of AlexNet in 2012 spurred the development of deep learning, and VGGNet in 2014 made deep neural networks practical, but the vanishing-gradient problem appeared as networks deepened. In 2015, ResNet addressed this problem with residual connections, reducing model convergence time and allowing networks to go deeper without gradient vanishing.
The Multi-task Cascaded Convolutional Neural Network (MTCNN) is a cascaded model based on a coarse-to-fine idea that performs face detection and facial keypoint detection simultaneously, and it is currently one of the most widely used detectors in the face detection field. It exploits the internal correlation between face detection and keypoint detection to improve the performance of both, is one of the few detectors that can be deployed on conventional hardware, and achieves high accuracy on the face detection task. Because MTCNN outputs only 5 calibrated facial keypoints, while accurately locating facial parts (such as the eyes and mouth) and computing their fatigue features in driving-fatigue detection requires more keypoints, the invention uses only MTCNN's face detection function. The cascade comprises three CNN models: P-Net (Proposal Network), R-Net (Refinement Network), and O-Net (Output Network). P-Net is a fully convolutional network (FCN) [5] that rapidly generates a series of candidate face windows; R-Net filters out most of the non-face candidates generated by P-Net and further corrects the bounding-box coordinates of candidates that may be faces; O-Net functions similarly to R-Net, except that it takes a larger input, has a more complex network structure and better performance, and produces the final face window and the facial keypoint positions.
The MTCNN model has the following speed bottlenecks: the larger the resolution of the input image to the first-stage P-Net, the more time is consumed; and the more faces in the image, the longer the second and third stages (R-Net and O-Net) take.
To address these problems, the channel shuffle and point-by-point group convolution of ShuffleNet are added to the MTCNN model, preserving face detection accuracy while increasing detection speed.
Disclosure of Invention
In order to solve the above problems, an embodiment of the present invention provides an MTCNN face detection method improved based on ShuffleNet, aiming to improve face detection accuracy and speed, comprising the following steps:
step one, transform the image at different scales to construct an image pyramid;
step two, input all pictures of the image pyramid into P-Net, perform three convolutions, one pooling, and two channel shuffles, and output a large number of Bounding box coordinates;
step three, crop a picture from the original picture according to the Bounding box coordinates and resize it to 24 × 24;
step four, input the 24 × 24 picture into R-Net, perform three conventional convolutions, two poolings, two channel shuffles, and one point-by-point group convolution, and output corrected, more accurate Bounding box coordinates;
step five, crop a picture from the original picture according to the Bounding box coordinates and resize it to 48 × 48;
step six, input the 48 × 48 picture into O-Net, perform four conventional convolutions, three poolings, three channel shuffles, and one point-by-point group convolution, and output accurate Bounding box coordinates;
Preferably, when the convolution operations in steps two, four, and six are performed, the model is improved with the channel-shuffle idea: the feature channels are evenly distributed into different groups, so that each group's features obtain information from the other groups during convolution, strengthening the correlation between feature maps of different channels. This strategy reduces the amount of computation as much as possible while maintaining the model's detection accuracy.
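As a rough illustration of the channel-shuffle idea (a sketch, not code from the patent; the function name and shapes are illustrative), the operation reduces to a reshape–transpose–reshape over the channel axis:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channels of a (C, H, W) feature map across groups,
    as in ShuffleNet: reshape -> transpose group axes -> flatten back."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(groups, c // groups, h, w)
    x = x.transpose(1, 0, 2, 3)  # interleave channels from different groups
    return x.reshape(c, h, w)

# 6 channels in 2 groups: [0 1 2 | 3 4 5] interleaves to [0 3 1 4 2 5],
# so each group's output mixes channels from both input groups.
x = np.arange(6.0).reshape(6, 1, 1) * np.ones((6, 2, 2))
y = channel_shuffle(x, groups=2)
print(y[:, 0, 0])  # [0. 3. 1. 4. 2. 5.]
```

After this shuffle, a subsequent group convolution sees channels originating from every group, which is exactly the cross-group information flow the text describes.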
Preferably, point-by-point group convolution is performed in steps four and six. Point-by-point group convolution is the combined application of group convolution (Group Convolution) and point-by-point convolution (Pointwise Convolution); group convolution reduces the number of parameters and can be regarded as structured sparsity (Structured Sparse). The point-by-point convolution is formulated as:
$$x_i^{\ell} = \sum_{k} \omega_k \cdot \frac{1}{\left|\Omega_i(k)\right|} \sum_{p_j \in \Omega_i(k)} x_j^{\ell-1}$$

where k iterates over all sub-domains of the kernel support, p_i is the coordinate of the i-th point, Ω_i(k) is the k-th sub-domain around p_i, |·| denotes the number of points in a sub-domain, ω_k is the kernel weight of the k-th sub-domain, and ℓ−1 and ℓ index the input and output layers.
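To make the parameter saving of the grouped 1 × 1 (point-by-point group) convolution concrete, here is a minimal NumPy sketch; the shapes and function name are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def pointwise_group_conv(x, weights, groups):
    """1x1 group convolution on a (C_in, H, W) feature map.
    weights has shape (groups, C_out_per_group, C_in_per_group): each group
    mixes only its own slice of input channels with a 1x1 kernel."""
    c_in, h, w = x.shape
    g, c_out_g, c_in_g = weights.shape
    assert g == groups and c_in_g * groups == c_in
    outs = []
    for i in range(groups):
        # flatten spatial dims: a 1x1 conv is a per-pixel matrix multiply
        xg = x[i * c_in_g:(i + 1) * c_in_g].reshape(c_in_g, -1)
        outs.append((weights[i] @ xg).reshape(c_out_g, h, w))
    return np.concatenate(outs, axis=0)  # (groups * C_out_per_group, H, W)

x = np.ones((4, 2, 2))
w = np.ones((2, 3, 2))  # 2 groups, each mapping 2 -> 3 channels
y = pointwise_group_conv(x, w, groups=2)
print(y.shape, y[0, 0, 0])  # (6, 2, 2) 2.0
```

A full 1 × 1 convolution from C_in to C_out channels needs C_in·C_out weights; splitting it into g groups reduces this to C_in·C_out/g, which is the parameter reduction the text attributes to group convolution.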
Compared with the prior art, the embodiment of the invention improves on the MTCNN network by adding ShuffleNet's channel shuffle and point-by-point group convolution, increasing detection speed while maintaining face detection accuracy.
Drawings
FIG. 1 is the inference flow chart of the improved MTCNN face detection method based on ShuffleNet of the present invention;
FIG. 2 is a schematic diagram of the improved P-Net of the method;
FIG. 3 is a schematic diagram of the improved R-Net of the method;
FIG. 4 is a schematic diagram of the improved O-Net of the method;
FIG. 5 is a schematic diagram of the channel shuffling technique adopted by the method;
FIG. 6 is a schematic diagram of the group convolution adopted by the method;
FIG. 7 is a schematic diagram of the point-by-point convolution adopted by the method;
FIG. 8 shows detection results of the method.
Detailed Description
The model scheme in the embodiments of the present invention will be described fully below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides an improved MTCNN face detection method based on ShuffleNet, and the example of the present invention includes the following steps:
Step one: transform the image at different scales to construct an image pyramid. The picture is repeatedly resized by a resize_factor (for example 0.70; the value is chosen according to the face-size distribution of the dataset, and values between 0.70 and 0.80 are usually appropriate — a larger factor lengthens inference time, while a smaller one tends to miss small and medium faces) until the size reaches the 12 × 12 minimum required by P-Net. This yields the original image, the original image × resize_factor, the original image × resize_factor², and so on. Note that all of these images are fed into P-Net one by one to obtain candidates.
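The pyramid construction above can be sketched in a few lines; the function name is illustrative, and the 0.70 factor and 12 × 12 floor come from the text:

```python
def pyramid_scales(height, width, resize_factor=0.70, min_size=12):
    """Scale factors for the image pyramid: keep multiplying by
    resize_factor until the shorter side would fall below P-Net's 12x12."""
    scales, scale = [], 1.0
    while min(height, width) * scale >= min_size:
        scales.append(scale)
        scale *= resize_factor
    return scales

print([round(s, 4) for s in pyramid_scales(100, 120)])
# [1.0, 0.7, 0.49, 0.343, 0.2401, 0.1681]
```

Each returned scale corresponds to one resized copy of the original image that is pushed through P-Net.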
Step two: input the picture pyramid into P-Net, as shown in FIG. 2, to obtain a large number of candidates. Every picture of the pyramid obtained in step one is fed into P-Net, producing an output map of shape (m, n, 16). Most candidates are screened out by their classification score; each bounding box (bbox) is then calibrated with the 4 regressed offsets to obtain its top-left and bottom-right coordinates, and non-maximum suppression (NMS) based on the IoU value removes most of the remaining candidates. In detail, the candidates are sorted by classification score from high to low into a tensor of shape (num_left, 4), i.e. the absolute top-left and bottom-right coordinates of num_left bboxes. Each iteration computes the IoU between the highest-scoring bbox in the queue and the remaining ones, discards every box whose IoU exceeds 0.6 (a preset threshold), and moves the highest-scoring box to the final result. Repeating this removes many heavily overlapping bboxes and finally yields (num_left_after_nms, 16) candidates. A channel shuffling step is added after the convolutions; its principle is shown in FIG. 5;
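The greedy NMS loop described above can be sketched as follows (illustrative, not the patent's code; boxes are (N, 4) arrays of x1, y1, x2, y2, and the 0.6 threshold is the one named in the text):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.6):
    """Greedy NMS: keep the top-scoring box, drop remaining boxes whose IoU
    with it exceeds the threshold, then repeat on what is left."""
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 with IoU ~0.68
```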
Step three: crop a picture from the original picture according to the Bounding box coordinates and resize it to 24 × 24;
Step four: according to the coordinates output by P-Net, crop pictures from the original image (a detail: each crop is a square whose side equals the longer side of the bbox, which avoids distortion during resizing and keeps more detail around the face frame), resize them to 24 × 24, and input them into R-Net for fine adjustment, as shown in FIG. 3. R-Net still outputs a two-class one-hot (2 outputs), bbox coordinate offsets (4 outputs), and landmarks (10 outputs). After adjusting the bboxes by their offsets (simply shifting the top-left and bottom-right x, y coordinates) and ranking by the two-class score, the IoU-based NMS of the P-Net stage is repeated to remove most candidates. R-Net finally outputs (num_left_after_Rnet, 16). A point-by-point group convolution step is added to accelerate detection; its principle is shown in FIG. 6 and FIG. 7;
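The square-crop detail mentioned above — expanding each bbox to a square along its longer side before resizing, so the face is not distorted — can be sketched as (illustrative function name and shapes):

```python
import numpy as np

def bbox_to_square(boxes):
    """Expand (N, 4) boxes (x1, y1, x2, y2) to squares with side max(w, h),
    centered on the original box, so a later resize to 24x24 or 48x48
    preserves the aspect ratio of the face."""
    boxes = boxes.astype(float).copy()
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    side = np.maximum(w, h)
    boxes[:, 0] += (w - side) / 2  # shift left edge out symmetrically
    boxes[:, 1] += (h - side) / 2  # shift top edge out symmetrically
    boxes[:, 2] = boxes[:, 0] + side
    boxes[:, 3] = boxes[:, 1] + side
    return boxes

print(bbox_to_square(np.array([[10, 10, 30, 50]])))  # [[ 0. 10. 40. 50.]]
```

In a full pipeline the square would additionally be clipped or padded at the image border before cropping.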
Step five: crop images from the original image according to the bbox coordinates for input to O-Net, again cropping squares along the longer side to avoid distortion and keep more detail;
Step six: input the pictures remaining after R-Net's filtering into O-Net, which outputs accurate bbox coordinates and landmark coordinates, as shown in FIG. 4. The process generally repeats that of P-Net, except that this time the landmark coordinates are output in addition to the bbox coordinates (the landmark output mainly helps make the bbox more accurate; in other words, P-Net and R-Net at the inference stage need not output landmarks at all — only O-Net does).
In tests, compared with the traditional MTCNN network, the embodiment of the invention achieves good detection results on the WIDER FACE dataset: the average precision reaches 90.3% and the average speed reaches 232 FPS, which, compared with MTCNN's 25 FPS, meets the requirement of real-time detection.
In summary, the embodiment of the invention builds on the MTCNN model and adds ShuffleNet's channel shuffle and point-by-point group convolution, maintaining face detection accuracy while increasing detection speed.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. An MTCNN face detection method improved based on ShuffleNet, characterized in that the model is improved with the channel shuffling method of ShuffleNet, and the method comprises the following steps:
step one, transform the image at different scales to construct an image pyramid;
step two, input all pictures of the image pyramid into P-Net, perform three convolutions, one pooling, and two channel shuffles, and output a large number of Bounding box coordinates;
step three, crop a picture from the original picture according to the Bounding box coordinates and resize it to 24 × 24;
step four, input the 24 × 24 picture into R-Net, perform three conventional convolutions, two poolings, two channel shuffles, and one point-by-point group convolution, and output corrected, more accurate Bounding box coordinates;
step five, crop a picture from the original picture according to the Bounding box coordinates and resize it to 48 × 48;
step six, input the 48 × 48 picture into O-Net, perform four conventional convolutions, three poolings, three channel shuffles, and one point-by-point group convolution, and output accurate Bounding box coordinates.
2. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein an image pyramid is used in step one to solve the multi-scale problem, i.e. the original image is scaled multiple times by a fixed factor to obtain images at multiple scales.
3. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein in step two P-Net is a fully convolutional network, whose convolution, pooling, and nonlinear activation operations all accept inputs of arbitrary size.
4. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein the convolution operations of steps two, four, and six are improved with the channel-shuffle idea: the feature channels are evenly distributed into different groups, so that each group's features obtain information from the other groups during convolution, strengthening the correlation between feature maps of different channels.
5. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein point-by-point group convolution is performed in steps four and six; point-by-point group convolution is the combined application of group convolution and point-by-point convolution, and group convolution reduces the number of parameters and can be regarded as structured sparsity.
6. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein the improved MTCNN is still trained on three tasks — face classification, face bounding-box regression, and facial keypoint regression — of which face bounding-box regression is the main task.
7. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein for each candidate window the offset from the nearest manually labeled bounding box is predicted; the learning objective is formulated as a regression problem, and the Euclidean distance is used to compute the loss for each sample:
$$L_i^{box} = \left\|\hat{y}_i^{box} - y_i^{box}\right\|_2^2$$

where \(\hat{y}_i^{box}\) is the bounding-box value regressed by the network and \(y_i^{box}\) is the ground-truth face position, comprising the top-left corner coordinates, height, and width of the face frame in the original image.
8. The ShuffleNet-based improved MTCNN face detection method of claim 1, wherein multi-source training is implemented directly with a sample-type indicator, and the overall learning objective is formulated as:
$$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}$$

where N is the number of samples in the whole training set; α_j denotes the importance of each learning task — when training P-Net and R-Net, α_j is set to (α_det = 1, α_box = 0.5, α_landmark = 0.5), and when training O-Net, to make keypoint localization more accurate, α_j is set to (α_det = 1, α_box = 0.5, α_landmark = 1); β_i^j ∈ {0, 1} is the sample-type indicator. Stochastic gradient descent (SGD) is used during training to optimize the CNN parameters of each stage.
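The weighted objective of claim 8 can be illustrated numerically; this is a hedged sketch (names, shapes, and dummy loss values are assumptions), not training code from the patent:

```python
import numpy as np

def total_loss(losses, betas, stage="pnet"):
    """Weighted multi-task objective: sum over tasks j of
    alpha_j * sum_i beta_i^j * L_i^j. losses/betas are dicts over the tasks
    'det', 'box', 'landmark', each holding a per-sample array of shape (N,)."""
    alphas = ({"det": 1.0, "box": 0.5, "landmark": 0.5}
              if stage in ("pnet", "rnet")
              else {"det": 1.0, "box": 0.5, "landmark": 1.0})  # O-Net weights
    return sum(alphas[j] * np.sum(betas[j] * losses[j]) for j in alphas)

# two dummy samples; beta = 0 switches a task off for a sample
losses = {"det": np.array([1.0, 2.0]), "box": np.array([4.0, 0.0]),
          "landmark": np.array([2.0, 2.0])}
betas = {"det": np.array([1, 1]), "box": np.array([1, 0]),
         "landmark": np.array([1, 0])}
print(total_loss(losses, betas, "pnet"))  # 1*3 + 0.5*4 + 0.5*2 = 6.0
```

The sample-type indicator β lets one mini-batch mix detection-only, box-only, and landmark-annotated samples, which is the multi-source training the claim describes.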
Application CN202110242262.9A, filed 2021-03-04 by Beijing Union University; published as CN112949499A on 2021-06-11 (legal status: Pending; family ID 76247781).

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619319A (en) * 2019-09-27 2019-12-27 北京紫睛科技有限公司 Improved MTCNN model-based face detection method and system
CN110705357A (en) * 2019-09-02 2020-01-17 深圳中兴网信科技有限公司 Face recognition method and face recognition device
WO2020020472A1 (en) * 2018-07-24 2020-01-30 Fundación Centro Tecnoloxico De Telecomunicacións De Galicia A computer-implemented method and system for detecting small objects on an image using convolutional neural networks
CN111161543A (en) * 2019-11-14 2020-05-15 南京行者易智能交通科技有限公司 Automatic snapshot method and system for bus front violation behavior based on image recognition
CN111401257A (en) * 2020-03-17 2020-07-10 天津理工大学 Non-constraint condition face recognition method based on cosine loss
CN112069993A (en) * 2020-09-04 2020-12-11 西安西图之光智能科技有限公司 Dense face detection method and system based on facial features mask constraint and storage medium
CN112313666A (en) * 2019-03-21 2021-02-02 因美纳有限公司 Training data generation for artificial intelligence based sequencing


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALI GHOFRANI et al.: "Realtime Face-Detection and Emotion Recognition Using MTCNN and miniShuffleNet V2", 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), page 2 *
JIANG Hang; DONG Lanfang: "Fast and Accurate Face Detection Method in a CPU Environment", Journal of Chinese Computer Systems, no. 01, pages 157-162 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination