CN109886159B - Face detection method under non-limited condition - Google Patents


Info

Publication number
CN109886159B
CN109886159B (application CN201910091271.5A)
Authority
CN
China
Prior art keywords
image
network
face
face detection
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910091271.5A
Other languages
Chinese (zh)
Other versions
CN109886159A (en)
Inventor
王慧燕 (Wang Huiyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN201910091271.5A priority Critical patent/CN109886159B/en
Publication of CN109886159A publication Critical patent/CN109886159A/en
Application granted granted Critical
Publication of CN109886159B publication Critical patent/CN109886159B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a face detection method under non-limited conditions, comprising the following steps: S1) image preprocessing; S2) design of a face detection network based on deep convolution; S3) forward propagation through the face detection network; S4) non-maximum suppression; S5) output of the final detection result. The advantages of the invention are that the method has a wide application range, achieves state-of-the-art accuracy and speed, improves the accuracy of face coordinates, reduces the probability of false detection, alleviates the vanishing-gradient problem of deep networks, and accelerates network convergence.

Description

Face detection method under non-limited condition
Technical Field
The invention relates to the technical field of computer image processing, and in particular to a face detection method under non-limited (unconstrained) conditions based on a deep convolutional neural network combined with a multi-scale feature pyramid.
Background
Face detection is the basis of many visual tasks and occupies a very important position in image processing and pattern recognition. In recent years, with the rapid development of neural-network-based artificial intelligence, face detection has been applied to ever more visual tasks; for example, ID verification, conference check-in, face-recognition access gates, and face recognition all take a high-precision, high-accuracy face detection method as their precondition.
Early face detection techniques relied on manually constructed features combined with traditional machine learning. For example, the well-known detector based on Haar features and the AdaBoost algorithm scans the image with a sliding window, extracts the Haar features of the target inside the window, and classifies the target with AdaBoost. This approach is both time-consuming and unsatisfactory in accuracy.
With the continuous progress of artificial intelligence, neural-network-based target detection methods have diversified; the most representative are MTCNN, YOLO, SSD, and Faster R-CNN. MTCNN uses small neural networks for classification together with a sliding window for rapid detection; it achieves good accuracy and speed, but generalizes poorly and must be retrained for each specific scene. YOLO and SSD use a deep convolutional network that classifies and regresses offsets for the anchors of each feature map in a single pass; they are fast and generalize well, but lose accuracy. The two-stage Faster R-CNN performs best among these algorithms, but the involvement of fully connected layers greatly increases the computational cost, making it hard to apply in industrial scenarios that require real-time performance.
Disclosure of Invention
The invention aims to provide a face detection method under non-limited conditions that improves the accuracy of face coordinates, reduces the probability of false detection, alleviates the vanishing-gradient problem of deep networks, and accelerates network convergence.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a face detection method under non-limiting conditions comprises the following steps:
s1) image preprocessing
For public data sets collected from the Internet, horizontally flip all images for data augmentation;
for self-collected 1080p video data, extract the video frames usable for face detection, label the faces in each frame manually or by machine, scale down the frame for each face appearing in it, and crop an image containing that face from the reduced frame as a final training image;
s2) designing face detection network based on deep convolution
Extract features with VGG16 as the convolutional backbone, delete the final fully connected layers, and add an Inception structure and a dilated-convolution structure; the whole network comprises 6 blocks; a feature pyramid is used to fuse the features of the 3 lower blocks, whose feature maps are relatively large; a 3 × 3 convolutional layer is added after each fused convolution module; classification and regression loss functions are then attached to the 3 fused lower-layer outputs and to the last-layer outputs of the other 3 convolution modules that are not feature-fused;
subtract the ImageNet mean (104, 117, 123) from the manually labelled multi-scale image samples, normalize to [0, 1], and feed the samples into the network through the data layer for training; initialize the weights of the per-layer classification loss functions from a uniform distribution; optimize the network weights with stochastic gradient descent and train the network model until convergence;
s3) face detection network forward propagation
Input the training images obtained in step S1) into the trained network model for forward propagation, and compute the classification and regression results output at each level of the feature pyramid, where the classification result is the probability that the anchor target is a face or the background, and the regression result is the offset of the prediction relative to the anchor;
screen out the anchors whose face probability exceeds a threshold, take their corresponding predicted offsets, and correct the anchor coordinates with these offsets to obtain a preliminary prediction result P1;
s4) applying a non-maximum suppression algorithm
Sort the preliminary detections P1 by probability and filter out the detections whose probability is not a local maximum, obtaining the prediction result P2;
s5) obtaining the final detection result
The prediction result consists of the coordinates (x1, y1) of the top-left corner and (x2, y2) of the bottom-right corner of the target box; predictions extending beyond the image are clipped:
if x2 is greater than the image width, set x2 to the image width; if y2 is greater than the image height, set y2 to the image height; if x1 or y1 is less than 0, set it to 0;
the final detection result P3 was obtained.
Further, in step S1), the scale reduction of each face appearing in the video frames reduces the original image to 0.9×, 0.8×, and 0.7× of its size; a 700 × 700 pixel image containing the face is cropped from the reduced image, and a 640 × 640 pixel image is then cropped from the 700 × 700 image as the final training image.
Compared with the prior art, the invention has the following advantages:
the invention relates to a face detection method under non-limited conditions, which takes VGG16 as a convolution layer to extract features, uses a feature pyramid structure to design a face detection network, and combines a classification loss function and a regression loss function of each layer to improve the face detection effect, in particular the small target face detection effect. The invention can be applied to various types of video monitoring and detecting systems, particularly to the monitoring of a face bayonet camera, has wide application range, can achieve the effect and speed of state-of-art level, is beneficial to improving the coordinate accuracy of pedestrians, reducing the occurrence probability of false detection, relieving the gradient dispersion problem of a deep network and accelerating the network convergence process.
Drawings
Fig. 1 is a schematic flow chart of the face detection method under non-limited conditions according to the present invention.
Fig. 2 is a schematic diagram of the face detection network structure of the face detection method under non-limited conditions according to the present invention.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
According to the face detection method under non-limited conditions of the invention, end-to-end face detection is realized through the trained network model; when a video frame enters the network, the probabilities of the detection results and the position information of the targets are output; after non-maximum suppression and probability screening, the exact coordinates of each face are obtained.
A face detection method under non-limited conditions comprises the following steps:
s1) image preprocessing
For public data sets collected from the Internet, horizontal flipping is adopted for data augmentation;
for self-collected 1080p video data, extract the video frames usable for face detection, label the faces in each frame manually or by machine, and scale down the frame for each face appearing in it: reduce the original image to 0.9×, 0.8×, and 0.7× of its size, crop a 700 × 700 pixel image containing the face from the reduced image, and then crop a 640 × 640 pixel image from the 700 × 700 image as the final training image.
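The multi-scale crop scheme above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `face_center` annotation is an assumed labelling format, and the nearest-neighbour downscale stands in for a real image-library resize (e.g. OpenCV).

```python
import numpy as np

def scale_and_crop(img, face_center, scales=(0.9, 0.8, 0.7)):
    """Shrink the frame by each scale, cut a 700x700 window around a
    labelled face centre, then take the 640x640 training crop inside it."""
    crops = []
    h, w = img.shape[:2]
    for s in scales:
        nh, nw = int(h * s), int(w * s)
        # nearest-neighbour downscale (stand-in for a real resize)
        ys = np.minimum((np.arange(nh) / s).astype(int), h - 1)
        xs = np.minimum((np.arange(nw) / s).astype(int), w - 1)
        small = img[ys][:, xs]
        # 700x700 window roughly centred on the (scaled) face position
        cx, cy = int(face_center[0] * s), int(face_center[1] * s)
        x0 = int(np.clip(cx - 350, 0, max(nw - 700, 0)))
        y0 = int(np.clip(cy - 350, 0, max(nh - 700, 0)))
        win = small[y0:y0 + 700, x0:x0 + 700]
        crops.append(win[:640, :640])  # final 640x640 training image
    return crops
```

For a 1080p frame, even the smallest scale (0.7) leaves more than 700 pixels in each dimension, so every crop is a full 640 × 640 image.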
S2) designing face detection network based on deep convolution
Extract features with VGG16 as the convolutional backbone, delete the final fully connected layers, and add an Inception structure and a dilated-convolution structure. The whole network comprises 6 blocks, obtained by cutting the network into 6 parts whose feature maps differ in size. A feature pyramid, i.e. features at multiple scales, is used to fuse the 3 lower blocks. A 3 × 3 convolutional layer is added after each fusion module; it does not change the feature-map size and prevents aliasing after features from two different levels are fused. Classification and regression loss functions are then attached to the 3 fused lower-layer outputs and to the last-layer outputs of the other 3 convolution modules that are not feature-fused.
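The top-down fusion of the pyramid levels can be sketched as follows. This is an illustrative sketch only: the channel counts and map sizes are assumptions rather than the patent's exact VGG16 dimensions, and the 3 × 3 anti-aliasing convolution that follows each fusion is omitted.

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def fuse_top_down(features):
    """Top-down feature-pyramid fusion: each higher-level (smaller) map is
    upsampled and added to the next lower (larger) one, so low-level maps
    gain high-level semantics. `features` is ordered largest to smallest."""
    fused = [features[-1]]  # start from the deepest, smallest map
    for f in reversed(features[:-1]):
        top = upsample2x(fused[0])
        # crop in case of odd sizes, then fuse by elementwise addition
        fused.insert(0, f + top[:, :f.shape[1], :f.shape[2]])
    return fused
```

Each element of the returned list keeps its original spatial size, which is what lets a per-level classification and regression head be attached afterwards.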
subtract the ImageNet mean (104, 117, 123) from the manually labelled multi-scale image samples, normalize to [0, 1], and feed the samples into the network through the data layer for training; the training data comprise the images and the coordinates of all faces in them; initialize the weights of the per-layer classification loss functions from a uniform distribution; optimize the network weights with stochastic gradient descent and train the network model until convergence.
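The input normalization amounts to a per-channel mean subtraction followed by scaling. The text leaves the exact order and scaling factor ambiguous; the sketch below assumes mean subtraction followed by division by 255, which is one common reading.

```python
import numpy as np

# (104, 117, 123) is the BGR channel mean commonly used with VGG-style
# models, matching the mean listed in the text.
IMAGENET_MEAN = np.array([104.0, 117.0, 123.0], dtype=np.float32)

def preprocess(img):
    """Subtract the per-channel mean, then scale by 1/255. The division
    factor is an assumption; the patent only says 'normalize to [0, 1]'."""
    x = img.astype(np.float32) - IMAGENET_MEAN
    return x / 255.0
```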
S3) face detection network forward propagation
Input the training images obtained in step S1) into the trained network model for forward propagation. The classification result output at each level of the feature pyramid is computed with a Softmax function, and the regression result with a logistic regression function; the classification result is the probability that the anchor target is a face or the background, and the regression result is the offset of the prediction relative to the anchor. The regression offset is offset = {dx, dy, dw, dh}, where dx and dy are the horizontal and vertical offsets of the prediction from the currently set anchor box, and dw and dh are its width and height multipliers. An anchor is anchor = {x, y, w, h}, where (x, y) is the centre coordinate of the anchor box and w and h are its width and height;
screen out the anchors whose face probability exceeds the threshold, take their corresponding predicted offsets, and correct the anchor coordinates with these offsets to obtain the preliminary prediction P1 = {x + dx − w × dw/2, y + dy − h × dh/2, x + dx + w × dw/2, y + dy + h × dh/2}, i.e. the coordinates of the top-left and bottom-right corners.
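The P1 correction formula can be written directly as code; a minimal NumPy sketch assuming the anchor = {x, y, w, h} and offset = {dx, dy, dw, dh} layouts defined above, with one row per anchor:

```python
import numpy as np

def decode(anchors, offsets):
    """Apply predicted offsets {dx, dy, dw, dh} to anchors {x, y, w, h}
    (centre format), returning (x1, y1, x2, y2) corner boxes exactly as
    in the P1 formula."""
    x, y, w, h = anchors.T
    dx, dy, dw, dh = offsets.T
    cx, cy = x + dx, y + dy          # shifted centre
    hw, hh = w * dw / 2, h * dh / 2  # half of the scaled width / height
    return np.stack([cx - hw, cy - hh, cx + hw, cy + hh], axis=1)
```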
S4) applying a non-maximum suppression algorithm
Sort the preliminary detections P1 by probability and perform non-maximum suppression: traverse the candidate boxes and remove every lower-probability box whose intersection-over-union with a higher-probability box exceeds 0.35, thereby filtering out the detections whose probability is not a local maximum and obtaining the prediction result P2.
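This suppression step can be sketched as standard greedy NMS with the 0.35 IoU threshold given above (a minimal NumPy sketch, not the patent's exact implementation):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.35):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box whose IoU with it exceeds the threshold,
    and repeat. Boxes are rows of (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```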
S5) obtaining the final detection result
The prediction result P2 consists of the coordinates (x1, y1) of the top-left corner and (x2, y2) of the bottom-right corner of the target box; predictions extending beyond the image are clipped:
if x2 is greater than the image width, set x2 to the image width; if y2 is greater than the image height, set y2 to the image height; if x1 or y1 is less than 0, set it to 0;
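The clipping rules amount to a coordinate clamp; a NumPy sketch, treating (x2, y2) as the bottom-right corner consistently with the P1 formula:

```python
import numpy as np

def clip_boxes(boxes, width, height):
    """Clamp (x1, y1, x2, y2) boxes to the image: negative corners go to 0,
    and corners past the right/bottom edge go to the image width/height."""
    boxes = boxes.copy()
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, width)
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, height)
    return boxes
```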
the final detection result P3 was obtained.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and improvements can be made without departing from the spirit of the present invention, and these modifications and improvements should also be considered as within the scope of the present invention.

Claims (2)

1. A face detection method under non-limited conditions, characterized by comprising the following steps:
s1) image preprocessing
For public data sets collected from the Internet, horizontally flip all images for data augmentation;
for self-collected 1080p video data, extract the video frames usable for face detection, label the faces in each frame manually or by machine, scale down the frame for each face appearing in it, and crop an image containing that face from the reduced frame as a final training image;
s2) designing face detection network based on deep convolution
Extract features with VGG16 as the convolutional backbone, delete the final fully connected layers, and add an Inception structure and a dilated-convolution structure; the whole network comprises 6 blocks; a feature pyramid is used to fuse the features of the 3 lower blocks, whose feature maps are relatively large; a 3 × 3 convolutional layer is added after each fused convolution module; classification and regression loss functions are then attached to the 3 fused lower-layer outputs and to the last-layer outputs of the other 3 convolution modules that are not feature-fused;
subtract the ImageNet mean (104, 117, 123) from the manually labelled multi-scale image samples, normalize to [0, 1], and feed the samples into the network through the data layer for training; initialize the weights of the per-layer classification loss functions from a uniform distribution; optimize the network weights with stochastic gradient descent and train the network model until convergence;
s3) face detection network forward propagation
Input the training images obtained in step S1) into the trained network model for forward propagation, and compute the classification and regression results output at each level of the feature pyramid, where the classification result is the probability that the anchor target is a face or the background, and the regression result is the offset of the prediction relative to the anchor;
screen out the anchors whose face probability exceeds a threshold, take their corresponding predicted offsets, and correct the anchor coordinates with these offsets to obtain a preliminary prediction result P1;
s4) applying a non-maximum suppression algorithm
Sort the preliminary detections P1 by probability and filter out the detections whose probability is not a local maximum, obtaining the prediction result P2;
s5) obtaining the final detection result
The prediction result consists of the coordinates (x1, y1) of the top-left corner and (x2, y2) of the bottom-right corner of the target box; predictions extending beyond the image are clipped:
if x2 is greater than the image width, set x2 to the image width; if y2 is greater than the image height, set y2 to the image height; if x1 or y1 is less than 0, set it to 0;
the final detection result P3 was obtained.
2. The method according to claim 1, characterized in that in step S1), the scale reduction of each face appearing in the video frames reduces the original image to 0.9×, 0.8×, and 0.7× of its size; a 700 × 700 pixel image containing the face is cropped from the reduced image, and a 640 × 640 pixel image is then cropped from the 700 × 700 image as the final training image.
CN201910091271.5A 2019-01-30 2019-01-30 Face detection method under non-limited condition Active CN109886159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910091271.5A CN109886159B (en) 2019-01-30 2019-01-30 Face detection method under non-limited condition

Publications (2)

Publication Number Publication Date
CN109886159A CN109886159A (en) 2019-06-14
CN109886159B true CN109886159B (en) 2021-03-26

Family

ID=66927482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910091271.5A Active CN109886159B (en) 2019-01-30 2019-01-30 Face detection method under non-limited condition

Country Status (1)

Country Link
CN (1) CN109886159B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827432B (en) * 2019-11-11 2021-12-28 深圳算子科技有限公司 Class attendance checking method and system based on face recognition
CN111985439A (en) * 2020-08-31 2020-11-24 中移(杭州)信息技术有限公司 Face detection method, device, equipment and storage medium
CN112734682B (en) * 2020-12-31 2023-08-01 杭州芯炬视人工智能科技有限公司 Face detection surface vector data acceleration method, system, computer device and storage medium
CN112967254A (en) * 2021-03-08 2021-06-15 中国计量大学 Lung disease identification and detection method based on chest CT image

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830196A (en) * 2018-05-31 2018-11-16 上海贵和软件技术有限公司 Pedestrian detection method based on feature pyramid network
CN108898078A * 2018-06-15 2018-11-27 University of Shanghai for Science and Technology Real-time traffic sign detection and recognition method based on a multi-scale deconvolutional neural network

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11157814B2 (en) * 2016-11-15 2021-10-26 Google Llc Efficient convolutional neural networks and techniques to reduce associated computational costs
WO2018140596A2 (en) * 2017-01-27 2018-08-02 Arterys Inc. Automated segmentation utilizing fully convolutional networks
CN107403141B (en) * 2017-07-05 2020-01-10 中国科学院自动化研究所 Face detection method and device, computer readable storage medium and equipment
CN107944442B * 2017-11-09 2019-08-13 北京智芯原动科技有限公司 Object detection device and method based on an improved convolutional neural network
CN108520219B (en) * 2018-03-30 2020-05-12 台州智必安科技有限责任公司 Multi-scale rapid face detection method based on convolutional neural network feature fusion
CN108734211B (en) * 2018-05-17 2019-12-24 腾讯科技(深圳)有限公司 Image processing method and device
CN109034090A * 2018-08-07 2018-12-18 Nantong University Emotion recognition system and method based on body movement
CN109145854A * 2018-08-31 2019-01-04 Southeast University Face detection method based on a cascaded convolutional neural network structure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Very Deep Convolutional Networks for Text Classification";Holger Schwenk等;《Computer Science 》;20170127;1-10 *
"基于深度学习的非限定条件下人脸识别研究";夏洋洋;《中国优秀硕士学位论文全文数据库 信息科技辑》;20170715;I138-825 *
"深度学习辅助的多行人跟踪算法";王慧燕等;《中国图象图形学报》;20170316;第22卷(第3期);349-357 *

Also Published As

Publication number Publication date
CN109886159A (en) 2019-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant