CN110427821A

CN110427821A - A face detection method and system based on a lightweight convolutional neural network

Info

Publication number: CN110427821A
Application number: CN201910567225.8A
Authority: CN
Inventors: 毛亮; 刘爽爽; 李本崇; 朱婷婷; 王祥雪; 黄仝宇; 汪刚; 宋一兵; 侯玉清; 刘双广
Original assignee: Xian University of Electronic Science and Technology; Gosuncn Technology Group Co Ltd
Current assignee: Xian University of Electronic Science and Technology; Gosuncn Technology Group Co Ltd
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2019-11-08

Abstract

The invention belongs to the technical field of computer vision detection, and in particular relates to a face detection method and system based on a lightweight convolutional neural network. The key problem addressed by this solution is to optimize the model into a lightweight one, reducing the amount of computation and increasing computation speed. At the same time, making the network lightweight must not compromise the accuracy of the face detection model. The solution therefore has to balance model size against accuracy, and the central question is how to improve detection accuracy on top of a lightweight face detection network. The solution has advantages in detection accuracy, model size, and detection speed: compared with a VGG16-based face detection algorithm, the network retains a certain level of accuracy while being faster and smaller.

Description

A face detection method and system based on a lightweight convolutional neural network

Technical Field

The invention belongs to the technical field of computer vision detection, and in particular relates to a face detection method and system based on a lightweight convolutional neural network.

Background Art

Object detection is a classic problem in computer vision, and face detection is among its most important tasks. With the increasing maturity of computer imaging technology and the development of face detection algorithms, face detection plays an increasingly important role in today's society. Examples include airport security checks, company attendance check-in, public-place surveillance, and password-free unlocking of electronic devices.

Existing face detection methods can be roughly divided into two categories.

The first category is based on template matching. Templates are built from invariant facial features such as contour, color, and texture. The similarity between the input image and the face template then determines whether a region contains a face. Template-matching algorithms rely on face templates built in advance for static scenes, and they perform poorly in dynamic scenes with large scale variation.

The second category is based on feature statistics. Hand-crafted features are combined with machine-learning classifiers such as artificial neural networks and AdaBoost, and statistical learning is performed on a large number of face samples. Face detection is then achieved by testing whether the classifier correctly classifies a given region of the image under test. Feature-statistics algorithms are limited by manually designed feature operators and cannot fully capture higher-level representations of the image.

After their success in image classification, convolutional neural networks were quickly applied to face detection, where their accuracy greatly surpassed the earlier AdaBoost framework. Several high-precision, efficient algorithms now exist.

Deep-learning face detection networks have made breakthrough progress, but they still fall short in balancing real-time performance against accuracy. Deeper convolutional stacks extract better features and raise detection accuracy. They also require more computation and therefore detect more slowly.

The SSD framework and the many improved SSD models derived from it illustrate this trade-off. The VGG-Net-based SSD model improves detection accuracy, but its heavy computation and slow detection make it hard to meet the needs of real-time applications. Achieving real-time face detection requires pruning and optimizing the neural network, which sacrifices some detection accuracy.

How to compress the network and speed up detection while maintaining detection accuracy is therefore the main technical problem currently facing face detection.

Summary of the Invention

To overcome these technical defects of the prior art, the present invention proposes a face detection method and system based on a lightweight convolutional neural network.

The present invention is realized through the following technical solutions:

A face detection method based on a lightweight convolutional neural network, comprising the steps of:

S1, data processing: establishing a face database, performing image preprocessing, and generating training samples;

S2, performing feature extraction on the input samples using a lightweight convolutional neural network, the lightweight convolutional neural network including convolutional layers with kernel sizes of 3x3 and 1x1;

S3, fusing different feature layers of the lightweight convolutional neural network using a feature fusion module;

S4, selecting anchor boxes;

S5, outputting multi-scale feature maps;

S6, mapping and matching face candidate regions;

S7, face classification and regression;

S8, establishing a non-maximum suppression constraint;

S9, outputting the detection result.

Further, in step S2, the backbone network of the lightweight convolutional neural network (CNN) comprises one A block and five B blocks connected in sequence. The convolutions in the A block have 16 channels, and the five B blocks have 16, 32, 64, 128, and 128 channels, respectively.

Further, the A block performs downsampling with a convolutional layer of kernel size 3×3 and stride 2. This is followed by a convolutional layer of kernel size 1×1 and stride 1, which reduces the dimensionality of the input features. Each convolutional layer is followed by Batch Normalization and a ReLU nonlinearity to improve the nonlinear characteristics of the network.

Further, the B block performs downsampling with a 2×2 max-pooling layer of stride 2. This is followed in sequence by convolutional layers with kernel sizes of 3×3, 1×1, and 3×3. Each of these convolutional layers has stride 1 and is followed by Batch Normalization and a ReLU nonlinearity to improve the nonlinear characteristics of the network.
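The spatial size of each backbone stage follows from these strides. The sketch below traces it under three assumptions:

- every 3×3 convolution uses padding 1, so only the stride-2 layers shrink the map;
- the pooling layers use no padding;
- the input is a hypothetical 300×300 image, since the document does not state the input resolution.

The helper names are illustrative.

```python
from typing import List, Tuple


def conv_out(size: int, k: int, s: int, p: int) -> int:
    """Output length along one axis of a convolution or pooling window."""
    return (size + 2 * p - k) // s + 1


def trace_backbone(h: int, w: int) -> List[Tuple[str, int, int, int]]:
    """Return (block, height, width, channels) after the A block and each B block."""
    shapes = []
    # A block: 3x3 conv, stride 2 (padding 1 assumed), then a size-preserving 1x1 conv.
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("A", h, w, 16))
    # B blocks: 2x2 max pool, stride 2 (no padding); the 3x3/1x1/3x3 convs that follow
    # have stride 1 and are assumed size-preserving, so only the pool changes h and w.
    for idx, c in enumerate((16, 32, 64, 128, 128), start=1):
        h, w = conv_out(h, 2, 2, 0), conv_out(w, 2, 2, 0)
        shapes.append((f"B{idx}", h, w, c))
    return shapes


for row in trace_backbone(300, 300):  # 300x300 input is a hypothetical choice
    print(row)
```

Under these assumptions, a 300×300 input shrinks to 150, 75, 37, 18, 9, and finally 4 pixels per side after the A block and the five B blocks. That is roughly a 64× overall reduction, which is what keeps the per-layer convolution cost low.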

Further, in step S3, the feature fusion module includes convolutional layers with kernel sizes of 3×3 and 1×1, both with stride 1. It upsamples using deconvolution and fuses the low-level feature map with the transformed high-level feature map by element-wise summation.

The present invention also provides a face detection system based on a lightweight convolutional neural network. The system is built on the SSD detection framework, uses the designed lightweight convolutional neural network as the backbone network for feature extraction, and comprises:

a data processing module, configured to establish a face database, perform image preprocessing, and generate training samples;

a lightweight convolutional neural network module, configured to perform feature extraction on the input samples using the lightweight convolutional neural network, which includes convolutional layers with kernel sizes of 3x3 and 1x1;

a feature fusion module, configured to fuse different feature layers of the backbone network and output multi-scale feature maps;

an anchor box selection module, configured to set the aspect ratio of the anchor boxes.

Preferably, the lightweight convolutional neural network comprises one A block and five B blocks connected in sequence. The convolutions in the A block have 16 channels, and the five B blocks have 16, 32, 64, 128, and 128 channels, respectively.

Preferably, the A block performs downsampling with a convolutional layer of kernel size 3×3 and stride 2. This is followed by a convolutional layer of kernel size 1×1 and stride 1, which reduces the dimensionality of the input features. Each convolutional layer is followed by Batch Normalization and a ReLU nonlinearity to improve the nonlinear characteristics of the network.

Preferably, the B block performs downsampling with a 2×2 max-pooling layer of stride 2. This is followed in sequence by convolutional layers with kernel sizes of 3×3, 1×1, and 3×3. Each of these convolutional layers has stride 1 and is followed by Batch Normalization and a ReLU nonlinearity to improve the nonlinear characteristics of the network.

Preferably, the feature fusion module includes convolutional layers with kernel sizes of 3×3 and 1×1, both with stride 1. It upsamples using deconvolution and fuses the low-level feature map with the transformed high-level feature map by element-wise summation.

Compared with the prior art, the present invention has at least the following beneficial effects or advantages.

The solution starts from the design of a lightweight face detection network. Building on existing theoretical research on deep neural network models, it analyzes the key factors that constrain the size and computational efficiency of face detection networks, and it optimizes the network accordingly.

Within the SSD framework, the designed lightweight network serves as the backbone for face detection. This reduces the model's storage footprint and the memory occupied during model construction and initialization.

Features from multiple convolutional layers are fused, and the designed feature fusion module is applied to the lightweight face detection network. This improves the algorithm's accuracy to a certain extent.

Compared with the conventional SSD detection algorithm, the solution takes into account the particular shape and proportions of faces and sets the aspect ratio of the detection boxes to 1. This reduces network computation and improves the computational efficiency of the deep neural network.

Description of the Drawings

The present invention is described in further detail below with reference to the accompanying drawings:

Fig. 1(a) is a structural diagram of the A block of the present invention;

Fig. 1(b) is a structural diagram of the B block of the present invention;

Fig. 1(c) is a structural diagram of the face detection backbone network of the present invention;

Fig. 2 is a diagram of the feature fusion module of the present invention;

Fig. 3 is a flow chart of face detection according to the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Mainstream convolutional neural network models are deep, have many parameters, and require a very large amount of computation, so they cannot achieve real-time detection. Experiments show that the time spent on convolution is driven mainly by two factors: the dimensions of the input feature map and the number of convolution kernels.

- The larger the width and height of the input feature map, the more positions the kernel must slide over horizontally and vertically.
- The more channels the input feature map has, the more parameters must be linearly combined at each local connection.
- In the forward pass, the input features are combined with every neuron in the layer. The more convolution kernels a layer contains, the more kernel computations that layer requires.
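These cost factors can be made concrete with the standard multiply-accumulate (MAC) count of a dense convolutional layer, H_out × W_out × C_out × C_in × k × k. The sketch below uses hypothetical layer sizes, chosen only to show how the cost scales with each factor.

```python
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of a dense (non-grouped) k x k convolution, bias ignored."""
    # Each of the h_out * w_out * c_out outputs is a dot product over c_in * k * k inputs.
    return h_out * w_out * c_out * c_in * k * k


base = conv_macs(38, 38, 64, 128, 3)  # hypothetical layer
print(conv_macs(76, 76, 64, 128, 3) / base)   # 4.0: doubling width and height
print(conv_macs(38, 38, 128, 128, 3) / base)  # 2.0: doubling input channels
print(conv_macs(38, 38, 64, 256, 3) / base)   # 2.0: doubling the number of kernels
```

Spatial size enters quadratically, while channel count and kernel count each enter linearly. This is why the backbone both downsamples early and keeps its channel counts small.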

The above analysis shows that a convolutional neural network can be made more capable by increasing its depth and the number of kernels per layer, but at the cost of more computation. Conversely, simply feeding lower-resolution images provides too little information and causes the network to underfit.

To address this, the solution builds on the SSD detection framework and uses the designed lightweight network for feature extraction. This reduces the network size (parameter count and computation) while keeping a certain level of recognition accuracy. The network replaces some 3x3 convolution kernels with 1x1 kernels, which reduces the parameters of each replaced layer by a factor of 9. Because this may weaken the features the network extracts, only part of the kernels is replaced, to preserve recognition performance. The network also reduces the number of channels of the input data, which further reduces the parameter count.

The backbone structure is shown in Fig. 1(c).

The A block (Fig. 1(a)) downsamples with a 3×3 convolutional layer of stride 2. It is followed by Batch Normalization and a ReLU nonlinearity, which improve the nonlinear characteristics of the network. A subsequent 1×1 convolutional layer reduces the dimensionality of the input features, which shortens the model's computation time. The convolutions in the A block have 16 channels.

The B block (Fig. 1(b)) downsamples with a 2×2 max-pooling layer of stride 2. It is followed in sequence by convolutional layers with kernel sizes of 3×3, 1×1, and 3×3, each with stride 1 and each followed by Batch Normalization and ReLU to improve the model's nonlinear expressive power.

The backbone uses five B blocks in total, with 16, 32, 64, 128, and 128 channels, respectively.
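The factor of 9 and the resulting backbone size can be checked by counting convolution weights (k × k × C_in × C_out). This sketch makes several assumptions:

- the input is a 3-channel RGB image;
- convolutions have no biases, since each is followed by Batch Normalization, and the BN parameters themselves are ignored;
- all convolutions in a block output that block's channel count.

The document does not specify the internal channel layout of the B block, so the totals are illustrative only.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution without bias."""
    return k * k * c_in * c_out


def a_block_params(c_in: int = 3, c: int = 16) -> int:
    # 3x3 stride-2 conv, then a 1x1 conv; both assumed to output c channels.
    return conv_params(3, c_in, c) + conv_params(1, c, c)


def b_block_params(c_in: int, c: int) -> int:
    # Max pooling has no weights; the 3x3 -> 1x1 -> 3x3 convs are assumed to output c channels.
    return conv_params(3, c_in, c) + conv_params(1, c, c) + conv_params(3, c, c)


def backbone_params(channels=(16, 32, 64, 128, 128)) -> int:
    total = a_block_params()
    c_in = 16  # output channels of the A block
    for c in channels:
        total += b_block_params(c_in, c)
        c_in = c
    return total


print(conv_params(3, 128, 128) // conv_params(1, 128, 128))  # 9
print(backbone_params())  # 628656
```

Under these assumptions, the whole backbone has about 0.63 million convolution weights. Most of them sit in the two 128-channel B blocks, where each 3×3 convolution alone costs nine times as much as the 1×1 convolution between them.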

The detection branches in the SSD structure are independent of one another, so the same object can easily be detected simultaneously by boxes of different sizes. Feature fusion strengthens the connections between layers and reduces such duplicate boxes.

In addition, low-level features carry less semantic information but more detail, and they localize targets precisely. High-level features carry rich semantic information but only coarse locations. The feature fusion module exploits both the high resolution of low-level features and the rich semantics of high-level features. Predictions are then made independently on each fused feature layer, which improves the detection accuracy for small faces.

The feature fusion module is shown in Fig. 2. The Conv3×3 and Conv1×1 layers each have 256 channels and stride 1. Upsampling is performed with deconvolution, and the low-level feature map and the transformed high-level feature map are fused by element-wise sum.

Fusing different feature layers of the backbone with this module produces feature maps at the corresponding scales. These maps, together with the auxiliary convolutional layers added after the backbone, are used to produce the candidate boxes for detection. This improves detection accuracy while keeping the model small and efficient.
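The upsample-and-sum step can be illustrated on a single channel. This is a minimal sketch in plain Python, assuming a deconvolution (transposed convolution) with a 2×2 kernel and stride 2, which exactly doubles the map size. It omits the module's 256-channel Conv3×3/Conv1×1 layers and the summation over input channels that a real multi-channel deconvolution performs. The kernel values are hypothetical; in practice they are learned.

```python
from typing import List

Map = List[List[float]]  # single-channel feature map, indexed [row][col]


def deconv_2x2_stride2(x: Map, kernel: Map) -> Map:
    """Single-channel transposed convolution with a 2x2 kernel and stride 2.
    Each input pixel is spread over its own non-overlapping 2x2 output patch."""
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += x[i][j] * kernel[di][dj]
    return out


def fuse(low: Map, high: Map, kernel: Map) -> Map:
    """Upsample the high-level map, then add it element-wise to the low-level map."""
    up = deconv_2x2_stride2(high, kernel)
    if len(up) != len(low) or len(up[0]) != len(low[0]):
        raise ValueError("upsampled high-level map must match the low-level map size")
    return [[a + b for a, b in zip(r_low, r_up)] for r_low, r_up in zip(low, up)]


low = [[1.0] * 4 for _ in range(4)]  # hypothetical 4x4 low-level map
high = [[1.0, 2.0], [3.0, 4.0]]      # hypothetical 2x2 high-level map
print(fuse(low, high, [[1.0, 1.0], [1.0, 1.0]]))
```

Element-wise summation requires both maps to have the same shape. Consider a low-level map with an odd side length (for example 37) whose next stage was halved with rounding down (to 18). A stride-2 deconvolution of that map yields only 36, so a real implementation needs output padding or cropping before the sum. The document does not say which is used.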

By applying detection boxes of different sizes to different feature maps of a single network, the solution achieves multi-scale detection while sharing parameters, which reduces computation and increases detection speed.

In theory, the receptive field of a detection box is uniform. In practice, inputs closer to its center influence the output more strongly, following a roughly Gaussian distribution. Based on this observation and the SSD box-design method, the solution sets detection boxes suited to multi-scale face detection.

Analysis of face data shows that most faces are roughly square. Given this characteristic of face shape, the aspect ratio of the detection boxes is set to 1. This reduces the number of boxes the network must compute and thereby increases detection speed.
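Below is a minimal sketch of square anchor generation. It assumes the scale schedule of the original SSD paper, with scales spaced linearly from s_min = 0.2 to s_max = 0.9 across the prediction layers, and one anchor per feature-map cell. The document gives neither its scales nor its feature-map sizes, so the values below are hypothetical. Boxes are (cx, cy, w, h) in coordinates normalized to the image size.

```python
from typing import List, Tuple


def anchor_scales(num_layers: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced anchor scales, one per prediction layer (SSD-style schedule, assumed)."""
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


def square_anchors(fmap_sizes: List[int]) -> List[Tuple[float, float, float, float]]:
    """One square (cx, cy, w, h) anchor per cell of each square feature map."""
    anchors = []
    for f, s in zip(fmap_sizes, anchor_scales(len(fmap_sizes))):
        for i in range(f):          # row index
            for j in range(f):      # column index
                cx, cy = (j + 0.5) / f, (i + 0.5) / f  # cell centre, normalized
                anchors.append((cx, cy, s, s))         # aspect ratio 1: w == h
    return anchors


boxes = square_anchors([4, 2, 1])  # hypothetical feature-map sizes
print(len(boxes))  # 21 = 16 + 4 + 1
print(boxes[0])    # (0.125, 0.125, 0.2, 0.2)
```

Because the aspect ratio is fixed to 1, each cell contributes a single box per scale. In the general SSD formulation, w = s·√a and h = s/√a for each aspect ratio a, so setting a = 1 makes w and h both equal to s.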

The face detection process is shown in Fig. 3.

Built on the SSD detection framework, the solution uses the designed lightweight network as the backbone for feature extraction, which greatly reduces the number of network parameters. Multiple convolutional features in the designed lightweight network are then fused. This improves the algorithm's accuracy to a certain extent while preserving SSD's multi-scale prediction, thereby ensuring the network's face detection accuracy. Compared with a VGG16-based face detection algorithm, the network has advantages across accuracy, detection speed, and model size taken together.

Regarding the detection boxes, general objects have variable shapes, so SSD uses anchor boxes with several aspect ratios, whereas face proportions are essentially fixed. To avoid unnecessary computation, the solution sets the aspect ratio of the detection boxes to 1. This reduces the number of boxes the network must compute and increases detection speed.
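The saving from a single aspect ratio can be estimated from the size of an SSD-style prediction head. Its 3×3 convolution outputs (number of classes + 4 box offsets) channels for every anchor at each cell. This sketch assumes:

- two classes (face and background);
- one square anchor per cell, compared with the six default boxes per cell that SSD uses on several of its layers;
- a hypothetical 38×38 feature map with 128 channels.

```python
def head_channels(anchors_per_cell: int, num_classes: int = 2) -> int:
    """Output channels of one prediction head: class scores plus 4 box offsets per anchor."""
    return anchors_per_cell * (num_classes + 4)


def head_macs(fmap: int, c_in: int, anchors_per_cell: int, num_classes: int = 2) -> int:
    """MACs of a 3x3, stride-1, same-padded prediction conv on an fmap x fmap feature map."""
    return fmap * fmap * head_channels(anchors_per_cell, num_classes) * c_in * 3 * 3


multi = head_macs(38, 128, anchors_per_cell=6)   # several aspect-ratio/scale variants per cell
square = head_macs(38, 128, anchors_per_cell=1)  # aspect ratio fixed to 1
print(head_channels(6), head_channels(1))  # 36 6
print(multi // square)                     # 6
print(38 * 38 * 6, 38 * 38 * 1)            # candidate boxes passed on to NMS: 8664 1444
```

Beyond the head itself, fewer anchors also means fewer boxes to decode, match against ground truth, and pass through non-maximum suppression.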

The present invention also provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the steps of the face detection method based on a lightweight convolutional neural network.

The present invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of the face detection method based on a lightweight convolutional neural network.

The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit its protection scope. Any modifications, equivalent replacements, improvements, and the like made without departing from the spirit and scope of the present invention also fall within its protection scope.

Claims

1. A face detection method based on a lightweight convolutional neural network, characterized by comprising the steps of:

S1, data processing: establishing a face database, performing image preprocessing, and generating training samples;

S2, performing feature extraction on the input samples using a lightweight convolutional neural network, the lightweight convolutional neural network including convolutional layers with kernel sizes of 3x3 and 1x1;

S3, fusing different feature layers of the lightweight convolutional neural network based on a feature fusion module;

S4, selecting anchor boxes;

S5, outputting multi-scale feature maps;

S6, mapping and matching face candidate regions;

S7, face classification and regression;

S8, establishing a non-maximum suppression constraint;

S9, outputting the detection result.

2. The face detection method based on a lightweight convolutional neural network according to claim 1, characterized in that, in step S2, the backbone network of the lightweight convolutional neural network CNN comprises one A block and five B blocks connected in sequence, wherein the number of convolution channels in the A block is set to 16, and the numbers of channels of the B blocks are set to 16, 32, 64, 128, and 128 in sequence.

3. The face detection method based on a lightweight convolutional neural network according to claim 2, characterized in that the A block performs downsampling through a convolutional layer with a kernel size of 3×3 and a stride of 2, followed by a convolutional layer with a kernel size of 1×1 and a stride of 1 that reduces the dimensionality of the input features, each convolutional layer being followed in sequence by Batch Normalization and a nonlinear function ReLU for improving the nonlinear characteristics of the network.

4. The face detection method based on a lightweight convolutional neural network according to claim 3, characterized in that the B block performs downsampling through a max-pooling layer with a kernel size of 2×2 and a stride of 2, followed in sequence by convolutional layers with kernel sizes of 3×3, 1×1, and 3×3, each convolutional layer having a stride of 1 and being followed in sequence by Batch Normalization and a nonlinear function ReLU for improving the nonlinear characteristics of the network.

5. The face detection method based on a lightweight convolutional neural network according to claim 1, characterized in that, in step S3, the feature fusion module includes convolutional layers with kernel sizes of 3×3 and 1×1, each with a stride of 1, performs upsampling using deconvolution, and fuses the low-level feature map and the transformed high-level feature map by element-wise summation.

6. A face detection system based on a lightweight convolutional neural network, which is based on the SSD detection framework and uses the designed lightweight convolutional neural network as the backbone network for feature extraction, characterized by comprising:

a data processing module, configured to establish a face database, perform image preprocessing, and generate training samples;

a lightweight convolutional neural network module, configured to perform feature extraction on the input samples using a lightweight convolutional neural network, the lightweight convolutional neural network including convolutional layers with kernel sizes of 3x3 and 1x1;

a feature fusion module, configured to fuse different feature layers of the backbone network and output multi-scale feature maps;

an anchor box selection module, configured to set the aspect ratio of the anchor boxes.

7. The face detection system based on a lightweight convolutional neural network according to claim 6, characterized in that the lightweight convolutional neural network comprises one A block and five B blocks connected in sequence, wherein the number of convolution channels in the A block is set to 16, and the numbers of channels of the B blocks are set to 16, 32, 64, 128, and 128 in sequence.

8. The face detection system based on a lightweight convolutional neural network according to claim 7, characterized in that the A block performs downsampling through a convolutional layer with a kernel size of 3×3 and a stride of 2, followed by a convolutional layer with a kernel size of 1×1 and a stride of 1 that reduces the dimensionality of the input features, each convolutional layer being followed in sequence by Batch Normalization and a nonlinear function ReLU for improving the nonlinear characteristics of the network.

9. The face detection system based on a lightweight convolutional neural network according to claim 8, characterized in that the B block performs downsampling through a max-pooling layer with a kernel size of 2×2 and a stride of 2, followed in sequence by convolutional layers with kernel sizes of 3×3, 1×1, and 3×3, each convolutional layer having a stride of 1 and being followed in sequence by Batch Normalization and a nonlinear function ReLU for improving the nonlinear characteristics of the network.

10. The face detection system based on a lightweight convolutional neural network according to claim 6, characterized in that the feature fusion module includes convolutional layers with kernel sizes of 3×3 and 1×1, each with a stride of 1, performs upsampling using deconvolution, and fuses the low-level feature map and the transformed high-level feature map by element-wise summation.