CN106485230B

CN106485230B - Training of face detection model based on neural network, face detection method and system

Info

Publication number: CN106485230B
Application number: CN201610906338.2A
Authority: CN
Inventors: 邵枭虎; 吕江靖; 覃勋辉; 周祥东; 石宇
Original assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Current assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Priority date: 2016-10-18
Filing date: 2016-10-18
Publication date: 2019-10-25
Anticipated expiration: 2036-10-18
Also published as: CN106485230A

Abstract

The present invention provides a neural network-based face detection model training, face detection method and system, the training method: according to the bias information of the predicted face frame relative to the default face frame, and the real face frame relative to the default The bias information of the face frame is calculated to predict the loss function of the face frame bias network layer; according to the confidence of the default face frame, the loss function of the predicted face frame confidence network layer is calculated; the error of the two loss functions is calculated And the error is fed back to the neural network to adjust the weights in the neural network; repeated iterative training until it converges to obtain the face detection model, so that the predicted face frame contains the face more accurately. Detection method: Input the face image to be tested into the trained face detection model to output bias information and confidence; calculate the corresponding predicted face frame according to the bias information; select a confidence greater than the preset confidence threshold Or the predicted face frame corresponding to the highest confidence level is used as the face detection result.

Description

Training of face detection model based on neural network, face detection method and system

技术领域technical field

本发明涉及图像处理技术领域，特别是涉及一种基于神经网络的人脸检测模型的训练、人脸检测方法及系统。The invention relates to the technical field of image processing, in particular to a neural network-based face detection model training, face detection method and system.

背景技术Background technique

人脸检测其主要任务是判断给定的人脸图像上是否存在人脸，如果存在人脸，则给出人脸所处的位置和大小。人脸检测的通常采用的流程主要包含以下三步：(1)从图像上选择一个矩形区域作为一个观察窗口；(2)对观察窗口提取特征，用于描述其包含的内容；(3)进行分类判别，判断这个窗口是否包含一张人脸。通过不断的重复上述三个步骤，直到遍历完人脸图像上所有的观察窗口。如果有观察窗口被判别为包含人脸，则该窗口的位置和大小即可为检测到的人脸的位置和大小；反之，如果所有窗口都不包含人脸，则认为给定的人脸图像不存在人脸。The main task of face detection is to judge whether there is a face on a given face image, and if there is a face, give the position and size of the face. The commonly used process of face detection mainly includes the following three steps: (1) select a rectangular area from the image as an observation window; (2) extract features from the observation window to describe the content it contains; (3) carry out Classification discrimination, to determine whether the window contains a face. By continuously repeating the above three steps, until all observation windows on the face image are traversed. If any observation window is judged to contain a human face, the position and size of the window can be the position and size of the detected human face; conversely, if all windows do not contain a human face, the given face image is considered There are no faces.

目前普遍采用的人脸检测器都是基于Paul Viola和Michael Jones于2001年设计Viola-Jones人脸检测器；首先，使用Haar特征构建积分图从而实现特征的快速计算；其次，使用有效的特征分类器方法，例如，AdaBoost算法；最后，通过级联的方式对窗口进行由粗到细的判别。由于人脸的非刚性的属性结构，加上外部环境复杂多变，因此人脸检测技术存在误检高，人脸检出率低等问题。针对这一问题，后续大量的工作对Viola-Jones人脸检测器进行了改进，例如，选用描述能力更强的特征(LBP，HOG)，改进分类器算法以及级联结构等，使得人脸检测的性能有所改善。The currently commonly used face detectors are based on the Viola-Jones face detector designed by Paul Viola and Michael Jones in 2001; first, use the Haar feature to construct an integral map to achieve fast calculation of features; second, use effective feature classification processor method, for example, the AdaBoost algorithm; finally, the window is discriminated from coarse to fine through cascading. Due to the non-rigid attribute structure of the face and the complex and changeable external environment, the face detection technology has problems such as high false detection and low face detection rate. In response to this problem, a lot of follow-up work has improved the Viola-Jones face detector, for example, selecting features with stronger descriptive capabilities (LBP, HOG), improving the classifier algorithm and cascade structure, etc., making face detection performance has been improved.

近年来，随着深度学习的发展，也逐渐出现了一些基于深度神经网络的人脸检测方法，例如，CascadeCNN，Faceness，Faster RCNN等，相比传统的人脸检测方法，深度神经网络提取的特征具有更强的鲁棒性和描述能力，因此具有较高的检测率和较低的误检。In recent years, with the development of deep learning, some face detection methods based on deep neural networks have gradually emerged, such as CascadeCNN, Faceness, Faster RCNN, etc. Compared with traditional face detection methods, the features extracted by deep neural networks It has stronger robustness and descriptive ability, so it has higher detection rate and lower false detection.

虽然目前人脸检测器已经取得了长足的进展，但仍存在以下几个方面的问题：Although the current face detector has made great progress, there are still some problems in the following aspects:

(1)目前大部分人脸检测器都采用滑动窗口方式进行选择观察窗口，因此遍历完一张人脸图像，需要对大量的观察窗口进行计算判别，因此计算量较大；并且针对人脸图像中不同大小的人脸，需要构建图像金字塔，或者采用不同尺度的观察窗，人脸检测速度较慢。(1) At present, most face detectors use a sliding window method to select the observation window, so after traversing a face image, it is necessary to calculate and distinguish a large number of observation windows, so the calculation amount is large; and for face images For faces of different sizes, it is necessary to construct an image pyramid, or use observation windows of different scales, and the face detection speed is slow.

(2)大部分的人脸检测算法都步骤较多，每个步骤都相对独立，其中任何一个步骤出现问题，都会影响最终的人脸检测结果。(2) Most face detection algorithms have many steps, and each step is relatively independent. Any problem in any step will affect the final face detection result.

(3)基于深度学习的人脸检测方法，虽然效果较好，但需要对输入人脸图像缩放到固定大小，造成图像中的人脸拉伸、扭曲、变形等，影响最终的人脸检测结果。(3) The face detection method based on deep learning, although the effect is good, needs to scale the input face image to a fixed size, causing the face in the image to be stretched, distorted, deformed, etc., affecting the final face detection result .

发明内容Contents of the invention

鉴于以上所述现有技术的缺点，本发明的目的在于提供一种基于神经网络的人脸检测模型的训练、人脸检测方法及系统，能够实现人脸的多尺度检测。In view of the above-mentioned shortcomings of the prior art, the object of the present invention is to provide a neural network-based face detection model training, face detection method and system, which can realize multi-scale detection of faces.

为实现上述目的及其他相关目的，本发明提供一种基于神经网络的人脸检测模型的训练方法，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层；所述方法还包括：In order to achieve the above purpose and other related purposes, the present invention provides a training method for a face detection model based on a neural network. The network layer of the face frame confidence; wherein, the network layer of the human face detection is selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, and the network layer of the human face detection Each cell in is bound to six default face frames, which are set according to the network layer scale of the corresponding face detection; each layer of face detection network layer is connected to a layer of predicted face frame A biased network layer and a network layer predicting the confidence of the face frame; the method also includes:

接收到模型训练指令时，将所述训练集中的人脸图像输入到所述神经网络中进行训练；When the model training instruction is received, the face images in the training set are input into the neural network for training;

通过预测人脸框偏置的网络层计算出预测人脸框相对于对应的默认人脸框的偏置信息，以及计算出真实人脸框相对于对应的默认人脸框的偏置信息；并通过预测人脸框置信度的网络层计算出每个默认人脸框包含人脸的置信度；Calculate the bias information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame bias, and calculate the bias information of the real face frame relative to the corresponding default face frame; and Calculate the confidence that each default face frame contains a face through the network layer that predicts the confidence of the face frame;

根据所述预测人脸框相对于对应的默认人脸框的偏置信息，以及所述真实人脸框相对于对应的默认人脸框的偏置信息，计算预测人脸框偏置的网络层的损失函数；并根据所述默认人脸框包含人脸的置信度，计算预测人脸框置信度的网络层的损失函数；According to the bias information of the predicted face frame relative to the corresponding default face frame, and the bias information of the real face frame relative to the corresponding default face frame, calculate the network layer for predicting the bias of the face frame A loss function; and according to the confidence of the default human face frame, calculate the loss function of the network layer for predicting the confidence of the human face frame;

计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差，并将所述误差通过反向传播反馈到所述神经网络中，根据所述误差更新所述神经网络的网络权重参数以及根据更新后的网络权重参数调整所述预测人脸框；Calculate the error of the loss function of the network layer of the bias of the predicted face frame and the loss function of the network layer of the predicted face frame confidence, and feed back the error to the neural network through backpropagation, Updating the network weight parameters of the neural network according to the error and adjusting the predicted face frame according to the updated network weight parameters;

重复迭代训练直至调整后的预测人脸框与真实人脸框的误差在预设的误差范围之内，输出人脸检测模型。Repeat iterative training until the error between the adjusted predicted face frame and the real face frame is within the preset error range, and output the face detection model.

优选地，所述通过预测人脸框偏置的网络层计算出预测人脸框相对于对应的默认人脸框的偏置信息，包括：Preferably, the bias information of the predicted face frame relative to the corresponding default face frame is calculated by the network layer of the predicted face frame bias, including:

按照以下公式计算出预测人脸框相对于对应的默认人脸框的偏置信息：Calculate the offset information of the predicted face frame relative to the corresponding default face frame according to the following formula:

t_x＝(x- x_a)/w_a,t_y＝(y- y_a)/h_a t _x ＝(x- x _a )/w _a , t _y ＝(y- y _a )/h _a

t_w＝log(w/w_a),t_h＝log(h/h_a)t _w =log(w/w _a ),t _h =log(h/h _a )

其中，(x,y,w,h)为预测人脸框的中心点坐标、宽和高；(x_a,y_a,w_a,h_a)为默认人脸框的中心点坐标、宽和高；(t_x,t_y,t_w,t_h)为所述预测人脸框相对于对应的默认人脸框的偏置信息；Among them, (x, y, w, h) are the center point coordinates, width and height of the predicted face frame; (x _a , y _a , w _a , h _a ) are the center point coordinates, width and height of the default face frame High; (t _x , t _y , t _w , t _h ) is the offset information of the predicted face frame relative to the corresponding default face frame;

所述计算出真实人脸框相对于对应的默认人脸框的偏置信息，包括：The calculation of the offset information of the real face frame relative to the corresponding default face frame includes:

根据每个默认人脸框与对应的真实人脸框，按照以下公式计算所述真实人脸框相对于对应的默认人脸框的偏置信息：According to each default face frame and the corresponding real face frame, the offset information of the real face frame relative to the corresponding default face frame is calculated according to the following formula:

其中，(x_a,y_a,w_a,h_a)为默认人脸框的中心点坐标、宽和高；(x^*,y^*,w^*,h^*)为真实人脸框的中心点坐标、宽和高；为所述真实人脸框相对于对应的默认人脸框的偏置信息。Among them, (x _a , y _a , w _a , h _a ) are the center point coordinates, width and height of the default face frame; (x ^* , y ^* , w ^* , h ^* ) are the center point of the real face frame coordinates, width and height; is the offset information of the real face frame relative to the corresponding default face frame.

优选地，所述根据所述预测人脸框相对于对应的默认人脸框的偏置信息，以及所述真实人脸框相对于对应的默认人脸框的偏置信息，计算预测人脸框偏置的网络层的损失函数，包括：Preferably, the predicted face frame is calculated according to the bias information of the predicted face frame relative to the corresponding default face frame, and the bias information of the real face frame relative to the corresponding default face frame The loss function of the biased network layer, including:

选取所述默认人脸框相对于对应的真实人脸框的相对面积大于预设的第一相对面积阈值时所对应的默认人脸框作为采样默认人脸框；其中，所述相对面积为默认人脸框与真实人脸框的相交区域的面积除以默认人脸框与真实人脸框的并集区域的面积；Select the corresponding default face frame when the relative area of the default face frame relative to the corresponding real face frame is greater than the preset first relative area threshold as the sampling default face frame; wherein, the relative area is the default The area of the intersection area of the face frame and the real face frame is divided by the area of the union area of the default face frame and the real face frame;

根据预测人脸框相对于对应的采样默认人脸框的偏置信息，以及所述真实人脸框相对于对应的采样默认人脸框的偏置信息，按照以下公式计算预测人脸框偏置的网络层的损失函数 Loss₁：According to the bias information of the predicted face frame relative to the corresponding sampled default face frame, and the bias information of the real face frame relative to the corresponding sampled default face frame, the predicted face frame offset is calculated according to the following formula The loss function Loss ₁ of the network layer:

其中，N_reg为采样默认人脸框的数量，L_reg对应第k(k为1和N_reg之间的正整数)个默认人脸框的空间偏置的回归损失函数，T＝t_x,t_y,t_w,t_h为预测人脸框相对于对应的采样默认人脸框的偏置信息，z是属于{x,y,w,h}这一集合的元素，为真实人脸框相对于对应的采样默认人脸框的偏置信息，smooth_L1表示平滑L1损失函数，它是L1范数损失函数的变种，l表示smooth_L1函数的输入变量。Among them, N _reg is the number of sampled default face frames, L _reg corresponds to the regression loss function of the spatial bias of the kth (k is a positive integer between 1 and N _reg ) default face frames, T=t _x , t _y , t _w , t _h are the bias information of the predicted face frame relative to the corresponding sampled default face frame, z is an element belonging to the set {x, y, w, h}, is the offset information of the real face frame relative to the corresponding sampled default face frame, smooth _L1 represents the smooth L1 loss function, which is a variant of the L1 norm loss function, and l represents the input variable of the smooth _L1 function.

所述根据所述默认人脸框包含人脸的置信度，计算预测人脸框置信度的网络层的损失函数，包括：According to the confidence that the default human face frame contains a human face, the loss function of the network layer that calculates the predicted human face frame confidence includes:

将所述默认人脸框相对于对应的真实人脸框的相对面积大于预设的第一相对面积阈值时所对应的默认人脸框作为正样本，将所述默认人脸框相对于对应的真实人脸框的相对面积小于或等于预设的第一相对面积阈值时所对应的默认人脸框作为负样本；When the relative area of the default face frame to the corresponding real face frame is greater than the preset first relative area threshold, the corresponding default face frame is used as a positive sample, and the default face frame is compared to the corresponding The default face frame corresponding to when the relative area of the real face frame is less than or equal to the preset first relative area threshold is used as a negative sample;

按照预设的正负样本比例选取全部正样本及部分负样本；Select all positive samples and some negative samples according to the preset ratio of positive and negative samples;

按照以下公式计算预测人脸框置信度的网络层的损失函数Loss₂：Calculate the loss function Loss ₂ of the network layer for predicting the confidence of the face frame according to the following formula:

L_cls(p,p^*)＝-[p^*logp+(1-p^*)log(1-p)L _cls (p,p ^* )＝-[p ^* logp+(1-p ^* )log(1-p)

其中，N_cls为选取的正负样本的总数量，对应第i(i为1和N_cls之间的正整数)个类别的分类损失函数，p为选取的正样本或负样本包含人脸的置信度，p^*为选取的正样本或负样本包含人脸的真实概率，正样本的p^*为1，负样本的p^*为0。Among them, N _cls is the total number of positive and negative samples selected, Corresponding to the classification loss function of the i-th category (i is a positive integer between 1 and N _cls ), p is the confidence that the selected positive or negative sample contains the face, and p ^* is the selected positive or negative sample contains The true probability of the face, p ^* is 1 for positive samples, and p ^* is 0 for negative samples.

优选地，所述计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差，包括：Preferably, the calculation of the error between the loss function of the network layer that predicts the bias of the face frame and the loss function of the network layer that predicts the confidence of the face frame includes:

采用随机梯度下降法计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的梯度；Adopt stochastic gradient descent method to calculate the gradient of the loss function of the network layer of described prediction face frame offset and the loss function of the network layer of described prediction face frame confidence degree;

将得到的梯度值作为所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差。The obtained gradient value is used as an error between the loss function of the network layer predicting the bias of the face frame and the loss function of the network layer predicting the confidence of the face frame.

本发明的另一目的在于提供一种基于神经网络的人脸检测方法，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层；所述方法包括：Another object of the present invention is to provide a face detection method based on neural network, said neural network comprising: a network layer for face detection, a network layer for predicting the bias of the face frame, and a network for predicting the confidence of the face frame layer; wherein, the network layer of the human face detection is selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, and each cell in the network layer of the human face detection is bound Determine six default face frames, the default face frame is set according to the network layer scale of the corresponding face detection; the network layer of each layer of face detection is connected to a network layer that predicts the bias of the face frame and a network layer Layer predicts the network layer of face frame confidence; Described method comprises:

接收到人脸检测指令时，将待检测的人脸图像输入到训练好的人脸检测模型中进行人脸检测；When the face detection instruction is received, the face image to be detected is input into the trained face detection model for face detection;

针对所述待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息，并通过训练好的人脸检测模型中预测人脸框置信度的网络层输出每个默认人脸框包含人脸的置信度；For the face image to be detected, the network layer of the predicted face frame bias in the trained face detection model outputs the bias information of the predicted face frame relative to the corresponding default face frame, and the trained The network layer predicting the confidence of the face frame in the face detection model outputs the confidence that each default face frame contains a face;

根据每个默认人脸框及预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框；Calculate the corresponding predicted face frame according to the offset information of each default face frame and the predicted face frame relative to each default face frame;

在所述预测人脸框中选取大于预设的置信度阈值的置信度所对应的预测人脸框作为最终的人脸检测结果，或者，在所述预测人脸框中选取最高置信度所对应的预测人脸框作为最终的人脸检测结果。In the predicted human face frame, select the predicted human face frame corresponding to the confidence greater than the preset confidence threshold as the final human face detection result, or select the corresponding predicted human face frame in the predicted human face frame The predicted face frame is used as the final face detection result.

优选地，所述根据每个默认人脸框及对应的预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框，包括：Preferably, the calculation of the corresponding predicted face frame according to each default face frame and the offset information of the corresponding predicted face frame relative to each default face frame includes:

根据每个默认人脸框及对应的预测人脸框相对于每个默认人脸框的偏置信息，按照以下公式计算对应的预测人脸框：According to the offset information of each default face frame and the corresponding predicted face frame relative to each default face frame, the corresponding predicted face frame is calculated according to the following formula:

x＝t_x*w_a+x_a,y＝t_y*h_a+y_a x＝t _x *w _a +x _a ,y＝t _y *h _a +y _a

其中，(x_a,y_a,w_a,h_a)为每个默认人脸框的中心点坐标、宽和高；(x,y,w,h)为每个默认人脸框对应的预测人脸框的中心点坐标、宽和高；(t_x,t_y,t_w,t_h)为对应的预测人脸框相对于每个默认人脸框的偏置信息。Among them, (x _a , y _a , w _a , h _a ) are the center point coordinates, width and height of each default face frame; (x, y, w, h) are the predictions corresponding to each default face frame The center point coordinates, width and height of the face frame; (t _x , _ty , t _w , t _h ) is the offset information of the corresponding predicted face frame relative to each default face frame.

优选地，所述针对待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息，并通过训练好的人脸检测模型中预测人脸框置信度的网络层输出每个默认人脸框包含人脸的置信度之后，所述方法还包括：Preferably, for the face image to be detected, the bias information of the predicted face frame relative to the corresponding default face frame is output through the network layer of the predicted face frame bias in the trained face detection model, and passed After the network layer of predicting the confidence of the face frame in the trained face detection model outputs the confidence that each default face frame contains the face, the method also includes:

过滤掉所述置信度小于或等于预设的置信度阈值的默认人脸框；Filtering out default face frames whose confidence is less than or equal to a preset confidence threshold;

根据其余的每个默认人脸框及对应的预测人脸框相对于其余的每个所述默认人脸框的偏置信息，计算对应的预测人脸框；Calculate the corresponding predicted face frame according to the bias information of each remaining default face frame and the corresponding predicted face frame relative to each of the remaining default face frames;

将其余的默认人脸框所对应的预测人脸框作为最终的人脸检测结果。The predicted face frames corresponding to the remaining default face frames are used as the final face detection result.

优选地，所述根据每个默认人脸框及对应的预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框之后，所述方法还包括：Preferably, after calculating the corresponding predicted face frame according to the offset information of each default face frame and the corresponding predicted face frame relative to each default face frame, the method further includes:

计算每两个预测人脸框的相对面积；Calculate the relative area of each two predicted face frames;

若每两个预测人脸框的相对面积大于预设的第二相对面积阈值，则将所述两个预测人脸框作为采样预测人脸框；其中，所述相对面积为两个预测人脸框的相交区域的面积除以两个预测人脸框的并集区域的面积；If the relative area of every two predicted human face frames is greater than the preset second relative area threshold, then the two predicted human face frames are used as sampling predicted human face frames; wherein, the relative areas are two predicted human faces The area of the intersection area of the frame is divided by the area of the union area of the two predicted face frames;

在所述采样预测人脸框中选取大于预设的置信度阈值的置信度所对应的采样预测人脸框作为最终的人脸检测结果，或者，在所述采样预测人脸框中选取最高置信度所对应的采样预测人脸框作为最终的人脸检测结果。In the sampled predicted face frame, select the sampled predicted face frame corresponding to the confidence greater than the preset confidence threshold as the final face detection result, or select the highest confidence in the sampled predicted face frame The sampling prediction face frame corresponding to degree is used as the final face detection result.

本发明的另一目的在于提供一种基于神经网络的人脸检测模型的训练系统，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层；所述系统包括：输入模块、第一计算模块、第二计算模块、第三计算模块、反馈及更新模块、迭代输出模块；其中，Another object of the present invention is to provide a training system for a face detection model based on a neural network. degree of network layer; wherein, the network layer of the human face detection is selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, and each of the network layers of the human face detection The cell is bound to six default face frames, which are set according to the network layer scale of the corresponding face detection; each network layer of face detection is connected to a network that predicts the bias of the face frame Layer and a network layer for predicting the confidence of the face frame; the system includes: an input module, a first calculation module, a second calculation module, a third calculation module, a feedback and update module, and an iterative output module; wherein,

所述输入模块，用于接收到模型训练指令时，将所述训练集中的人脸图像输入到所述神经网络中进行训练；The input module is configured to input the face images in the training set into the neural network for training when receiving a model training instruction;

所述第一计算模块，用于通过预测人脸框偏置的网络层计算出预测人脸框相对于对应的默认人脸框的偏置信息，以及计算出真实人脸框相对于对应的默认人脸框的偏置信息；并通过预测人脸框置信度的网络层计算出每个默认人脸框包含人脸的置信度；The first calculation module is used to calculate the offset information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame offset, and calculate the offset information of the real face frame relative to the corresponding default The bias information of the face frame; and calculate the confidence that each default face frame contains a face through the network layer that predicts the confidence of the face frame;

所述第二计算模块，用于根据所述预测人脸框相对于对应的默认人脸框的偏置信息，以及所述真实人脸框相对于对应的默认人脸框的偏置信息，计算预测人脸框偏置的网络层的损失函数；并根据所述默认人脸框包含人脸的置信度，计算预测人脸框置信度的网络层的损失函数；The second computing module is configured to calculate according to the bias information of the predicted face frame relative to the corresponding default face frame, and the bias information of the real face frame relative to the corresponding default face frame Predict the loss function of the network layer of human face frame offset; And according to the confidence degree that described default human face frame comprises people's face, calculate the loss function of the network layer of predicting human face frame confidence degree;

所述第三计算模块，用于计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差；The third calculation module is used to calculate the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence;

所述反馈及更新模块，用于将所述误差通过反向传播反馈到所述神经网络中，根据所述误差更新所述神经网络的网络权重参数以及根据更新后的网络权重参数调整所述预测人脸框；The feedback and update module is configured to feed back the error into the neural network through backpropagation, update the network weight parameters of the neural network according to the error, and adjust the prediction according to the updated network weight parameters face frame;

所述迭代输出模块，用于重复迭代训练直至调整后的预测人脸框与真实人脸框的误差在预设的误差范围之内，输出人脸检测模型。The iterative output module is used to repeat iterative training until the error between the adjusted predicted face frame and the real face frame is within a preset error range, and output the face detection model.

本发明的另一目的在于提供一种基于神经网络的人脸检测系统，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层；所述系统包括：输入模块、输出模块、计算模块、选取模块；其中，Another object of the present invention is to provide a human face detection system based on a neural network, the neural network comprising: a network layer for human face detection, a network layer for predicting the bias of the human face frame, and a network for predicting the confidence of the human face frame layer; wherein, the network layer of the human face detection is selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, and each cell in the network layer of the human face detection is bound Determine six default face frames, the default face frame is set according to the network layer scale of the corresponding face detection; the network layer of each layer of face detection is connected to a network layer that predicts the bias of the face frame and a network layer The network layer of layer prediction face frame confidence degree; Described system comprises: input module, output module, calculation module, selection module; Wherein,

所述输入模块，用于接收到人脸检测指令时，将待检测的人脸图像输入到训练好的人脸检测模型中进行人脸检测；The input module is configured to input the face image to be detected into the trained face detection model for face detection when receiving the face detection instruction;

所述输出模块，用于针对待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息，并通过训练好的人脸检测模型中预测人脸框置信度的网络层输出每个默认人脸框包含人脸的置信度；The output module is used to output the bias information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame offset in the trained face detection model for the face image to be detected, And output the confidence that each default face frame contains a face through the network layer that predicts the confidence of the face frame in the trained face detection model;

所述计算模块，用于根据每个默认人脸框及预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框；The calculation module is used to calculate the corresponding predicted face frame according to the bias information of each default face frame and predicted face frame relative to each default face frame;

所述选取模块，用于在所述预测人脸框中选取大于预设的置信度阈值的置信度所对应的预测人脸框作为最终的人脸检测结果，或者，在所述预测人脸框中选取最高置信度所对应的预测人脸框作为最终的人脸检测结果。The selection module is configured to select a predicted face frame corresponding to a confidence greater than a preset confidence threshold in the predicted face frame as the final face detection result, or, in the predicted face frame The predicted face frame corresponding to the highest confidence level is selected as the final face detection result.

如上所述，本发明的基于神经网络的人脸检测模型的训练、人脸检测方法及系统，相对于现有技术具有以下有益效果：As mentioned above, the training of the face detection model based on the neural network, the face detection method and system of the present invention have the following beneficial effects compared with the prior art:

(1)本发明实施例中，不需要采用滑动窗口方式进行选择观察窗口，也不需要构建图像构建金字塔或者使用多尺度的观察窗口以及对大量的观察窗口进行计算判别，而是根据神经网络中不同网络层对应原始人脸图像中不同大小的感受野选取用于人脸检测的网络层，其中，网络层数越高对应于原始人脸图像的感受野越大，网络层数越低对应于原始人脸图像的感受野越小，直接通过人脸检测的网络层进行人脸检测，相对于现有技术计算量较小，且可以选取低层网络层用于检测小尺寸的人脸，高层网络层用于检测大尺寸的人脸，实现人脸的多尺度检测及偏置回归，使预测人脸框能更好地包含人脸，相对于现有技术本发明实施例人脸检测更准确且人脸检测速度更快。(1) In the embodiment of the present invention, it is not necessary to select the observation window in a sliding window manner, nor to construct an image pyramid or use a multi-scale observation window and to calculate and discriminate a large number of observation windows, but according to the neural network Different network layers correspond to different sizes of receptive fields in the original face image. The network layer used for face detection is selected. The higher the number of network layers corresponds to the larger the receptive field of the original face image, the lower the number of network layers corresponds to The smaller the receptive field of the original face image, the face detection is performed directly through the face detection network layer. Compared with the existing technology, the calculation amount is small, and the low-level network layer can be selected to detect small-sized faces. The layer is used to detect large-sized human faces, realize multi-scale detection and bias regression of human faces, and make the predicted human face frame better contain human faces. Compared with the prior art, the human face detection of the embodiment of the present invention is more accurate and Face detection is faster.

(2)本发明实施例中，模型训练直接采用端到端方式，输入训练集中的人脸图像直接输出对应的预测人脸框的位置和大小，相对现有多步骤方法更加简便快捷；并且，由于采用端到端的训练模式，根据所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差直接对神经网络的网络权重参数进行反馈调节，以使预测人脸框和真实人脸框更加接近，从而使得预测人脸框更加精确的包含人脸；因此，相对于现有技术中基于多个独立子步骤方法具有更高的检测率。(2) In the embodiment of the present invention, the model training directly adopts the end-to-end mode, and the face image in the input training set directly outputs the position and size of the corresponding predicted face frame, which is simpler and faster than the existing multi-step method; and, Due to the end-to-end training mode, the network weight parameters of the neural network are directly fed back according to the error between the loss function of the network layer for predicting the bias of the face frame and the loss function of the network layer for the confidence of the predicted face frame Adjust to make the predicted face frame and the real face frame closer, so that the predicted face frame contains the face more accurately; therefore, it has a higher detection rate than the method based on multiple independent sub-steps in the prior art .

(3)本发明实施例中，不需要对输入人脸图像进行缩放，而是直接将训练集中的人脸图像输入神经网络中训练人脸检测模型，以及直接将待检测的人脸图像输入到训练好的人脸检测模型中进行人脸检测，这样能够避免人脸图像拉伸、扭曲、变形等因素对人脸检测结果的影响。(3) In the embodiment of the present invention, it is not necessary to zoom the input face image, but directly input the face image in the training set into the neural network to train the face detection model, and directly input the face image to be detected into Face detection is performed in the trained face detection model, which can avoid the influence of face image stretching, distortion, deformation and other factors on the face detection results.

附图说明Description of drawings

图1显示为本发明实施例中的基于神经网络的人脸检测模型的训练方法的实现流程图；Fig. 1 is shown as the implementation flowchart of the training method of the face detection model based on neural network in the embodiment of the present invention;

图2显示为本发明实施例中的基于神经网络的人脸检测方法的实现流程图；Fig. 2 is shown as the implementation flowchart of the face detection method based on neural network in the embodiment of the present invention;

图3显示为本发明实施例中的基于神经网络的人脸检测模型的训练系统的组成结构示意图；Fig. 3 is shown as the composition structure diagram of the training system of the face detection model based on neural network in the embodiment of the present invention;

图4显示为本发明实施例中的基于神经网络的人脸检测系统的组成结构示意图。FIG. 4 is a schematic diagram showing the composition and structure of the neural network-based face detection system in the embodiment of the present invention.

具体实施方式Detailed ways

以下通过特定的具体实例说明本发明的实施方式，本领域技术人员可由本说明书所揭露的内容轻易地了解本发明的其他优点与功效。本发明还可以通过另外不同的具体实施方式加以实施或应用，本说明书中的各项细节也可以基于不同观点与应用，在没有背离本发明的精神下进行各种修饰或改变。Embodiments of the present invention are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific implementation modes, and various modifications or changes can be made to the details in this specification based on different viewpoints and applications without departing from the spirit of the present invention.

请参阅附图。需要说明的是，本发明实施例中所提供的图示仅以示意方式说明本发明的基本构想，遂图式中仅显示与本发明中有关的组件而非按照实际实施时的组件数目、形状及尺寸绘制，其实际实施时各组件的型态、数量及比例可为一种随意的改变，且其组件布局型态也可能更为复杂。Please refer to attached picture. It should be noted that the diagrams provided in the embodiments of the present invention are only schematically illustrating the basic idea of the present invention, and only the components related to the present invention are shown in the diagrams rather than the number and shape of components in actual implementation. and size drawing, the type, quantity and proportion of each component can be changed arbitrarily during actual implementation, and the component layout type may also be more complex.

本发明实施例中，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层。In the embodiment of the present invention, the neural network includes: a network layer for face detection, a network layer for predicting the bias of the face frame, and a network layer for predicting the confidence of the face frame; wherein, the network layer for face detection is Selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, each cell in the network layer of the face detection is bound to six default face frames, and the default face The frame is set according to the network layer scale of the corresponding face detection; each network layer of face detection is connected with a network layer for predicting the bias of the face frame and a network layer for predicting the confidence of the face frame.

其中，由于VGG-16是目前比较流行的全卷积神经网络结构，具有良好的抽象特征的表达能力，在物体分类和目标识别中，例如，ImageNet竞赛，人脸识别等领域，取得良好的效果，因此选取VGG-16作为全卷积神经网络的基本网络结构。VGG-16全卷积神经网络结构的网络参数配置信息如表1所示：Among them, since VGG-16 is currently a popular fully convolutional neural network structure, it has good expression ability of abstract features, and has achieved good results in object classification and target recognition, such as ImageNet competition, face recognition and other fields. , so VGG-16 is selected as the basic network structure of the fully convolutional neural network. The network parameter configuration information of the VGG-16 fully convolutional neural network structure is shown in Table 1:

表1Table 1

其中，由于不同大小的感受野有利于检测不同尺度的人脸图像，因此可以根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取网络层作为人脸检测的网络层；这里，可以根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野，选取所述神经网络中的conv3_3，conv4_3，conv5_3这三个网络层用于多尺度人脸检测，conv3_3， conv4_3，conv5_3这三个网络层对应训练集中的人脸图像的感受野如表2所示：Wherein, since the receptive fields of different sizes are conducive to detecting face images of different scales, the network layer can be selected as the network layer of face detection according to the receptive fields of different network layers in the neural network corresponding to the face images in the training set; Here, the three network layers conv3_3, conv4_3, and conv5_3 in the neural network can be selected for multi-scale face detection according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, conv3_3, The three network layers of conv4_3 and conv5_3 correspond to the receptive fields of the face images in the training set as shown in Table 2:

表2Table 2

其中，可以在每一层人脸检测的网络层的后面额外添加一层用于预测人脸框偏置的网络层及一层用于预测人脸框置信度的网络层；即：每一层人脸检测的网络层同时连接一层用于预测人脸框偏置的网络层及一层用于预测人脸框置信度的网络层，用于预测人脸框偏置的网络层及用于预测人脸框置信度的网络层是并行的。Among them, an additional network layer for predicting the bias of the face frame and a network layer for predicting the confidence of the face frame can be added behind the network layer of each layer of face detection; that is: each layer The network layer of face detection is connected with a network layer for predicting the bias of the face frame and a network layer for predicting the confidence of the face frame, and a network layer for predicting the bias of the face frame and a network layer for The layers of the network that predict the confidence of the face box are parallelized.

其中，针对于conv3_3，conv4_3，conv5_3这三个网络层的每个胞元(feature mapcell) 分别绑定六个默认人脸框，可以根据conv3_3，conv4_3，conv5_3这三个网络层每层的规模设置每层中的每个胞元绑定的六个默认人脸框的尺寸，每层的规模及默认人脸框的建议尺寸如表3所示：Among them, six default face frames are bound to each cell (feature mapcell) of the three network layers conv3_3, conv4_3, and conv5_3, which can be set according to the scale of each layer of the three network layers conv3_3, conv4_3, and conv5_3 The size of the six default face frames bound to each cell in each layer, the scale of each layer and the recommended size of the default face frame are shown in Table 3:

表3table 3

下面结合附图及具体实施例对本发明做进一步详细的说明。The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

本发明实施例提出了一种基于神经网络的人脸检测模型的训练方法，如图1所示，该方法包括：The embodiment of the present invention proposes a training method of a neural network-based face detection model, as shown in Figure 1, the method includes:

步骤S100：接收到模型训练指令时，将所述训练集中的人脸图像输入到所述神经网络中进行训练。Step S100: When receiving a model training instruction, input the face images in the training set into the neural network for training.

本步骤中，可以使用所述神经网络VGG-16在ImageNet图像库(1000个类别，共120万张训练图像)上训练的人脸检测模型，作为初始化人脸检测模型；然后使用CelebFaces图像库(202599张图像，每张图像包含一张人脸)和AFLW图像库(21080张图片，共包含24,386张人脸)这两个图像库库进行人脸检测模型的训练，也可以对训练图像数据进行扩充，例如，缩放，加模糊噪声，对比度变化等，以丰富训练样本。In this step, the face detection model trained on the ImageNet image library (1000 categories, 1.2 million training images) can be used as the initialization face detection model; then use the CelebFaces image library ( 202,599 images, each containing a face) and the AFLW image library (21,080 images, containing a total of 24,386 faces). Augmentation, e.g., scaling, adding blurring noise, contrast changes, etc., to enrich training samples.

步骤S101：通过预测人脸框偏置的网络层计算出预测人脸框相对于对应的默认人脸框的偏置信息，以及计算出真实人脸框相对于对应的默认人脸框的偏置信息；并通过预测人脸框置信度的网络层计算出每个默认人脸框包含人脸的置信度。Step S101: Calculate the offset information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame bias, and calculate the offset of the real face frame relative to the corresponding default face frame information; and calculate the confidence that each default face frame contains a face through the network layer that predicts the confidence of the face frame.

本步骤中，每个默认人脸框对应一个预测人脸框及一个真实人脸框，即：默认人脸框、预测人脸框及真实人脸框是一一对应关系。In this step, each default face frame corresponds to a predicted face frame and a real face frame, that is, there is a one-to-one correspondence between the default face frame, the predicted face frame, and the real face frame.

本步骤中，按照如下公式计算输出预测人脸框相对于对应的默认人脸框的偏置信息 (t_x,t_y,t_w,t_h)：In this step, the bias information (t _x , t _y , t _w , t _h ) of the output predicted face frame relative to the corresponding default face frame is calculated according to the following formula:

t_x＝(xx_a)/w_a,t_y＝(yy_a)/h_a t _x ＝(xx _a )/w _a , t _y ＝(yy _a )/h _a

t_w＝log(w/w_a),t_h＝log(h/h_a)t _w =log(w/w _a ),t _h =log(h/h _a )

按照以下公式计算所述真实人脸框相对于对应的默认人脸框的偏置信息：Calculate the offset information of the real face frame relative to the corresponding default face frame according to the following formula:

步骤S102：根据所述预测人脸框相对于对应的默认人脸框的偏置信息，以及所述真实人脸框相对于对应的默认人脸框的偏置信息，计算预测人脸框偏置的网络层的损失函数；并根据所述默认人脸框包含人脸的置信度，计算预测人脸框置信度的网络层的损失函数。Step S102: According to the offset information of the predicted face frame relative to the corresponding default face frame, and the offset information of the real face frame relative to the corresponding default face frame, calculate the bias of the predicted face frame The loss function of the network layer; and according to the confidence that the default face frame contains the face, calculate the loss function of the network layer that predicts the confidence of the face frame.

本步骤中，可以仅选取默认人脸框相对真实人脸框的相对面积IOU大于预设的第一相对面积阈值的默认人脸框所对应的偏置信息用于回归计算，预测人脸框回归的损失函数计算如下：In this step, only the bias information corresponding to the default face frame whose relative area IOU of the default face frame relative to the real face frame is greater than the preset first relative area threshold can be selected for regression calculation, and the face frame regression is predicted The loss function of is calculated as follows:

首先选取所述默认人脸框相对于对应的真实人脸框的相对面积IOU大于预设的第一相对面积阈值时所对应的默认人脸框作为采样默认人脸框；其中，所述相对面积为默认人脸框与真实人脸框的相交区域的面积除以默认人脸框与真实人脸框的并集区域的面积；First select the default face frame corresponding to the default face frame when the relative area IOU of the corresponding real face frame is greater than the preset first relative area threshold as the default face frame for sampling; wherein the relative area is the area of the intersection area of the default face frame and the real face frame divided by the area of the union area of the default face frame and the real face frame;

根据预测人脸框相对于对应的采样默认人脸框的偏置信息，以及所述真实人脸框相对于对应的采样默认人脸框的偏置信息，按照以下公式计算预测人脸框偏置的网络层的损失函数：According to the bias information of the predicted face frame relative to the corresponding sampled default face frame, and the bias information of the real face frame relative to the corresponding sampled default face frame, the predicted face frame offset is calculated according to the following formula The loss function of the network layer:

其中，N_reg为采样默认人脸框的数量，L_reg对应第k(k为1和N_reg之间的正整数)个默认人脸框的空间偏置的回归损失函数，T＝t_x,t_y,t_w,t_h为预测人脸框相对于对应的采样默认人脸框的偏置信息，z是属于{x,y,w,h}这一集合的元素，为真实人脸框相对于对应的采样默认人脸框的偏置信息，smooth_L1表示平滑L1损失函数，它是L1范数损失函数的变种，l表示smooth_L1函数的输入变量；Among them, N _reg is the number of sampled default face frames, L _reg corresponds to the regression loss function of the spatial bias of the kth (k is a positive integer between 1 and N _reg ) default face frames, T=t _x , t _y , t _w , t _h are the bias information of the predicted face frame relative to the corresponding sampled default face frame, z is an element belonging to the set {x, y, w, h}, is the offset information of the real face frame relative to the corresponding sampled default face frame, smooth _L1 represents the smooth L1 loss function, which is a variant of the L1 norm loss function, and l represents the input variable of the smooth _L1 function;

然后，将所述默认人脸框相对于对应的真实人脸框的相对面积大于预设的第一相对面积阈值时所对应的默认人脸框作为正样本，将所述默认人脸框相对于对应的真实人脸框的相对面积小于或等于预设的第一相对面积阈值时所对应的默认人脸框作为负样本；Then, use the default face frame corresponding to the default face frame when the relative area of the corresponding real face frame is greater than the preset first relative area threshold as a positive sample, and compare the default face frame with respect to The corresponding default face frame when the relative area of the corresponding real face frame is less than or equal to the preset first relative area threshold is used as a negative sample;

按照以下公式计算预测人脸框置信度的网络层的损失函数：Calculate the loss function of the network layer that predicts the confidence of the face frame according to the following formula:

其中，N_cls为选取的正负样本的总数量，对应第i(i为1和N_cls之间的正整数) 个类别的分类损失函数，p为选取的正样本或负样本包含人脸的置信度，p^*为选取的正样本或负样本包含人脸的真实概率，正样本的p^*为1，负样本的p^*为0。Among them, N _cls is the total number of positive and negative samples selected, Corresponding to the classification loss function of the i-th category (i is a positive integer between 1 and N _cls ), p is the confidence that the selected positive or negative sample contains the face, and p ^* is the selected positive or negative sample contains The true probability of the face, p ^* is 1 for positive samples, and p ^* is 0 for negative samples.

本发明实施例中，预设的第一相对面积阈值可以根据实际要求进行设置，这里对预设的第一相对面积阈值不作具体限定，优选地，预设的第一相对面积阈值为0.5。对于人脸分类置信度的计算，由于人脸检测中大部分默认人脸框与对应的真实人脸框的相对面积IOU小于 0.5，因此存在负样本的数量会远远高于正样本的现象，因此，为了均衡正负样本，以避免正负样本不均衡造成人脸模型误解或者漏检问题，可以使用正负样本比例为1：3进行人脸分类置信度计算，其中，负样本选取置信度较高的部分用于人脸检测模型的训练。In the embodiment of the present invention, the preset first relative area threshold can be set according to actual requirements, and the preset first relative area threshold is not specifically limited here. Preferably, the preset first relative area threshold is 0.5. For the calculation of face classification confidence, since the relative area IOU between most of the default face frames and the corresponding real face frames in face detection is less than 0.5, the number of negative samples will be much higher than that of positive samples. Therefore, in order to balance the positive and negative samples to avoid the misinterpretation or missed detection of the face model caused by the imbalance of positive and negative samples, a positive and negative sample ratio of 1:3 can be used to calculate the confidence of face classification, in which the confidence of negative samples is selected The higher part is used for the training of the face detection model.

最终，可以将上述两个损失函数进行融合得到如下总的损失函数：Finally, the above two loss functions can be fused to obtain the following total loss function:

Loss＝Loss₁+λLoss₂ Loss＝Loss ₁ +λLoss ₂

其中，λ用于均衡两个损失函数，默认设为1。Among them, λ is used to balance the two loss functions and is set to 1 by default.

需要说明的是：λ可以根据实际需求进行设置。It should be noted that: λ can be set according to actual needs.

步骤S103：计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差，并将所述误差通过反向传播反馈到所述神经网络中，根据所述误差更新所述神经网络的网络权重参数以及根据更新后的网络权重参数调整所述预测人脸框。Step S103: Calculate the error between the loss function of the network layer that predicts the bias of the face frame and the loss function of the network layer that predicts the confidence of the face frame, and feed back the error to the neural network through backpropagation In the network, the network weight parameters of the neural network are updated according to the error and the predicted face frame is adjusted according to the updated network weight parameters.

本步骤中，可以采用随机梯度下降法计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的梯度；将得到的梯度值作为所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差。In this step, the stochastic gradient descent method can be used to calculate the gradient of the loss function of the network layer that predicts the bias of the face frame and the loss function of the network layer that predicts the confidence of the face frame; the obtained gradient value is used as the The error between the loss function of the network layer that predicts the bias of the face frame and the loss function of the network layer that predicts the confidence of the face frame.

本步骤中，网络权重参数更新的公式如下：In this step, the formula for updating the network weight parameters is as follows:

其中，W_i ^l为第l层的参数经过i次迭代后的权值，为第i+1次迭代更新后的权值，使用均值为0方差为1的高斯分布初始化，α为学习率，m为动量，λ为权重衰减系数。Among them, W _i ^l is the weight of the parameters of the l layer after i iterations, is the updated weight of the i+1 iteration, Initialize with a Gaussian distribution with a mean of 0 and a variance of 1, α is the learning rate, m is the momentum, and λ is the weight decay coefficient.

本步骤中，采用随机梯度下降法进行梯度反馈更新，需要预先设置学习网络权重参数时使用的参数，使用的参数包括：初始化学习率、动量、权重衰减；其中，初始化学习率设为 0.001，初始化动量设为0.9，初始化权重衰减设置为0.0005。In this step, the stochastic gradient descent method is used for gradient feedback update, and the parameters used for learning network weight parameters need to be set in advance. The parameters used include: initial learning rate, momentum, and weight decay; where the initial learning rate is set to 0.001, and The momentum is set to 0.9 and the initial weight decay is set to 0.0005.

步骤S104：判断调整后的预测人脸框与真实人脸框的误差是否在预设的误差范围之内；若调整后的预测人脸框与真实人脸框的误差在预设的误差范围之内，则输出人脸检测模型，结束本次处理；若调整后的预测人脸框与真实人脸框的误差未在预设的误差范围之内，则转入步骤S101根据调整后的预测人脸框继续执行。Step S104: Determine whether the error between the adjusted predicted face frame and the real face frame is within the preset error range; if the adjusted error between the predicted face frame and the real face frame is within the preset error range If the error between the adjusted predicted human face frame and the real human face frame is not within the preset error range, then go to step S101 according to the adjusted predicted human face frame. The face frame continues to execute.

本步骤中，预设的误差范围可以根据实际需求进行设置，以调整后的预测人脸框迭代收敛于真实人脸框为原则，即：更新所述神经网络的权重参数，以使预测人脸框收敛于真实人脸框，通常经过360000次迭代后即可收敛。In this step, the preset error range can be set according to actual needs, based on the principle that the adjusted predicted face frame iteratively converges to the real face frame, that is, the weight parameters of the neural network are updated so that the predicted face frame The frame converges to the real face frame, usually after 360000 iterations.

本发明实施例提出了一种基于神经网络的人脸检测方法，如图2所示，该方法包括：The embodiment of the present invention proposes a face detection method based on a neural network, as shown in Figure 2, the method includes:

步骤S200：接收到人脸检测指令时，将待检测的人脸图像输入到训练好的人脸检测模型中进行人脸检测。Step S200: When a face detection instruction is received, input the face image to be detected into the trained face detection model for face detection.

本步骤中，训练好的人脸检测模型是采用步骤S100～S104重复迭代训练直至调整后的预测人脸框与真实人脸框的误差在预设的误差范围之内时获得的人脸检测模型。In this step, the trained face detection model is the face detection model obtained when the error between the adjusted predicted face frame and the real face frame is within the preset error range by repeated iterative training in steps S100-S104 .

步骤S201：针对待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息，并通过训练好的人脸检测模型中预测人脸框置信度的网络层输出每个默认人脸框包含人脸的置信度。Step S201: Output the bias information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame bias in the trained face detection model for the face image to be detected, and pass the training The network layer that predicts the confidence of the face frame in the face detection model outputs the confidence that each default face frame contains a face.

本步骤中，默认人脸框包含人脸的置信度，即：默认人脸框包含的区域属于背景或人脸的概率。In this step, the confidence of the default face frame containing the face is the probability that the area contained in the default face frame belongs to the background or the face.

步骤S202：根据每个默认人脸框及预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框。Step S202: According to each default face frame and the offset information of the predicted face frame relative to each default face frame, calculate the corresponding predicted face frame.

具体地，根据每个默认人脸框及对应的预测人脸框相对于每个默认人脸框的偏置信息，按照以下公式计算对应的预测人脸框：Specifically, according to the offset information of each default face frame and the corresponding predicted face frame relative to each default face frame, the corresponding predicted face frame is calculated according to the following formula:

x＝t_x*w_a+x_a,y＝t_y*h_a+y_a x＝t _x *w _a +x _a ,y＝t _y *h _a +y _a

步骤S203：在所述预测人脸框中选取大于预设的置信度阈值的置信度所对应的预测人脸框作为最终的人脸检测结果，或者，选取最高置信度对应的预测人脸框作为最终的人脸检测结果。Step S203: Select the predicted face frame corresponding to the confidence greater than the preset confidence threshold in the predicted face frame as the final face detection result, or select the predicted face frame corresponding to the highest confidence as the The final face detection result.

本步骤中，由于默认人脸框与预测人脸框是一一对应关系，因此，大于预设的置信度阈值的置信度或最高置信度所对应的预测人脸框，即为置信度大于预设的置信度阈值的默认人脸框或置信度最高的默认人脸框所对应的预测人脸框。In this step, since there is a one-to-one correspondence between the default face frame and the predicted face frame, the predicted face frame corresponding to a confidence degree greater than the preset confidence threshold or the highest confidence degree means that the confidence degree is greater than the predetermined The default face frame of the set confidence threshold or the predicted face frame corresponding to the default face frame with the highest confidence.

本步骤中，所述置信度阈值可以根据实际需求预设，这里对置信度阈值不作具体限定。In this step, the confidence threshold may be preset according to actual needs, and the confidence threshold is not specifically limited here.

在本发明的一种优选实施方式中，所述针对待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息，并通过训练好的人脸检测模型中预测人脸框置信度的网络层输出每个默认人脸框包含人脸的置信度之后，为减少计算预测人脸框的计算量，还包括：In a preferred embodiment of the present invention, for the face image to be detected, the network layer of the predicted face frame offset in the trained face detection model outputs the predicted face frame with respect to the corresponding default face The bias information of the frame, and output the confidence of each default face frame containing the face through the network layer of the predicted face frame confidence in the trained face detection model, in order to reduce the calculation amount of the predicted face frame ,Also includes:

首先过滤掉所述置信度小于或等于预设的置信度阈值的默认人脸框；First filter out the default face frames whose confidence is less than or equal to a preset confidence threshold;

然后根据其余的每个默认人脸框及对应的预测人脸框相对于其余的每个所述默认人脸框的偏置信息，计算对应的预测人脸框；Then according to the bias information of each remaining default human face frame and corresponding predicted human face frame with respect to each remaining said default human face frame, calculate the corresponding predicted human face frame;

最终将其余的默认人脸框所对应的预测人脸框作为最终的人脸检测结果。Finally, the predicted face frames corresponding to the remaining default face frames are used as the final face detection result.

在本发明的一种优选实施方式中，在计算对应的预测人脸框之前，先过滤掉所述置信度小于或等于预设的置信度阈值的默认人脸框，然后根据其余的每个默认人脸框及对应的预测人脸框相对于其余的每个所述默认人脸框的偏置信息，计算对应的预测人脸框，这样能够减少计算预测人脸框的计算量，从而提高计算的速度，缩减计算时间。In a preferred embodiment of the present invention, before calculating the corresponding predicted face frame, first filter out the default face frame whose confidence is less than or equal to the preset confidence threshold, and then according to each of the remaining default The face frame and the corresponding predicted face frame are calculated with respect to the offset information of each of the remaining default face frames, and the corresponding predicted face frame can be calculated, which can reduce the calculation amount of the predicted human face frame, thereby improving the calculation speed and reduce computation time.

在本发明的另一种优选实施方式中，由于一个人脸可能对应多个重叠的预测人脸框，存在大量冗余，可以使用非最大值抑制方法去除冗余重复的人脸框，具体地，去除每两个预测人脸框的相对面积IOU大于预设的第二相对面积阈值的预测人脸框，仅保留每两个预测人脸框的相对面积IOU较高的预测人脸框，因此，根据每个默认人脸框及对应的预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框之后，还包括：In another preferred embodiment of the present invention, since a face may correspond to multiple overlapping predicted face frames, there is a large amount of redundancy, and the non-maximum suppression method can be used to remove redundant and repeated face frames, specifically , remove the predicted face frame whose relative area IOU of every two predicted face frames is greater than the preset second relative area threshold, and only keep the predicted face frame whose relative area IOU of every two predicted face frames is higher, so , after calculating the corresponding predicted face frame according to the offset information of each default face frame and the corresponding predicted face frame relative to each default face frame, it also includes:

在所述采样预测人脸框中选取大于预设的置信度阈值的置信度所对应的采样预测人脸框作为最终的人脸检测结果，或者，在所述采样预测人脸框中选取最高置信度对应的采样预测人脸框作为最终的人脸检测结果。In the sampled predicted face frame, select the sampled predicted face frame corresponding to the confidence greater than the preset confidence threshold as the final face detection result, or select the highest confidence in the sampled predicted face frame The corresponding sampling prediction face frame is used as the final face detection result.

在本发明的另一种优选实施方式中，针对预测人脸框计算每两个预测人脸框彼此的相对面积，如果彼此相对面积大于预设的第二相对面积阈值，则认为这两个预测人脸框包含同一张人脸，因此，若每两个预测人脸框的相对面积大于预设的第二相对面积阈值，则将所述两个预测人脸框作为采样预测人脸框，在所述采样预测人脸框中，选取大于预设的置信度阈值的置信度所对应的采样预测人脸框作为最终的人脸检测结果，或者，选取最高置信度对应的预测人脸框作为最终的人脸检测结果，以使最终的人脸检测结果更好地包含人脸。In another preferred embodiment of the present invention, the relative areas of every two predicted face frames are calculated for the predicted face frames, and if the relative areas of each other are larger than the preset second relative area threshold, the two predicted face frames are considered The face frame contains the same face, therefore, if the relative area of each two predicted face frames is greater than the preset second relative area threshold, the two predicted face frames are used as the sample predicted face frame, and the In the sampled predicted face frame, the sampled predicted face frame corresponding to the confidence greater than the preset confidence threshold is selected as the final face detection result, or the predicted face frame corresponding to the highest confidence is selected as the final face detection result. face detection results, so that the final face detection results better contain faces.

其中，第二相对面积阈值可以根据实际需求进行预设，这里对第二相对面积阈值不作具体限定，优选地，所述第二相对面积阈值预设为0.3。Wherein, the second relative area threshold can be preset according to actual needs, and the second relative area threshold is not specifically limited here, preferably, the second relative area threshold is preset as 0.3.

为实现上述方法，本发明实施例提供了一种基于神经网络的人脸检测模型的训练系统，由于该系统解决问题的原理与方法相似，因此，系统的实施过程及实施原理均可以参见前述方法的实施过程及实施原理描述，重复之处不再赘述。In order to realize the above-mentioned method, the embodiment of the present invention provides a training system of a face detection model based on a neural network. Since the principle and method of solving the problem of this system are similar, therefore, the implementation process and implementation principle of the system can refer to the above-mentioned method The implementation process and implementation principles are described, and the repetitions will not be repeated.

本发明实施例提出了一种基于神经网络的人脸检测模型的训练系统，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层；如图3所示，该系统包括：输入模块300、第一计算模块301、第二计算模块302、第三计算模块303、反馈及更新模块304、迭代输出模块305；其中，The embodiment of the present invention proposes a training system for a face detection model based on a neural network. The neural network includes: a network layer for face detection, a network layer for predicting the bias of the face frame, and a network layer for predicting the confidence of the face frame. Network layer; wherein, the network layer of the human face detection is selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, and each cell in the network layer of the human face detection Binding six default face frames, the default face frame is set according to the network layer scale of the corresponding face detection; the network layer of each layer of face detection is connected to a network layer that predicts the bias of the face frame and One layer predicts the network layer of face frame confidence; As shown in Figure 3, this system comprises: input module 300, the first calculation module 301, the second calculation module 302, the third calculation module 303, feedback and update module 304, iterative output module 305; wherein,

所述输入模块300，用于接收到模型训练指令时，将所述训练集中的人脸图像输入到所述神经网络中进行训练；The input module 300 is configured to input the face images in the training set into the neural network for training when receiving a model training instruction;

所述第一计算模块301，用于通过预测人脸框偏置的网络层计算出预测人脸框相对于对应的默认人脸框的偏置信息，以及计算出真实人脸框相对于对应的默认人脸框的偏置信息；并通过预测人脸框置信度的网络层计算出每个默认人脸框包含人脸的置信度；The first calculation module 301 is used to calculate the offset information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame offset, and calculate the offset information of the real face frame relative to the corresponding The bias information of the default face frame; and calculate the confidence that each default face frame contains a face through the network layer that predicts the confidence of the face frame;

所述第二计算模块302，用于根据所述预测人脸框相对于对应的默认人脸框的偏置信息，以及所述真实人脸框相对于对应的默认人脸框的偏置信息，计算预测人脸框偏置的网络层的损失函数；并根据所述默认人脸框包含人脸的置信度，计算预测人脸框置信度的网络层的损失函数；The second calculation module 302 is configured to, according to the offset information of the predicted face frame relative to the corresponding default face frame, and the offset information of the real face frame relative to the corresponding default face frame, Computing the loss function of the network layer for predicting the offset of the face frame; and calculating the loss function of the network layer for predicting the confidence of the face frame according to the confidence of the default face frame;

所述第三计算模块303，用于计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差；The third calculation module 303 is used to calculate the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence;

所述反馈及更新模块304，用于将所述误差通过反向传播反馈到所述神经网络中，根据所述误差更新所述神经网络的网络权重参数以及根据更新后的网络权重参数调整所述预测人脸框；The feedback and update module 304 is configured to feed back the error into the neural network through backpropagation, update the network weight parameters of the neural network according to the error, and adjust the network weight parameters according to the updated network weight parameters. Predict the face frame;

所述迭代输出模块305，用于重复迭代训练直至调整后的预测人脸框与真实人脸框的误差在预设的误差范围之内，输出人脸检测模型。The iterative output module 305 is configured to repeat iterative training until the error between the adjusted predicted face frame and the real face frame is within a preset error range, and output the face detection model.

具体实施中，所述第一计算模块301具体用于：In specific implementation, the first calculation module 301 is specifically used for:

t_x＝(xx_a)/w_a,t_y＝(yy_a)/h_a t _x ＝(xx _a )/w _a , t _y ＝(yy _a )/h _a

t_w＝log(w/w_a),t_h＝log(h/h_a)t _w =log(w/w _a ),t _h =log(h/h _a )

按照以下公式计算出所述真实人脸框相对于对应的默认人脸框的偏置信息：The offset information of the real face frame relative to the corresponding default face frame is calculated according to the following formula:

具体实施中，所述第二计算模块302具体用于：In a specific implementation, the second calculation module 302 is specifically used for:

具体实施中，所述第三计算模块303具体用于：In specific implementation, the third calculation module 303 is specifically used for:

以上功能模块的划分方式仅为本发明实施例给出的一种优选实现方式，功能模块的划分方式不构成对本发明的限制。为了描述的方便，以上所述系统的各部分以功能分为各种模块或单元分别描述。该系统可以是分布式系统或集中式系统，若为分布式系统，则上述功能模块可分别由硬件设备实现，各硬件设备之间通过通信网络交互；若是集中式系统，则上述各功能模块可集成在一个硬件设备中。The above division manner of the functional modules is only a preferred implementation manner given by the embodiment of the present invention, and the division manner of the functional modules does not constitute a limitation of the present invention. For the convenience of description, each part of the system described above is divided into various modules or units by function and described separately. The system can be a distributed system or a centralized system. If it is a distributed system, the above-mentioned functional modules can be realized by hardware devices respectively, and each hardware device interacts through a communication network; if it is a centralized system, the above-mentioned functional modules can be integrated in one hardware device.

在实际应用中，当所述输入模块300、第一计算模块301、第二计算模块302、第三计算模块303、反馈及更新模块304、迭代输出模块305集成于一个硬件设备中时，所述输入模块300、第一计算模块301、第二计算模块302、第三计算模块303、反馈及更新模块304、迭代输出模块305可由位于该硬件设备中的中央处理器(CPU)、微处理器(MPU)、数字信号处理器(DSP)或现场可编程门阵列(FPGA)实现。In practical applications, when the input module 300, the first calculation module 301, the second calculation module 302, the third calculation module 303, the feedback and update module 304, and the iterative output module 305 are integrated in one hardware device, the The input module 300, the first calculation module 301, the second calculation module 302, the third calculation module 303, the feedback and update module 304, and the iterative output module 305 can be located in the central processing unit (CPU), microprocessor ( MPU), Digital Signal Processor (DSP) or Field Programmable Gate Array (FPGA) implementation.

为实现上述方法，本发明实施例还提供了一种基于神经网络的人脸检测系统，由于该系统解决问题的原理与方法相似，因此，系统的实施过程及实施原理均可以参见前述方法的实施过程及实施原理描述，重复之处不再赘述。In order to realize the above method, the embodiment of the present invention also provides a face detection system based on neural network. Since the principle and method of solving the problem of this system are similar to the method, therefore, the implementation process and implementation principle of the system can refer to the implementation of the aforementioned method The process and implementation principles are described, and the repetitions will not be repeated.

本发明实施例提出了一种基于神经网络的人脸检测系统，所述神经网络包括：人脸检测的网络层、预测人脸框偏置的网络层及预测人脸框置信度的网络层；其中，所述人脸检测的网络层是根据所述神经网络中不同网络层对应训练集中的人脸图像的感受野选取的，所述人脸检测的网络层中的每个胞元绑定六个默认人脸框，所述默认人脸框是根据对应的人脸检测的网络层规模设置的；每层人脸检测的网络层连接一层预测人脸框偏置的网络层及一层预测人脸框置信度的网络层；如图4所示，该系统包括：输入模块400、输出模块401、计算模块 402、选取模块403；其中，An embodiment of the present invention proposes a neural network-based face detection system, the neural network comprising: a network layer for face detection, a network layer for predicting the bias of the face frame, and a network layer for predicting the confidence of the face frame; Wherein, the network layer of the face detection is selected according to the receptive fields of the face images in the training set corresponding to different network layers in the neural network, and each cell in the network layer of the face detection is bound to six A default face frame, the default face frame is set according to the network layer scale of the corresponding face detection; the network layer of each layer of face detection is connected to a network layer that predicts the bias of the face frame and a layer of prediction The network layer of face frame confidence degree; As shown in Figure 4, this system comprises: input module 400, output module 401, calculation module 402, selection module 403; Wherein,

所述输入模块400，用于接收到人脸检测指令时，将待检测的人脸图像输入到训练好的人脸检测模型中进行人脸检测；The input module 400 is configured to input the face image to be detected into the trained face detection model for face detection when receiving the face detection instruction;

所述输出模块401，用于针对待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息及每个默认人脸框包含人脸的置信度；The output module 401 is used to output the bias information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame offset in the trained face detection model for the face image to be detected And each default face frame contains the confidence of the face;

所述计算模块402，用于根据每个默认人脸框及预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框；The calculation module 402 is used to calculate the corresponding predicted face frame according to each default face frame and the bias information of the predicted face frame relative to each default face frame;

所述选取模块403，用于在所述预测人脸框中选取大于预设的置信度阈值的置信度所对应的预测人脸框作为最终的人脸检测结果，或者，在所述预测人脸框中选取最高置信度所对应的预测人脸框作为最终的人脸检测结果。The selecting module 403 is configured to select a predicted human face frame corresponding to a confidence degree greater than a preset confidence threshold in the predicted human face frame as the final human face detection result, or, in the predicted human face frame, Select the predicted face frame corresponding to the highest confidence level in the frame as the final face detection result.

具体实施中，所述计算模块402具体用于：In specific implementation, the calculation module 402 is specifically used for:

x＝t_x*w_a+x_a,y＝t_y*h_a+y_a x＝t _x *w _a +x _a ,y＝t _y *h _a +y _a

进一步地，所述系统还包括：Further, the system also includes:

过滤模块404，用于过滤掉所述置信度小于或等于预设的置信度阈值的默认人脸框；A filtering module 404, configured to filter out default face frames whose confidence is less than or equal to a preset confidence threshold;

所述计算模块402，还用于根据其余的每个默认人脸框及对应的预测人脸框相对于其余的每个所述默认人脸框的偏置信息，计算对应的预测人脸框；The calculation module 402 is further configured to calculate a corresponding predicted face frame according to each of the remaining default face frames and the offset information of the corresponding predicted face frame relative to each of the remaining default face frames;

所述选取模块403，还用于将其余的默认人脸框所对应的预测人脸框作为最终的人脸检测结果。The selection module 403 is further configured to use the predicted face frames corresponding to the remaining default face frames as the final face detection result.

进一步地，所述系统还包括：Further, the system also includes:

判断模块405，用于计算每两个预测人脸框的相对面积，在每两个预测人脸框的相对面积大于预设的第二相对面积阈值时，将所述两个预测人脸框作为采样预测人脸框；其中，所述相对面积为两个预测人脸框的相交区域的面积除以两个预测人脸框的并集区域的面积；The judging module 405 is used to calculate the relative area of each two predicted human face frames, and when the relative area of each two predicted human face frames is greater than the preset second relative area threshold, the two predicted human face frames are used as Sampling prediction face frame; Wherein, described relative area is the area of the intersecting area of two prediction face frames divided by the area of the union area of two prediction face frames;

所述选取模块403，还用于在所述采样预测人脸框中选取大于预设的置信度阈值的置信度所对应的采样预测人脸框作为最终的人脸检测结果，或者，在所述采样预测人脸框中选取最高置信度所对应的采样预测人脸框作为最终的人脸检测结果。The selection module 403 is further configured to select, in the sampled predicted face frame, a sampled predicted face frame corresponding to a confidence greater than a preset confidence threshold as the final face detection result, or, in the In the sampled predicted face frame, the sampled predicted face frame corresponding to the highest confidence is selected as the final face detection result.

在实际应用中，当所述输入模块400、输出模块401、计算模块402、选取模块403、过滤模块404、判断模块405集成于一个硬件设备中时，所述输入模块400、输出模块401、计算模块402、选取模块403、过滤模块404、判断模块405可由位于该硬件设备中的中央处理器(CPU)、微处理器(MPU)、数字信号处理器(DSP)或现场可编程门阵列(FPGA)实现。In practical applications, when the input module 400, output module 401, calculation module 402, selection module 403, filter module 404, and judgment module 405 are integrated in one hardware device, the input module 400, output module 401, calculation module Module 402, selection module 403, filter module 404, judgment module 405 can be positioned at central processing unit (CPU), microprocessor (MPU), digital signal processor (DSP) or field programmable gate array (FPGA) in this hardware equipment )accomplish.

为了更清楚地对本发明实施例进行说明，下面以具体实施例对基于神经网络的人脸检测模型的训练流程及检测流程进行详细描述。In order to illustrate the embodiment of the present invention more clearly, the training process and detection process of the neural network-based face detection model will be described in detail below with specific embodiments.

实施例一Embodiment one

步骤S1：接收到模型训练指令时，将所述训练集中的人脸图像输入到所述神经网络中进行训练。Step S1: When receiving a model training instruction, input the face images in the training set into the neural network for training.

步骤S2：通过预测人脸框偏置的网络层计算出预测人脸框相对于对应的默认人脸框的偏置信息，以及计算出真实人脸框相对于对应的默认人脸框的偏置信息；并通过预测人脸框置信度的网络层计算出每个默认人脸框包含人脸的置信度。Step S2: Calculate the offset information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame bias, and calculate the offset of the real face frame relative to the corresponding default face frame information; and calculate the confidence that each default face frame contains a face through the network layer that predicts the confidence of the face frame.

步骤S3：根据所述预测人脸框相对于对应的默认人脸框的偏置信息，以及所述真实人脸框相对于对应的默认人脸框的偏置信息，计算预测人脸框偏置的网络层的损失函数；并根据所述默认人脸框包含人脸的置信度，计算预测人脸框置信度的网络层的损失函数。Step S3: According to the offset information of the predicted face frame relative to the corresponding default face frame, and the offset information of the real face frame relative to the corresponding default face frame, calculate the predicted face frame offset The loss function of the network layer; and according to the confidence that the default face frame contains the face, calculate the loss function of the network layer that predicts the confidence of the face frame.

步骤S4：计算所述预测人脸框偏置的网络层的损失函数与所述预测人脸框置信度的网络层的损失函数的误差，并将所述误差通过反向传播反馈到所述神经网络中，根据所述误差更新所述神经网络的网络权重参数以及根据更新后的网络权重参数调整所述预测人脸框。Step S4: Calculate the error between the loss function of the network layer that predicts the bias of the face frame and the loss function of the network layer that predicts the confidence of the face frame, and feed back the error to the neural network through backpropagation In the network, the network weight parameters of the neural network are updated according to the error and the predicted face frame is adjusted according to the updated network weight parameters.

步骤S5：判断调整后的预测人脸框与真实人脸框的误差是否在预设的误差范围之内；若调整后的预测人脸框与真实人脸框的误差在预设的误差范围之内，则输出人脸检测模型；若调整后的预测人脸框与真实人脸框的误差未在预设的误差范围之内，则转入步骤S1根据调整后的预测人脸框继续执行。Step S5: Determine whether the error between the adjusted predicted face frame and the real face frame is within the preset error range; if the adjusted error between the predicted face frame and the real face frame is within the preset error range , then output the face detection model; if the error between the adjusted predicted face frame and the real face frame is not within the preset error range, then go to step S1 and continue to execute according to the adjusted predicted face frame.

步骤S6：接收到人脸检测指令时，将待检测的人脸图像输入到训练好的人脸检测模型中进行人脸检测。Step S6: When receiving the face detection instruction, input the face image to be detected into the trained face detection model for face detection.

步骤S7：针对待检测的人脸图像通过训练好的人脸检测模型中预测人脸框偏置的网络层输出预测人脸框相对于对应的默认人脸框的偏置信息，并通过训练好的人脸检测模型中预测人脸框置信度的网络层输出每个默认人脸框包含人脸的置信度。Step S7: Output the bias information of the predicted face frame relative to the corresponding default face frame through the network layer of the predicted face frame bias in the trained face detection model for the face image to be detected, and pass the training The network layer that predicts the confidence of the face frame in the face detection model outputs the confidence that each default face frame contains a face.

步骤S8：根据每个默认人脸框及预测人脸框相对于每个默认人脸框的偏置信息，计算对应的预测人脸框。Step S8: According to each default face frame and the bias information of the predicted face frame relative to each default face frame, calculate the corresponding predicted face frame.

步骤S9：在所述预测人脸框中选取大于预设的置信度阈值的置信度所对应的预测人脸框作为最终的人脸检测结果，或者，在所述预测人脸框中选取最高置信度对应的预测人脸框作为最终的人脸检测结果。Step S9: Select the predicted face frame corresponding to the confidence greater than the preset confidence threshold in the predicted face frame as the final face detection result, or select the highest confidence in the predicted face frame The predicted face frame corresponding to degree is used as the final face detection result.

实施例二Embodiment two

步骤S8：过滤掉所述置信度小于或等于预设的置信度阈值的默认人脸框。Step S8: Filter out default face frames whose confidence is less than or equal to a preset confidence threshold.

步骤S9：根据其余的每个默认人脸框及对应的预测人脸框相对于其余的每个所述默认人脸框的偏置信息，计算对应的预测人脸框。Step S9: According to each of the remaining default face frames and the offset information of the corresponding predicted face frames relative to each of the remaining default face frames, calculate the corresponding predicted face frame.

步骤S10：将其余的默认人脸框所对应的预测人脸框作为最终的人脸检测结果。Step S10: Use the predicted face frames corresponding to the remaining default face frames as the final face detection result.

实施例三Embodiment three

步骤S9：计算每两个预测人脸框的相对面积；若每两个预测人脸框的相对面积大于预设的第二相对面积阈值，则将所述两个预测人脸框作为采样预测人脸框。Step S9: Calculate the relative area of each two predicted human face frames; if the relative area of each two predicted human face frames is greater than the preset second relative area threshold, then use the two predicted human face frames as the sampling predicted human face frame.

步骤S10：在所述采样预测人脸框中选取大于预设的置信度阈值的置信度所对应的采样预测人脸框作为最终的人脸检测结果，或者，在所述采样预测人脸框中选取最高置信度对应的预测人脸框作为最终的人脸检测结果。Step S10: Select the sampled predicted face frame corresponding to the confidence greater than the preset confidence threshold in the sampled predicted face frame as the final face detection result, or, in the sampled predicted face frame Select the predicted face frame corresponding to the highest confidence level as the final face detection result.

实施例四Embodiment Four

步骤S10：计算每两个预测人脸框的相对面积；若每两个预测人脸框的相对面积大于预设的第二相对面积阈值，则将所述两个预测人脸框作为采样预测人脸框；Step S10: Calculate the relative area of each two predicted human face frames; if the relative area of each two predicted human face frames is greater than the preset second relative area threshold, then use the two predicted human face frames as the sampling predicted human face frame. face frame;

步骤S11：将采样预测人脸框作为最终的人脸检测结果，或者，在所述采样预测人脸框中选取最高置信度对应的采样预测人脸框作为最终的人脸检测结果。Step S11: Take the sampled predicted face frame as the final face detection result, or select the sampled predicted face frame corresponding to the highest confidence level among the sampled predicted face frames as the final face detection result.

综上所述，本发明的基于神经网络的人脸检测模型的训练方法及系统、人脸检测方法及系统，相对于现有技术具有以下有益效果：In summary, the training method and system of the face detection model based on the neural network, the face detection method and the system of the present invention have the following beneficial effects compared with the prior art:

上述实施例仅例示性说明本发明的原理及其功效，而非用于限制本发明。任何熟悉此技术的人士皆可在不违背本发明的精神及范畴下，对上述实施例进行修饰或改变。因此，举凡所属技术领域中具有通常知识者在未脱离本发明所揭示的精神与技术思想下所完成的一切等效修饰或改变，仍应由本发明的权利要求所涵盖。The above-mentioned embodiments only illustrate the principles and effects of the present invention, but are not intended to limit the present invention. Anyone skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed in the present invention should still be covered by the claims of the present invention.

Claims

1. A training method of a face detection model based on a neural network is characterized in that the neural network comprises the following steps: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the method comprises the following steps:

when a model training instruction is received, inputting the face images in the training set into the neural network for training;

calculating the offset information of the predicted face frame relative to the corresponding default face frame and calculating the offset information of the real face frame relative to the corresponding default face frame through the network layer of the offset of the predicted face frame; calculating the confidence coefficient of each default face frame including the face through a network layer for predicting the confidence coefficient of the face frame;

calculating a loss function of a network layer of the offset of the predicted face frame according to the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence of the face contained in the default face frame;

calculating the error of the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameter of the neural network according to the error and adjusting the predicted face frame according to the updated network weight parameter;

and repeating iterative training until the error between the adjusted predicted face frame and the actual face frame is within a preset error range, and outputting a face detection model.

2. The training method according to claim 1, wherein the calculating, by the network layer of the predicted face frame bias, bias information of the predicted face frame with respect to a corresponding default face frame includes:

calculating the bias information of the predicted face frame relative to the corresponding default face frame according to the following formula:

t_x＝(x-x_a)/w_a,t_y＝(y-y_a)/h_a

t_w＝log(w/w_a),t_h＝log(h/h_a)

wherein, (x, y, w, h) is the coordinate of the central point of the predicted face frame, the width and the height; (x)_a,y_a,w_a,h_a) Is a default face frameThe center point coordinates, width and height of; (t)_x,t_y,t_w,t_h) Bias information for the predicted face frame relative to a corresponding default face frame;

the calculating of the bias information of the real face frame relative to the corresponding default face frame includes:

calculating the bias information of the real face frame relative to the corresponding default face frame according to the following formula:

wherein (x)_a,y_a,w_a,h_a) The coordinates, width and height of the center point of the default face frame are used; (x)^*,y^*,w^*,h^*) Coordinates, width and height of a central point of a real face frame;and the offset information of the real face frame relative to the corresponding default face frame is obtained.

3. The training method according to claim 2, wherein the calculating a loss function of a network layer of the predicted face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame comprises:

selecting the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a sampling default face frame; the relative area is the area of the intersection area of the default face frame and the real face frame divided by the area of the union area of the default face frame and the real face frame;

based on the predicted face frameCalculating the offset information relative to the corresponding sampling default face frame and the offset information of the real face frame relative to the corresponding sampling default face frame according to the following formula to predict the Loss function Loss of the network layer of the face frame offset₁：

Wherein N is_regDefault number of face boxes for sampling, L_regRegression loss function of spatial bias corresponding to k-th default face box, k being 1 and N_regPositive integer between, T ═ T_x，t_y，t_w，t_hIs the bias information of the predicted face frame with respect to the corresponding sampled default face frame, z is an element belonging to the set { x, y, w, h }, T ═ T }^* _x，t^* _y，t^* _w，t^* _hThe's is the offset information of the real face frame relative to the corresponding sampling default face frame, smooth_L1Represents the smoothing L1 loss function, which is a variation of the L1 norm loss function, L represents smooth_L1An input variable of the function;

the calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence that the default face frame contains the face comprises:

taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a positive sample, and taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is smaller than or equal to the preset first relative area threshold value as a negative sample;

selecting all positive samples and part of negative samples according to a preset positive-negative sample proportion;

calculating Loss function Loss of the network layer for predicting the confidence coefficient of the face frame according to the following formula₂：

L_cls(p,p^*)＝-[p^*logp+(1-p^*)log(1-p)]

Wherein N is_clsFor the total number of positive and negative samples selected,classification loss function corresponding to ith class, i being 1 and N_clsP is the confidence that the selected positive sample or negative sample contains the face, p^*For the true probability that the selected positive or negative sample contains a face, p of the positive sample^*Is 1, p of negative example^*Is 0.

4. A training method as claimed in any one of claims 1 to 3, wherein said calculating an error between the loss function of the network layer for the predicted face frame bias and the loss function of the network layer for the predicted face frame confidence comprises:

calculating the gradient of the loss function of the network layer of the predicted face frame bias and the gradient of the loss function of the network layer of the predicted face frame confidence by adopting a random gradient descent method;

and taking the obtained gradient value as the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the confidence coefficient of the predicted face frame.

5. A face detection method based on a neural network is characterized in that the neural network comprises the following steps: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the method comprises the following steps:

when a face detection instruction is received, inputting a face image to be detected into the face detection model according to any one of claims 1 to 4 for face detection;

for the face image to be detected, outputting the offset information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting the offset of the face frame in the face detection model, and outputting the confidence coefficient of each default face frame including the face through the network layer for predicting the confidence coefficient of the face frame in the face detection model;

calculating corresponding predicted face frames according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame;

and selecting the predicted face frame corresponding to the confidence coefficient greater than a preset confidence coefficient threshold value from the predicted face frames as a final face detection result, or selecting the predicted face frame corresponding to the highest confidence coefficient from the predicted face frames as the final face detection result.

6. The detection method according to claim 5, wherein the calculating the corresponding predicted face frame according to each default face frame and the offset information of the corresponding predicted face frame relative to each default face frame comprises:

according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, calculating the corresponding predicted face frame according to the following formula:

x＝t_x*w_a+x_a,y＝t_y*h_a+y_a

wherein (x)_a,y_a,w_a,h_a) Coordinates, width and height of a center point of each default face frame; (x, y, w, h) coordinates, width and height of a center point of the predicted face frame corresponding to each default face frame; (t)_x,t_y,t_w,t_h) Bias information for the corresponding predicted face frame relative to each default face frame.

7. The detection method according to claim 5 or 6, wherein after outputting, for the face image to be detected, the offset information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting the offset of the face frame in the trained face detection model, and outputting the confidence that each default face frame contains a face through the network layer for predicting the confidence of the face frame in the trained face detection model, the method further comprises:

filtering out a default face frame with the confidence coefficient smaller than or equal to a preset confidence coefficient threshold value;

calculating corresponding predicted face frames according to the rest default face frames and the bias information of the corresponding predicted face frames relative to the rest default face frames;

and taking the predicted face frames corresponding to the rest default face frames as final face detection results.

8. The method of claim 7, wherein after computing the corresponding predicted face frame according to each default face frame and the offset information of the corresponding predicted face frame relative to each default face frame, the method further comprises:

calculating the relative area of every two predicted face frames;

if the relative area of every two predicted face frames is larger than a preset second relative area threshold value, taking the two predicted face frames as sampling predicted face frames; wherein the relative area is the area of the intersection region of the two predicted face frames divided by the area of the union region of the two predicted face frames;

and selecting the sampling prediction face frame corresponding to the confidence coefficient which is greater than a preset confidence coefficient threshold value from the sampling prediction face frames as a final face detection result, or selecting the sampling prediction face frame corresponding to the highest confidence coefficient from the sampling prediction face frames as the final face detection result.

9. A training system for a face detection model based on a neural network, the neural network comprising: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the system comprises: the device comprises an input module, a first calculation module, a second calculation module, a third calculation module, a feedback and update module and an iteration output module; wherein,

the input module is used for inputting the face images in the training set into the neural network for training when receiving a model training instruction;

the first calculation module is used for calculating the offset information of the predicted face frame relative to the corresponding default face frame and calculating the offset information of the real face frame relative to the corresponding default face frame through the network layer of the offset of the predicted face frame; calculating the confidence coefficient of each default face frame including the face through a network layer for predicting the confidence coefficient of the face frame;

the second calculation module is used for calculating a loss function of a network layer of the offset of the predicted face frame according to the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence of the face contained in the default face frame;

the third calculation module is used for calculating the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence;

the feedback and updating module is used for feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error and adjusting the predicted face frame according to the updated network weight parameters;

and the iteration output module is used for repeating iteration training until the error between the adjusted predicted face frame and the adjusted real face frame is within a preset error range, and outputting a face detection model.

10. A face detection system based on a neural network, the neural network comprising: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the system comprises: the device comprises an input module, an output module, a calculation module and a selection module; wherein,

the input module is used for inputting a face image to be detected into a trained face detection model for face detection when receiving a face detection instruction;

the output module is used for outputting the offset information of the predicted face frame relative to the corresponding default face frame by the network layer for predicting the offset of the face frame by the face detection model according to any one of claims 1 to 4 aiming at the face image to be detected, and outputting the confidence coefficient of each default face frame including the face by the network layer for predicting the confidence coefficient of the face frame by the face detection model;

the calculation module is used for calculating corresponding predicted face frames according to each default face frame and the bias information of the predicted face frames relative to each default face frame;

and the selection module is used for selecting the predicted face frame corresponding to the confidence coefficient which is greater than a preset confidence coefficient threshold value from the predicted face frames as a final face detection result, or selecting the predicted face frame corresponding to the highest confidence coefficient from the predicted face frames as the final face detection result.