CN109815814B

CN109815814B - Face detection method based on convolutional neural network

Info

Publication number: CN109815814B
Application number: CN201811572322.8A
Authority: CN
Inventors: 刘高华; 王萌; 苏寒松
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2023-01-24
Anticipated expiration: 2038-12-21
Also published as: CN109815814A

Abstract

The invention discloses a face detection method based on a convolutional neural network, comprising the following steps: step (1), establishing a database; step (2), preprocessing the images in the database; step (3), training the constructed deep learning network; and step (4), testing the training result. The method achieves high detection accuracy for faces that are occluded, at different angles, or in profile, and for faces that are small or blurred in the picture; the network structure is simple, there are few iteration parameters, and the training time is short.

Description

Face detection method based on convolutional neural network

Technical Field

The invention belongs to the field of computer vision and artificial intelligence, and particularly relates to a face detection method based on a convolutional neural network.

Background

Face detection is the process of determining the position and size of faces in an image that contains them. It is an important component of computer vision and a key preprocessing step for face recognition. Because detection accuracy largely determines the accuracy of subsequent recognition, research on face detection has considerable significance and practical value.

Face detection is widely applied in everyday life, for example in identity authentication and security, in media and entertainment, in face-related electronic products such as mobile phones and digital cameras, and in image retrieval. Face detection methods can be roughly divided into conventional methods (including template-matching-based methods, distance-based methods, and the like) and deep-learning-based methods.

In recent years deep learning has matured steadily and is widely applied to classification and regression tasks. Deep-learning-based face detection has developed alongside it, but current methods, taking the most commonly used one, MTCNN, as an example, are neither fast enough nor accurate enough. In particular, they struggle with faces in images or video that are occluded, at different angles, in profile, or small in the picture. Because face detection is a preprocessing step for face recognition, its accuracy strongly affects the accuracy of subsequent recognition, so solving these problems is important.

Disclosure of Invention

Building on the prior art, the invention provides a face detection method based on a convolutional neural network, aimed in particular at detecting faces that are in profile, affected by illumination, or very small in the picture.

The invention provides a face detection method based on a convolutional neural network, which comprises the following steps:

Step 1, establishing a database to obtain image data, preprocessing the image data, and constructing a convolutional neural network;

Step 2, performing four iterations of operations on the preprocessed data through an image feature analysis module in the convolutional neural network to generate image feature parameters;

Step 3, operating on the image feature parameters through a fully connected layer in the convolutional neural network to generate an image one-dimensional vector;

Step 4, performing classification and regression on the image one-dimensional vector through a classification layer in the convolutional neural network to obtain the position coordinates of the face image.

In step 2, the image feature analysis module processes the preprocessed data as follows:

Step 2.1, the convolution layer of the image feature analysis module extracts image features by convolving its weights with the parameters of the preprocessed data;

Step 2.2, the activation function layer of the image feature analysis module applies the ReLU function to the image features as a nonlinear operation, yielding nonlinear feature map parameters;

Step 2.3, the max pooling layer of the image feature analysis module reduces the parameters of the nonlinear feature map.
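One pass through steps 2.1 to 2.3 can be sketched in plain Python as below. This is a minimal illustration, not the patented implementation: the function names, the stride of 1, the absence of padding, and the 2x2 pooling window are all assumptions, since the text does not specify them. As in most deep learning frameworks, the "convolution" here is computed as a sliding dot product without flipping the kernel.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Step 2.1: slide the kernel over the image (stride 1, no padding; both assumed)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Step 2.2: apply y = max(0, x) element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Step 2.3: non-overlapping max pooling; incomplete edge windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def feature_stage(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """One convolution -> ReLU -> max-pooling stage; the method applies four such stages."""
    return max_pool(relu(conv2d_valid(image, kernel, bias)))
```

Calling `feature_stage` four times in sequence, each time with its own kernel, would correspond to the four iterations named in step 2.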

In step 4, the classification layer performs classification and regression on the image one-dimensional vector as follows:

Step 4.1, the weights are updated iteratively on the image one-dimensional vector using stochastic gradient descent so that the loss function is continually reduced, and the training hyperparameters are continually adjusted to obtain the best training result; the hyperparameters include the number of iterations, the batch size, the maximum number of iterations, and the learning rate;

Step 4.2, the loss function used in the classification process combines the center loss function with the softmax loss function. Its expression is:

$$L = L_S + \lambda L_c$$

where $L_S$ is the softmax loss function, $L_c$ is the center loss function, and $\lambda$ is a coefficient that weights the two terms, here taken as $\lambda = 0.1$. In $L_S$, $Wx + b$ is the output of the fully connected layer, and the term after the log represents the probability that $x_i$ belongs to category $y_i$; $c$ represents the feature center of the category;

Step 4.3, the loss function used in the regression process is the Euclidean distance loss function. Its expression is:

$$L_i = \lVert \hat{y}_i - y_i \rVert_2^2, \qquad y_i \in \mathbb{R}^4$$

where $\hat{y}_i$ is the output predicted by the network and $y_i$ is the annotated ground-truth label, namely the coordinates of the 68 face key points.

Step 4.4, the coordinates of the 68 face key points output under the optimal weights are compared with the labeled face key-point coordinates and faces in the database, and the accuracy of the convolutional neural network for face detection is calculated.
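For reference, the two terms combined in step 4.2 are commonly written in the expanded form below. This is a sketch that assumes the method follows the usual definitions of the softmax loss and the center loss; it is not copied from the patent's own figures. The batch size $m$, the number of classes $n$, and the per-class center $c_{y_i}$ are notation introduced here for illustration.

```latex
\begin{align}
L_S &= -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{\top} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{\top} x_i + b_j}} \\
L_c &= \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2 \\
L &= L_S + \lambda L_c, \qquad \lambda = 0.1
\end{align}
```

Under these definitions the softmax term pushes features of different classes apart, while the center term pulls each feature toward the center of its own class; $\lambda$ sets the balance between the two.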

Advantageous effects

Compared with the prior art, the face detection method based on a convolutional neural network achieves higher detection accuracy for faces that are occluded, at different angles, or in profile, and for faces that are small or blurred in the picture; the network structure is simple, there are fewer iteration parameters, and the training time is shorter.

Drawings

FIG. 1 is a flow chart of a face detection method based on a convolutional neural network;

FIG. 2 shows the connection structure of the convolutional neural network used in the face recognition method based on a convolutional neural network provided by the invention, which comprises four convolution layers, four ReLU activation function layers, four max pooling layers, and two fully connected layers, the last fully connected layer being a softmax classification layer.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings:

FIG. 1 is a flowchart of the face detection method based on a convolutional neural network.

Step 1 (110), establishing a database to obtain image data, preprocessing the image data, and constructing a convolutional neural network;

In this step, a database is established to obtain image data. The pictures in the database meet the following requirements: each picture contains at least one face; there is no requirement on the position of the face, and faces that are off-center and distant are preferred; the backgrounds are complex and varied, covering a range of indoor and outdoor scenes; the location of each face is marked with a rectangular box, and 68 key points covering the eyebrows, eyes, nose, mouth, and face contour are annotated. There is no requirement on image sharpness. The database contains 6000 annotated images containing faces.

Also in this step, the images in the database are preprocessed. A spatial pyramid pooling operation is first applied to the images, so that several images with different resolutions and scales are obtained from each image, which makes it easier to extract a fixed-size feature vector from the multi-scale features. All the resulting pictures are then randomly mirrored, both up-down and left-right. Of the processed images, 4/5 are used as the training database and 1/5 as the test database.
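The mirroring and the 4/5 to 1/5 split described above can be sketched as follows. The helper names, the 0.5 flip probability per axis, and the shuffle before splitting are assumptions made for illustration; the text only states that mirroring is random and that the data are divided 4/5 to 1/5.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]


def mirror_left_right(image: Image) -> Image:
    """Reverse each row (left-right mirror)."""
    return [list(reversed(row)) for row in image]


def mirror_up_down(image: Image) -> Image:
    """Reverse the row order (up-down mirror)."""
    return [list(row) for row in reversed(image)]


def random_mirror(image: Image, rng: random.Random) -> Image:
    """Flip along each axis with probability 0.5 (the probability is an assumption)."""
    if rng.random() < 0.5:
        image = mirror_left_right(image)
    if rng.random() < 0.5:
        image = mirror_up_down(image)
    return image


def split_train_test(items: Sequence, rng: random.Random) -> Tuple[list, list]:
    """Shuffle, then keep 4/5 for training and 1/5 for testing."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]
```

With the 6000 images mentioned in the text, `split_train_test` would yield 4800 training items and 1200 test items. In a real pipeline the rectangular box and key-point coordinates would have to be mirrored together with the pixels.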

Step 2 (210), performing four iterations of operations on the preprocessed data through an image feature analysis module in the convolutional neural network to generate image feature parameters;

A preprocessed test database image is fed into the trained neural network. After features are extracted, the test image passes through the trained weight matrices and the classifier, and the network outputs classification and regression results. The classification result is expressed as probabilities: if the probability of face is greater than the probability of non-face, the region is judged to be a face and is marked with a rectangular box. The regression result marks the 68 key points of the face in the picture and returns their coordinates.
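The decision rule in the paragraph above (judge "face" when its probability exceeds that of "non-face") can be sketched as follows. The two-element score layout `(non_face_score, face_score)` and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities; subtracting the maximum avoids overflow."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_face(logits: Sequence[float]) -> Tuple[bool, float]:
    """Return (is_face, face_probability).

    logits is assumed to be ordered as (non_face_score, face_score).
    """
    p_non_face, p_face = softmax(logits)
    return p_face > p_non_face, p_face
```

A region would then be marked with a rectangular box only when the first returned value is true.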

Step 3 (310), operating on the image feature parameters through a fully connected layer in the convolutional neural network to generate an image one-dimensional vector.

Step 4, performing classification and regression on the image one-dimensional vector through a classification layer in the convolutional neural network to obtain the position coordinates of the face image. The classification and regression performed by the classification layer in step 4 comprise the following steps:

Step 4.1, the weights are updated iteratively on the image one-dimensional vector using stochastic gradient descent so that the loss function is continually reduced, and the training hyperparameters are continually adjusted to obtain the best training result; the hyperparameters include the number of iterations, the batch size, the maximum number of iterations, and the learning rate;

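Step 4.2, the loss function used in the classification process combines the center loss function with the softmax loss function, with the expression $L = L_S + \lambda L_c$, where $L_S$ is the softmax loss function, $L_c$ is the center loss function, and $\lambda = 0.1$ is the coefficient weighting the two;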

Step 4.3, the loss function used in the regression process is the Euclidean distance loss function, with the expression:

$$L_i = \lVert \hat{y}_i - y_i \rVert_2^2, \qquad y_i \in \mathbb{R}^4$$

where $\hat{y}_i$ is the output predicted by the network and $y_i$ is the annotated ground-truth label, namely the coordinates of the 68 face key points.

Step 4.4, the coordinates of the 68 face key points output under the optimal weights are compared with the labeled face key-point coordinates and faces in the database, and the accuracy of the convolutional neural network for face detection is thereby calculated.
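The losses of steps 4.2 and 4.3 can be evaluated numerically as in the sketch below. It assumes the commonly used definitions of the softmax loss and the center loss (with a factor of 1/2) and a squared Euclidean distance for regression; the function names and the per-sample formulation are illustrative choices, not taken from the patent.

```python
import math
from typing import Sequence


def softmax_loss(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class, computed stably via log-sum-exp."""
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum - logits[label]


def center_loss(feature: Sequence[float], center: Sequence[float]) -> float:
    """Half the squared distance between a feature and its class center (assumed form)."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def classification_loss(logits: Sequence[float], label: int,
                        feature: Sequence[float],
                        centers: Sequence[Sequence[float]],
                        lam: float = 0.1) -> float:
    """Step 4.2: L = L_S + lambda * L_c, with lambda = 0.1 as stated in the text."""
    return softmax_loss(logits, label) + lam * center_loss(feature, centers[label])


def euclidean_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Step 4.3: squared Euclidean distance between prediction and label."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))
```

Here `logits` stands for the fully connected layer output $Wx + b$, `feature` for the vector $x_i$ fed to that layer, and `centers[label]` for the feature center $c$ of the sample's class.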

The training task of the invention is divided into two parts: classification and regression. Classification treats face detection as a binary classification problem of face versus non-face. Regression refers to returning, after the neural network has been trained, the coordinates of the face bounding box and of the 68 face key points, thereby achieving face detection. The weights in the network are updated iteratively to reduce the loss function until optimal weights are obtained; the detection results output under the optimal weights are then compared with the labeled face key-point coordinates and faces in the database to calculate the accuracy of the convolutional neural network for face detection.
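The text does not define how the comparison with the labels is turned into an accuracy figure. One possible reading, shown purely as an assumption, counts a face as correctly detected when the mean distance between its predicted and labeled key points stays within a pixel tolerance. Both the metric and the `tolerance` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean_keypoint_error(predicted: Sequence[Point], labeled: Sequence[Point]) -> float:
    """Average Euclidean distance between matching key points of one face."""
    distances = [math.dist(p, q) for p, q in zip(predicted, labeled)]
    return sum(distances) / len(distances)


def detection_accuracy(predictions: List[Sequence[Point]],
                       labels: List[Sequence[Point]],
                       tolerance: float) -> float:
    """Fraction of faces whose mean key-point error is within the tolerance (hypothetical metric)."""
    hits = sum(
        1 for pred, lab in zip(predictions, labels)
        if mean_keypoint_error(pred, lab) <= tolerance
    )
    return hits / len(labels)
```

In the method each face would contribute 68 points per list; the sketch works for any number of points as long as predictions and labels match in length.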

As shown in FIG. 2, the convolutional neural network used in the face recognition method based on a convolutional neural network provided by the invention comprises four convolution layers, four ReLU activation function layers, four max pooling layers, and two fully connected layers, the last fully connected layer being a softmax classification layer. The convolution layer extracts image features by convolving its weights with the parameters. The activation function layer increases the nonlinear capacity of the network; the ReLU function is y = max(0, x). The max pooling layer reduces the output size and the number of parameters. The fully connected layer maps the extracted features to a one-dimensional vector. The classification layer separates the features extracted by the network into the two classes of face and non-face and regresses the coordinates of the 68 face key points.
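To illustrate how four convolution and pooling stages shrink the feature map before the fully connected layers, the sketch below tracks the spatial size through the network. The 3x3 kernel, stride 1 without padding, the 2x2 pooling window, and the 96-pixel input are all hypothetical values; the text gives the number of layers but not their sizes.

```python
from typing import List


def stage_output_size(size: int, kernel: int = 3, pool: int = 2) -> int:
    """Side length after one unpadded stride-1 convolution followed by max pooling."""
    after_conv = size - kernel + 1
    return after_conv // pool


def feature_map_sizes(input_size: int, stages: int = 4,
                      kernel: int = 3, pool: int = 2) -> List[int]:
    """Side lengths at the input and after each of the conv/ReLU/pool stages."""
    sizes = [input_size]
    for _ in range(stages):
        sizes.append(stage_output_size(sizes[-1], kernel, pool))
    return sizes
```

Under these assumed sizes a 96x96 input shrinks to 47, 22, 10, and finally 4 pixels per side, which is the map the first fully connected layer would flatten into a one-dimensional vector.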

The overall training process of the invention is as follows. First, the parameters of the convolution layers and fully connected layers are randomly initialized. An image from the database is fed into the network; after the four convolution, activation, and pooling stages the facial features are obtained, the fully connected layer produces a fixed-size feature vector, and the classification layer finally outputs the coordinates of the face position. During training, data propagate forward through the network and the error computed by the loss function propagates backward, so that the parameters of the convolution layers and fully connected layers are optimized continually; a good training result is obtained through continued training and fine-tuning of the parameters.

This step uses the training database to obtain the optimal parameters. The loss function represents the error between the actual labels and the predictions, so training amounts to minimizing the loss function: iterative training continues, and the optimal parameters are those obtained when the loss function reaches its minimum. The parameters to be trained include the convolution kernels and biases of the convolution layers and the neuron parameters of the fully connected layers. Using gradient descent, the network searches for the global optimum over successive iterations. Once training is finished, the optimal parameters are substituted into the network, which then has the capability to detect faces. The accuracy of the neural network for face detection can then be obtained through testing.
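The loop of random initialization, forward pass, backward pass, and parameter update described above can be illustrated on a deliberately tiny model. The sketch below fits a single linear unit with mini-batch stochastic gradient descent; the model, the squared-error loss, and the default hyperparameter values are assumptions chosen to keep the example short, and they stand in for the full network rather than reproduce it.

```python
import random
from typing import List, Tuple


def sgd_fit(samples: List[Tuple[float, float]],
            learning_rate: float = 0.05,
            batch_size: int = 4,
            max_iterations: int = 2000,
            seed: int = 0) -> Tuple[float, float]:
    """Fit y = w * x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    # Random initialization of the trainable parameters.
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    for _ in range(max_iterations):
        batch = rng.sample(samples, batch_size)
        grad_w = grad_b = 0.0
        for x, y in batch:
            error = (w * x + b) - y                 # forward pass
            grad_w += 2.0 * error * x / batch_size  # backward pass: dLoss/dw
            grad_b += 2.0 * error / batch_size      # backward pass: dLoss/db
        # Gradient descent update.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

The `learning_rate`, `batch_size`, and `max_iterations` arguments play the role of the hyperparameters listed in step 4.1. On this convex toy problem the loop recovers the true parameters; a deep network offers no such guarantee, which is why the hyperparameters are tuned during training.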

The above description covers only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principle of the invention falls within its scope of protection.

Claims

1. A face detection method based on a convolutional neural network is characterized by comprising the following steps:

step 1, establishing a database to obtain image data, preprocessing the image data and constructing a convolutional neural network;

step 4, performing classification and regression on the image one-dimensional vector through a classification layer in the convolutional neural network to obtain the position coordinates of the face image; wherein the classification and regression performed by the classification layer in step 4 on the image one-dimensional vector comprise the following steps:

wherein $L_S$ is the softmax loss function, $L_c$ is the center loss function, and $\lambda$ is a coefficient weighting the two, here taken as $\lambda = 0.1$; $Wx + b$ is the output of the fully connected layer, the term after the log represents the probability that $x_i$ belongs to category $y_i$, and $c$ represents the feature center of the category;

$y_i \in \mathbb{R}^4$

wherein $\hat{y}$ is the output predicted by the network and $y$ is the annotated ground-truth label, namely the coordinates of the 68 face key points;

step 4.4, comparing the coordinates of the 68 face key points output under the optimal weights with the labeled face key-point coordinates and faces in the database, and calculating the accuracy of the convolutional neural network for face detection.

2. The face detection method based on the convolutional neural network as claimed in claim 1, wherein the image feature analysis module in step 2 processes the preprocessed data, comprising the following steps:

step 2.1, the convolution layer of the image feature analysis module extracts image features by convolving its weights with the parameters of the preprocessed data;