CN106204779A - Classroom attendance checking method based on multi-face data acquisition strategy and deep learning - Google Patents

Classroom attendance checking method based on multi-face data acquisition strategy and deep learning

Info

Publication number
CN106204779A
Authority
CN
China
Prior art keywords
face
convolution
model
formula
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610504632.0A
Other languages
Chinese (zh)
Other versions
CN106204779B (en)
Inventor
裴炤
张艳宁
彭亚丽
马苗
尚海星
苏艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority to CN201610504632.0A priority Critical patent/CN106204779B/en
Publication of CN106204779A publication Critical patent/CN106204779A/en
Application granted granted Critical
Publication of CN106204779B publication Critical patent/CN106204779B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C1/00 Registering, indicating or recording the time of events or elapsed time, e.g. time-recorders for work people
    • G07C1/10 Registering, indicating or recording the time of events or elapsed time, e.g. time-recorders for work people together with the recording, indicating or registering of other data, e.g. of signs of identity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a classroom attendance checking method based on a multi-face data acquisition strategy and deep learning, which solves the technical problem of the poor recognition rate of existing attendance checking methods based on face recognition. The technical scheme uses the AdaBoost algorithm and a skin color model to perform multi-target detection and extraction. Only one video of all the faces participating in attendance needs to be shot; the faces in the video sequence are detected and extracted to complete the establishment of the face database. The face recognition method based on deep learning builds on the deep convolutional neural network LeNet-5: a simplified LeNet-5 model learns the face features under the different scenes in the face database, and new feature representations are obtained through multilayer nonlinear transformation. These new features eliminate intra-class variations such as illumination, noise, posture and expression as far as possible, while retaining the inter-class variations produced by identity differences, thereby improving the recognition rate of the face recognition method in actual complex scenes.

Description

Classroom attendance checking method based on multi-face data acquisition strategy and deep learning
Technical Field
The invention relates to an attendance checking method based on face recognition, and in particular to a classroom attendance checking method based on a multi-face data acquisition strategy and deep learning.
Background
The document "A protocol of Automated Attendance System Using imaging processing, International Journal of Advanced Research in Computer and communication Engineering, Vol.5, Issue 4, April2016, p 501-505" discloses an Attendance method based on face recognition. The method adopts a traditional principal component analysis method to identify the detected human face. After the attendance checking person enters the attendance checking system, the system judges whether the face data of the attendance checking person exists in the database or not, if yes, the attendance checking person is directly identified, and the detection result is added into the database. If the face data does not exist, the face data is required to be acquired firstly. The method needs to acquire face data of the attendance one by one before recognition. However, in practical situations, the number of attendance checking personnel is large, and if the face data is collected one by one, a large amount of time is consumed, and the data collection efficiency is low. Moreover, the method requires the attendance to autonomously complete the acquisition of the face data, and the quality of the acquired face data is difficult to ensure. In addition, the method has the advantages of simple background, stable illumination and single facial expression during face recognition, however, in actual attendance, many attendance personnel have, the changes of background, illumination, posture, expression and the like are very complex, and the traditional face recognition method based on principal component analysis has poor recognition rate under the actual complex condition.
Disclosure of Invention
In order to overcome the poor recognition rate of the existing attendance checking method based on face recognition, the invention provides a classroom attendance checking method based on a multi-face data acquisition strategy and deep learning. The method uses the AdaBoost algorithm and a skin color model to perform multi-target detection and extraction. Only one video needs to be shot of all the faces participating in attendance, and the faces in the video sequence are detected and extracted to complete the establishment of the face database. This solves the problem that face data acquisition in actual attendance is time-consuming, labor-intensive and hard to perform in a unified way, making it easier to acquire massive face data. In addition, the face recognition method based on deep learning builds on the deep convolutional neural network LeNet-5: a simplified LeNet-5 model is applied to learn the face features of different scenes in the face database, and new feature representations are obtained through multilayer nonlinear transformation. The new features remove intra-class variations such as illumination, noise, posture and expression as far as possible, and retain the inter-class variations produced by different identities, thereby improving the recognition rate of the method in actual complex scenes.
The technical scheme adopted by the invention for solving the technical problems is as follows: a classroom attendance checking method based on multi-face data acquisition strategies and deep learning is characterized by comprising the following steps:
(a) Acquiring face data.
A 30-second video sequence is shot of the faces of the persons to be collected. During video shooting, the collector formulates a series of rules to simulate the changes the face may undergo in actual attendance, including expression changes such as smiling and frowning, and action changes such as opening the mouth, raising the head, lowering the head, and changing the face orientation. The persons to be collected perform these expression and action changes during shooting as required by the collector.
(b) Performing multi-face detection using the AdaBoost algorithm combined with a skin color model.
The AdaBoost algorithm is combined with a skin color model: the face position is located by the AdaBoost algorithm, and the skin color model is then used to verify the candidate face. The method comprises the following steps:
firstly, a classifier for face detection is generated by using an Adaboost algorithm, and preliminary face detection is carried out.
Secondly, the preliminarily determined face regions are checked with the skin color model: the pixels in the image are compared against a standard skin color to distinguish the skin regions from the non-skin regions. Three color spaces are used when setting the standard skin color range: the RGB color space, the HSV color space, and the YCbCr color space.
Two RGB standard skin color models are set. Threshold range of model one: G > 40, B > 20, R > G, R > B, MAX(R,G,B) - MIN(R,G,B) > 15; threshold range of model two: R > 220, |R - G| < 15, R > G, R > B.
The RGB color is converted into HSV color using formulas (1), (2), (3) and (4), and the HSV standard skin color threshold range is set to 0 < H < 50, 0.23 < S < 0.68.
$H = \begin{cases} H_i, & B \le G \\ 360^\circ - H_i, & B > G \end{cases}$  (1)
In the formula, H represents a hue. Wherein,
$H_i = \arccos\left( \dfrac{ \frac{1}{2}\left[ (R-G) + (R-B) \right] }{ \sqrt{ (R-G)^2 + (R-B)(G-B) } } \right)$  (2)
where R is the value of the red channel, G is the value of the green channel, and B is the value of the blue channel.
$S = \dfrac{ \max(R,G,B) - \min(R,G,B) }{ \max(R,G,B) }$  (3)
Wherein S is saturation.
$V = \dfrac{ \max(R,G,B) }{ 255 }$  (4)
Wherein V is lightness.
The RGB color is converted into YCbCr color using formula (5), and the YCbCr standard skin color threshold range is set to Y > 20, 135 < Cr < 180, 85 < Cb < 135.
$\begin{cases} Y = 0.299R + 0.587G + 0.114B \\ C_b = (B - Y) \times 0.564 + 128 \\ C_r = (R - Y) \times 0.713 + 128 \end{cases}$  (5)
Where Y is the luminance component, Cb is the blue chrominance component, and Cr is the red chrominance component.
(c) Constraining the range of the face center position.
Twenty frames of images are extracted from the collected video sequence, and the interval g between frames is calculated using formula (6):
$g = \dfrac{t \times f}{20}$  (6)
Wherein t is the length of the shot video, and f is the number of frames per second of the shot video.
Secondly, after the extracted 20 frames of images are detected through an AdaBoost algorithm and a skin color model, the face coordinates obtained through detection are stored, and the average value of the center coordinates of different faces is calculated by using formulas (7) and (8)
$x_c = \dfrac{x_r - x_l}{2}$  (7)
$y_c = \dfrac{y_r - y_l}{2}$  (8)
where $(x_r, y_r)$ are the coordinates of the lower right corner of the detected face, and $(x_l, y_l)$ are the coordinates of the upper left corner.
Comparing the calculated average value with the face center coordinates in the actual image to obtain the error range of the face center coordinates, and adding constraint conditions according to the error range
$x_c - m \le x_{c\_real} \le x_c + m$  (9)
$y_c - n \le y_{c\_real} \le y_c + n$  (10)
where $(x_{c\_real}, y_{c\_real})$ are the face center coordinates obtained by actual detection.
(d) Extracting and processing the detected faces to complete the establishment of the face database.
Each detected face is extracted and converted into a 28 × 28 pixel grayscale face image. The processed face images are stored according to the different constraint conditions, completing the establishment of the actual attendance face database.
(e) Training the model.
The 28 × 28 pixel grayscale face images in the face database are input as training data into the deep convolutional neural network model for repeated iterative training, completing the training of the model. The specific training process is as follows:
the training process is divided into two steps: forward propagation and backward propagation.
The purpose of forward propagation is to feed the training data into the network to obtain the excitation response. This stage comprises the convolutional layers and the downsampling layers.
① First, the convolutional layers are processed: the convolution features extracted by convolutional layer l are obtained by applying formula (11) at layer l.
$x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \right)$  (11)
where $x_j^l$ is the convolution feature of convolutional layer $l$, $M_j$ is the selected set of input feature maps, $k_{ij}^l$ is a convolution kernel on convolutional layer $l$, and $b_j^l$ is the bias on convolutional layer $l$; the convolution feature of layer $l$ is obtained through the activation function $f$.
After the convolution features of the convolutional layer are obtained, formula (12) is applied to downsample them, i.e. to aggregate statistics of the features at different positions.
$x_j^l = f\left( \beta_j^l \, \mathrm{down}(x_j^{l-1}) + b_j^l \right)$  (12)
where down(·) is the downsampling function, β is the multiplicative bias, and b is the additive bias.
② Backward propagation: the weights and biases are adjusted by minimizing the residual. This stage likewise covers the convolutional layers and the downsampling layers.
For a convolutional layer whose next layer is a downsampling layer, the residual is calculated using formula (13):
$\delta_j^l = \beta_j^{l+1} \left( f'(u_j^l) \circ \mathrm{up}(\delta_j^{l+1}) \right)$  (13)
where up(·) is an upsampling function and $\circ$ denotes the multiplication of corresponding elements in the matrices.
where
$u^l = W^l x^{l-1} + b^l$  (14)
$x^l = f(u^l)$  (15)
wherein f is an activation function.
From the residual $\delta_j^l$ obtained above, the gradient of the bias b is calculated using formula (16).
$\dfrac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta_j^l \right)_{uv}$  (16)
In the formula, u and v represent coordinate values in the feature map.
Define $(p_i^{l-1})_{uv}$ as the $n \times n$ pixel block that is multiplied element by element during convolution; the convolution kernel gradient is then calculated by applying formula (17).
$\dfrac{\partial E}{\partial k_{ij}^l} = \sum_{u,v} \left( \delta_j^l \right)_{uv} \left( p_i^{l-1} \right)_{uv}$  (17)
The value at position $(u, v)$ of the output convolution feature map is the result of multiplying, element by element, the $n \times n$ pixel block at position $(u, v)$ in the previous layer with the convolution kernel $k_{ij}$.
Next, the downsampling layers are processed. When the layer following a downsampling layer is a convolutional layer, the downsampled feature map residual is calculated by applying formula (18):
$\delta_j^l = f'(u_j^l) \circ \mathrm{conv2}\left( \delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \mathrm{'full'} \right)$  (18)
where $\mathrm{rot180}(k_j^{l+1})$ denotes the convolution kernel matrix rotated by 180°, i.e. with its elements swapped across the diagonal; conv2 is a full convolution function, and 'full' indicates that vacant positions are padded with 0 in the full convolution.
After the residual is obtained, the gradient of the bias b is calculated by applying formula (19).
$\dfrac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta_j^l \right)_{uv}$  (19)
In the formula, u and v represent coordinate values in the feature map.
Define $d_j^l = \mathrm{down}(x_j^{l-1})$; using the residual found, the gradient of the multiplicative bias $\beta$ is calculated by applying formula (20):
$\dfrac{\partial E}{\partial \beta_j} = \sum_{u,v} \left( \delta_j^l \circ d_j^l \right)_{uv}$  (20)
③ The output layer classifies the features. It is in essence a classifier; a softmax classifier is adopted to handle the multi-class problem.
(f) The trained model is tested using the test data.
Acquisition of test data.
In application, images are shot of all the collected persons. The faces appearing in the images are detected and processed into 28 × 28 pixel grayscale face images, and the processed images are stored as test data.
Secondly, the test data are input into the trained model; the model recognizes the input test data and outputs the corresponding face labels from the training set, completing face recognition.
(g) Multiple experiments are performed by adjusting the number of layers of the convolutional neural network, the number of convolution kernels in each layer of the network, and the learning rate, i.e. repeating step (e) and step (f); the experimental results are compared, the model parameters with the highest recognition rate are selected, and the parameters and the trained model are saved.
The invention has the beneficial effects that: the method uses the AdaBoost algorithm and a skin color model to perform multi-target detection and extraction. Only one video needs to be shot of all the faces participating in attendance, and the faces in the video sequence are detected and extracted to complete the establishment of the face database. This solves the problem that face data acquisition in actual attendance is time-consuming, labor-intensive and hard to perform in a unified way, making it easier to acquire massive face data. In addition, the face recognition method based on deep learning builds on the deep convolutional neural network LeNet-5: a simplified LeNet-5 model is applied to learn the face features of different scenes in the face database, and new feature representations are obtained through multilayer nonlinear transformation. The new features remove intra-class variations such as illumination, noise, posture and expression as far as possible, and retain the inter-class variations produced by different identities, thereby improving the recognition rate of the method in actual complex scenes.
The present invention will be described in detail with reference to the following embodiments.
Detailed Description
The classroom attendance checking method based on the multi-face data acquisition strategy and deep learning specifically comprises the following steps:
1. Acquiring face data.
The method completes face data acquisition by shooting video of the persons to be collected, obtaining a video sequence of about 30 seconds.
During video shooting, the collector formulates a series of rules to simulate the changes the face may undergo in actual attendance, including expression changes such as smiling and frowning, and action changes such as opening the mouth, raising the head, lowering the head, and changing the face orientation. The persons to be collected change expressions and actions during shooting according to the collector's instructions.
2. Multi-face detection using the AdaBoost algorithm combined with a skin color model.
The invention combines the AdaBoost algorithm with a skin color model: the face position is located by the AdaBoost algorithm, and the skin color model is then used to verify the candidate face, which greatly reduces the false detection rate during face detection. The main implementation method comprises the following steps:
firstly, a classifier for face detection is generated by using an Adaboost algorithm, and preliminary face detection is carried out.
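As an illustration of this step, OpenCV's pretrained Haar cascade, which is itself an AdaBoost-trained classifier, can stand in for the classifier the patent trains; the detector parameters below are tuning assumptions, not values from the patent.

```python
import cv2

def detect_faces(image_bgr):
    """Preliminary multi-face detection with an AdaBoost-trained cascade;
    returns candidate face rectangles as (x, y, w, h)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are assumed tuning values.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```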
Secondly, the preliminarily determined face regions are checked with the skin color model: the pixels in the image are compared against a standard skin color to distinguish the skin regions from the non-skin regions. Three color spaces are used when setting the standard skin color range: the RGB color space, the HSV color space, and the YCbCr color space.
Two RGB standard skin color models are set. Threshold range of model one: G > 40, B > 20, R > G, R > B, MAX(R,G,B) - MIN(R,G,B) > 15; threshold range of model two: R > 220, |R - G| < 15, R > G, R > B.
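A minimal per-pixel sketch of these two RGB models, assuming 8-bit channel values:

```python
def is_skin_rgb(r, g, b):
    """A pixel is skin-colored if it satisfies either RGB model above."""
    model_one = (g > 40 and b > 20 and r > g and r > b
                 and max(r, g, b) - min(r, g, b) > 15)
    model_two = r > 220 and abs(r - g) < 15 and r > g and r > b
    return model_one or model_two
```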
The RGB color is converted into HSV color using formulas (1), (2), (3) and (4), and the HSV standard skin color threshold range is set to 0 < H < 50, 0.23 < S < 0.68.
$H = \begin{cases} H_i, & B \le G \\ 360^\circ - H_i, & B > G \end{cases}$  (1)
In the formula, H represents a hue. Wherein,
$H_i = \arccos\left( \dfrac{ \frac{1}{2}\left[ (R-G) + (R-B) \right] }{ \sqrt{ (R-G)^2 + (R-B)(G-B) } } \right)$  (2)
where R is the value of the red channel, G is the value of the green channel, and B is the value of the blue channel.
$S = \dfrac{ \max(R,G,B) - \min(R,G,B) }{ \max(R,G,B) }$  (3)
Wherein S is saturation.
$V = \dfrac{ \max(R,G,B) }{ 255 }$  (4)
Wherein V is lightness.
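A sketch of the HSV conversion and check, following formulas (1)-(4) as reconstructed above; the arccos form of formula (2) is the standard hue formula and is assumed here:

```python
import math

def rgb_to_hsv_patent(r, g, b):
    """Convert an RGB pixel (8-bit channels) to (H, S, V) per formulas (1)-(4);
    H is in degrees."""
    mx, mn = max(r, g, b), min(r, g, b)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    # Guard the arccos argument against division by zero and float drift.
    h_i = math.degrees(math.acos(max(-1.0, min(1.0, num / den)))) if den else 0.0
    h = h_i if b <= g else 360.0 - h_i        # formula (1)
    s = (mx - mn) / mx if mx else 0.0         # formula (3)
    v = mx / 255.0                            # formula (4)
    return h, s, v

def is_skin_hsv(r, g, b):
    """HSV skin thresholds: 0 < H < 50 and 0.23 < S < 0.68."""
    h, s, _ = rgb_to_hsv_patent(r, g, b)
    return 0 < h < 50 and 0.23 < s < 0.68
```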
The RGB color is converted into YCbCr color using formula (5), and the YCbCr standard skin color threshold range is set to Y > 20, 135 < Cr < 180, 85 < Cb < 135.
$\begin{cases} Y = 0.299R + 0.587G + 0.114B \\ C_b = (B - Y) \times 0.564 + 128 \\ C_r = (R - Y) \times 0.713 + 128 \end{cases}$  (5)
Where Y is the luminance component, Cb is the blue chrominance component, and Cr is the red chrominance component.
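A sketch of the YCbCr check from formula (5), together with one way of combining the three color-space tests using the checks sketched above; the patent does not state how the three models are combined, so requiring all three to pass is an assumption:

```python
def is_skin_ycbcr(r, g, b):
    """Convert a pixel with formula (5) and test the YCbCr skin thresholds."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) * 0.564 + 128
    cr = (r - y) * 0.713 + 128
    return y > 20 and 135 < cr < 180 and 85 < cb < 135

def is_skin(r, g, b):
    # Assumed combination rule: a pixel counts as skin only if it passes
    # the RGB, HSV and YCbCr checks defined in the sketches above.
    return is_skin_rgb(r, g, b) and is_skin_hsv(r, g, b) and is_skin_ycbcr(r, g, b)
```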
3. Constraining the range of the face center position.
When the face data is actually collected, the expression and the action of the face are specified by the collector, so that the amplitude of the face change in the video is small, and the appearance range of the face is easy to determine.
Twenty frames of images are extracted from the collected video sequence, and the interval g between frames is calculated using formula (6):
$g = \dfrac{t \times f}{20}$  (6)
Wherein t is the length of the shot video, and f is the number of frames per second of the shot video.
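A sketch of the frame sampling with OpenCV; it assumes the whole video is sampled so that g from formula (6) spaces the 20 frames evenly:

```python
import cv2

def extract_frames(video_path, n_frames=20):
    """Sample n_frames frames at interval g = t * f / 20 (formula (6))."""
    cap = cv2.VideoCapture(video_path)
    f = cap.get(cv2.CAP_PROP_FPS)                    # frames per second
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    t = total / f                                    # video length in seconds
    g = int(t * f / n_frames)                        # formula (6)
    frames = []
    for i in range(n_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * g)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```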
Secondly, after the extracted 20 frames of images are detected through an AdaBoost algorithm and a skin color model, the coordinates of the detected human face are stored, and the average value of the central coordinates of different human faces is calculated by using formulas (7) and (8)
$x_c = \dfrac{x_r - x_l}{2}$  (7)
$y_c = \dfrac{y_r - y_l}{2}$  (8)
where $(x_r, y_r)$ are the coordinates of the lower right corner of the detected face, and $(x_l, y_l)$ are the coordinates of the upper left corner.
Comparing the calculated average value with the face center coordinates in the actual image to obtain the error range of the face center coordinates, and adding constraint conditions according to the error range
$x_c - m \le x_{c\_real} \le x_c + m$  (9)
$y_c - n \le y_{c\_real} \le y_c + n$  (10)
where $(x_{c\_real}, y_{c\_real})$ are the face center coordinates obtained by actual detection.
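A sketch of the averaging and constraint check of formulas (7)-(10), keeping the patent's written form of (7)-(8); the margins m and n are the empirically measured error ranges:

```python
def center_of(box):
    """Formulas (7)-(8): box is (xl, yl, xr, yr), the upper-left and
    lower-right corners of a detected face."""
    xl, yl, xr, yr = box
    return (xr - xl) / 2, (yr - yl) / 2

def make_center_filter(boxes, m, n):
    """Average the centers of the faces detected in the 20 frames and return
    a predicate enforcing the error-range constraints (9)-(10)."""
    centers = [center_of(b) for b in boxes]
    xc = sum(c[0] for c in centers) / len(centers)
    yc = sum(c[1] for c in centers) / len(centers)
    def accept(box):
        xc_real, yc_real = center_of(box)
        return xc - m <= xc_real <= xc + m and yc - n <= yc_real <= yc + n
    return accept
```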
4. Extracting and processing the detected faces to complete the establishment of the face database.
Each detected face is extracted and converted into a 28 × 28 pixel grayscale image. The processed face images are stored according to the different constraint conditions, completing the establishment of the actual attendance face database.
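A sketch of this extraction step with OpenCV, assuming detections in (x, y, width, height) form:

```python
import cv2

def face_to_sample(image_bgr, box):
    """Crop a detected face and convert it to the 28 x 28 grayscale image
    stored in the attendance face database."""
    x, y, w, h = box
    face = image_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (28, 28))
```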
5. Training the model.
The 28 × 28 pixel face images in the face database are input as training data into the deep convolutional neural network model for repeated iterative training, completing the training of the model. The specific training process is as follows:
the training process is mainly divided into two steps: forward propagation and backward propagation.
The purpose of forward propagation is to feed the training data into the network to obtain the excitation response. This stage comprises the convolutional layers and the downsampling layers.
① First, the convolutional layers are processed: the convolution features extracted by convolutional layer l are obtained by applying formula (11) at layer l.
$x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \right)$  (11)
where $x_j^l$ is the convolution feature of convolutional layer $l$, $M_j$ is the selected set of input feature maps, $k_{ij}^l$ is a convolution kernel on convolutional layer $l$, and $b_j^l$ is the bias on convolutional layer $l$; the convolution feature of layer $l$ is obtained through the activation function $f$.
After the convolution features of the convolutional layer are obtained, formula (12) is applied to downsample them, i.e. to aggregate statistics of the features at different positions. The resulting aggregated statistical features not only have a much lower dimensionality than using all of the extracted features, but also improve the results and are less prone to overfitting.
$x_j^l = f\left( \beta_j^l \, \mathrm{down}(x_j^{l-1}) + b_j^l \right)$  (12)
where down(·) is the downsampling function, β is the multiplicative bias, and b is the additive bias.
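A NumPy sketch of one forward pass through a convolutional layer (formula (11)) and a downsampling layer (formula (12)); the sigmoid activation, the 2 x 2 mean pooling, and taking M_j to be all input maps are assumptions, since the patent does not fix them:

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def conv_layer(prev_maps, kernels, biases, f=sigmoid):
    """Formula (11): output map j sums the convolutions of the input maps
    with kernels k_ij, adds bias b_j, then applies f.
    kernels[j][i] is the 2-D kernel linking input map i to output map j."""
    out = []
    for j, b_j in enumerate(biases):
        u = sum(convolve2d(x_i, kernels[j][i], mode="valid")
                for i, x_i in enumerate(prev_maps))
        out.append(f(u + b_j))
    return out

def down(x):
    """The down(.) of formula (12): mean over non-overlapping 2 x 2 blocks."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def downsample_layer(prev_maps, betas, biases, f=sigmoid):
    """Formula (12): x_j = f(beta_j * down(x_j_prev) + b_j)."""
    return [f(beta * down(x) + b) for x, beta, b in zip(prev_maps, betas, biases)]
```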
② Backward propagation: the weights and biases are adjusted by minimizing the residual. This stage likewise covers the convolutional layers and the downsampling layers.
For a convolutional layer whose next layer is a downsampling layer, the residual is calculated using formula (13):
$\delta_j^l = \beta_j^{l+1} \left( f'(u_j^l) \circ \mathrm{up}(\delta_j^{l+1}) \right)$  (13)
where up(·) is an upsampling function and $\circ$ denotes the multiplication of corresponding elements in the matrices.
where
$u^l = W^l x^{l-1} + b^l$  (14)
$x^l = f(u^l)$  (15)
wherein f is an activation function.
From the residual $\delta_j^l$ obtained above, the gradient of the bias b can be calculated by applying formula (16).
$\dfrac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta_j^l \right)_{uv}$  (16)
In the formula, u and v represent coordinate values in the feature map.
Define $(p_i^{l-1})_{uv}$ as the $n \times n$ pixel block that is multiplied element by element during convolution; the convolution kernel gradient is then calculated by applying formula (17).
$\dfrac{\partial E}{\partial k_{ij}^l} = \sum_{u,v} \left( \delta_j^l \right)_{uv} \left( p_i^{l-1} \right)_{uv}$  (17)
The value at position $(u, v)$ of the output convolution feature map is the result of multiplying, element by element, the $n \times n$ pixel block at position $(u, v)$ in the previous layer with the convolution kernel $k_{ij}$.
Next, the downsampling layers are processed. When the layer following a downsampling layer is a convolutional layer, the downsampled feature map residual is calculated by applying formula (18):
$\delta_j^l = f'(u_j^l) \circ \mathrm{conv2}\left( \delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \mathrm{'full'} \right)$  (18)
where $\mathrm{rot180}(k_j^{l+1})$ denotes the convolution kernel matrix rotated by 180°, i.e. with its elements swapped across the diagonal; conv2 is a full convolution function, and 'full' indicates that vacant positions are padded with 0 in the full convolution.
After the residual is obtained, the gradient of the bias b is again calculated by applying formula (19).
$\dfrac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta_j^l \right)_{uv}$  (19)
In the formula, u and v represent coordinate values in the feature map.
Define $d_j^l = \mathrm{down}(x_j^{l-1})$; using the residual found, the gradient of the multiplicative bias $\beta$ is calculated by applying formula (20):
$\dfrac{\partial E}{\partial \beta_j} = \sum_{u,v} \left( \delta_j^l \circ d_j^l \right)_{uv}$  (20)
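A NumPy sketch of the backward-propagation quantities in formulas (13) and (16)-(19), under the same sigmoid and 2 x 2 pooling assumptions as the forward-pass sketch:

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid_prime(u):
    s = 1.0 / (1.0 + np.exp(-u))
    return s * (1.0 - s)

def up(d):
    """The up(.) of formula (13): replicate each element over a 2 x 2 block."""
    return np.kron(d, np.ones((2, 2)))

def conv_layer_delta(u_j, delta_next, beta_next):
    """Formula (13): residual of a conv layer whose next layer downsamples."""
    return beta_next * (sigmoid_prime(u_j) * up(delta_next))

def down_layer_delta(u_j, delta_next, k_next):
    """Formula (18): residual of a downsampling layer whose next layer
    convolves; rot180 becomes np.rot90(k, 2) and 'full' pads with zeros."""
    return sigmoid_prime(u_j) * convolve2d(delta_next, np.rot90(k_next, 2), mode="full")

def bias_grad(delta_j):
    """Formulas (16)/(19): sum the residual over all (u, v) positions."""
    return delta_j.sum()

def kernel_grad(x_prev_i, delta_j):
    """Formula (17): accumulate residual-weighted input patches, written as
    a valid convolution with 180-degree rotations."""
    return np.rot90(convolve2d(x_prev_i, np.rot90(delta_j, 2), mode="valid"), 2)
```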
③ The output layer classifies the features. It is in essence a classifier; a softmax classifier is adopted to handle the multi-class problem.
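A minimal sketch of the softmax decision at the output layer:

```python
import numpy as np

def softmax(z):
    """Turn output-layer activations into class probabilities."""
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

def predict_label(z):
    """The predicted face label is the class with the largest probability."""
    return int(np.argmax(softmax(z)))
```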
6. The trained model is tested using the test data.
Acquisition of test data
In practice, images are taken of all students during class. The faces appearing in the pictures are detected and processed into 28 × 28 pixel grayscale face images, and the processed images are stored as test data.
Secondly, the test data are input into the trained model; the model recognizes the input test data and outputs the corresponding face labels from the training set, completing face recognition.
7. Multiple experiments are performed by adjusting parameters such as the number of layers of the convolutional neural network, the number of convolution kernels in each layer of the network, and the learning rate, i.e. repeating steps 5 and 6; the experimental results are compared, the model parameters with the highest recognition rate are selected, and the parameters and the trained model are saved.
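A sketch of this parameter search as a simple grid loop; train_model and evaluate are caller-supplied placeholders for steps 5 and 6, and the candidate grids are illustrative assumptions rather than values from the patent:

```python
import itertools

def select_best_model(train_model, evaluate, train_data, test_data):
    """Repeat steps 5 and 6 over a parameter grid and keep the best model."""
    best_params, best_model, best_rate = None, None, -1.0
    for n_layers, n_kernels, lr in itertools.product(
            (2, 3), (6, 12, 16), (0.1, 0.5, 1.0)):    # assumed candidates
        model = train_model(train_data, n_layers, n_kernels, lr)   # step 5
        rate = evaluate(model, test_data)                          # step 6
        if rate > best_rate:
            best_params, best_model, best_rate = (n_layers, n_kernels, lr), model, rate
    return best_params, best_model, best_rate
```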
In summary, the invention provides a classroom attendance method based on a multi-face data acquisition strategy and deep learning. The multi-face data acquisition strategy solves the problem that massive face data are difficult to acquire one by one in actual attendance, greatly improving face acquisition efficiency. Meanwhile, a large number of face images from complex real environments are used to train a deep learning model, which learns new features from them; these new features remove intra-class variations such as illumination, noise, posture and expression as far as possible while retaining the inter-class variations produced by different identities, thereby solving the problem of the poor face recognition rate of traditional face recognition methods in actual complex scenes.

Claims (1)

1. A classroom attendance checking method based on a multi-face data acquisition strategy and deep learning is characterized by comprising the following steps:
(a) acquiring face data;
shooting a 30-second video sequence of the faces of the persons to be collected; during video shooting, the collector formulates a series of rules to simulate the changes the face may undergo in actual attendance, including expression changes such as smiling and frowning, and action changes such as opening the mouth, raising the head, lowering the head, and changing the face orientation; the persons to be collected perform these expression and action changes during shooting as required by the collector;
(b) performing multi-face detection using the AdaBoost algorithm combined with a skin color model;
combining the AdaBoost algorithm with a skin color model: locating the face position through the AdaBoost algorithm, and then performing a skin color check on the candidate face using the skin color model, comprising the following steps:
firstly, generating a classifier for face detection by using an Adaboost algorithm, and carrying out primary face detection;
checking the preliminarily determined face regions with the skin color model, comparing the pixels in the image against a standard skin color to distinguish the skin regions from the non-skin regions; three color spaces are used when setting the standard skin color range: the RGB color space, the HSV color space, and the YCbCr color space;
setting two RGB standard skin color models; threshold range of model one: G > 40, B > 20, R > G, R > B, MAX(R,G,B) - MIN(R,G,B) > 15; threshold range of model two: R > 220, |R - G| < 15, R > G, R > B;
converting the RGB color into HSV color using formulas (1), (2), (3) and (4), and setting the HSV standard skin color threshold range to 0 < H < 50, 0.23 < S < 0.68;
$H = \begin{cases} H_i, & B \le G \\ 360^\circ - H_i, & B > G \end{cases}$  (1)
wherein H is a hue; wherein,
$H_i = \arccos\left( \dfrac{ \frac{1}{2}\left[ (R-G) + (R-B) \right] }{ \sqrt{ (R-G)^2 + (R-B)(G-B) } } \right)$  (2)
wherein R is the value of the red channel, G is the value of the green channel, and B is the value of the blue channel;
$S = \dfrac{ \max(R,G,B) - \min(R,G,B) }{ \max(R,G,B) }$  (3)
wherein S is saturation;
$V = \dfrac{ \max(R,G,B) }{ 255 }$  (4)
wherein V is lightness;
converting the RGB color into YCbCr color using formula (5), and setting the YCbCr standard skin color threshold range to Y > 20, 135 < Cr < 180, 85 < Cb < 135;
$\begin{cases} Y = 0.299R + 0.587G + 0.114B \\ C_b = (B - Y) \times 0.564 + 128 \\ C_r = (R - Y) \times 0.713 + 128 \end{cases}$  (5)
wherein Y is a luminance component, Cb is a blue chrominance component, and Cr is a red chrominance component;
(c) constraining the range of the face center position;
extracting 20 frames of images from the collected video sequence, and calculating the interval g between frames using formula (6):
$g = \dfrac{t \times f}{20}$  (6)
Wherein t is the length of the shot video, and f is the frame number of the shot video per second;
secondly, after the extracted 20 frames of images are detected through an AdaBoost algorithm and a skin color model, the face coordinates obtained through detection are stored, and the average value of the center coordinates of different faces is calculated by using formulas (7) and (8)
$x_c = \dfrac{x_r - x_l}{2}$  (7)
$y_c = \dfrac{y_r - y_l}{2}$  (8)
in the formula, $(x_r, y_r)$ are the coordinates of the lower right corner of the detected face, and $(x_l, y_l)$ are the coordinates of the upper left corner;
comparing the calculated average value with the face center coordinates in the actual image to obtain the error range of the face center coordinates, and adding constraint conditions according to the error range
$x_c - m \le x_{c\_real} \le x_c + m$  (9)
$y_c - n \le y_{c\_real} \le y_c + n$  (10)
in the formula, $(x_{c\_real}, y_{c\_real})$ are the face center coordinates obtained by actual detection;
(d) extracting and processing the detected face to complete the establishment of a face database;
extracting each detected face and converting it into a 28 × 28 pixel grayscale face image; storing the processed face images according to the different constraint conditions to complete the establishment of the actual attendance face database;
(e) training a model;
inputting the 28 × 28 pixel grayscale face images in the face database as training data into the deep convolutional neural network model for repeated iterative training to finish the training of the model; the specific training process is as follows:
the training process is divided into two steps: forward propagation and backward propagation;
the purpose of forward propagation is to feed the training data into the network to obtain the excitation response; this stage comprises the convolutional layers and the downsampling layers;
① processing the convolutional layers: applying formula (11) at convolutional layer l to obtain the convolution features extracted by layer l;
$x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \right)$  (11)
in the formula, $x_j^l$ is the convolution feature of convolutional layer $l$, $M_j$ is the selected set of input feature maps, $k_{ij}^l$ is a convolution kernel on convolutional layer $l$, and $b_j^l$ is the bias on convolutional layer $l$; the convolution feature of layer $l$ is obtained through the activation function $f$;
after the convolution characteristics of the convolution layer are obtained, the formula (12) is applied to carry out downsampling processing on the convolution characteristics, namely, aggregation statistics is carried out on the characteristics at different positions;
$x_j^l = f\left( \beta_j^l \, \mathrm{down}(x_j^{l-1}) + b_j^l \right)$  (12)
wherein down(·) is a downsampling function, β is a multiplicative bias, and b is an additive bias;
② backward propagation: adjusting the weights and biases by minimizing the residual; this stage likewise covers the convolutional layers and the downsampling layers;
for a convolutional layer whose next layer is a downsampling layer, calculating the residual using formula (13):
$\delta_j^l = \beta_j^{l+1} \left( f'(u_j^l) \circ \mathrm{up}(\delta_j^{l+1}) \right)$  (13)
wherein up(·) is an upsampling function and $\circ$ represents the multiplication of corresponding elements in a matrix;
wherein
$u^l = W^l x^{l-1} + b^l$  (14)
$x^l = f(u^l)$  (15)
wherein f is an activation function;
from the obtained residual $\delta_j^l$, calculating the gradient of the bias b by applying formula (16);
$\dfrac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta_j^l \right)_{uv}$  (16)
in the formula, u and v represent coordinate values in the feature map;
defining $(p_i^{l-1})_{uv}$ as the $n \times n$ pixel block multiplied element by element during convolution, and calculating the convolution kernel gradient by applying formula (17);
$\dfrac{\partial E}{\partial k_{ij}^l} = \sum_{u,v} \left( \delta_j^l \right)_{uv} \left( p_i^{l-1} \right)_{uv}$  (17)
the value at position $(u, v)$ of the output convolution feature map is the result of multiplying, element by element, the $n \times n$ pixel block at position $(u, v)$ in the previous layer with the convolution kernel $k_{ij}$;
processing the downsampling layers: when the layer following a downsampling layer is a convolutional layer, calculating the downsampled feature map residual by applying formula (18):
$\delta_j^l = f'(u_j^l) \circ \mathrm{conv2}\left( \delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \mathrm{'full'} \right)$  (18)
in the formula, $\mathrm{rot180}(k_j^{l+1})$ represents the convolution kernel matrix rotated by 180°, i.e. with the matrix elements exchanged across the diagonal; conv2 is a full convolution function, and 'full' indicates that vacant positions are padded with 0 in the full convolution;
after the residual is obtained, calculating the gradient of the bias b by applying formula (19);
$\dfrac{\partial E}{\partial b_j} = \sum_{u,v} \left( \delta_j^l \right)_{uv}$  (19)
in the formula, u and v represent coordinate values in the feature map;
defining $d_j^l = \mathrm{down}(x_j^{l-1})$, and calculating the gradient of the multiplicative bias $\beta$ from the residual found by applying formula (20): $\dfrac{\partial E}{\partial \beta_j} = \sum_{u,v} \left( \delta_j^l \circ d_j^l \right)_{uv}$  (20)
③ classifying the features in the output layer; this is in essence a classifier, and a softmax classifier is adopted to handle the multi-class problem;
(f) testing the trained model by using the test data;
firstly, acquiring test data;
in application, images are shot of all the collected persons; the faces appearing in the images are detected and processed into 28 × 28 pixel grayscale face images, and the processed images are stored as test data;
inputting the test data into the trained model, which identifies the input test data and outputs the corresponding face labels from the training set, completing face recognition;
(g) performing multiple experiments by adjusting the number of layers of the convolutional neural network, the number of convolution kernels in each layer of the network, and the learning rate, i.e. repeating step (e) and step (f); comparing the experimental results, selecting the model parameters with the highest recognition rate, and saving the parameters and the trained model.
CN201610504632.0A 2016-06-30 2016-06-30 Classroom attendance checking method based on multi-face data acquisition strategy and deep learning Active CN106204779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610504632.0A CN106204779B (en) 2016-06-30 2016-06-30 Classroom attendance checking method based on multi-face data acquisition strategy and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610504632.0A CN106204779B (en) 2016-06-30 2016-06-30 Classroom attendance checking method based on multi-face data acquisition strategy and deep learning

Publications (2)

Publication Number Publication Date
CN106204779A true CN106204779A (en) 2016-12-07
CN106204779B CN106204779B (en) 2018-08-31

Family

ID=57462734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610504632.0A Active CN106204779B (en) 2016-06-30 2016-06-30 Classroom attendance checking method based on multi-face data acquisition strategy and deep learning

Country Status (1)

Country Link
CN (1) CN106204779B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778589A (en) * 2016-12-09 2017-05-31 厦门大学 A kind of masked method for detecting human face of robust based on modified LeNet
CN106960185A (en) * 2017-03-10 2017-07-18 陕西师范大学 The Pose-varied face recognition method of linear discriminant depth belief network
CN107292278A (en) * 2017-06-30 2017-10-24 哈尔滨理工大学 A kind of face identification device and its recognition methods based on Adaboost algorithm
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks
CN108228872A (en) * 2017-07-21 2018-06-29 北京市商汤科技开发有限公司 Facial image De-weight method and device, electronic equipment, storage medium, program
CN108416797A (en) * 2018-02-27 2018-08-17 鲁东大学 A kind of method, equipment and the storage medium of detection Behavioral change
CN108427921A (en) * 2018-02-28 2018-08-21 辽宁科技大学 A kind of face identification method based on convolutional neural networks
CN108830980A (en) * 2018-05-22 2018-11-16 重庆大学 Security protection integral intelligent robot is received in Study of Intelligent Robot Control method, apparatus and attendance
CN108875654A (en) * 2018-06-25 2018-11-23 深圳云天励飞技术有限公司 A kind of face characteristic acquisition method and device
CN109460974A (en) * 2018-10-29 2019-03-12 广州皓云原智信息科技有限公司 A kind of attendance checking system based on gesture recognition
CN109766813A (en) * 2018-12-31 2019-05-17 陕西师范大学 Dictionary learning face identification method based on symmetrical face exptended sample
CN110263618A (en) * 2019-04-30 2019-09-20 阿里巴巴集团控股有限公司 The alternative manner and device of one seed nucleus body model
CN110276263A (en) * 2019-05-24 2019-09-24 长江大学 A kind of face identification system and recognition methods
CN110313894A (en) * 2019-04-15 2019-10-11 四川大学 Arrhythmia cordis sorting algorithm based on convolutional neural networks
CN110728225A (en) * 2019-10-08 2020-01-24 北京联华博创科技有限公司 High-speed face searching method for attendance checking
CN110852704A (en) * 2019-10-22 2020-02-28 佛山科学技术学院 Attendance checking method, system, equipment and medium based on dense micro face recognition
CN111507227A (en) * 2020-04-10 2020-08-07 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111814704A (en) * 2020-07-14 2020-10-23 陕西师范大学 Full convolution examination room target detection method based on cascade attention and point supervision mechanism
CN111881876A (en) * 2020-08-06 2020-11-03 桂林电子科技大学 Attendance checking method based on single-order anchor-free detection network
CN113450369A (en) * 2021-04-20 2021-09-28 广州铁路职业技术学院(广州铁路机械学校) Classroom analysis system and method based on face recognition technology

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102223520A (en) * 2011-04-15 2011-10-19 北京易子微科技有限公司 Intelligent face recognition video monitoring system and implementation method thereof
CN104573679A (en) * 2015-02-08 2015-04-29 天津艾思科尔科技有限公司 Deep learning-based face recognition system in monitoring scene

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102223520A (en) * 2011-04-15 2011-10-19 北京易子微科技有限公司 Intelligent face recognition video monitoring system and implementation method thereof
CN104573679A (en) * 2015-02-08 2015-04-29 天津艾思科尔科技有限公司 Deep learning-based face recognition system in monitoring scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
汪济民: "Research on Face Detection and Gender Recognition Based on Convolutional Neural Networks" (基于卷积神经网络的人脸检测和性别识别研究), China Master's Theses Full-text Database, Information Science and Technology Series (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
裴炤: "A Multi-face Detection Attendance System Based on AdaBoost + Skin Color Model" (基于AdaBoost+肤色模型的多人脸检测考勤系统), Electronic Products World (《电子产品世界》) *
赵男男: "Multi-pose Face Detection Based on the AdaBoost Algorithm and a Skin Color Model" (基于AdaBoost算法与肤色模型的多姿态人脸检测), Computer Engineering and Science (《计算机工程与科学》) *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778589A (en) * 2016-12-09 2017-05-31 厦门大学 A kind of masked method for detecting human face of robust based on modified LeNet
CN106960185B (en) * 2017-03-10 2019-10-25 陕西师范大学 The Pose-varied face recognition method of linear discriminant deepness belief network
CN106960185A (en) * 2017-03-10 2017-07-18 陕西师范大学 The Pose-varied face recognition method of linear discriminant depth belief network
CN107292278A (en) * 2017-06-30 2017-10-24 哈尔滨理工大学 A kind of face identification device and its recognition methods based on Adaboost algorithm
US11132581B2 (en) 2017-07-21 2021-09-28 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for face image deduplication and storage medium
CN108228872A (en) * 2017-07-21 2018-06-29 北京市商汤科技开发有限公司 Facial image De-weight method and device, electronic equipment, storage medium, program
WO2019015682A1 (en) * 2017-07-21 2019-01-24 北京市商汤科技开发有限公司 Dynamic facial image warehousing method and apparatus, electronic device, medium, and program
US11409983B2 (en) 2017-07-21 2022-08-09 Beijing Sensetime Technology Development Co., Ltd Methods and apparatuses for dynamically adding facial images into database, electronic devices and media
CN108228871A (en) * 2017-07-21 2018-06-29 北京市商汤科技开发有限公司 Facial image dynamic storage method and device, electronic equipment, medium, program
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks
CN108416797A (en) * 2018-02-27 2018-08-17 鲁东大学 A kind of method, equipment and the storage medium of detection Behavioral change
CN108427921A (en) * 2018-02-28 2018-08-21 辽宁科技大学 A kind of face identification method based on convolutional neural networks
CN108830980A (en) * 2018-05-22 2018-11-16 重庆大学 Security protection integral intelligent robot is received in Study of Intelligent Robot Control method, apparatus and attendance
CN108875654A (en) * 2018-06-25 2018-11-23 深圳云天励飞技术有限公司 A kind of face characteristic acquisition method and device
CN109460974B (en) * 2018-10-29 2021-09-07 广州皓云原智信息科技有限公司 Attendance system based on gesture recognition
CN109460974A (en) * 2018-10-29 2019-03-12 广州皓云原智信息科技有限公司 A kind of attendance checking system based on gesture recognition
CN109766813A (en) * 2018-12-31 2019-05-17 陕西师范大学 Dictionary learning face identification method based on symmetrical face exptended sample
CN110313894A (en) * 2019-04-15 2019-10-11 四川大学 Arrhythmia cordis sorting algorithm based on convolutional neural networks
CN110263618B (en) * 2019-04-30 2023-10-20 创新先进技术有限公司 Iteration method and device of nuclear body model
CN110263618A (en) * 2019-04-30 2019-09-20 阿里巴巴集团控股有限公司 The alternative manner and device of one seed nucleus body model
CN110276263B (en) * 2019-05-24 2021-05-14 长江大学 Face recognition system and recognition method
CN110276263A (en) * 2019-05-24 2019-09-24 长江大学 A kind of face identification system and recognition methods
CN110728225B (en) * 2019-10-08 2022-04-19 北京联华博创科技有限公司 High-speed face searching method for attendance checking
CN110728225A (en) * 2019-10-08 2020-01-24 北京联华博创科技有限公司 High-speed face searching method for attendance checking
CN110852704A (en) * 2019-10-22 2020-02-28 佛山科学技术学院 Attendance checking method, system, equipment and medium based on dense micro face recognition
CN110852704B (en) * 2019-10-22 2023-04-25 佛山科学技术学院 Attendance checking method, system, equipment and medium based on dense micro face recognition
CN111507227A (en) * 2020-04-10 2020-08-07 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111507227B (en) * 2020-04-10 2023-04-18 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111814704A (en) * 2020-07-14 2020-10-23 陕西师范大学 Full convolution examination room target detection method based on cascade attention and point supervision mechanism
CN111881876A (en) * 2020-08-06 2020-11-03 桂林电子科技大学 Attendance checking method based on single-order anchor-free detection network
CN111881876B (en) * 2020-08-06 2022-04-08 桂林电子科技大学 Attendance checking method based on single-order anchor-free detection network
CN113450369A (en) * 2021-04-20 2021-09-28 广州铁路职业技术学院(广州铁路机械学校) Classroom analysis system and method based on face recognition technology

Also Published As

Publication number Publication date
CN106204779B (en) 2018-08-31

Similar Documents

Publication Publication Date Title
CN106204779B (en) Classroom attendance checking method based on multi-face data acquisition strategy and deep learning
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN110399821B (en) Customer satisfaction acquisition method based on facial expression recognition
CN108268859A (en) A kind of facial expression recognizing method based on deep learning
CN107844797A (en) A kind of method of the milking sow posture automatic identification based on depth image
CN110827304B (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method
CN109359681A (en) A kind of field crop pest and disease disasters recognition methods based on the full convolutional neural networks of improvement
CN107633229A (en) Method for detecting human face and device based on convolutional neural networks
CN110929687B (en) Multi-user behavior recognition system based on key point detection and working method
CN101908153B (en) Method for estimating head postures in low-resolution image treatment
Aydogdu et al. Comparison of three different CNN architectures for age classification
CN106778785A (en) Build the method for image characteristics extraction model and method, the device of image recognition
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN111160194B (en) Static gesture image recognition method based on multi-feature fusion
CN111667400A (en) Human face contour feature stylization generation method based on unsupervised learning
CN110969171A (en) Image classification model, method and application based on improved convolutional neural network
CN113486700A (en) Facial expression analysis method based on attention mechanism in teaching scene
CN111507227B (en) Multi-student individual segmentation and state autonomous identification method based on deep learning
CN110163567A (en) Classroom roll calling system based on multitask concatenated convolutional neural network
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN109902613A (en) A kind of human body feature extraction method based on transfer learning and image enhancement
CN112883867A (en) Student online learning evaluation method and system based on image emotion analysis
CN104361357A (en) Photo set classification system and method based on picture content analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant