CN108319943B - Method for improving face recognition model performance under glasses-wearing conditions - Google Patents

Method for improving face recognition model performance under glasses-wearing conditions

Info

Publication number: CN108319943B
Application number: CN201810377373.9A
Authority: CN (China)
Prior art keywords: face, glasses, recognition model, training data, image
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108319943A
Inventor: 李继凯
Current assignee: Beijing Uwonders Technology Co ltd
Original assignee: Beijing Uwonders Technology Co ltd
Application filed by Beijing Uwonders Technology Co ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/165: Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

A method for improving the performance of a face recognition model when faces wear glasses comprises the following steps: existing glasses-free face training data is expanded into glasses-wearing face training data through an automatic glasses-adding algorithm for face images; the expanded glasses-wearing face training data is then used to train a face recognition model. By adding glasses to glasses-free face images, existing glasses-free training data can be expanded quickly and conveniently into glasses-wearing training data, improving the scale and diversity of the training set. Compared with building new glasses-wearing face training data, the method is low in cost, simple, convenient, fast and effective, and saves a large amount of labor and financial expense. At the same time, training the face recognition model on the expanded samples gives it better robustness to the interference of glasses and a better recognition effect on glasses-wearing faces, greatly improving overall recognition accuracy.

Description

Method for improving face recognition model performance under glasses-wearing conditions
Technical Field
The invention belongs to the field of computer vision, relates to a face detection and recognition method, and particularly relates to a method for improving the performance of a face recognition model under the condition of wearing glasses.
Background
In order to achieve a good recognition effect, a face recognition algorithm usually requires a large amount of training data, from which a high-performing face recognition model can be obtained. For a fixed model structure, the scale and diversity of the training data have a decisive influence on the final performance of the model. In practice, face recognition technology faces the following problems. First, most existing training data consists of faces without glasses, and a face recognition model trained on such data recognizes glasses-wearing faces poorly; at the same time, building large-scale glasses-wearing face training data from scratch consumes a large amount of manpower and financial resources and takes a long time. Second, it is common for a user not to wear glasses when enrolling for face recognition but to wear glasses at detection time; in this case the face recognition model established at enrollment generally cannot associate the same face with and without glasses, so the recognition effect is again poor.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method which is low in cost and simple in steps and can obviously improve the performance of a face recognition model under the condition of wearing glasses.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for improving the performance of a face recognition model under the condition of wearing glasses comprises the following steps: (1) aiming at the existing face training data without wearing glasses, the face training data is expanded into the face training data with glasses through a face image automatic glasses adding algorithm; (2) and training by using the extended glasses-wearing face training data to obtain a face recognition model.
Further, the automatic glasses adding algorithm for the face image in the step (1) comprises the following steps: (1.1) positioning the face position and the key point position of the five sense organs of a face image in face training data by using a face detection algorithm based on a cascade convolution network; (1.2) estimating a face attitude angle by using the positions of five sense organs; (1.3) carrying out angle transformation on the glasses material image by using the human face attitude angle; and (1.4) carrying out pixel-level local weighted summation on the transformed glasses material image and the face image to obtain the face image with glasses.
Further, when the transformed glasses material image and the face image are subjected to pixel-level local weighted summation in the step (1.4), the positions of the glasses material image are disturbed in the vertical direction, and a plurality of glasses-wearing face images with different glasses positions are obtained.
Further, in the step (1.4), a plurality of glasses-wearing face images with different lens reflection effects are obtained by performing the weighted summation with a plurality of different weights.
Further, the step (2) comprises the steps of: (2.1) classifying and calibrating the expanded glasses-wearing face training data; (2.2) establishing a face recognition model based on a residual deep convolutional network, and extracting deeper features of the image; (2.3) establishing a classification loss function based on a softmax function with an enlarged classification interval, for evaluating the classification error of the network; (2.4) optimizing the classification loss function by using error back propagation and stochastic gradient descent; and (2.5) through multiple iterative computations, the classification loss function decreases and converges, and the face recognition model is obtained.
Further, the classification and calibration method in step (2.1) includes: the face images of the same person share the same label, which differs from the labels of all other persons' face images.
Further, the classification cost function in step (2.3) is:

$$L=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

where n is the total number of training samples, s is the L2 norm of the feature, m is the offset term, y_i is the class of the i-th sample, θ_j is the angle between the feature vector x and the network weight vector W_j (the L2 norm of the weight vector W_j is normalized to 1), and $\hat{x}=s\,x/\lVert x\rVert$ is the feature vector normalized and rescaled to length s.
A face recognition method uses the face recognition model obtained by the above method for improving face recognition model performance under glasses-wearing conditions: features are computed for the face image to be recognized and evaluated for similarity against known face features; the known face with the greatest similarity, provided that similarity exceeds a threshold, is the recognition result; if the similarity to every known face is below the threshold, the face is judged to be a strange face.
According to the method for improving face recognition model performance under glasses-wearing conditions, adding glasses to glasses-free face images allows existing glasses-free training data to be expanded quickly and conveniently into glasses-wearing training data, improving the scale and diversity of the training set; compared with building new glasses-wearing face training data, the method is low in cost, simple, convenient, fast and effective, and saves a large amount of labor and financial expense. At the same time, training on the expanded samples gives the face recognition model obtained by deep convolutional network training better robustness to the interference of glasses and a better recognition effect on glasses-wearing faces, greatly improving overall recognition accuracy.
Drawings
FIG. 1 is a flow chart of the method for improving face recognition model performance under glasses-wearing conditions in the embodiment;
FIG. 2 is a network structure diagram of the first network of the cascaded convolutional network in the embodiment;
FIG. 3 is a network structure diagram of the fourth network of the cascaded convolutional network in the embodiment;
FIG. 4 is a network structure diagram of a residual unit in the embodiment.
Detailed Description
The following will further describe an embodiment of the method for improving the performance of the face recognition model under the condition of wearing glasses according to the present invention with reference to the accompanying drawings 1 to 4. The method for improving the performance of the face recognition model under the condition of wearing glasses is not limited to the description of the following embodiments.
As shown in fig. 1, the method for improving the performance of a face recognition model under glasses-wearing conditions mainly comprises the following two steps:
(1) aiming at the existing face training data without glasses, glasses are added to each face data in a training set one by one through a face image automatic glasses adding algorithm, so that the training data is expanded into face training data with glasses;
(2) and training by using the extended glasses-wearing face training data to obtain a face recognition model.
Wherein, the basic processing thought of the step (1) is as follows: firstly, carrying out key point detection on an existing face database by using a cascade convolution network, and acquiring the inclination angle of a face according to the position of eyes; carrying out affine transformation of the same inclination angle on the glasses material image, carrying out corresponding pixel weighted summation on the transformed glasses image and the face image, and adjusting the weight to obtain the face image of the glasses with different light reflection effects; and finally, combining the face image with the glasses with the original face database, wherein the face image of the same person is endowed with the same label, and the labels of different persons are different. The specific implementation mode of the step (1) is as follows:
and (1.1) positioning the face position and the key point position of the five sense organs of the face image in the face training data by using a face detection algorithm based on a cascade convolution network. The cascade convolution network adopted in the step comprises four convolution neural networks, and the first three networks are formed by connecting basic operation layers such as traditional convolution layers, pooling layers and the like in series.
Taking the first convolutional neural network as an example, the structure is shown in fig. 2, where data represents the input image, conv represents a convolution operation, prelu represents the PReLU activation operation, pool represents a pooling operation, and prob represents the output confidence. The conv4-2 layer outputs the coordinate position of the target, prob1 outputs the confidence that the target is a human face, and the activation function PReLU is:

$$\mathrm{PReLU}(x_i)=\begin{cases}x_i, & x_i>0\\\alpha_i x_i, & x_i\le 0\end{cases}$$

where x_i is the input to the activation function and α_i is a learnable positive coefficient.
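As a quick illustration, the PReLU activation above can be sketched in a few lines (a minimal numpy version, not the patent's implementation):

```python
import numpy as np

def prelu(x, alpha):
    """Parametric ReLU as defined above: passes x through unchanged when
    x > 0, and scales it by the learnable positive coefficient alpha
    otherwise. alpha may be a scalar or a per-channel array."""
    return np.where(x > 0, x, alpha * x)
```

For example, `prelu(np.array([2.0, -4.0]), 0.25)` gives `[2.0, -1.0]`.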
The fourth convolutional neural network estimates the positions of five key points on the basis of the face position output by the third convolutional neural network; its structure is shown in fig. 3, where data represents the whole face image, slice_data represents splitting the input data into 5 paths, conv represents a convolutional layer, PRelu represents a parameterized ReLU activation layer, pool represents a pooling layer, fc represents a fully connected layer, and concat represents concatenating the 5 paths of data. Through a Slice operation, the slice_data layer divides the whole face into five local patches according to the expected neighborhood positions of the five key points of an average face; each path then extracts local features through convolution, pooling and other operations. After extraction, the five local features are concatenated by the concat layer, and a fully connected layer further mines the correlations among them. Finally, the positions of the five key points are estimated by combining the local and global features.
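The slice-and-concat idea in the paragraph above can be sketched as follows; the patch size and the cropping policy are illustrative assumptions, since the patent does not give them:

```python
import numpy as np

def slice_face(face, keypoints, patch=12):
    """Split a face image into five local patches centered on the
    expected key-point positions and join them into one feature path,
    mimicking the slice_data / concat scheme described above."""
    half = patch // 2
    parts = []
    for cx, cy in keypoints:
        # Clamp the crop so it stays inside the image.
        x0 = min(max(cx - half, 0), face.shape[1] - patch)
        y0 = min(max(cy - half, 0), face.shape[0] - patch)
        parts.append(face[y0:y0 + patch, x0:x0 + patch])
    # concat: flatten each local patch and connect the five paths.
    return np.concatenate([p.ravel() for p in parts])
```

In the real network each path first passes through its own convolution and pooling layers; here the raw pixels stand in for the local features.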
And (1.2) estimating the face attitude angle by using the positions of the five sense organs. Let the left-eye key point have coordinates (x_1, y_1) and the right-eye key point (x_2, y_2); then the inclination angle θ of the face in the horizontal direction is:

$$\theta=\arctan\frac{y_2-y_1}{x_2-x_1}$$
and (1.3) carrying out angle transformation on the glasses material image by using the human face attitude angle. Assuming that the pixel coordinate of the original image is (x, y) and the corresponding pixel coordinate of the image after the corresponding angle transformation is (x ', y'), the two satisfy the following relationship:
Figure GDA0001666959030000043
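Steps (1.2) and (1.3) can be sketched together in numpy; rotating about the origin rather than about the glasses image center is a simplification:

```python
import numpy as np

def tilt_angle(left_eye, right_eye):
    """Horizontal tilt angle theta of the face from the two eye key
    points, as in step (1.2)."""
    (x1, y1), (x2, y2) = left_eye, right_eye
    return np.arctan2(y2 - y1, x2 - x1)

def rotate_points(points, theta):
    """Rotate 2-D pixel coordinates (rows of `points`) by theta,
    as in step (1.3)."""
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return points @ rot.T
```

Using `arctan2` rather than a plain quotient keeps the angle correct for all eye positions, including a vertical eye line.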
and (1.4) carrying out pixel-level local weighted summation on the transformed glasses material image and the face image to obtain the face image with glasses. The weighted sum weight is transformed and slight disturbance is carried out on the position of the glasses in the vertical direction, so that the diversity of samples is increased; the light reflection effect with different intensities can be obtained by adjusting the weight of the lens area, and the anti-interference capability of the model on the illumination influence can be increased.
The basic processing idea of the step (2) is as follows: first, data enhancement operations such as illumination change, left-right mirroring and chromaticity change are performed on the labeled face data; the data is then fed into a residual network whose loss function is a softmax loss with an enlarged class interval for training; and finally the face recognition model is obtained. The specific implementation of the step (2) is as follows:
and (2.1) classifying and calibrating the extended glasses-wearing face training data. The face images of the same person are consistent in label and different from those of other persons in label.
And (2.2) establishing a face recognition model based on a residual deep convolutional network. Adopting a residual network allows the number of convolutional layers to be increased without the gradient vanishing, which facilitates extracting deeper features of the image.

The network structure of the residual unit is shown in fig. 4, where conv represents a convolutional layer, relu represents an activation function layer, and res represents the residual layer. The res layer combines the two input paths by an element-wise operation on corresponding pixels; the aim is to deepen the network and introduce more parameters while avoiding gradient vanishing, so as to improve the performance of the network.
And (2.3) establishing a classification cost function based on a softmax function with an enlarged classification interval, used to evaluate the classification error of the network. Enlarging the classification interval helps to improve the distinctiveness of different faces' features. The classification cost function is:

$$L=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

where n is the total number of training samples, s is the L2 norm of the feature, m is the offset term that enlarges the class interval, y_i is the class of the i-th sample, θ_j is the angle between the feature vector x and the network weight vector W_j (the L2 norm of W_j is normalized to 1), and $\hat{x}=s\,x/\lVert x\rVert$ is the feature vector normalized and rescaled to length s.
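Assuming the additive-angular-margin form of the cost function given above (the exact form is not fully recoverable from the translated text), it can be sketched in numpy as:

```python
import numpy as np

def margin_softmax_loss(features, weights, labels, s=30.0, m=0.5):
    """Softmax cross-entropy with an angular offset m added to the
    true-class angle and logits scaled by s, a sketch of the enlarged-
    interval classification cost described above.

    features: (n, d) feature vectors x
    weights:  (c, d) class weight vectors W_j
    labels:   (n,)   class index y_i of each sample
    """
    # Normalize both sides so the logits are cos(theta_j).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    idx = np.arange(len(labels))
    logits = s * cos
    # Enlarge the class interval: offset only the true-class angle.
    logits[idx, labels] = s * np.cos(theta[idx, labels] + m)
    # Numerically stable softmax cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()
```

With m = 0 this reduces to a plain normalized softmax loss; a positive m makes every sample's true class harder to score, which forces larger inter-class margins during training.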
And (2.4) optimizing the objective function by using error back propagation and stochastic gradient descent.
And (2.5) after multiple iterative computations, the loss function decreases and converges, yielding the face recognition model optimized for glasses-wearing face recognition performance.
The method for judging, with the face recognition model obtained above, whether a face image is a certain known face or a strange face is as follows: first, features are computed for the face image to be recognized and evaluated against the known face features by cosine similarity; then the face image is judged to be a known face or a strange face. The specific judgment is: the known face with the greatest similarity, provided that similarity exceeds a threshold, is the recognition result; if the similarity to every known face is below the threshold, the face is judged to be a strange face. Recognizing face images with the model obtained by this method gives a recognition effect significantly better than that of a model obtained by the traditional method.
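The matching rule above can be sketched as follows (the names, the "stranger" marker and the threshold value are illustrative assumptions):

```python
import numpy as np

def recognize(query_feat, known_feats, known_names, threshold=0.5):
    """Cosine-similarity matching of one query face feature against a
    gallery of known face features, as described above."""
    q = query_feat / np.linalg.norm(query_feat)
    k = known_feats / np.linalg.norm(known_feats, axis=1, keepdims=True)
    sims = k @ q                       # cosine similarity to each known face
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return known_names[best]       # best match above the threshold
    return "stranger"                  # every similarity below the threshold
```

Because both sides are L2-normalized, the dot product equals the cosine similarity, so the threshold has the same meaning for features of any scale.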
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (6)

1. A method for improving the performance of a face recognition model under the condition of wearing glasses is characterized in that: the method comprises the following steps:
(1) aiming at the existing face training data without wearing glasses, the face training data is expanded into the face training data with glasses through a face image automatic glasses adding algorithm;
(2) training by using the extended glasses-wearing face training data to obtain a face recognition model;
wherein, the automatic glasses adding algorithm for the face image in the step (1) comprises the following steps:
(1.1) positioning the face position and the key point position of the five sense organs of a face image in face training data by using a face detection algorithm based on a cascade convolution network;
(1.2) estimating a face attitude angle by using the positions of five sense organs;
(1.3) carrying out angle transformation on the glasses material image by using the human face attitude angle;
(1.4) carrying out pixel-level local weighted summation on the transformed glasses material image and the face image to obtain a face image with glasses;
in the step (1.4), a plurality of glasses-wearing face images with different lens reflection effects are obtained by performing the weighted summation with a plurality of different weights.
2. The method of improving the performance of a face recognition model with glasses according to claim 1, wherein: and (1.4) when the transformed glasses material image and the face image are subjected to pixel-level local weighted summation, disturbing the positions of the glasses material image in the vertical direction to obtain a plurality of glasses-worn face images with different glasses positions.
3. The method of improving the performance of a face recognition model with glasses according to claim 2, wherein: the step (2) comprises the following steps:
(2.1) classifying and calibrating the extended glasses-wearing face training data;
(2.2) establishing a face recognition model based on a residual deep convolutional network, and extracting deeper features of the image;
(2.3) establishing a classification loss function of the softmax function based on the enhanced classification interval for evaluating the classification error of the network;
(2.4) optimizing the classification loss function by using error back propagation and stochastic gradient descent;
and (2.5) through multiple iterative computations, the classification loss function is reduced and converged, and the face recognition model is obtained.
4. The method of improving the performance of a face recognition model with glasses according to claim 3, wherein: the classification and calibration method in the step (2.1) comprises: the face images of the same person share the same label, which differs from the labels of all other persons' face images.
5. The method of improving the performance of a face recognition model with glasses according to claim 4, wherein: the classification cost function in the step (2.3) is:

$$L=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

where n is the total number of training samples, s is the L2 norm of the feature, m is the offset term, y_i is the class of the i-th sample, θ_j is the angle between the feature vector x and the network weight vector W_j (the L2 norm of the weight vector W_j is normalized to 1), and $\hat{x}=s\,x/\lVert x\rVert$ is the feature vector normalized and rescaled to length s.
6. A face recognition method is characterized in that: performing feature calculation on a face image to be recognized by using a face recognition model obtained by the method of any one of claims 1 to 5, and performing similarity evaluation on the face image and known face features;
judging the known face with the maximum similarity and higher than a threshold value, namely the known face is a recognition result;
and if the similarity of all the known faces is smaller than the threshold value, judging that the face is a strange face.
CN201810377373.9A 2018-04-25 2018-04-25 Method for improving face recognition model performance under glasses-wearing conditions Active CN108319943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810377373.9A CN108319943B (en) 2018-04-25 2018-04-25 Method for improving face recognition model performance under glasses-wearing conditions

Publications (2)

Publication Number Publication Date
CN108319943A (en) 2018-07-24
CN108319943B (en) 2021-10-12

Family

ID=62895232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810377373.9A Active CN108319943B (en) 2018-04-25 2018-04-25 Method for improving face recognition model performance under glasses-wearing conditions

Country Status (1)

Country Link
CN (1) CN108319943B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582008B (en) * 2019-02-19 2023-09-08 富士通株式会社 Device and method for training classification model and device for classifying by using classification model
CN111339810A (en) * 2019-04-25 2020-06-26 南京特沃斯高科技有限公司 Low-resolution large-angle face recognition method based on Gaussian distribution
CN111008569A (en) * 2019-11-08 2020-04-14 浙江工业大学 Glasses detection method based on face semantic feature constraint convolutional network
CN110969139A (en) * 2019-12-11 2020-04-07 深圳市捷顺科技实业股份有限公司 Face recognition model training method and related device, face recognition method and related device
CN111062328B (en) * 2019-12-18 2023-10-03 中新智擎科技有限公司 Image processing method and device and intelligent robot
CN113435226B (en) * 2020-03-23 2022-09-16 北京百度网讯科技有限公司 Information processing method and device
CN111639545B (en) * 2020-05-08 2023-08-08 浙江大华技术股份有限公司 Face recognition method, device, equipment and medium
CN111695431A (en) * 2020-05-19 2020-09-22 深圳禾思众成科技有限公司 Face recognition method, face recognition device, terminal equipment and storage medium
CN111723755B (en) * 2020-07-19 2022-09-06 南京甄视智能科技有限公司 Optimization method and system of face recognition base
CN112819758A (en) * 2021-01-19 2021-05-18 武汉精测电子集团股份有限公司 Training data set generation method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7227976B1 (en) * 2002-07-08 2007-06-05 Videomining Corporation Method and system for real-time facial image enhancement
CN105354792A (en) * 2015-10-27 2016-02-24 深圳市朗形网络科技有限公司 Method for trying virtual glasses and mobile terminal
CN105825490A (en) * 2016-03-16 2016-08-03 北京小米移动软件有限公司 Gaussian blur method and device of image

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003235202A1 (en) * 2002-10-31 2004-05-25 Korea Institute Of Science And Technology Image processing method for removing glasses from color facial images
CN102332095B (en) * 2011-10-28 2013-05-08 中国科学院计算技术研究所 Face motion tracking method, face motion tracking system and method for enhancing reality
CN103400119B (en) * 2013-07-31 2017-02-15 徐坚 Face recognition technology-based mixed reality spectacle interactive display method
CN104408402B (en) * 2014-10-29 2018-04-24 小米科技有限责任公司 Face identification method and device
CN104809638B (en) * 2015-05-20 2018-04-06 成都通甲优博科技有限责任公司 A kind of virtual try-in method of glasses based on mobile terminal and system
CN105184253B (en) * 2015-09-01 2020-04-24 北京旷视科技有限公司 Face recognition method and face recognition system
CN107203752A (en) * 2017-05-25 2017-09-26 四川云图睿视科技有限公司 A kind of combined depth study and the face identification method of the norm constraint of feature two

Also Published As

Publication number Publication date
CN108319943A (en) 2018-07-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant