CN110866436B - Automatic glasses removing method based on convolutional neural network feature reconstruction - Google Patents
- Publication number
- CN110866436B (application CN201910808296.2A)
- Authority
- CN
- China
- Prior art keywords
- glasses
- image
- face
- face image
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The invention discloses an automatic glasses removing method based on convolutional neural network feature reconstruction, which comprises the following steps: first, a K nearest neighbor algorithm is used to find glasses-wearing face images with similar attributes, which form the source image set, and glasses-free face images, which form the target image set; then, the source image set and the target image set are mapped to a depth feature space to obtain their respective average feature values, and the glasses attribute vector is calculated from the difference between the two; finally, a difference operation between the new feature representation of an input glasses-wearing face image and the glasses attribute vector completes the removal of the glasses attribute, the resulting face features are inversely mapped back to the pixel space, and a glasses-free pixel image is reconstructed. The invention solves the problems of low efficiency and poor effect of prior-art face glasses removal methods.
Description
Technical Field
The invention belongs to the technical field of automatic glasses removal, and particularly relates to an automatic glasses removal method based on convolutional neural network feature reconstruction.
Background
In recent years, with the rapid development of networks and the popularization of online payment, face recognition technology has come into wide use. Face recognition is a biometric technology that extracts and processes facial feature information by studying the geometric characteristics of the face and authenticates identity using the processed information. However, under unconstrained conditions face recognition still has unsolved problems: factors such as illumination, facial expression, makeup, and occluding objects on the face can strongly affect recognition. Researchers have found, on the FERET and FRVT face benchmarks, that the factor with the greatest influence on the recognition rate is the change of the face under illumination, and that facial occlusions also degrade recognition performance. Occlusions affect face recognition especially severely, constraining both the accuracy and the precision of recognition, so effectively removing glasses from face images has become one of the urgent problems in face recognition technology. Studying the glasses-removal problem is therefore of important significance and practical value for face recognition.
Most traditional face glasses removal methods require three steps: detecting the glasses-occluded region, extracting it, and removing it. This makes glasses removal inefficient and the operation complex. With the rapid development of convolutional neural networks, research has found that facial features can be expressed effectively through the training and learning of a neural network; compared with traditional methods, removing face glasses with a deep learning method can be more effective.
Disclosure of Invention
The invention aims to provide an automatic glasses removing method based on convolutional neural network feature reconstruction, and solves the problems of low efficiency and poor effect of a face glasses removing method in the prior art.
The technical scheme of the invention is that the automatic glasses removing method based on convolutional neural network feature reconstruction is implemented according to the following steps:
step 1: searching a glasses-wearing face image and a glasses-free face image with similar attributes in an LFW face image data set by adopting a K nearest neighbor algorithm, wherein a set of the glasses-wearing face images is used as a source image set, and a set of the glasses-free face images is used as a target image set;
step 2: mapping the source image set and the target image set to a depth feature space of a convolutional neural network, training the convolutional neural network to obtain their respective average feature values, and calculating the glasses attribute vector from the difference between the source image set and the target image set;
step 3: performing a difference operation between the new feature representation, obtained by feature mapping the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute;
step 4: inversely mapping the glasses-removed face features back to the pixel space, and reconstructing a glasses-free pixel image.
The present invention is also characterized in that,
the step 1 is implemented according to the following steps:
step 1.1, selecting the already-partitioned glasses-wearing face image set and glasses-free face image set from the LFW face image dataset as the training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K nearest neighbor algorithm, where the K nearest neighbor algorithm obtains the images through the cosine similarity

cos β = (x_i · y_i) / (‖x_i‖ ‖y_i‖)   (1)

where, when a glasses-wearing face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the glasses-free sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the index of the current input image and n denotes the total number of images. cos β, with β a variable, measures the similarity of two face images: if cos β equals 1, the two face images are completely identical; the larger cos β, the more similar the attributes of the two face images; the smaller cos β, the less similar the attributes.
N =100 in step 1.2.
The convolutional neural network setting in step 2 is specifically as follows:
the first layer of the convolutional neural network is the data input layer, which receives the face image to be processed; the second layer is a convolution calculation layer, which produces a feature map of depth 3 through the convolution operation; the third layer is a pooling layer, sandwiched between successive convolutional layers, whose pooling operation removes unnecessary redundant information and reduces the size of the feature maps generated by the convolutional layers, yielding a new feature map of depth 3. Repeating these operations yields a depth feature map of depth 5. A ReLU nonlinear activation unit is adopted after each convolution in the convolutional neural network, so that the network structure has the ability to classify nonlinear data.
The step 2 is implemented according to the following steps:
step 2.1, defining a conversion function θ, which maps a face image from the pixel space to the depth feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new representation of the original image in the depth feature space;
step 2.2, define the source image set as X^s = {x_1^s, x_2^s, …, x_n^s}, where x_n^s denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x_1^t, x_2^t, …, x_n^t}, where x_n^t denotes the nth image in the target image set and the superscript t marks a target image; input the source image set and the target image set into the configured Visual Geometry Group network structure VGG, set the glasses attribute as G, and calculate the glasses attribute according to formula (2) and formula (3):

θ̄^s = (1/k) Σ_{θ(x^s) ∈ N_k^s(θ(x))} θ(x^s),   θ̄^t = (1/k) Σ_{θ(x^t) ∈ N_k^t(θ(x))} θ(x^t)   (2)

G = θ̄^s − θ̄^t   (3)

In formula (2), θ̄^t is the average feature value of the face images in the target image set and θ̄^s is the average feature value of the face images in the source image set; k denotes the number of images in each attribute-similar face image set; θ(x^s) denotes the new representation of a source image set face image in the depth feature space, and θ(x^t) the new representation of a target image set face image in the depth feature space; N_k^s(θ(x)) denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N_k^t(θ(x)) the k nearest neighbors of the target image set with attributes similar to θ(x). In formula (3), the glasses attribute G is obtained through the difference between the average feature value of the source image set and the average feature value of the target image set.
In step 2.2, the value of k is 100.
Step 3 is specifically implemented according to the following steps:
step 3.1, inputting the glasses-wearing face image to be processed into a pre-trained Visual Geometry Group network structure VGG for feature mapping, extracting the face features, and obtaining the new representation of the face image in the depth feature space;
and 3.2, performing difference operation on the new representation of the face image in the depth space and the glasses attribute obtained in the step 2 to finish the removal operation of the glasses attribute, wherein the glasses attribute removal formula is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in formula (4), b is the glasses-wearing face image to be processed; θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space; α is an adjustment coefficient whose value affects the degree of glasses removal; G is the glasses attribute vector; w is the glasses-free face image obtained after the glasses attribute vector is removed; and θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after the glasses attribute vector is removed.
Step 4 is specifically implemented according to the following steps:
the face image after glasses-attribute removal is still in the feature space, so it must be mapped back to the pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formula (5) and formula (6):

w* = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v R_v(w)   (5)

R_v(w) = Σ_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)   (6)

In formula (5), w denotes the target pixel image obtained after inverse mapping, i.e., the reconstructed pixel image with the glasses removed; θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is the adjustment coefficient; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space. The first term is the loss term, which computes the loss between the depth feature data of the currently input face image and the depth feature data of the target image; the smaller this loss, the closer the reconstructed target image is to a glasses-free face image. The second term is a regularization data term which, as an image prior, guarantees the smoothness of the image. R_v promotes pixel smoothing, v denotes the total variation term, and λ_v is a coefficient balancing the regularization term. Formula (6) is the regularization-term calculation formula, in which w_{i,j} is the pixel value at position (i, j) of the target image.
λ_v is 0.001.
The adjustment coefficient alpha takes a value of 3.
The invention has the beneficial effects that:
(1) The method reduces the influence of the glasses shielding on the face recognition, and provides a simple and effective method for realizing the automatic removal of the face glasses.
(2) The characteristics in the human face can be effectively expressed through training and learning of the neural network, and compared with a traditional human face glasses removing method, the method for removing the human face glasses by using a deep learning method can be more effective.
(3) The invention improves the efficiency of removing the glasses, improves the visual effect of the face image, effectively relieves the influence of the frame trace of the glasses after removing the glasses, and leads the removed visual effect to be more natural.
Drawings
FIG. 1 is a flow chart of an automatic eyeglass removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 2 is a diagram of a data set of an image of a face with glasses according to a K-nearest neighbor algorithm in the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 3 is a diagram of a data set of an image of a glasses-free face according to a K-nearest neighbor algorithm in the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 4 is a structural diagram of convolutional neural network feature extraction used in the automatic eyeglass removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 5 is a diagram of the glasses removal effect of the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an automatic glasses removing method based on convolutional neural network feature reconstruction, which is implemented according to the following steps as shown in figure 1:
step 1: a K nearest neighbor algorithm is adopted to search the LFW (Labeled Faces in the Wild) face image dataset for glasses-wearing face images and glasses-free face images with similar attributes. LFW is a database compiled by the computer vision laboratory of the University of Massachusetts Amherst, USA, and is mainly used for studying face recognition under unconstrained conditions. A human face has more than 40 different attributes, such as age, hair, glasses, skin tone, and gender, so the method of the invention collects its dataset according to the attributes of the face. Glasses are the face attribute to be processed later; to make the processed face images closer to real images, the images selected for the experimental dataset should be as similar as possible, that is, the face attributes in each face image should be as similar as possible. The set of glasses-wearing face images is taken as the source image set and the set of glasses-free face images as the target image set, specifically:
step 1.1, selecting the already-partitioned glasses-wearing face image set and glasses-free face image set from the LFW face image dataset as the training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K nearest neighbor algorithm, where the K nearest neighbor algorithm obtains the images through the cosine similarity

cos β = (x_i · y_i) / (‖x_i‖ ‖y_i‖)   (1)

where, when a glasses-wearing face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the glasses-free sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the index of the current input image and n denotes the total number of images. cos β, with β a variable, measures the similarity of two face images: if cos β equals 1, the two face images are completely identical; the larger cos β, the more similar the attributes of the two face images; the smaller cos β, the less similar the attributes, as shown in fig. 2 and fig. 3.
N =100 in step 1.2.
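The K nearest neighbor selection of step 1.2 can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the function names, the flattened 64×64 images, and the random toy data are assumptions.

```python
import numpy as np

def cosine_similarity(x, y):
    # cos(beta) = x·y / (||x|| ||y||), formula (1); a value of 1 means identical images
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def select_similar(probe, candidates, n_select):
    # Rank candidate face images by cosine similarity to the probe
    # and keep the n_select most similar ones (the "nearest neighbors").
    sims = [cosine_similarity(probe, c) for c in candidates]
    order = np.argsort(sims)[::-1]          # indices, most similar first
    return [candidates[i] for i in order[:n_select]]

rng = np.random.default_rng(0)
probe = rng.random(64 * 64)                 # flattened 64x64 face image (toy data)
gallery = [rng.random(64 * 64) for _ in range(10)]
chosen = select_similar(probe, gallery, n_select=3)
print(len(chosen))  # 3
```

In the patent's setting the probe would be a sample face and the gallery the glasses-wearing (or glasses-free) training set, with n_select = N = 100.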
Step 2: the source image set and the target image set are mapped to a depth feature space of a convolutional neural network. Through a deep network pre-trained on the ImageNet dataset, the convolutional neural network maps the input face data from the color space of the image to a new characterization space, which is called the depth feature space. The convolutional neural network is trained to obtain the respective average feature values, and the glasses attribute vector is calculated from the difference between the source image set and the target image set. The convolutional neural network is set as follows:
as shown in fig. 4, the first layer of the convolutional neural network is the data input layer, which receives the face image to be processed; the second layer is a convolution calculation layer, which produces a feature map of depth 3 through the convolution operation; the third layer is a pooling layer, sandwiched between successive convolutional layers, whose pooling operation removes unnecessary redundant information and reduces the size of the feature maps generated by the convolutional layers, yielding a new feature map of depth 3. Repeating these operations yields a depth feature map of depth 5; the dotted part of fig. 4 shows how the convolution kernels extract features, and as can be seen from fig. 4, each node in the first layers of the network is connected to only part of the nodes in the previous layer. To make up for the inefficiency of a linear classifier on nonlinear datasets, a ReLU nonlinear activation unit is adopted after each convolution, giving the network structure the ability to classify nonlinear data while the nonlinear function can model more complex features. In real images the correlation between features is relatively small, so fitting all the features in the feature image through the nonlinear function achieves a better effect.
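The convolution, ReLU, and pooling operations described above can be sketched in plain NumPy. This is a minimal single-channel illustration of the layer types, not the network used by the method; the 8×8 input, the 3×3 averaging kernel, and the pool size are assumptions.

```python
import numpy as np

def conv2d(img, kernel):
    # 'valid' 2-D convolution (cross-correlation, as in CNN frameworks)
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # ReLU nonlinear activation applied after each convolution
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Pooling shrinks the feature map and discards redundant detail
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.default_rng(1).random((8, 8))       # toy grayscale face patch
feat = max_pool(relu(conv2d(img, np.ones((3, 3)) / 9.0)))
print(feat.shape)  # (3, 3)
```

A real network stacks many such conv/ReLU/pool stages per channel, which is what produces the depth feature maps described in the text.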
The step 2 is implemented according to the following steps:
step 2.1, defining a conversion function θ, which maps a face image from the pixel space to the depth feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new representation of the original image in the depth feature space;
step 2.2, define the source image set as X^s = {x_1^s, x_2^s, …, x_n^s}, where x_n^s denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x_1^t, x_2^t, …, x_n^t}, where x_n^t denotes the nth image in the target image set and the superscript t marks a target image; input the source image set and the target image set into the configured Visual Geometry Group network structure VGG, set the glasses attribute as G, and calculate the glasses attribute according to formula (2) and formula (3):

θ̄^s = (1/k) Σ_{θ(x^s) ∈ N_k^s(θ(x))} θ(x^s),   θ̄^t = (1/k) Σ_{θ(x^t) ∈ N_k^t(θ(x))} θ(x^t)   (2)

G = θ̄^s − θ̄^t   (3)

In formula (2), θ̄^t is the average feature value of the face images in the target image set and θ̄^s is the average feature value of the face images in the source image set; k denotes the number of images in each attribute-similar face image set; θ(x^s) denotes the new representation of a source image set face image in the depth feature space, and θ(x^t) the new representation of a target image set face image in the depth feature space; N_k^s(θ(x)) denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N_k^t(θ(x)) the k nearest neighbors of the target image set with attributes similar to θ(x). In formula (3), the glasses attribute G is obtained through the difference between the average feature value of the source image set and the average feature value of the target image set.
In step 2.2, the value of k is 100.
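Formulas (2) and (3) amount to averaging the depth features of each neighbor set and subtracting. A minimal sketch, assuming the depth features have already been extracted by the network; the 512-dimensional toy features and the function name are assumptions.

```python
import numpy as np

def glasses_attribute(source_feats, target_feats):
    # theta_bar_s: mean depth feature of the k glasses-wearing (source) images
    # theta_bar_t: mean depth feature of the k glasses-free (target) images
    theta_bar_s = np.mean(source_feats, axis=0)
    theta_bar_t = np.mean(target_feats, axis=0)
    return theta_bar_s - theta_bar_t        # G = theta_bar_s - theta_bar_t, eqs. (2)-(3)

rng = np.random.default_rng(2)
src = rng.random((100, 512))    # k = 100 source features, dimension 512 (toy stand-ins)
tgt = rng.random((100, 512))
G = glasses_attribute(src, tgt)
print(G.shape)  # (512,)
```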
step 3: performing a difference operation between the new feature representation, obtained by feature mapping the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute; this step is specifically implemented as follows:
step 3.1, inputting the glasses-wearing face image to be processed into a pretrained Visual Geometry Group network structure VGG for feature mapping; the detailed parameters of the pretrained VGG network model are shown in table 1. The selected convolutional layers are conv3_1, conv4_1 and conv5_1, one from each of the last three stages, and each layer is activated once to extract the face features, yielding a new representation of the face image in the depth feature space;
table 1 network model detailed parameter settings
And 3.2, performing difference operation on the new representation of the face image in the depth space and the glasses attribute obtained in the step 2 to finish the removal operation of the glasses attribute, wherein the formula for removing the glasses attribute is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in formula (4), b is the glasses-wearing face image to be processed; θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space; α is an adjustment coefficient whose value affects the degree of glasses removal; G is the glasses attribute vector; w is the glasses-free face image obtained after the glasses attribute vector is removed; and θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after the glasses attribute vector is removed.
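In feature space the removal of formula (4) is a single vector subtraction. A minimal sketch with toy stand-ins for the depth feature θ(b) and the attribute vector G (the names and dimensions are assumptions):

```python
import numpy as np

def remove_glasses_attribute(theta_b, G, alpha=3.0):
    # theta(w) = theta(b) - alpha * G, formula (4); alpha tunes removal strength
    return theta_b - alpha * G

rng = np.random.default_rng(4)
theta_b = rng.random(512)   # depth feature of a glasses-wearing face (toy stand-in)
G = rng.random(512)         # glasses attribute vector (toy stand-in)
theta_w = remove_glasses_attribute(theta_b, G)
print(theta_w.shape)  # (512,)
```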
step 4: obtaining the pixel image through inverse mapping. The inverse mapping of the invention finds the image whose depth feature data best matches the depth feature data obtained after the glasses attribute has been removed, converting the task into a regularized regression problem. The inverse mapping process maps the vector θ(b) − αG back to the pixel space; the resulting output image w is the face pixel picture with the glasses removed, satisfying θ(w) = θ(b) − αG. The regression problem requires specifying an objective function and minimizing its loss to find the optimal solution; the Euclidean distance is used as the loss function. The step is specifically implemented as follows:
the face image after glasses-attribute removal is still in the feature space, so it must be mapped back to the pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formula (5) and formula (6):

w* = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v R_v(w)   (5)

R_v(w) = Σ_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)   (6)

In formula (5), w denotes the target pixel image obtained after inverse mapping, i.e., the reconstructed pixel image with the glasses removed; θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is the adjustment coefficient, with value 3; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space. The first term is the loss term, which computes the loss between the depth feature data of the currently input face image and the depth feature data of the target image; the smaller this loss, the closer the reconstructed target image is to a glasses-free face image. The second term is a regularization data term: because point noise in the image would strongly affect the reconstructed result image, this term serves as an image prior and guarantees the smoothness of the image. R_v promotes pixel smoothing, v denotes the total variation term, and λ_v, with value 0.001, is a coefficient balancing the regularization term. Formula (6) is the regularization-term calculation formula, in which w_{i,j} is the pixel value at position (i, j) of the target image.
The final effect after glasses removal is shown in fig. 5, in which the first column is the original image and the second, third, and fourth columns are the removal results with α equal to 2, 3, and 4 respectively. The figure shows that the degree of glasses removal varies with the value of α. If α is too small, the glasses are not fully removed; conversely, if α is too large, the glasses attribute is removed but the reconstructed face image looks unnatural, because as α increases, the change in the visual feature components associated with removing the glasses attribute also increases. To keep the face image credible, the default value of α is 3.
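The inverse mapping of formulas (5) and (6) can be illustrated as a regularized regression solved by gradient descent. In this sketch the feature mapping θ is taken as the identity so the example stays self-contained and runnable; the real method matches VGG depth features, so this only illustrates the structure of the objective (a loss term plus the total-variation regularizer), with the 16×16 toy target, step counts, and learning rate as assumptions.

```python
import numpy as np

def tv_term(w):
    # R_v(w), formula (6): sum of squared vertical and horizontal pixel differences
    dy = np.diff(w, axis=0)
    dx = np.diff(w, axis=1)
    return np.sum(dy ** 2) + np.sum(dx ** 2)

def tv_grad(w):
    # Gradient of R_v with respect to each pixel w[i, j]
    g = np.zeros_like(w)
    dy = np.diff(w, axis=0)
    dx = np.diff(w, axis=1)
    g[:-1, :] -= 2 * dy; g[1:, :] += 2 * dy
    g[:, :-1] -= 2 * dx; g[:, 1:] += 2 * dx
    return g

def reconstruct(target_feat, lam=0.001, steps=500, lr=0.1):
    # Minimize ||w - target_feat||^2 + lam * R_v(w) by gradient descent,
    # i.e. formula (5) with theta taken as the identity mapping.
    w = np.zeros_like(target_feat)
    for _ in range(steps):
        grad = 2 * (w - target_feat) + lam * tv_grad(w)
        w -= lr * grad
    return w

rng = np.random.default_rng(3)
target = rng.random((16, 16))   # toy stand-in for theta(b) - alpha * G
w = reconstruct(target)
print(np.allclose(w, target, atol=0.05))  # True: loss term dominates for small lambda
```

With a small λ_v the solution stays close to the feature target while the total-variation term discourages isolated point noise, which is exactly the trade-off the text describes.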
Claims (8)
1. The automatic glasses removing method based on convolutional neural network feature reconstruction is characterized by comprising the following steps:
step 1: searching a glasses-wearing face image and a glasses-free face image with similar attributes in an LFW face image data set by adopting a K nearest neighbor algorithm, wherein a set of the glasses-wearing face images is used as a source image set, and a set of the glasses-free face images is used as a target image set;
step 2: mapping the source image set and the target image set to a depth feature space of a convolutional neural network, training the convolutional neural network to obtain respective average feature values, and calculating by using the difference value of the source image set and the target image set to obtain a glasses attribute vector;
step 3: performing a difference operation between the new feature representation, obtained by feature mapping the input glasses-wearing face image to be processed in the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute;
step 4: inversely mapping the glasses-removed face features back to the pixel space, and reconstructing a glasses-free pixel image;
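The four steps above can be sketched end to end under a toy assumption: the VGG mapping θ is replaced by a fixed linear projection, so the inverse mapping of step 4 has a closed form via least squares. Everything here (`theta`, the random stand-in "face" vectors, the dimensions) is an illustrative assumption, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the VGG mapping theta: a fixed linear projection
# from a 16-dim "pixel space" to an 8-dim "feature space".
A = rng.standard_normal((8, 16))
theta = lambda x: A @ x

# Steps 1-2: average features of attribute-similar glasses (source)
# and no-glasses (target) sets, then their difference.
src = rng.standard_normal((100, 16)) + 0.5   # "glasses-wearing" samples (toy)
tgt = rng.standard_normal((100, 16))         # "glasses-free" samples (toy)
G = theta(src.mean(axis=0)) - theta(tgt.mean(axis=0))  # glasses attribute vector

# Step 3: remove the glasses attribute in feature space.
alpha = 3.0
b = src[0]                                   # face image to process
target_feat = theta(b) - alpha * G

# Step 4: invert the mapping. For a linear theta this is a least-squares
# solve; the patent instead minimises an objective over real VGG features.
w, *_ = np.linalg.lstsq(A, target_feat, rcond=None)
print(np.allclose(theta(w), target_feat))    # True: features match after inversion
```

The closed-form inversion only works because the toy θ is linear; with a real convolutional network the inversion must be done iteratively, as the patent describes.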
the step 2 is specifically implemented according to the following steps:
step 2.1, defining a conversion function θ, which represents mapping a face image from the pixel space to the depth feature space; let x_0 be an original image, then θ_0 = θ(x_0) represents the new feature representation of the original image in the depth feature space;
step 2.2, define the source image set as {x_1^s, x_2^s, …, x_n^s}, where x_n^s denotes the nth image in the source image set and the superscript s denotes a source image; define the target image set as {x_1^t, x_2^t, …, x_n^t}, where x_n^t denotes the nth image in the target image set and the superscript t denotes a target image. Input the source image set and the target image set into the configured Visual Geometry Group network structure VGG, let the glasses attribute vector be G, and calculate it according to formula (2) and formula (3):

θ̄^t = (1/k) ∑_{θ(x^t) ∈ N_k^t(θ(x))} θ(x^t) (2)

G = θ̄^s − θ̄^t, where θ̄^s = (1/k) ∑_{θ(x^s) ∈ N_k^s(θ(x))} θ(x^s) (3)
In formula (2), θ̄^t is the average feature value of the face images in the target image set, k denotes the number of images in the attribute-similar face image set, θ(x^s) is the new feature representation of a face image in the source image set in the depth feature space, θ(x^t) is the new feature representation of a face image in the target image set in the depth feature space, N_k^s(θ(x)) denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N_k^t(θ(x)) denotes the k nearest neighbors of the target image set with attributes similar to θ(x). In formula (3), θ̄^s is the average feature value of the face images in the source image set; the glasses attribute vector G is obtained by taking the difference between the average feature value of the source image set and the average feature value of the target image set;
the step 4 is specifically implemented according to the following steps:
the face image after removing the glasses attribute still lies in the feature space, so it must be inversely mapped back to the pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formulas (5) and (6):

w* = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v R_v(w) (5)
R_v(w) = ∑_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²) (6)
in formula (5), w represents the target pixel image obtained after inverse mapping, namely the reconstructed pixel image with the glasses removed; θ(w) is the new feature representation, in the depth feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is an adjustment coefficient; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new feature representation obtained by mapping the input glasses-wearing face image into the feature space. The first term is a loss term, which calculates the loss between the depth feature data of the currently input face image and the depth feature data of the target image; the smaller this loss value, the closer the reconstructed target image is to the glasses-free face image. The second term is a regularization data term used as an image prior; adding the regularization term ensures the smoothness of the image. R_v is the total variation term, which promotes pixel smoothness; λ_v is a coefficient that balances the regularization term. Formula (6) is the calculation formula of the regularization term, where w_{i,j} is the pixel value at position (i, j) of the target image.
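For real VGG features the objective of formula (5) has no closed form, so it is minimized iteratively. A minimal gradient-descent sketch, with a toy linear stand-in for θ and an assumed step size and iteration count (all illustrative, not the patented procedure), might look like:

```python
import numpy as np

# Minimise ||A w - target||^2 + lam * R_v(w) for a toy linear feature map A.
rng = np.random.default_rng(2)
H = W = 4
A = rng.standard_normal((8, H * W))    # stand-in for theta
target = rng.standard_normal(8)        # stands in for theta(b) - alpha*G
lam = 0.001                            # lambda_v, the balancing coefficient

w = np.zeros(H * W)
for _ in range(2000):
    img = w.reshape(H, W)
    # gradient of the feature loss ||A w - target||^2
    g = 2 * A.T @ (A @ w - target)
    # gradient of the total-variation term R_v(w) of formula (6)
    gtv = np.zeros((H, W))
    dy = img[1:, :] - img[:-1, :]
    dx = img[:, 1:] - img[:, :-1]
    gtv[1:, :] += 2 * dy
    gtv[:-1, :] -= 2 * dy
    gtv[:, 1:] += 2 * dx
    gtv[:, :-1] -= 2 * dx
    w -= 0.01 * (g + lam * gtv.ravel())

print(float(np.linalg.norm(A @ w - target)))  # small residual once converged
```

Because λ_v is small, the descent first drives the feature loss toward zero and then trades a tiny amount of feature fidelity for spatial smoothness, matching the role of the two terms described above.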
2. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the step 1 is specifically implemented according to the following steps:
step 1.1, selecting a well-divided glasses-wearing face image set and a glasses-free face image set from an LFW face image data set as a training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K nearest neighbor algorithm, wherein the K nearest neighbor algorithm obtains the images by the cosine similarity of formula (1):

cos β = (x_i · y_i) / (‖x_i‖ · ‖y_i‖) (1)
wherein, when a glasses-wearing face image is desired, x_i represents the glasses-wearing face sample image to be measured and y_i represents one face image in the glasses-wearing face image set; when a glasses-free face image is desired, x_i represents the face sample image to be measured and y_i represents one face image in the glasses-free face image set. i denotes the serial number of the current input image, n denotes the total number of images, cos β represents the similarity of the two face images, and β is a variable: if cos β equals 1, the two face images are identical; the larger cos β is, the more similar the attributes of the two face images; and the smaller cos β is, the less similar the attributes of the two face images.
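A minimal sketch of the similarity measure and neighbor selection described above, with toy 2-dimensional vectors standing in for vectorized face images (the function names and data are illustrative assumptions):

```python
import numpy as np

def cos_sim(x, y):
    """cos(beta): cosine similarity between two vectorised images."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def k_nearest(query, gallery, k):
    """Indices of the k gallery images most similar to the query."""
    sims = np.array([cos_sim(query, g) for g in gallery])
    return np.argsort(-sims)[:k]   # highest similarity first

q = np.array([1.0, 0.0])
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(cos_sim(q, gallery[0]))      # 1.0: identical direction
print(k_nearest(q, gallery, 2))    # indices of the 2 nearest neighbours
```

A similarity of exactly 1 marks a duplicate image, which matches the claim's statement that cos β = 1 means the two face images are completely repeated.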
3. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 2, wherein N =100 in step 1.2.
4. The automatic glasses removing method based on convolutional neural network feature reconstruction as claimed in claim 2, wherein the convolutional neural network setting in step 2 is specifically as follows:
the first layer of the convolutional neural network is a data input layer, used for inputting the face image to be processed; the second layer is a convolution calculation layer, which obtains a feature map of depth 3 through the convolution operation; the third layer is a pooling layer sandwiched between successive convolutional layers, which removes unnecessary redundant information through the pooling operation and reduces the size of the feature map produced by the convolutional layer, yielding a new feature map of depth 3. This operation is repeated to obtain a depth feature map of depth 5. A ReLU nonlinear activation unit follows each convolution in the convolutional neural network, so that the network structure has the capability of classifying nonlinear data.
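One convolution + ReLU + pooling stage of the kind described can be sketched in plain NumPy; the kernel, the image, and all sizes are illustrative assumptions, not the patented configuration:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) of a single channel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

relu = lambda x: np.maximum(x, 0.0)   # the nonlinear activation unit

def maxpool(x, s=2):
    """s x s max pooling: shrinks the feature map, keeping strong responses."""
    h, w = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)          # toy input image
feat = maxpool(relu(conv2d(img, np.ones((3, 3)) / 9)))  # one conv->ReLU->pool stage
print(feat.shape)   # (2, 2)
```

Each stage halves the spatial size after pooling, which is how the repeated operation described in the claim progressively compresses the face image into a compact depth feature map.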
5. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the value of k in step 2.2 is 100.
6. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the step 3 is specifically implemented according to the following steps:
step 3.1, inputting the face image to be removed to a pre-trained Visual Geometry Group network VGG for feature mapping, and extracting face features to obtain new feature representation of the face image in a depth feature space;
step 3.2, performing difference operation on the new feature representation of the face image in the depth space and the glasses attribute vector obtained in the step 2 to complete the removal operation of the glasses attribute, wherein the glasses attribute removal formula is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in the formula (4), b is a face image of a person wearing glasses to be removed, theta (b) is a new feature representation that an input face image of the person wearing glasses to be removed is mapped into a feature space, alpha is an adjustment coefficient, the removal degree of the glasses is influenced by the value of alpha, G is a glasses attribute vector, w is a face image without glasses obtained after the glasses attribute vector is removed, and theta (w) is a new feature representation in a depth feature space of the face image without glasses obtained after the glasses attribute vector is removed.
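Formula (4) itself is a single vector operation in the feature space; a toy numerical sketch (the feature values and vector length are invented for illustration):

```python
import numpy as np

def remove_glasses(theta_b, G, alpha=3.0):
    """Formula (4): subtract the scaled glasses attribute vector from
    the deep features of the glasses-wearing face image."""
    return theta_b - alpha * G

theta_b = np.array([4.0, 1.0, 2.0])   # toy theta(b), features with glasses
G = np.array([1.0, 0.0, 0.5])         # toy glasses attribute vector
theta_w = remove_glasses(theta_b, G)  # toy theta(w), features without glasses
print(theta_w)
```

Choosing α scales how far the features move along the glasses direction, which is why the description reports that too small an α leaves the glasses visible while too large an α distorts the face.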
7. The automatic glasses removing method based on convolutional neural network feature reconstruction as claimed in claim 6, characterized in that said λ_v is 0.001.
8. The automatic eyeglass removal method based on convolutional neural network feature reconstruction of claim 7, wherein the adjustment coefficient α is 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910808296.2A CN110866436B (en) | 2019-08-29 | 2019-08-29 | Automatic glasses removing method based on convolutional neural network feature reconstruction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110866436A CN110866436A (en) | 2020-03-06 |
CN110866436B true CN110866436B (en) | 2023-04-07 |
Family
ID=69652431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910808296.2A Active CN110866436B (en) | 2019-08-29 | 2019-08-29 | Automatic glasses removing method based on convolutional neural network feature reconstruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866436B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020579A (en) * | 2011-09-22 | 2013-04-03 | 上海银晨智能识别科技有限公司 | Face recognition method and system, and removing method and device for glasses frame in face image |
CN104200224A (en) * | 2014-08-28 | 2014-12-10 | 西北工业大学 | Valueless image removing method based on deep convolutional neural networks |
CN107944385A (en) * | 2017-11-22 | 2018-04-20 | 浙江大华技术股份有限公司 | A kind of method and device for being used to determine spectacle-frame region |
CN108182390A (en) * | 2017-12-14 | 2018-06-19 | 浙江大华技术股份有限公司 | A kind of spectacle-frame minimizing technology and device based on facial image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5136965B2 (en) * | 2008-09-03 | 2013-02-06 | 日本電気株式会社 | Image processing apparatus, image processing method, and image processing program |
- 2019-08-29 CN CN201910808296.2A patent/CN110866436B/en active Active
Non-Patent Citations (1)
Title |
---|
Deep Convolution Neural Networks for Automatic Eyeglasses Removal; MAO LIANG et al.; 2nd International Conference on Artificial Intelligence and Engineering Applications; 2017-09-23; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110399929B (en) | Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium | |
CN108830818B (en) | Rapid multi-focus image fusion method | |
CN112949565B (en) | Single-sample partially-shielded face recognition method and system based on attention mechanism | |
Zhao et al. | Supervised segmentation of un-annotated retinal fundus images by synthesis | |
CN108615010B (en) | Facial expression recognition method based on parallel convolution neural network feature map fusion | |
CN108399611B (en) | Multi-focus image fusion method based on gradient regularization | |
Quellec et al. | Fast wavelet-based image characterization for highly adaptive image retrieval | |
CN110634170B (en) | Photo-level image generation method based on semantic content and rapid image retrieval | |
CN107169117B (en) | Hand-drawn human motion retrieval method based on automatic encoder and DTW | |
CN110321805B (en) | Dynamic expression recognition method based on time sequence relation reasoning | |
Wang et al. | GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition | |
Zhu et al. | Learning deep patch representation for probabilistic graphical model-based face sketch synthesis | |
CN109993208B (en) | Clustering processing method for noisy images | |
CN115359576A (en) | Multi-modal emotion recognition method and device, electronic equipment and storage medium | |
CN115969329A (en) | Sleep staging method, system, device and medium | |
CN115049952A (en) | Juvenile fish limb identification method based on multi-scale cascade perception deep learning network | |
Fu et al. | A blind medical image denoising method with noise generation network | |
Zhang et al. | Face recognition under varying illumination based on singular value decomposition and retina modeling | |
Zhou et al. | Multi-objective evolutionary generative adversarial network compression for image translation | |
Li et al. | Multi-scale aggregation feature pyramid with cornerness for underwater object detection | |
CN110866436B (en) | Automatic glasses removing method based on convolutional neural network feature reconstruction | |
CN109887023B (en) | Binocular fusion stereo image quality evaluation method based on weighted gradient amplitude | |
Yu et al. | Prototypical network based on Manhattan distance | |
Jiang et al. | Single image detail enhancement via metropolis theorem | |
Li et al. | Unsupervised neural rendering for image hazing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||