CN110866436B - Automatic glasses removing method based on convolutional neural network feature reconstruction - Google Patents
- Publication number
- CN110866436B (application CN201910808296.2A)
- Authority
- CN
- China
- Prior art keywords
- glasses
- image
- face
- face image
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The invention discloses an automatic glasses removing method based on convolutional neural network feature reconstruction, which comprises the following steps: first, a K nearest neighbor algorithm is used to find glasses-wearing face images with similar attributes, which form the source image set, and glasses-free face images, which form the target image set; then, the source image set and the target image set are mapped to a depth feature space to obtain their respective average feature values, and the glasses attribute vector is calculated from the difference between the two; finally, a difference operation between the new feature representation of an input glasses-wearing face image and the glasses attribute vector completes the removal of the glasses attribute, the resulting face features are inversely mapped back to the pixel space, and a glasses-free pixel image is reconstructed. The invention solves the problems of low efficiency and poor effect of prior-art face glasses removal methods.
Description
Technical Field
The invention belongs to the technical field of automatic glasses removal, and particularly relates to an automatic glasses removal method based on convolutional neural network feature reconstruction.
Background
In recent years, with the rapid development of networks and the popularization of online payment, face recognition technology has come into wide use. Face recognition is a biometric technology that extracts and processes facial feature information by studying the geometric characteristics of the face and authenticates identity using the processed information. However, under unconstrained conditions face recognition still has unsolved problems: factors such as illumination, facial expression, makeup, and occluding objects on the face can strongly affect recognition. Researchers have found, on the FERET and FRVT face benchmarks, that the factor with the greatest influence on the recognition rate is the change of the face under illumination, and that facial occlusions also degrade recognition performance. Occlusions affect face recognition especially severely, constraining both the accuracy and the precision of recognition, so effectively removing glasses from face images has become one of the urgent problems in face recognition technology. Studying the glasses-removal problem is therefore of important significance and practical value for face recognition.
Most traditional face glasses removal methods require three steps: detecting the glasses-occluded region, extracting it, and removing it. This makes glasses removal inefficient and the operation complex. With the rapid development of convolutional neural networks, research has found that facial features can be expressed effectively through the training and learning of a neural network; compared with traditional methods, removing face glasses with a deep learning method can be more effective.
Disclosure of Invention
The invention aims to provide an automatic glasses removing method based on convolutional neural network feature reconstruction, and solves the problems of low efficiency and poor effect of a face glasses removing method in the prior art.
The technical scheme of the invention is that the automatic glasses removing method based on convolutional neural network feature reconstruction is implemented according to the following steps:
step 1: searching a glasses-wearing face image and a glasses-free face image with similar attributes in an LFW face image data set by adopting a K nearest neighbor algorithm, wherein a set of the glasses-wearing face images is used as a source image set, and a set of the glasses-free face images is used as a target image set;
step 2: mapping the source image set and the target image set to a depth feature space of a convolutional neural network, training the convolutional neural network to obtain their respective average feature values, and calculating the glasses attribute vector from the difference between the source image set and the target image set;
step 3: performing a difference operation between the new feature representation, obtained by feature mapping the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute;
step 4: inversely mapping the glasses-removed face features back to the pixel space, and reconstructing a glasses-free pixel image.
The present invention is also characterized in that,
the step 1 is implemented according to the following steps:
step 1.1, selecting the already-partitioned glasses-wearing face image set and glasses-free face image set from the LFW face image dataset as the training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K nearest neighbor algorithm, where the K nearest neighbor algorithm obtains the images through the cosine similarity

cos β = (x_i · y_i) / (‖x_i‖ ‖y_i‖)   (1)

where, when a glasses-wearing face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the glasses-free sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the index of the current input image and n denotes the total number of images. cos β, with β a variable, measures the similarity of two face images: if cos β equals 1, the two face images are completely identical; the larger cos β, the more similar the attributes of the two face images; the smaller cos β, the less similar the attributes.
N =100 in step 1.2.
The convolutional neural network setting in step 2 is specifically as follows:
the first layer of the convolutional neural network is the data input layer, which receives the face image to be processed; the second layer is a convolution calculation layer, which produces a feature map of depth 3 through the convolution operation; the third layer is a pooling layer, sandwiched between successive convolutional layers, whose pooling operation removes unnecessary redundant information and reduces the size of the feature maps generated by the convolutional layers, yielding a new feature map of depth 3. Repeating these operations yields a depth feature map of depth 5. A ReLU nonlinear activation unit is adopted after each convolution in the convolutional neural network, so that the network structure has the ability to classify nonlinear data.
The step 2 is implemented according to the following steps:
step 2.1, defining a conversion function θ, which maps a face image from the pixel space to the depth feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new representation of the original image in the depth feature space;
step 2.2, define the source image set as X^s = {x_1^s, x_2^s, …, x_n^s}, where x_n^s denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x_1^t, x_2^t, …, x_n^t}, where x_n^t denotes the nth image in the target image set and the superscript t marks a target image; input the source image set and the target image set into the configured Visual Geometry Group network structure VGG, set the glasses attribute as G, and calculate the glasses attribute according to formula (2) and formula (3):

θ̄^s = (1/k) Σ_{θ(x^s) ∈ N_k^s(θ(x))} θ(x^s),   θ̄^t = (1/k) Σ_{θ(x^t) ∈ N_k^t(θ(x))} θ(x^t)   (2)

G = θ̄^s − θ̄^t   (3)

In formula (2), θ̄^t is the average feature value of the face images in the target image set and θ̄^s is the average feature value of the face images in the source image set; k denotes the number of images in each attribute-similar face image set; θ(x^s) denotes the new representation of a source image set face image in the depth feature space, and θ(x^t) the new representation of a target image set face image in the depth feature space; N_k^s(θ(x)) denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N_k^t(θ(x)) the k nearest neighbors of the target image set with attributes similar to θ(x). In formula (3), the glasses attribute G is obtained through the difference between the average feature value of the source image set and the average feature value of the target image set.
In step 2.2, the value of k is 100.
Step 3 is specifically implemented according to the following steps:
step 3.1, inputting the glasses-wearing face image to be processed into a pre-trained Visual Geometry Group network structure VGG for feature mapping, extracting the face features, and obtaining the new representation of the face image in the depth feature space;
and 3.2, performing difference operation on the new representation of the face image in the depth space and the glasses attribute obtained in the step 2 to finish the removal operation of the glasses attribute, wherein the glasses attribute removal formula is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in formula (4), b is the glasses-wearing face image to be processed; θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space; α is an adjustment coefficient whose value affects the degree of glasses removal; G is the glasses attribute vector; w is the glasses-free face image obtained after the glasses attribute vector is removed; and θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after the glasses attribute vector is removed.
Step 4 is specifically implemented according to the following steps:
the face image after glasses-attribute removal is still in the feature space, so it must be mapped back to the pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formula (5) and formula (6):

w* = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v R_v(w)   (5)

R_v(w) = Σ_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)   (6)

In formula (5), w denotes the target pixel image obtained after inverse mapping, i.e., the reconstructed pixel image with the glasses removed; θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is the adjustment coefficient; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space. The first term is the loss term, which computes the loss between the depth feature data of the currently input face image and the depth feature data of the target image; the smaller this loss, the closer the reconstructed target image is to a glasses-free face image. The second term is a regularization data term which, as an image prior, guarantees the smoothness of the image. R_v promotes pixel smoothing, v denotes the total variation term, and λ_v is a coefficient balancing the regularization term. Formula (6) is the regularization-term calculation formula, in which w_{i,j} is the pixel value at position (i, j) of the target image.
λ_v is 0.001.
The adjustment coefficient alpha takes a value of 3.
The invention has the beneficial effects that:
(1) The method reduces the influence of the glasses shielding on the face recognition, and provides a simple and effective method for realizing the automatic removal of the face glasses.
(2) The characteristics in the human face can be effectively expressed through training and learning of the neural network, and compared with a traditional human face glasses removing method, the method for removing the human face glasses by using a deep learning method can be more effective.
(3) The invention improves the efficiency of removing the glasses, improves the visual effect of the face image, effectively relieves the influence of the frame trace of the glasses after removing the glasses, and leads the removed visual effect to be more natural.
Drawings
FIG. 1 is a flow chart of an automatic eyeglass removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 2 is a diagram of a data set of an image of a face with glasses according to a K-nearest neighbor algorithm in the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 3 is a diagram of a data set of an image of a glasses-free face according to a K-nearest neighbor algorithm in the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 4 is a structural diagram of convolutional neural network feature extraction used in the automatic eyeglass removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 5 is a diagram of the glasses removal effect of the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an automatic glasses removing method based on convolutional neural network feature reconstruction, which is implemented according to the following steps as shown in figure 1:
step 1: a K nearest neighbor algorithm is adopted to search the LFW (Labeled Faces in the Wild) face image dataset for glasses-wearing face images and glasses-free face images with similar attributes. LFW is a database compiled by the computer vision laboratory of the University of Massachusetts Amherst, USA, and is mainly used for studying face recognition under unconstrained conditions. A human face has more than 40 different attributes, such as age, hair, glasses, skin tone, and gender, so the method of the invention collects its dataset according to the attributes of the face. Glasses are the face attribute to be processed later; to make the processed face images closer to real images, the images selected for the experimental dataset should be as similar as possible, that is, the face attributes in each face image should be as similar as possible. The set of glasses-wearing face images is taken as the source image set and the set of glasses-free face images as the target image set, specifically:
step 1.1, selecting the already-partitioned glasses-wearing face image set and glasses-free face image set from the LFW face image dataset as the training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K nearest neighbor algorithm, where the K nearest neighbor algorithm obtains the images through the cosine similarity

cos β = (x_i · y_i) / (‖x_i‖ ‖y_i‖)   (1)

where, when a glasses-wearing face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the glasses-free sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the index of the current input image and n denotes the total number of images. cos β, with β a variable, measures the similarity of two face images: if cos β equals 1, the two face images are completely identical; the larger cos β, the more similar the attributes of the two face images; the smaller cos β, the less similar the attributes, as shown in fig. 2 and fig. 3.
N =100 in step 1.2.
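The K nearest neighbor selection of step 1.2 can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the function names, the flattened 64×64 images, and the random toy data are assumptions.

```python
import numpy as np

def cosine_similarity(x, y):
    # cos(beta) = x·y / (||x|| ||y||), formula (1); a value of 1 means identical images
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def select_similar(probe, candidates, n_select):
    # Rank candidate face images by cosine similarity to the probe
    # and keep the n_select most similar ones (the "nearest neighbors").
    sims = [cosine_similarity(probe, c) for c in candidates]
    order = np.argsort(sims)[::-1]          # indices, most similar first
    return [candidates[i] for i in order[:n_select]]

rng = np.random.default_rng(0)
probe = rng.random(64 * 64)                 # flattened 64x64 face image (toy data)
gallery = [rng.random(64 * 64) for _ in range(10)]
chosen = select_similar(probe, gallery, n_select=3)
print(len(chosen))  # 3
```

In the patent's setting the probe would be a sample face and the gallery the glasses-wearing (or glasses-free) training set, with n_select = N = 100.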
Step 2: the source image set and the target image set are mapped to a depth feature space of a convolutional neural network. Through a deep network pre-trained on the ImageNet dataset, the convolutional neural network maps the input face data from the color space of the image to a new characterization space, which is called the depth feature space. The convolutional neural network is trained to obtain the respective average feature values, and the glasses attribute vector is calculated from the difference between the source image set and the target image set. The convolutional neural network is set as follows:
as shown in fig. 4, the first layer of the convolutional neural network is the data input layer, which receives the face image to be processed; the second layer is a convolution calculation layer, which produces a feature map of depth 3 through the convolution operation; the third layer is a pooling layer, sandwiched between successive convolutional layers, whose pooling operation removes unnecessary redundant information and reduces the size of the feature maps generated by the convolutional layers, yielding a new feature map of depth 3. Repeating these operations yields a depth feature map of depth 5; the dotted part of fig. 4 shows how the convolution kernels extract features, and as can be seen from fig. 4, each node in the first layers of the network is connected to only part of the nodes in the previous layer. To make up for the inefficiency of a linear classifier on nonlinear datasets, a ReLU nonlinear activation unit is adopted after each convolution, giving the network structure the ability to classify nonlinear data while the nonlinear function can model more complex features. In real images the correlation between features is relatively small, so fitting all the features in the feature image through the nonlinear function achieves a better effect.
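The convolution, ReLU, and pooling operations described above can be sketched in plain NumPy. This is a minimal single-channel illustration of the layer types, not the network used by the method; the 8×8 input, the 3×3 averaging kernel, and the pool size are assumptions.

```python
import numpy as np

def conv2d(img, kernel):
    # 'valid' 2-D convolution (cross-correlation, as in CNN frameworks)
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # ReLU nonlinear activation applied after each convolution
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Pooling shrinks the feature map and discards redundant detail
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.default_rng(1).random((8, 8))       # toy grayscale face patch
feat = max_pool(relu(conv2d(img, np.ones((3, 3)) / 9.0)))
print(feat.shape)  # (3, 3)
```

A real network stacks many such conv/ReLU/pool stages per channel, which is what produces the depth feature maps described in the text.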
The step 2 is implemented according to the following steps:
step 2.1, defining a conversion function θ, which maps a face image from the pixel space to the depth feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new representation of the original image in the depth feature space;
step 2.2, define the source image set as X^s = {x_1^s, x_2^s, …, x_n^s}, where x_n^s denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x_1^t, x_2^t, …, x_n^t}, where x_n^t denotes the nth image in the target image set and the superscript t marks a target image; input the source image set and the target image set into the configured Visual Geometry Group network structure VGG, set the glasses attribute as G, and calculate the glasses attribute according to formula (2) and formula (3):

θ̄^s = (1/k) Σ_{θ(x^s) ∈ N_k^s(θ(x))} θ(x^s),   θ̄^t = (1/k) Σ_{θ(x^t) ∈ N_k^t(θ(x))} θ(x^t)   (2)

G = θ̄^s − θ̄^t   (3)

In formula (2), θ̄^t is the average feature value of the face images in the target image set and θ̄^s is the average feature value of the face images in the source image set; k denotes the number of images in each attribute-similar face image set; θ(x^s) denotes the new representation of a source image set face image in the depth feature space, and θ(x^t) the new representation of a target image set face image in the depth feature space; N_k^s(θ(x)) denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N_k^t(θ(x)) the k nearest neighbors of the target image set with attributes similar to θ(x). In formula (3), the glasses attribute G is obtained through the difference between the average feature value of the source image set and the average feature value of the target image set.
In step 2.2, the value of k is 100.
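Formulas (2) and (3) amount to averaging the depth features of each neighbor set and subtracting. A minimal sketch, assuming the depth features have already been extracted by the network; the 512-dimensional toy features and the function name are assumptions.

```python
import numpy as np

def glasses_attribute(source_feats, target_feats):
    # theta_bar_s: mean depth feature of the k glasses-wearing (source) images
    # theta_bar_t: mean depth feature of the k glasses-free (target) images
    theta_bar_s = np.mean(source_feats, axis=0)
    theta_bar_t = np.mean(target_feats, axis=0)
    return theta_bar_s - theta_bar_t        # G = theta_bar_s - theta_bar_t, eqs. (2)-(3)

rng = np.random.default_rng(2)
src = rng.random((100, 512))    # k = 100 source features, dimension 512 (toy stand-ins)
tgt = rng.random((100, 512))
G = glasses_attribute(src, tgt)
print(G.shape)  # (512,)
```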
step 3: performing a difference operation between the new feature representation, obtained by feature mapping the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute; this step is specifically implemented as follows:
step 3.1, inputting the glasses-wearing face image to be processed into a pretrained Visual Geometry Group network structure VGG for feature mapping; the detailed parameters of the pretrained VGG network model are shown in table 1. The selected convolutional layers are conv3_1, conv4_1 and conv5_1, one from each of the last three stages, and each layer is activated once to extract the face features, yielding a new representation of the face image in the depth feature space;
table 1 network model detailed parameter settings
And 3.2, performing difference operation on the new representation of the face image in the depth space and the glasses attribute obtained in the step 2 to finish the removal operation of the glasses attribute, wherein the formula for removing the glasses attribute is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in formula (4), b is the glasses-wearing face image to be processed; θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space; α is an adjustment coefficient whose value affects the degree of glasses removal; G is the glasses attribute vector; w is the glasses-free face image obtained after the glasses attribute vector is removed; and θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after the glasses attribute vector is removed.
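In feature space the removal of formula (4) is a single vector subtraction. A minimal sketch with toy stand-ins for the depth feature θ(b) and the attribute vector G (the names and dimensions are assumptions):

```python
import numpy as np

def remove_glasses_attribute(theta_b, G, alpha=3.0):
    # theta(w) = theta(b) - alpha * G, formula (4); alpha tunes removal strength
    return theta_b - alpha * G

rng = np.random.default_rng(4)
theta_b = rng.random(512)   # depth feature of a glasses-wearing face (toy stand-in)
G = rng.random(512)         # glasses attribute vector (toy stand-in)
theta_w = remove_glasses_attribute(theta_b, G)
print(theta_w.shape)  # (512,)
```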
step 4: obtaining the pixel image through inverse mapping. The inverse mapping of the invention finds the image whose depth feature data best matches the depth feature data obtained after the glasses attribute has been removed, converting the task into a regularized regression problem. The inverse mapping process maps the vector θ(b) − αG back to the pixel space; the resulting output image w is the face pixel picture with the glasses removed, satisfying θ(w) = θ(b) − αG. The regression problem requires specifying an objective function and minimizing its loss to find the optimal solution; the Euclidean distance is used as the loss function. The step is specifically implemented as follows:
the face image after glasses-attribute removal is still in the feature space, so it must be mapped back to the pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formula (5) and formula (6):

w* = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v R_v(w)   (5)

R_v(w) = Σ_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)   (6)

In formula (5), w denotes the target pixel image obtained after inverse mapping, i.e., the reconstructed pixel image with the glasses removed; θ(w) is the new representation, in the depth feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is the adjustment coefficient, with value 3; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new representation obtained by mapping the input glasses-wearing face image into the feature space. The first term is the loss term, which computes the loss between the depth feature data of the currently input face image and the depth feature data of the target image; the smaller this loss, the closer the reconstructed target image is to a glasses-free face image. The second term is a regularization data term: because point noise in the image would strongly affect the reconstructed result image, this term serves as an image prior and guarantees the smoothness of the image. R_v promotes pixel smoothing, v denotes the total variation term, and λ_v, with value 0.001, is a coefficient balancing the regularization term. Formula (6) is the regularization-term calculation formula, in which w_{i,j} is the pixel value at position (i, j) of the target image.
The final effect after glasses removal is shown in fig. 5, in which the first column is the original image and the second, third, and fourth columns are the removal results with α equal to 2, 3, and 4 respectively. The figure shows that the degree of glasses removal varies with the value of α. If α is too small, the glasses are not fully removed; conversely, if α is too large, the glasses attribute is removed but the reconstructed face image looks unnatural, because as α increases, the change in the visual feature components associated with removing the glasses attribute also increases. To keep the face image credible, the default value of α is 3.
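The inverse mapping of formulas (5) and (6) can be illustrated as a regularized regression solved by gradient descent. In this sketch the feature mapping θ is taken as the identity so the example stays self-contained and runnable; the real method matches VGG depth features, so this only illustrates the structure of the objective (a loss term plus the total-variation regularizer), with the 16×16 toy target, step counts, and learning rate as assumptions.

```python
import numpy as np

def tv_term(w):
    # R_v(w), formula (6): sum of squared vertical and horizontal pixel differences
    dy = np.diff(w, axis=0)
    dx = np.diff(w, axis=1)
    return np.sum(dy ** 2) + np.sum(dx ** 2)

def tv_grad(w):
    # Gradient of R_v with respect to each pixel w[i, j]
    g = np.zeros_like(w)
    dy = np.diff(w, axis=0)
    dx = np.diff(w, axis=1)
    g[:-1, :] -= 2 * dy; g[1:, :] += 2 * dy
    g[:, :-1] -= 2 * dx; g[:, 1:] += 2 * dx
    return g

def reconstruct(target_feat, lam=0.001, steps=500, lr=0.1):
    # Minimize ||w - target_feat||^2 + lam * R_v(w) by gradient descent,
    # i.e. formula (5) with theta taken as the identity mapping.
    w = np.zeros_like(target_feat)
    for _ in range(steps):
        grad = 2 * (w - target_feat) + lam * tv_grad(w)
        w -= lr * grad
    return w

rng = np.random.default_rng(3)
target = rng.random((16, 16))   # toy stand-in for theta(b) - alpha * G
w = reconstruct(target)
print(np.allclose(w, target, atol=0.05))  # True: loss term dominates for small lambda
```

With a small λ_v the solution stays close to the feature target while the total-variation term discourages isolated point noise, which is exactly the trade-off the text describes.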
Claims (8)
1. The automatic glasses removing method based on convolutional neural network feature reconstruction is characterized by comprising the following steps:
step 1: searching a glasses-wearing face image and a glasses-free face image with similar attributes in an LFW face image data set by adopting a K nearest neighbor algorithm, wherein a set of the glasses-wearing face images is used as a source image set, and a set of the glasses-free face images is used as a target image set;
step 2: mapping the source image set and the target image set to a depth feature space of a convolutional neural network, training the convolutional neural network to obtain respective average feature values, and calculating by using the difference value of the source image set and the target image set to obtain a glasses attribute vector;
step 3: performing a difference operation between the new feature representation, obtained by feature mapping the input glasses-wearing face image to be processed in the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute;
step 4: inversely mapping the glasses-removed face features back to the pixel space, and reconstructing a glasses-free pixel image;
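The four steps above can be sketched end to end under a toy assumption: the VGG mapping θ is replaced by a fixed linear projection, so the inverse mapping of step 4 has a closed form via least squares. Everything here (`theta`, the random stand-in "face" vectors, the dimensions) is an illustrative assumption, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the VGG mapping theta: a fixed linear projection
# from a 16-dim "pixel space" to an 8-dim "feature space".
A = rng.standard_normal((8, 16))
theta = lambda x: A @ x

# Steps 1-2: average features of attribute-similar glasses (source)
# and no-glasses (target) sets, then their difference.
src = rng.standard_normal((100, 16)) + 0.5   # "glasses-wearing" samples (toy)
tgt = rng.standard_normal((100, 16))         # "glasses-free" samples (toy)
G = theta(src.mean(axis=0)) - theta(tgt.mean(axis=0))  # glasses attribute vector

# Step 3: remove the glasses attribute in feature space.
alpha = 3.0
b = src[0]                                   # face image to process
target_feat = theta(b) - alpha * G

# Step 4: invert the mapping. For a linear theta this is a least-squares
# solve; the patent instead minimises an objective over real VGG features.
w, *_ = np.linalg.lstsq(A, target_feat, rcond=None)
print(np.allclose(theta(w), target_feat))    # True: features match after inversion
```

The closed-form inversion only works because the toy θ is linear; with a real convolutional network the inversion must be done iteratively, as the patent describes.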
the step 2 is specifically implemented according to the following steps:
step 2.1, defining a conversion function θ, which represents mapping a face image from the pixel space to the depth feature space; let x_0 be an original image, then θ_0 = θ(x_0) represents the new feature representation of the original image in the depth feature space;
step 2.2, define the source image set as {x_1^s, x_2^s, …, x_n^s}, where x_n^s denotes the nth image in the source image set and the superscript s denotes a source image; define the target image set as {x_1^t, x_2^t, …, x_n^t}, where x_n^t denotes the nth image in the target image set and the superscript t denotes a target image. Input the source image set and the target image set into the configured Visual Geometry Group network structure VGG, let the glasses attribute vector be G, and calculate it according to formula (2) and formula (3):

θ̄^t = (1/k) ∑_{θ(x^t) ∈ N_k^t(θ(x))} θ(x^t) (2)

G = θ̄^s − θ̄^t, where θ̄^s = (1/k) ∑_{θ(x^s) ∈ N_k^s(θ(x))} θ(x^s) (3)
In formula (2), θ̄^t is the average feature value of the face images in the target image set, k denotes the number of images in the attribute-similar face image set, θ(x^s) is the new feature representation of a face image in the source image set in the depth feature space, θ(x^t) is the new feature representation of a face image in the target image set in the depth feature space, N_k^s(θ(x)) denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N_k^t(θ(x)) denotes the k nearest neighbors of the target image set with attributes similar to θ(x). In formula (3), θ̄^s is the average feature value of the face images in the source image set; the glasses attribute vector G is obtained by taking the difference between the average feature value of the source image set and the average feature value of the target image set;
the step 4 is specifically implemented according to the following steps:
the face image after removing the glasses attribute still lies in the feature space, so it must be inversely mapped back to the pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formulas (5) and (6):

w* = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v R_v(w) (5)
R_v(w) = ∑_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²) (6)
in formula (5), w represents the target pixel image obtained after inverse mapping, namely the reconstructed pixel image with the glasses removed; θ(w) is the new feature representation, in the depth feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is an adjustment coefficient; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new feature representation obtained by mapping the input glasses-wearing face image into the feature space. The first term is a loss term, which calculates the loss between the depth feature data of the currently input face image and the depth feature data of the target image; the smaller this loss value, the closer the reconstructed target image is to the glasses-free face image. The second term is a regularization data term used as an image prior; adding the regularization term ensures the smoothness of the image. R_v is the total variation term, which promotes pixel smoothness; λ_v is a coefficient that balances the regularization term. Formula (6) is the calculation formula of the regularization term, where w_{i,j} is the pixel value at position (i, j) of the target image.
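For real VGG features the objective of formula (5) has no closed form, so it is minimized iteratively. A minimal gradient-descent sketch, with a toy linear stand-in for θ and an assumed step size and iteration count (all illustrative, not the patented procedure), might look like:

```python
import numpy as np

# Minimise ||A w - target||^2 + lam * R_v(w) for a toy linear feature map A.
rng = np.random.default_rng(2)
H = W = 4
A = rng.standard_normal((8, H * W))    # stand-in for theta
target = rng.standard_normal(8)        # stands in for theta(b) - alpha*G
lam = 0.001                            # lambda_v, the balancing coefficient

w = np.zeros(H * W)
for _ in range(2000):
    img = w.reshape(H, W)
    # gradient of the feature loss ||A w - target||^2
    g = 2 * A.T @ (A @ w - target)
    # gradient of the total-variation term R_v(w) of formula (6)
    gtv = np.zeros((H, W))
    dy = img[1:, :] - img[:-1, :]
    dx = img[:, 1:] - img[:, :-1]
    gtv[1:, :] += 2 * dy
    gtv[:-1, :] -= 2 * dy
    gtv[:, 1:] += 2 * dx
    gtv[:, :-1] -= 2 * dx
    w -= 0.01 * (g + lam * gtv.ravel())

print(float(np.linalg.norm(A @ w - target)))  # small residual once converged
```

Because λ_v is small, the descent first drives the feature loss toward zero and then trades a tiny amount of feature fidelity for spatial smoothness, matching the role of the two terms described above.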
2. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the step 1 is specifically implemented according to the following steps:
step 1.1, selecting a well-divided glasses-wearing face image set and a glasses-free face image set from an LFW face image data set as a training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K nearest neighbor algorithm, wherein the K nearest neighbor algorithm obtains the images by the cosine similarity of formula (1):

cos β = (x_i · y_i) / (‖x_i‖ · ‖y_i‖) (1)
wherein, when a glasses-wearing face image is desired, x_i represents the glasses-wearing face sample image to be measured and y_i represents one face image in the glasses-wearing face image set; when a glasses-free face image is desired, x_i represents the face sample image to be measured and y_i represents one face image in the glasses-free face image set. i denotes the serial number of the current input image, n denotes the total number of images, cos β represents the similarity of the two face images, and β is a variable: if cos β equals 1, the two face images are identical; the larger cos β is, the more similar the attributes of the two face images; and the smaller cos β is, the less similar the attributes of the two face images.
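A minimal sketch of the similarity measure and neighbor selection described above, with toy 2-dimensional vectors standing in for vectorized face images (the function names and data are illustrative assumptions):

```python
import numpy as np

def cos_sim(x, y):
    """cos(beta): cosine similarity between two vectorised images."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def k_nearest(query, gallery, k):
    """Indices of the k gallery images most similar to the query."""
    sims = np.array([cos_sim(query, g) for g in gallery])
    return np.argsort(-sims)[:k]   # highest similarity first

q = np.array([1.0, 0.0])
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(cos_sim(q, gallery[0]))      # 1.0: identical direction
print(k_nearest(q, gallery, 2))    # indices of the 2 nearest neighbours
```

A similarity of exactly 1 marks a duplicate image, which matches the claim's statement that cos β = 1 means the two face images are completely repeated.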
3. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 2, wherein N =100 in step 1.2.
4. The automatic glasses removing method based on convolutional neural network feature reconstruction as claimed in claim 2, wherein the convolutional neural network setting in step 2 is specifically as follows:
the first layer of the convolutional neural network is a data input layer, used for inputting the face image to be processed; the second layer is a convolution calculation layer, which obtains a feature map of depth 3 through the convolution operation; the third layer is a pooling layer sandwiched between successive convolutional layers, which removes unnecessary redundant information through the pooling operation and reduces the size of the feature map produced by the convolutional layer, yielding a new feature map of depth 3. This operation is repeated to obtain a depth feature map of depth 5. A ReLU nonlinear activation unit follows each convolution in the convolutional neural network, so that the network structure has the capability of classifying nonlinear data.
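One convolution + ReLU + pooling stage of the kind described can be sketched in plain NumPy; the kernel, the image, and all sizes are illustrative assumptions, not the patented configuration:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) of a single channel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

relu = lambda x: np.maximum(x, 0.0)   # the nonlinear activation unit

def maxpool(x, s=2):
    """s x s max pooling: shrinks the feature map, keeping strong responses."""
    h, w = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)          # toy input image
feat = maxpool(relu(conv2d(img, np.ones((3, 3)) / 9)))  # one conv->ReLU->pool stage
print(feat.shape)   # (2, 2)
```

Each stage halves the spatial size after pooling, which is how the repeated operation described in the claim progressively compresses the face image into a compact depth feature map.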
5. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the value of k in step 2.2 is 100.
6. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the step 3 is specifically implemented according to the following steps:
step 3.1, inputting the face image to be removed to a pre-trained Visual Geometry Group network VGG for feature mapping, and extracting face features to obtain new feature representation of the face image in a depth feature space;
step 3.2, performing difference operation on the new feature representation of the face image in the depth space and the glasses attribute vector obtained in the step 2 to complete the removal operation of the glasses attribute, wherein the glasses attribute removal formula is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in the formula (4), b is a face image of a person wearing glasses to be removed, theta (b) is a new feature representation that an input face image of the person wearing glasses to be removed is mapped into a feature space, alpha is an adjustment coefficient, the removal degree of the glasses is influenced by the value of alpha, G is a glasses attribute vector, w is a face image without glasses obtained after the glasses attribute vector is removed, and theta (w) is a new feature representation in a depth feature space of the face image without glasses obtained after the glasses attribute vector is removed.
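Formula (4) itself is a single vector operation in the feature space; a toy numerical sketch (the feature values and vector length are invented for illustration):

```python
import numpy as np

def remove_glasses(theta_b, G, alpha=3.0):
    """Formula (4): subtract the scaled glasses attribute vector from
    the deep features of the glasses-wearing face image."""
    return theta_b - alpha * G

theta_b = np.array([4.0, 1.0, 2.0])   # toy theta(b), features with glasses
G = np.array([1.0, 0.0, 0.5])         # toy glasses attribute vector
theta_w = remove_glasses(theta_b, G)  # toy theta(w), features without glasses
print(theta_w)
```

Choosing α scales how far the features move along the glasses direction, which is why the description reports that too small an α leaves the glasses visible while too large an α distorts the face.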
7. The automatic glasses removing method based on convolutional neural network feature reconstruction as claimed in claim 6, characterized in that said λ_v is 0.001.
8. The automatic eyeglass removal method based on convolutional neural network feature reconstruction of claim 7, wherein the adjustment coefficient α is 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910808296.2A CN110866436B (en) | 2019-08-29 | 2019-08-29 | Automatic glasses removing method based on convolutional neural network feature reconstruction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110866436A CN110866436A (en) | 2020-03-06 |
CN110866436B true CN110866436B (en) | 2023-04-07 |
Family
ID=69652431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910808296.2A Active CN110866436B (en) | 2019-08-29 | 2019-08-29 | Automatic glasses removing method based on convolutional neural network feature reconstruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866436B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020579A (en) * | 2011-09-22 | 2013-04-03 | 上海银晨智能识别科技有限公司 | Face recognition method and system, and removing method and device for glasses frame in face image |
CN104200224A (en) * | 2014-08-28 | 2014-12-10 | 西北工业大学 | Valueless image removing method based on deep convolutional neural networks |
CN107944385A (en) * | 2017-11-22 | 2018-04-20 | 浙江大华技术股份有限公司 | A kind of method and device for being used to determine spectacle-frame region |
CN108182390A (en) * | 2017-12-14 | 2018-06-19 | 浙江大华技术股份有限公司 | A kind of spectacle-frame minimizing technology and device based on facial image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5136965B2 (en) * | 2008-09-03 | 2013-02-06 | 日本電気株式会社 | Image processing apparatus, image processing method, and image processing program |
- 2019-08-29 CN CN201910808296.2A patent/CN110866436B/en active Active
Non-Patent Citations (1)
Title |
---|
Deep Convolution Neural Networks for Automatic Eyeglasses Removal; MAO LIANG et al.; 2nd International Conference on Artificial Intelligence and Engineering Applications; 2017-09-23; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110399929B (en) | Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium | |
CN108830818B (en) | Rapid multi-focus image fusion method | |
CN112949565B (en) | Single-sample partially-shielded face recognition method and system based on attention mechanism | |
Zhao et al. | Supervised segmentation of un-annotated retinal fundus images by synthesis | |
CN108615010B (en) | Facial expression recognition method based on parallel convolution neural network feature map fusion | |
CN108399611B (en) | Multi-focus image fusion method based on gradient regularization | |
Quellec et al. | Fast wavelet-based image characterization for highly adaptive image retrieval | |
CN110634170B (en) | Photo-level image generation method based on semantic content and rapid image retrieval | |
CN107169117B (en) | Hand-drawn human motion retrieval method based on automatic encoder and DTW | |
CN110321805B (en) | Dynamic expression recognition method based on time sequence relation reasoning | |
Wang et al. | GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition | |
Zhu et al. | Learning deep patch representation for probabilistic graphical model-based face sketch synthesis | |
CN109993208B (en) | Clustering processing method for noisy images | |
CN115359576A (en) | Multi-modal emotion recognition method and device, electronic equipment and storage medium | |
CN115969329A (en) | Sleep staging method, system, device and medium | |
CN115049952A (en) | Juvenile fish limb identification method based on multi-scale cascade perception deep learning network | |
Fu et al. | A blind medical image denoising method with noise generation network | |
Zhang et al. | Face recognition under varying illumination based on singular value decomposition and retina modeling | |
Zhou et al. | Multi-objective evolutionary generative adversarial network compression for image translation | |
Li et al. | Multi-scale aggregation feature pyramid with cornerness for underwater object detection | |
CN110866436B (en) | Automatic glasses removing method based on convolutional neural network feature reconstruction | |
CN109887023B (en) | Binocular fusion stereo image quality evaluation method based on weighted gradient amplitude | |
Yu et al. | Prototypical network based on Manhattan distance | |
Jiang et al. | Single image detail enhancement via metropolis theorem | |
Li et al. | Unsupervised neural rendering for image hazing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||