CN110866436B - Automatic glasses removing method based on convolutional neural network feature reconstruction - Google Patents

Automatic glasses removing method based on convolutional neural network feature reconstruction

Info

Publication number
CN110866436B
CN110866436B CN201910808296.2A CN201910808296A
Authority
CN
China
Prior art keywords
glasses
image
face
face image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910808296.2A
Other languages
Chinese (zh)
Other versions
CN110866436A (en)
Inventor
赵明华
张哲
张利利
张鑫
石争浩
都双丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN201910808296.2A priority Critical patent/CN110866436B/en
Publication of CN110866436A publication Critical patent/CN110866436A/en
Application granted granted Critical
Publication of CN110866436B publication Critical patent/CN110866436B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an automatic glasses removal method based on convolutional neural network feature reconstruction. First, a K-nearest-neighbor algorithm is used to find glasses-wearing face images and glasses-free face images with similar attributes; the glasses-wearing images form the source image set and the glasses-free images form the target image set. Then, the source image set and the target image set are mapped into a deep feature space to obtain their respective average feature values, and the glasses attribute vector is calculated from the difference between the two. Finally, a difference operation between the new feature representation of an input glasses-wearing face image and the glasses attribute vector removes the glasses attribute, the glasses-removed facial features are mapped back into pixel space, and a glasses-free pixel image is reconstructed. The invention solves the problems of low efficiency and poor effect of prior-art face glasses removal methods.

Description

Automatic glasses removing method based on convolutional neural network feature reconstruction
Technical Field
The invention belongs to the technical field of automatic glasses removal, and particularly relates to an automatic glasses removal method based on convolutional neural network feature reconstruction.
Background
In recent years, with the rapid development of networks and the popularization of online payment, face recognition technology has come into wide use. Face recognition is a biometric technology that studies the geometric characteristics of the human face, extracts and processes facial feature information, and uses the processed information to authenticate identity. However, face recognition under unconstrained conditions still has problems to be solved: factors such as illumination, facial expression, makeup, and occluding objects on the face strongly affect recognition. Researchers have found, on the FERET and FRVT face databases, that the most important factor affecting the recognition rate is the change of the face under illumination, and that facial occlusions likewise degrade recognition. The influence of facial occlusions is particularly severe and restricts the accuracy and precision of face recognition, so effectively removing glasses from face images has become one of the problems that face recognition technology urgently needs to solve. Research on glasses removal for face recognition therefore has important significance and practical value.
Most traditional face glasses removal methods require three steps: detecting the glasses-occluded region, extracting the glasses-occluded region, and removing the glasses-occluded region, which makes glasses removal inefficient and the operation complex. With the rapid development of convolutional neural networks, research has shown that facial features can be expressed effectively through the training and learning of a neural network. Compared with traditional face glasses removal methods, removing glasses with a deep learning method can be more effective.
Disclosure of Invention
The invention aims to provide an automatic glasses removing method based on convolutional neural network feature reconstruction, and solves the problems of low efficiency and poor effect of a face glasses removing method in the prior art.
The technical scheme of the invention is that the automatic glasses removing method based on convolutional neural network feature reconstruction is implemented according to the following steps:
step 1: searching a glasses-wearing face image and a glasses-free face image with similar attributes in an LFW face image data set by adopting a K nearest neighbor algorithm, wherein a set of the glasses-wearing face images is used as a source image set, and a set of the glasses-free face images is used as a target image set;
step 2: mapping the source image set and the target image set into the deep feature space of a convolutional neural network, training the convolutional neural network to obtain their respective average feature values, and calculating the glasses attribute vector from the difference between the source image set and the target image set;
step 3: carrying out a difference operation between the new feature representation, obtained by feature mapping of the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to complete the removal of the glasses attribute;
step 4: mapping the glasses-removed facial features back into pixel space and reconstructing a glasses-free pixel image.
The present invention is also characterized in that,
the step 1 is implemented according to the following steps:
step 1.1, selecting a well-divided glasses-containing face image set and glasses-free face image set from an LFW face image data set as a training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K-nearest-neighbor algorithm, wherein the K-nearest-neighbor algorithm obtains the images as follows:

cos β = (x_i · y_i) / (‖x_i‖ · ‖y_i‖)    (1)

wherein, when a glasses-wearing face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the glasses-free sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the serial number of the current input image, n denotes the total number of images, cos β denotes the similarity of the two face images, and β is a variable; if cos β equals 1, the two face images are identical; the larger cos β is, the more similar the attributes in the two face images are; the smaller cos β is, the less similar the attributes in the two face images are.
N = 100 in step 1.2.
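As an illustration of the selection in step 1.2, the following minimal sketch (Python with NumPy; the function names and the flattening of images into vectors are assumptions for illustration, not part of the patented method) picks the N candidate face images most similar to a probe image under the cosine similarity of formula (1):

import numpy as np

def cosine_similarity(x, y):
    # cos beta = (x . y) / (||x|| * ||y||), formula (1)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

def select_similar_faces(probe, candidates, n=100):
    # probe: 1-D array, the flattened sample face image under test (x_i)
    # candidates: 2-D array, one flattened candidate face image per row (y_i)
    scores = np.array([cosine_similarity(probe, y) for y in candidates])
    order = np.argsort(-scores)              # sort by descending similarity
    return candidates[order[:n]], scores[order[:n]]

# Usage sketch: pick the 100 most attribute-similar glasses-wearing faces for one probe.
# source_set, sims = select_similar_faces(probe_img, glasses_set, n=100)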
The convolutional neural network setting in step 2 is specifically as follows:
the first layer of the convolutional neural network is a data input layer, used to input the face image to be processed; the second layer is a convolution layer, which produces a feature map of depth 3 through the convolution operation; the third layer is a pooling layer sandwiched between successive convolution layers, and the pooling operation removes unnecessary redundant information and reduces the size of the feature maps produced by the convolution layers, yielding a new feature map of depth 3. The operation is repeated to obtain a deep feature map of depth 5, and a ReLU nonlinear activation unit is adopted after each convolution in the convolutional neural network, giving the network structure the ability to classify nonlinear data.
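A minimal PyTorch sketch of the convolution / ReLU / pooling arrangement described above is given below; the channel depths (3 and then 5) follow the schematic description and are illustrative only, and the layer choices are assumptions rather than the pre-trained network actually used later in the method:

import torch
import torch.nn as nn

# Illustrative stack: convolution -> ReLU -> pooling, repeated once.
feature_extractor = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1),  # depth-3 feature maps
    nn.ReLU(inplace=True),                                               # ReLU after each convolution
    nn.MaxPool2d(kernel_size=2),                                         # pooling removes redundant information
    nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, padding=1),  # repeated operation, depth-5 feature maps
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
)

# x = torch.randn(1, 3, 224, 224)      # a face image tensor (batch, channels, H, W)
# feature_map = feature_extractor(x)   # deep feature map of depth 5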
The step 2 is implemented according to the following steps:
step 2.1, defining a transformation function θ, representing the mapping of a face image from pixel space into the deep feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new representation of the original image in the deep feature space;
step 2.2, defining the source image set as X^s = {x^s_1, x^s_2, …, x^s_n}, where x^s_n denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x^t_1, x^t_2, …, x^t_n}, where x^t_n denotes the nth image in the target image set and the superscript t marks a target image; inputting the source image set and the target image set into the configured Visual Geometry Group network structure VGG, setting the glasses attribute as G, and calculating the glasses attribute according to formula (2) and formula (3):

θ̄^t = (1/k) ∑_{x^t ∈ N^t_k} θ(x^t)    (2)

θ̄^s = (1/k) ∑_{x^s ∈ N^s_k} θ(x^s)    (3)

G = θ̄^s − θ̄^t

in formula (2), θ̄^t is the average feature value of the face images in the target image set, k denotes the number of images in the face image set with similar attributes, θ(x^s) denotes the new representation of a face image of the source image set in the deep feature space, θ(x^t) denotes the new representation of a face image of the target image set in the deep feature space, N^s_k denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N^t_k denotes the k nearest neighbors of the target image set with attributes similar to θ(x); in formula (3), θ̄^s is the average feature value of the face images in the source image set; the glasses attribute G is obtained by calculating the difference between the average feature value of the source image set and the average feature value of the target image set.
In step 2.2, the value of k is 100.
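The averaging and differencing of formulas (2) and (3) can be summarised by the short sketch below (Python with NumPy; the function name and the assumption that the deep features have already been extracted as flat vectors are illustrative, not part of the patent):

import numpy as np

def glasses_attribute_vector(source_feats, target_feats, k=100):
    # source_feats: (k, d) array of theta(x^s) for the k nearest glasses-wearing neighbours
    # target_feats: (k, d) array of theta(x^t) for the k nearest glasses-free neighbours
    mean_source = source_feats[:k].mean(axis=0)   # average feature value of the source image set
    mean_target = target_feats[:k].mean(axis=0)   # average feature value of the target image set
    return mean_source - mean_target              # G = mean(source) - mean(target)

# G = glasses_attribute_vector(theta_source, theta_target, k=100)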
Step 3 is specifically implemented according to the following steps:
step 3.1, inputting the glasses-wearing face image to be processed into the pre-trained Visual Geometry Group network structure VGG for feature mapping and extracting the facial features, to obtain the new representation of the face image in the deep feature space;
step 3.2, carrying out a difference operation between the new representation of the face image in the deep feature space and the glasses attribute obtained in step 2, to complete the removal of the glasses attribute, wherein the glasses-attribute removal formula is shown in formula (4):
θ(w)=θ(b)-αG (4)
In formula (4), b is the glasses-wearing face image to be processed, θ(b) is the new representation of the input glasses-wearing face image mapped into the feature space, α is an adjustment coefficient whose value affects the degree of glasses removal, G is the glasses attribute vector, w is the glasses-free face image obtained after removing the glasses attribute vector, and θ(w) is the new representation, in the deep feature space, of the glasses-free face image obtained after removing the glasses attribute vector.
Step 4 is specifically implemented according to the following steps:
Since the face image after removal of the glasses attribute is still in the feature space, it has to be mapped back into pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formulas (5) and (6):

w = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v · R_v(w)    (5)

R_v(w) = ∑_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)    (6)

In formula (5), w denotes the target pixel image obtained after inverse mapping, namely the reconstructed pixel image after glasses removal; θ(w) is the new representation, in the deep feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is the adjustment coefficient; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new representation of the input glasses-wearing face image mapped into the feature space. The first term is a loss term that calculates the loss between the deep feature data of the currently input face image and the deep feature data of the target image; the smaller this loss, the closer the reconstructed target image is to the glasses-free face image. The second term is a regularization data term that serves as an image prior; adding the regularization term guarantees the smoothness of the image. R_v promotes pixel smoothness, v denotes the total-variation term, and λ_v is a coefficient balancing the regularization term. Formula (6) is the formula for calculating the regularization term, and w_{i,j} is the pixel value at position (i, j) of the target image.
λ_v takes a value of 0.001.
The adjustment coefficient α takes a value of 3.
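One way to realise the inverse mapping of formulas (5) and (6) is gradient-based optimisation over the pixels of w. The PyTorch sketch below assumes a differentiable feature-mapping function theta (for example a pre-trained VGG truncated at the chosen layers) and is an illustrative sketch under those assumptions, not the patent's reference implementation:

import torch

def total_variation(w):
    # R_v(w): sum of squared horizontal and vertical pixel differences, formula (6)
    dh = (w[..., 1:, :] - w[..., :-1, :]).pow(2).sum()
    dv = (w[..., :, 1:] - w[..., :, :-1]).pow(2).sum()
    return dh + dv

def reconstruct(theta, b, G, alpha=3.0, lambda_v=1e-3, steps=500, lr=0.05):
    # Solve w = argmin ||theta(w) - (theta(b) - alpha * G)||^2 + lambda_v * R_v(w), formula (5)
    with torch.no_grad():
        target_feat = theta(b) - alpha * G        # feature representation after glasses removal
    w = b.clone().requires_grad_(True)            # initialise the result with the input image
    optimiser = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        optimiser.zero_grad()
        loss = (theta(w) - target_feat).pow(2).sum() + lambda_v * total_variation(w)
        loss.backward()
        optimiser.step()
    return w.detach()

# w_no_glasses = reconstruct(theta=vgg_features, b=face_with_glasses, G=G)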
The invention has the beneficial effects that:
(1) The method reduces the influence of glasses occlusion on face recognition and provides a simple and effective way to remove glasses from face images automatically.
(2) Facial features can be expressed effectively through the training and learning of the neural network; compared with traditional face glasses removal methods, removing glasses with a deep learning method can be more effective.
(3) The invention improves the efficiency of glasses removal and the visual quality of the face image; it effectively alleviates the residual traces of the glasses frame after removal, making the result look more natural.
Drawings
FIG. 1 is a flow chart of an automatic eyeglass removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 2 shows the glasses-wearing face image data set selected by the K-nearest-neighbor algorithm in the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 3 shows the glasses-free face image data set selected by the K-nearest-neighbor algorithm in the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 4 is a structural diagram of convolutional neural network feature extraction used in the automatic eyeglass removal method based on convolutional neural network feature reconstruction according to the present invention;
FIG. 5 is a diagram of the glasses removal effect of the automatic glasses removal method based on convolutional neural network feature reconstruction according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an automatic glasses removing method based on convolutional neural network feature reconstruction, which is implemented according to the following steps as shown in figure 1:
step 1: a K-nearest-neighbor algorithm is adopted to search the LFW (Labeled Faces in the Wild) face image dataset for glasses-wearing face images and glasses-free face images with similar attributes. The LFW dataset is a database produced by the computer vision laboratory of the University of Massachusetts, Amherst, USA, and is mainly used for studying face recognition under unconstrained conditions. A human face has more than 40 different attributes, such as age, hair, glasses, skin tone and gender, so the method of the invention assembles the data set according to the attributes of the face. Glasses are one of the face attributes to be processed later; to make the processed face images closer to real images, the images selected for the experimental data set should be as similar as possible, that is, the face attributes in each face image should be as similar as possible. The set of glasses-wearing face images is taken as the source image set and the set of glasses-free face images as the target image set, specifically:
step 1.1, selecting a well-divided glasses-containing face image set and glasses-free face image set from an LFW face image data set as a training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from the training set by adopting a K-nearest-neighbor algorithm, wherein the K-nearest-neighbor algorithm obtains the images as follows:

cos β = (x_i · y_i) / (‖x_i‖ · ‖y_i‖)    (1)

wherein, when a glasses-wearing face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the glasses-free sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the serial number of the current input image, n denotes the total number of images, cos β denotes the similarity of the two face images, and β is a variable; if cos β equals 1, the two face images are identical; the larger cos β is, the more similar the attributes in the two face images are; the smaller cos β is, the less similar the attributes in the two face images are, as shown in fig. 2 and 3.
N = 100 in step 1.2.
Step 2: the method has the advantages that the original image set and the target image set are mapped to a depth feature space of a convolutional neural network, the convolutional neural network can map input face data from a color space to a new characterization space from an image through a depth network pre-trained by an ImageNet data set, and the new characterization space is called as a depth feature space. Training through a convolutional neural network to obtain respective average characteristic values, and calculating by using the difference value of a source image set and a target image set to obtain a glasses attribute vector, wherein the convolutional neural network is specifically set as follows:
as shown in fig. 4, the first layer of the convolutional neural network is a data input layer, and the data input layer is used for inputting a face image to be processed; the second layer is a convolution calculation layer, and a feature mapping graph with the depth of 3 is obtained through convolution operation; the third layer is a pooling layer, which is sandwiched between consecutive convolutional layers, through pooling operation, unnecessary redundant information is removed, the size of a feature mapping graph generated by the convolutional layer is reduced, a new feature mapping graph with the depth of 3 is obtained, the operation is repeated, a depth feature mapping graph with the depth of 5 is obtained, and a connection schematic diagram of convolutional kernel extraction features in the convolutional neural network is shown in a dotted line part in fig. 4. As can be seen from fig. 4, each node in the first layers of the convolutional neural network is connected to only a part of nodes in the previous layer. In order to make up for the low efficiency performance of the linear classifier in classifying the nonlinear data set, a ReLU nonlinear activation unit is adopted after each convolution in the convolutional neural network, so that the network structure has the capability of classifying nonlinear data, and meanwhile, a nonlinear function can simulate more complex characteristics. In the real image, because the relevance between the features is relatively small, all the features in the feature image can be fitted through the nonlinear function, and a better effect can be achieved.
The step 2 is implemented according to the following steps:
step 2.1, defining a transformation function θ, representing the mapping of a face image from pixel space into the deep feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new representation of the original image in the deep feature space;
step 2.2, defining the source image set as X^s = {x^s_1, x^s_2, …, x^s_n}, where x^s_n denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x^t_1, x^t_2, …, x^t_n}, where x^t_n denotes the nth image in the target image set and the superscript t marks a target image; inputting the source image set and the target image set into the configured Visual Geometry Group network structure VGG, setting the glasses attribute as G, and calculating the glasses attribute according to formula (2) and formula (3):

θ̄^t = (1/k) ∑_{x^t ∈ N^t_k} θ(x^t)    (2)

θ̄^s = (1/k) ∑_{x^s ∈ N^s_k} θ(x^s)    (3)

G = θ̄^s − θ̄^t

in formula (2), θ̄^t is the average feature value of the face images in the target image set, k denotes the number of images in the face image set with similar attributes, θ(x^s) denotes the new representation of a face image of the source image set in the deep feature space, θ(x^t) denotes the new representation of a face image of the target image set in the deep feature space, N^s_k denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N^t_k denotes the k nearest neighbors of the target image set with attributes similar to θ(x); in formula (3), θ̄^s is the average feature value of the face images in the source image set; the glasses attribute G is obtained by calculating the difference between the average feature value of the source image set and the average feature value of the target image set.
In step 2.2, the value of k is 100.
step 3: a difference operation is carried out between the new feature representation, obtained after feature mapping of the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, completing the removal of the glasses attribute. The step is implemented as follows:
step 3.1, inputting the glasses-wearing face image to be processed into the pre-trained Visual Geometry Group network structure VGG for feature mapping, wherein the detailed parameters of the pre-trained VGG network model are shown in Table 1, the selected convolution layers are conv3_1, conv4_1 and conv5_1 in the last three layers respectively, and each layer is activated once to extract the facial features, so as to obtain the new representation of the face image in the deep feature space;
Table 1. Detailed parameter settings of the pre-trained VGG network model (the parameter table is provided as an image in the original publication)
step 3.2, carrying out a difference operation between the new representation of the face image in the deep feature space and the glasses attribute obtained in step 2, to complete the removal of the glasses attribute, wherein the glasses-attribute removal formula is shown in formula (4):
θ(w)=θ(b)-αG (4)
In formula (4), b is the glasses-wearing face image to be processed, θ(b) is the new representation of the input glasses-wearing face image mapped into the feature space, α is an adjustment coefficient whose value affects the degree of glasses removal, G is the glasses attribute vector, w is the glasses-free face image obtained after removing the glasses attribute vector, and θ(w) is the new representation, in the deep feature space, of the glasses-free face image obtained after removing the glasses attribute vector.
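For the feature mapping of step 3.1, a possible realisation is sketched below (Python with torchvision); the layer indices assume the torchvision VGG19 layout, since the patent names the layers conv3_1, conv4_1 and conv5_1 but does not state which VGG variant is used, so the indices and the flattening/concatenation scheme are assumptions for illustration:

import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
LAYER_IDS = {"conv3_1": 10, "conv4_1": 19, "conv5_1": 28}   # indices in torchvision VGG19

def extract_face_features(image):
    # Map a face image tensor (1, 3, H, W) to its representation theta(x) in the deep feature space.
    feats = []
    x = image
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in LAYER_IDS.values():
            feats.append(x.flatten(start_dim=1))   # keep the activation of the selected layer
    return torch.cat(feats, dim=1)                  # concatenated deep feature representation

# theta_b = extract_face_features(face_with_glasses)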
step 4: the pixel image is obtained through inverse mapping. The inverse mapping of the invention finds the image whose deep feature data best matches the deep feature data obtained after removal of the glasses attribute, converting the task into a regularized regression problem. The inverse mapping process maps the vector θ(b) − αG back into pixel space; the obtained output image w is the face pixel picture with the glasses removed, satisfying θ(w) = θ(b) − αG. The regression problem requires specifying an objective function and minimizing its loss to find the optimal solution, and the Euclidean distance is used as the loss function. The step is implemented as follows:
Since the face image after removal of the glasses attribute is still in the feature space, it has to be mapped back into pixel space to reconstruct a visible pixel image; the objective function of the inverse mapping is defined by formulas (5) and (6):

w = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v · R_v(w)    (5)

R_v(w) = ∑_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)    (6)

In formula (5), w denotes the target pixel image obtained after inverse mapping, namely the reconstructed pixel image after glasses removal; θ(w) is the new representation, in the deep feature space, of the glasses-free face image obtained after removing the glasses attribute vector; α is the adjustment coefficient, taken as 3; G is the glasses attribute vector; b is the glasses-wearing face image to be processed; and θ(b) is the new representation of the input glasses-wearing face image mapped into the feature space. The first term is a loss term that calculates the loss between the deep feature data of the currently input face image and the deep feature data of the target image; the smaller this loss, the closer the reconstructed target image is to the glasses-free face image. The second term is a regularization data term: because point noise in the image would strongly affect the reconstructed result, this term is used as an image prior, and adding the regularization term guarantees the smoothness of the image. R_v promotes pixel smoothness, v denotes the total-variation term, and λ_v is a coefficient balancing the regularization term, taken as 0.001. Formula (6) is the formula for calculating the regularization term, and w_{i,j} is the pixel value at position (i, j) of the target image.
The final effect after glasses removal is shown in fig. 5, in which the first column is the original image and the second, third and fourth columns are the removal results with α equal to 2, 3 and 4 respectively. The figure shows that the degree of glasses removal varies with the value of α. If α is too small, the glasses are not removed; conversely, if α is too large, the glasses attribute is removed but the reconstructed face image looks unnatural, because as α increases the change in the visual feature components associated with removing the glasses attribute also increases. To keep the face image credible, the default value of α is 3.
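As a usage illustration of the comparison in fig. 5, the sketch below reconstructs the same face with α equal to 2, 3 and 4; it reuses the hypothetical reconstruct and extract_face_features helpers from the earlier sketches, and face_with_glasses and G are assumed to have been prepared beforehand:

# Reconstruct with several adjustment coefficients, as in fig. 5; alpha = 3 is the default.
results = {
    alpha: reconstruct(theta=extract_face_features, b=face_with_glasses, G=G, alpha=alpha)
    for alpha in (2.0, 3.0, 4.0)
}
deglassed = results[3.0]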

Claims (8)

1. The automatic glasses removing method based on convolutional neural network feature reconstruction is characterized by comprising the following steps:
step 1: searching a glasses-wearing face image and a glasses-free face image with similar attributes in an LFW face image data set by adopting a K nearest neighbor algorithm, wherein a set of the glasses-wearing face images is used as a source image set, and a set of the glasses-free face images is used as a target image set;
step 2: mapping the source image set and the target image set to a depth feature space of a convolutional neural network, training the convolutional neural network to obtain respective average feature values, and calculating by using the difference value of the source image set and the target image set to obtain a glasses attribute vector;
step 3: carrying out a difference operation between the new feature representation, obtained by feature mapping of the input glasses-wearing face image through the Visual Geometry Group network structure VGG, and the glasses attribute vector obtained in step 2, to finish the removal of the glasses attribute;
step 4: reversely mapping the glasses-removed facial features back into pixel space and reconstructing a glasses-free pixel image;
the step 2 is specifically implemented according to the following steps:
step 2.1, defining a transformation function θ, representing the mapping of a face image from pixel space into the deep feature space; letting x_0 be an original image, θ_0 = θ(x_0) denotes the new feature representation of the original image in the deep feature space;
step 2.2, defining the source image set as X^s = {x^s_1, x^s_2, …, x^s_n}, where x^s_n denotes the nth image in the source image set and the superscript s marks a source image, and the target image set as X^t = {x^t_1, x^t_2, …, x^t_n}, where x^t_n denotes the nth image in the target image set and the superscript t marks a target image; inputting the source image set and the target image set into the configured Visual Geometry Group network structure VGG, setting the glasses attribute vector as G, and calculating the glasses attribute vector according to formula (2) and formula (3):

θ̄^t = (1/k) ∑_{x^t ∈ N^t_k} θ(x^t)    (2)

θ̄^s = (1/k) ∑_{x^s ∈ N^s_k} θ(x^s)    (3)

G = θ̄^s − θ̄^t

in formula (2), θ̄^t is the average feature value of the face images in the target image set, k denotes the number of images in the face image set with similar attributes, θ(x^s) denotes the new feature representation of a face image of the source image set in the deep feature space, θ(x^t) denotes the new feature representation of a face image of the target image set in the deep feature space, N^s_k denotes the k nearest neighbors of the source image set with attributes similar to θ(x), and N^t_k denotes the k nearest neighbors of the target image set with attributes similar to θ(x); in formula (3), θ̄^s is the average feature value of the face images in the source image set; the glasses attribute vector G is obtained by calculating the difference between the average feature value of the source image set and the average feature value of the target image set;
the step 4 is specifically implemented according to the following steps:
the face image after removal of the glasses attribute is still in the feature space, so it has to be mapped back into pixel space to reconstruct a visible pixel image, and the objective function of the inverse mapping is defined by formula (5) and formula (6):

w = argmin_w ‖θ(w) − (θ(b) − αG)‖² + λ_v · R_v(w)    (5)

R_v(w) = ∑_{i,j} ((w_{i+1,j} − w_{i,j})² + (w_{i,j+1} − w_{i,j})²)    (6)

in formula (5), w denotes the target pixel image obtained after inverse mapping, namely the reconstructed pixel image after glasses removal, θ(w) denotes the new feature representation, in the deep feature space, of the glasses-free face image obtained after removing the glasses attribute vector, α is an adjustment coefficient, G is the glasses attribute vector, b is the glasses-wearing face image to be processed, and θ(b) denotes the new feature representation of the input glasses-wearing face image mapped into the feature space; the first term is a loss term that calculates the loss between the deep feature data of the currently input face image and the deep feature data of the target image, and the smaller this loss, the closer the reconstructed target image is to the glasses-free face image; the second term is a regularization data term that serves as an image prior, and adding the regularization term guarantees the smoothness of the image; R_v promotes pixel smoothness, v denotes the total-variation term, λ_v is a coefficient balancing the regularization term, formula (6) is the formula for calculating the regularization term, and w_{i,j} is the pixel value at position (i, j) of the target image.
2. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the step 1 is specifically implemented according to the following steps:
step 1.1, selecting a well-divided glasses-wearing face image set and a glasses-free face image set from an LFW face image data set as a training set;
step 1.2, selecting N glasses-wearing face images and N glasses-free face images with similar attributes from a training set by adopting a K nearest neighbor algorithm, wherein the specific operation of obtaining the images by the K nearest neighbor algorithm is as follows:
cos β = (x_i · y_i) / (‖x_i‖ · ‖y_i‖)    (1)

wherein, when a glasses-wearing face image is to be obtained, x_i denotes the glasses-wearing sample face image under test and y_i denotes a face image in the glasses-wearing face image set; when a glasses-free face image is to be obtained, x_i denotes the sample face image under test and y_i denotes a face image in the glasses-free face image set; i denotes the serial number of the current input image, n denotes the total number of images, cos β denotes the similarity of the two face images, and β is a variable; if cos β equals 1, the two face images are identical; the larger cos β is, the more similar the attributes in the two face images are; the smaller cos β is, the less similar the attributes in the two face images are.
3. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 2, wherein N = 100 in step 1.2.
4. The automatic glasses removing method based on convolutional neural network feature reconstruction as claimed in claim 2, wherein the convolutional neural network setting in step 2 is specifically as follows:
the first layer of the convolutional neural network is a data input layer which is used for inputting a face image to be processed; the second layer is a convolution calculation layer, and a feature mapping graph with the depth of 3 is obtained through convolution operation; the third layer is a pooling layer which is sandwiched between the continuous convolutional layers, unnecessary redundant information is removed through pooling operation, the size of a feature mapping graph generated by the convolutional layers is reduced, a new feature mapping graph with the depth of 3 is obtained, the operation is repeated, a depth feature mapping graph with the depth of 5 is obtained, and a ReLU nonlinear activation unit is adopted after each convolution in the convolutional neural network, so that the network structure has the capability of classifying nonlinear data.
5. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the value of k in step 2.2 is 100.
6. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 1, wherein the step 3 is specifically implemented according to the following steps:
step 3.1, inputting the face image to be removed to a pre-trained Visual Geometry Group network VGG for feature mapping, and extracting face features to obtain new feature representation of the face image in a depth feature space;
step 3.2, performing difference operation on the new feature representation of the face image in the depth space and the glasses attribute vector obtained in the step 2 to complete the removal operation of the glasses attribute, wherein the glasses attribute removal formula is shown as a formula (4):
θ(w)=θ(b)-αG (4)
in formula (4), b is the glasses-wearing face image to be processed, θ(b) is the new feature representation of the input glasses-wearing face image mapped into the feature space, α is an adjustment coefficient whose value affects the degree of glasses removal, G is the glasses attribute vector, w is the glasses-free face image obtained after removing the glasses attribute vector, and θ(w) is the new feature representation, in the deep feature space, of the glasses-free face image obtained after removing the glasses attribute vector.
7. The automatic eyeglass removal method based on convolutional neural network feature reconstruction as claimed in claim 6, wherein λ_v is 0.001.
8. The automatic eyeglass removal method based on convolutional neural network feature reconstruction of claim 7, wherein the adjustment coefficient α is 3.
CN201910808296.2A 2019-08-29 2019-08-29 Automatic glasses removing method based on convolutional neural network feature reconstruction Active CN110866436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910808296.2A CN110866436B (en) 2019-08-29 2019-08-29 Automatic glasses removing method based on convolutional neural network feature reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910808296.2A CN110866436B (en) 2019-08-29 2019-08-29 Automatic glasses removing method based on convolutional neural network feature reconstruction

Publications (2)

Publication Number Publication Date
CN110866436A CN110866436A (en) 2020-03-06
CN110866436B true CN110866436B (en) 2023-04-07

Family

ID=69652431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910808296.2A Active CN110866436B (en) 2019-08-29 2019-08-29 Automatic glasses removing method based on convolutional neural network feature reconstruction

Country Status (1)

Country Link
CN (1) CN110866436B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020579A (en) * 2011-09-22 2013-04-03 上海银晨智能识别科技有限公司 Face recognition method and system, and removing method and device for glasses frame in face image
CN104200224A (en) * 2014-08-28 2014-12-10 西北工业大学 Valueless image removing method based on deep convolutional neural networks
CN107944385A (en) * 2017-11-22 2018-04-20 浙江大华技术股份有限公司 A kind of method and device for being used to determine spectacle-frame region
CN108182390A (en) * 2017-12-14 2018-06-19 浙江大华技术股份有限公司 A kind of spectacle-frame minimizing technology and device based on facial image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5136965B2 (en) * 2008-09-03 2013-02-06 日本電気株式会社 Image processing apparatus, image processing method, and image processing program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020579A (en) * 2011-09-22 2013-04-03 上海银晨智能识别科技有限公司 Face recognition method and system, and removing method and device for glasses frame in face image
CN104200224A (en) * 2014-08-28 2014-12-10 西北工业大学 Valueless image removing method based on deep convolutional neural networks
CN107944385A (en) * 2017-11-22 2018-04-20 浙江大华技术股份有限公司 A kind of method and device for being used to determine spectacle-frame region
CN108182390A (en) * 2017-12-14 2018-06-19 浙江大华技术股份有限公司 A kind of spectacle-frame minimizing technology and device based on facial image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep Convolution Neural Networks for Automatic Eyeglasses Removal; MAO LIANG et al.; 2nd International Conference on Artificial Intelligence and Engineering Applications; 2017-09-23; full text *

Also Published As

Publication number Publication date
CN110866436A (en) 2020-03-06

Similar Documents

Publication Publication Date Title
CN110399929B (en) Fundus image classification method, fundus image classification apparatus, and computer-readable storage medium
CN108830818B (en) Rapid multi-focus image fusion method
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
Zhao et al. Supervised segmentation of un-annotated retinal fundus images by synthesis
CN108615010B (en) Facial expression recognition method based on parallel convolution neural network feature map fusion
CN108399611B (en) Multi-focus image fusion method based on gradient regularization
Quellec et al. Fast wavelet-based image characterization for highly adaptive image retrieval
CN110634170B (en) Photo-level image generation method based on semantic content and rapid image retrieval
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
Wang et al. GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition
Zhu et al. Learning deep patch representation for probabilistic graphical model-based face sketch synthesis
CN109993208B (en) Clustering processing method for noisy images
CN115359576A (en) Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN115969329A (en) Sleep staging method, system, device and medium
CN115049952A (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
Fu et al. A blind medical image denoising method with noise generation network
Zhang et al. Face recognition under varying illumination based on singular value decomposition and retina modeling
Zhou et al. Multi-objective evolutionary generative adversarial network compression for image translation
Li et al. Multi-scale aggregation feature pyramid with cornerness for underwater object detection
CN110866436B (en) Automatic glasses removing method based on convolutional neural network feature reconstruction
CN109887023B (en) Binocular fusion stereo image quality evaluation method based on weighted gradient amplitude
Yu et al. Prototypical network based on Manhattan distance
Jiang et al. Single image detail enhancement via metropolis theorem
Li et al. Unsupervised neural rendering for image hazing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant