CN113470182B - Face geometric feature editing method and deep face remodeling editing method - Google Patents
Face geometric feature editing method and deep face remodeling editing method
- Publication number
- CN113470182B CN113470182B CN202111029442.5A CN202111029442A CN113470182B CN 113470182 B CN113470182 B CN 113470182B CN 202111029442 A CN202111029442 A CN 202111029442A CN 113470182 B CN113470182 B CN 113470182B
- Authority
- CN
- China
- Prior art keywords
- face
- local
- features
- geometric
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/04—Indexing scheme for image data processing or generation, in general involving 3D image data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2021—Shape modification
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Architecture (AREA)
- Computer Hardware Design (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a face geometric feature editing method and a deep face reshaping editing method, comprising the following steps: acquiring a geometric basic face image, and detecting face key points from the geometric basic face image; connecting the face key points into a mesh according to the positions of the facial features on the geometric basic face image, and encoding the mesh with a graph convolutional variational autoencoder; the graph convolutional variational autoencoder is trained on face key points from a face data set and parameterizes natural face shapes, so that the features of its latent space can be decoded into face key points that are natural, smooth and consistent with the geometric characteristics of the face; acquiring fixed points designated by the user among the face key points and points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of the dragged points before and after dragging and the positions of the fixed points; and rendering the optimized mesh into a face geometric feature map. The invention is applicable to the fields of computer vision and computer graphics.
Description
Technical Field
The invention relates to a face geometric feature editing method and a deep face reshaping editing method, which are suitable for the fields of computer vision and computer graphics.
Background
Human face image editing is one of the important research directions in computer vision and graphics, with wide applications in mass media and the video industry. Early traditional face editing methods implemented editing mainly through image warping and pixel-level computation and rendering, and had difficulty generating details or handling occluded regions such as the interior of the eyes and mouth.
In recent years, interactive face editing methods can be roughly divided into two types: one generates the entire face with a deep network from conditional input for editing, for example "SEAN: Image Synthesis with Semantic Region-Adaptive Normalization" published by Zhu et al. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; the other treats local modifications as image completion, such as "SC-FEGAN: Face Editing Generative Adversarial Network With User's Sketch and Color" published by Jo et al. in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
Although the above methods can generate natural results, they require the user to provide sketch or semantic-map input that approximates the network's training data in order to produce high-quality results. When the input sketch or semantic map is not realistic enough, the result is correspondingly flawed. This makes such methods relatively difficult to use for beginners or users without drawing skills.
There is also some work on optimizing the sketch input, such as "Deep Plastic Surgery" published by Yang et al. in Proceedings of the European Conference on Computer Vision, 2020, but the user still needs to adjust the input to achieve good results.
Existing traditional editing methods such as liquify-style warping cannot naturally and efficiently handle large-scale edits of the mouth and eyes, and the range of editing is limited, as shown in fig. 4. Recent face editing work uses semantic maps and sketches as conditional input for network training, so the generated result fits the input closely, whereas the input of novice users is often abstract and differs greatly from the training set. Most of this work feeds the user input directly into the network, and the generated results are therefore flawed. Some existing work optimizes the user's sketch input, which has a certain effect, but the user still needs to fine-tune repeatedly: because the degrees of freedom of a sketch are too high, the user's control precision drops correspondingly after optimization. Drawing-based interaction must balance control accuracy against the naturalness of the generated result; existing face editing techniques are not easy for ordinary users to operate, and editing efficiency is low.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the existing problems, a face geometric feature editing method and a deep face reshaping editing method are provided.
The technical scheme adopted by the invention is as follows: a human face geometric feature editing method is characterized by comprising the following steps:
acquiring a geometric basic face image, and detecting face key points from the geometric basic face image;
connecting the face key points into a mesh according to the positions of the facial features on the geometric basic face image, and encoding the mesh with a graph convolutional variational autoencoder; the graph convolutional variational autoencoder is trained on face key points from a face data set and parameterizes natural face shapes, so that the features of its latent space can be decoded into face key points that are natural, smooth and consistent with the geometric characteristics of the face;
acquiring fixed points designated by the user among the face key points and points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of the dragged points before and after dragging and the positions of the fixed points;
and rendering the optimized mesh into a face geometric feature map.
The training of the graph convolutional variational autoencoder comprises:
statistically computing the average key points of the key points in the face database;
performing Delaunay triangulation on the average key points to determine a fixed topology of connection relations, and using the connection relations as the edge relations for graph convolution;
a graph convolutional variational autoencoder for planar face key points is trained on a face data set; the training process is the same as that of a classical variational autoencoder, an L2 loss function constrains the decoded key point coordinates to be consistent with the input, and training parameterizes natural face shapes so that the features of the latent space can be decoded into face key points that are naturally smooth and consistent with the geometric characteristics of the face.
The graph convolutional variational autoencoder optimizing the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of the dragged points before and after dragging and the positions of the fixed points, comprises:
regarding the mesh after the user drags points as a mesh with locally missing vertices, that is, the face key points other than the fixed points and the dragged points are regarded as missing points that can move freely; the initial mesh is used to initialize the latent-space encoding;
and iteratively optimizing the final deformed mesh by minimizing the difference between the mesh decoded from the latent space and the set fixed points and dragged points at the corresponding positions.
A deep face reshaping editing method is characterized in that:
segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature graph is edited according to the face geometric feature editing method;
inputting the face appearance feature map into a local generation module of a corresponding part according to the face part, and extracting local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to all parts of the face through the trained global fusion module to generate a face editing image with the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
The local generation module generating, based on the local geometric features and the local appearance features, the local face features of each face part that include the corresponding local geometric features and local appearance features comprises:
inputting the local geometric features into a convolution framework of the local generation module;
the face appearance feature map is encoded into high-dimensional features by convolutional layers, and the features are split according to position codes into h·w indexed sequences, where h and w are the sizes of the third and fourth dimensions of the high-dimensional features and correspond to the height and width of the image;
and combining each sequence with learnable position coding parameters, feeding them into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the skeleton and injecting these parameters, the convolutional skeleton finally outputting the edited image.
The features before the last convolutional layer that reduces the dimension to 3 in the local generation module are merged.
The global fusion module fuses the local features of the human face corresponding to each part of the human face together by using a network of a U-net structure.
The training of the local generation module and the global fusion module comprises:
a dual-scale PatchGAN discriminator D is employed in both the local generation module and the global fusion module to match the distribution between the generated results and the actual results, as follows:
where D(L, I) is the output of the discriminator, G(L, I) is the output of the generator, L_in is the input face geometric feature map, and I_in is the input face appearance feature map.
The training of the local generation module and the global fusion module comprises:
the feature matching loss function of the multi-scale discriminator used in Pix2PixHD is used, as follows:
where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_in is the face appearance feature map, L_in is the face geometric feature map, and I_out is the output result image.
Constraining the color difference of the a and b channels of the input-output image converted into the CIELAB color space as follows:
where Lab(·)_ab is the function that converts an RGB image into the CIELAB color space and extracts the a and b channels.
In the training of the local generation module and the global fusion module, the pre-trained network VGG19 is used to encode the input and output images and compute a high-level feature loss function;

in the training of the global fusion module, the pre-trained face recognition network ArcFace is applied to the input and output, and the cosine similarity of the features is computed as a loss function, as follows:

where R refers to the face recognition network ArcFace and VGG refers to the pre-trained network VGG19; the remaining terms are the face images input to and output from the local generation module, and I_in and I_out are the face images input to and output from the global fusion module.
A deep face reshaping editing device is characterized by comprising:
the geometric feature extraction unit is used for segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature graph is edited according to the face geometric feature editing method;
the appearance feature extraction unit is used for inputting the face appearance feature image into the local generation module of the corresponding part according to the face part and extracting the local appearance feature corresponding to each part of the face;
the local generation unit is used for generating the local human face features which are corresponding to all parts of the human face and comprise the corresponding local geometric features and the corresponding local appearance features through the local generation module based on the local geometric features and the local appearance features;
and the global fusion unit is used for fusing, through the trained global fusion module, the local face features corresponding to each part of the face to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
A storage medium having stored thereon a computer program executable by a processor, wherein the computer program, when executed, implements the steps of the deep face reshaping editing method.
A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, wherein the computer program, when executed, implements the steps of the deep face reshaping editing method.
An interactive interface, comprising:
the display area I is used for displaying the geometric basic face image uploaded by the user and the face key points detected from the geometric basic face image that the user can operate;
the display area II is used for displaying a human face geometric feature map corresponding to the human face key points in the display area I;
the display area III is used for displaying the face appearance feature map uploaded by the user;
and the display area IV is used for displaying the face editing image edited and generated by the deep face reshaping editing method.
The invention has the following beneficial effects: the invention uses a graph convolutional network to encode the face key points; the latent vector is iteratively optimized according to the loss between the points dragged by the user and the corresponding points currently output by the network, finally yielding the corresponding deformed shape, so that the deformed face key point shape can be iteratively optimized to follow the user's drags.
The face appearance feature map is divided into four parts (eyes, nose, mouth and background) according to the face parts; each part is locally encoded by an appearance encoder designed on the basis of a Transformer to produce the corresponding local appearance features; the corresponding local geometric features are then generated from the deformed key point map; finally, a global fusion module with a U-net structure stitches all the features into the final result, and the stitched image has the geometric shape features edited by the user's dragging and the appearance features of the face appearance feature map.
The face portrait is reshaped and edited based on face key point deformation and a deep generation network, complementing existing face editing techniques; compared with other drawing-based editing approaches, drag-based editing is easier for ordinary users to use.
Drawings
Fig. 1 is a network architecture diagram of the embodiment, in which the left half shows a face key point deformation network and an optimization process, and the right half shows a local-to-global generation network.
Fig. 2 is a schematic diagram illustrating a process of dynamically editing a picture by a user in the embodiment.
FIG. 3 illustrates a multi-instance shape editing effect of an embodiment.
Fig. 4 shows the effect of the embodiment compared with the conventional image warping method.
Fig. 5 shows the effect of two consecutive edits of the embodiment.
FIG. 6 shows the real-time drag interaction interface of the present embodiment.
Detailed Description
As shown in fig. 1, the present embodiment is a deep face reshaping editing method that supports real-time interaction based on key point dragging, and specifically includes the following steps:
segmenting the deformed and edited face geometric feature map (semantic mask image) into four parts, namely eyes, nose, mouth and background, according to the face parts, obtaining the local geometric features corresponding to each part of the face, and inputting them into the convolutional skeleton of the local generation module;
inputting the face appearance feature map I_in into the corresponding local generation module according to the face parts (eyes, nose, mouth and background) to extract the local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
the local face features corresponding to each part of the face are fused by the trained global fusion module to generate a face editing image I_out. Ideally, the face editing image I_out has the same facial geometric features as the face geometric feature map and the same facial appearance features as the face appearance feature map.
In this embodiment, the geometric feature map of the face is edited by a face geometric feature editing method based on a geometric basic face image.
The method for editing the geometric characteristics of the human face in the embodiment comprises the following steps:
acquiring a geometric basic face image (a real face image, which may be the same picture as the face appearance feature map I_in or a different one) and detecting the face key points P_in from the geometric basic face image;
connecting the key points into a 2D planar mesh M_in according to the positions of the facial features on the geometric basic face image, and encoding the mesh with a graph convolutional variational autoencoder (VAE);
acquiring the user's operations on the face key points P_in, which include setting some of the face key points P_in as fixed points and dragging some of the face key points P_in; the graph convolutional variational autoencoder iteratively optimizes the latent-space encoding of the mesh with the coordinate difference of the key points before and after dragging as the loss function, so that the mesh M_out output by the network after iterative optimization satisfies the user's drag edits while preserving the original shape characteristics;
and after the edited mesh is obtained, rendering it into a semantic mask image, which serves as the face geometric feature map input to the deep face reshaping editing method.
In this embodiment, the idea of 3D mesh deformation is applied to the deformation of planar face key points. Planar 2D key points are adopted directly because current detection of planar face key points is very accurate, whereas 3D face reconstruction not only involves more data and slower reconstruction but also larger errors than 2D key points. The 2D key point data is compact and describes the shape characteristics of the face well.
In this embodiment, the average key points of the key points in the face database are first computed statistically; Delaunay triangulation is then performed to determine a fixed topology of vertex connection relations, which are used as the edge relations for graph convolution; a graph convolutional variational autoencoder (VAE) for planar face key points is then trained on the face data set. The training process is the same as that of a classical variational autoencoder: an L2 loss function constrains the decoded key point coordinates to be consistent with the input, and training parameterizes natural face shapes so that the features of the latent space can be decoded into face key points that are naturally smooth and consistent with the geometric feature distribution of the face.
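A minimal sketch of deriving the fixed graph topology from the average key points via Delaunay triangulation is given below; the function name and the edge-list format are illustrative assumptions, not part of the patent.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_graph_edges(mean_keypoints: np.ndarray) -> np.ndarray:
    """Derive a fixed edge list from the (N, 2) average key point coordinates.

    The returned (2, E) array of undirected edges can serve as the edge
    relation (adjacency) for the graph convolution layers.
    """
    tri = Delaunay(mean_keypoints)                 # triangulate the average shape once
    edges = set()
    for i, j, k in tri.simplices:                  # each simplex is a triangle
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))      # store each undirected edge once
    return np.asarray(sorted(edges)).T
```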
The mesh after the user drags points is treated as a mesh M'_usr with locally missing vertices; that is, apart from the fixed points and dragged points set by the user, the remaining points are regarded as missing points that can move freely, and the initial mesh M_in is used to initialize the latent-space encoding. In this example, the final deformed mesh is iteratively optimized by minimizing the difference between the corresponding points of the mesh decoded from the latent space and the mesh defined by the user, as shown in the following formula:
where dec denotes the decoder, and Π is the matrix that selects the point sequence corresponding to M'_usr.
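The deformation objective itself is not reproduced in this text; consistent with the symbols above it can be written as $z^{*}=\arg\min_{z}\lVert \Pi\,\mathrm{dec}(z)-M'_{usr}\rVert_{2}^{2}$, with $z$ initialized from the encoding of $M_{in}$. A PyTorch-style sketch of this iterative optimization is shown below; the `enc`/`dec` method names, the optimizer and the step count are assumptions for illustration.

```python
import torch

def optimize_drag(vae, m_in, constrained_idx, constrained_pos, steps=200, lr=0.05):
    """Iteratively optimize the latent code so the decoded mesh meets the constraints.

    vae             -- trained graph convolutional VAE exposing .enc() and .dec()
    m_in            -- (N, 2) tensor, the original key point mesh M_in
    constrained_idx -- indices of the fixed and dragged key points (the selection Pi)
    constrained_pos -- (K, 2) tensor, target coordinates of those key points (M'_usr)
    """
    with torch.no_grad():
        z = vae.enc(m_in)                           # initialize latent code from the input mesh
    z = z.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        m_out = vae.dec(z)                          # decoded mesh stays natural and smooth
        loss = ((m_out[constrained_idx] - constrained_pos) ** 2).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return vae.dec(z).detach()                      # final deformed mesh M_out
```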
In this embodiment, in order to let the network better learn the distribution characteristics of the face appearance texture and better control local details, a local generation module is designed in combination with a Transformer encoder, and one local generation module is trained on the images of each structured region. As shown in the right half of fig. 1, the face appearance feature map I_in is first encoded by convolutional layers into high-dimensional features, which are then split according to position codes into h·w indexed sequences (h and w are the sizes of the third and fourth dimensions of the high-dimensional features and correspond to the height and width of the image); each sequence can be regarded as one word of the high-dimensional appearance features. Each sequence is combined with learnable position coding parameters and fed into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the convolutional skeleton that takes the semantic mask map as input; these parameters are injected, and the convolutional skeleton finally outputs the edited image.
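A sketch of this token-splitting and Transformer recombination is given below, assuming PyTorch; the channel sizes, the number of layers and the final projection to normalization parameters are illustrative assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    """Encode an appearance map into per-position modulation parameters."""

    def __init__(self, in_ch=3, feat_ch=256, max_tokens=64 * 64, n_layers=4, n_heads=8):
        super().__init__()
        self.conv = nn.Sequential(                       # convolutional encoding to high-dim features
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(128, feat_ch, 4, 2, 1), nn.ReLU(),
        )
        self.pos_embed = nn.Parameter(torch.zeros(1, max_tokens, feat_ch))  # learnable position codes
        layer = nn.TransformerEncoderLayer(d_model=feat_ch, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_params = nn.Linear(feat_ch, 2 * feat_ch)  # scale/shift injected into the normalization layers

    def forward(self, appearance_map):
        f = self.conv(appearance_map)                    # (B, C, h, w) high-dimensional features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)            # split into h*w position-indexed sequences
        tokens = tokens + self.pos_embed[:, : h * w]     # combine with learnable position codes
        tokens = self.transformer(tokens)                # recombine with the Transformer encoder
        params = self.to_params(tokens)                  # (B, h*w, 2C) modulation parameters
        return params.transpose(1, 2).reshape(b, 2 * c, h, w)
```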
Random noise is injected into the convolution skeleton in the local generation module of the example to enhance the robustness of generation and avoid detail blurring.
In order to combine the parts together, the global fusion module in this embodiment fuses the outputs of the four parts using a network with a U-net structure. In order to preserve as much of the local generation module's detail as possible and to eliminate style differences between the generated results of the parts, the features before the last convolutional layer that reduces the dimension to 3 in the local generation module are merged, because these features retain rich high-dimensional information and have the same size as the input picture, which makes direct alignment of the parts convenient.
In this embodiment, the global fusion module copies each local face feature into a zero-valued tensor of the input picture's size according to the coordinate position of the corresponding part in the picture, and the four features of the same size as the input picture are then concatenated along the channel dimension to form the input of the U-net network.
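A sketch of this feature placement and concatenation is shown below; the dictionary-based interface and the window coordinates are assumptions for illustration.

```python
import torch

def assemble_fusion_input(local_feats, windows, full_size=512):
    """Copy each part's features into a zero tensor at its window position and concatenate.

    local_feats -- dict part -> (B, C, h, w) features from the local generation module
    windows     -- dict part -> (top, left) coordinate of the part's window in the picture
    """
    planes = []
    for part, feat in local_feats.items():
        b, c, h, w = feat.shape
        canvas = feat.new_zeros(b, c, full_size, full_size)   # zero-valued tensor of input-picture size
        top, left = windows[part]
        canvas[:, :, top:top + h, left:left + w] = feat        # paste the local feature at its position
        planes.append(canvas)
    return torch.cat(planes, dim=1)                             # channel-wise concatenation -> U-net input
```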
In order to avoid changing the unmodified parts of the picture outside the face as much as possible, this embodiment uses the convex hull of the key points to deduct the background, which is encoded by a convolutional network and injected into the decoder of the global fusion module, so that the final generated result blends well with the background.
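A sketch of deducting the background with the key point convex hull is given below, assuming OpenCV and NumPy; the exact masking and blending used by the embodiment may differ.

```python
import cv2
import numpy as np

def extract_background(image: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
    """Zero out the face region (convex hull of the key points), keeping only the background."""
    hull = cv2.convexHull(keypoints.astype(np.int32))   # convex hull of the face key points
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, hull, 255)                  # inside the hull = face region
    background = image.copy()
    background[mask == 255] = 0                          # remove the face, keep the background
    return background
```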
In order to enable the local generation module and the global fusion module to learn the distribution of face shapes and improve the quality of the generated and edited images, this embodiment performs a series of preprocessing steps on the data set and designs several loss functions to constrain the generated results. In this embodiment, the local generation module is trained first, and its parameters are then fixed while the global fusion module is trained.
This example uses CelebA-HQ as the training data set and performs a series of preprocessing steps:
first, side faces whose left-right face angle lies outside the range of -15° to +15° are screened out by a deep face alignment and recognition method;
then the Face++ dense face key point prediction API is used to detect the faces in the data set, 772 key points are saved for each face, and the semantic mask map is rendered.
This embodiment also screens out pictures with sunglasses, because the eye key points are then difficult to predict and do not represent the shape of the glasses. The window sizes of the four parts (eyes, nose, mouth and background) are set to 128×320, 160×160, 192×192 and 512×512 respectively, and all images are scaled to 512×512.
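A sketch of cropping the four part windows with the sizes given above is shown below; centering each window on the centroid of the part's key points is an assumption for illustration.

```python
import numpy as np

# Window sizes (height, width) of the four parts in this embodiment
PART_WINDOWS = {"eyes": (128, 320), "nose": (160, 160), "mouth": (192, 192), "background": (512, 512)}

def crop_part(image: np.ndarray, part_keypoints: np.ndarray, part: str, image_size: int = 512):
    """Crop a fixed-size window for one part, centered on its key point centroid."""
    h, w = PART_WINDOWS[part]
    cx, cy = part_keypoints.mean(axis=0)                 # centroid (x, y) of the part's key points
    top = int(np.clip(cy - h / 2, 0, image_size - h))    # keep the window inside the image
    left = int(np.clip(cx - w / 2, 0, image_size - w))
    return image[top:top + h, left:left + w]
```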
In this embodiment, training follows the classical training scheme of generative adversarial networks, and a dual-scale PatchGAN discriminator D is adopted in both the local generation module and the global fusion module to match the distribution between the generated results and the real results, as shown in the following formula:
where D(L, I) is the output of the discriminator, G(L, I) is the output of the generator, L_in is the input face geometric feature map, and I_in is the input face appearance feature map.
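The adversarial objective itself is not reproduced in this text; a standard conditional formulation consistent with the symbols above (the exact variant, e.g. vanilla, LSGAN or hinge, is an assumption) would be:

```latex
\mathcal{L}_{adv} = \mathbb{E}\big[\log D(L_{in}, I_{in})\big]
                  + \mathbb{E}\big[\log\big(1 - D(L_{in}, G(L_{in}, I_{in}))\big)\big]
```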
In order to make the training of the local generation module and the global fusion module more robust, this example uses the feature matching loss function of the multi-scale discriminator used in Pix2PixHD, as follows:
where T is the number of layers of the discriminator, N_i is the number of feature elements of the i-th layer, k is the scale index of the multi-scale discriminator, I_in is the face appearance feature map, L_in is the face geometric feature map, and I_out is the output result image.
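The feature matching loss is likewise not reproduced; the Pix2PixHD formulation it refers to, written with the symbols above, is:

```latex
\mathcal{L}_{FM} = \mathbb{E}\sum_{k}\sum_{i=1}^{T}\frac{1}{N_{i}}
  \big\lVert D_{k}^{(i)}(L_{in}, I_{in}) - D_{k}^{(i)}(L_{in}, I_{out}) \big\rVert_{1}
```

where D_k^(i) denotes the i-th layer of the k-th scale discriminator.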
In order to keep the tone of the generated result consistent, this embodiment constrains the color difference of the a and b channels of the input and output images converted into the CIELAB color space, as follows:
where Lab(·)_ab is the function that converts an RGB image into the CIELAB color space and extracts the a and b channels.
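A plausible form of this color constraint, assuming an L1 penalty on the a and b channels, is:

```latex
\mathcal{L}_{ab} = \big\lVert \mathrm{Lab}(I_{in})_{ab} - \mathrm{Lab}(I_{out})_{ab} \big\rVert_{1}
```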
One key to the quality of the editing result is preserving the identity attributes of the person, and this embodiment uses a mixture of loss functions for this in both the local and global stages. In both local and global network training, the pre-trained network VGG19 is used to encode the input and output images for a high-level feature loss function. To better preserve the person's identity during global fusion, this embodiment further uses the pre-trained face recognition network ArcFace on the input and output and computes the cosine similarity of the features as a loss function, as shown in the following formula:
where R refers to the face recognition network ArcFace (note that the loss function computed by this network is only suitable for global faces and is not used in local training) and VGG refers to the pre-trained network VGG19; the remaining terms are the face images input to and output from the local generation module, and I_in and I_out are the face images input to and output from the global fusion module.
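The mixed identity/perceptual loss is not reproduced in this text; a formulation consistent with the description above (the layer set and the choice of L1 distance are assumptions) would be:

```latex
\mathcal{L}_{id}   = 1 - \cos\big(R(I_{in}),\, R(I_{out})\big), \qquad
\mathcal{L}_{perc} = \sum_{j} \big\lVert \mathrm{VGG}_{j}(I_{in}) - \mathrm{VGG}_{j}(I_{out}) \big\rVert_{1}
```

with the same perceptual term also applied to the input and output face images of the local generation module.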
The geometric features mainly comprise two aspects: 1. shape information, such as the shape of the facial features, the person's face shape, the length of the hair, and so on; 2. geometric details, i.e. the detailed expression of the face's geometric features, such as the person's wrinkles, the flow of the hair, and so on.
The appearance features mainly comprise three aspects: 1. color information, such as the hair color, skin color and lip color of the face; 2. material information, i.e. the texture of the face's hair and skin, such as the smoothness of the skin; 3. illumination information, i.e. the influence of lighting conditions on the brightness of the face, such as the brightness of the light and changes in shadow. In some cases these factors influence appearance jointly (for example, illumination changes can affect how skin color appears), and the appearance features do not draw a sharp division between them.
Fig. 2 is a schematic diagram of the process in which a user dynamically edits a picture in this embodiment. The embodiment is easy for ordinary users to use: after the user drags a face key point, the key point mesh is deformed in real time and the editing effect is generated automatically, and the user can continue to edit each part.
Fig. 3 shows multi-example shape editing effects of the embodiment. The user can quickly drag face key points to achieve functions such as hairline reduction (leftmost column), expression control, face slimming, and so on. The corresponding face key point wire-frame diagrams are listed in the first row of fig. 3; a dragged point is shown in a small box, and the arrow in the box indicates the dragging direction.
Fig. 4 compares the effect of this embodiment with a conventional image warping method: the first image is the original, the second shows the feature points before and after deformation, the third is the result of the conventional method, and the last is the result of this embodiment. Conventional image warping has difficulty handling the eyes and mouth; after the mouth is dragged open in this example, the image warping method cannot generate teeth and the like, making the result very unnatural, whereas the method of this embodiment automatically generates the corresponding missing parts.
Fig. 5 shows the effect of two consecutive edits in this embodiment; the corresponding face key point wire-frame diagrams are listed in the first and third rows.
Fig. 6 shows the interactive interface that the user can drag in real time in this embodiment. The interface includes a display area I, a display area II and a display area IV, where the display area I is used for displaying the geometric basic face image uploaded by the user and the face key points detected from it that the user can operate; the display area II is used for displaying the face geometric feature map corresponding to the face key points; and the display area IV is used for displaying the face editing image edited and generated by the deep face reshaping editing method. In the deep face reshaping editing method corresponding to this interactive interface, the geometric basic face image is the same as the face appearance feature map, so no separate display area is provided for the face appearance feature map.
This embodiment also provides a deep face reshaping editing device, which comprises a geometric feature extraction unit, an appearance feature extraction unit, a local generation unit and a global fusion unit. The geometric feature extraction unit is used for segmenting the face geometric feature map according to the face parts to obtain the local geometric features corresponding to each part of the face; the appearance feature extraction unit is used for inputting the face appearance feature map into the local generation module of the corresponding part according to the face part and extracting the local appearance features corresponding to each part of the face; the local generation unit is used for generating, through the local generation module and based on the local geometric features and local appearance features, the local face features of each face part that include the corresponding local geometric features and local appearance features; the global fusion unit is used for fusing, through the trained global fusion module, the local face features corresponding to each part of the face to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
The present embodiment also provides a storage medium on which a computer program executable by a processor is stored, where the computer program is executed to implement the steps of the deep face reshaping editing method in the present embodiment.
The embodiment also provides a computer device, which has a memory and a processor, wherein the memory stores a computer program capable of being executed by the processor, and the computer program realizes the steps of the deep face reshaping editing method in the embodiment when being executed.
Claims (11)
1. A human face geometric feature editing method is characterized by comprising the following steps:
acquiring a geometric basic face image, and detecting face key points from the geometric basic face image;
connecting the face key points into a mesh according to the positions of the facial features on the geometric basic face image, and encoding the mesh with a graph convolutional variational autoencoder; the graph convolutional variational autoencoder is trained on face key points from a face data set and parameterizes natural face shapes, so that the features of its latent space can be decoded into face key points that are natural, smooth and consistent with the geometric characteristics of the face;
acquiring fixed points designated by the user among the face key points and points dragged by the user, and using the graph convolutional variational autoencoder to optimize the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of the dragged points before and after dragging and the positions of the fixed points;
and rendering the optimized mesh into a face geometric feature map.
2. The method for editing geometric features of human faces according to claim 1, wherein the training of the graph convolutional variational autoencoder comprises:
statistically computing the average key points of the key points in the face database;
performing Delaunay triangulation on the average key points to determine a fixed topology of connection relations, and using the connection relations as the edge relations for graph convolution;
a graph convolutional variational autoencoder for planar face key points is trained on a face data set; the training process is the same as that of a classical variational autoencoder, an L2 loss function constrains the decoded key point coordinates to be consistent with the input, and training parameterizes natural face shapes so that the features of the latent space can be decoded into face key points that are naturally smooth and consistent with the geometric characteristics of the face.
3. The method for editing geometric features of a human face according to claim 1, wherein the graph convolutional variational autoencoder optimizing the positions of the remaining face key points and the mesh formed by connecting the face key points, according to the coordinate difference of the dragged points before and after dragging and the positions of the fixed points, comprises:
regarding the mesh after the user drags points as a mesh with locally missing vertices, that is, the face key points other than the fixed points and the dragged points are regarded as missing points that can move freely; the initial mesh is used to initialize the latent-space encoding;
and iteratively optimizing the final deformed mesh by minimizing the difference between the mesh decoded from the latent space and the set fixed points and dragged points at the corresponding positions.
4. A deep face reshaping editing method is characterized in that:
segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
inputting the face appearance feature map into a local generation module of a corresponding part according to the face part, and extracting local appearance features corresponding to each part of the face;
generating local face features which are corresponding to all parts of the face and comprise corresponding local geometric features and local appearance features through a local generation module based on the local geometric features and the local appearance features;
and fusing the local face features corresponding to all parts of the face through the trained global fusion module to generate a face editing image with the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
5. The deep face reshaping editing method according to claim 4, wherein the local generation module generating, based on the local geometric features and the local appearance features, the local face features of each face part that include the corresponding local geometric features and local appearance features comprises:
inputting the local geometric features into a convolution framework of the local generation module;
the face appearance feature map is encoded into high-dimensional features by convolutional layers, and the features are split according to position codes into h·w indexed sequences, where h and w are the sizes of the third and fourth dimensions of the high-dimensional features and correspond to the height and width of the image;
and combining each sequence with learnable position coding parameters, feeding them into a Transformer encoder for recombination to obtain the parameters corresponding to the Sandwich normalization layers in the skeleton and injecting these parameters, the convolutional skeleton finally outputting the edited image.
6. The deep face reshaping editing method according to claim 4, wherein: the features before the last convolutional layer that reduces the dimension to 3 in the local generation module are merged.
7. A deep face reshaping editing method according to claim 4, wherein: the global fusion module fuses the local features of the human face corresponding to each part of the human face together by using a network of a U-net structure.
8. A deep face reshaping editing device, characterized by comprising:
the geometric feature extraction unit is used for segmenting the face geometric feature image according to the face parts to obtain local geometric features corresponding to all the parts of the face; the face geometric feature map is edited according to the face geometric feature editing method of any one of claims 1 to 3;
the appearance feature extraction unit is used for inputting the face appearance feature image into the local generation module of the corresponding part according to the face part and extracting the local appearance feature corresponding to each part of the face;
the local generation unit is used for generating the local human face features which are corresponding to all parts of the human face and comprise the corresponding local geometric features and the corresponding local appearance features through the local generation module based on the local geometric features and the local appearance features;
and the global fusion unit is used for fusing, through the trained global fusion module, the local face features corresponding to each part of the face to generate a face image having the geometric features of the face geometric feature map and the appearance features of the face appearance feature map.
9. A storage medium having stored thereon a computer program executable by a processor, wherein the computer program, when executed, implements the steps of the deep face reshaping editing method according to any one of claims 4 to 7.
10. A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, wherein the computer program, when executed, implements the steps of the deep face reshaping editing method according to any one of claims 4 to 7.
11. An interactive interface, comprising:
the display area I is used for displaying the geometric basic face image uploaded by the user and the face key points detected from the geometric basic face image that the user can operate;
the display area II is used for displaying a human face geometric feature map corresponding to the human face key points in the display area I;
the display area III is used for displaying the face appearance feature map uploaded by the user;
and a display area IV for displaying the face editing image edited and generated by the deep face reshaping editing method according to any one of claims 4 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111029442.5A CN113470182B (en) | 2021-09-03 | 2021-09-03 | Face geometric feature editing method and deep face remodeling editing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111029442.5A CN113470182B (en) | 2021-09-03 | 2021-09-03 | Face geometric feature editing method and deep face remodeling editing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113470182A CN113470182A (en) | 2021-10-01 |
CN113470182B true CN113470182B (en) | 2022-02-18 |
Family
ID=77867216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111029442.5A Active CN113470182B (en) | 2021-09-03 | 2021-09-03 | Face geometric feature editing method and deep face remodeling editing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113470182B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114119977B (en) * | 2021-12-01 | 2022-12-30 | 昆明理工大学 | Graph convolution-based Transformer gastric cancer canceration region image segmentation method |
US11900545B2 (en) * | 2022-01-06 | 2024-02-13 | Lemon Inc. | Creating effects based on facial features |
CN114783017A (en) * | 2022-03-17 | 2022-07-22 | 北京明略昭辉科技有限公司 | Method and device for generating confrontation network optimization based on inverse mapping |
CN114845067B (en) * | 2022-07-04 | 2022-11-04 | 中科计算技术创新研究院 | Hidden space decoupling-based depth video propagation method for face editing |
CN115311730B (en) * | 2022-09-23 | 2023-06-20 | 北京智源人工智能研究院 | Method, system and electronic device for detecting key points of human face |
CN115810215A (en) * | 2023-02-08 | 2023-03-17 | 科大讯飞股份有限公司 | Face image generation method, device, equipment and storage medium |
CN117594202B (en) * | 2024-01-19 | 2024-04-19 | 深圳市宗匠科技有限公司 | Wrinkle size analysis method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7450126B2 (en) * | 2000-08-30 | 2008-11-11 | Microsoft Corporation | Methods and systems for animating facial features, and methods and systems for expression transformation |
CN109978930A (en) * | 2019-03-27 | 2019-07-05 | 杭州相芯科技有限公司 | A kind of stylized human face three-dimensional model automatic generation method based on single image |
CN110288697A (en) * | 2019-06-24 | 2019-09-27 | 天津大学 | 3D face representation and reconstruction method based on multi-scale graph convolutional neural network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112288851B (en) * | 2020-10-23 | 2022-09-13 | 武汉大学 | A 3D Face Modeling Method Based on Dual-Traffic Network |
CN112991484B (en) * | 2021-04-28 | 2021-09-03 | 中科计算技术创新研究院 | Intelligent face editing method and device, storage medium and equipment |
-
2021
- 2021-09-03 CN CN202111029442.5A patent/CN113470182B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7450126B2 (en) * | 2000-08-30 | 2008-11-11 | Microsoft Corporation | Methods and systems for animating facial features, and methods and systems for expression transformation |
CN109978930A (en) * | 2019-03-27 | 2019-07-05 | 杭州相芯科技有限公司 | A kind of stylized human face three-dimensional model automatic generation method based on single image |
CN110288697A (en) * | 2019-06-24 | 2019-09-27 | 天津大学 | 3D face representation and reconstruction method based on multi-scale graph convolutional neural network |
Non-Patent Citations (2)
Title |
---|
Heterogeneous Face Attribute Estimation:A Deep Multi-Task Learning Approach;Hu Han等;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20181130;第40卷(第11期);第2597-2609页 * |
Pluralistic Image Completion;Chuanxia Zheng等;《2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》;20200109;第1438-1447页 * |
Also Published As
Publication number | Publication date |
---|---|
CN113470182A (en) | 2021-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113470182B (en) | Face geometric feature editing method and deep face remodeling editing method | |
Zhuang et al. | Dreameditor: Text-driven 3d scene editing with neural fields | |
Jam et al. | A comprehensive review of past and present image inpainting methods | |
WO2021140510A2 (en) | Large-scale generation of photorealistic 3d models | |
Bermano et al. | Facial performance enhancement using dynamic shape space analysis | |
Groshev et al. | GHOST—a new face swap approach for image and video domains | |
CN117036620B (en) | Three-dimensional face reconstruction method based on single image | |
CN112991484B (en) | Intelligent face editing method and device, storage medium and equipment | |
CN111275778A (en) | Human face sketch generation method and device | |
Ling et al. | Semantically disentangled variational autoencoder for modeling 3d facial details | |
CN116385667A (en) | Reconstruction method of three-dimensional model, training method and device of texture reconstruction model | |
CN117953180B (en) | A text-to-3D object generation method based on bimodal latent variable diffusion | |
He et al. | Data-driven 3D human head reconstruction | |
CN113129347A (en) | Self-supervision single-view three-dimensional hairline model reconstruction method and system | |
Ji et al. | Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network | |
WO2024222821A1 (en) | Three-dimensional modeling system and method based on hand-drawn sketch, intelligent association modeling method, sketch model editing method, and related device | |
US20240169701A1 (en) | Affordance-based reposing of an object in a scene | |
Berson et al. | A robust interactive facial animation editing system | |
CN117893673A (en) | Method and system for generating an animated three-dimensional head model from a single image | |
CN116468844A (en) | Illumination editing method and system for human face nerve radiation field | |
Li et al. | Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field | |
CN114283181A (en) | Dynamic texture migration method and system based on sample | |
Zhang et al. | Modeling multi-style portrait relief from a single photograph | |
CN118446930B (en) | Monocular video dressing human body space-time feature learning method based on nerve radiation field | |
Zhao et al. | Challenges and Opportunities in 3D Content Generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |