CN111524226B - Method for keypoint detection and three-dimensional reconstruction of caricature portraits - Google Patents

Method for keypoint detection and three-dimensional reconstruction of caricature portraits

Info

Publication number
CN111524226B
CN111524226B (application CN202010316895.5A)
Authority
CN
China
Prior art keywords
dimensional
face
model
vertex
exaggerated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010316895.5A
Other languages
Chinese (zh)
Other versions
CN111524226A (en)
Inventor
张举勇
蔡泓锐
郭玉东
彭妆
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202010316895.5A priority Critical patent/CN111524226B/en
Publication of CN111524226A publication Critical patent/CN111524226A/en
Application granted granted Critical
Publication of CN111524226B publication Critical patent/CN111524226B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a method for keypoint detection and three-dimensional reconstruction of caricature portraits, comprising the following steps: constructing a convolutional neural network, and collecting a dataset comprising a three-dimensional face template model, caricature portraits, labeled two-dimensional keypoint coordinates, and three-dimensional exaggerated face models generated by an existing method; training the network on this dataset so that, for an input caricature, the convolutional neural network outputs a deformation representation model and camera projection parameters, from which the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates are predicted. The method removes the need to label keypoints on caricatures by hand: with the help of a new face deformation representation and a large dataset, the trained convolutional neural network reconstructs an exaggerated three-dimensional face directly from the predicted deformation representation model, and obtains the two-dimensional keypoint coordinates from the simultaneously predicted camera projection parameters.

Description

Method for keypoint detection and three-dimensional reconstruction of caricature portraits
Technical Field
The invention relates to the technical fields of image processing and three-dimensional modeling, and in particular to a method for keypoint detection and three-dimensional reconstruction of caricature portraits.
Background
Caricature is an artistic form expressed through two-dimensional images and three-dimensional models. By exaggerating certain features or details of a human face it creates a humorous visual effect, and it is widely used in films, advertisements, and social media. Work in computer vision, cognitive psychology, and related fields has also shown that this art form can effectively improve the accuracy of face recognition. Because of its research potential and wide range of uses, caricature-related problems are attracting a growing number of researchers and companies.
Keypoint detection for caricatures: compared with a normal face, a caricature is exaggerated and diverse, which makes its keypoints difficult to identify; as a result, there are few automatic keypoint detection algorithms for caricatures. On the other hand, many caricature research topics rely on keypoints, and labeling them manually is not only tedious but also time-consuming and labor-intensive. Developing a keypoint detection algorithm for caricatures is therefore significant: it fills a gap in this line of research and supports the development of related topics.
Most popular keypoint detection algorithms for normal faces are data-driven and depend on the design of a deep neural network. Such algorithms generally extract visual features of the face, or statistical features of the face-image pixels, from a single picture and regress the keypoint positions; the extraction methods include knowledge-based and algebraic-feature-based approaches. An exaggerated face is rooted in a normal face and must satisfy the basic structure of a face, such as having the expected number of eyes, a mouth, a nose, and ears. However, a caricature usually exaggerates features of the normal face, so a given feature, such as the distribution of keypoints around the eyes, can differ greatly between pictures. Because exaggeration differentiates and diversifies the features, keypoint detection algorithms for caricatures remain relatively rare.
Three-dimensional reconstruction of caricatures: at present there are two main ways to obtain a three-dimensional exaggerated face model: manual modeling and reconstruction based on deformation algorithms. Manual modeling, the earliest three-dimensional modeling approach, is still widely used to produce exaggerated three-dimensional face models, but it typically requires specially trained artists working in professional modeling software such as Autodesk Maya. Although manual modeling is highly accurate, it costs a great deal of time and manpower, so obtaining three-dimensional exaggerated face models with deformation algorithms is more popular. Deformation algorithms have the advantage of automatic generation, yet the generated models are often limited in exaggeration style, and are neither as diverse nor as accurate as the variously shaped three-dimensional exaggerated faces obtained by manual modeling. Moreover, most existing deformation algorithms depend on keypoints, so labeling remains time-consuming and labor-intensive, and once the labels are inaccurate the generated model may not match the original two-dimensional caricature.
Traditional image-based generation of a normal three-dimensional face model usually first builds three-dimensional models of a group of people, for example by camera capture, then constructs a corresponding face database and, through statistical or dimensionality-reduction methods, builds a parameterized face model (linear or nonlinear). The complex three-dimensional face is thereby parameterized into a low-dimensional space, and the corresponding normal face is reconstructed from its coordinate representation in that space. The traditional idea for generating an exaggerated face is to label two-dimensional keypoints on a single picture and to generate the corresponding exaggerated face through keypoint constraints and the constructed parameterized model. This approach depends heavily on keypoints: labeling takes time, and once the labels are inaccurate the reconstructed three-dimensional model is directly degraded.
Disclosure of Invention
The invention aims to provide a method for keypoint detection and three-dimensional reconstruction of caricature portraits, which can automatically and quickly detect the keypoints of an exaggerated face and generate the corresponding three-dimensional model, and which has practical value in face recognition, animation generation, expression transfer, AR/VR, and other fields.
The purpose of the invention is realized by the following technical scheme:
a method for detecting key points and reconstructing three-dimensional of ironic portrait painting comprises the following steps:
constructing a convolutional neural network, and collecting a data set which comprises a three-dimensional face template model, sarcasia portrait, marked two-dimensional key point coordinates and a three-dimensional exaggerated face model generated based on the existing method; the three-dimensional face template model and the three-dimensional exaggerated face model have the same topological structure;
in the training stage, a three-dimensional face template model is used as a template face, a deformation representation model of each ironic portrait is calculated, and camera projection parameters are output; predicting the corresponding three-dimensional exaggerated face model vertex coordinates and two-dimensional key point coordinates according to the deformation representation model and the camera projection parameters, and constructing a loss function in a training stage according to the three-dimensional exaggerated face model vertex coordinates and the two-dimensional key point coordinates, so that the network is trained in a supervision mode;
after training, corresponding deformation representation model and camera projection parameters are obtained for the ironic portrait painting input, and therefore the vertex coordinates and the two-dimensional key point coordinates of the three-dimensional exaggerated face model are predicted.
As can be seen from the above technical solution: 1) the deformation representation constrains deformations of the face, so a generated face keeps the properties of a face, while the expressive deformation representation model can still produce faces in exaggerated styles; 2) the face deformation model and the camera projection parameters can be regressed directly from a single picture by the convolutional neural network; 3) acting together, the two yield a more accurate three-dimensional exaggerated face model and, at the same time, more accurate two-dimensional keypoint coordinates.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings described here show only some embodiments of the present invention; other drawings can be derived from them by those of ordinary skill in the art without creative effort.
Fig. 1 is a flowchart of a method for keypoint detection and three-dimensional reconstruction of caricature portraits according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of test results produced by a trained convolutional neural network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments derived by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In caricature face recognition, keypoint detection algorithms designed for normal faces are often not accurate enough, because the distribution of some facial features differs greatly between pictures, and substantial time is still needed to adjust the keypoint positions after detection. In caricature three-dimensional reconstruction, the basis models of traditional methods lack expressive power, so the reconstructed face model is not exaggerated enough; reconstruction algorithms based on optimization and keypoint constraints depend too heavily on keypoint labels, and once the labels are not accurate enough the generated three-dimensional model deviates considerably from the two-dimensional picture. To this end, an embodiment of the present invention provides a method for keypoint detection and three-dimensional reconstruction of caricature portraits, as shown in fig. 1, which mainly includes the following steps:
step 1, constructing a convolutional neural network, and collecting a data set which comprises a three-dimensional face template model, sarcasm portrait painting, marked two-dimensional key point coordinates and a three-dimensional exaggerated face model generated based on the existing method.
The method mainly comprises the steps of constructing a network and collecting data; because the data set has the diversity of acquisition modes and the possibility of different data set processing, the three-dimensional exaggerated face model in the data set is required to have the same topological structure as the three-dimensional face template model, namely, different data share the same vertex number and adjacency relation, and the vertex sequence is the same on different models; in addition, the acquired face data is set to be sufficiently diverse.
Those skilled in the art will appreciate that the above-described normal face data set satisfying such conditions may be obtained by conventional means.
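The topology requirement above (same vertex count, vertex ordering, and adjacency across all models) can be checked mechanically. Below is a minimal NumPy sketch; the function name and the toy meshes are illustrative, not part of the patent.

```python
import numpy as np

def same_topology(verts_a, faces_a, verts_b, faces_b):
    """Check that two triangle meshes share vertex count and vertex ordering.
    Identical face index arrays imply identical adjacency relations."""
    if verts_a.shape[0] != verts_b.shape[0]:
        return False  # different number of vertices
    return faces_a.shape == faces_b.shape and np.array_equal(faces_a, faces_b)

# Template face and a deformed (exaggerated) face with the same connectivity.
faces = np.array([[0, 1, 2], [0, 2, 3]])
template = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
deformed = template * np.array([1.0, 2.5, 1.0])  # exaggerate along y

print(same_topology(template, faces, deformed, faces))  # → True
```

Only the connectivity is compared; the vertex positions are free to differ, which is exactly what a deformed face does.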
Step 2, in the training stage, using the three-dimensional face template model as the template face, computing a deformation representation model for each caricature and outputting camera projection parameters; predicting the corresponding vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates from the deformation representation model and the camera projection parameters, and constructing the training-stage loss function from these vertex and keypoint coordinates, so that the network is trained with supervision.
First, the computation of the deformation representation model is described.
Denote the set of vertices on the three-dimensional face template model as V = {v_i | i = 1, ..., N_v}, formed by all vertices v_i of the single three-dimensional face, where i is the index subscript and N_v is the total number of vertices. Since all face data in the dataset share the same vertex count, vertex ordering, and adjacency relations, the vertex set V together with an index i identifies the same vertex on every model.
Take the three-dimensional face template model as the template face, and the three-dimensional exaggerated face model corresponding to the caricature as the deformed face. For the vertex v'_i with index i on the deformed face and the vertex v_i with index i on the template face, the deformation gradient T_i between them is obtained by minimizing the energy function:

E(T_i) = Σ_{j∈N_i} c_ij ‖ e'_ij - T_i e_ij ‖²

where N_i is the index set of the 1-ring neighborhood centered at the vertex with index i; the vertex with index j ∈ N_i is denoted v_j on the template face and v'_j on the deformed face; e'_ij is the edge from vertex v'_i to vertex v'_j on the deformed face, and e_ij is the edge from vertex v_i to vertex v_j on the template face; c_ij is the cotangent Laplacian weight of the template face.

After the deformation gradient of each vertex is obtained, matrix polar decomposition factors T_i into R_i S_i, where R_i is the rotation matrix component of the deformation gradient from vertex v_i to vertex v'_i, and S_i is the scaling matrix component.
rotating the matrix R by matrix operation i Equivalent is expressed as exp (logR) i ) Then, the deformation representation model from the template face to the deformed face is written as:
f n ={logR i ;S i -I|i=1,...,N v }
wherein, I is a unit array, and the introduction aims at constructing a coordinate system, V n ={v' i |i=1,...,N v The vertex set on the three-dimensional exaggerated face model is used as the vertex set; the purpose of logR is to make the operation R on the rotation matrix i R j Can be expressed as exp (logR) i +logR j ) This allows the multiplication to be simplified to an addition.
By encoding the deformation from the template face to every deformed face over the three-dimensional exaggerated face model dataset, a deformation representation set F = {f_n | n = 1, ..., N} based on the template face is obtained, where N is the number of elements in the set, i.e., the number of three-dimensional models in the face dataset. Illustratively, N = 7800.
The set F is recorded as a matrix of size N × M; the n-th row of the matrix is the deformation representation f_n of the exaggerated face numbered n with respect to the template face. For each f_n, the deformation {log R_i ; S_i - I} of its i-th vertex v'_i is recorded as a 9-dimensional vector (3 parameters for the skew-symmetric log R_i and 6 for the symmetric S_i - I), so M = N_v × 9, where N_v is, as above, the total number of vertices on the three-dimensional face mesh.
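The per-vertex quantities just described (the deformation gradient, its polar decomposition, and the 9-dimensional {log R_i ; S_i - I} entry) can be sketched in NumPy as follows. This is an illustrative implementation under simplifying assumptions: uniform edge weights stand in for the cotangent Laplacian weights c_ij, and all function names are invented for the example.

```python
import numpy as np

def deformation_gradient(edges_t, edges_d):
    """Least-squares T_i with T_i e_ij ≈ e'_ij, for template edges (rows of
    edges_t) and deformed edges (rows of edges_d); uniform weights here."""
    Tt, *_ = np.linalg.lstsq(edges_t, edges_d, rcond=None)  # solves E Tᵀ = E'
    return Tt.T

def polar_decompose(T):
    """Matrix polar decomposition T = R S: R a proper rotation, S symmetric."""
    U, sig, Vt = np.linalg.svd(T)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt, Vt.T @ D @ np.diag(sig) @ Vt

def log_rotation(R):
    """Logarithm of a rotation matrix (Rodrigues); a skew-symmetric matrix."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-10:
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

def feature9(R, S):
    """The 9 numbers stored per vertex: 3 from skew log R, 6 from symmetric S - I."""
    L, D = log_rotation(R), S - np.eye(3)
    return np.array([L[2, 1], L[0, 2], L[1, 0],
                     D[0, 0], D[0, 1], D[0, 2], D[1, 1], D[1, 2], D[2, 2]])

# Example: deform template edges by a 45° z-rotation composed with axis scaling.
a = np.pi / 4
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
S0 = np.diag([1.5, 1.0, 0.8])
T0 = Rz @ S0
edges_t = np.eye(3)            # template edges e_ij to three neighbors
edges_d = edges_t @ T0.T       # deformed edges e'_ij = T0 e_ij

T = deformation_gradient(edges_t, edges_d)
R, S = polar_decompose(T)
feat = feature9(R, S)
print(np.allclose(T, T0), np.round(feat[:3], 3))  # recovers T0; log R ≈ (0, 0, π/4)
```

The decomposition cleanly separates the 45° rotation from the anisotropic scaling, and the feature entry D[0, 0] = 0.5 records the 1.5× stretch relative to the identity.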
As shown in fig. 2, the convolutional neural network comprises an encoder and a decoder. The encoder encodes the caricature into a K-dimensional latent vector, which is split into two parts: a K1-dimensional vector holding the camera projection parameters, and a K2-dimensional vector that the decoder decodes into the deformation representation model, with K1 + K2 = K.
Illustratively, ResNet-34 may be used as the encoder and a 3-layer fully connected neural network as the decoder.
For example, the resolution of the input caricature may be 224 × 224, with K = 216, K1 = 6, K2 = 210.
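The encoder/decoder split described above can be sketched in PyTorch. This is a hedged sketch: a tiny convolutional encoder stands in for ResNet-34, the vertex count is reduced to a toy value, and all layer sizes other than K = 216, K1 = 6, K2 = 210 are invented for illustration.

```python
import torch
import torch.nn as nn

K, K1, K2 = 216, 6, 210   # latent size = camera parameters + deformation code
NV = 100                  # toy vertex count; the real mesh is much larger

class CaricatureNet(nn.Module):
    def __init__(self, n_vertices=NV):
        super().__init__()
        # Stand-in encoder; the patent uses ResNet-34 on a 224x224 input.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, K),
        )
        # 3-layer fully connected decoder: K2-dim code -> 9 values per vertex.
        self.decoder = nn.Sequential(
            nn.Linear(K2, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_vertices * 9),
        )

    def forward(self, img):
        z = self.encoder(img)
        cam = z[:, :K1]        # scale (1) + Euler angles (3) + translation (2)
        code = z[:, K1:]       # K2-dim deformation code
        deform = self.decoder(code).view(-1, NV, 9)
        return cam, deform

net = CaricatureNet()
cam, deform = net(torch.randn(2, 3, 224, 224))
print(cam.shape, deform.shape)
```

The forward pass returns a (batch, 6) camera-parameter tensor and a (batch, N_v, 9) deformation representation, matching the K1/K2 split in the text.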
Based on the above principle, during training the three-dimensional face template model serves as the template face and a caricature is input; the deformation gradient is formed from the predicted rotation and scaling matrix components in the deformation representation model, the vertex coordinates of the three-dimensional exaggerated face model corresponding to the caricature are predicted, and the two-dimensional keypoint coordinates are predicted using the camera projection parameters output by the network. A loss function is then constructed from the labeled two-dimensional keypoint coordinates and the three-dimensional exaggerated face model (ground truth) corresponding to the caricature in the dataset, and continued training drives the vertex coordinates and two-dimensional keypoint coordinates predicted by the network toward these ground-truth values.
The preferred embodiment of network training is as follows:
for a ironic portrait, a deformation representation model can be obtained by a convolutional neural network, and is represented as:
Figure BDA0002459748670000051
wherein the content of the first and second substances,
Figure BDA0002459748670000052
representing predicted vertices v i To v 'to vertex' i Rotation matrix component of deformation gradient,/>
Figure BDA0002459748670000053
Representing predicted vertices v i To vertex v' i A scaling matrix component of the deformation gradient; marking/conjunction>
Figure BDA0002459748670000054
Figure BDA0002459748670000055
Denotes a vertex v 'with index subscript i on the predicted warped face' i And a vertex v with subscript i corresponding to the template face i A deformation gradient;
From the predicted deformation gradients T̂_i, the vertex coordinates of the three-dimensional exaggerated face model are predicted by solving the optimization problem:

min_{v̂'} Σ_{i=1}^{N_v} Σ_{j∈N_i} c_ij ‖ (v̂'_i - v̂'_j) - T̂_i (v_i - v_j) ‖²

where v̂'_i is the predicted vertex coordinate with index i of the three-dimensional exaggerated face model, and v̂'_j is the predicted coordinate of the vertex with index j in the set N_i. Since the objective is quadratic, solving this optimization problem is equivalent to solving a linear system of equations, which yields the vertex coordinates of the three-dimensional exaggerated face.
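The quadratic problem above reduces to a linear least-squares system in the unknown vertex coordinates. A toy NumPy sketch follows, with two simplifying assumptions: uniform weights stand in for the cotangent weights c_ij, and one vertex is pinned, since the objective determines the solution only up to a global translation.

```python
import numpy as np

def reconstruct_vertices(V, nbr, T, c, anchor=0):
    """Least-squares solve for v' given per-vertex deformation gradients T_i:
    minimize sum_i sum_{j in N_i} c_ij || (v'_i - v'_j) - T_i (v_i - v_j) ||^2.
    The solution is unique only up to translation, so vertex `anchor` is
    pinned (here, at its template position, for the demo)."""
    n = len(V)
    rows, rhs = [], []
    for i in range(n):
        for j in nbr[i]:
            w = np.sqrt(c[i][j])
            r = np.zeros((3, 3 * n))
            r[:, 3 * i:3 * i + 3] = w * np.eye(3)
            r[:, 3 * j:3 * j + 3] = -w * np.eye(3)
            rows.append(r)
            rhs.append(w * T[i] @ (V[i] - V[j]))
    r = np.zeros((3, 3 * n))
    r[:, 3 * anchor:3 * anchor + 3] = np.eye(3)   # pin the anchor vertex
    rows.append(r)
    rhs.append(V[anchor])
    x, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return x.reshape(n, 3)

# Toy tetrahedron, uniform weights, identity gradients -> recovers the template.
V = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
nbr = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
c = {i: {j: 1.0 for j in nbr[i]} for i in nbr}
print(np.allclose(reconstruct_vertices(V, nbr, [np.eye(3)] * 4, c), V))  # → True
```

With T̂_i = 2I for every vertex the same solve returns the uniformly doubled mesh, which is the behavior the energy prescribes.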
The camera projection parameters P are expressed as P = (ŝ, r̂, t̂), where ŝ is a scale parameter, r̂ is an Euler-angle vector from which a rotation matrix R(r̂) is derived, and t̂ is a translation parameter. As in the previous example with K1 = 6, ŝ, r̂, and t̂ are 1-, 3-, and 2-dimensional vectors, respectively. From the predicted vertex coordinates of the three-dimensional exaggerated face model and the weak-perspective projection formula, the two-dimensional keypoint coordinates are obtained:

q̂_t = ŝ · Π · R(r̂) · v̂'_t + t̂,   v̂'_t ∈ L',  t = 1, ..., T

where Π is the orthographic projection onto the image plane, L' is the set of three-dimensional keypoints selected from the predicted vertex set of the three-dimensional exaggerated face model, {q̂_t} is the two-dimensional keypoint set, and T is the total number of two-dimensional keypoints.
For example, the keypoints may be the 68 keypoints covering the face contour, eyebrows, eyes, nose, and mouth, or keypoints in another form; the corresponding three-dimensional keypoints are selected according to the chosen form to compose the set L'.
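The weak-perspective projection above can be sketched as follows; the Euler-angle convention and the function names are assumptions made for illustration.

```python
import numpy as np

def euler_to_R(rx, ry, rz):
    """Rotation matrix from Euler angles (x, then y, then z; assumed order)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def weak_perspective(points3d, s, euler, t):
    """q = s * Pi * R * v + t, where Pi orthographically drops the z axis."""
    Pi = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    R = euler_to_R(*euler)
    return s * (points3d @ R.T) @ Pi.T + t

landmarks3d = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
q = weak_perspective(landmarks3d, s=2.0, euler=(0.0, 0.0, 0.0),
                     t=np.array([10.0, 20.0]))
print(q)  # identity rotation: [[10. 20.] [12. 24.]]
```

Depth only influences the result through the rotation; after R is applied, the z coordinate is discarded, which is what distinguishes weak perspective from a full perspective camera.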
During training, the data in the dataset serve as ground truth (supervision information). For an input caricature, the convolutional neural network constructed in step 1 outputs, following the method introduced above, a deformation representation model f̂ and camera projection parameters P, from which the predicted three-dimensional vertex coordinates v̂'_i and two-dimensional keypoint coordinates q̂_t are obtained.
In the embodiment of the present invention, the loss function in the training phase includes three parts:
1) Vertex-based loss function E_ver
Using the vertex coordinates of the three-dimensional exaggerated face corresponding to the caricature in the dataset as supervision information, the loss is:

E_ver = Σ_{i=1}^{N_v} ‖ v̂'_i - v'_i ‖²

where v̂'_i is the vertex coordinate with index i in the predicted three-dimensional exaggerated face model, and v'_i is the vertex coordinate with index i in the corresponding three-dimensional exaggerated face model in the dataset.
2) Two-dimensional keypoint loss function E_lan
Using the corresponding labeled two-dimensional keypoint coordinates in the dataset as supervision information, the loss is:

E_lan = Σ_{t=1}^{T} ‖ q̂_t - q'_t ‖²

where q̂_t is a predicted two-dimensional keypoint coordinate, obtained by projecting the three-dimensional keypoints L' selected from the predicted vertex set of the three-dimensional exaggerated face model; q'_t is the corresponding labeled two-dimensional keypoint coordinate in the dataset; and T is the total number of two-dimensional keypoints.
3) Camera-projection-parameter loss function E_srt
Since the keypoint loss depends on both the three-dimensional vertex coordinates and the camera parameters, additional supervision information is needed to constrain the camera parameters individually at the start of training. The loss is:

E_srt = ‖ ŝ - s ‖² + ‖ r̂ - r ‖² + ‖ t̂ - t ‖²

where ŝ, r̂, and t̂ are the predicted scale, Euler-angle, and translation parameters, and s, r, and t are the corresponding ground-truth values.
Finally, the loss function for the training phase is:
E = λ_1 E_ver + λ_2 E_lan + λ_3 E_srt

where {λ_k | k = 1, 2, 3} are weight parameters; illustratively, λ_1 = 1, λ_2 = 0.00001, λ_3 = 0.0001.
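The combined loss can be checked numerically. In the sketch below, each term is taken as a plain sum of squared errors, matching the formulas above; any normalization by vertex or keypoint count is omitted as an assumption, and the variable names are illustrative.

```python
import numpy as np

def total_loss(v_pred, v_gt, q_pred, q_gt, cam_pred, cam_gt,
               lam=(1.0, 1e-5, 1e-4)):
    """E = l1*E_ver + l2*E_lan + l3*E_srt, each a sum of squared errors."""
    e_ver = np.sum((v_pred - v_gt) ** 2)   # 3D vertex loss
    e_lan = np.sum((q_pred - q_gt) ** 2)   # 2D keypoint loss
    e_srt = sum(np.sum((p - g) ** 2) for p, g in zip(cam_pred, cam_gt))
    return lam[0] * e_ver + lam[1] * e_lan + lam[2] * e_srt

v = np.zeros((4, 3))
q = np.zeros((3, 2))
cam = (np.array([1.0]), np.zeros(3), np.zeros(2))  # (scale, euler, translation)
print(total_loss(v, v, q, q, cam, cam))  # → 0.0 for a perfect prediction
```

Shifting every predicted vertex by 1 raises the loss by exactly λ_1 · N_v · 3 = 12 in this toy setup, which makes the weighting easy to sanity-check.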
In an embodiment of the present invention, the model is trained with the PyTorch deep learning framework; supervised learning may proceed by reading multiple groups of data at a time (for example, 32), and training is completed after multiple cycles (for example, 2000).
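The supervised training described in this paragraph (batches of 32, many cycles) can be sketched as a PyTorch skeleton. A toy regression model and random stand-in data replace the full encoder/decoder and the three-part loss; everything here is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in: regress a (K=216)-dim latent from flattened 16x16 "portraits".
model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 64), nn.ReLU(),
                      nn.Linear(64, 216))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(32, 1, 16, 16)   # one batch of 32 samples, as in the text
target = torch.randn(32, 216)         # stand-in supervision (latent ground truth)

losses = []
for epoch in range(50):               # the text trains ~2000 cycles at full scale
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(images), target)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[-1] < losses[0])  # the loss decreases on this toy problem
```

In the full method, `mse_loss` would be replaced by the weighted sum λ_1 E_ver + λ_2 E_lan + λ_3 E_srt and the batch would come from a dataset loader.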
Step 3, after training, obtaining the corresponding deformation representation model and camera projection parameters for an input caricature, and thereby predicting the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates.
The test procedure mirrors training: the caricature is fed to the trained convolutional neural network, which yields the deformation representation model and the camera projection parameters; from these, the vertex coordinates of the three-dimensional exaggerated face model (which can be constructed directly, since the topology is known) and the two-dimensional keypoint coordinates are predicted.
Fig. 3 schematically shows some test results: the first row is the input two-dimensional caricature (224 × 224), the second row is the predicted three-dimensional exaggerated face model, and the third row is the image annotated with the predicted two-dimensional keypoints.
Compared with traditional picture-based keypoint detection and three-dimensional reconstruction algorithms, the scheme of the embodiment of the invention has the following main advantages:
1) By parameterizing a three-dimensional nonlinear deformation model, the algorithm strengthens the expressive power of the convolutional neural network and accomplishes keypoint detection on exaggerated faces.
2) Through the convolutional neural network, the algorithm reconstructs a three-dimensional face model from a two-dimensional exaggerated face picture end to end.
3) Trained on the large dataset built for this task, the algorithm recognizes and models caricatures of different styles and by different artists far more accurately than traditional algorithms.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. A method for keypoint detection and three-dimensional reconstruction of caricature portraits, characterized by comprising the following steps:
constructing a convolutional neural network, and collecting a dataset comprising a three-dimensional face template model, caricature portraits, labeled two-dimensional keypoint coordinates, and three-dimensional exaggerated face models generated by an existing method, the three-dimensional face template model and the three-dimensional exaggerated face models having the same topological structure;
in the training stage, using the three-dimensional face template model as the template face, computing a deformation representation model for each caricature and outputting camera projection parameters; predicting the corresponding vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates from the deformation representation model and the camera projection parameters, and constructing the training-stage loss function from these vertex and keypoint coordinates, so that the network is trained with supervision;
after training, obtaining the corresponding deformation representation model and camera projection parameters for an input caricature, and thereby predicting the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates;
the three-dimensional face template model and the three-dimensional exaggerated face model having the same topological structure means that the two models share the same number of vertices and the same adjacency relations, and that the vertex order is the same across the different models; the set of vertices on the three-dimensional face template model is denoted $V = \{v_i \mid i = 1, \ldots, N_v\}$, i.e., $V$ consists of all vertices $v_i$ of the single-face three-dimensional data, where $i$ is the index subscript and $N_v$ is the total number of vertices;
during training, the three-dimensional face template model is used as the template face, and an ironic portrait painting is input to obtain a deformation representation model $f$ and camera projection parameters $P$;
the deformation representation model is expressed as:

$f = \{ (\log R_i, S_i) \mid i = 1, \ldots, N_v \}$

where $R_i \in \mathbb{R}^{3 \times 3}$ is the rotation matrix component of the deformation gradient from vertex $v_i$ to the predicted vertex $v'_i$, and $S_i \in \mathbb{R}^{3 \times 3}$ is the scaling matrix component of that deformation gradient; $T_i = R_i S_i$ denotes the deformation gradient between the vertex $v'_i$ with index subscript $i$ on the predicted deformed face and the corresponding vertex $v_i$ with index subscript $i$ on the template face;
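As an illustrative aside (not part of the claim), the split of a deformation gradient $T_i$ into its rotation component $R_i$ and scaling component $S_i$ corresponds to a standard polar decomposition, which can be sketched via SVD; the function name and test values below are assumptions for illustration:

```python
import numpy as np

def polar_decompose(T):
    """Split a 3x3 deformation gradient T into a rotation R and a
    symmetric scaling/shear factor S so that T = R @ S (polar decomposition)."""
    U, sigma, Vt = np.linalg.svd(T)
    R = U @ Vt
    # Guard against reflections: keep R a proper rotation (det = +1).
    if np.linalg.det(R) < 0:
        U[:, -1] *= -1
        sigma[-1] *= -1
        R = U @ Vt
    S = Vt.T @ np.diag(sigma) @ Vt
    return R, S

# A deformation gradient built from a known rotation and scaling.
angle = np.pi / 6
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
S_true = np.diag([1.5, 1.0, 0.8])
R, S = polar_decompose(Rz @ S_true)
print(np.allclose(R, Rz), np.allclose(S, S_true))  # True True
```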
according to the predicted deformation gradients $T_i = R_i S_i$, the vertex coordinates of the three-dimensional exaggerated face model are predicted by solving the optimization problem:

$\min_{v'_1, \ldots, v'_{N_v}} \sum_{i=1}^{N_v} \sum_{j \in N_i} \left\| (v'_i - v'_j) - R_i S_i (v_i - v_j) \right\|_2^2$

where $v'_i$ is the predicted vertex coordinate with index subscript $i$ in the three-dimensional exaggerated face model, and $v'_j$ is the predicted vertex coordinate with index subscript $j$ in the neighborhood set $N_i$ of vertex $i$;
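A minimal sketch of this optimization on a toy three-vertex "mesh": because every term involves only differences of vertices, it is a linear least-squares problem in the unknown coordinates, with one vertex pinned to remove the translation ambiguity. The toy data and variable names are assumptions, not taken from the patent:

```python
import numpy as np

# Toy "mesh": 3 vertices, each pair adjacent (a single triangle).
template = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Per-vertex deformation gradients T_i = R_i S_i; here a uniform 2x scaling.
T = [2.0 * np.eye(3) for _ in range(3)]

# Build the least-squares system  (v'_i - v'_j) = T_i (v_i - v_j)
# for all i and j in N_i.  Unknowns: the 9 coordinates of v'_0 .. v'_2.
rows, rhs = [], []
for i, nbrs in neighbors.items():
    for j in nbrs:
        for d in range(3):              # one equation per coordinate
            row = np.zeros(9)
            row[3 * i + d] = 1.0
            row[3 * j + d] = -1.0
            rows.append(row)
            rhs.append(T[i][d] @ (template[i] - template[j]))
# Pin v'_0 to the origin: the energy only constrains vertex differences.
for d in range(3):
    row = np.zeros(9)
    row[d] = 1.0
    rows.append(row)
    rhs.append(0.0)

sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
v_pred = sol.reshape(3, 3)
print(v_pred)   # the template vertices uniformly scaled by 2
```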
the camera projection parameters $P$ are expressed as:

$P = \{ s, R, t \}$

where $s \in \mathbb{R}$ is the scale parameter, $R \in \mathbb{R}^{3 \times 3}$ is the rotation matrix, and $t \in \mathbb{R}^2$ is the translation parameter; according to the predicted vertex coordinates of the three-dimensional exaggerated face model and the weak perspective projection formula, the two-dimensional key point coordinates are obtained:

$q_t = s \Pi R v'_t + t, \quad v'_t \in L', \quad t = 1, \ldots, T$

where $L'$ is the set of three-dimensional key points selected from the predicted vertex set of the three-dimensional exaggerated face model, $\Pi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ is the orthographic projection matrix, $Q = \{ q_t \mid t = 1, \ldots, T \}$ is the two-dimensional key point set, and $T$ is the total number of two-dimensional key points.
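The weak perspective projection step can be sketched as follows; the function name and toy inputs are assumptions for illustration:

```python
import numpy as np

def project_keypoints(V3d, s, R, t):
    """Weak-perspective projection q = s * Pi * R * v + t for each 3D
    keypoint v, where Pi keeps the x and y components after rotation."""
    Pi = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])        # orthographic projection
    return s * (V3d @ R.T @ Pi.T) + t       # (T, 2) array of 2D keypoints

# Identity rotation, scale 0.5, shift by (10, 20).
V = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
q = project_keypoints(V, 0.5, np.eye(3), np.array([10.0, 20.0]))
print(q)   # rows: (10.5, 21.0) and (12.0, 22.5)
```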
2. The method for key point detection and three-dimensional reconstruction of ironic portrait paintings of claim 1, characterized in that the convolutional neural network comprises an encoder and a decoder; the encoder encodes the ironic portrait painting as a K-dimensional hidden vector, which is split into two parts: one part is a K1-dimensional vector, namely the camera projection parameters; the other part is a K2-dimensional vector, which is decoded by the decoder into the deformation representation model; wherein K1 + K2 = K.
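A hedged sketch of the latent-vector split described in claim 2; the concrete values of K1 and K2 and the internal layout of the camera part are assumptions (the claim only requires K1 + K2 = K):

```python
import numpy as np

K1, K2 = 6, 128          # illustrative sizes: K1 camera dims, K2 shape dims

def split_latent(z):
    """Split the K-dim hidden vector produced by the encoder into the
    camera-projection part (K1 dims) and the deformation code (K2 dims,
    which a decoder would map to the deformation representation f)."""
    assert z.shape[-1] == K1 + K2
    cam, shape_code = z[..., :K1], z[..., K1:]
    # A possible layout of the K1 camera dims: scale (1), rotation as
    # three Euler angles (3), 2D translation (2) -- an assumption here.
    s, euler, t = cam[..., 0], cam[..., 1:4], cam[..., 4:6]
    return s, euler, t, shape_code

z = np.arange(K1 + K2, dtype=float)
s, euler, t, code = split_latent(z)
print(s, euler, t, code.shape)
```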
3. The method for key point detection and three-dimensional reconstruction of ironic portrait paintings of claim 1 or 2, characterized in that the loss function of the training stage is:

$E = \lambda_1 E_{ver} + \lambda_2 E_{lan} + \lambda_3 E_{srt}$

where $\{\lambda_k \mid k = 1, 2, 3\}$ are weight parameters;

$E_{ver}$ is the vertex-based loss function:

$E_{ver} = \frac{1}{N_v} \sum_{i=1}^{N_v} \left\| \hat{v}_i - v'_i \right\|_2^2$

where $\hat{v}_i$ is the predicted vertex coordinate with index subscript $i$ in the three-dimensional exaggerated face model, $v'_i$ is the vertex coordinate with index subscript $i$ in the corresponding three-dimensional exaggerated face model in the data set, and $N_v$ is the total number of vertices;

$E_{lan}$ is the loss function based on the two-dimensional key points:

$E_{lan} = \frac{1}{T} \sum_{t=1}^{T} \left\| \hat{q}_t - q'_t \right\|_2^2$

where $\hat{q}_t$ are the predicted two-dimensional key point coordinates obtained by projecting the three-dimensional key point set $L'$ selected from the predicted vertex set of the three-dimensional exaggerated face model, $q'_t$ are the corresponding annotated two-dimensional key point coordinates in the data set, and $T$ is the total number of two-dimensional key points;

$E_{srt}$ is the loss function based on the camera projection parameters:

$E_{srt} = \left\| \hat{s} - s' \right\|_2^2 + \left\| \hat{R} - R' \right\|_F^2 + \left\| \hat{t} - t' \right\|_2^2$

where $\hat{s}$, $\hat{R}$ and $\hat{t}$ are the predicted scale parameter, rotation matrix and translation parameter, and $s'$, $R'$ and $t'$ are the corresponding camera projection parameters in the data set.
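The three-term loss can be assembled as in this sketch; the choice of norms for the camera term, the uniform weights, and the toy inputs are assumptions made for illustration:

```python
import numpy as np

def training_loss(v_pred, v_gt, q_pred, q_gt,
                  s_pred, s_gt, R_pred, R_gt, t_pred, t_gt,
                  lambdas=(1.0, 1.0, 1.0)):
    """Total loss E = l1*E_ver + l2*E_lan + l3*E_srt, assembled from the
    three terms of claim 3 (exact norms and weights are an assumption)."""
    l1, l2, l3 = lambdas
    E_ver = np.mean(np.sum((v_pred - v_gt) ** 2, axis=1))    # vertex term
    E_lan = np.mean(np.sum((q_pred - q_gt) ** 2, axis=1))    # landmark term
    E_srt = ((s_pred - s_gt) ** 2                            # camera term
             + np.sum((R_pred - R_gt) ** 2)
             + np.sum((t_pred - t_gt) ** 2))
    return l1 * E_ver + l2 * E_lan + l3 * E_srt

v = np.zeros((4, 3))
q = np.zeros((5, 2))
E = training_loss(v, v + 1.0, q, q, 1.0, 1.0, np.eye(3), np.eye(3),
                  np.zeros(2), np.zeros(2))
print(E)   # 3.0: each of the 4 vertices is off by (1, 1, 1)
```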
CN202010316895.5A 2020-04-21 2020-04-21 Method for detecting key point and three-dimensional reconstruction of ironic portrait painting Active CN111524226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010316895.5A CN111524226B (en) 2020-04-21 2020-04-21 Method for detecting key point and three-dimensional reconstruction of ironic portrait painting

Publications (2)

Publication Number Publication Date
CN111524226A CN111524226A (en) 2020-08-11
CN111524226B true CN111524226B (en) 2023-04-18

Family

ID=71903414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010316895.5A Active CN111524226B (en) 2020-04-21 2020-04-21 Method for detecting key point and three-dimensional reconstruction of ironic portrait painting

Country Status (1)

Country Link
CN (1) CN111524226B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308957B (en) * 2020-08-14 2022-04-26 浙江大学 Optimal fat and thin face portrait image automatic generation method based on deep learning
CN112700524B (en) * 2021-03-25 2021-07-02 江苏原力数字科技股份有限公司 3D character facial expression animation real-time generation method based on deep learning
CN113129347B (en) * 2021-04-26 2023-12-12 南京大学 Self-supervision single-view three-dimensional hairline model reconstruction method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1074271A (en) * 1996-08-30 1998-03-17 Nippon Telegr & Teleph Corp <Ntt> Method and device for preparing three-dimensional portrait
CN101751689A (en) * 2009-09-28 2010-06-23 中国科学院自动化研究所 Three-dimensional facial reconstruction method
CN108242074A (en) * 2018-01-02 2018-07-03 中国科学技术大学 A kind of three-dimensional exaggeration human face generating method based on individual satire portrait painting
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN109508678A (en) * 2018-11-16 2019-03-22 广州市百果园信息技术有限公司 Training method, the detection method and device of face key point of Face datection model
WO2020001082A1 (en) * 2018-06-30 2020-01-02 东南大学 Face attribute analysis method based on transfer learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10755477B2 (en) * 2018-10-23 2020-08-25 Hangzhou Qu Wei Technology Co., Ltd. Real-time face 3D reconstruction system and method on mobile device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Haijun; Yang Shiying; Wang Yanfei. Research on a portrait caricature generation algorithm based on NMF and LS-SVM. Video Engineering. 2013, (19), full text. *
Dong Xiaoli; Li Weijun; Ning Xin; Zhang Liping; Lu Yaxuan. A stylized portrait generation algorithm using a triangular coordinate system. Journal of Xi'an Jiaotong University. 2018, (04), full text. *

Similar Documents

Publication Publication Date Title
Gafni et al. Dynamic neural radiance fields for monocular 4d facial avatar reconstruction
Shen et al. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis
CN111524226B (en) Method for detecting key point and three-dimensional reconstruction of ironic portrait painting
CN111325851B (en) Image processing method and device, electronic equipment and computer readable storage medium
Pighin et al. Modeling and animating realistic faces from images
Liao et al. Automatic caricature generation by analyzing facial features
Hu et al. Robust hair capture using simulated examples
Zhuang et al. Dreameditor: Text-driven 3d scene editing with neural fields
Shamai et al. Synthesizing facial photometries and corresponding geometries using generative adversarial networks
Shen et al. Deepsketchhair: Deep sketch-based 3d hair modeling
CN108242074B (en) Three-dimensional exaggeration face generation method based on single irony portrait painting
Yu et al. Content-aware photo collage using circle packing
Zhang et al. Hair-GAN: Recovering 3D hair structure from a single image using generative adversarial networks
Clarke et al. Automatic generation of 3D caricatures based on artistic deformation styles
Lv et al. 3D facial expression modeling based on facial landmarks in single image
Bao et al. A survey of image-based techniques for hair modeling
Shi et al. Geometric granularity aware pixel-to-mesh
CN110717978A (en) Three-dimensional head reconstruction method based on single image
Jung et al. Deep deformable 3d caricatures with learned shape control
Kao et al. Towards 3d face reconstruction in perspective projection: Estimating 6dof face pose from monocular image
Sun et al. Cgof++: Controllable 3d face synthesis with conditional generative occupancy fields
Xi et al. A data-driven approach to human-body cloning using a segmented body database
Zhang et al. Dyn-e: Local appearance editing of dynamic neural radiance fields
Du et al. SAniHead: Sketching animal-like 3D character heads using a view-surface collaborative mesh generative network
Yu et al. Mean value coordinates–based caricature and expression synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant