CN111524226B - Method for detecting key point and three-dimensional reconstruction of ironic portrait painting - Google Patents
- Publication number
- CN111524226B (application CN202010316895.5A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- face
- model
- vertex
- exaggerated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method for key point detection and three-dimensional reconstruction of ironic portrait paintings, which comprises the following steps: constructing a convolutional neural network, and collecting a data set comprising a three-dimensional face template model, ironic portraits, labeled two-dimensional key point coordinates, and three-dimensional exaggerated face models generated by an existing method; training the network on the data set so that, for an input ironic portrait painting, the convolutional neural network outputs a deformation representation model and camera projection parameters, from which the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional key point coordinates are predicted. The method frees the ironic portrait from manual key point labeling; with the help of a new face deformation representation and a large data set, the trained convolutional neural network directly reconstructs an exaggerated three-dimensional face from the predicted deformation representation model, and obtains the two-dimensional key point coordinates from the simultaneously predicted camera projection parameters.
Description
Technical Field
The invention relates to the technical field of image processing and three-dimensional modeling, in particular to a method for key point detection and three-dimensional reconstruction of ironic portrait paintings.
Background
Ironic portraits (caricatures) are an artistic form of expression built on two-dimensional images and three-dimensional models. They create a humorous visual effect by exaggerating certain features or details of the human face, and are often used in movies, advertisements, social media and other everyday scenes. This artistic form has also been shown, in computer vision, cognitive psychology and related fields, to effectively improve the accuracy of face recognition. Because of its research prospects and wide use, problems related to ironic portraits are attracting an increasing number of researchers and enterprises.
Key point detection for ironic portraits: compared with a normal face, an ironic portrait is exaggerated and diverse, which makes key point identification difficult. As a result, there are few automatic key point detection algorithms for ironic portraits. On the other hand, many research topics on ironic portraits rely on key points, and labeling them manually is not only tedious but also time-consuming and laborious. Developing a key point detection algorithm for ironic portraits is therefore significant: it fills a research gap and supports the development of related topics.
Most currently popular key point detection algorithms for normal faces are data-driven and depend on the design of a deep neural network. Such algorithms extract visual features of a face, or statistical features of the face image pixels, from a single picture and regress the key point positions; extraction methods include knowledge-based and algebraic-feature-based approaches. An exaggerated face is rooted in a normal face and must satisfy the basic features of a face, such as having the expected number of eyes, a mouth, a nose and ears. However, an exaggerated face typically amplifies features of the normal face, so a given feature may differ greatly between pictures, for example the distribution of key points around the eyes. Due to this exaggerated differentiation and diversity of features, there are relatively few key point detection algorithms for ironic portraits.
Three-dimensional reconstruction of ironic portraits: at present there are two main ways to obtain a three-dimensional exaggerated face model: manual modeling and reconstruction based on deformation algorithms. Manual modeling, the earliest three-dimensional modeling approach, is still widely used to produce exaggerated three-dimensional face models, but it typically requires specially trained artists working in professional modeling software such as Autodesk Maya. Although highly accurate, it consumes a great deal of time and manpower, so obtaining three-dimensional exaggerated face models with deformation algorithms is more popular. However, although deformation algorithms generate models automatically, the generated models are often limited in exaggeration style, and are neither as diverse nor as accurate as the variously shaped three-dimensional exaggerated faces obtained by manual modeling. Moreover, most existing deformation algorithms depend on key points, so time and labor are still needed for labeling, and once a label is inaccurate the generated model may no longer match the original two-dimensional ironic portrait.
The traditional approach to generating a normal three-dimensional face model from images usually first builds three-dimensional models of some subjects, for example by camera capture; then constructs a corresponding face database and, by statistical or dimensionality-reduction methods, builds a parameterized face model (linear or nonlinear), thereby parameterizing the complex three-dimensional face into a low-dimensional parameter space; the corresponding normal face is then reconstructed from its coordinate representation in that low-dimensional space. The traditional idea for exaggerated face generation is to label two-dimensional key points on a single picture and generate the corresponding exaggerated face through key point constraints and the constructed parameterized model. This approach depends heavily on key points: labeling costs time, and inaccurate labels directly degrade the reconstructed three-dimensional model.
Disclosure of Invention
The invention aims to provide a method for key point detection and three-dimensional reconstruction of ironic portrait paintings, which can automatically and quickly detect the key points of an exaggerated face and generate the corresponding three-dimensional model, with practical application value in face recognition, animation generation, expression transfer, AR/VR and other fields.
The purpose of the invention is realized by the following technical scheme:
A method for key point detection and three-dimensional reconstruction of ironic portrait paintings comprises the following steps:
constructing a convolutional neural network, and collecting a data set comprising a three-dimensional face template model, ironic portraits, labeled two-dimensional key point coordinates and three-dimensional exaggerated face models generated by an existing method; the three-dimensional face template model and the three-dimensional exaggerated face models have the same topological structure;
in the training stage, the three-dimensional face template model is used as the template face, a deformation representation model of each ironic portrait is calculated, and camera projection parameters are output; the corresponding three-dimensional exaggerated face model vertex coordinates and two-dimensional key point coordinates are predicted from the deformation representation model and the camera projection parameters, and the training-stage loss function is constructed from them, so that the network is trained in a supervised manner;
after training, for an input ironic portrait painting, the corresponding deformation representation model and camera projection parameters are obtained, from which the vertex coordinates and two-dimensional key point coordinates of the three-dimensional exaggerated face model are predicted.
It can be seen from the above technical solution that: 1) the deformation representation constrains deformations of the face so that the generated face retains the properties of a face, while its expressive power also allows faces with exaggerated styles; 2) the face deformation model and camera projection parameters can be regressed directly from a single picture by the convolutional neural network; 3) together these yield a more accurate three-dimensional exaggerated face model and, at the same time, more accurate two-dimensional key point coordinates.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of a method for detecting key points and reconstructing ironically portrait painting according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a test result performed by using a trained convolutional neural network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
In the irony portrait face recognition field, the normal face-based keypoint detection algorithm is often not accurate enough because of the large distribution difference of some facial features among different pictures, and a large amount of time is still needed to adjust the positions of the keypoints after detection. In the field of irony portrait three-dimensional reconstruction, the traditional three-dimensional reconstruction method has insufficient expression capability of a base model, so that the exaggerated degree of a reconstructed face model is insufficient; some reconstruction algorithms based on optimization methods and key point constraints depend on the labeling of key points excessively, and once the labeling is not accurate enough, the generated three-dimensional model and the two-dimensional picture have large deviation. To this end, an embodiment of the present invention provides a method for detecting key points and reconstructing ironically portrait painting in three dimensions, as shown in fig. 1, which mainly includes the following steps:
The method mainly comprises constructing the network and collecting data. Because data can be acquired in different ways and processed differently, the three-dimensional exaggerated face models in the data set are required to have the same topological structure as the three-dimensional face template model: all data share the same number of vertices and the same adjacency relation, and the vertex order is the same across models. In addition, the collected face data should be sufficiently diverse.
Those skilled in the art will appreciate that the above-described normal face data set satisfying such conditions may be obtained by conventional means.
First, the principle of computing the deformation representation model is described.
Record the set of vertices on the three-dimensional face template model as $V = \{v_i \mid i = 1, \dots, N_v\}$, composed of all vertices $v_i$ of the single-face three-dimensional data, where $i$ is the index subscript and $N_v$ is the total number of vertices. The data set satisfies the condition that the vertex count, vertex order and adjacency relation are the same for all face data; hence, given the vertex set $V$ and an index $i$, it is clear which vertex is referred to.
Taking the three-dimensional face template model as the template face and the three-dimensional exaggerated face model corresponding to the ironic portrait picture as the deformed face, the deformation gradient $T_i$ between the vertex $v'_i$ with index $i$ on the deformed face and the vertex $v_i$ with index $i$ on the template face is obtained by minimizing the energy:

$$\min_{T_i} \sum_{j \in N_i} c_{ij} \left\| e'_{ij} - T_i\, e_{ij} \right\|^2$$

where $N_i$ is the subscript set of the 1-neighborhood vertices centered at the vertex with index $i$; the vertex with index $j \in N_i$ on the template face is denoted $v_j$, and the corresponding vertex on the deformed face is denoted $v'_j$; $e'_{ij}$ is the edge from vertex $v'_i$ to vertex $v'_j$ on the deformed face, and $e_{ij}$ is the edge from vertex $v_i$ to vertex $v_j$ on the template face; $c_{ij}$ is the cotangent Laplacian weight of the template face.
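The per-vertex minimization above is a weighted linear least-squares problem with the closed-form solution $T_i = \big(\sum_j c_{ij}\, e'_{ij} e_{ij}^T\big)\big(\sum_j c_{ij}\, e_{ij} e_{ij}^T\big)^{-1}$. A minimal NumPy sketch (the edge layout and uniform weights are illustrative, not from the patent):

```python
import numpy as np

def deformation_gradient(edges_template, edges_deformed, weights):
    """Solve min_T sum_j c_ij ||e'_ij - T e_ij||^2 in closed form.

    edges_template: (k, 3) edges e_ij around vertex i on the template face
    edges_deformed: (k, 3) corresponding edges e'_ij on the deformed face
    weights:        (k,)   per-edge weights c_ij
    """
    A = np.zeros((3, 3))  # sum_j c_ij e'_ij e_ij^T
    B = np.zeros((3, 3))  # sum_j c_ij e_ij  e_ij^T
    for e, ed, c in zip(edges_template, edges_deformed, weights):
        A += c * np.outer(ed, e)
        B += c * np.outer(e, e)
    return A @ np.linalg.inv(B)

# Toy check: uniformly scaling the 1-ring by 2 reproduces T = 2I.
edges = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [1.0, 1.0, 0]])
T = deformation_gradient(edges, 2.0 * edges, np.ones(len(edges)))
```

In practice this solve is repeated for every vertex of the mesh, with $c_{ij}$ taken from the template face's cotangent weights.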
After the deformation gradients of the vertices are obtained, each $T_i$ is decomposed by matrix polar decomposition as $T_i = R_i S_i$, where $R_i$ is the rotation matrix component and $S_i$ the scaling matrix component of the deformation gradient from vertex $v_i$ to vertex $v'_i$;
writing the rotation matrix $R_i$ equivalently as $\exp(\log R_i)$, the deformation representation model from the template face to the deformed face is:
$$f_n = \{\log R_i;\; S_i - I \mid i = 1, \dots, N_v\}$$
where $I$ is the identity matrix, introduced to construct a coordinate system, and $V_n = \{v'_i \mid i = 1, \dots, N_v\}$ is the vertex set of the three-dimensional exaggerated face model. The purpose of $\log R$ is that the product of rotation matrices $R_i R_j$ can be expressed as $\exp(\log R_i + \log R_j)$, simplifying multiplication to addition.
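The matrix logarithm and exponential used here are the standard ones for rotation matrices: $\log R$ is skew-symmetric (3 independent entries) and $\exp$ inverts it, while the additive composition holds exactly when the rotations share an axis. A quick SciPy sanity check (purely illustrative):

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.spatial.transform import Rotation

# log of a rotation matrix is skew-symmetric; exp inverts it.
R = Rotation.from_euler("xyz", [0.3, -0.2, 0.5]).as_matrix()
L = np.real(logm(R))
R_back = expm(L)                     # recovers R

# For rotations about the same axis, composition becomes addition:
# exp(log R1 + log R2) equals R1 @ R2.
R1 = Rotation.from_rotvec([0, 0, 0.4]).as_matrix()
R2 = Rotation.from_rotvec([0, 0, 0.7]).as_matrix()
combined = expm(np.real(logm(R1)) + np.real(logm(R2)))
```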
Encoding the deformation from the template face to every deformed face in the three-dimensional exaggerated face data set yields a set of deformation representations based on the template face, $F = \{f_n \mid n = 1, \dots, N\}$, where $N$ is the number of elements in the set, i.e. the number of three-dimensional models in the face data set. Illustratively, the number of elements in $F$ is 7800, i.e. $N = 7800$.
The set of deformation representations $F$ is recorded as a matrix of size $N \times M$; the $n$-th row of the matrix is the deformation representation $f_n$ of the exaggerated face numbered $n$ with respect to the template face. For each $f_n$, the deformation $\{\log R_i;\, S_i - I\}$ of its $i$-th vertex $v'_i$ is recorded as a 9-dimensional vector ($\log R_i$ is skew-symmetric, giving 3 independent entries, and $S_i - I$ is symmetric, giving 6), so $M = N_v \times 9$, where $N_v$ is again the total number of vertices of the three-dimensional face mesh.
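Packing one vertex's deformation gradient into its 9-dimensional feature can be sketched as follows (the function name and packing order are ours, not the patent's):

```python
import numpy as np
from scipy.linalg import polar, logm

def dr_feature(T):
    """Pack deformation gradient T = R S into the 9-dim vector {log R; S - I}."""
    R, S = polar(T)               # polar decomposition: R rotation, S symmetric scaling
    L = np.real(logm(R))          # skew-symmetric log of the rotation
    skew = np.array([L[2, 1], L[0, 2], L[1, 0]])   # 3 independent entries of log R
    sym = (S - np.eye(3))[np.triu_indices(3)]      # 6 independent entries of S - I
    return np.concatenate([skew, sym])             # 9 entries total

feat = dr_feature(np.eye(3))      # identity gradient -> all-zero feature
```

Stacking these 9-vectors over all $N_v$ vertices of one model gives one row of the $N \times M$ matrix.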
As shown in fig. 2, the convolutional neural network comprises an encoder and a decoder. The encoder encodes the ironic portrait into a K-dimensional latent vector, which is split into two parts: a K1-dimensional vector, the camera projection parameters, and a K2-dimensional vector, which is decoded by the decoder into the deformation representation model; K1 + K2 = K.
Illustratively, resNet34 may be used as an encoder and a 3-layer fully-connected neural network may be used as a decoder.
For example, the resolution of the input ironic portrait may be 224 × 224, K =216, K1=6, K2=210.
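The latent split and decoding can be sketched in pure NumPy as a stand-in for the ResNet34 encoder and the 3-layer fully-connected decoder (hidden-layer widths and the deliberately tiny vertex count are our assumptions; the patent only fixes K = 216, K1 = 6, K2 = 210):

```python
import numpy as np

K, K1, K2 = 216, 6, 210   # latent size and its split, as in the text
NV = 100                  # deliberately tiny vertex count for this sketch
rng = np.random.default_rng(0)

# Stand-in 3-layer fully connected decoder: K2 -> 256 -> 256 -> NV * 9.
W = [rng.standard_normal((K2, 256)) * 0.01,
     rng.standard_normal((256, 256)) * 0.01,
     rng.standard_normal((256, NV * 9)) * 0.01]

def decode(z):
    h = z
    for w in W[:-1]:
        h = np.maximum(h @ w, 0.0)   # ReLU hidden layers
    return h @ W[-1]

latent = rng.standard_normal(K)      # stand-in for the encoder output
cam_params = latent[:K1]             # K1 = 6: scale (1) + Euler angles (3) + translation (2)
dr_code = latent[K1:]                # remaining K2 = 210 dims go to the decoder
dr_model = decode(dr_code).reshape(NV, 9)   # per-vertex 9-dim deformation representation
```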
Based on the above principle, during training the three-dimensional face template model is used as the template face and an ironic portrait is input; the deformation gradient is obtained from the predicted rotation and scaling matrix components in the deformation representation model, the vertex coordinates of the three-dimensional exaggerated face model corresponding to the ironic portrait are predicted, and the two-dimensional key point coordinates are predicted by combining the camera projection parameters output by the network. A loss function is then constructed from the labeled two-dimensional key point coordinates and the three-dimensional exaggerated face model (ground truth) corresponding to the ironic portrait in the data set, and continued training drives the network's predicted vertex coordinates and two-dimensional key point coordinates toward the ground-truth values.
The preferred embodiment of network training is as follows:
for a ironic portrait, a deformation representation model can be obtained by a convolutional neural network, and is represented as:
wherein the content of the first and second substances,representing predicted vertices v i To v 'to vertex' i Rotation matrix component of deformation gradient,/>Representing predicted vertices v i To vertex v' i A scaling matrix component of the deformation gradient; marking/conjunction> Denotes a vertex v 'with index subscript i on the predicted warped face' i And a vertex v with subscript i corresponding to the template face i A deformation gradient;
According to the predicted deformation gradients $\hat T_i$, the vertex coordinates of the three-dimensional exaggerated face model are predicted by solving the optimization problem:

$$\min_{\hat v'_1, \dots, \hat v'_{N_v}} \sum_{i=1}^{N_v} \sum_{j \in N_i} c_{ij} \left\| (\hat v'_i - \hat v'_j) - \hat T_i (v_i - v_j) \right\|^2$$

where $\hat v'_i$ is the predicted vertex coordinate with index $i$ in the three-dimensional exaggerated face model, and $\hat v'_j$ denotes the predicted vertex coordinate with index $j$ in the neighborhood set $N_i$. Solving this optimization problem is equivalent to solving a sparse linear system for the vertex coordinates of the three-dimensional exaggerated face.
the camera projection parameters P are expressed as:wherein->Is a zoom parameter, is asserted>Is a rotation matrix (derived from an Euler angle vector), ->Is a translation parameter. As in the previous example, K1=6, then £ r>Sequentially 1-dimensional, 3-dimensional and 2-dimensional vectors. According to the predicted vertex coordinates of the three-dimensional exaggerated face model and a weak perspective projection formula, two-dimensional key point coordinates can be obtained:
Here $L'$ is the set of three-dimensional key points selected from the predicted vertex set of the three-dimensional exaggerated face model, $\hat Q = \{\hat q_t \mid t = 1, \dots, T\}$ is the two-dimensional key point set, and $T$ is the total number of two-dimensional key points.
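The weak perspective projection of a three-dimensional key point under $P = \{s, R, t\}$ can be written directly (a minimal sketch; the rotation is passed in as a matrix, so the Euler-angle convention is left out):

```python
import numpy as np

def weak_perspective(v, s, R, t):
    """Project 3-D point(s) v to 2-D: q = s * Pi * R * v + t."""
    Pi = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])   # orthographic projection to the image plane
    v = np.atleast_2d(v)               # accept a single point or an (n, 3) array
    return s * (v @ R.T) @ Pi.T + t

# Identity camera keeps x, y and drops z.
q = weak_perspective(np.array([0.5, -0.25, 3.0]), 1.0, np.eye(3), np.zeros(2))
```

Applying this to every selected key point vertex yields the predicted two-dimensional key point set.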
For example, the key points may be 68 key points including contours, eyebrows, eyes, nose, and mouth, or other forms of key points; corresponding three-dimensional key points can be selected from the three-dimensional key point set according to the selected key point form to form a set L'.
During training, the data in the data set serve as ground truth (supervision information). From an input single ironic portrait, the convolutional neural network constructed in step 1, combined with the method introduced above, outputs a deformation representation model $f$ and camera projection parameters $P$, giving the predicted three-dimensional model vertex coordinates $\hat v'_i$ and two-dimensional key point coordinates $\hat q_t$.
In the embodiment of the present invention, the loss function in the training phase includes three parts:
1) Vertex-based loss function $E_{ver}$.
Using the three-dimensional exaggerated face vertex coordinates of the corresponding ironic portrait in the data set as supervision information, the loss function is:

$$E_{ver} = \sum_{i=1}^{N_v} \left\| \hat v'_i - v'_i \right\|^2$$

where $\hat v'_i$ is the vertex coordinate with index $i$ in the predicted three-dimensional exaggerated face model and $v'_i$ is the vertex coordinate with index $i$ in the corresponding three-dimensional exaggerated face model in the data set.
2) Loss function based on two-dimensional key points, $E_{lan}$.
Using the corresponding two-dimensional key point coordinates in the data set as supervision information, the loss function is:

$$E_{lan} = \sum_{t=1}^{T} \left\| \hat q_t - q'_t \right\|^2$$

where $L'$ is the set of three-dimensional key points selected from the predicted vertex set of the three-dimensional exaggerated face model, $T$ is the total number of two-dimensional key points, $\hat q_t$ are the predicted two-dimensional key point coordinates, and $q'_t$ are the corresponding labeled two-dimensional key point coordinates in the data set.
3) Loss function based on camera projection parameters, $E_{srt}$.
Since the key point loss involves not only the three-dimensional vertex coordinates but also the camera parameters, additional supervision information is needed to constrain the camera parameters individually when training starts; the loss function is:

$$E_{srt} = \|\hat s - s\|^2 + \|\hat r - r\|^2 + \|\hat t - t\|^2$$

where $\hat s$, $\hat r$ and $\hat t$ are the predicted scale parameter, rotation (Euler-angle) vector and translation parameter, and $s$, $r$ and $t$ are the corresponding ground-truth values.
Finally, the loss function for the training phase is:
$$E = \lambda_1 E_{ver} + \lambda_2 E_{lan} + \lambda_3 E_{srt}$$
where $\{\lambda_k \mid k = 1, 2, 3\}$ are weight parameters; illustratively, $\lambda_1 = 1$, $\lambda_2 = 0.00001$, $\lambda_3 = 0.0001$.
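The three terms combine as above; a NumPy sketch of one loss evaluation (the summed-squared-error forms follow our reconstruction of the formulas above):

```python
import numpy as np

def total_loss(v_pred, v_gt, q_pred, q_gt, cam_pred, cam_gt,
               lam=(1.0, 1e-5, 1e-4)):
    """E = l1*E_ver + l2*E_lan + l3*E_srt with summed squared errors."""
    E_ver = np.sum((v_pred - v_gt) ** 2)       # 3-D vertex loss
    E_lan = np.sum((q_pred - q_gt) ** 2)       # 2-D key point loss
    E_srt = np.sum((cam_pred - cam_gt) ** 2)   # camera parameter loss
    return lam[0] * E_ver + lam[1] * E_lan + lam[2] * E_srt

rng = np.random.default_rng(1)
v = rng.standard_normal((100, 3))   # toy vertex set
q = rng.standard_normal((68, 2))    # 68 key points, as in the example above
c = rng.standard_normal(6)          # s, Euler angles, translation
loss = total_loss(v, v, q, q, c, c)  # perfect prediction -> zero loss
```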
In the embodiment of the present invention, the model is trained with the PyTorch deep learning framework; supervised learning may read multiple groups of data at a time (for example, 32), and training is completed after multiple epochs (for example, 2000).
Step 3: after training is finished, the corresponding deformation representation model and camera projection parameters are obtained for an input ironic portrait painting, from which the vertex coordinates and two-dimensional key point coordinates of the three-dimensional exaggerated face model are predicted.
Testing proceeds in the same way as training: the ironic portrait is input into the trained convolutional neural network to obtain the deformation representation model and the camera projection parameters, from which the vertex coordinates of the three-dimensional exaggerated face model (which can be constructed directly since the topology is known) and the two-dimensional key point coordinates are predicted.
Fig. 3 schematically shows some test results; the first row is the input two-dimensional ironic portrait (224 × 224), the second row the predicted three-dimensional exaggerated face model, and the third row the image labeled with the predicted two-dimensional key points.
Compared with the traditional key point detection and three-dimensional reconstruction algorithm based on pictures, the scheme of the embodiment of the invention mainly has the following advantages:
1) By parameterizing the three-dimensional nonlinear deformation model, the expression capability of the convolutional neural network is enhanced by the algorithm, and the key point detection task based on the exaggerated human face is realized.
2) Through a convolutional neural network, the algorithm realizes a method for reconstructing a three-dimensional face model from a two-dimensional exaggerated face picture end to end.
3) Trained on the large data set established here, the algorithm recognizes and models ironic portrait works of different styles and different artists much more accurately than traditional algorithms.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (3)
1. A method for key point detection and three-dimensional reconstruction of ironic portrait paintings, characterized by comprising the following steps:
constructing a convolutional neural network, and collecting a data set comprising a three-dimensional face template model, ironic portraits, labeled two-dimensional key point coordinates and three-dimensional exaggerated face models generated by an existing method; the three-dimensional face template model and the three-dimensional exaggerated face models have the same topological structure;
in the training stage, the three-dimensional face template model is used as the template face, a deformation representation model of each ironic portrait is calculated, and camera projection parameters are output; the corresponding three-dimensional exaggerated face model vertex coordinates and two-dimensional key point coordinates are predicted from the deformation representation model and the camera projection parameters, and the training-stage loss function is constructed from them, so that the network is trained in a supervised manner;
after training is finished, obtaining the corresponding deformation representation model and camera projection parameters for an input ironic portrait painting, so as to predict the vertex coordinates and two-dimensional key point coordinates of the three-dimensional exaggerated face model;
the three-dimensional face template model and the three-dimensional exaggerated face model having the same topological structure means that the two models share the same number of vertices and the same adjacency relation, and the vertex order is the same on both models; record the set of vertices on the three-dimensional face template model as $V = \{v_i \mid i = 1, \dots, N_v\}$, composed of all vertices $v_i$ of the single-face three-dimensional data, where $i$ is the index subscript and $N_v$ is the total number of vertices;
during training, a three-dimensional face template model is used as a template face, and a sarcasic portrait is input to obtain a deformation representation model f and a camera projection parameter P;
the deformation representation model is expressed as:

$$\hat f = \{\log \hat R_i;\; \hat S_i - I \mid i = 1, \dots, N_v\}$$

where $\hat R_i$ denotes the rotation matrix component and $\hat S_i$ the scaling matrix component of the predicted deformation gradient from vertex $v_i$ to vertex $v'_i$; and $\hat T_i = \hat R_i \hat S_i$ denotes the predicted deformation gradient between the vertex $v'_i$ with index $i$ on the deformed face and the corresponding vertex $v_i$ with index $i$ on the template face;
according to the predicted deformation gradients $\hat T_i$, the vertex coordinates of the three-dimensional exaggerated face model are predicted by solving the optimization problem:

$$\min_{\hat v'_1, \dots, \hat v'_{N_v}} \sum_{i=1}^{N_v} \sum_{j \in N_i} c_{ij} \left\| (\hat v'_i - \hat v'_j) - \hat T_i (v_i - v_j) \right\|^2$$

where $\hat v'_i$ is the predicted vertex coordinate with index $i$ in the three-dimensional exaggerated face model, and $\hat v'_j$ denotes the predicted vertex coordinate with index $j$ in the neighborhood set $N_i$;
the camera projection parameters are expressed as $P = \{s, r, t\}$, where $s$ is a scale parameter, $r$ is an Euler-angle vector from which the rotation matrix $R$ is derived, and $t$ is a translation parameter; according to the predicted vertex coordinates of the three-dimensional exaggerated face model and the weak perspective projection formula, the two-dimensional key point coordinates are obtained:

$$\hat q_t = s\,\Pi\, R\, \hat u_t + t, \quad t = 1, \dots, T$$

where $\Pi$ is the orthographic projection matrix and $\hat u_t$ is the $t$-th three-dimensional key point selected from the predicted vertex set.
2. The method for key point detection and three-dimensional reconstruction of ironic portrait paintings according to claim 1, characterized in that the convolutional neural network comprises an encoder and a decoder; the encoder encodes the ironic portrait into a K-dimensional latent vector, which is split into two parts: a K1-dimensional vector, the camera projection parameters, and a K2-dimensional vector, which is decoded by the decoder into the deformation representation model; K1 + K2 = K.
3. The ironic portrait keypoint detection and three-dimensional reconstruction method of claim 1 or 2, characterized in that the loss function of the training phase is:
$E = \lambda_1 E_{ver} + \lambda_2 E_{lan} + \lambda_3 E_{srt}$
wherein $\{\lambda_k \mid k = 1, 2, 3\}$ are weight parameters;
$E_{ver}$ is the vertex-based loss function:
$E_{ver} = \frac{1}{N_v} \sum_{i=1}^{N_v} \left\| \hat{v}_i - v'_i \right\|^2$
wherein $\hat{v}_i$ is the predicted vertex coordinate with subscript i in the three-dimensional exaggerated face model, $v'_i$ is the vertex coordinate with subscript i in the corresponding three-dimensional exaggerated face model in the data set, and $N_v$ is the total number of vertices;
$E_{lan}$ is the loss function based on the two-dimensional key points:
$E_{lan} = \frac{1}{T} \sum_{t=1}^{T} \left\| \hat{q}_t - q'_t \right\|^2$
wherein L' is the set of three-dimensional key points selected from the predicted vertex set of the three-dimensional exaggerated face model; $Q = \{\hat{q}_t\}$ is the set of two-dimensional key points obtained by projecting L' with the camera projection parameters P, and T is the total number of two-dimensional key points; $\hat{q}_t$ is the predicted two-dimensional key point coordinate; $q'_t$ is the corresponding annotated two-dimensional key point coordinate in the data set;
$E_{srt}$ is the loss function based on the camera projection parameters:
$E_{srt} = \left\| \hat{s} - s' \right\|^2 + \left\| \hat{R} - R' \right\|_F^2 + \left\| \hat{t} - t' \right\|^2$
wherein $\hat{s}, \hat{R}, \hat{t}$ are the predicted camera projection parameters and $s', R', t'$ are the corresponding ground-truth parameters in the data set.
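Taken together, the three loss terms of claim 3 can be sketched as below (NumPy; the averaging conventions and the exact form of the camera-parameter term are assumptions where the claim text leaves them unspecified):

```python
import numpy as np

def total_loss(v_pred, v_gt, q_pred, q_gt, cam_pred, cam_gt,
               lambdas=(1.0, 1.0, 1.0)):
    """E = l1*E_ver + l2*E_lan + l3*E_srt for one training sample.

    v_pred/v_gt: (N_v, 3) vertices; q_pred/q_gt: (T, 2) keypoints;
    cam_pred/cam_gt: tuples (s, R, t) of camera projection parameters.
    """
    l1, l2, l3 = lambdas
    E_ver = np.mean(np.sum((v_pred - v_gt) ** 2, axis=1))   # per-vertex squared error
    E_lan = np.mean(np.sum((q_pred - q_gt) ** 2, axis=1))   # per-keypoint squared error
    E_srt = sum(np.sum((p - g) ** 2)                         # squared error on s, R, t
                for p, g in zip(cam_pred, cam_gt))
    return l1 * E_ver + l2 * E_lan + l3 * E_srt
```

In a deep-learning framework each term would be computed on tensors so gradients flow back to the encoder-decoder; this NumPy version only illustrates the arithmetic.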
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010316895.5A CN111524226B (en) | 2020-04-21 | 2020-04-21 | Method for detecting key point and three-dimensional reconstruction of ironic portrait painting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111524226A CN111524226A (en) | 2020-08-11 |
CN111524226B true CN111524226B (en) | 2023-04-18 |
Family
ID=71903414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010316895.5A Active CN111524226B (en) | 2020-04-21 | 2020-04-21 | Method for detecting key point and three-dimensional reconstruction of ironic portrait painting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111524226B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308957B (en) * | 2020-08-14 | 2022-04-26 | 浙江大学 | Optimal fat and thin face portrait image automatic generation method based on deep learning |
CN112700524B (en) * | 2021-03-25 | 2021-07-02 | 江苏原力数字科技股份有限公司 | 3D character facial expression animation real-time generation method based on deep learning |
CN113129347B (en) * | 2021-04-26 | 2023-12-12 | 南京大学 | Self-supervision single-view three-dimensional hairline model reconstruction method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1074271A (en) * | 1996-08-30 | 1998-03-17 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for preparing three-dimensional portrait |
CN101751689A (en) * | 2009-09-28 | 2010-06-23 | 中国科学院自动化研究所 | Three-dimensional facial reconstruction method |
CN108242074A (en) * | 2018-01-02 | 2018-07-03 | 中国科学技术大学 | A kind of three-dimensional exaggeration human face generating method based on individual satire portrait painting |
CN108805977A (en) * | 2018-06-06 | 2018-11-13 | 浙江大学 | A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks |
CN109508678A (en) * | 2018-11-16 | 2019-03-22 | 广州市百果园信息技术有限公司 | Training method, the detection method and device of face key point of Face datection model |
WO2020001082A1 (en) * | 2018-06-30 | 2020-01-02 | 东南大学 | Face attribute analysis method based on transfer learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10755477B2 (en) * | 2018-10-23 | 2020-08-25 | Hangzhou Qu Wei Technology Co., Ltd. | Real-time face 3D reconstruction system and method on mobile device |
Non-Patent Citations (2)
Title |
---|
Wang Haijun; Yang Shiying; Wang Yanfei. Research on a portrait caricature generation algorithm based on NMF and LS-SVM. Video Engineering. 2013, (19), full text. *
Dong Xiaoli; Li Weijun; Ning Xin; Zhang Liping; Lu Yaxuan. A stylized portrait generation algorithm using a triangular coordinate system. Journal of Xi'an Jiaotong University. 2018, (04), full text. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||