CN111524226A - Method for detecting key point and three-dimensional reconstruction of ironic portrait painting - Google Patents
- Publication number
- CN111524226A (application CN202010316895.5A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- face
- model
- vertex
- exaggerated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for keypoint detection and three-dimensional reconstruction of caricatures (ironic portraits), comprising the following steps: constructing a convolutional neural network, and collecting a data set comprising a three-dimensional face template model, caricature images, labeled two-dimensional keypoint coordinates, and three-dimensional exaggerated face models generated by an existing method; training the network on this data set so that, for an input caricature, it outputs a deformation representation model and camera projection parameters, from which the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates are predicted. The method removes the need to label keypoints on caricatures by hand; with the help of a new face deformation representation and a large data set, the trained convolutional neural network reconstructs an exaggerated three-dimensional face directly from the predicted deformation representation model, and obtains the two-dimensional keypoint coordinates from the simultaneously predicted camera projection parameters.
Description
Technical Field
The invention relates to the technical fields of image processing and three-dimensional modeling, and in particular to a method for keypoint detection and three-dimensional reconstruction of caricatures.
Background
Caricature is an artistic form spanning two-dimensional images and three-dimensional models. It creates a humorous visual effect by exaggerating certain features or details of a human face, and is widely used in movies, advertising, social media, and other everyday settings. In computer vision, cognitive psychology, and related fields, this art form has also been shown to improve the accuracy of face recognition. Because of its research potential and broad applications, problems related to caricature are attracting a growing number of researchers and enterprises.
Keypoint detection on caricatures: compared with a normal face, a caricature is exaggerated and highly varied, which makes keypoint detection difficult; consequently, few automatic keypoint detection algorithms exist for caricatures. On the other hand, many research topics on caricature rely on keypoints, and labeling them manually is tedious, time-consuming, and laborious. Developing a keypoint detection algorithm for caricatures is therefore of great significance: it fills a gap in this line of research and supports the development of related topics.
Most popular keypoint detection algorithms for normal faces are data-driven and depend on the design of a deep neural network. Such algorithms extract visual features of the face, or statistical features of the image pixels, from a single picture and regress the keypoint positions; the extraction methods include knowledge-based and algebraic-feature-based approaches. An exaggerated face is rooted in a normal face and must satisfy the basic structure of a face, such as having the expected number of eyes, a mouth, a nose, and ears. However, a caricature exaggerates features of the normal face, so a given feature, for example the distribution of keypoints around the eyes, varies greatly between pictures. Because of this exaggerated variation and diversity, few keypoint detection algorithms exist for caricatures.
Three-dimensional reconstruction of caricatures: at present there are two main ways to obtain a three-dimensional exaggerated face model: manual modeling and reconstruction based on deformation algorithms. Manual modeling, the earliest means of three-dimensional modeling, is still widely used to produce exaggerated face models, but it typically requires a professionally trained artist working in specialized modeling software such as Autodesk Maya. Although manual modeling is highly accurate, it costs substantial time and manpower, so deformation-based methods are more popular. Deformation algorithms generate models automatically, but the results are often limited in exaggeration style and, compared with the diverse shapes obtainable by manual modeling, are neither varied nor accurate enough. Moreover, most existing deformation algorithms depend on keypoints, so time and labor are still needed to label them, and once the labels are inaccurate the generated model may fail to match the original two-dimensional caricature.
Traditional image-based methods for generating a normal three-dimensional face model usually acquire three-dimensional models of a set of subjects, for example with cameras, build a face database from them, and construct a parameterized face model (linear or nonlinear) by statistical or dimensionality-reduction methods. A complex three-dimensional face is thereby parameterized into a low-dimensional space, and a normal face can be reconstructed from its coordinates in that space. Following this idea, the conventional approach to exaggerated face generation first labels two-dimensional keypoints on a single picture and then generates the exaggerated face through keypoint constraints and the constructed parameterized model. This approach depends heavily on keypoints: labeling costs time, and inaccurate labels directly degrade the reconstructed three-dimensional model.
Disclosure of Invention
The invention aims to provide a method for keypoint detection and three-dimensional reconstruction of caricatures, which automatically and quickly detects keypoints of an exaggerated face and generates the corresponding three-dimensional model, and which has important practical value in face recognition, animation generation, expression transfer, AR/VR, and related fields.
The purpose of the invention is realized by the following technical scheme:
a method for detecting key points and reconstructing three-dimensional of ironic portrait painting comprises the following steps:
constructing a convolutional neural network, and collecting a data set which comprises a three-dimensional face template model, sarcasia portrait, marked two-dimensional key point coordinates and a three-dimensional exaggerated face model generated based on the existing method; the three-dimensional face template model and the three-dimensional exaggerated face model have the same topological structure;
in the training stage, a three-dimensional face template model is used as a template face, a deformation representation model of each ironic portrait is calculated, and camera projection parameters are output; predicting the corresponding three-dimensional exaggerated face model vertex coordinates and two-dimensional key point coordinates according to the deformation representation model and the camera projection parameters, and constructing a loss function in a training stage according to the three-dimensional exaggerated face model vertex coordinates and the two-dimensional key point coordinates, so that the network is trained in a supervision mode;
after training, corresponding deformation representation model and camera projection parameters are obtained for the ironic portrait painting input, and therefore the vertex coordinates and the two-dimensional key point coordinates of the three-dimensional exaggerated face model are predicted.
The above technical solution provides the following benefits: 1) constraining facial deformation with the deformation representation keeps the generated shape face-like, while the expressive deformation representation model can still produce faces in exaggerated styles; 2) the face deformation model and the camera projection parameters are regressed directly from a single picture by the convolutional neural network; 3) together, these yield a more accurate three-dimensional exaggerated face model and, at the same time, more accurate two-dimensional keypoint coordinates.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for caricature keypoint detection and three-dimensional reconstruction according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the convolutional neural network according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of test results produced by the trained convolutional neural network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only a part of the embodiments of the invention, not all of them; all other embodiments derived by those skilled in the art without creative effort fall within the protection scope of the present invention.
In caricature face recognition, keypoint detection algorithms designed for normal faces are often not accurate enough, because the distribution of some facial features differs greatly between pictures, and considerable time is still needed to adjust the detected keypoint positions. In caricature three-dimensional reconstruction, the basis models of traditional methods lack expressive power, so the reconstructed face is not exaggerated enough; reconstruction algorithms based on optimization and keypoint constraints depend too heavily on keypoint labels, and once the labels are inaccurate, the generated three-dimensional model deviates substantially from the two-dimensional picture. To this end, an embodiment of the present invention provides a method for caricature keypoint detection and three-dimensional reconstruction, as shown in Fig. 1, which mainly comprises the following steps.
Step 1 mainly comprises constructing the network and collecting the data. Because the data may be acquired in different ways and processed differently, the three-dimensional exaggerated face models in the data set are required to have the same topology as the three-dimensional face template model: all models share the same number of vertices, the same adjacency relations, and the same vertex ordering. In addition, the collected face data should be sufficiently diverse.
Those skilled in the art will appreciate that a normal face data set satisfying these conditions can be obtained by conventional means.
First, the principle of computing the deformation representation model is described.
Record the vertex set of the three-dimensional face template model as V = {v_i | i = 1, ..., N_v}, i.e., V consists of all vertices v_i of a single three-dimensional face, where i is the index subscript and N_v is the total number of vertices. The collected data set satisfies the condition that all face data share the same vertex count, vertex ordering, and adjacency relations; hence, given the vertex set V and an index i, the referenced vertex is determined.
Take the three-dimensional face template model as the template face and the three-dimensional exaggerated face model corresponding to the caricature as the deformed face. Construct an energy function of the deformation gradient T_i between the vertex v'_i with index i on the deformed face and the vertex v_i with index i on the template face, and minimize it to solve for T_i:

E(T_i) = Σ_{j ∈ N_i} c_ij ‖e'_ij − T_i e_ij‖²

where N_i is the index set of the 1-ring neighborhood centered on the vertex with index i; the vertex with index j ∈ N_i is denoted v_j on the template face and v'_j on the deformed face; e'_ij is the edge from vertex v'_i to vertex v'_j on the deformed face, and e_ij is the edge from vertex v_i to vertex v_j on the template face; c_ij is the cotangent Laplacian weight of the template face.
After the deformation gradient of each vertex is obtained, decompose T_i by matrix polar decomposition into R_i S_i, where R_i is the rotation matrix component of the deformation gradient from vertex v_i to vertex v'_i, and S_i is its scaling matrix component.
By matrix operations the rotation matrix R_i is equivalently expressed as exp(log R_i), and the deformation representation model from the template face to the deformed face is written as:

f_n = {log R_i ; S_i − I | i = 1, ..., N_v}

where I is the identity matrix, introduced so that the representation is zero for an undeformed vertex, and V_n = {v'_i | i = 1, ..., N_v} is the vertex set of the three-dimensional exaggerated face model. The purpose of log R is that the product of rotation matrices R_i R_j can be expressed as exp(log R_i + log R_j), simplifying multiplication to addition.
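The log-map identity above can be checked numerically. The following sketch (an illustration, not part of the patent) implements the matrix logarithm of a rotation and Rodrigues' exponential with NumPy; note that exp(log R_i + log R_j) equals R_i R_j exactly only for commuting rotations (e.g. rotations about a shared axis), and is otherwise the first-order approximation the representation relies on:

```python
import numpy as np

def log_rotation(R):
    # Matrix logarithm of a 3x3 rotation matrix (returns a skew-symmetric matrix).
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

def exp_skew(W):
    # Rodrigues' formula: matrix exponential of a skew-symmetric matrix.
    w = np.array([W[2, 1], W[0, 2], W[1, 0]])
    theta = np.linalg.norm(w)
    if np.isclose(theta, 0.0):
        return np.eye(3)
    K = W / theta
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rot_z(a):
    # Rotation by angle a about the z axis.
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Rotations about a shared axis commute, so exp(log Ri + log Rj) = Ri @ Rj exactly.
Ri, Rj = rot_z(0.3), rot_z(0.5)
combined = exp_skew(log_rotation(Ri) + log_rotation(Rj))
```

For general, non-commuting rotations the sum of logs only approximates the composed rotation, which is why the representation treats it as a linearization.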
Encoding all deformations from the template face to each deformed face over the three-dimensional exaggerated face model data set gives a deformation representation set F = {f_n | n = 1, ..., N} based on the template face, where N is the number of elements in the set, i.e., the number of three-dimensional models in the face data set. Illustratively, the number of elements in F is 7800, i.e., N = 7800.
The deformation representation set F is recorded as a matrix of size N × M, where the n-th row of the matrix is the deformation representation f_n of the exaggerated face numbered n, based on the template face. For each f_n, the deformation {log R_i ; S_i − I} of its i-th vertex v'_i is recorded as a 9-dimensional vector, so M = N_v × 9, where N_v is, as above, the total number of vertices of the three-dimensional face mesh.
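The 9 dimensions per vertex come from the 3 free parameters of log R_i plus the 6 independent entries of the symmetric matrix S_i − I. A minimal sketch of extracting this per-vertex feature from a deformation gradient, assuming an SVD-based polar decomposition (the function names are illustrative, not from the patent):

```python
import numpy as np

def polar_decompose(T):
    # T = R @ S with R a proper rotation and S symmetric (polar decomposition via SVD).
    U, sig, Vt = np.linalg.svd(T)
    R = U @ Vt
    if np.linalg.det(R) < 0:            # flip a column to keep det(R) = +1
        U[:, -1] *= -1
        sig = sig.copy()
        sig[-1] *= -1
        R = U @ Vt
    S = Vt.T @ np.diag(sig) @ Vt
    return R, S

def log_axis_angle(R):
    # The 3 free parameters of log R for a 3x3 rotation (axis times angle).
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros(3)
    W = theta / (2.0 * np.sin(theta)) * (R - R.T)
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def drep_feature(T):
    # 9-dim per-vertex feature: 3 rotation DOF from log R plus the 6
    # independent upper-triangular entries of the symmetric matrix S - I.
    R, S = polar_decompose(T)
    D = S - np.eye(3)
    return np.concatenate([log_axis_angle(R), D[np.triu_indices(3)]])

# Stacking the per-vertex features of one face gives one row of F (length Nv * 9).
F_row = np.concatenate([drep_feature(2.0 * np.eye(3)) for _ in range(4)])  # toy Nv = 4
```

An undeformed vertex (T = I) maps to the zero vector, which is the motivation for subtracting I from S.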
As shown in Fig. 2, the convolutional neural network comprises an encoder and a decoder. The encoder encodes the caricature into a K-dimensional latent vector, which is split into two parts: a K1-dimensional vector, the camera projection parameters, and a K2-dimensional vector, which is decoded by the decoder into the deformation representation model, where K1 + K2 = K.
Illustratively, ResNet-34 may be used as the encoder and a 3-layer fully connected neural network as the decoder.
For example, the resolution of the input caricature may be 224 × 224, with K1 = 6 camera projection parameters.
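A toy version of such an encoder–decoder can be sketched in PyTorch, with a small stand-in convolutional encoder in place of ResNet-34 (all layer sizes and names here are assumptions for illustration, not the patent's configuration):

```python
import torch
import torch.nn as nn

class CaricatureNet(nn.Module):
    # Hypothetical sketch: an encoder produces a K-dim latent vector that is
    # split into K1 camera parameters and a K2-dim code, which an MLP decodes
    # into the per-vertex deformation representation (Nv * 9 values).
    def __init__(self, n_vertices=100, k1=6, k2=64):
        super().__init__()
        self.k1 = k1
        self.encoder = nn.Sequential(          # stand-in for ResNet-34
            nn.Conv2d(3, 16, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, k1 + k2),
        )
        self.decoder = nn.Sequential(          # 3-layer fully connected decoder
            nn.Linear(k2, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_vertices * 9),
        )

    def forward(self, img):
        z = self.encoder(img)                  # K-dim latent vector
        cam = z[:, :self.k1]                   # K1 camera projection parameters
        drep = self.decoder(z[:, self.k1:])    # deformation representation
        return cam, drep

net = CaricatureNet()
cam, drep = net(torch.zeros(2, 3, 224, 224))   # batch of two 224x224 inputs
```

The split of the latent vector into camera and shape parts is the structural point; the real encoder would be ResNet-34 as stated above.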
Based on this principle, during training the three-dimensional face template model is taken as the template face and a caricature is input; the deformation gradients are obtained from the predicted rotation and scaling matrix components of the deformation representation model, the vertex coordinates of the three-dimensional exaggerated face model corresponding to the caricature are predicted, and the two-dimensional keypoint coordinates are predicted in combination with the camera projection parameters output by the network. A loss function is then constructed from the labeled two-dimensional keypoint coordinates and the three-dimensional exaggerated face model (ground truth) corresponding to the caricature in the data set, and continued training drives the predicted vertex coordinates and two-dimensional keypoint coordinates toward the ground-truth values in the data set.
The preferred embodiment of network training is as follows:
for a ironic portrait, a deformation representation model can be obtained by a convolutional neural network, and is represented as:
wherein the content of the first and second substances,representing predicted vertices viTo v 'to vertex'iThe rotational matrix component of the deformation gradient,representing predicted vertices viTo v 'to vertex'iA scaling matrix component of the deformation gradient; note the book Denotes a vertex v 'with index subscript i on the predicted warped face'iAnd a vertex v with subscript i corresponding to the template faceiA deformation gradient;
From the predicted deformation gradients T̂_i, the vertex coordinates of the three-dimensional exaggerated face model are predicted by solving the optimization problem:

min over {v̂'_i} of Σ_i Σ_{j ∈ N_i} c_ij ‖(v̂'_i − v̂'_j) − T̂_i e_ij‖²

where v̂'_i is the predicted vertex coordinate with index i in the three-dimensional exaggerated face model, and v̂'_j is the predicted vertex coordinate with index j in the set N_i. Solving this optimization problem is equivalent to solving a sparse linear system, which yields the vertex coordinates of the three-dimensional exaggerated face.
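The linear system can be illustrated on a toy mesh. The sketch below uses uniform weights c_ij = 1, edge targets from averaged gradients, and one anchored vertex to fix the free translation (all simplifications relative to the patent's cotangent-weighted system), and recovers deformed vertices from prescribed deformation gradients by least squares:

```python
import numpy as np

# Toy template: a triangle with three edges; all deformation gradients are a
# uniform 2x scale, so the recovered mesh should be the template scaled by 2.
template = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
edges = [(0, 1), (0, 2), (1, 2)]
T = [2.0 * np.eye(3)] * 3                     # predicted per-vertex gradients

rows, rhs = [], []
n = len(template)
for i, j in edges:
    e = template[j] - template[i]             # template edge e_ij
    target = 0.5 * (T[i] + T[j]) @ e          # transformed edge (averaged gradients)
    r = np.zeros((3, 3 * n))
    r[:, 3 * j:3 * j + 3] = np.eye(3)         # coefficient of v'_j
    r[:, 3 * i:3 * i + 3] = -np.eye(3)        # coefficient of v'_i
    rows.append(r)
    rhs.append(target)

# Anchor vertex 0 at the origin: edge constraints only fix vertices up to translation.
anchor = np.zeros((3, 3 * n))
anchor[:, :3] = np.eye(3)
rows.append(anchor)
rhs.append(np.zeros(3))

A = np.vstack(rows)
b = np.concatenate(rhs)
verts = np.linalg.lstsq(A, b, rcond=None)[0].reshape(n, 3)
```

In practice the system is large and sparse (one block row per edge), so a sparse solver would replace the dense `lstsq` call.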
the camera projection parameters P are expressed as:whereinIs a scaling parameter that is a function of the zoom level,is the rotation matrix (derived from the euler angle vector),is a translation parameter. As in the previous example, K1 ═ 6, thenSequentially 1-dimensional, 3-dimensional and 2-dimensional vectors. According to the predicted vertex coordinates of the three-dimensional exaggerated face model and a weak perspective projection formula, two-dimensional key point coordinates can be obtained:
wherein L' is a three-dimensional key point set selected from a predicted vertex set of the three-dimensional exaggerated face model;is a two-dimensional keypoint set, and T is the total number of two-dimensional keypoints.
For example, the keypoints may be the 68 standard keypoints covering the contour, eyebrows, eyes, nose, and mouth, or keypoints in another form; the corresponding three-dimensional keypoints are selected according to the chosen form to make up the set L'.
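The weak perspective projection of a selected keypoint subset can be sketched as follows (the index set and coordinates are made-up illustrative values, not from the patent):

```python
import numpy as np

def weak_perspective(V, s, R, t):
    # q = s * Pi(R v) + t: rotate, keep x/y orthographically, scale, translate.
    # V: (T, 3) keypoint coordinates; s: scalar; R: 3x3 rotation; t: (2,) translation.
    return s * (V @ R.T)[:, :2] + t

landmark_idx = [0, 2]                              # hypothetical index set L'
verts = np.array([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0],
                  [-1.0, 1.0, 0.5]])
keypoints = weak_perspective(verts[landmark_idx], 2.0, np.eye(3),
                             np.array([1.0, -1.0]))
```

With the identity rotation, each projected keypoint is just the scaled x/y of the vertex plus the translation, which makes the depth-independence of weak perspective explicit.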
During training, the data in the data set serve as the ground-truth (supervision) information. For a single input caricature, the convolutional neural network constructed in step 1 outputs a deformation representation model f and camera projection parameters P as described above, from which the predicted vertex coordinates of the three-dimensional model and the predicted two-dimensional keypoint coordinates are obtained.
In the embodiment of the present invention, the loss function in the training phase includes three parts:
1) Vertex-based loss function E_ver.
Using the vertex coordinates of the three-dimensional exaggerated face corresponding to the caricature in the data set as supervision, the loss is:

E_ver = Σ_{i=1}^{N_v} ‖v̂'_i − v'_i‖²

where v̂'_i is the predicted vertex coordinate with index i in the three-dimensional exaggerated face model, and v'_i is the vertex coordinate with index i in the corresponding three-dimensional exaggerated face model in the data set.
2) Loss function E_lan based on two-dimensional keypoints.
Using the labeled two-dimensional keypoint coordinates in the data set as supervision, the loss is:

E_lan = Σ_{t=1}^{T} ‖q̂_t − q'_t‖²

where L' is the set of three-dimensional keypoints selected from the predicted vertex set of the three-dimensional exaggerated face model; Q̂ is the two-dimensional keypoint set and T the total number of two-dimensional keypoints; q̂_t are the predicted two-dimensional keypoint coordinates, and q'_t are the corresponding labeled two-dimensional keypoint coordinates in the data set.
3) Loss function E_srt based on camera projection parameters.
Since the keypoint loss involves both the three-dimensional vertex coordinates and the camera parameters, additional supervision is needed at the start of training to constrain the camera parameters individually. The loss penalizes the deviation of the predicted parameters from their ground-truth values:

E_srt = ‖ŝ − s‖² + ‖R̂ − R‖² + ‖t̂ − t‖²

where s is the scaling parameter, R is the rotation matrix, and t is the translation parameter.
Finally, the loss function of the training phase is:
E = λ1 E_ver + λ2 E_lan + λ3 E_srt
where {λ_k | k = 1, 2, 3} are weight parameters; illustratively, λ1 = 1, λ2 = 0.00001, λ3 = 0.0001.
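The combined loss with the example weights above can be sketched as follows; the patent does not show the exact normalization of each term, so plain sums of squared errors are assumed here:

```python
import numpy as np

def total_loss(v_pred, v_gt, q_pred, q_gt, cam_pred, cam_gt,
               lam=(1.0, 1e-5, 1e-4)):
    # E = l1*E_ver + l2*E_lan + l3*E_srt, each term a sum of squared errors
    # (the normalization is an assumption, not taken from the patent).
    e_ver = np.sum((v_pred - v_gt) ** 2)      # vertex loss over Nv vertices
    e_lan = np.sum((q_pred - q_gt) ** 2)      # keypoint loss over T keypoints
    e_srt = np.sum((cam_pred - cam_gt) ** 2)  # camera-parameter loss (s, R euler, t)
    return lam[0] * e_ver + lam[1] * e_lan + lam[2] * e_srt
```

The small weights on the keypoint and camera terms match the example values λ2 = 0.00001 and λ3 = 0.0001 given above, keeping the vertex term dominant.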
In the embodiment of the present invention, the model is trained with the PyTorch deep learning framework; supervised learning may be performed by reading in multiple groups of data at a time (for example, batches of 32), and training may end after multiple epochs (for example, 2000).
Step 3: after training is finished, for an input caricature the corresponding deformation representation model and camera projection parameters are obtained, from which the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates are predicted.
The test procedure is identical to the training procedure: the caricature is input into the trained convolutional neural network, which outputs the deformation representation model and the camera projection parameters, from which the vertex coordinates of the three-dimensional exaggerated face model (which can be constructed directly, since the topology is known) and the two-dimensional keypoint coordinates are predicted.
Fig. 3 shows some example test results: the first row is the input two-dimensional caricature (224 × 224), the second row is the predicted three-dimensional exaggerated face model, and the third row is the image annotated with the predicted two-dimensional keypoints.
Compared with traditional picture-based keypoint detection and three-dimensional reconstruction algorithms, the scheme of the embodiment of the present invention has the following main advantages:
1) by means of the parameterized three-dimensional nonlinear deformation model, the expressive power of the convolutional neural network is enhanced and keypoint detection on exaggerated faces is achieved;
2) through the convolutional neural network, an end-to-end method for reconstructing a three-dimensional face model from a single two-dimensional exaggerated face picture is realized;
3) trained on the large data set described above, the model recognizes and reconstructs caricatures of different styles and different artists far more accurately than traditional algorithms.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. A method for caricature keypoint detection and three-dimensional reconstruction, characterized by comprising the following steps:
constructing a convolutional neural network, and collecting a data set comprising a three-dimensional face template model, caricature images, labeled two-dimensional keypoint coordinates, and three-dimensional exaggerated face models generated by an existing method, wherein the three-dimensional face template model and the three-dimensional exaggerated face models share the same topology;
in the training stage, taking the three-dimensional face template model as the template face, computing the deformation representation model of each caricature, and outputting camera projection parameters; predicting the corresponding vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates from the deformation representation model and the camera projection parameters, and constructing the training loss from these predictions, so that the network is trained in a supervised manner;
after training, for an input caricature the network outputs the corresponding deformation representation model and camera projection parameters, from which the vertex coordinates of the three-dimensional exaggerated face model and the two-dimensional keypoint coordinates are predicted.
2. The caricature keypoint detection and three-dimensional reconstruction method of claim 1, wherein the convolutional neural network comprises an encoder and a decoder; the encoder encodes the caricature into a K-dimensional latent vector, which is split into two parts: a K1-dimensional vector, the camera projection parameters, and a K2-dimensional vector, which is decoded by the decoder into the deformation representation model, where K1 + K2 = K.
3. The caricature keypoint detection and three-dimensional reconstruction method of claim 1, wherein the three-dimensional face template model and the three-dimensional exaggerated face model having the same topology means that the two models share the same number of vertices and the same adjacency relations, and the vertex ordering is the same across models; the vertex set of the three-dimensional face template model is recorded as V = {v_i | i = 1, ..., N_v}, i.e., V consists of all vertices v_i of a single three-dimensional face, where i is the index subscript and N_v is the total number of vertices;
during training, the three-dimensional face template model is taken as the template face and a caricature is input, yielding the deformation representation model f and the camera projection parameters P.
4. The ironic portrait keypoint detection and three-dimensional reconstruction method of claim 3,
the deformation representation model is expressed as:
wherein the content of the first and second substances,representing predicted vertices viTo v 'to vertex'iThe rotational matrix component of the deformation gradient,representing predicted vertices viTo v 'to vertex'iA scaling matrix component of the deformation gradient; note the book Denotes a vertex v 'with index subscript i on the predicted warped face'iAnd a vertex v with subscript i corresponding to the template faceiA deformation gradient;
according to the predicted deformation gradients T̂_i, the vertex coordinates of the three-dimensional exaggerated face model are predicted by solving the optimization problem

min over {v̂'_i} of Σ_{i=1}^{N_v} Σ_{j ∈ N_i} || (v̂'_i − v̂'_j) − R̂_i Ŝ_i (v_i − v_j) ||²

wherein v̂'_i is the vertex coordinate with index subscript i in the predicted three-dimensional exaggerated face model, and v̂'_j is the vertex coordinate with index subscript j in N_i, the set of neighbor vertices of vertex i on the predicted face;
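Recovering vertex coordinates from per-vertex deformation gradients can be sketched as an ordinary least-squares problem: for every edge (i, j), require v'_i − v'_j ≈ T_i (v_i − v_j), then solve the stacked linear system. This is our own minimal formulation (uniform edge weights, one vertex anchored to fix the free translation), not the patent's solver:

```python
import numpy as np

def reconstruct(verts, edges, T):
    """Least-squares vertex recovery from per-vertex deformation gradients T."""
    n = len(verts)
    rows, rhs = [], []
    for i, j in edges:
        # one equation per edge: v'_i - v'_j = T_i (v_i - v_j)
        r = np.zeros(n)
        r[i], r[j] = 1.0, -1.0
        rows.append(r)
        rhs.append(T[i] @ (verts[i] - verts[j]))
    # anchor vertex 0 at the origin to remove the translational ambiguity
    a = np.zeros(n)
    a[0] = 1.0
    rows.append(a)
    rhs.append(np.zeros(3))
    A, b = np.array(rows), np.array(rhs)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol

verts = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0]])
edges = [(0, 1), (0, 2), (1, 2)]
T = np.stack([2.0 * np.eye(3)] * 3)   # uniform 2x scaling as the gradients
print(np.round(reconstruct(verts, edges, T), 3))
```

With uniform 2x-scaling gradients, the recovered triangle is the template scaled by two about the anchored vertex.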
the camera projection parameters P are expressed as P = {s, R, t}, wherein s is a scaling parameter, R is a rotation matrix, and t is a translation parameter; according to the predicted vertex coordinates of the three-dimensional exaggerated face model and the weak perspective projection formula q̂'_t = s · Π · R · v̂'_t + t, wherein Π is the orthographic projection retaining the x and y coordinates, the two-dimensional key point coordinates can be obtained.
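The weak perspective projection step can be illustrated as follows; the numeric values of s, R, and t are made-up examples, and Π is the standard 2x3 orthographic projection that drops the z coordinate:

```python
import numpy as np

def weak_perspective(v, s, R, t):
    """Project a 3D point to 2D: q = s * Pi * R * v + t."""
    Pi = np.array([[1., 0., 0.],
                   [0., 1., 0.]])      # keep x and y, drop z
    return s * (Pi @ (R @ v)) + t

v = np.array([1.0, 2.0, 5.0])          # a 3D vertex
s = 0.5                                 # scaling parameter
R = np.eye(3)                           # identity rotation for the example
t = np.array([10.0, 20.0])              # 2D translation
print(weak_perspective(v, s, R, t))     # x' = 10.5, y' = 21.0
```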
5. The ironic portrait keypoint detection and three-dimensional reconstruction method of any one of claims 1 to 4, wherein the loss function of the training phase is:
E = λ_1 E_ver + λ_2 E_lan + λ_3 E_srt

wherein {λ_k | k = 1, 2, 3} are weight parameters;
E_ver is the vertex-based loss function

E_ver = (1/N_v) Σ_{i=1}^{N_v} || v̂'_i − v'_i ||²

wherein v̂'_i is the vertex coordinate with index subscript i in the predicted three-dimensional exaggerated face model, v'_i is the vertex coordinate with index subscript i in the corresponding three-dimensional exaggerated face model in the data set, and N_v is the total number of vertices;
E_lan is the loss function based on two-dimensional key points

E_lan = (1/T) Σ_{t=1}^{T} || q̂'_t − q'_t ||²

wherein L' is the set of three-dimensional key points selected from the predicted vertex set of the three-dimensional exaggerated face model, {q̂'_t | t = 1, ..., T} is the set of predicted two-dimensional key point coordinates obtained by projecting L', q'_t is the corresponding two-dimensional key point coordinate annotated in the data set, and T is the total number of two-dimensional key points;
E_srt is the loss function based on the camera projection parameters, measuring the error between the predicted parameters s, R, t and the corresponding parameters annotated in the data set.
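The claim-5 training objective can be sketched as a weighted sum of the three losses. The weights and, in particular, the exact form of E_srt (here a simple squared error on the camera parameters) are our assumptions, not the patent's definitions:

```python
import torch

def total_loss(v_pred, v_gt, q_pred, q_gt, cam_pred, cam_gt,
               lambdas=(1.0, 1.0, 0.1)):
    """E = l1 * E_ver + l2 * E_lan + l3 * E_srt (illustrative weights)."""
    # E_ver: mean squared distance between predicted and ground-truth vertices
    e_ver = torch.mean(torch.sum((v_pred - v_gt) ** 2, dim=-1))
    # E_lan: mean squared distance between predicted and annotated 2D keypoints
    e_lan = torch.mean(torch.sum((q_pred - q_gt) ** 2, dim=-1))
    # assumed E_srt: squared error on the camera projection parameters
    e_srt = torch.mean((cam_pred - cam_gt) ** 2)
    l1, l2, l3 = lambdas
    return l1 * e_ver + l2 * e_lan + l3 * e_srt

# example: all-zero predictions against all-one targets
loss = total_loss(torch.zeros(4, 3), torch.ones(4, 3),
                  torch.zeros(2, 2), torch.ones(2, 2),
                  torch.zeros(6), torch.ones(6))
print(round(loss.item(), 4))  # 3.0 + 2.0 + 0.1 * 1.0 = 5.1
```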
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010316895.5A CN111524226B (en) | 2020-04-21 | 2020-04-21 | Method for detecting key point and three-dimensional reconstruction of ironic portrait painting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111524226A true CN111524226A (en) | 2020-08-11 |
CN111524226B CN111524226B (en) | 2023-04-18 |
Family
ID=71903414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010316895.5A Active CN111524226B (en) | 2020-04-21 | 2020-04-21 | Method for detecting key point and three-dimensional reconstruction of ironic portrait painting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111524226B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1074271A (en) * | 1996-08-30 | 1998-03-17 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for preparing three-dimensional portrait |
CN101751689A (en) * | 2009-09-28 | 2010-06-23 | 中国科学院自动化研究所 | Three-dimensional facial reconstruction method |
CN108242074A (en) * | 2018-01-02 | 2018-07-03 | 中国科学技术大学 | A kind of three-dimensional exaggeration human face generating method based on individual satire portrait painting |
CN108805977A (en) * | 2018-06-06 | 2018-11-13 | 浙江大学 | A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks |
CN109508678A (en) * | 2018-11-16 | 2019-03-22 | 广州市百果园信息技术有限公司 | Training method, the detection method and device of face key point of Face datection model |
US20190392634A1 (en) * | 2018-10-23 | 2019-12-26 | Hangzhou Qu Wei Technology Co., Ltd. | Real-Time Face 3D Reconstruction System and Method on Mobile Device |
WO2020001082A1 (en) * | 2018-06-30 | 2020-01-02 | 东南大学 | Face attribute analysis method based on transfer learning |
Non-Patent Citations (2)
Title |
---|
Wang Haijun; Yang Shiying; Wang Yanfei: "Research on a portrait caricature generation algorithm based on NMF and LS-SVM" *
Dong Xiaoli; Li Weijun; Ning Xin; Zhang Liping; Lu Yaxuan: "A stylized portrait generation algorithm using a triangular coordinate system" *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308957A (en) * | 2020-08-14 | 2021-02-02 | 浙江大学 | Optimal fat and thin face portrait image automatic generation method based on deep learning |
CN112700524A (en) * | 2021-03-25 | 2021-04-23 | 江苏原力数字科技股份有限公司 | 3D character facial expression animation real-time generation method based on deep learning |
CN112700524B (en) * | 2021-03-25 | 2021-07-02 | 江苏原力数字科技股份有限公司 | 3D character facial expression animation real-time generation method based on deep learning |
CN113129347A (en) * | 2021-04-26 | 2021-07-16 | 南京大学 | Self-supervision single-view three-dimensional hairline model reconstruction method and system |
CN113129347B (en) * | 2021-04-26 | 2023-12-12 | 南京大学 | Self-supervision single-view three-dimensional hairline model reconstruction method and system |
CN113538221A (en) * | 2021-07-21 | 2021-10-22 | Oppo广东移动通信有限公司 | Three-dimensional face processing method, training method, generating method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111524226B (en) | Method for detecting key point and three-dimensional reconstruction of ironic portrait painting | |
Liu et al. | Editing conditional radiance fields | |
Chai et al. | Autohair: Fully automatic hair modeling from a single image | |
CN111325851B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
Hu et al. | Single-view hair modeling using a hairstyle database | |
Hu et al. | Robust hair capture using simulated examples | |
Wang et al. | High resolution acquisition, learning and transfer of dynamic 3‐D facial expressions | |
Tao et al. | Bayesian tensor approach for 3-D face modeling | |
Shen et al. | Deepsketchhair: Deep sketch-based 3d hair modeling | |
Zhang et al. | Hair-GAN: Recovering 3D hair structure from a single image using generative adversarial networks | |
Yu et al. | Content-aware photo collage using circle packing | |
CN108242074B (en) | Three-dimensional exaggeration face generation method based on single irony portrait painting | |
Lv et al. | 3D facial expression modeling based on facial landmarks in single image | |
Bao et al. | A survey of image-based techniques for hair modeling | |
Luo et al. | Simpmodeling: Sketching implicit field to guide mesh modeling for 3d animalmorphic head design | |
CN110717978B (en) | Three-dimensional head reconstruction method based on single image | |
Shi et al. | Geometric granularity aware pixel-to-mesh | |
Jung et al. | Deep deformable 3d caricatures with learned shape control | |
Sang et al. | Agileavatar: Stylized 3d avatar creation via cascaded domain bridging | |
Sun et al. | Cgof++: Controllable 3d face synthesis with conditional generative occupancy fields | |
Luo et al. | Facial metamorphosis using geometrical methods for biometric applications | |
Xi et al. | A data-driven approach to human-body cloning using a segmented body database | |
Yu et al. | Mean value coordinates–based caricature and expression synthesis | |
He et al. | Data-driven 3D human head reconstruction | |
CN113379890A (en) | Character bas-relief model generation method based on single photo |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||