CN109741247B - Portrait cartoon generating method based on neural network - Google Patents


Info

Publication number
CN109741247B
CN109741247B
Authority
CN
China
Prior art keywords
sequence
vector
point
face
cartoon
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811631295.7A
Other languages
Chinese (zh)
Other versions
CN109741247A (en)
Inventor
吕建成
汤臣薇
徐坤
贺喆南
李婵娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811631295.7A priority Critical patent/CN109741247B/en
Publication of CN109741247A publication Critical patent/CN109741247A/en
Application granted granted Critical
Publication of CN109741247B publication Critical patent/CN109741247B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a portrait cartoon generating method based on a neural network, which comprises the following steps: S1, extracting the structural features of the face in a real face image and converting them into sequence feature data; S2, inputting the sequence feature data into a trained Seq2Seq VAE model to generate the corresponding exaggerated structure sequence points; S3, applying the generated exaggerated structure sequence points to the real face image to exaggeratedly deform it; and S4, applying a cartoon style to the exaggeratedly deformed face image to generate the portrait cartoon. The invention creatively proposes representing the structural features of a human face as sequence features, and applies this to cartoon generation by using a Seq2Seq VAE model to generate an exaggerated sequence. The limitations of existing image translation methods are overcome, and the generated exaggerated portrait cartoon is humorously exaggerated without losing the recognizability of the subject, while also reflecting the drawing styles of different cartoon artists.

Description

Portrait cartoon generating method based on neural network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a portrait cartoon generating method based on a neural network.
Background
Portrait drawing remains a very popular form of artistic expression in modern times. With the continuous development of machine-vision technologies, portrait drawing is widely applied in virtual reality, augmented reality, robot portrait-drawing systems, multimedia, personalized entertainment, the Internet, and the like. In order to enhance the artistic expressiveness of a portrait, various artistic portraits such as sketches and cartoons are generated based on different artistic characteristics, and the cartoon, as a common artistic form, has attracted the attention and research of many scholars.
With the development of artificial intelligence, more and more scholars have begun to study the combination of artificial intelligence and art, i.e., computational art. By means of mathematics and statistics, the rules involved in art can be quantified as mathematical relationships; for example, the golden section, with its strict proportionality, artistry and harmony, has rich aesthetic value. These mathematical relationships in turn become part of the theoretical basis of computational art. When painting turns to the depiction of the human figure, there are many different forms of painting art.
As shown in fig. 1, portrait paintings include the exaggerated portrait caricature, the sketch, the cartoon, the simple line drawing, and the like. An exaggerated portrait caricature, as the name implies, expresses what clearly distinguishes a person from the "common face" through exaggeration and deformation of the facial organs. Compared with a sketch, the exaggerated caricature adds humorous elements on the basis of sketching. Unlike cartoons and simple line drawings, the exaggerated caricature can deliver the fun of a cartoon while preserving the recognizability of the person. However, most research work has focused on simpler artistic forms such as sketches, simple line drawings and cartoons; in contrast, only a few research efforts have focused on the generation of exaggerated portrait caricatures.
The generation of an exaggerated portrait caricature can be viewed as a style transformation from a real face image to a caricature image. Image-to-image translation is a popular class of vision problems whose goal is to learn the style characteristics of the target image and the mapping between the input and output images. Among these approaches, the generative adversarial network (GAN) based on convolutional neural networks (CNN) is considered one of the most popular image translation methods. However, existing methods can only convert the texture and color of an image; when the task involves changes in the image content and geometric structure, the effect of CNN-based adversarial generation methods is not ideal, and the generation of an exaggerated portrait caricature involves exactly such an exaggerated deformation of the image content, namely the face structure.
In order to directly convert a portrait picture into the corresponding portrait caricature, one prior-art method is sample-based: given a portrait picture of a face, each face is decomposed into different parts (such as the nose, the mouth and the like); for each part, feature matching is applied to retrieve the corresponding caricature components in a data set, and the components are then combined to construct a caricature face. The other method is based on facial features: it first defines the feature points of an active shape model, then generates an exaggerated portrait from the real face image based on the face and the mutual relations of its parts, introducing a "contrast principle" while obtaining the exaggerated shape of the face from the exaggeration of the face shape and of the five sense organs; finally, an exaggerated image of the face image is generated in combination with an image deformation method.
In the prior art, the sample-based method needs a large number of caricature components to be drawn and collected to build a database covering different local facial characteristics, so the workload is huge and the technical requirements are extremely high; the assembled faces are relatively fixed and lack diversity; and the final effect only cartoonizes the original face without deforming its distinctive features, i.e., it does not meet the definition of an exaggerated portrait caricature. The other method, based on facial features, can exaggerate the original face to a certain degree, but the effect is not obvious, the recognizability of the person is poor, and the resulting caricature has only a single style.
Disclosure of Invention
Aiming at the above defects in the prior art, the neural-network-based portrait cartoon generation method provided by the invention solves the problems that the image translation methods used in existing portrait cartoon generation are limited and that the obtained portrait cartoons have a single style.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: a portrait cartoon generating method based on a neural network comprises the following steps:
s1, extracting the structural features of the face in the real face image, and converting the extracted structural feature data into sequence feature data;
s2, inputting the sequence feature data into the trained Seq2Seq VAE model to generate an exaggerated structure sequence point corresponding to the face image;
s3, applying the generated exaggerated structure sequence points to the real face image by utilizing a thin plate spline interpolation technology to realize the exaggerated deformation of the real face image;
and S4, applying the cartoon style to the face image after the exaggerated deformation by using the CycleGAN technology to generate the portrait cartoon.
Further, the structural features of the human face in the step S1 include the contour structural features and the facial features of the human face;
the step S1 specifically includes:
extracting 68 sequence points of a real face image to serve as structural feature data of a face, and obtaining an offset coordinate sequence of each sequence point relative to a previous sequence point according to an absolute coordinate value of each sequence point, wherein the offset coordinate sequence is sequence feature data;
wherein each sequence point is a sequence point to which a state value is added, represented as Q(x, y, p1, p2, p3);
wherein x and y represent the offset distances of the sequence point relative to the previous sequence point in the x and y directions;
(p1, p2, p3) is a binary one-hot vector representing three facial states: p1 indicates that the sequence point is the start of the facial contour or of one of the five sense organs, p2 indicates that the sequence point belongs to the same organ as the previous sequence point, and p3 indicates that the sequence point is the last of the 68 sequence points.
Further, the method for training the Seq2Seq VAE model in step S2 specifically includes:
A1, inputting the forward-order sequence and the reverse-order sequence of the structural features of the face in a real face image into an encoder to obtain a forward feature vector h→ and a reverse feature vector h←, and concatenating the two vectors into a final feature vector h;
A2, mapping the final feature vector h into a mean vector μ and a standard deviation vector σ through two fully connected networks, respectively, and sampling a random vector z that follows the normal distribution parameterized by the mean vector μ and the standard deviation vector σ;
a3, inputting a random vector z into a decoder to obtain a preliminary training Seq2Seq VAE network;
a4, inputting a plurality of real face images into a Seq2Seq VAE network trained at the previous time, and repeating the steps A1-A3 until the Seq2Seq VAE network converges to obtain a trained Seq2Seq VAE model.
Further, the encoder in step a1 includes a bidirectional LSTM network module, where the bidirectional LSTM network module includes two LSTM networks with 68 layers.
Further, the step A1 is specifically:
the forward-order sequence (S_0, S_1, ..., S_67) of the structural features of the face in a real face image is input, point by point, into one LSTM network to obtain the forward feature vector h→; simultaneously, the reverse-order sequence (S_67, S_66, ..., S_0) of the structural features of the face in the real face image is input into the other LSTM network to obtain the reverse feature vector h←; the forward feature vector h→ and the reverse feature vector h← are concatenated to form the final feature vector h;
wherein S_i denotes the i-th sequence point, i = 0, 1, 2, ..., 67.
Further, in the step A2:
the random vector z is:
z = μ + σ ⊙ N(0, 1)
wherein ⊙ represents element-wise (Hadamard) multiplication;
N(0, 1) is an IID Gaussian vector.
Further, in the step A3, the decoder is an LSTM network with a time length of 68;
the input elements of the LSTM network at each moment further comprise the vector T_t obtained at the previous moment and the source point S_t;
at each moment the LSTM network outputs a vector O_t, and the output vector O_t at the current moment t is sampled through a Gaussian mixture model to obtain the vector T_t, which is input to the LSTM network at the next moment;
wherein t denotes the moment, t = 0, 1, 2, ..., 67;
the vector T_0 and the source point S_0 input to the LSTM network at the initial moment are both initialized to (0, 0, 1, 0, 0).
Further, the output vector O_t at the current moment t is sampled through the Gaussian mixture model to obtain the vector T_t by the following method:
B1, determining the number N of normal distributions in the Gaussian mixture model, setting the dimension of the output vector O_t to 6N, and decomposing O_t as:
O_t = {(w_n, μ_(x,n), μ_(y,n), σ_(x,n), σ_(y,n), ρ_(xy,n))}, n = 1, 2, ..., N
wherein n denotes the n-th Gaussian component of the mixture model;
x denotes the abscissa;
y denotes the ordinate;
w_n denotes the weight of the n-th Gaussian component, with Σ_{n=1}^{N} w_n = 1;
μ_(x,n) denotes the expectation of the abscissa x;
μ_(y,n) denotes the expectation of the ordinate y;
σ_(x,n) denotes the standard deviation of the abscissa x;
σ_(y,n) denotes the standard deviation of the ordinate y;
ρ_(xy,n) denotes the correlation coefficient;
B2, inputting the decomposed O_t into the Gaussian mixture model to obtain the probability p(x, y; t) with which T_t is sampled;
wherein the probability p(x, y; t) is:
p(x, y; t) = Σ_{n=1}^{N} w(n, t) · N(x, y | μ(x, n, t), μ(y, n, t), σ(x, n, t), σ(y, n, t), ρ(xy, n, t))
wherein w(n, t) denotes the weight of the n-th Gaussian component at moment t;
N(x, y) denotes that the coordinates (x, y) follow a bivariate normal distribution with parameters μ, σ and ρ;
μ(x, n, t) denotes the expectation of the abscissa of the n-th Gaussian component at moment t;
μ(y, n, t) denotes the expectation of the ordinate of the n-th Gaussian component at moment t;
σ(x, n, t) denotes the standard deviation of the abscissa of the n-th Gaussian component at moment t;
σ(y, n, t) denotes the standard deviation of the ordinate of the n-th Gaussian component at moment t;
ρ(xy, n, t) denotes the correlation coefficient;
B3, substituting the probability p(x, y; t) into the reconstruction error function to obtain the reconstruction error, and maximizing it so that the Gaussian mixture model outputs the target vector T_t;
wherein the reconstruction error function is:
L_R = (1/68) Σ_{t=0}^{67} log p(x_t, y_t; t)
wherein L_R is the reconstruction error;
(x_t, y_t) are the horizontal and vertical coordinates of the feature point at moment t.
The invention has the following beneficial effects: the neural-network-based portrait cartoon generation method provided by the invention creatively proposes storing the face structural features as sequence features, so that a Seq2Seq VAE model can be used to generate an exaggerated sequence and thus be applied to cartoon generation; the limitations of existing image translation methods are overcome, and the generated exaggerated portrait cartoon is humorously exaggerated without losing the recognizability of the subject, while also reflecting the drawing styles of different cartoon artists.
Drawings
Fig. 1 is a schematic diagram of the kinds of portrait paintings in the background art of the present invention.
Fig. 2 is a flowchart of an implementation of a neural network-based portrait caricature generation method according to the present invention.
Fig. 3 is a schematic representation diagram illustrating conversion of a face structure feature into a sequence feature by using a face alignment technique according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an implementation of the Seq2Seq VAE model training method according to the present invention.
FIG. 5 is a schematic diagram comparing a complete target L with several variations thereof in an embodiment of the present invention.
Fig. 6 is a comparison result of different real face images in the embodiment provided by the present invention.
Fig. 7 is a schematic diagram showing a comparison of features between an input face and a corresponding "common face" in an embodiment provided by the present invention.
Fig. 8 is a diagram illustrating a partially exaggerated result of an original image according to an embodiment of the present invention.
FIG. 9 is a graph illustrating the comparison of the results of using different artistic styles on an exaggerated face according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments; to those skilled in the art, various changes are apparent as long as they do not depart from the spirit and scope of the invention as defined in the appended claims, and everything produced using the inventive concept of the present invention is protected.
As shown in fig. 2, a portrait caricature generation method based on a neural network includes the following steps:
s1, extracting the structural features of the face in the real face image, and converting the extracted structural feature data into sequence feature data;
s2, inputting the sequence feature data into the trained Seq2Seq VAE model to generate an exaggerated structure sequence point corresponding to the face image;
s3, applying the generated exaggerated structure sequence points to the real face image by utilizing a thin plate spline interpolation technology to realize the exaggerated deformation of the real face image;
When the thin-plate spline interpolation technique is adopted, a Cartesian coordinate system is established on a thin plate, and the independent variable x and the function value y are points distributed on this coordinate system. After bending deformation, the plate passes through all the corresponding y-value points while minimizing the bending energy. This interpolation function is defined as f; its specific form is the standard thin-plate spline interpolant
f(P) = a_0 + a_x·x + a_y·y + Σ_{i=1}^{n} w_i·U(‖P − P_i‖), with U(r) = r²·log r,
where P = (x, y), the P_i are the control points (here the facial sequence points), and the coefficients a_0, a_x, a_y and w_i are determined by the interpolation conditions.
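As an illustrative sketch only (the patent itself provides no code), the thin-plate spline warp of step S3 can be realized with a radial basis interpolator; the use of SciPy's Rbf with function='thin_plate', and the names image, src_pts and dst_pts, are assumptions of this example rather than details fixed by the patent.

# Hedged sketch: thin-plate spline warp driven by the 68 control points.
# src_pts are assumed to be the original landmarks and dst_pts the exaggerated
# ones produced by the Seq2Seq VAE model.
import numpy as np
from scipy.interpolate import Rbf
from scipy.ndimage import map_coordinates

def tps_warp(image, src_pts, dst_pts):
    """Warp image so that src_pts move onto dst_pts (backward mapping)."""
    h, w = image.shape[:2]
    # Backward mapping: for every output pixel, find where to sample the input.
    fx = Rbf(dst_pts[:, 0], dst_pts[:, 1], src_pts[:, 0], function='thin_plate')
    fy = Rbf(dst_pts[:, 0], dst_pts[:, 1], src_pts[:, 1], function='thin_plate')
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    map_x = fx(grid_x.ravel(), grid_y.ravel()).reshape(h, w)
    map_y = fy(grid_x.ravel(), grid_y.ravel()).reshape(h, w)
    # Bilinear resampling of each channel at the warped coordinates.
    warped = np.stack([
        map_coordinates(image[..., c], [map_y, map_x], order=1)
        for c in range(image.shape[2])
    ], axis=-1)
    return warped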
and S4, applying the cartoon style to the face image after the exaggerated deformation by using the CycleGAN technology to generate the portrait cartoon.
CycleGAN can transform images from a source domain X to a target domain Y by learning, without paired data. The goal is to learn a mapping G: X → Y such that, under an adversarial loss, the distribution of the images G(X) approaches the distribution Y. Since this mapping is highly under-constrained, it is coupled with an inverse mapping F: Y → X, and a cycle-consistency loss is introduced to enforce F(G(X)) ≈ X.
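As a further hedged illustration rather than the patent's own implementation, the adversarial and cycle-consistency terms described above can be sketched in PyTorch as follows; the module names G, F, D_X, D_Y, the least-squares adversarial form and the weight lambda_cyc are assumptions of this sketch.

# Hedged sketch of the generator-side CycleGAN objective: adversarial losses for
# G: X -> Y and F: Y -> X plus the cycle constraints F(G(x)) ~ x and G(F(y)) ~ y.
import torch
import torch.nn.functional as F_nn

def cycle_gan_generator_loss(G, F, D_X, D_Y, real_x, real_y, lambda_cyc=10.0):
    fake_y = G(real_x)                      # exaggerated face -> cartoon style
    fake_x = F(real_y)                      # cartoon -> photo domain
    # Least-squares adversarial losses (one common CycleGAN choice).
    adv_G = F_nn.mse_loss(D_Y(fake_y), torch.ones_like(D_Y(fake_y)))
    adv_F = F_nn.mse_loss(D_X(fake_x), torch.ones_like(D_X(fake_x)))
    # Cycle consistency: F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y.
    cyc = F_nn.l1_loss(F(fake_y), real_x) + F_nn.l1_loss(G(fake_x), real_y)
    return adv_G + adv_F + lambda_cyc * cyc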
In the step S1, a face alignment technique is used to extract the face contour and facial feature points of all real faces in the MMC data set and of the faces in the corresponding exaggerated portrait caricatures, wherein the structural features of the face include the contour structural features and the five-sense-organ features; 68 sequence points are extracted to represent the structural features of the face, and each sequence point is represented by absolute coordinates (x', y');
the MMC data set is a Harvard cartoon data set, and comprises a large number of collected real face images and corresponding exaggeration portrait cartoon face data.
Therefore, the step S1 is specifically:
extracting 68 sequence points of a real face image to serve as structural feature data of a face, and obtaining an offset coordinate sequence of each sequence point relative to a previous sequence point according to an absolute coordinate value of each sequence point, wherein the offset coordinate sequence is sequence feature data;
wherein, in order to distinguish the facial organs, each sequence point is a sequence point to which a state value is added, represented as Q(x, y, p1, p2, p3); for a conventional rectangular image, taking the lower-left corner of the image as the origin of the coordinate system, the extracted sequence points can be regarded as points distributed in the first quadrant;
wherein x and y represent the offset distances of the sequence point relative to the previous sequence point in the x and y directions;
(p1, p2, p3) is a binary one-hot vector representing three facial states: p1 indicates that the sequence point is the start of the facial contour or of one of the five sense organs, p2 indicates that the sequence point belongs to the same organ as the previous sequence point, and p3 indicates that the sequence point is the last of the 68 sequence points. By means of the state values, the facial structure can be divided into five parts, namely the contour, the eyebrows, the eyes, the nose and the mouth. Fig. 3 shows several samples from the MMC data set and the character sketches drawn from the 68 sequence points obtained by the face alignment method.
Since the faces of the real face images in the MMC data set do not completely match the corresponding portrait caricatures, in particular in the proportion of the face within the whole picture and in the angle of a person's profile, in order to reduce errors caused by data extraction, before the offset coordinate sequence is obtained, the first point S'_0 = (x'_0, y'_0) and the last point S'_16 = (x'_16, y'_16) of the face contour of the real face image are taken as reference points, and the extracted 68 sequence points are corrected in turn by rotating and scaling the corresponding caricature until it is aligned with the 1st and 16th points of the real face; the offset coordinate sequence of each sequence point relative to the previous sequence point is then obtained from the absolute coordinate value of each sequence point.
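A minimal sketch of this landmark-to-sequence conversion is given below; the organ boundaries assume the common 68-point (dlib/iBUG) landmark ordering, and the function and constant names are illustrative only.

# Hedged sketch: convert 68 absolute landmarks into the offset sequence
# Q(x, y, p1, p2, p3) described above.  ORGAN_STARTS assumes the usual 68-point
# ordering (contour, brows, nose, eyes, mouth) and is not fixed by the patent.
import numpy as np

ORGAN_STARTS = {0, 17, 22, 27, 36, 42, 48}

def landmarks_to_sequence(points):
    """points: (68, 2) array of absolute (x', y') landmark coordinates."""
    seq = []
    prev = points[0]
    for i, (x_abs, y_abs) in enumerate(points):
        dx, dy = x_abs - prev[0], y_abs - prev[1]   # offset from the previous point
        if i == 67:
            state = (0, 0, 1)          # p3: last of the 68 points
        elif i in ORGAN_STARTS:
            state = (1, 0, 0)          # p1: start of the contour or of an organ
        else:
            state = (0, 1, 0)          # p2: same organ as the previous point
        seq.append((dx, dy) + state)
        prev = (x_abs, y_abs)
    return np.asarray(seq, dtype=np.float32)   # shape (68, 5)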
In step S2, the Seq2Seq VAE network is trained using the paired exaggerated sequence data, and when the network converges (i.e. the corresponding exaggerated sequence points of the cartoon can be reconstructed from the sequence feature data of the real face), the network model is used to generate the exaggerated sequence points of the cartoon for the tested real face feature sequence data;
as shown in fig. 4, the method for training the Seq2Seq VAE model in step S2 specifically includes:
A1, inputting the forward-order sequence and the reverse-order sequence of the structural features of the face in a real face image into an encoder to obtain a forward feature vector h→ and a reverse feature vector h←, and concatenating the two vectors into a final feature vector h;
the encoder comprises a bidirectional LSTM network module, wherein the bidirectional LSTM network module comprises two LSTM networks with 68 layers;
Therefore, the step A1 specifically includes:
the forward-order sequence (S_0, S_1, ..., S_67) of the structural features of the face in a real face image is input, point by point, into one LSTM network to obtain the forward feature vector h→; simultaneously, the reverse-order sequence (S_67, S_66, ..., S_0) of the structural features of the face in the real face image is input into the other LSTM network to obtain the reverse feature vector h←; the forward feature vector h→ and the reverse feature vector h← are concatenated to form the final feature vector h;
wherein S_i denotes the i-th sequence point, i = 0, 1, 2, ..., 67.
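Step A1 can be sketched as a bidirectional LSTM in PyTorch; the hidden size of 256 matches the 256-dimensional forward and reverse feature vectors used in the experiments below, while the class and variable names are assumptions of this sketch.

# Hedged sketch of the step-A1 encoder: one LSTM reads the 68-point sequence
# forwards and one reads it backwards; their final hidden states are concatenated into h.
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    def __init__(self, input_dim=5, hidden_dim=256):
        super().__init__()
        # bidirectional=True runs one LSTM over the forward sequence and one
        # over the reversed sequence, as described for the two 68-step LSTMs.
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, seq):               # seq: (batch, 68, 5)
        _, (h_n, _) = self.lstm(seq)      # h_n: (2, batch, hidden_dim)
        h_fwd, h_bwd = h_n[0], h_n[1]     # forward and reverse feature vectors
        return torch.cat([h_fwd, h_bwd], dim=-1)   # final feature vector h (512-d)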
A2, mapping the final feature vector h into a mean vector μ and a standard deviation vector σ, and sampling a random vector z that follows the normal distribution parameterized by the mean vector μ and the standard deviation vector σ;
wherein the random vector z is:
z = μ + σ ⊙ N(0, 1)
wherein ⊙ represents element-wise (Hadamard) multiplication;
N(0, 1) is an IID Gaussian vector.
There is a divergence loss L_KL between the random vector z and the distribution of the IID Gaussian vector N(0, 1):
L_KL = KL(N(μ, σ) || N(0, 1))
wherein KL(·) denotes the KL distance;
N(μ, σ) denotes the normal distribution with mean μ and standard deviation σ;
KL(A || B) denotes the KL distance between distribution A and distribution B;
once the random vector z is determined, the divergence loss L_KL can be calculated; this divergence loss is automatically back-propagated to train the LSTM network structure, so that the difference between the z obtained for subsequent inputs and the distribution of the Gaussian vector N(0, 1) becomes smaller and smaller.
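The reparameterization and KL loss of step A2 can be sketched as follows; predicting log σ rather than σ, and the layer and class names, are assumptions made for this illustration.

# Hedged sketch of step A2: map h to (mu, sigma) with two fully connected layers,
# sample z = mu + sigma ⊙ N(0, 1), and compute the KL loss against N(0, 1).
import torch
import torch.nn as nn

class LatentSampler(nn.Module):
    def __init__(self, h_dim=512, z_dim=128):
        super().__init__()
        self.fc_mu = nn.Linear(h_dim, z_dim)
        self.fc_logsigma = nn.Linear(h_dim, z_dim)   # predict log(sigma) for stability

    def forward(self, h):
        mu = self.fc_mu(h)
        sigma = torch.exp(self.fc_logsigma(h))
        z = mu + sigma * torch.randn_like(sigma)     # z = mu + sigma * N(0, 1)
        # KL(N(mu, sigma^2) || N(0, 1)), averaged over the batch and dimensions.
        kl = -0.5 * torch.mean(1 + 2 * torch.log(sigma) - mu ** 2 - sigma ** 2)
        return z, kl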
A3, inputting a random vector z into a decoder to obtain a preliminary training Seq2Seq VAE network;
wherein, the decoder is an LSTM network with the time length of 68;
the input elements of the LSTM network at each moment further comprise the vector T_t obtained at the previous moment and the source point S_t;
at each moment the LSTM network outputs a vector O_t, and the output vector O_t at the current moment t is sampled through a Gaussian mixture model to obtain the vector T_t, which is input to the LSTM network at the next moment;
wherein t denotes the moment, t = 0, 1, 2, ..., 67;
the vector T_0 and the source point S_0 input to the LSTM network at the initial moment are both initialized to (0, 0, 1, 0, 0);
In the above process, the LSTM network outputs O_t at each moment; O_t cannot be directly input into the LSTM network at the next moment, so O_t must first be decomposed into the parameters needed by the Gaussian mixture model, from which T_t is then obtained. A bivariate normal distribution is determined by five elements (μ_x, μ_y, σ_x, σ_y, ρ_xy), where μ_x and μ_y denote the means, σ_x and σ_y denote the standard deviations, and ρ_xy denotes the correlation coefficient. For each bivariate normal distribution there is also a weight w; therefore, a GMM with N normal distributions requires (5+1)·N parameters. For each sequence point of the face, the state values (p1, p2, p3) are fixed and therefore do not need to be generated.
Wherein, the output vector O_t at the current moment t is sampled through the Gaussian mixture model to obtain the vector T_t by the following method:
B1, determining the number N of normal distributions in the Gaussian mixture model, setting the dimension of the output vector O_t to 6N, and decomposing O_t as:
O_t = {(w_n, μ_(x,n), μ_(y,n), σ_(x,n), σ_(y,n), ρ_(xy,n))}, n = 1, 2, ..., N
wherein n denotes the n-th Gaussian component of the mixture model;
x denotes the abscissa;
y denotes the ordinate;
w_n denotes the weight of the n-th Gaussian component, with Σ_{n=1}^{N} w_n = 1;
μ_(x,n) denotes the expectation of the abscissa x;
μ_(y,n) denotes the expectation of the ordinate y;
σ_(x,n) denotes the standard deviation of the abscissa x;
σ_(y,n) denotes the standard deviation of the ordinate y;
ρ_(xy,n) denotes the correlation coefficient;
B2, inputting the decomposed O_t into the Gaussian mixture model to obtain the probability p(x, y; t) with which T_t is sampled;
wherein the probability p(x, y; t) is:
p(x, y; t) = Σ_{n=1}^{N} w(n, t) · N(x, y | μ(x, n, t), μ(y, n, t), σ(x, n, t), σ(y, n, t), ρ(xy, n, t))
wherein w(n, t) denotes the weight of the n-th Gaussian component at moment t;
N(x, y) denotes that the coordinates (x, y) follow a bivariate normal distribution with parameters μ, σ and ρ;
μ(x, n, t) denotes the expectation of the abscissa of the n-th Gaussian component at moment t;
μ(y, n, t) denotes the expectation of the ordinate of the n-th Gaussian component at moment t;
σ(x, n, t) denotes the standard deviation of the abscissa of the n-th Gaussian component at moment t;
σ(y, n, t) denotes the standard deviation of the ordinate of the n-th Gaussian component at moment t;
ρ(xy, n, t) denotes the correlation coefficient;
B3, substituting the probability p(x, y; t) into the reconstruction error function to obtain the reconstruction error, and maximizing it so that the Gaussian mixture model outputs the target vector T_t;
wherein the reconstruction error function is:
L_R = (1/68) Σ_{t=0}^{67} log p(x_t, y_t; t)
wherein L_R is the reconstruction error;
(x_t, y_t) are the horizontal and vertical coordinates of the feature point at moment t.
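Steps B1-B3 can be sketched as follows; the ordering of the 6N outputs and the softmax/exp/tanh squashing used to keep the mixture parameters valid are assumptions of this illustration, not details fixed by the patent.

# Hedged sketch of B1-B3: split O_t into N bivariate-normal components,
# evaluate p(x, y; t) and form the reconstruction objective L_R.
import math
import torch
import torch.nn.functional as F_nn

def gmm_params(o_t):
    # o_t: (batch, 6*N); the layout [w | mu_x | mu_y | sigma_x | sigma_y | rho] is assumed.
    w, mu_x, mu_y, sigma_x, sigma_y, rho = torch.chunk(o_t, 6, dim=-1)
    w = F_nn.softmax(w, dim=-1)                                  # weights sum to 1
    sigma_x, sigma_y = torch.exp(sigma_x), torch.exp(sigma_y)    # keep > 0
    rho = torch.tanh(rho)                                        # correlation in (-1, 1)
    return w, mu_x, mu_y, sigma_x, sigma_y, rho

def mixture_log_prob(x, y, params):
    """log p(x, y; t) under the bivariate-normal mixture."""
    w, mu_x, mu_y, sigma_x, sigma_y, rho = params
    zx = (x.unsqueeze(-1) - mu_x) / sigma_x
    zy = (y.unsqueeze(-1) - mu_y) / sigma_y
    z = zx ** 2 + zy ** 2 - 2 * rho * zx * zy
    one_minus_rho2 = 1 - rho ** 2
    log_n = (-z / (2 * one_minus_rho2)
             - torch.log(2 * math.pi * sigma_x * sigma_y * torch.sqrt(one_minus_rho2)))
    return torch.logsumexp(torch.log(w) + log_n, dim=-1)         # (batch,)

def reconstruction_loss(xy_true, o_seq):
    # xy_true: (batch, 68, 2) target offsets; o_seq: (batch, 68, 6*N) decoder outputs.
    log_p = torch.stack([
        mixture_log_prob(xy_true[:, t, 0], xy_true[:, t, 1], gmm_params(o_seq[:, t]))
        for t in range(xy_true.shape[1])
    ], dim=1)
    return -log_p.mean()        # minimizing this maximizes the log-likelihood L_R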
A4, inputting a plurality of real face images into a Seq2Seq VAE network trained at the previous time, and repeating the steps A1-A3 until the Seq2Seq VAE network converges to obtain a trained Seq2Seq VAE model.
There is also a consistency loss L_C in the Seq2Seq VAE network process; the consistency loss uses the log-likelihood of the produced probability distribution to explain the source points S, and L_C is related to maintaining the basic structure of the face;
the consistency loss L_C is:
L_C = (1/68) Σ_{t=0}^{67} log p(x^S_t, y^S_t; t), where (x^S_t, y^S_t) are the coordinates of the source point S_t;
since there is a source point S_t at each step of the LSTM network, the consistency loss produced at each step automatically adjusts the network structure of the decoder through back-propagation, thereby generating the exaggerated structure sequence points.
In one embodiment of the present invention, experimental results of the inventive method on MMC data sets are provided:
1. dividing the images in the MMC data set into 500 training pairs and 47 testing pairs, and adding 100 extra real face images;
2. For the encoder, 256-dimensional feature vectors h→ and h← are extracted from the source data in forward and reverse order, respectively; the 512-dimensional vector h obtained by concatenation is then used as the input of the VAE, and the dimension of the vector z is 128. The GMM (Gaussian mixture model) is set to 20 normal distributions, and the output dimension of the decoder LSTM is set to 120;
3. The importance of the Kullback-Leibler divergence loss L_KL, the reconstruction error L_R and the consistency loss L_C is studied. The complete method is then compared with several variants, after which the effect of the batch size on the generated exaggerated sketch is analyzed and the effectiveness of the system is further improved by local exaggeration. Finally, the exaggerated portrait is transferred to various artistic styles.
Experimental results and analysis:
A. analysis of the loss function:
In Fig. 5, the complete objective L is compared with several variants thereof: one uses only the Kullback-Leibler divergence loss L_KL and the reconstruction error L_R, and the other uses only the reconstruction error L_R and the consistency loss L_C. All Seq2Seq VAE models were trained with a batch size of 64 samples, using the Adam optimization algorithm with a learning rate of 0.01 and gradient clipping at 1.0.
(1) At LKLAnd LRIn the experiment, α ═ 0.8 and β ═ 0 were set;
(2) at LRAnd LCIn the experiment, α ═ 0 and β ═ 2 were set;
(3) at LKL+LR+LCIn the experiment, α -0.5 and β -0.5 were set.
The experimental results show that all three losses play a crucial role in obtaining high-quality results. From the original sketches in the second row and the exaggerated sketches in the third row it can be seen that L_KL makes the generated sketch more exaggerated, while the preservation of the original facial structure is mainly achieved by minimizing the L_C loss. The complete model simultaneously minimizes L_KL, L_R and L_C so as to maintain the basic structure of the original image while exaggerating the original sketch. The complete Seq2Seq VAE model not only exaggerates the facial features but also preserves the recognizability of the subject. However, the complete model still has some shortcomings: it only slightly exaggerates certain identifying features of the original image and does not reach the ideal degree of exaggeration.
B. Analysis of batch size:
Variation in the batch size may cause the network to oscillate between randomness and determinism; in this Seq2Seq VAE, the most obvious manifestation of randomness versus determinism is whether the generated sketch is distorted or faithfully restored. The comparison of different real face images in Fig. 6 shows that the batch size directly affects the stability of the generated sequence.
As shown in Fig. 6, when the batch size is equal to 16, the degree of randomness in the generated network greatly exceeds its stability, resulting in severe distortion of the generated image. From the simple sketches in the second and fourth rows it is also clear that the deformation of the facial structure is very severe when the batch is small. As the batch size increases, the stability of the network increases accordingly, and the generated sequence is more consistent with the sequence of the source image. For example, when the batch size is equal to 128, the degree of exaggeration of the source image is very mild. In order for the model both to maintain the basic structure of the facial features and to strongly exaggerate their distinctive characteristics when exaggerating the source image, the batch size was set to 64 in the other experiments.
C. Local exaggeration:
In general, artists often exaggerate the distinctive features that distinguish a face from the "common face" when drawing portrait caricatures. Therefore, in the proposed system, the method of the invention adopts a local exaggeration approach. By comparing the proportional distributions of the input face and the "common face", using data sets of the "common faces" of males and females in different countries, the distinctive features of the subject can be obtained; Fig. 7 shows a feature comparison example between an input face and the corresponding "common face".
The influence of the system on caricature generation can be further enhanced by inverting the x-axis and y-axis values of the corresponding local coordinate points and making corresponding changes to exaggerate individual facial organs. One limitation is that the hairstyle, forehead, ears and cheeks of a person cannot be extracted in the face alignment step, so these local features cannot be compared and exaggerated. The experimental results show that the method can reasonably exaggerate the extracted features. Fig. 8 shows local exaggeration results on original images: the first column is the original image; the second column shows the distribution of the structural features, where the blue dots are the structure of the original face and the yellow dots are the structure after local adjustment; the third column shows the deformation result obtained when the local changes are applied to the original face; and the fourth column shows the corresponding target caricature. Although the result does not reach the effect of the target output, a certain exaggerated, humorous effect is obtained.
The final result is:
the generated exaggeration portrait cartoon not only has humorous exaggeration without damaging the identification degree of the role, but also is reflected in the painting styles of different cartoon artists; different styles, such as cartoon style, oil painting style, sketch style, cartoon style and the like, are trained through the CycleGAN. Fig. 9 shows the result of using different artistic styles on an exaggerated face.
The invention has the following beneficial effects: the neural-network-based portrait cartoon generation method provided by the invention creatively proposes storing the face structural features as sequence features, so that a Seq2Seq VAE model can be used to generate an exaggerated sequence and thus be applied to cartoon generation. The limitations of existing image translation methods are overcome, and the generated exaggerated portrait cartoon is humorously exaggerated without losing the recognizability of the subject, while also reflecting the drawing styles of different cartoon artists.

Claims (2)

1. A portrait cartoon generating method based on a neural network is characterized by comprising the following steps:
s1, extracting the structural features of the face in the real face image, and converting the extracted structural feature data into sequence feature data;
s2, inputting the sequence feature data into the trained Seq2Seq VAE model to generate an exaggerated structure sequence point corresponding to the face image;
s3, applying the generated exaggerated structure sequence points to the real face image by utilizing a thin plate spline interpolation technology to realize the exaggerated deformation of the real face image;
s4, applying the cartoon style to the face image after the exaggerated deformation by using a CycleGAN technology to generate a portrait cartoon;
the structural features of the face in the step S1 include contour structural features and facial features of the face;
the step S1 specifically includes:
extracting 68 sequence points of a real face image to serve as structural feature data of a face, and obtaining an offset coordinate sequence of each sequence point relative to a previous sequence point according to an absolute coordinate value of each sequence point, wherein the offset coordinate sequence is sequence feature data;
wherein each sequence point is a sequence point to which a state value is added, represented as Q(x, y, p1, p2, p3);
wherein x and y represent the offset distances of the sequence point relative to the previous sequence point in the x and y directions;
(p1, p2, p3) is a binary one-hot vector representing three facial states: p1 indicates that the sequence point is the start of the facial contour or of one of the five sense organs, p2 indicates that the sequence point belongs to the same organ as the previous sequence point, and p3 indicates that the sequence point is the last of the 68 sequence points;
the method for training the Seq2Seq VAE model in the step S2 specifically comprises the following steps:
A1, inputting the forward-order sequence and the reverse-order sequence of the structural features of the face in a real face image into an encoder to obtain a forward feature vector h→ and a reverse feature vector h←, and concatenating the two vectors into a final feature vector h;
A2, mapping the final feature vector h into a mean vector μ and a standard deviation vector σ through two fully connected networks, respectively, and sampling a random vector z that follows the normal distribution parameterized by the mean vector μ and the standard deviation vector σ;
a3, inputting a random vector z into a decoder to obtain a preliminary training Seq2Seq VAE network;
a4, sequentially inputting a plurality of real face images into a Seq2Seq VAE network trained at the previous time, and repeating the steps A1-A3 until the Seq2Seq VAE network converges to obtain a trained Seq2Seq VAE model;
the encoder in the step a1 includes a bidirectional LSTM network module, where the bidirectional LSTM network module includes two LSTM networks with 68 layers;
the step A1 specifically includes:
the forward-order sequence (S_0, S_1, ..., S_67) of the structural features of the face in a real face image is input, point by point, into one LSTM network to obtain the forward feature vector h→; simultaneously, the reverse-order sequence (S_67, S_66, ..., S_0) of the structural features of the face in the real face image is input into the other LSTM network to obtain the reverse feature vector h←; the forward feature vector h→ and the reverse feature vector h← are concatenated to form the final feature vector h;
wherein S_i denotes the i-th sequence point, i = 0, 1, 2, ..., 67;
in said step A2:
the random vector z is:
z = μ + σ ⊙ N(0, 1)
wherein ⊙ represents element-wise (Hadamard) multiplication;
N(0, 1) is an IID Gaussian vector;
in step A3, the decoder is an LSTM network with a time length of 68;
the input elements of the LSTM network at each moment further comprise the vector T_t obtained at the previous moment and the source point S_t;
at each moment the LSTM network outputs a vector O_t, and the output vector O_t at the current moment t is sampled through a Gaussian mixture model to obtain the vector T_t, which is input to the LSTM network at the next moment;
wherein t denotes the moment, t = 0, 1, 2, ..., 67;
the vector T_0 and the source point S_0 input to the LSTM network at the initial moment are both initialized to (0, 0, 1, 0, 0).
2. The neural network-based portrait caricature generation method of claim 1, wherein the output vector O_t at the current moment t is sampled through the Gaussian mixture model to obtain the vector T_t by the following method:
B1, determining the number N of normal distributions in the Gaussian mixture model, setting the dimension of the output vector O_t to 6N, and decomposing O_t as:
O_t = {(w_n, μ_(x,n), μ_(y,n), σ_(x,n), σ_(y,n), ρ_(xy,n))}, n = 1, 2, ..., N
wherein n denotes the n-th Gaussian component of the mixture model;
x denotes the abscissa;
y denotes the ordinate;
w_n denotes the weight of the n-th Gaussian component, with Σ_{n=1}^{N} w_n = 1;
μ_(x,n) denotes the expectation of the abscissa x;
μ_(y,n) denotes the expectation of the ordinate y;
σ_(x,n) denotes the standard deviation of the abscissa x;
σ_(y,n) denotes the standard deviation of the ordinate y;
ρ_(xy,n) denotes the correlation coefficient;
B2, inputting the decomposed O_t into the Gaussian mixture model to obtain the probability p(x, y; t) with which T_t is sampled;
wherein the probability p(x, y; t) is:
p(x, y; t) = Σ_{n=1}^{N} w(n, t) · N(x, y | μ(x, n, t), μ(y, n, t), σ(x, n, t), σ(y, n, t), ρ(xy, n, t))
wherein w(n, t) denotes the weight of the n-th Gaussian component at moment t;
N(x, y) denotes that the coordinates (x, y) follow a bivariate normal distribution with parameters μ, σ and ρ;
μ(x, n, t) denotes the expectation of the abscissa of the n-th Gaussian component at moment t;
μ(y, n, t) denotes the expectation of the ordinate of the n-th Gaussian component at moment t;
σ(x, n, t) denotes the standard deviation of the abscissa of the n-th Gaussian component at moment t;
σ(y, n, t) denotes the standard deviation of the ordinate of the n-th Gaussian component at moment t;
ρ(xy, n, t) denotes the correlation coefficient;
B3, substituting the probability p(x, y; t) into the reconstruction error function to obtain the reconstruction error, and maximizing it so that the Gaussian mixture model outputs the target vector T_t;
wherein the reconstruction error function is:
L_R = (1/68) Σ_{t=0}^{67} log p(x_t, y_t; t)
wherein L_R is the reconstruction error;
(x_t, y_t) are the horizontal and vertical coordinates of the feature point at moment t.
CN201811631295.7A 2018-12-29 2018-12-29 Portrait cartoon generating method based on neural network Active CN109741247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811631295.7A CN109741247B (en) 2018-12-29 2018-12-29 Portrait cartoon generating method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811631295.7A CN109741247B (en) 2018-12-29 2018-12-29 Portrait cartoon generating method based on neural network

Publications (2)

Publication Number Publication Date
CN109741247A CN109741247A (en) 2019-05-10
CN109741247B true CN109741247B (en) 2020-04-21

Family

ID=66362127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811631295.7A Active CN109741247B (en) 2018-12-29 2018-12-29 Portrait cartoon generating method based on neural network

Country Status (1)

Country Link
CN (1) CN109741247B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197226B (en) * 2019-05-30 2021-02-09 厦门大学 Unsupervised image translation method and system
CN111127309B (en) * 2019-12-12 2023-08-11 杭州格像科技有限公司 Portrait style migration model training method, portrait style migration method and device
CN111161137B (en) * 2019-12-31 2023-04-11 四川大学 Multi-style Chinese painting flower generation method based on neural network
CN111243051B (en) * 2020-01-08 2023-08-18 杭州未名信科科技有限公司 Portrait photo-based simple drawing generation method, system and storage medium
CN111243050B (en) * 2020-01-08 2024-02-27 杭州未名信科科技有限公司 Portrait simple drawing figure generation method and system and painting robot
CN111402394B (en) * 2020-02-13 2022-09-20 清华大学 Three-dimensional exaggerated cartoon face generation method and device
CN111508048B (en) * 2020-05-22 2023-06-20 南京大学 Automatic generation method of interactive arbitrary deformation style face cartoon
CN112241704B (en) * 2020-10-16 2024-05-31 百度(中国)有限公司 Portrait infringement judging method and device, electronic equipment and storage medium
CN112463912A (en) * 2020-11-23 2021-03-09 浙江大学 Raspberry pie and recurrent neural network-based simple stroke identification and generation method
CN112396693B (en) * 2020-11-25 2024-09-13 上海商汤智能科技有限公司 Face information processing method and device, electronic equipment and storage medium
CN112818118B (en) * 2021-01-22 2024-05-21 大连民族大学 Reverse translation-based Chinese humor classification model construction method
CN113158948B (en) * 2021-04-29 2024-08-02 宜宾中星技术智能系统有限公司 Information generation method, device and terminal equipment
CN113743520A (en) * 2021-09-09 2021-12-03 广州梦映动漫网络科技有限公司 Cartoon generation method, system, medium and electronic terminal
CN117291138B (en) * 2023-11-22 2024-02-13 全芯智造技术有限公司 Method, apparatus and medium for generating layout elements

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7486296B2 (en) * 2004-10-18 2009-02-03 Reallusion Inc. Caricature generating system and method
CN103116902A (en) * 2011-11-16 2013-05-22 华为软件技术有限公司 Three-dimensional virtual human head image generation method, and method and device of human head image motion tracking
CN103218842A (en) * 2013-03-12 2013-07-24 西南交通大学 Voice synchronous-drive three-dimensional face mouth shape and face posture animation method
CN104200505A (en) * 2014-08-27 2014-12-10 西安理工大学 Cartoon-type animation generation method for human face video image
CN104463779A (en) * 2014-12-18 2015-03-25 北京奇虎科技有限公司 Portrait caricature generating method and device
CN107730573A (en) * 2017-09-22 2018-02-23 西安交通大学 A kind of personal portrait cartoon style generation method of feature based extraction
CN108596024A (en) * 2018-03-13 2018-09-28 杭州电子科技大学 A kind of illustration generation method based on human face structure information
CN109308731A (en) * 2018-08-24 2019-02-05 浙江大学 The synchronous face video composition algorithm of the voice-driven lip of concatenated convolutional LSTM

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1313979C (en) * 2002-05-03 2007-05-02 三星电子株式会社 Apparatus and method for generating 3-D cartoon
KR20070096621A (en) * 2006-03-27 2007-10-02 (주)제이디에프 The system and method for making a caricature using a shadow plate
CN101477696B (en) * 2009-01-09 2011-04-13 苏州华漫信息服务有限公司 Human character cartoon image generating method and apparatus
CN101551911B (en) * 2009-05-07 2011-04-06 上海交通大学 Human face sketch portrait picture automatic generating method
KR20130120175A (en) * 2012-04-25 2013-11-04 양재건 Apparatus, method and computer readable recording medium for generating a caricature automatically

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7486296B2 (en) * 2004-10-18 2009-02-03 Reallusion Inc. Caricature generating system and method
CN103116902A (en) * 2011-11-16 2013-05-22 华为软件技术有限公司 Three-dimensional virtual human head image generation method, and method and device of human head image motion tracking
CN103218842A (en) * 2013-03-12 2013-07-24 西南交通大学 Voice synchronous-drive three-dimensional face mouth shape and face posture animation method
CN104200505A (en) * 2014-08-27 2014-12-10 西安理工大学 Cartoon-type animation generation method for human face video image
CN104463779A (en) * 2014-12-18 2015-03-25 北京奇虎科技有限公司 Portrait caricature generating method and device
CN107730573A (en) * 2017-09-22 2018-02-23 西安交通大学 A kind of personal portrait cartoon style generation method of feature based extraction
CN108596024A (en) * 2018-03-13 2018-09-28 杭州电子科技大学 A kind of illustration generation method based on human face structure information
CN109308731A (en) * 2018-08-24 2019-02-05 浙江大学 The synchronous face video composition algorithm of the voice-driven lip of concatenated convolutional LSTM

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Cartoon-to-Photo Facial Translation with Generative Adversarial Networks";Junhong Huang等;《Proceedings of Machine Learning Research 95》;20181123;第566页到第581页 *
"利用人脸特征及其关系的漫画夸张与合成";陈文娟等;《计算机辅助设计与图形学学报》;20100131;第22卷(第1期);第121页到第128页 *
"利用图像变形生成个性化人脸卡通";邓维等;《计算机工程与应用》;20111231;第47卷(第24期);第132页到第135页 *
"基于图像变形的人体动画和人脸夸张";陈威华;《中国优秀硕士学位论文全文数据库•信息科技辑》;20121115;第2012年卷(第11期);I138-239 *
"漫画风格的人脸肖像生成算法";阎芳等;《计算机辅助设计与图形学学报》;20070430;第19卷(第4期);第442页到第447页 *

Also Published As

Publication number Publication date
CN109741247A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109741247B (en) Portrait cartoon generating method based on neural network
KR102286037B1 (en) Learning data set generating apparatus and method for machine learning
CN112887698B (en) High-quality face voice driving method based on nerve radiation field
Nguyen et al. Lipstick ain't enough: beyond color matching for in-the-wild makeup transfer
CN106548208B (en) A kind of quick, intelligent stylizing method of photograph image
CN109978930A (en) A kind of stylized human face three-dimensional model automatic generation method based on single image
CN112258387A (en) Image conversion system and method for generating cartoon portrait based on face photo
CN101826217A (en) Rapid generation method for facial animation
CN110188667B (en) Face rectification method based on three-party confrontation generation network
Chen et al. Face sketch synthesis with style transfer using pyramid column feature
US20240029345A1 (en) Methods and system for generating 3d virtual objects
CN111950432A (en) Makeup style migration method and system based on regional style consistency
CN111950430A (en) Color texture based multi-scale makeup style difference measurement and migration method and system
CN112883826A (en) Face cartoon generation method based on learning geometry and texture style migration
Macêdo et al. Expression transfer between photographs through multilinear AAM's
CN111563944B (en) Three-dimensional facial expression migration method and system
Jia et al. Face aging with improved invertible conditional GANs
US20220101145A1 (en) Training energy-based variational autoencoders
Lian et al. Anime style transfer with spatially-adaptive normalization
CN111611997B (en) Cartoon customized image motion video generation method based on human body action migration
Huang et al. Patch-based painting style transfer
CN106097373B (en) A kind of smiling face's synthetic method based on branch's formula sparse component analysis model
Bagwari et al. An edge filter based approach of neural style transfer to the image stylization
Do et al. Anime sketch colorization by component-based matching using deep appearance features and graph representation
Park et al. StyleBoost: A Study of Personalizing Text-to-Image Generation in Any Style using DreamBooth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant