CN109741247A - A neural-network-based portrait caricature generation method - Google Patents
A neural-network-based portrait caricature generation method
- Publication number
- CN109741247A (application CN201811631295.7A; granted as CN109741247B)
- Authority
- CN
- China
- Prior art keywords
- sequence
- vector
- face
- portrait
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a neural-network-based portrait caricature generation method, comprising the following steps: S1, extract the structural features of the face in a real face image and convert them into sequence feature data; S2, input the sequence feature data into a trained Seq2Seq VAE model to generate the corresponding exaggerated structure sequence points; S3, apply the generated exaggerated structure sequence points to the real face image to realize its exaggerated deformation; S4, apply a cartoon style to the face image after exaggerated deformation to generate the portrait caricature. The invention creatively represents facial structure features as sequence features and uses a Seq2Seq VAE model to generate the exaggerated sequence, which is then applied to caricature generation. This overcomes the limitations of existing image translation methods: the generated exaggerated portrait caricature is humorous without damaging the recognizability of the subject, and it also reflects the drawing styles of different caricaturists.
Description
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a neural-network-based portrait caricature generation method.
Background technique
Portrait drawing remains a popular form of artistic expression to this day. With the continuous development of machine-vision technology, portrait drawing has found wide application in multimedia settings such as virtual reality, augmented reality, and portrait-drawing robot systems, as well as in personalized entertainment and the Internet. To enhance the artistic expressiveness of portraits, many types of artistic portraits have been produced from different aesthetic angles, such as sketches, cartoons, and caricatures; among these, the caricature, as a common art form, has attracted the attention and research of many scholars.
With the development of artificial intelligence, more and more scholars have begun to study the combination of artificial intelligence and art, i.e., computational art. Rules embedded in art can be quantified as mathematical relationships through mathematics and statistics; for example, the golden ratio has strict proportionality, artistry, and consistency, and carries rich aesthetic value. These mathematical relationships have become part of the theoretical foundation of computational art. When a drawing depicts a person, many different graphic-art forms exist. As shown in Fig. 1, portrait drawing includes the exaggerated portrait caricature, the sketch, the cartoon, the stick figure, and so on. An exaggerated portrait caricature, as its name suggests, expresses the significant differences between a subject and the "average face" by exaggerating and deforming the facial organs. Compared with a realistic sketch, an exaggerated caricature adds humorous elements on a realistic basis. Unlike cartoons and stick figures, an exaggerated caricature both delivers the enjoyment of comics and retains the recognizability of the subject. Simple art forms such as sketches, stick figures, and cartoons have already seen a great deal of research work; in contrast, only a few studies have focused on exaggerated portrait caricature generation.
The generation of exaggerated portrait caricatures can be regarded as a style conversion from a real face image to a caricature image. Image-to-image translation is a popular vision problem whose goal is to learn the style characteristics of a target image and the mapping between input and output images. Among existing approaches, generative adversarial networks (Generative Adversarial Networks, GAN) built on convolutional neural networks (Convolutional Neural Networks, CNN) are considered one of the most popular image translation methods. However, existing methods can only transform the texture and color of an image; when the task involves changes in image content and geometry, CNN-based adversarial generation methods perform very unsatisfactorily, and the generation of exaggerated portrait caricatures involves exactly such an exaggerated deformation of image content, namely the facial structure.
To convert a portrait photograph into a corresponding portrait caricature, one prior-art method is sample-based: given a face portrait photograph, each face is decomposed into different parts (such as the nose, the mouth, etc.); for each part, feature matching is used to search a dataset for the corresponding caricature component, and the caricature components are then combined to build the caricature face. Another method is based on facial features: it first defines active shape model feature points, then generates the exaggerated portrait from the real face image based on the face and its correlations, obtains the face exaggeration shape from both facial-shape exaggeration and facial-feature exaggeration while introducing the "principle of contrast", and finally generates the exaggerated portrait of the face image in combination with an image warping method.
In the sample-based prior-art method above, a large number of caricature components must be drawn according to different local facial features to build the database; the workload is enormous and the drawing skill requirements are high. The assembled faces are relatively fixed and lack diversity, and the final effect is merely a cartoonization of the original face: the salient features of the input face are not deformed, so the result does not meet the definition of an exaggerated portrait caricature. The facial-feature-based method can exaggerate the original face to a certain extent, but the effect is not obvious, the result lacks subject recognizability, and the resulting caricature style is monotonous.
Summary of the invention
In view of the above deficiencies in the prior art, the neural-network-based portrait caricature generation method provided by the invention solves the limitations of existing image translation methods and the problem that the portrait caricatures obtained by existing generation methods have only a single style.
To achieve the above objective, the technical solution adopted by the invention is a neural-network-based portrait caricature generation method comprising the following steps (a minimal end-to-end sketch of the pipeline follows the list):
S1, extract the structural features of the face in a real face image, and convert the extracted structural feature data into sequence feature data;
S2, input the sequence feature data into a trained Seq2Seq VAE model to generate the exaggerated structure sequence points corresponding to the face image;
S3, using thin-plate spline interpolation, apply the generated exaggerated structure sequence points to the real face image to realize its exaggerated deformation;
S4, using CycleGAN, apply a cartoon style to the face image after exaggerated deformation to generate the portrait caricature.
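The following sketch shows how steps S1–S4 compose. Every callable passed in is a hypothetical placeholder for the corresponding component described below (landmark extraction, sequence encoding, the Seq2Seq VAE, thin-plate spline warping, the CycleGAN generator); none of the names is an interface defined by the invention.

```python
# Illustrative composition of steps S1-S4; all callables are hypothetical
# placeholders for the components described in the text.
from typing import Callable
import numpy as np

def generate_caricature(
    image: np.ndarray,
    extract_landmarks: Callable,  # S1a: image -> (68, 2) absolute landmarks
    to_sequence: Callable,        # S1b: landmarks -> (68, 5) offset sequence
    vae_generate: Callable,       # S2: real sequence -> exaggerated sequence
    from_sequence: Callable,      # inverse of S1b: sequence -> landmarks
    tps_warp: Callable,           # S3: (image, src_pts, dst_pts) -> warped image
    stylize: Callable,            # S4: CycleGAN generator, image -> styled image
) -> np.ndarray:
    src_pts = extract_landmarks(image)
    exaggerated = vae_generate(to_sequence(src_pts))
    dst_pts = from_sequence(exaggerated)
    return stylize(tps_warp(image, src_pts, dst_pts))
```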
Further, in step S1 the structural features of the face include the contour structure features of the face and the structure features of the facial organs;
step S1 is specifically:
extract 68 sequence points of the real face image as the structural feature data of the face, and, from the absolute coordinates of each sequence point, obtain the offset-coordinate sequence of each sequence point relative to the previous sequence point; this offset-coordinate sequence is the sequence feature data;
wherein each sequence point carries a state value, and a sequence point with its state value is expressed as Q(x, y, p1, p2, p3);
wherein x and y denote the offset distances of the sequence point in the x and y directions relative to the previous sequence point;
p1, p2, p3 form a binary one-hot vector of three facial states: p1 indicates that the sequence point is the starting point of the face contour or of a facial organ, p2 indicates that the sequence point belongs to the same organ as the previous sequence point, and p3 indicates that the sequence point is the last of the 68 sequence points.
Further, the method for training the Seq2Seq VAE model in step S2 is specifically:
A1, input the forward sequence and the reversed sequence of the facial structure features of one real face image into the encoder, obtain the forward feature vector h→ and the reverse feature vector h←, and concatenate them into the final feature vector h;
A2, map the final feature vector h through two fully connected layers to the mean vector μ and the standard-deviation vector σ respectively, and sample a random vector z following the normal distribution parameterized by the mean vector μ and the standard-deviation vector σ;
A3, input the random vector z into the decoder to obtain the initially trained Seq2Seq VAE network;
A4, input several real face images one by one into the previously trained Seq2Seq VAE network, repeating steps A1–A3 until the Seq2Seq VAE network converges, to obtain the trained Seq2Seq VAE model.
Further, the encoder in step A1 comprises a bidirectional LSTM network module, which consists of two LSTM networks, each unrolled over the 68 sequence points.
Further, step A1 is specifically:
input each element of the forward sequence of the facial structure features of one real face image into one LSTM network to obtain the forward feature vector h→; at the same time, input each element of the reversed sequence into the other LSTM network to obtain the reverse feature vector h←; concatenate the forward feature vector h→ and the reverse feature vector h← into the final feature vector h;
wherein the forward sequence is S = (S_0, S_1, ..., S_67) and the reversed sequence is S' = (S_67, S_66, ..., S_0), with i = 0, 1, 2, ..., 67.
Further, in step A2, the random vector z is:
z = μ + σ ⊙ N(0, 1)
wherein ⊙ denotes element-wise vector multiplication and N(0, 1) is an IID Gaussian vector.
Further, in step A3, the decoder is an LSTM network with a time span of 68;
the input to the LSTM network at each time step additionally includes the vector T_t obtained from the previous time step and the source point S_t;
the LSTM network outputs a vector O_t at every time step, and the output vector O_t of the current time step t is sampled through a Gaussian mixture model to obtain the vector T_t, which is input to the LSTM network at the next time step;
wherein t denotes the time step, t = 0, 1, 2, ..., 67;
the vector T_0 and the source point S_0 input to the LSTM network at the initial time step are initialized to (0, 0, 1, 0, 0).
Further, the method by which the output vector O_t of the current time step t is sampled through the Gaussian mixture model to obtain the vector T_t is specifically:
B1, determine the number N of normal distributions in the Gaussian mixture model, set the dimension of the output vector O_t to 6N, and decompose O_t as:
O_t = [ (w_n, μ_(x,n), μ_(y,n), σ_(x,n), σ_(y,n), ρ_(xy,n)) ] for n = 1, ..., N
wherein n indexes the n-th Gaussian component; x denotes the abscissa and y the ordinate; w_n denotes the weight of the n-th Gaussian component, with Σ_(n=1..N) w_n = 1; μ_(x,n) and μ_(y,n) denote the expectations of the abscissa x and the ordinate y; σ_(x,n) and σ_(y,n) denote the standard deviations of the abscissa x and the ordinate y; ρ_(xy,n) denotes the correlation coefficient;
B2, determine the probability p(x, y; t) of sampling T_t when the decomposed O_t is input to the Gaussian mixture model;
wherein the probability p(x, y; t) is:
p(x, y; t) = Σ_(n=1..N) w(n, t) · N(x, y | μ(x,n,t), μ(y,n,t), σ(x,n,t), σ(y,n,t), ρ(xy,n,t))
wherein w(n, t) denotes the weight of the n-th Gaussian component at time step t; N(x, y | ·) denotes the bivariate normal density over the coordinate (x, y) with parameters μ, σ, ρ; μ(x,n,t) and μ(y,n,t) denote the expectations of the abscissa and the ordinate of the n-th Gaussian component at time step t; σ(x,n,t) and σ(y,n,t) denote the corresponding standard deviations; ρ(xy,n,t) denotes the correlation coefficient;
B3, substitute the probability p(x, y; t) into the reconstruction error function to obtain the reconstruction error; maximizing the log-likelihood (equivalently, minimizing the reconstruction error L_R) drives the Gaussian mixture model to output the target vector T_t;
wherein the reconstruction error function is:
L_R = −(1/68) · Σ_(t=0..67) log p(x_t, y_t; t)
wherein L_R is the reconstruction error and (x_t, y_t) are the horizontal and vertical coordinates of the feature point at time step t.
The beneficial effects of the invention are as follows: the neural-network-based portrait caricature generation method provided by the invention creatively proposes storing facial structure features as sequence features, so that a Seq2Seq VAE model can be used to generate the exaggerated sequence, which is then applied to caricature generation. This overcomes the limitations of existing image translation methods: the generated exaggerated portrait caricature is humorous while preserving the recognizability of the subject, and it also reflects the drawing styles of different caricaturists.
Detailed description of the invention
Fig. 1 is a schematic diagram of the types of portrait drawing in the background art.
Fig. 2 is a flow chart of the implementation of the neural-network-based portrait caricature generation method of the invention.
Fig. 3 is a schematic diagram, in an embodiment of the invention, of converting facial structure features into sequence features using a face alignment technique.
Fig. 4 is a flow chart of the implementation of the Seq2Seq VAE model training method of the invention.
Fig. 5 is a schematic comparison of the complete objective L with several of its variants in an embodiment of the invention.
Fig. 6 shows comparison results for different real face images in an embodiment of the invention.
Fig. 7 is a schematic comparison of the features of an input face and the corresponding "average face" in an embodiment of the invention.
Fig. 8 is a schematic diagram of local exaggeration results on original images in an embodiment of the invention.
Fig. 9 is a schematic comparison of different artistic styles applied to an exaggerated face in an embodiment of the invention.
Specific embodiment
Specific embodiments of the invention are described below to facilitate understanding by those skilled in the art. It should be understood that the invention is not limited to the scope of the specific embodiments; to those of ordinary skill in the art, as long as variations fall within the spirit and scope of the invention as defined and determined by the appended claims, such variations are obvious, and all innovations and creations drawing on the inventive concept fall within its protection.
As shown in Fig. 2, a neural-network-based portrait caricature generation method comprises the following steps:
S1, extract the structural features of the face in a real face image, and convert the extracted structural feature data into sequence feature data;
S2, input the sequence feature data into the trained Seq2Seq VAE model to generate the exaggerated structure sequence points corresponding to the face image;
S3, using thin-plate spline interpolation, apply the generated exaggerated structure sequence points to the real face image to realize its exaggerated deformation;
In thin-plate spline interpolation, a Cartesian coordinate system is set up on a thin plate, with the independent variable x and the function value y distributed as points on this coordinate system. After bending deformation, the thin plate passes through all corresponding function-value points y while keeping the bending energy minimal. The interpolating function is defined as f(x, y); its concrete form (reconstructed here as the standard thin-plate spline formulation, since the original formula is not legible) is:
f(x, y) = a_0 + a_1·x + a_2·y + Σ_(i=1..K) w_i·U(‖(x_i, y_i) − (x, y)‖), with radial basis U(r) = r²·log r²
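As a concrete illustration of step S3, the sketch below warps an image with a thin-plate spline using OpenCV's contrib ThinPlateSplineShapeTransformer. This is one possible implementation under the assumption that opencv-contrib-python is available; the patent does not name a library.

```python
# Sketch of step S3: thin-plate spline warping (requires opencv-contrib-python).
# Assumption: src_pts are the 68 original landmarks, dst_pts the exaggerated ones.
import cv2
import numpy as np

def tps_warp(image: np.ndarray, src_pts: np.ndarray, dst_pts: np.ndarray) -> np.ndarray:
    tps = cv2.createThinPlateSplineShapeTransformer()
    src = src_pts.reshape(1, -1, 2).astype(np.float32)
    dst = dst_pts.reshape(1, -1, 2).astype(np.float32)
    matches = [cv2.DMatch(i, i, 0.0) for i in range(src.shape[1])]
    # OpenCV applies a backward mapping in warpImage, so the target shape is
    # passed first if we want src_pts to land on dst_pts in the output image.
    tps.estimateTransformation(dst, src, matches)
    return tps.warpImage(image)
```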
S4, using CycleGAN, apply a cartoon style to the face image after exaggerated deformation to generate the portrait caricature.
CycleGAN can learn to translate an image from a source domain X to a target domain Y without paired data. The goal is to learn a mapping G: X → Y such that, through an added adversarial loss, the distribution of images from G(X) approaches the distribution of Y. Because this mapping is highly under-constrained, the inverse mapping F: Y → X is generated in turn, and a consistency loss is introduced to constrain F(G(X)) ≈ X (a sketch of this constraint follows).
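A minimal PyTorch sketch of the cycle-consistency term just described; the generators G and F and the weight lambda_cyc are placeholders, since the patent does not specify CycleGAN's internals.

```python
# Sketch of the CycleGAN cycle-consistency loss: F(G(x)) should reconstruct x
# and G(F(y)) should reconstruct y. G and F are generator networks (placeholders).
import torch

def cycle_consistency_loss(G, F, x: torch.Tensor, y: torch.Tensor,
                           lambda_cyc: float = 10.0) -> torch.Tensor:
    recon_x = F(G(x))   # X -> Y -> X round trip
    recon_y = G(F(y))   # Y -> X -> Y round trip
    return lambda_cyc * ((recon_x - x).abs().mean() + (recon_y - y).abs().mean())
```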
In step S1 above, a face alignment technique is used to extract the facial-contour and facial-organ structure feature points of all real faces and corresponding exaggerated portrait caricatures in the MMC dataset. The structural features of the face include the contour structure features and the facial-organ structure features; 68 sequence points are extracted to represent the structure of the face, and each sequence point is represented by its absolute coordinates (x', y').
The MMC dataset mentioned above is a distorting-mirror caricature dataset comprising a large collection of real face images and the corresponding exaggerated portrait caricature face data.
Accordingly, step S1 is specifically:
extract the 68 sequence points of the real face image as the structural feature data of the face, and, from the absolute coordinates of each sequence point, obtain the offset-coordinate sequence of each sequence point relative to the previous one; this offset-coordinate sequence is the sequence feature data;
wherein, in order to distinguish the facial organs, each sequence point carries a state value, and a sequence point with its state value is expressed as Q(x, y, p1, p2, p3). For a conventional rectangular image, the lower-left corner of the image is used as the origin of the coordinate system, so the extracted sequence points can be regarded as points distributed in the first quadrant;
wherein x and y denote the offset distances of the sequence point in the x and y directions relative to the previous sequence point;
p1, p2, p3 form a binary one-hot vector of three facial states: p1 indicates that the sequence point is the starting point of the face contour or of a facial organ, p2 indicates that the sequence point belongs to the same organ as the previous sequence point, and p3 indicates that the sequence point is the last of the 68 sequence points. Through this state value, the facial structure can be divided into five major parts: contour, eyebrows, eyes, nose, and mouth. Fig. 3 shows several samples from the MMC dataset, together with stick figures of the subjects rendered from the 68 sequence points obtained by the face alignment method. (A sketch of this encoding follows.)
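The conversion from 68 absolute landmarks to the offset-plus-state representation Q(x, y, p1, p2, p3) can be sketched as below. The organ start indices follow the common 68-point landmark convention (contour 0–16, brows 17–26, nose 27–35, eyes 36–47, mouth 48–67); this segmentation is an assumption, since the patent does not enumerate it.

```python
# Sketch of step S1: encode 68 absolute landmarks (x', y') as offset sequence
# points Q(dx, dy, p1, p2, p3). Organ start indices assume the common 68-point
# landmark convention; the patent does not enumerate them.
import numpy as np

ORGAN_STARTS = {0, 17, 22, 27, 36, 42, 48}  # contour, two brows, nose, two eyes, mouth

def to_offset_sequence(points: np.ndarray) -> np.ndarray:
    """points: (68, 2) absolute coordinates -> (68, 5) sequence Q(dx, dy, p1, p2, p3)."""
    seq = np.zeros((68, 5), dtype=np.float32)
    prev = np.zeros(2, dtype=np.float32)
    for i, (x, y) in enumerate(points):
        seq[i, 0:2] = (x - prev[0], y - prev[1])   # offsets from the previous point
        if i == 67:
            seq[i, 4] = 1.0                        # p3: last of the 68 points
        elif i in ORGAN_STARTS:
            seq[i, 2] = 1.0                        # p1: starts a contour/organ
        else:
            seq[i, 3] = 1.0                        # p2: same organ as previous point
        prev = np.array([x, y], dtype=np.float32)
    return seq
```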
Since the real face images in the MMC dataset and their corresponding portrait caricatures do not match exactly, in particular in the ratio of the face to the whole picture and in the profile angle of the subject, the caricature is corrected before the offset-coordinate sequence is computed, in order to reduce errors in data extraction. Taking the first point S'_0 = (x'_0, y'_0) and the last point S'_16 = (x'_16, y'_16) of the face contour of the real face image as datum points, the corresponding caricature is rotated and scaled until its contour points S_0 and S_16 align with the datum points of the real face; the corrected 68 sequence points are then extracted, and the offset-coordinate sequence of each sequence point relative to the previous one is obtained from the absolute coordinates, as sketched below.
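A minimal sketch of this two-point alignment, assuming the correction is a similarity transform (rotation plus uniform scale plus translation) that maps the caricature's contour endpoints onto the real face's datum points; the patent describes only rotation and scaling, so the translation term here is an assumption added for completeness.

```python
# Sketch: align caricature landmarks to a real face using the two contour
# datum points S_0 and S_16 (a similarity transform fit to the two points).
import numpy as np

def align_to_datum(car_pts: np.ndarray, real_pts: np.ndarray) -> np.ndarray:
    """car_pts, real_pts: (68, 2). Returns caricature points after alignment."""
    # Represent each 2D point as a complex number; a similarity transform is
    # then z -> a*z + b, where complex a encodes rotation+scale and b translation.
    c = car_pts[:, 0] + 1j * car_pts[:, 1]
    r = real_pts[:, 0] + 1j * real_pts[:, 1]
    a = (r[16] - r[0]) / (c[16] - c[0])   # rotation and scale from the datum pair
    b = r[0] - a * c[0]                   # translation fixing the first datum point
    aligned = a * c + b
    return np.stack([aligned.real, aligned.imag], axis=1).astype(np.float32)
```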
In step S2 above, the Seq2Seq VAE network is trained with paired exaggeration point-sequence data. Once the network has converged (i.e., it can reconstruct the corresponding caricature exaggeration sequence points from the sequence feature data of a real face), the network model is used to generate the exaggerated caricature sequence points for the real-face feature sequence data of the test set.
As shown in Fig. 4, the method for training the Seq2Seq VAE model in step S2 above is specifically:
A1, input the forward sequence and the reversed sequence of the facial structure features of one real face image into the encoder, obtain the forward feature vector h→ and the reverse feature vector h←, and concatenate them into the final feature vector h;
wherein the encoder comprises a bidirectional LSTM network module consisting of two LSTM networks, each unrolled over the 68 sequence points;
accordingly, step A1 is specifically:
input each element of the forward sequence of the facial structure features of one real face image into one LSTM network to obtain the forward feature vector h→; at the same time, input each element of the reversed sequence into the other LSTM network to obtain the reverse feature vector h←; then concatenate the forward feature vector h→ and the reverse feature vector h← into the final feature vector h (a sketch follows);
wherein the forward sequence is S = (S_0, S_1, ..., S_67), the reversed sequence is S' = (S_67, S_66, ..., S_0), and i = 0, 1, 2, ..., 67.
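A minimal PyTorch sketch of the bidirectional encoder of step A1, using the dimensions reported in the embodiment (5-dimensional sequence points, 256-dimensional feature vector per direction, 512-dimensional h); the class and variable names are illustrative.

```python
# Sketch of step A1: a bidirectional LSTM encoder over the 68 sequence points.
# Dimensions follow the embodiment: 256 per direction, concatenated to 512.
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    def __init__(self, point_dim: int = 5, hidden: int = 256):
        super().__init__()
        # bidirectional=True runs one LSTM over the forward sequence and one
        # over the reversed sequence, matching the two networks in step A1.
        self.lstm = nn.LSTM(point_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        """seq: (batch, 68, 5) -> h: (batch, 512)."""
        _, (h_n, _) = self.lstm(seq)                 # h_n: (2, batch, 256)
        return torch.cat([h_n[0], h_n[1]], dim=1)    # concatenate h-> and h<-

h = SequenceEncoder()(torch.zeros(1, 68, 5))         # example: h.shape == (1, 512)
```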
A2, map the final feature vector h to the mean vector μ and the standard-deviation vector σ, and sample a random vector z following the normal distribution parameterized by the mean vector μ and the standard-deviation vector σ;
wherein the random vector z is:
z = μ + σ ⊙ N(0, 1)
wherein ⊙ denotes element-wise vector multiplication and N(0, 1) is an IID Gaussian vector.
There is a divergence loss L_KL between the distribution of the random vector z and that of the IID Gaussian vector N(0, 1):
L_KL = KL(N(μ, σ) || N(0, 1))
wherein KL(·) denotes the KL distance, N(μ, σ) denotes the normal distribution with mean μ and standard deviation σ, and KL(A || B) denotes the KL distance between distributions A and B. Once the random vector z is determined, the divergence loss L_KL can be computed; this loss is automatically back-propagated to train the LSTM network structure, making the difference between the distribution of subsequently obtained z and that of the Gaussian vector N(0, 1) smaller and smaller. (A sketch of the reparameterization and this loss follows.)
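A sketch of step A2 in PyTorch: two fully connected layers map h to μ and σ, z is sampled by reparameterization, and L_KL is the closed-form KL divergence between N(μ, σ) and N(0, 1). The 512 → 128 dimensions follow the embodiment; predicting log σ² instead of σ is a standard numerical-stability choice assumed here.

```python
# Sketch of step A2: reparameterized sampling z = mu + sigma ⊙ N(0, 1) and the
# closed-form KL divergence KL(N(mu, sigma) || N(0, 1)).
import torch
import torch.nn as nn

fc_mu = nn.Linear(512, 128)       # h -> mean vector mu
fc_logvar = nn.Linear(512, 128)   # h -> log(sigma^2), for numerical stability

def sample_z(h: torch.Tensor):
    mu, logvar = fc_mu(h), fc_logvar(h)
    sigma = torch.exp(0.5 * logvar)
    z = mu + sigma * torch.randn_like(sigma)                        # reparameterization
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # L_KL
    return z, kl
```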
A3, input the random vector z into the decoder to obtain the initially trained Seq2Seq VAE network;
wherein the decoder is an LSTM network with a time span of 68;
the input to the LSTM network at each time step additionally includes the vector T_t obtained from the previous time step and the source point S_t;
the LSTM network outputs a vector O_t at every time step, and the output vector O_t of the current time step t is sampled through a Gaussian mixture model to obtain the vector T_t, which is input to the LSTM network at the next time step;
wherein t denotes the time step, t = 0, 1, 2, ..., 67;
the vector T_0 and the source point S_0 input to the LSTM network at the initial time step are initialized to (0, 0, 1, 0, 0).
In the above process, since the output of the LSTM network at each time step is O_t, it cannot be fed directly into the LSTM network at the next time step; O_t must therefore be decomposed into the parameters required by the Gaussian mixture model, from which T_t is then obtained. A bivariate normal distribution is determined by five elements: (μ_x, μ_y, σ_x, σ_y, ρ_xy), where μ_x and μ_y denote the means, σ_x and σ_y the standard deviations, and ρ_xy the correlation parameter. For a bivariate normal distribution there is also a weight w; therefore, a GMM with N normal distributions requires (5+1)·N parameters. For the sequence points of each face, the state values (p1, p2, p3) are fixed, so they do not need to be generated.
The method by which the output vector O_t of the current time step t is sampled through the Gaussian mixture model to obtain the vector T_t is specifically:
B1, determine the number N of normal distributions in the Gaussian mixture model, set the dimension of the output vector O_t to 6N, and decompose O_t as:
O_t = [ (w_n, μ_(x,n), μ_(y,n), σ_(x,n), σ_(y,n), ρ_(xy,n)) ] for n = 1, ..., N
wherein n indexes the n-th Gaussian component; x denotes the abscissa and y the ordinate; w_n denotes the weight of the n-th Gaussian component, with Σ_(n=1..N) w_n = 1; μ_(x,n) and μ_(y,n) denote the expectations of the abscissa x and the ordinate y; σ_(x,n) and σ_(y,n) denote the standard deviations of the abscissa x and the ordinate y; ρ_(xy,n) denotes the correlation coefficient;
B2, determine the probability p(x, y; t) of sampling T_t when the decomposed O_t is input to the Gaussian mixture model;
wherein the probability p(x, y; t) is:
p(x, y; t) = Σ_(n=1..N) w(n, t) · N(x, y | μ(x,n,t), μ(y,n,t), σ(x,n,t), σ(y,n,t), ρ(xy,n,t))
wherein w(n, t) denotes the weight of the n-th Gaussian component at time step t; N(x, y | ·) denotes the bivariate normal density over the coordinate (x, y) with parameters μ, σ, ρ; μ(x,n,t) and μ(y,n,t) denote the expectations of the abscissa and the ordinate of the n-th Gaussian component at time step t; σ(x,n,t) and σ(y,n,t) denote the corresponding standard deviations; ρ(xy,n,t) denotes the correlation coefficient;
B3, substitute the probability p(x, y; t) into the reconstruction error function to obtain the reconstruction error; maximizing the log-likelihood (equivalently, minimizing the reconstruction error L_R) drives the Gaussian mixture model to output the target vector T_t;
wherein the reconstruction error function is:
L_R = −(1/68) · Σ_(t=0..67) log p(x_t, y_t; t)
wherein L_R is the reconstruction error and (x_t, y_t) are the horizontal and vertical coordinates of the feature point at time step t. (A sketch of the decomposition and sampling follows.)
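A sketch of B1–B3 in PyTorch, with N = 20 components as in the embodiment (so O_t is 120-dimensional). The activation choices (softmax on the weights, exp on the standard deviations, tanh on the correlation) and the per-component parameter layout are assumptions commonly used for such mixture outputs, not details specified by the patent.

```python
# Sketch of B1-B3: decompose O_t (dimension 6N) into GMM parameters, sample an
# offset T_t, and evaluate log p(x, y; t) for the reconstruction error L_R.
import math
import torch

def split_gmm_params(o_t: torch.Tensor, n: int = 20):
    # Layout assumption: the 6N values of O_t are grouped per component.
    w, mu_x, mu_y, s_x, s_y, rho = o_t.view(n, 6).unbind(dim=1)
    w = torch.softmax(w, dim=0)                  # component weights sum to 1
    s_x, s_y = torch.exp(s_x), torch.exp(s_y)    # standard deviations > 0
    rho = torch.tanh(rho)                        # correlation in (-1, 1)
    return w, mu_x, mu_y, s_x, s_y, rho

def sample_offset(o_t: torch.Tensor, n: int = 20) -> torch.Tensor:
    w, mu_x, mu_y, s_x, s_y, rho = split_gmm_params(o_t, n)
    k = torch.multinomial(w, 1).item()           # choose a component by its weight
    cov_xy = rho[k] * s_x[k] * s_y[k]
    cov = torch.stack([torch.stack([s_x[k] ** 2, cov_xy]),
                       torch.stack([cov_xy, s_y[k] ** 2])])
    mean = torch.stack([mu_x[k], mu_y[k]])
    return torch.distributions.MultivariateNormal(mean, cov).sample()

def log_p(o_t: torch.Tensor, x: float, y: float, n: int = 20) -> torch.Tensor:
    # log of the weighted sum of bivariate normal densities, as in p(x, y; t).
    w, mu_x, mu_y, s_x, s_y, rho = split_gmm_params(o_t, n)
    zx, zy = (x - mu_x) / s_x, (y - mu_y) / s_y
    q = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (1 - rho ** 2)
    norm = 2 * math.pi * s_x * s_y * torch.sqrt(1 - rho ** 2)
    return torch.logsumexp(torch.log(w) - q / 2 - torch.log(norm), dim=0)
```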
A4, input several real face images one by one into the previously trained Seq2Seq VAE network, repeating steps A1–A3 until the Seq2Seq VAE network converges, to obtain the trained Seq2Seq VAE model.
During training of the Seq2Seq VAE network there is also a consistency loss L_C, which uses the log-likelihood of the generated probability distribution to explain the source points S; L_C is related to maintaining the basic structure of the face.
The consistency loss L_C is the negative log-likelihood of the source points under the generated distribution:
L_C = −(1/68) · Σ_(t=0..67) log p(x_(S_t), y_(S_t); t)
There is a source point S_t at each step of the decoder LSTM; the consistency loss obtained at each step automatically back-propagates to adjust the network structure of the decoder, which then generates the exaggerated structure sequence points.
In one embodiment of the invention, experimental results of the method on the MMC dataset are provided:
1. The images in the MMC dataset are divided into 500 training pairs and 47 test pairs, supplemented with an additional 100 real face images;
2. For the encoder, 256-dimensional feature vectors h→ and h← are extracted from the forward and reversed source data respectively; the 512-dimensional vector h obtained by concatenation is used as the input of the VAE, and the dimension of the vector z is 128. The GMM (Gaussian mixture model) is set to 20 normal distributions, so the output dimension of the decoder LSTM is 120;
3. The importance of the Kullback-Leibler divergence loss L_KL, the reconstruction error L_R, and the consistency loss L_C is studied. The complete method is then compared with several variants; afterwards, the influence of batch size on the generated exaggerated sketches is analyzed, and local exaggeration is used to improve the effectiveness of the system. Finally, the exaggerated portraits are transferred into various artistic styles.
Experimental results and analysis:
A. Loss function analysis:
In Fig. 5, the complete objective L = L_R + α·L_KL + β·L_C is compared with several of its variants: one uses the Kullback-Leibler divergence loss L_KL together with the reconstruction error L_R, the other uses the reconstruction error L_R together with the consistency loss L_C. All Seq2Seq VAE models are trained with a batch size of 64 samples using the Adam optimizer, with a learning rate of 0.01 and gradient clipping at 1.0.
(1) in the L_KL and L_R experiment, α = 0.8 and β = 0;
(2) in the L_R and L_C experiment, α = 0 and β = 2;
(3) in the L_KL + L_R + L_C experiment, α = 0.5 and β = 0.5.
The experimental results show that all three losses play a crucial role in obtaining high-quality results. From the original sketches in the second row and the exaggerated sketches in the third row, we find that L_KL makes the generated sketches more exaggerated, while the preservation of the original facial structure is achieved mainly by minimizing the L_C loss. The complete model minimizes L_KL, L_R, and L_C simultaneously, so as to keep the basic structure of the original image while exaggerating the original sketch. The complete Seq2Seq VAE model not only exaggerates the facial features but also retains the recognizability of the subject. The complete model still has some drawbacks, however: some discriminative features of the original image are only slightly exaggerated and do not reach the ideal degree of exaggeration.
B. Batch size analysis:
Changes in batch size can cause the network to oscillate between randomness and determinism; in this Seq2Seq VAE, the most obvious manifestation of randomness versus determinism is whether the generated sketch is distorted or faithfully restored. The comparison results for different real face images in Fig. 6 show that batch size directly affects the stability of the generated sequences.
As shown in Fig. 6, when the batch size equals 16, the degree of randomness in the generated network far exceeds its stability, leading to severe distortion of the generated images. From the stick figures in the second and fourth rows it is also apparent that the structural deformation of the face is severe when the batch is small. As the batch size increases, the stability of the network increases accordingly, and the generated sequences become more consistent with those of the source images. For example, when the batch size equals 128, the degree of exaggeration over the source image is very mild. So that the model, when exaggerating a source image, not only maintains the basic structure of the facial features but also markedly exaggerates their salient characteristics, the batch size is set to 64 in the other experiments.
C. Local exaggeration:
In general, when drawing a portrait caricature, artists tend to exaggerate the salient features that differ from the "average face". Accordingly, the proposed system includes a local exaggeration method. By comparing the feature distribution of the input face against "average face" datasets for men and women of different countries, the salient features of the subject can be obtained; Fig. 7 shows the feature comparison between an input face and the corresponding "average face".
By inverting the x- and y-axis values of the corresponding local coordinate points, the corresponding changes are made to exaggerate these local facial organs, which further enhances the caricature produced by the system. Some problems remain, however: the face alignment step cannot extract the hair style, forehead, ears, or cheeks, so these local features cannot be compared and exaggerated. Judging from the experimental results, the method exaggerates the extracted features reasonably. Fig. 8 shows local exaggeration results on original images: the first column is the original image; the second column shows the structure-distribution features, where the blue points are the structure of the original face and the yellow points the structure after local adjustment; applying the local changes to the original face yields the deformation results in the third column; the caricatures in the fourth column are the corresponding targets. Although the results do not reach the effect of the target output, a certain degree of exaggerated humor is still obtained.
Final results:
The generated exaggerated portrait caricatures are humorous while preserving the recognizability of the subject, and they also reflect the drawing styles of different caricaturists. Different styles are trained through CycleGAN, such as cartoon style, oil-painting style, sketch style, and comic style. Fig. 9 shows the results of applying different artistic styles to an exaggerated face.
The beneficial effects of the invention are as follows: the neural-network-based portrait caricature generation method provided by the invention creatively proposes storing facial structure features as sequence features, so that a Seq2Seq VAE model can be used to generate the exaggerated sequence, which is then applied to caricature generation. This overcomes the limitations of existing image translation methods: the generated exaggerated portrait caricature is humorous while preserving the recognizability of the subject, and it also reflects the drawing styles of different caricaturists.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811631295.7A CN109741247B (en) | 2018-12-29 | 2018-12-29 | A method for generating portrait cartoons based on neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811631295.7A CN109741247B (en) | 2018-12-29 | 2018-12-29 | A method for generating portrait cartoons based on neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109741247A true CN109741247A (en) | 2019-05-10 |
CN109741247B CN109741247B (en) | 2020-04-21 |
Family
ID=66362127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811631295.7A Active CN109741247B (en) | 2018-12-29 | 2018-12-29 | A method for generating portrait cartoons based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109741247B (en) |
- 2018-12-29: application CN201811631295.7A filed in China; granted as CN109741247B (status: Active)
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030206171A1 (en) * | 2002-05-03 | 2003-11-06 | Samsung Electronics Co., Ltd. | Apparatus and method for creating three-dimensional caricature |
US7486296B2 (en) * | 2004-10-18 | 2009-02-03 | Reallusion Inc. | Caricature generating system and method |
KR20070096621A (en) * | 2006-03-27 | 2007-10-02 | (주)제이디에프 | Caricature generation system and method using shaded plate |
CN101477696A (en) * | 2009-01-09 | 2009-07-08 | 彭振云 | Human character cartoon image generating method and apparatus |
CN101551911A (en) * | 2009-05-07 | 2009-10-07 | 上海交通大学 | Human face sketch portrait picture automatic generating method |
CN103116902A (en) * | 2011-11-16 | 2013-05-22 | 华为软件技术有限公司 | Three-dimensional virtual human head image generation method, and method and device of human head image motion tracking |
KR20130120175A (en) * | 2012-04-25 | 2013-11-04 | 양재건 | Apparatus, method and computer readable recording medium for generating a caricature automatically |
CN103218842A (en) * | 2013-03-12 | 2013-07-24 | 西南交通大学 | Voice synchronous-drive three-dimensional face mouth shape and face posture animation method |
CN104200505A (en) * | 2014-08-27 | 2014-12-10 | 西安理工大学 | Cartoon-type animation generation method for human face video image |
CN104463779A (en) * | 2014-12-18 | 2015-03-25 | 北京奇虎科技有限公司 | Portrait caricature generating method and device |
CN107730573A (en) * | 2017-09-22 | 2018-02-23 | 西安交通大学 | A kind of personal portrait cartoon style generation method of feature based extraction |
CN108596024A (en) * | 2018-03-13 | 2018-09-28 | 杭州电子科技大学 | A kind of illustration generation method based on human face structure information |
CN109308731A (en) * | 2018-08-24 | 2019-02-05 | 浙江大学 | Speech-driven lip-syncing face video synthesis algorithm based on cascaded convolutional LSTM |
Non-Patent Citations (5)
Title |
---|
JUNHONG HUANG et al., "Cartoon-to-Photo Facial Translation with Generative Adversarial Networks", Proceedings of Machine Learning Research 95 *
DENG, Wei et al., "Generating Personalized Face Cartoons Using Image Warping" (in Chinese), Computer Engineering and Applications *
YAN, Fang et al., "A Caricature-Style Face Portrait Generation Algorithm" (in Chinese), Journal of Computer-Aided Design & Computer Graphics *
CHEN, Weihua, "Human Body Animation and Face Exaggeration Based on Image Warping" (in Chinese), China Master's Theses Full-text Database, Information Science & Technology *
CHEN, Wenjuan et al., "Caricature Exaggeration and Synthesis Using Facial Features and Their Relations" (in Chinese), Journal of Computer-Aided Design & Computer Graphics *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197226A (en) * | 2019-05-30 | 2019-09-03 | 厦门大学 | A kind of unsupervised image interpretation method and system |
CN110197226B (en) * | 2019-05-30 | 2021-02-09 | 厦门大学 | Unsupervised image translation method and system |
CN111127309B (en) * | 2019-12-12 | 2023-08-11 | 杭州格像科技有限公司 | Portrait style migration model training method, portrait style migration method and device |
CN111127309A (en) * | 2019-12-12 | 2020-05-08 | 杭州格像科技有限公司 | Portrait style transfer model training method, portrait style transfer method and device |
CN111161137A (en) * | 2019-12-31 | 2020-05-15 | 四川大学 | Multi-style Chinese painting flower generation method based on neural network |
CN111243050A (en) * | 2020-01-08 | 2020-06-05 | 浙江省北大信息技术高等研究院 | Portrait sketch generation method, system and painting robot |
CN111243051A (en) * | 2020-01-08 | 2020-06-05 | 浙江省北大信息技术高等研究院 | Sketch generation method, system and storage medium based on portrait photos |
CN111243050B (en) * | 2020-01-08 | 2024-02-27 | 杭州未名信科科技有限公司 | Portrait simple drawing figure generation method and system and painting robot |
CN111243051B (en) * | 2020-01-08 | 2023-08-18 | 杭州未名信科科技有限公司 | Method, system and storage medium for generating stick figures based on portrait photos |
CN111402394A (en) * | 2020-02-13 | 2020-07-10 | 清华大学 | Three-dimensional exaggerated cartoon face generation method and device |
CN111402394B (en) * | 2020-02-13 | 2022-09-20 | 清华大学 | Three-dimensional exaggerated cartoon face generation method and device |
CN111508048A (en) * | 2020-05-22 | 2020-08-07 | 南京大学 | A method for automatic generation of interactive and arbitrarily deformable face cartoons |
CN112241704A (en) * | 2020-10-16 | 2021-01-19 | 百度(中国)有限公司 | Method and device for judging portrait infringement, electronic equipment and storage medium |
CN112241704B (en) * | 2020-10-16 | 2024-05-31 | 百度(中国)有限公司 | Portrait infringement judging method and device, electronic equipment and storage medium |
CN112463912A (en) * | 2020-11-23 | 2021-03-09 | 浙江大学 | Raspberry pie and recurrent neural network-based simple stroke identification and generation method |
CN112396693A (en) * | 2020-11-25 | 2021-02-23 | 上海商汤智能科技有限公司 | Face information processing method and device, electronic equipment and storage medium |
CN112818118A (en) * | 2021-01-22 | 2021-05-18 | 大连民族大学 | Reverse translation-based Chinese humor classification model |
CN112818118B (en) * | 2021-01-22 | 2024-05-21 | 大连民族大学 | Reverse translation-based Chinese humor classification model construction method |
CN113158948A (en) * | 2021-04-29 | 2021-07-23 | 宜宾中星技术智能系统有限公司 | Information generation method and device and terminal equipment |
CN113743520A (en) * | 2021-09-09 | 2021-12-03 | 广州梦映动漫网络科技有限公司 | Cartoon generation method, system, medium and electronic terminal |
CN117291138A (en) * | 2023-11-22 | 2023-12-26 | 全芯智造技术有限公司 | Method, apparatus and medium for generating layout elements |
CN117291138B (en) * | 2023-11-22 | 2024-02-13 | 全芯智造技术有限公司 | Method, apparatus and medium for generating layout elements |
Also Published As
Publication number | Publication date |
---|---|
CN109741247B (en) | 2020-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109741247A (en) | A kind of portrait-cartoon generation method neural network based | |
CN109508669B (en) | A Face Expression Recognition Method Based on Generative Adversarial Networks | |
CN110503598A (en) | Generative Adversarial Networks Based on Conditional Loop Consistency for Font Style Transfer | |
CN109934767A (en) | A facial expression conversion method based on identity and expression feature conversion | |
CN109376582A (en) | An Interactive Face Cartoon Method Based on Generative Adversarial Networks | |
CN108805977A (en) | A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks | |
CN111145116A (en) | Sea surface rainy day image sample augmentation method based on generation of countermeasure network | |
CN108334816A (en) | The Pose-varied face recognition method of network is fought based on profile symmetry constraint production | |
CN108121950B (en) | Large-pose face alignment method and system based on 3D model | |
CN113724354B (en) | Gray image coloring method based on reference picture color style | |
CN101826217A (en) | Rapid generation method for facial animation | |
CN106971414A (en) | A kind of three-dimensional animation generation method based on deep-cycle neural network algorithm | |
CN105787974A (en) | Establishment method for establishing bionic human facial aging model | |
CN109325513B (en) | An image classification network training method based on massive single-class single image | |
CN112837210A (en) | A method for automatic generation of polymorphic face cartoons based on feature map partitioning | |
CN108717732A (en) | A kind of expression method for tracing based on MobileNets models | |
CN111028319A (en) | A three-dimensional non-photorealistic expression generation method based on facial motion units | |
CN111950430A (en) | Multi-scale makeup style difference measurement and migration method and system based on color texture | |
EP4062379A1 (en) | Methods and system for generating 3d virtual objects | |
CN101493953A (en) | Interactive three-dimensional cartoon human face generating method and device | |
Wu et al. | Adversarial UV-transformation texture estimation for 3D face aging | |
JP2011060289A (en) | Face image synthesis method and system | |
CN116721190A (en) | A voice-driven three-dimensional facial animation generation method | |
Zhu et al. | StyleGAN3: generative networks for improving the equivariance of translation and rotation | |
Zeng et al. | Controllable face aging |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||