CN112926554B - Construction of training data set of portrait cartoon stylized model and model generation - Google Patents

Construction of training data set of portrait cartoon stylized model and model generation

Info

Publication number
CN112926554B
CN112926554B (application CN202110458012.9A)
Authority
CN
China
Prior art keywords
cartoon
face
data set
sense organs
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110458012.9A
Other languages
Chinese (zh)
Other versions
CN112926554A (en
Inventor
杨帆 (Yang Fan)
郝强 (Hao Qiang)
潘鑫淼 (Pan Xinmiao)
胡建国 (Hu Jianguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaoshi Technology Jiangsu Co ltd
Original Assignee
Nanjing Zhenshi Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhenshi Intelligent Technology Co Ltd filed Critical Nanjing Zhenshi Intelligent Technology Co Ltd
Priority claimed from CN202110458012.9A
Publication of CN112926554A
Application granted
Publication of CN112926554B
Legal status: Active
Anticipated expiration: legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The invention relates to the construction of a training data set for a portrait cartoon stylization model and to the generation of the model. Starting from color face images and a small number of cartoon face images, the data are augmented by detecting facial key points and applying a liquify algorithm that adjusts the face shape and the sizes of the facial features; in addition, the facial features of all cartoon face images are cut out to form a material library, randomly recombined, and pasted onto blank cartoon faces. Given the high cost and long time required to draw cartoon data, the method of the invention expands the cartoon face image data: based on a small amount of data it multiplies the size of the training set, greatly enriches the identity diversity of the cartoon face images, and achieves a better training effect for the portrait cartoon stylization model.

Description

Construction of training data set of portrait cartoon stylized model and model generation
Technical Field
The invention relates to the technical field of image processing, in particular to portrait cartoon stylization, and specifically to the construction of a training data set for a portrait cartoon stylization model and the generation of the model.
Background
With the development of intelligent mobile terminals and social networks, cartoon-drawing technologies based on computer graphics processing are increasingly popular with users: a portrait image can be captured or selected and fed into image processing software (a mobile app or desktop application), which outputs a cartoon-stylized image. In the prior art, such software is usually implemented with a cartoonization model based on the human face contour, obtained by training in advance on a large number of portrait photos and cartoon-style images as training data.
The training data for portrait cartoon stylization comprise two parts: portrait photos and cartoon images. Portrait photos are easy to capture, but cartoon images must be drawn carefully by experienced artists so that diverse face contours, facial features and the like are covered to form varied training data; the economic and time costs are high, so the data scale is limited. Training on only a small amount of cartoon portrait data yields a coarse result and makes it difficult to obtain rich cartoon styles.
To augment a data set, existing methods change the positional information of an image by conventional operations such as random cropping, horizontal flipping, random rotation and translation. However, these operations cannot modify the identity information of a cartoon face: the facial key point information, such as the face shape and the distribution of the facial features, is essentially unchanged, so the diversity of the data is not genuinely improved and training in practice still corresponds to the same facial key point information.
Prior art documents:
patent document 1: CN105374055A
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide the construction of a training data set for a portrait cartoon stylization model and the generation of the model, to realize large-scale augmentation of cartoon portrait data and low-cost training of a cartoon stylization model, and to effectively improve the performance of the portrait cartoon stylization model.
The first aspect of the invention provides a method for constructing a training data set of a portrait cartoon stylization model, comprising the following steps:
acquiring M color face images to form a data set A;
acquiring N hand-drawn cartoon face images to form a data set B;
detecting facial key points of the cartoon face images in data set B with a facial key point detection model to obtain the key point coordinates of each cartoon face image, the key points comprising face-contour key points and facial-feature key points;
randomly selecting cartoon face images from data set B and, according to the corresponding face-contour and facial-feature key points, adjusting the face shape and the sizes of the facial features of each image with a liquify algorithm to obtain a data set C;
taking the region enclosed by the convex hull of the key points of the eyes, nose and mouth of a cartoon face image as its facial-feature region, cutting out the facial features of all cartoon images in data set B to form a material library, then randomly combining features and pasting them onto blank cartoon faces to obtain a data set D; and
merging data set C and data set D, then augmenting the merged data by a preset factor with a data augmentation method to obtain a training data set E for training the portrait cartoon stylization model.
The second aspect of the invention provides a method for generating a portrait cartoon stylization model, comprising the following step:
training a CycleGAN-based cartoon stylization model using the data set E obtained by the above method and the data set A consisting of M color face images as training data.
The third aspect of the invention provides a system for constructing a training data set of a portrait cartoon stylization model, comprising:
a module for acquiring M color face images to form a data set A;
a module for acquiring N hand-drawn cartoon face images to form a data set B;
a module for detecting facial key points of the cartoon face images in data set B with a facial key point detection model to obtain the key point coordinates of each cartoon face image, the key points comprising face-contour key points and facial-feature key points;
a module for randomly selecting cartoon face images from data set B and, according to the corresponding face-contour and facial-feature key points, adjusting the face shape and the sizes of the facial features of each image with a liquify algorithm to obtain a data set C;
a module for taking the region enclosed by the convex hull of the key points of the eyes, nose and mouth of a cartoon face image as its facial-feature region, cutting out the facial features of all cartoon images in data set B to form a material library, then randomly combining features and pasting them onto blank cartoon faces to obtain a data set D; and
a module for merging data set C and data set D, then augmenting the merged data by a preset factor with a data augmentation method to obtain a training data set E for training the portrait cartoon stylization model.
A fourth aspect of the present invention provides a computer system comprising:
one or more processors;
a memory storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising the flow of the aforementioned method for constructing a training data set of a portrait cartoon stylization model.
A fifth aspect of the invention provides a computer-readable medium storing instructions executable by one or more computers, the instructions, when so executed, causing the one or more computers to perform operations comprising the flow of the aforementioned method for constructing a training data set of a portrait cartoon stylization model.
Given the high drawing cost and long production time of cartoon image data, the method of the invention multiplies the data volume of a cartoon training set starting from color face images and a small number of cartoon face images. This is in contrast to conventional flipping, translation and cropping, which leave the facial key points unchanged and therefore effectively impoverish the data set.
It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent. In addition, all combinations of claimed subject matter are considered a part of the presently disclosed subject matter.
The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
fig. 1 is a flowchart illustrating a method of constructing a training data set of a character cartoon stylized model according to an exemplary embodiment of the present invention.
Fig. 2 is an example of facial-feature adjustment performed according to a construction method of an exemplary embodiment of the present invention.
Fig. 3 is an example of random facial-feature combination according to a construction method of an exemplary embodiment of the present invention.
Fig. 4 is an example of using a conventional horizontal flip in a construction method according to an exemplary embodiment of the present invention.
Fig. 5 is an example of conventional random cropping employed in a construction method according to an exemplary embodiment of the present invention.
Detailed Description
In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
In this disclosure, aspects of the present invention are described with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. Embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be appreciated that the various concepts and embodiments described above, as well as those described in greater detail below, may be implemented in any of numerous ways, as the disclosed concepts and embodiments are not limited to any one implementation. Additionally, some aspects of the present disclosure may be used alone, or in any suitable combination with other aspects of the present disclosure.
As shown in fig. 1, the method for constructing the training data set of a portrait cartoon stylization model provided by the exemplary embodiment of the present invention addresses the high drawing cost, long production time and limited identity variety of cartoon portrait data: a variety of data fusion and augmentation methods are adopted to increase the diversity of the cartoon portrait data, so that the data volume is multiplied from a small amount of cartoon portrait data and the training effect of the portrait cartoon stylization model is improved.
The implementation process of the construction method of the training data set of the portrait cartoon stylization model comprises the following steps:
acquiring M color face images to form a data set A;
acquiring N hand-drawn cartoon face images to form a data set B;
detecting facial key points of the cartoon face images in data set B with a facial key point detection model to obtain the key point coordinates of each cartoon face image, the key points comprising face-contour key points and facial-feature key points;
randomly selecting cartoon face images from data set B and, according to the corresponding face-contour and facial-feature key points, adjusting the face shape and the sizes of the facial features of each image with a liquify algorithm to obtain a data set C;
taking the region enclosed by the convex hull of the key points of the eyes, nose and mouth of a cartoon face image as its facial-feature region, cutting out the facial features of all cartoon images in data set B to form a material library, then randomly combining features and pasting them onto blank cartoon faces to obtain a data set D; and
merging data set C and data set D, then augmenting the merged data by a preset factor with a data augmentation method to obtain a training data set E for training the portrait cartoon stylization model.
Exemplary implementations of data augmentation of the present invention are described in more detail below with reference to the accompanying drawings.
Data acquisition
In the method, a large number of color face images and a small number of cartoon face images drawn by experienced artists are collected and stored in a server or on a storage medium as the original data set.
In the data processing and augmentation process, M color face images are acquired through a data interface to form a data set A, and N hand-drawn cartoon face images are acquired to form a data set B.
To ensure sufficient training data and diversity, data set A should cover as many different populations, expressions, poses, illumination conditions and background environments as possible. The numbers of images M and N in data sets A and B are both natural numbers greater than or equal to 100, with M ≥ 10 × N. In an embodiment of the invention, N ≈ 100.
Cartoon portrait key point detection
For the constructed data set B of cartoon face images, facial key points are detected with a pre-trained facial key point detection model (such as the Dlib toolkit); the key points comprise the face contour and the facial features, yielding the facial key point coordinates of every cartoon face image in data set B. In an embodiment using the Dlib toolkit as the detection model, the standard 68 facial key point labels are output.
It should be understood that the facial key point detection model is not limited to the Dlib toolkit; other detectors may be used, such as a model based on PFLD (Practical Facial Landmark Detector), or a facial key point detection model based on a CNN or an improved CNN network structure.
The cartoon face image is used as input, and the facial key point detection model outputs the facial key points, comprising the face-contour key points and the facial-feature key points.
In further embodiments, different numbers of facial key points, such as 81 or 106, may be obtained depending on the detection model used.
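The 68-point layout mentioned above follows the iBUG 300-W annotation scheme used by Dlib's shape predictor. As a minimal sketch (not code from the patent itself), the detected points can be partitioned into the face-contour and facial-feature groups the method relies on:

```python
# Partition the standard 68-point facial landmark layout (iBUG 300-W, as
# produced by Dlib's shape predictor) into the contour and facial-feature
# groups used in the description. Index ranges follow the usual convention.
LANDMARK_GROUPS = {
    "contour":    list(range(0, 17)),   # jawline / face contour
    "right_brow": list(range(17, 22)),
    "left_brow":  list(range(22, 27)),
    "nose":       list(range(27, 36)),
    "right_eye":  list(range(36, 42)),
    "left_eye":   list(range(42, 48)),
    "mouth":      list(range(48, 68)),
}

def split_keypoints(points):
    """Split a list of 68 (x, y) tuples into named landmark groups."""
    assert len(points) == 68, "expected the 68-point layout"
    return {name: [points[i] for i in idx]
            for name, idx in LANDMARK_GROUPS.items()}
```

With an 81- or 106-point model (as mentioned above) only the index table would change; the grouping logic stays the same.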
Random adjustment of five sense organs shape and face shape
Based on the obtained facial key points, the face shape and the facial-feature sizes of the cartoon face images are adjusted with a liquify algorithm. This first augmentation multiplies the cartoon face data several times, yielding data set C.
In a specific example, with reference to fig. 2, the flow of the first augmentation process includes:
for any cartoon face image selected from data set B, randomly modifying the widths of the nose and cheeks with a local translation warping algorithm according to the nose and cheek key points, thereby adjusting the face shape of the image; and
randomly modifying the sizes of the eyes and mouth with a local scaling warping algorithm according to the corresponding eye and mouth key points, thereby adjusting the facial-feature sizes of the image.
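The two warping primitives named above are commonly implemented with the inverse pixel mappings from Gustafsson's "Interactive Image Warping". The sketch below shows plausible versions of both; the function names, the `amount` parameter and the exact scaling formula are assumptions for illustration, not the patent's own implementation:

```python
import numpy as np

def local_translation_src(pt, center, target, rmax):
    """Inverse map of local translation warping: for an output pixel `pt`
    within radius `rmax` of `center`, return the source position when the
    region is dragged from `center` towards `target`. Pixels on or outside
    the radius are untouched (identity), so the warp blends continuously."""
    pt, center, target = (np.asarray(v, float) for v in (pt, center, target))
    d = pt - center
    r2 = float(d @ d)
    rmax2 = rmax * rmax
    if r2 >= rmax2:
        return pt
    mv = target - center
    w = ((rmax2 - r2) / (rmax2 - r2 + float(mv @ mv))) ** 2
    return pt - w * mv

def local_scaling_src(pt, center, rmax, amount):
    """Inverse map of local scaling warping: `amount` in (0, 1) enlarges the
    region around `center` (source pixels are sampled closer to the centre,
    e.g. for enlarging an eye); identity on and outside the radius."""
    pt, center = (np.asarray(v, float) for v in (pt, center))
    d = pt - center
    r = float(np.hypot(*d))
    if r >= rmax or r == 0.0:
        return pt
    k = 1.0 - amount * (1.0 - (r / rmax) ** 2)  # 1 at the rim, 1-amount at centre
    return center + k * d
```

In a full implementation each output pixel would be filled by bilinearly sampling the input image at the returned source coordinate.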
Random combination of five sense organs
Besides the liquify processing, the augmentation of the cartoon face data of the invention includes the random combination of facial features. As shown in fig. 3, this second augmentation enriches and diversifies the identities of the cartoon characters and multiplies the cartoon face data several times, yielding data set D.
The facial features of all cartoon images in data set B are cut out to form a material library as follows:
cutting out the facial-feature regions of every cartoon face image in data set B to form the feature material libraries, namely an eye library, a nose library and a mouth library; and
forming a blank-face material library from the blank cartoon faces with the facial features removed, together with the facial-feature key points of the corresponding cartoon face images.
The operation of randomly combining the facial features and pasting them onto a blank cartoon face comprises the following steps:
randomly selecting eyes, a nose and a mouth from the feature material libraries; and
randomly selecting a blank cartoon face and its facial-feature key points from the blank-face material library, and pasting the randomly selected features onto the blank face aligned with the centres of its facial-feature key points, thereby augmenting the cartoon face data again and obtaining data set D.
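A minimal sketch of the centre-aligned pasting step: the cut-out feature patch is translated so that the centroid of its key points coincides with the centroid of the corresponding key points on the blank face. All names are hypothetical, and refinements such as alpha blending at the patch border are omitted:

```python
import numpy as np

def paste_by_center(canvas, patch, patch_mask, patch_kpts, target_kpts):
    """Paste `patch` onto `canvas` so that the centroid of the patch's key
    points coincides with the centroid of the target key points on the
    blank face. `patch_mask` is a boolean array marking the feature's
    convex-hull region; only masked pixels are copied. Illustrative only."""
    src_c = np.mean(np.asarray(patch_kpts, float), axis=0)   # (x, y) centroid
    dst_c = np.mean(np.asarray(target_kpts, float), axis=0)
    dx, dy = np.round(dst_c - src_c).astype(int)             # integer shift
    ys, xs = np.nonzero(patch_mask)
    ty, tx = ys + dy, xs + dx
    ok = (ty >= 0) & (ty < canvas.shape[0]) & (tx >= 0) & (tx < canvas.shape[1])
    out = canvas.copy()
    out[ty[ok], tx[ok]] = patch[ys[ok], xs[ok]]              # copy masked pixels
    return out
```

In practice the mask would be the convex hull of the feature's key points, matching the facial-feature region defined above.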
Traditional data augmentation
In this step, data sets C and D are merged, and the merged data are further augmented several times with conventional data augmentation methods to obtain the training data set E. The conventional augmentation methods include at least one of random image cropping, random rotation and horizontal flipping; fig. 4 and fig. 5 show examples of horizontal flipping and random cropping respectively.
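The three conventional operations can be sketched in a few lines of NumPy. The rotation uses nearest-neighbour sampling for brevity, and the parameter choices (seed, crop size, angle range) are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def horizontal_flip(img):
    """Mirror the image left-to-right."""
    return img[:, ::-1].copy()

def random_crop(img, out_h, out_w):
    """Cut a random out_h x out_w window from the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w].copy()

def random_rotation(img, max_deg=15.0):
    """Nearest-neighbour rotation by a random angle about the image centre."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # inverse map: rotate output coordinates back by theta around the centre
    xs = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    ys = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    ys = np.clip(np.round(ys).astype(int), 0, h - 1)
    return img[ys, xs]
```

As noted in the background section, these operations change only positional information, which is why the invention applies them after the identity-changing augmentations rather than instead of them.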
Model training
In the disclosed example of the invention, the data set E obtained by the construction method and the data set A consisting of M color face images are used as training data to train a CycleGAN-based cartoon stylization model.
The CycleGAN network used in this embodiment consists of two generators with the same structure and two discriminators with the same structure. Each generator is composed of 2 downsampling layers, 6 residual modules and 2 upsampling layers; each discriminator is composed of 4 downsampling layers. Generator 1 takes a real-person photo as input and outputs a cartoon-style portrait; generator 2 takes a cartoon portrait and outputs a real-person image; discriminator 1 takes a cartoon-style portrait and outputs its degree of realism as a cartoon; discriminator 2 takes a real-person image and outputs its degree of realism as a photo.
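The resolution bookkeeping implied by this architecture can be traced with a short sketch: for a 256 × 256 input, two stride-2 downsampling layers reach 64 × 64, the residual blocks preserve that size, and two stride-2 upsampling layers restore 256 × 256; four stride-2 discriminator layers reach 16 × 16. The layer counts and the 256-pixel input come from the embodiment; the stride-2 assumption is mine:

```python
def generator_resolutions(input_size=256):
    """Trace the spatial resolution through the described generator:
    2 stride-2 downsampling layers, 6 size-preserving residual blocks,
    2 stride-2 upsampling layers. Stride 2 is an assumption."""
    sizes = [input_size]
    for _ in range(2):       # downsampling halves the resolution
        sizes.append(sizes[-1] // 2)
    for _ in range(6):       # residual blocks keep the resolution
        sizes.append(sizes[-1])
    for _ in range(2):       # upsampling doubles the resolution
        sizes.append(sizes[-1] * 2)
    return sizes

def discriminator_output_size(input_size=256, layers=4):
    """4 stride-2 downsampling layers: 256 -> 128 -> 64 -> 32 -> 16."""
    for _ in range(layers):
        input_size //= 2
    return input_size
```

The symmetric down/up structure is what lets the generator output an image of the same size as its input photo.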
In the embodiment of the invention, 2200 real-person photos are used (2000 for training and 200 for testing), together with 150 hand-drawn cartoon portraits (100 for training and 50 for testing).
The construction method of the invention is used to expand the cartoon face image data 40-fold; for comparison, the existing methods (random cropping, random rotation and random horizontal flipping) are also used to expand the data 40-fold, and CycleGAN cartoon stylization models are trained on each with the same network structure and loss functions.
During training, the input size of the model is set to 256 × 256 pixels, the batch size to 32 and the learning rate to 0.0001; the loss function combines a generative adversarial loss and a cycle-consistency loss, and training runs for 100,000 steps.
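The two loss terms can be sketched as follows. The least-squares form of the adversarial loss is one common choice for CycleGAN, and the cycle-consistency weight `lam = 10` is the usual default; neither value is confirmed by the text:

```python
import numpy as np

def adversarial_loss_d(real_scores, fake_scores):
    """Least-squares GAN discriminator loss: real samples are pushed
    towards score 1, generated samples towards score 0. One common
    choice for CycleGAN; the embodiment may use a different variant."""
    return np.mean((real_scores - 1.0) ** 2) + np.mean(fake_scores ** 2)

def cycle_consistency_loss(x, x_reconstructed, lam=10.0):
    """L1 cycle-consistency term: lam * mean |F(G(x)) - x|, where G maps
    photo -> cartoon and F maps cartoon -> photo (and symmetrically)."""
    return lam * np.mean(np.abs(x_reconstructed - x))
```

The cycle term is what ties the two generators together: a photo translated to a cartoon and back should reproduce the original photo.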
During testing, the cartoonization effect of generator 1 trained with each of the two data augmentation methods is compared. The FID (Fréchet Inception Distance) score is used to measure the training effect; the smaller the FID score, the better the stylization effect.
The comparative results are shown in Table 1.
TABLE 1 comparison of test results of the prior art method and the method of the present invention
Method                     FID
Existing methods           60.5
Method of the invention    54.6
Comparison of the test results shows that the model trained with the method of the invention effectively improves the portrait cartoon stylization effect: compared with the existing methods, the FID score is reduced by 9.8%.
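For reference, the FID between two Gaussian feature distributions is ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)). The sketch below evaluates only the diagonal-covariance special case, which avoids a matrix square root; the real metric is computed on Inception-v3 activations with full covariance matrices:

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances:
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2 * sqrt(var1 * var2)).
    Diagonal special case for illustration only; the full FID uses
    complete covariance matrices of Inception-v3 activations."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2)))
```

As a sanity check on the reported numbers, (60.5 - 54.6) / 60.5 is about 9.8%, matching the stated relative reduction.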
System for constructing training data set of portrait cartoon stylized model
According to the disclosure of the present invention, a system for constructing a training data set of a portrait cartoon stylized model is further provided, comprising:
a module for acquiring M color face images to form a data set A;
a module for acquiring N hand-drawn cartoon face images to form a data set B;
a module for detecting facial key points of the cartoon face images in data set B with a facial key point detection model to obtain the key point coordinates of each cartoon face image, the key points comprising face-contour key points and facial-feature key points;
a module for randomly selecting cartoon face images from data set B and, according to the corresponding face-contour and facial-feature key points, adjusting the face shape and the sizes of the facial features of each image with a liquify algorithm to obtain a data set C;
a module for taking the region enclosed by the convex hull of the key points of the eyes, nose and mouth of a cartoon face image as its facial-feature region, cutting out the facial features of all cartoon images in data set B to form a material library, then randomly combining features and pasting them onto blank cartoon faces to obtain a data set D; and
a module for merging data set C and data set D, then augmenting the merged data by a preset factor with a data augmentation method to obtain a training data set E for training the portrait cartoon stylization model.
It should be understood that the functions of the foregoing modules and the specific implementations thereof can be implemented based on the operations of the construction method of the training data set of the portrait cartoon stylized model of the foregoing embodiments.
Computer system
According to the disclosure of the present invention, there is also provided a computer system, comprising:
one or more processors;
a memory storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising the flow of the method for constructing a training data set of a portrait cartoon stylization model of the foregoing embodiments.
Computer readable medium
According to the disclosure of the present invention, a computer-readable medium is also provided, storing instructions executable by one or more computers, the instructions, when so executed, causing the one or more computers to perform operations comprising the flow of the method for constructing a training data set of a portrait cartoon stylization model of the foregoing embodiments.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be defined by the appended claims.

Claims (10)

1. A method for constructing a training data set of a portrait cartoon stylization model, characterized by comprising the following steps:
acquiring M color face images to form a data set A;
acquiring N hand-drawn cartoon face images to form a data set B;
detecting facial key points of the cartoon face images in data set B with a facial key point detection model to obtain the key point coordinates of each cartoon face image, the key points comprising face-contour key points and facial-feature key points;
randomly selecting cartoon face images from data set B and, according to the corresponding face-contour and facial-feature key points, adjusting the face shape and the sizes of the facial features of each image with a liquify algorithm to obtain a data set C;
taking the region enclosed by the convex hull of the key points of the eyes, nose and mouth of a cartoon face image as its facial-feature region, cutting out the facial features of all cartoon images in data set B to form a material library, then randomly combining features and pasting them onto blank cartoon faces to obtain a data set D; and
merging data set C and data set D, then augmenting the merged data by a preset factor with a data augmentation method to obtain a training data set E for training the portrait cartoon stylization model.
2. The method for constructing a training data set of a portrait cartoon stylization model according to claim 1, wherein adjusting the face shape and the facial-feature sizes of the cartoon face images with the liquify algorithm to obtain data set C comprises:
for any cartoon face image selected from data set B, randomly modifying the widths of the nose and cheeks with a local translation warping algorithm according to the nose and cheek key points, thereby adjusting the face shape of the image; and
randomly modifying the sizes of the eyes and mouth with a local scaling warping algorithm according to the corresponding eye and mouth key points, thereby adjusting the facial-feature sizes of the image.
3. The method for constructing a training data set of a portrait cartoon stylization model according to claim 1, wherein cutting out the facial features of all cartoon images in data set B to form a material library comprises:
cutting out the facial-feature regions of every cartoon face image in data set B to form the feature material libraries, namely an eye library, a nose library and a mouth library; and
forming a blank-face material library from the blank cartoon faces with the facial features removed, together with the facial-feature key points of the corresponding cartoon face images.
4. The method for constructing a training data set of a portrait cartoon stylization model according to claim 3, wherein randomly combining the facial features and pasting them onto a blank cartoon face to obtain data set D comprises:
randomly selecting eyes, a nose and a mouth from the feature material libraries; and
randomly selecting a blank cartoon face and its facial-feature key points from the blank-face material library, and pasting the randomly selected features onto the blank face aligned with the centres of its facial-feature key points, thereby obtaining data set D.
5. The method according to claim 1, wherein the data augmentation method comprises at least one of random image cropping, random rotation and horizontal flipping.
6. The method according to claim 1, wherein the numbers of images M and N in data set A and data set B are both natural numbers greater than or equal to 100, and M ≥ 10N.
7. A generation method of a portrait cartoon stylized model is characterized by comprising the following steps:
training a cartoon stylized model based on a CycleGAN network by using a data set E obtained by the construction method as claimed in any one of claims 1 to 6 and a data set A consisting of M human face color images as training data.
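Claim 7 only names CycleGAN; the objective it implies can be sketched framework-free as an LSGAN-style adversarial term plus the cycle-consistency term, with `G`, `F`, `Dx`, `Dy` as arbitrary callables standing in for the two generators and two discriminators (data set A plays the photo domain X, data set E the cartoon domain Y). The scalar form below is illustrative only.

```python
def cyclegan_loss(G, F, Dx, Dy, real_x, real_y, lam=10.0):
    """One-sample generator objective for unpaired photo->cartoon training:
    G maps X->Y, F maps Y->X; lam weights the cycle-consistency term."""
    fake_y, fake_x = G(real_x), F(real_y)
    # LSGAN-style adversarial terms: push discriminator outputs toward 1.
    adv = (Dy(fake_y) - 1.0) ** 2 + (Dx(fake_x) - 1.0) ** 2
    # Cycle consistency: X -> Y -> X and Y -> X -> Y should reconstruct.
    cyc = abs(F(fake_y) - real_x) + abs(G(fake_x) - real_y)
    return adv + lam * cyc
```

In an actual implementation the four callables would be convolutional networks and the discriminators would be trained with their own alternating losses; this sketch only captures the structure of the generator objective.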
8. A construction system of a training data set of a portrait cartoon stylized model is characterized by comprising:
a module for obtaining M face color images to form a data set A;
a module for obtaining N drawn cartoon face images to form a data set B;
a module for detecting face key points of cartoon face images in the data set B by using a face key point detection model to obtain face key point coordinates of each cartoon face image, wherein the face key points comprise key points of face contours and key points of five sense organs;
a module for randomly selecting cartoon face images from the data set B, and adjusting the face shapes and the sizes of the five sense organs of the cartoon face images by using a liquefaction algorithm according to the key points of the corresponding face contours and the key points of the five sense organs to obtain a data set C;
a module for taking the area enclosed by the convex hull of the key points corresponding to the eyes, the nose and the mouth of a cartoon face image as a five sense organ region, cropping out the five sense organs from all cartoon image data in the data set B to form a material library, and then randomly combining the five sense organs and pasting them on a blank cartoon face to obtain a data set D; and
a module for merging the data set C and the data set D, and then amplifying the merged data by a preset factor by using a data augmentation method to obtain a training data set E for training the portrait cartoon stylized model.
9. A computer system, comprising:
one or more processors;
a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising the flow of the method of constructing a training data set of a portrait cartoon stylized model as claimed in any one of claims 1 to 6.
10. A computer-readable medium comprising instructions executable by one or more computers, the instructions causing the one or more computers to perform operations comprising the flow of the method of constructing a training data set of a portrait cartoon stylized model as claimed in any one of claims 1 to 6.
CN202110458012.9A 2021-04-27 2021-04-27 Construction of training data set of portrait cartoon stylized model and model generation Active CN112926554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110458012.9A CN112926554B (en) 2021-04-27 2021-04-27 Construction of training data set of portrait cartoon stylized model and model generation


Publications (2)

Publication Number Publication Date
CN112926554A CN112926554A (en) 2021-06-08
CN112926554B CN112926554B (en) 2022-08-16

Family

ID=76174770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110458012.9A Active CN112926554B (en) 2021-04-27 2021-04-27 Construction of training data set of portrait cartoon stylized model and model generation

Country Status (1)

Country Link
CN (1) CN112926554B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596091A (en) * 2018-04-24 2018-09-28 杭州数为科技有限公司 Figure image cartooning restoring method, system and medium
CN109376582A (en) * 2018-09-04 2019-02-22 电子科技大学 A kind of interactive human face cartoon method based on generation confrontation network




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: No.568 longmian Avenue, gaoxinyuan, Jiangning District, Nanjing City, Jiangsu Province, 211000

Patentee after: Xiaoshi Technology (Jiangsu) Co.,Ltd.

Address before: No.568 longmian Avenue, gaoxinyuan, Jiangning District, Nanjing City, Jiangsu Province, 211000

Patentee before: NANJING ZHENSHI INTELLIGENT TECHNOLOGY Co.,Ltd.