CN113570689A - Portrait cartoonization method, apparatus, medium and computing device

Info

Publication number: CN113570689A (application CN202110859134.9A)
Authority: CN (China)
Prior art keywords: cartoon, image, face image, face, real
Legal status: Granted; currently Active
Application number: CN202110859134.9A
Other languages: Chinese (zh)
Other versions: CN113570689B
Inventors: 金强, 朱一闻, 曹偲, 刘华平
Current Assignee: Hangzhou Netease Cloud Music Technology Co., Ltd.
Original Assignee: Hangzhou Netease Cloud Music Technology Co., Ltd.
Events: application filed by Hangzhou Netease Cloud Music Technology Co., Ltd.; priority to CN202110859134.9A; publication of CN113570689A; application granted; publication of CN113570689B

Classifications

    • G06T 13/20: 3D [Three Dimensional] animation (G06T 13/00 Animation)
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G06F 18/00 Pattern recognition; G06F 18/21 Design or setup of recognition systems or techniques)
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction (G06T 5/00 Image enhancement or restoration)
    • G06T 7/11: Region-based segmentation (G06T 7/00 Image analysis; G06T 7/10 Segmentation; Edge detection)
    • G06T 2207/20221: Image fusion; Image merging (G06T 2207/00 Indexing scheme for image analysis or image enhancement)


Abstract

Embodiments of the present disclosure provide a portrait cartoonization method, apparatus, medium and computing device. The method is applied to a mobile terminal carrying a lightweight first generative adversarial network model and comprises the following steps: when a video frame image is detected to include face features, performing image segmentation processing on the video frame image to obtain a real face image containing the face features and a background image; inputting the real face image into the first generative adversarial network model to perform portrait cartoonization on the real face image, and obtaining the cartoon face image output by the first generative adversarial network model; the model parameters of the first generative adversarial network model are migrated, by means of model distillation, from a full-scale second generative adversarial network model deployed on a server corresponding to the mobile terminal; and performing image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the portrait-cartoonized video frame image.

Description

Portrait cartoonization method, apparatus, medium and computing device
Technical Field
Embodiments of the present disclosure relate to the field of computer applications, and in particular to a portrait cartoonization method, apparatus, medium and computing device.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the development of mobile terminal technology, the performance of mobile terminals has gradually improved, and software running on them, such as image-shooting apps and video-recording apps, has emerged. These applications use advanced image processing and machine learning techniques to provide users with a variety of shooting effects, for example portrait beautification and makeup, or portrait cartoonization and 3D animation.
Specifically, portrait cartoonization means using image processing techniques to convert the real face portions of an image or video into face images in a cartoon style.
Disclosure of Invention
In this context, embodiments of the present disclosure are intended to provide a portrait cartoonization method, apparatus, medium and computing device.
In a first aspect of the embodiments of the present disclosure, a portrait cartoonization method is provided, applied to a mobile terminal on which a lightweight first generative adversarial network model for generating cartoon face images corresponding to real face images is deployed; the method comprises the following steps:
detecting whether a captured video frame image includes face features;
if the video frame image includes face features, performing image segmentation processing on the video frame image to obtain a real face image containing the face features and a background image corresponding to the real face image;
inputting the real face image into the first generative adversarial network model to perform portrait cartoonization on the real face image, and obtaining the cartoon face image, corresponding to the real face image, output by the first generative adversarial network model; the model parameters of the first generative adversarial network model are migrated, by means of model distillation, from a full-scale second generative adversarial network model that is deployed on a server corresponding to the mobile terminal and is used for generating cartoon face images corresponding to real face images;
and performing image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the portrait-cartoonized video frame image, thereby completing portrait cartoonization for the video frame image.
Optionally, training of the first generative adversarial network model is completed on the server;
the training process of the first generative adversarial network model comprises:
performing adversarial training on the second generative adversarial network model, which serves as a teacher model, based on a preset training sample set; the training sample set comprises a real face sample set composed of a plurality of real face image samples and a first cartoon face sample set composed of a plurality of cartoon face image samples;
obtaining the cartoon face image samples output by the trained second generative adversarial network model that correspond to the real face image samples in the real face sample set, and forming a second cartoon face sample set from these cartoon face image samples;
and taking the training sample set and the second cartoon face sample set as the distillation sample set, performing model distillation training on the first generative adversarial network model, which serves as a student model, so as to migrate model parameters from the second generative adversarial network model as the model parameters of the first generative adversarial network model.
Optionally, the generative adversarial network models comprise cycle-consistent generative adversarial network (CycleGAN) models.
Optionally, the cycle-consistent generative adversarial network model is a U-GAT-IT model.
Optionally, the second generative adversarial network model comprises the following sub-models:
a cartoon portrait generation sub-model, used for generating a cartoon face image corresponding to a real face image;
a global cartoon portrait discrimination sub-model, used for discriminating, based on the whole of a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model;
a local cartoon portrait discrimination sub-model, used for discriminating, based on a local image cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model;
a real portrait generation sub-model, used for generating a real face image corresponding to a cartoon face image;
a global real portrait discrimination sub-model, used for discriminating, based on the whole of a real face image, whether the real face image was generated by the real portrait generation sub-model;
and a local real portrait discrimination sub-model, used for discriminating, based on a local image cropped from a real face image, whether the real face image was generated by the real portrait generation sub-model.
Optionally, the first generative adversarial network model comprises the following sub-models:
a cartoon portrait generation sub-model, used for generating a cartoon face image corresponding to a real face image;
a global cartoon portrait discrimination sub-model, used for discriminating, based on the whole of a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model;
and a local cartoon portrait discrimination sub-model, used for discriminating, based on a local image cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model.
Optionally, the initial values of the model parameters of the global cartoon portrait discrimination sub-model in the first generative adversarial network model are the trained model parameters of the global cartoon portrait discrimination sub-model in the second generative adversarial network model;
and the initial values of the model parameters of the local cartoon portrait discrimination sub-model in the first generative adversarial network model are the trained model parameters of the local cartoon portrait discrimination sub-model in the second generative adversarial network model.
Optionally, the local cartoon portrait discrimination sub-model in the first generative adversarial network model includes:
an eye feature discrimination sub-model, used for discriminating, based on a local image containing eye features cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model; and/or,
a mouth-nose feature discrimination sub-model, used for discriminating, based on a local image containing mouth-nose features cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model.
Optionally, the mobile terminal is further equipped with a portrait segmentation model for performing portrait segmentation processing on images; the real face image samples in the real face sample set are annotated with face feature regions containing face features, and the portrait segmentation model is a machine learning model obtained by supervised training that takes the face features extracted from the real face image samples by the cartoon portrait generation sub-model of the first generative adversarial network model as training samples and the annotated face feature regions in those samples as constraints;
performing image segmentation processing on the video frame image comprises:
inputting the face features extracted from the video frame image by the cartoon portrait generation sub-model of the first generative adversarial network model into the portrait segmentation model, so that the portrait segmentation model segments from the video frame image a face feature region containing the face features to serve as the real face image, and obtaining the background image corresponding to the real face image in the video frame image. (A minimal sketch of this arrangement follows.)
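For illustration only, the following is a minimal PyTorch sketch of a segmentation head that consumes the generator encoder's feature maps, as described above. The module name, channel counts and upsampling factor are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegHead(nn.Module):
    """Hypothetical portrait-segmentation head: maps the cartoon generator's
    encoder features to a per-pixel face/background mask."""
    def __init__(self, feat_channels: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),  # single-channel mask logits
        )

    def forward(self, encoder_feats: torch.Tensor) -> torch.Tensor:
        logits = self.head(encoder_feats)
        # Upsample back to input resolution (factor assumed); sigmoid gives a soft mask.
        logits = F.interpolate(logits, scale_factor=4, mode="bilinear", align_corners=False)
        return torch.sigmoid(logits)

# Supervised training per the claim: the annotated face-feature region acts as
# the constraint, e.g. loss = F.binary_cross_entropy(mask, labeled_mask).
```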
Optionally, training of the portrait segmentation model is performed in synchronization with the model distillation training of the first generative adversarial network model.
Optionally, before the real face image is input into the first generative adversarial network model for portrait cartoonization, the method further includes:
determining whether the time taken to perform portrait cartoonization, based on the first generative adversarial network model, on the frame preceding the video frame image has reached a preset threshold;
and if the elapsed time has not reached the threshold, proceeding to input the real face image into the first generative adversarial network model to perform portrait cartoonization on it.
Optionally, the method further comprises:
if the elapsed time has reached the threshold, determining the portrait-cartoonized previous frame image as the portrait-cartoonized video frame image.
Optionally, the preset threshold comprises the time interval between the capture time of the previous frame image and the capture time of the video frame image. (A minimal timing sketch follows.)
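As an illustration of this frame-skip logic only: a minimal sketch assuming a 50 ms capture interval and a placeholder cartoonize_and_fuse callable standing in for segmentation, GAN inference and fusion:

```python
import time

FRAME_INTERVAL = 0.05   # preset threshold: capture interval between frames (assumed 50 ms)

last_duration = 0.0     # time spent cartoonizing the previous frame
last_output = None      # previous portrait-cartoonized frame

def process_frame(frame, cartoonize_and_fuse):
    """Return a cartoonized frame, reusing the previous result when the
    previous frame's processing time reached the frame interval."""
    global last_duration, last_output
    if last_output is not None and last_duration >= FRAME_INTERVAL:
        last_duration = 0.0     # this frame is skipped, so it costs ~no time
        return last_output      # reuse the previous cartoonized frame
    start = time.monotonic()
    last_output = cartoonize_and_fuse(frame)
    last_duration = time.monotonic() - start
    return last_output
```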
In a second aspect of the embodiments of the present disclosure, a portrait cartoonization apparatus is provided, applied to a mobile terminal on which a lightweight first generative adversarial network model for generating cartoon face images corresponding to real face images is deployed; the apparatus comprises:
a detection module, used for detecting whether a captured video frame image includes face features;
a segmentation module, used for performing image segmentation processing on the video frame image when it includes face features, to obtain a real face image containing the face features and a background image corresponding to the real face image;
a processing module, used for inputting the real face image into the first generative adversarial network model to perform portrait cartoonization on it, and obtaining the cartoon face image, corresponding to the real face image, output by the first generative adversarial network model; the model parameters of the first generative adversarial network model are migrated, by means of model distillation, from a full-scale second generative adversarial network model that is deployed on a server corresponding to the mobile terminal and is used for generating cartoon face images corresponding to real face images;
and a fusion module, used for performing image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the portrait-cartoonized video frame image, thereby completing portrait cartoonization for the video frame image.
Optionally, training of the first generative adversarial network model is completed on the server;
the training process of the first generative adversarial network model comprises:
performing adversarial training on the second generative adversarial network model, which serves as a teacher model, based on a preset training sample set; the training sample set comprises a real face sample set composed of a plurality of real face image samples and a first cartoon face sample set composed of a plurality of cartoon face image samples;
obtaining the cartoon face image samples output by the trained second generative adversarial network model that correspond to the real face image samples in the real face sample set, and forming a second cartoon face sample set from these cartoon face image samples;
and taking the training sample set and the second cartoon face sample set as the distillation sample set, performing model distillation training on the first generative adversarial network model, which serves as a student model, so as to migrate model parameters from the second generative adversarial network model as the model parameters of the first generative adversarial network model.
Optionally, the generative adversarial network models comprise cycle-consistent generative adversarial network (CycleGAN) models.
Optionally, the cycle-consistent generative adversarial network model is a U-GAT-IT model.
Optionally, the second generative adversarial network model comprises the following sub-models:
a cartoon portrait generation sub-model, used for generating a cartoon face image corresponding to a real face image;
a global cartoon portrait discrimination sub-model, used for discriminating, based on the whole of a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model;
a local cartoon portrait discrimination sub-model, used for discriminating, based on a local image cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model;
a real portrait generation sub-model, used for generating a real face image corresponding to a cartoon face image;
a global real portrait discrimination sub-model, used for discriminating, based on the whole of a real face image, whether the real face image was generated by the real portrait generation sub-model;
and a local real portrait discrimination sub-model, used for discriminating, based on a local image cropped from a real face image, whether the real face image was generated by the real portrait generation sub-model.
Optionally, the first generative adversarial network model comprises the following sub-models:
a cartoon portrait generation sub-model, used for generating a cartoon face image corresponding to a real face image;
a global cartoon portrait discrimination sub-model, used for discriminating, based on the whole of a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model;
and a local cartoon portrait discrimination sub-model, used for discriminating, based on a local image cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model.
Optionally, the initial values of the model parameters of the global cartoon portrait discrimination sub-model in the first generative adversarial network model are the trained model parameters of the global cartoon portrait discrimination sub-model in the second generative adversarial network model;
and the initial values of the model parameters of the local cartoon portrait discrimination sub-model in the first generative adversarial network model are the trained model parameters of the local cartoon portrait discrimination sub-model in the second generative adversarial network model.
Optionally, the local cartoon portrait discrimination sub-model in the first generative adversarial network model includes:
an eye feature discrimination sub-model, used for discriminating, based on a local image containing eye features cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model; and/or,
a mouth-nose feature discrimination sub-model, used for discriminating, based on a local image containing mouth-nose features cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model.
Optionally, the mobile terminal is further equipped with a portrait segmentation model for performing portrait segmentation processing on images; the real face image samples in the real face sample set are annotated with face feature regions containing face features, and the portrait segmentation model is a machine learning model obtained by supervised training that takes the face features extracted from the real face image samples by the cartoon portrait generation sub-model of the first generative adversarial network model as training samples and the annotated face feature regions in those samples as constraints;
the segmentation module is specifically configured to:
input the face features extracted from the video frame image by the cartoon portrait generation sub-model of the first generative adversarial network model into the portrait segmentation model, so that the portrait segmentation model segments from the video frame image a face feature region containing the face features to serve as the real face image, and obtain the background image corresponding to the real face image in the video frame image.
Optionally, training of the portrait segmentation model is performed in synchronization with the model distillation training of the first generative adversarial network model.
Optionally, the apparatus further comprises:
a first determination module, used for determining, before the real face image is input into the first generative adversarial network model for portrait cartoonization, whether the time taken to perform portrait cartoonization, based on the first generative adversarial network model, on the frame preceding the video frame image has reached a preset threshold;
the processing module is specifically configured to:
proceed to input the real face image into the first generative adversarial network model to perform portrait cartoonization on it if the elapsed time has not reached the threshold.
Optionally, the apparatus further comprises:
a second determination module, used for determining the portrait-cartoonized previous frame image as the portrait-cartoonized video frame image when the elapsed time has reached the threshold.
Optionally, the preset threshold comprises the time interval between the capture time of the previous frame image and the capture time of the video frame image.
In a third aspect of the embodiments of the present disclosure, a medium is provided having stored thereon a computer program which, when executed by a processor, implements any of the above portrait cartoonization methods.
In a fourth aspect of embodiments of the present disclosure, there is provided a computing device comprising:
a processor;
a memory for storing a program executable by the processor;
wherein the processor executes the executable program to implement any of the above portrait cartoonization methods.
According to the portrait cartoonization method of the embodiments of the present disclosure, on a mobile terminal, when a captured video frame image is determined to include face features, a real face image containing the face features and its corresponding background image are first segmented from the video frame image; the real face image is then input into a lightweight first generative adversarial network model deployed on the mobile terminal for portrait cartoonization; and finally the cartoon face image, corresponding to the real face image, generated by the first generative adversarial network model is fused with the background image into the portrait-cartoonized video frame image.
The model parameters of the first generative adversarial network model are migrated, by means of model distillation, from a full-scale second generative adversarial network model deployed on a server corresponding to the mobile terminal.
In this way, the generative adversarial network model ultimately used to generate cartoon face images corresponding to real face images is a lightweight model with fewer parameters and shorter computation time. Real-time portrait cartoonization can therefore be guaranteed, making the method suitable for cartoonizing real-time video and for providing portrait cartoonization services to software on mobile terminals.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates an application scenario of portrait cartoonization according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a portrait cartoonization method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a video frame image according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a cartoon face image according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a training method for the first generative adversarial network according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a medium according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a portrait cartoonization apparatus according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a computing device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the disclosure, a portrait cartoonizing method, a portrait cartoonizing device, a portrait cartoonizing medium and a computing device are provided.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of the Invention
The inventors have found that, with the development of generative adversarial network (GAN) technology, portrait cartoonization methods that convert real face images into cartoon face images using generative adversarial network models have gradually appeared.
However, in the related art, commonly used generative adversarial network models have large numbers of parameters and long computation times. As a result, on the one hand, the real-time performance of portrait cartoonization using such a model is poor, so it is difficult to cartoonize real-time video in scenarios such as video calls and live streaming; on the other hand, such a model demands substantial storage, computing and other device resources, making it difficult to run on a mobile terminal, so no portrait cartoonization service can be provided to software running on mobile terminals.
Therefore, an improved portrait cartoonization method is needed to improve the real-time performance of portrait cartoonization, enable portrait cartoonization of real-time video, and provide portrait cartoonization services to software on mobile terminals.
To solve the above problems, according to the portrait cartoonization method of the embodiments of the present disclosure, on a mobile terminal, when a captured video frame image is determined to include face features, a real face image containing the face features and its corresponding background image are first segmented from the video frame image; the real face image is then input into a lightweight first generative adversarial network model deployed on the mobile terminal for portrait cartoonization; and finally the cartoon face image, corresponding to the real face image, generated by the first generative adversarial network model is fused with the background image into the portrait-cartoonized video frame image.
The model parameters of the first generative adversarial network model are migrated, by means of model distillation, from a full-scale second generative adversarial network model deployed on a server corresponding to the mobile terminal.
In this way, the generative adversarial network model ultimately used to generate cartoon face images corresponding to real face images is a lightweight model with fewer parameters and shorter computation time. Real-time portrait cartoonization can therefore be guaranteed, making the method suitable for cartoonizing real-time video and for providing portrait cartoonization services to software on mobile terminals.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Application scenario overview
Referring first to fig. 1, which schematically illustrates an application scenario of portrait cartoonization according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario of portrait cartoonization may include at least one mobile terminal, for example mobile terminal 1, mobile terminal 2, ..., mobile terminal N, together with a server corresponding to these mobile terminals; the server can exchange data with each mobile terminal.
The server can be loaded with a full-scale generative adversarial network model (called the second generative adversarial network model) for generating cartoon face images corresponding to real face images; each mobile terminal can be loaded with a lightweight generative adversarial network model (called the first generative adversarial network model) for the same purpose. The model parameters of the first generative adversarial network model are migrated from the second generative adversarial network model by means of model distillation.
In addition, a mobile terminal can be equipped with camera hardware for capturing images or video, such as front-facing and rear-facing cameras, as well as software such as image-shooting and video-recording apps. Through this software, a user can invoke the camera hardware to capture images or video; the first generative adversarial network model carried by the mobile terminal can then be used to cartoonize the real face portions of the captured images or video.
Exemplary method
The portrait cartoonization method according to an exemplary embodiment of the present disclosure is described below with reference to figs. 2-5 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is shown merely for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, the embodiments of the present disclosure may be applied to any applicable scenario.
Referring to fig. 2, which schematically shows a flowchart of a portrait cartoonization method according to an embodiment of the present disclosure.
The portrait cartoonization method can be applied to any of the mobile terminals shown in fig. 1 and can comprise the following steps:
step 201: and detecting whether the acquired video frame image comprises human face features.
In this embodiment, a plurality of images may be collected at certain time intervals within a period of time, and the images may be sorted according to the order of the collection time, so that the arranged images may be combined into a video. Wherein, the images are all video frame images contained in the video.
For example, if video acquisition is performed within 2 seconds at a time interval of 50 milliseconds, 40 images can be acquired, and the 40 images can be combined into a video according to the sequence of acquisition time; the video comprises 40 video frame images, the duration is 2 seconds, and the frame rate is 20 frames/second.
In order to implement the human image cartoon processing for the video, when a video frame image is acquired, whether the video frame image includes the human face features or not can be detected firstly.
Step 202: if the video frame image includes face features, perform image segmentation processing on the video frame image to obtain a real face image containing the face features and a background image corresponding to the real face image.
In this embodiment, if the captured video frame image is determined to include face features, the video frame image can be portrait-cartoonized. In that case, the video frame image may be segmented to obtain a real face image containing the face features and the corresponding background image.
Taking the video frame image shown in fig. 3 as an example: the image in region 301 is the real face image, and the image outside region 301 is the background image corresponding to the real face image.
Step 203: input the real face image into the first generative adversarial network model to perform portrait cartoonization on it, and obtain the cartoon face image, corresponding to the real face image, output by the first generative adversarial network model; the model parameters of the first generative adversarial network model are migrated, by means of model distillation, from a full-scale second generative adversarial network model that is deployed on the server corresponding to the mobile terminal and is used for generating cartoon face images corresponding to real face images.
In this embodiment, the mobile terminal carries the first generative adversarial network model, and the server corresponding to the mobile terminal carries the second generative adversarial network model; the model parameters of the first generative adversarial network model are migrated from the second generative adversarial network model by means of model distillation.
In one embodiment, the first generative adversarial network model may be a cycle-consistent generative adversarial network (CycleGAN) model, and the second generative adversarial network model may likewise be a cycle-consistent generative adversarial network model.
Further, the cycle-consistent generative adversarial network model may be a U-GAT-IT (Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization) model.
The first generative adversarial network model is a lightweight generative adversarial network model, while the second is full-scale. That is, the first model has fewer parameters than the second, and therefore a shorter computation time, which is what allows the first generative adversarial network model to run on the mobile terminal.
In practical applications, the second generative adversarial network model loaded on the server may be trained first. Once it is trained, model parameters can be migrated from it by means of model distillation to serve as the model parameters of the first generative adversarial network model, and the first generative adversarial network model with those parameters can be deployed on the mobile terminal.
Once the real face image has been obtained through image segmentation, it can be input into the first generative adversarial network model, which performs portrait cartoonization on it, that is, generates and outputs the cartoon face image corresponding to the real face image; the cartoon face image output by the first generative adversarial network model can then be obtained.
Step 204: perform image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the portrait-cartoonized video frame image, thereby completing portrait cartoonization for the video frame image.
In this embodiment, once the cartoon face image generated by the first generative adversarial network model has been obtained, it can be fused with the background image to yield the portrait-cartoonized video frame image, completing portrait cartoonization for the video frame image.
Continuing with the video frame image of fig. 3 as an example: after the real face image in region 301 is input into the first generative adversarial network model and the corresponding cartoon face image is output, that cartoon face image can be fused with the background image outside region 301 to obtain the portrait-cartoonized video frame image.
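Putting steps 201-204 together, the following is a minimal per-frame sketch in PyTorch. It is illustrative only: detect_face, seg_model and gan_model are hypothetical stand-ins for the face detector, the portrait segmentation model and the first generative adversarial network model, and the mask-based fusion is one simple way to combine the pieces, not the patent's prescribed method:

```python
import torch

def cartoonize_frame(frame, detect_face, seg_model, gan_model):
    """Per-frame portrait cartoonization pipeline (illustrative sketch).

    frame: float tensor of shape (1, 3, H, W) with values in [0, 1];
    detect_face / seg_model / gan_model: assumed callables.
    """
    if not detect_face(frame):                 # step 201: face-feature detection
        return frame                           # no face: leave the frame unchanged
    mask = seg_model(frame)                    # step 202: (1, 1, H, W) soft face mask
    face_region = frame * mask                 #   real face image
    background = frame * (1.0 - mask)          #   background image
    with torch.no_grad():
        cartoon_face = gan_model(face_region)  # step 203: lightweight GAN inference
    return cartoon_face * mask + background    # step 204: image fusion
```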
The training processes of the second and first generative adversarial network models are described below in turn. Training the first generative adversarial network model means migrating model parameters from the second generative adversarial network model, by means of model distillation, to serve as the first model's parameters.
(1) Training process of the second generative adversarial network model
In practical applications, a real face sample set composed of a plurality of real face image samples and a cartoon face sample set (called the first cartoon face sample set) composed of a plurality of cartoon face image samples can serve as the training sample set for the second generative adversarial network model. That is, the training sample set comprises the real face sample set and the first cartoon face sample set.
The second generative adversarial network model can then be adversarially trained on this training sample set.
It should be noted that the real face image samples in the real face sample set and the cartoon face image samples in the first cartoon face sample set need not be paired: they may be in one-to-one correspondence (in which case the two sample sets have the same size), or not (in which case the sample sets may differ in size); this is not limited here.
In one embodiment, the second generative adversarial network model may include the following sub-models (a structural sketch follows this list):
a cartoon portrait generation sub-model, used for generating a cartoon face image corresponding to a real face image;
a global cartoon portrait discrimination sub-model, used for discriminating, based on the whole of a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model of the second generative adversarial network model;
a local cartoon portrait discrimination sub-model, used for discriminating, based on a local image cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model of the second generative adversarial network model;
a real portrait generation sub-model, used for generating a real face image corresponding to a cartoon face image;
a global real portrait discrimination sub-model, used for discriminating, based on the whole of a real face image, whether the real face image was generated by the real portrait generation sub-model of the second generative adversarial network model;
and a local real portrait discrimination sub-model, used for discriminating, based on a local image cropped from a real face image, whether the real face image was generated by the real portrait generation sub-model of the second generative adversarial network model.
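A minimal structural sketch of this six-sub-model arrangement, assuming hypothetical generator/discriminator factory callables (the patent does not prescribe these exact modules or names):

```python
import torch.nn as nn

class CycleCartoonGAN(nn.Module):
    """Illustrative skeleton of the full-scale second model: two generators
    plus, for each image domain, one global and several local discriminators."""
    def __init__(self, make_generator, make_discriminator, n_local: int = 2):
        super().__init__()
        self.G_real2cartoon = make_generator()     # cartoon portrait generation sub-model
        self.G_cartoon2real = make_generator()     # real portrait generation sub-model
        self.D_cartoon_global = make_discriminator()                    # whole image
        self.D_cartoon_local = nn.ModuleList(
            [make_discriminator() for _ in range(n_local)])             # cropped regions
        self.D_real_global = make_discriminator()
        self.D_real_local = nn.ModuleList(
            [make_discriminator() for _ in range(n_local)])
```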
In this case, on the one hand, each real face image sample in the real face sample set may be input into the cartoon portrait generation sub-model, which generates the corresponding cartoon face image; and each cartoon face image sample in the first cartoon face sample set may be input into the real portrait generation sub-model, which generates the corresponding real face image.
On the other hand, each cartoon face image generated by the cartoon portrait generation sub-model can be input into the real portrait generation sub-model to generate the corresponding real face image; and each real face image generated by the real portrait generation sub-model can be input into the cartoon portrait generation sub-model to generate the corresponding cartoon face image.
The global cartoon portrait discrimination sub-model can determine, from the whole of each cartoon face image, whether that image is a cartoon face image sample from the first cartoon face sample set or one generated by the cartoon portrait generation sub-model; the local cartoon portrait discrimination sub-model can make the same determination from a local image cropped from each cartoon face image.
It should be noted that the second generative adversarial network model may include one global cartoon portrait discrimination sub-model and a plurality of local cartoon portrait discrimination sub-models.
Taking the cartoon face image shown in fig. 4 as an example: the image in region 401 is the whole of the cartoon face image; the images in regions 402 and 403 are local images containing the left-eye and right-eye features respectively; the image in region 404 is a local image containing the mouth-nose features; and the images in regions 405 and 406 are local images containing the left-eyebrow and right-eyebrow features respectively. The sizes of these regions may be the same or different; this is not limited here.
For this cartoon face image, the global cartoon portrait discrimination sub-model can judge, from the whole image in region 401, whether it is a cartoon face image sample from the first cartoon face sample set or an image generated by the cartoon portrait generation sub-model; local cartoon portrait discrimination sub-model 1, corresponding to the eye features, can make the same judgment from the local images in regions 402 and 403; local cartoon portrait discrimination sub-model 2, corresponding to the mouth-nose features, from the local image in region 404; and local cartoon portrait discrimination sub-model 3, corresponding to the eyebrow features, from the local images in regions 405 and 406.
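As a hedged sketch of how such local crops might be taken before being scored by the matching local discriminators (the crop boxes would in practice be derived from face landmarks; the coordinates below are invented for illustration and do not come from the patent):

```python
def crop_regions(face_img, boxes):
    """face_img: (C, H, W) array or tensor; boxes: name -> (top, left, height, width)."""
    return {name: face_img[:, t:t + h, l:l + w]
            for name, (t, l, h, w) in boxes.items()}

# Assumed layout, loosely mirroring regions 402-406 of FIG. 4:
boxes = {
    "left_eye": (60, 40, 32, 48),
    "right_eye": (60, 140, 32, 48),
    "mouth_nose": (120, 80, 64, 72),
    "left_eyebrow": (40, 40, 16, 48),
    "right_eyebrow": (40, 140, 16, 48),
}
# Each crop (e.g. crop_regions(face_img, boxes)["mouth_nose"]) is then scored
# by the corresponding local discrimination sub-model.
```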
Similarly, the global real portrait discrimination sub-model can determine, from the whole of each real face image, whether that image is a real face image sample from the real face sample set or one generated by the real portrait generation sub-model; the local real portrait discrimination sub-model can make the same determination from a local image cropped from each real face image.
The second generative adversarial network model may likewise include one global real portrait discrimination sub-model and a plurality of local real portrait discrimination sub-models.
In practical applications, a loss function for the second generative adversarial network model can be constructed from the loss functions commonly used for generative adversarial networks, and the model's loss computed from it, so that the model parameters can be adjusted according to the computed loss until the model performs as required (e.g., the loss is minimized).
Specifically, when the second generative adversarial network model is a cycle-consistent one, an adversarial loss, a cycle-consistency loss, an identity loss and an auxiliary classification loss can be constructed for it, and the four losses combined in a weighted sum, with weights preset by the practitioner, to obtain the final loss function; the model's loss can then be computed from this final loss function and its parameters adjusted accordingly until the model performs as required.
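A minimal sketch of that weighted combination. The weight values shown are the ones commonly used for U-GAT-IT (where the auxiliary classification term is the CAM loss) and are an assumption here, since the patent leaves the weights to the practitioner:

```python
def total_loss(l_adv, l_cycle, l_identity, l_aux,
               w_adv=1.0, w_cycle=10.0, w_identity=10.0, w_aux=1000.0):
    """Weighted sum of the four component losses of the cycle-consistent model."""
    return (w_adv * l_adv + w_cycle * l_cycle
            + w_identity * l_identity + w_aux * l_aux)
```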
(2) Training process of the first generative adversarial network model
After the trained second generative adversarial network model is obtained, model parameters can be migrated from it by means of model distillation to serve as the model parameters of the first generative adversarial network model; this constitutes the training of the first generative adversarial network model.
In one illustrated embodiment, referring to fig. 5, which schematically shows a flow chart of a training method for the first generative adversarial network according to an embodiment of the present disclosure.
The training method of the first generative adversarial network may include the following steps:
Step 501: perform adversarial training on the second generative adversarial network model, which serves as a teacher model, based on a preset training sample set; the training sample set comprises a real face sample set composed of a plurality of real face image samples and a first cartoon face sample set composed of a plurality of cartoon face image samples;
Step 502: obtain the cartoon face image samples output by the trained second generative adversarial network model that correspond to the real face image samples in the real face sample set, and form a second cartoon face sample set from them;
Step 503: taking the training sample set and the second cartoon face sample set as the distillation sample set, perform model distillation training on the first generative adversarial network model, which serves as a student model, so as to migrate model parameters from the second generative adversarial network model as the model parameters of the first generative adversarial network model.
To implement model distillation, the second generative adversarial network model serves as the teacher model and the first as the student model; the student model "learns from" the teacher model so that its performance can approach the teacher's.
For the specific implementation of step 501, refer to the training process of the second generative adversarial network model described above, which is not repeated here.
After the second generative adversarial network model serving as the teacher model has been trained, the cartoon face images it outputs for the real face image samples in the real face sample set can be obtained, that is, the cartoon face images the trained second generative adversarial network generates for those samples; these images serve as new cartoon face image samples, forming a new cartoon face sample set (called the second cartoon face sample set).
The training sample set and the second cartoon face sample set can then be used together as the distillation sample set, and model distillation training performed on the first generative adversarial network model as the student model, so that the model parameters migrated from the second generative adversarial network model become the parameters of the first generative adversarial network.
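As a hedged illustration of one such distillation step, the sketch below has the student generator imitate the teacher's precomputed outputs on real-face samples (a pixel-level L1 term) while still training adversarially on the distillation sample set. The exact distillation objective is an assumption; the patent does not specify it:

```python
import torch.nn.functional as F

def distillation_step(student_G, real_batch, teacher_cartoon, adv_loss_fn, optimizer):
    """One sketch-level distillation step.

    teacher_cartoon: teacher outputs for real_batch, precomputed offline
                     (the second cartoon face sample set).
    adv_loss_fn:     assumed closure scoring the student output against the
                     global/local cartoon discrimination sub-models.
    """
    student_out = student_G(real_batch)
    imitation = F.l1_loss(student_out, teacher_cartoon)   # match the teacher
    loss = imitation + adv_loss_fn(student_out)           # stay adversarially plausible
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```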
Since training the first generative adversarial network model is computationally heavy, the training of the first generative adversarial network can be completed on the server corresponding to the mobile terminal, as shown in fig. 1, to improve training efficiency.
In one embodiment, the first generative adversarial network model may include the following sub-models:
a cartoon portrait generation sub-model, used for generating a cartoon face image corresponding to a real face image;
a global cartoon portrait discrimination sub-model, used for discriminating, based on the whole of a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model of the first generative adversarial network model;
and a local cartoon portrait discrimination sub-model, used for discriminating, based on a local image cropped from a cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation sub-model of the first generative adversarial network model.
In this case, each real face image sample in the real face sample set may be input into the cartoon portrait generation sub-model, which generates the corresponding cartoon face image.
The global cartoon portrait discrimination sub-model can determine, from the whole of each cartoon face image, whether that image is a cartoon face image sample from the first or second cartoon face sample set, or one generated by the cartoon portrait generation sub-model; the local cartoon portrait discrimination sub-model can make the same determination from a local image cropped from each cartoon face image.
It should be noted that the first generative adversarial network model may include one global cartoon portrait discrimination sub-model and a plurality of local cartoon portrait discrimination sub-models.
When a real face image undergoes portrait cartoonization, the eye region and the mouth-nose region of the generated cartoon face image generally change shape more, relative to the real face image, than other regions; to ensure the quality of the cartoonization, these two regions can therefore be specifically enhanced and optimized.
Specifically, in one illustrated embodiment, the local cartoon portrait identification submodel in the first generative adversarial network model may include:
the eye feature identification submodel, used to identify, based on a local image containing the eye features cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel in the first generative adversarial network model; and/or
the mouth-nose feature identification submodel, used to identify, based on a local image containing the mouth-nose features cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel in the first generative adversarial network model.
Continuing with the cartoon face image shown in fig. 4 as an example, the eye feature identification submodel may identify, based on the local images in regions 402 and 403, whether the cartoon face image is a cartoon face image sample from the first or second cartoon face sample set, or one generated by the cartoon portrait generation submodel; the mouth-nose feature identification submodel may make the same identification based on the local image in region 404.
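For illustration, the local discriminators can be approximated by scoring fixed crops of the generated image; the PatchDiscriminator class and the crop boxes standing in for regions 402/403 (eyes) and 404 (mouth-nose) below are hypothetical placeholders, not values from the disclosure:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """A small convolutional discriminator scoring an image crop."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),   # patch-wise real/fake map
        )

    def forward(self, x):
        return self.net(x)

def crop(img, box):
    """Crop a (N, C, H, W) tensor with box = (top, left, height, width)."""
    t, l, h, w = box
    return img[:, :, t:t + h, l:l + w]

# Hypothetical landmark-derived boxes for the eye and mouth-nose regions.
eye_boxes = [(60, 40, 32, 48), (60, 140, 32, 48)]
mouth_nose_box = (130, 80, 48, 72)

eye_D, mouth_nose_D = PatchDiscriminator(), PatchDiscriminator()

def local_scores(cartoon):
    eyes = [eye_D(crop(cartoon, b)) for b in eye_boxes]
    mouth_nose = mouth_nose_D(crop(cartoon, mouth_nose_box))
    return eyes, mouth_nose
```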
In practical applications, the cartoon portrait generation submodel may include an encoding side and a decoding side: the encoder extracts features from the input image and compresses and encodes them, and the decoder up-samples layer by layer to produce the final cartoon portrait. The cartoon portrait generation submodel may employ depthwise separable convolution layers as its convolution layers.
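A depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) convolution followed by a 1×1 (pointwise) convolution, which sharply cuts parameters and multiply-adds; a minimal PyTorch sketch:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_ch, in_ch, kernel_size, stride=stride,
            padding=kernel_size // 2, groups=in_ch)   # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)  # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```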
It should be noted that, compared with the cartoon portrait generation submodel in the second generative adversarial network model, the submodel in the first model may have the same number of convolution layers but smaller convolution kernels and fewer channels, so that the first generative adversarial network model is a lightweight model that can run on a mobile terminal.
In addition, the global and local cartoon portrait identification submodels in the first generative adversarial network model may have the same model structure as their counterparts in the second generative adversarial network model.
In one illustrated embodiment, the initial values of the model parameters of the global cartoon portrait identification submodel in the first generative adversarial network model are the model parameters of the global cartoon portrait identification submodel in the second generative adversarial network model after training is completed; and the initial values of the model parameters of the local cartoon portrait identification submodel in the first generative adversarial network model are the model parameters of the local cartoon portrait identification submodel in the second generative adversarial network model after training is completed.
For example, the initial values of the model parameters of the eye feature identification submodel in the first generative adversarial network model are the trained model parameters of the eye feature identification submodel in the second generative adversarial network model, and likewise for the mouth-nose feature identification submodel.
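Since the discriminator structures match, this initialization can amount to copying the teacher's trained weights into the student; a sketch with hypothetical module names:

```python
def init_student_discriminators(student, teacher):
    """Copy trained teacher discriminator weights into the student.
    Assumes identical discriminator architectures, per the text above."""
    student.global_D.load_state_dict(teacher.global_D.state_dict())
    student.eye_D.load_state_dict(teacher.eye_D.state_dict())
    student.mouth_nose_D.load_state_dict(teacher.mouth_nose_D.state_dict())
```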
In practical applications, for the first generative adversarial network model, on one hand, the following loss function can be constructed as the distillation loss function:

$$L_{distill} = \sum_{i=1}^{L} \left\| f_i(F_s^i) - F_t^i \right\|$$

where $L_{distill}$ represents the distillation loss, $i$ indexes the convolution layers, $L$ represents the total number of convolution layers, $F_s^i$ represents the output of the $i$-th convolution layer in the cartoon portrait generation submodel of the first generative adversarial network model, and $F_t^i$ represents the output of the $i$-th convolution layer in the cartoon portrait generation submodel of the second generative adversarial network model; $f_i$ denotes a 1×1 convolution layer that expands the number of channels of $F_s^i$ to match that of $F_t^i$.
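In code, the distillation term might look as follows; the choice of an L1 norm and the channel lists are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class FeatureDistillLoss(nn.Module):
    def __init__(self, student_chs, teacher_chs):
        super().__init__()
        # One 1x1 conv f_i per compared layer, expanding student channels
        # to match the teacher's.
        self.adapters = nn.ModuleList(
            nn.Conv2d(s, t, kernel_size=1)
            for s, t in zip(student_chs, teacher_chs))

    def forward(self, student_feats, teacher_feats):
        loss = 0.0
        for f_i, fs, ft in zip(self.adapters, student_feats, teacher_feats):
            loss = loss + torch.nn.functional.l1_loss(f_i(fs), ft.detach())
        return loss
```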
On the other hand, the following loss function can be constructed as a reconstruction loss function:
$$L_{recon} = \left| Cartoon_t - Cartoon_s \right|$$

where $L_{recon}$ represents the reconstruction loss, $Cartoon_t$ represents the final output of the cartoon portrait generation submodel in the second generative adversarial network model, and $Cartoon_s$ represents the final output of the cartoon portrait generation submodel in the first generative adversarial network model.
In yet another aspect, a global adversarial loss function corresponding to the global cartoon portrait identification submodel in the first generative adversarial network model can be constructed (the global adversarial loss is denoted $L_{adv}^{global}$), together with an eye adversarial loss function corresponding to the eye feature identification submodel (denoted $L_{adv}^{eye}$) and a mouth-nose adversarial loss function corresponding to the mouth-nose feature identification submodel (denoted $L_{adv}^{mouth}$).
Further, the five loss functions $L_{distill}$, $L_{recon}$, $L_{adv}^{global}$, $L_{adv}^{eye}$ and $L_{adv}^{mouth}$ may be weighted and summed according to weights preset by a technician, giving a final loss function of the form

$$L = \lambda_1 L_{distill} + \lambda_2 L_{recon} + \lambda_3 L_{adv}^{global} + \lambda_4 L_{adv}^{eye} + \lambda_5 L_{adv}^{mouth}$$

Subsequently, the loss of the first generative adversarial network model can be calculated based on this final loss function, and the model parameters are adjusted according to the calculated loss until the output of the first generative adversarial network model meets the requirement.
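Putting the five terms together with hand-chosen weights could look like this; the weight values below are placeholders, not values from the disclosure:

```python
# Placeholder weights; in practice a technician tunes these by hand.
weights = dict(distill=1.0, recon=10.0, adv_global=1.0,
               adv_eye=0.5, adv_mouth=0.5)

def total_loss(L_distill, L_recon, L_adv_global, L_adv_eye, L_adv_mouth):
    """Weighted sum of the five losses described above (tensors)."""
    return (weights["distill"] * L_distill
            + weights["recon"] * L_recon
            + weights["adv_global"] * L_adv_global
            + weights["adv_eye"] * L_adv_eye
            + weights["adv_mouth"] * L_adv_mouth)
```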
In order to improve the effect of image fusion between the cartoon face image and the background image, in one illustrated embodiment, a portrait segmentation model for performing portrait segmentation processing on images may also be mounted on the mobile terminal.
In this case, each real face image sample in the real face sample set may be labeled with the face feature region containing the face features, that is, a face mask.
Correspondingly, the face features extracted from the real face image samples by the cartoon portrait generation submodel in the first generative adversarial network model can be used as training samples, with the face masks labeled in those samples serving as constraints, to perform supervised training on a preset machine learning model; the trained machine learning model is then used as the portrait segmentation model.
In practical applications, since the generated cartoon face image is output by the last convolution layer of the cartoon portrait generation submodel, the face features that the submodel extracts from a real face image sample may specifically be the data output by its penultimate convolution layer.
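One way to tap the penultimate convolution layer's output without modifying the generator is a forward hook; the module path decoder.conv_blocks[-2] is purely hypothetical:

```python
features = {}

def save_features(module, inputs, output):
    features["penultimate"] = output.detach()

# Hypothetical path to the second-to-last conv layer of the generator.
handle = student_G.decoder.conv_blocks[-2].register_forward_hook(save_features)

cartoon = student_G(frame)            # normal forward pass
face_feats = features["penultimate"]  # fed to the segmentation model
handle.remove()
```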
In this case, when performing image segmentation on a video frame image, the face features extracted from the video frame image by the cartoon portrait generation submodel in the first generative adversarial network model may be input into the portrait segmentation model, so that the portrait segmentation model segments, from the video frame image, the face mask containing the face features as the real face image, and the background image corresponding to the real face image in the video frame image is thereby obtained.
After the cartoon face image corresponding to the real face image is generated, image fusion processing can be performed on the cartoon face image and the background image by Poisson fusion or alpha fusion, yielding the video frame image after portrait cartoonization.
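A minimal alpha-fusion sketch, assuming the segmentation model produces a soft mask in [0, 1] aligned with the frame (Poisson fusion would instead solve a gradient-domain blend, for example via OpenCV's seamlessClone):

```python
def alpha_fuse(cartoon_face, background, mask):
    """Blend the cartoon face over the background.

    cartoon_face, background: float32 arrays of shape (H, W, 3)
    mask: float32 array of shape (H, W, 1), 1.0 inside the face region
    """
    return mask * cartoon_face + (1.0 - mask) * background
```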
In practical applications, for the above-mentioned portrait segmentation model, the following loss function may be constructed:
$$L_{mask} = \sum_{i=1}^{n} \left| G(x_i) - Label_i \right|$$

where $L_{mask}$ represents the segmentation loss, $i$ indexes the real face image samples, $n$ represents the total number of real face image samples in the real face sample set, $G(x_i)$ represents the face mask segmented from the $i$-th real face image sample by the portrait segmentation model, and $Label_i$ represents the face mask labeled in the $i$-th real face image sample.
Subsequently, the loss of the portrait segmentation model can be calculated based on this loss function, and the model parameters of the portrait segmentation model are adjusted according to the calculated loss until the segmentation effect meets the requirement.
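The segmentation loss above is a plain L1 distance between predicted and labeled masks; in batched form (averaging instead of summing over samples is an assumption of this sketch):

```python
import torch

def mask_loss(pred_masks, label_masks):
    """L1 distance between masks predicted by the segmentation model
    and the annotated face masks (tensors of identical shape)."""
    return torch.mean(torch.abs(pred_masks - label_masks))
```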
Further, in one illustrated embodiment, the training of the portrait segmentation model and the model distillation training of the first generative adversarial network model can be performed synchronously to improve the accuracy and effect of both models.
In one illustrated embodiment, before the real face image segmented from the video frame image is input into the first generative adversarial network model for portrait cartoonization, it may be determined whether the time spent performing portrait cartoonization on the previous frame of the video, based on the first generative adversarial network model, has reached a preset threshold.
If that duration has not reached the threshold, the real face image can be input into the first generative adversarial network model for portrait cartoonization.
Further, in one illustrated embodiment, if the duration has reached the threshold, the cartoonized previous frame may be directly used as the cartoonized result for the current video frame image.
In practical applications, the threshold may be the time interval between the capture time of the previous frame and the capture time of the current video frame image.
For example, if the video is captured at 20 frames per second, the interval between two adjacent frames is 50 milliseconds, so the threshold may likewise be set to 50 milliseconds.
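The timing guard could be implemented roughly as below; the 50 ms constant matches the 20 fps example, and the function names are illustrative:

```python
import time

FRAME_INTERVAL_MS = 50.0  # capture interval at 20 frames per second

def cartoonize_or_reuse(model, face_img, prev_result, prev_elapsed_ms):
    """Reuse the previous cartoonized frame if the last inference
    already took a full frame interval; otherwise run the model."""
    if prev_elapsed_ms >= FRAME_INTERVAL_MS:
        return prev_result, prev_elapsed_ms
    start = time.perf_counter()
    result = model(face_img)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```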
In summary, according to the portrait cartoonization method of the embodiments of the present disclosure, on the mobile terminal, when the acquired video frame image is determined to include face features, the real face image containing those features and the corresponding background image are first segmented from the video frame image; the real face image is then input into the lightweight first generative adversarial network model deployed on the mobile terminal for portrait cartoonization; and finally the cartoon face image generated by the first generative adversarial network model is fused with the background image into the cartoonized video frame image.
The model parameters of the first generative adversarial network model are migrated, by model distillation, from a full-scale second generative adversarial network model hosted on the server corresponding to the mobile terminal.
In this way, the generative adversarial network model ultimately used to generate cartoon face images from real face images is lightweight, with fewer model parameters and shorter computation time, so the real-time performance of portrait cartoonization can be guaranteed. The method is therefore suitable for cartoonizing real-time video and can provide a portrait cartoonization service for software on a mobile terminal.
Exemplary Medium
Having described the method of the exemplary embodiment of the present disclosure, the medium of the exemplary embodiment of the present disclosure is explained next with reference to fig. 6.
In the present exemplary embodiment, the above-described method may be implemented by a program product, such as a portable compact disc read only memory (CD-ROM) and including program code, and may be executed on a device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary devices
Having described the media of the exemplary embodiments of the present disclosure, the apparatus of the exemplary embodiments of the present disclosure is described next with reference to fig. 7.
The implementation process of the functions and actions of each module in the following device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again. For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points.
Fig. 7 schematically shows a portrait cartoonization apparatus according to an embodiment of the present disclosure, applied to a mobile terminal on which a lightweight first generative adversarial network model for generating cartoon face images corresponding to real face images is mounted; the apparatus comprises:
a detection module 701, configured to detect whether an acquired video frame image includes face features;
a segmentation module 702, configured to perform image segmentation processing on the video frame image when it includes face features, to obtain a real face image containing the face features and a background image corresponding to the real face image;
a processing module 703, configured to input the real face image into the first generative adversarial network model for portrait cartoonization, and to obtain the cartoon face image corresponding to the real face image output by the first generative adversarial network model; the model parameters of the first generative adversarial network model are migrated, by model distillation, from a full-scale second generative adversarial network model, hosted on a server corresponding to the mobile terminal, for generating cartoon face images corresponding to real face images;
and a fusion module 704, configured to perform image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the cartoonized video frame image, thereby completing portrait cartoonization of the video frame image.
Optionally, the training of the first generative adversarial network model is completed on the server;
the training process of the first generative adversarial network model includes the following steps:
performing adversarial training on the second generative adversarial network model serving as a teacher model, based on a preset training sample set, where the training sample set includes a real face sample set composed of several real face image samples and a first cartoon face sample set composed of several cartoon face image samples;
acquiring the cartoon face image samples output by the trained second generative adversarial network model for the respective real face image samples in the real face sample set, so that these samples form a second cartoon face sample set;
and using the training sample set and the second cartoon face sample set as the distillation sample set to perform model distillation training on the first generative adversarial network model serving as a student model, so as to migrate model parameters from the second generative adversarial network model as the model parameters of the first generative adversarial network model.
Optionally, the generative adversarial network models are cycle-consistent generative adversarial network models.
Optionally, the cycle-consistent generative adversarial network model is a UGATIT model.
Optionally, the second generative adversarial network model includes the following sub-models:
the cartoon portrait generation submodel, used to generate a cartoon face image corresponding to a real face image;
the global cartoon portrait identification submodel, used to identify, based on the entire cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel;
the local cartoon portrait identification submodel, used to identify, based on local images cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel;
the real portrait generation submodel, used to generate a real face image corresponding to a cartoon face image;
the global real portrait identification submodel, used to identify, based on the entire real face image, whether the real face image was generated by the real portrait generation submodel;
and the local real portrait identification submodel, used to identify, based on local images cropped from the real face image, whether the real face image was generated by the real portrait generation submodel.
Optionally, the first generative adversarial network model includes the following sub-models:
the cartoon portrait generation submodel, used to generate a cartoon face image corresponding to a real face image;
the global cartoon portrait identification submodel, used to identify, based on the entire cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel;
and the local cartoon portrait identification submodel, used to identify, based on local images cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel.
Optionally, the initial values of the model parameters of the global cartoon portrait identification submodel in the first generative adversarial network model are the trained model parameters of the global cartoon portrait identification submodel in the second generative adversarial network model;
and the initial values of the model parameters of the local cartoon portrait identification submodel in the first generative adversarial network model are the trained model parameters of the local cartoon portrait identification submodel in the second generative adversarial network model.
Optionally, the local cartoon portrait identification submodel in the first generative adversarial network model includes:
the eye feature identification submodel, used to identify, based on a local image containing the eye features cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel; and/or
the mouth-nose feature identification submodel, used to identify, based on a local image containing the mouth-nose features cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel.
Optionally, the mobile terminal is further equipped with a portrait segmentation model for performing portrait segmentation processing on images, where the real face image samples in the real face sample set are labeled with face feature regions containing the face features, and the portrait segmentation model is a machine learning model obtained through supervised training that uses the face features extracted from the real face image samples by the cartoon portrait generation submodel in the first generative adversarial network model as training samples and the labeled face feature regions in those samples as constraints;
the segmentation module 702 is specifically configured to:
input the face features extracted from the video frame image by the cartoon portrait generation submodel in the first generative adversarial network model into the portrait segmentation model, so that the portrait segmentation model segments, from the video frame image, the face feature region containing the face features as the real face image, and the background image corresponding to the real face image in the video frame image is obtained.
Optionally, the training of the portrait segmentation model is performed synchronously with the model distillation training of the first generative adversarial network model.
Optionally, the apparatus further comprises:
a first determining module 705, configured to determine, before the real face image is input into the first generative adversarial network model for portrait cartoonization, whether the time spent performing portrait cartoonization on the previous frame of the video frame image based on the first generative adversarial network model has reached a preset threshold;
the processing module 703 is specifically configured to:
input the real face image into the first generative adversarial network model for portrait cartoonization if that duration has not reached the threshold.
Optionally, the apparatus further comprises:
a second determining module 706, configured to determine, when the duration reaches the threshold, the cartoonized previous frame as the cartoonized video frame image.
Optionally, the preset threshold includes the time interval between the capture time of the previous frame and the capture time of the video frame image.
Exemplary computing device
Having described the methods, media, and apparatus of the exemplary embodiments of the present disclosure, a computing device of the exemplary embodiments of the present disclosure is described next with reference to fig. 8.
The computing device 800 shown in fig. 8 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
As shown in fig. 8, computing device 800 is in the form of a general purpose computing device. Components of computing device 800 may include, but are not limited to: at least one processing unit 801, at least one storage unit 802, and a bus 803 connecting the various system components (including the processing unit 801 and the storage unit 802).
The bus 803 includes a data bus, a control bus, and an address bus.
The storage unit 802 may include readable media in the form of volatile memory, such as random access memory (RAM) 8021 and/or cache memory 8022, and may further include readable media in the form of non-volatile memory, such as read-only memory (ROM) 8023.
Storage unit 802 can also include a program/utility 8025 having a set (at least one) of program modules 8024, such program modules 8024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 800 may also communicate with one or more external devices 804 (e.g., keyboard, pointing device, etc.).
Such communication may be through input/output (I/O) interfaces 805. Moreover, computing device 800 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 806. As shown in fig. 8, a network adapter 806 communicates with the other modules of the computing device 800 via the bus 803. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several units/modules or sub-units/modules of the portrait cartoonization apparatus are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module; conversely, the features and functions of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor is the division of aspects, which is for convenience only as the features in such aspects may not be combined to benefit. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A portrait cartoonization method, applied to a mobile terminal on which a lightweight first generative adversarial network model for generating cartoon face images corresponding to real face images is mounted; the method comprises:
detecting whether an acquired video frame image includes face features;
if the video frame image includes face features, performing image segmentation processing on the video frame image to obtain a real face image containing the face features and a background image corresponding to the real face image;
inputting the real face image into the first generative adversarial network model for portrait cartoonization, and acquiring the cartoon face image corresponding to the real face image output by the first generative adversarial network model, where the model parameters of the first generative adversarial network model are migrated, by model distillation, from a full-scale second generative adversarial network model, hosted on a server corresponding to the mobile terminal, for generating cartoon face images corresponding to real face images;
and performing image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the cartoonized video frame image, thereby completing portrait cartoonization of the video frame image.
2. The method of claim 1, wherein the first generative adversarial network model is trained on the server;
the training process of the first generative adversarial network model includes:
performing adversarial training on the second generative adversarial network model serving as a teacher model, based on a preset training sample set, where the training sample set includes a real face sample set composed of several real face image samples and a first cartoon face sample set composed of several cartoon face image samples;
acquiring the cartoon face image samples output by the trained second generative adversarial network model for the respective real face image samples in the real face sample set, so that these samples form a second cartoon face sample set;
and using the training sample set and the second cartoon face sample set as the distillation sample set to perform model distillation training on the first generative adversarial network model serving as a student model, so as to migrate model parameters from the second generative adversarial network model as the model parameters of the first generative adversarial network model.
3. The method of claim 2, wherein the second generative adversarial network model includes the following sub-models:
the cartoon portrait generation submodel, used to generate a cartoon face image corresponding to a real face image;
the global cartoon portrait identification submodel, used to identify, based on the entire cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel;
the local cartoon portrait identification submodel, used to identify, based on local images cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel;
the real portrait generation submodel, used to generate a real face image corresponding to a cartoon face image;
the global real portrait identification submodel, used to identify, based on the entire real face image, whether the real face image was generated by the real portrait generation submodel;
and the local real portrait identification submodel, used to identify, based on local images cropped from the real face image, whether the real face image was generated by the real portrait generation submodel.
4. The method of claim 3, wherein the first generative adversarial network model includes the following sub-models:
the cartoon portrait generation submodel, used to generate a cartoon face image corresponding to a real face image;
the global cartoon portrait identification submodel, used to identify, based on the entire cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel;
and the local cartoon portrait identification submodel, used to identify, based on local images cropped from the cartoon face image, whether the cartoon face image was generated by the cartoon portrait generation submodel.
5. The method of claim 4, wherein the mobile terminal is further equipped with a portrait segmentation model for performing portrait segmentation processing on images; the real face image samples in the real face sample set are labeled with face feature regions containing the face features; and the portrait segmentation model is a machine learning model obtained through supervised training that uses the face features extracted from the real face image samples by the cartoon portrait generation submodel in the first generative adversarial network model as training samples and the labeled face feature regions in those samples as constraints;
the performing image segmentation processing on the video frame image comprises:
inputting the face features extracted from the video frame image by the cartoon portrait generation submodel in the first generative adversarial network model into the portrait segmentation model, so that the portrait segmentation model segments, from the video frame image, the face feature region containing the face features as the real face image, and the background image corresponding to the real face image in the video frame image is obtained.
6. The method of claim 1, wherein before inputting the real face image into the first generative adversarial network model for portrait cartoonization, the method further comprises:
determining whether the time spent performing portrait cartoonization on the previous frame of the video frame image based on the first generative adversarial network model has reached a preset threshold;
and if the duration has not reached the threshold, inputting the real face image into the first generative adversarial network model for portrait cartoonization.
7. The method of claim 6, further comprising:
if the duration has reached the threshold, determining the cartoonized previous frame as the cartoonized video frame image.
8. A portrait cartoonization apparatus, applied to a mobile terminal on which a lightweight first generative adversarial network model for generating cartoon face images corresponding to real face images is mounted; the apparatus comprises:
a detection module, configured to detect whether an acquired video frame image includes face features;
a segmentation module, configured to perform image segmentation processing on the video frame image when it includes face features, to obtain a real face image containing the face features and a background image corresponding to the real face image;
a processing module, configured to input the real face image into the first generative adversarial network model for portrait cartoonization, and to acquire the cartoon face image corresponding to the real face image output by the first generative adversarial network model, where the model parameters of the first generative adversarial network model are migrated, by model distillation, from a full-scale second generative adversarial network model, hosted on a server corresponding to the mobile terminal, for generating cartoon face images corresponding to real face images;
and a fusion module, configured to perform image fusion processing on the cartoon face image generated by the first generative adversarial network model and the background image to obtain the cartoonized video frame image, thereby completing portrait cartoonization of the video frame image.
9. A medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1-7.
10. A computing device, comprising:
a processor;
a memory for storing a processor executable program;
wherein the processor implements the method of any one of claims 1-7 by running the executable program.