CN116912345B - Portrait cartoon processing method, device, equipment and storage medium - Google Patents

Portrait cartoon processing method, device, equipment and storage medium

Info

Publication number
CN116912345B
CN116912345B (application CN202310854234.1A)
Authority
CN
China
Prior art keywords
image
cartoon
super
resolution
portrait
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310854234.1A
Other languages
Chinese (zh)
Other versions
CN116912345A (en)
Inventor
肖冠正
张鑫
苏泽阳
赵岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iMusic Culture and Technology Co Ltd
Original Assignee
iMusic Culture and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iMusic Culture and Technology Co Ltd filed Critical iMusic Culture and Technology Co Ltd
Priority to CN202310854234.1A priority Critical patent/CN116912345B/en
Publication of CN116912345A publication Critical patent/CN116912345A/en
Application granted granted Critical
Publication of CN116912345B publication Critical patent/CN116912345B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 — 2D [Two Dimensional] image generation
    • G06T11/001 — Texturing; Colouring; Generation of texture or colour
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a portrait cartoonization method, apparatus, device and storage medium, wherein the method comprises the following steps: acquiring a portrait picture; preprocessing the portrait picture to obtain a preprocessed image; performing super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image; and performing cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait. The embodiment of the invention combines super-resolution and cartoonization techniques, improves the portrait cartoonization effect, and can be widely applied in the technical field of image processing.

Description

Portrait cartoon processing method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, and in particular to a portrait cartoonization method, apparatus, device and storage medium.
Background
In the field of image processing, portrait cartoonization is a technique that converts a real portrait into a cartoon- or anime-style image. In recent years, methods that realize cartoonization through deep learning have emerged. In the related art, style-transfer techniques are generally used to migrate cartoon textures onto real-world scenes, but such methods cannot preserve clear facial contours when processing a human face, and the generated cartoon pictures are relatively blurry, giving a poor result. In view of the foregoing, the technical problems in the related art need to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a portrait cartoonization method, apparatus, device and storage medium, so as to improve the definition of the generated image.
In one aspect, the invention provides a portrait cartoonization method, which comprises the following steps:
acquiring a portrait picture;
preprocessing the portrait picture to obtain a preprocessed image;
performing super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image;
and performing cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait.
Optionally, preprocessing the portrait picture to obtain a preprocessed image includes:
decoding the portrait picture to obtain image data;
performing edge-filling processing on the image data to obtain a filled image;
and performing normalization processing on the filled image to obtain a preprocessed image.
Optionally, performing super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image includes:
inputting the preprocessed image into the pre-trained portrait super-resolution model;
performing feature extraction on the preprocessed image through multiple depthwise-separable convolution layers to obtain multi-layer features;
and performing upsampling on the multi-layer features through a sub-pixel convolution layer to obtain a super-resolution image.
Optionally, before the super-resolution denoising is performed on the preprocessed image through the pre-trained portrait super-resolution model, the method further includes pre-training the portrait super-resolution model, including:
acquiring a portrait training data set;
inputting the portrait training data set into the portrait super-resolution model to obtain a generated data set;
inputting the generated data set into a super-resolution discriminator network to obtain a discrimination result for the generated data;
and updating parameters of the portrait super-resolution model according to the discrimination result.
Optionally, performing cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait includes:
inputting the super-resolution image into the pre-trained cartoonization model;
performing feature extraction on the super-resolution image through an encoder to obtain a feature vector;
and converting the feature vector through a decoder to obtain the cartoon portrait.
Optionally, before the cartoonization is performed on the super-resolution image through the pre-trained cartoonization model to obtain the cartoon portrait, the method further comprises pre-training the cartoonization model, specifically including:
acquiring face image data, and performing data augmentation on the face image data to obtain cartoonization training data;
performing cartoonization on the training data through the cartoonization model to obtain cartoon generation data;
discriminating the cartoon generation data through a cartoon discriminator to obtain a cartoon discrimination result;
and updating parameters of the cartoonization model according to the cartoon discrimination result.
Optionally, performing data augmentation on the face image data to obtain cartoonization training data includes:
performing scaling and rotation on the face image data to obtain augmented image data;
and performing edge-filling processing on the augmented image data to obtain cartoonization training data.
In another aspect, an embodiment of the invention also provides a portrait cartoonization apparatus, which comprises:
a first module for acquiring a portrait picture;
a second module for preprocessing the portrait picture to obtain a preprocessed image;
a third module for performing super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image;
and a fourth module for performing cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait.
Optionally, the second module is configured to preprocess the portrait picture to obtain a preprocessed image, and includes:
a first unit for decoding the portrait picture to obtain image data;
a second unit for performing edge-filling processing on the image data to obtain a filled image;
and a third unit for performing normalization processing on the filled image to obtain a preprocessed image.
Optionally, the third module is configured to perform super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image, and includes:
a fourth unit for inputting the preprocessed image into the pre-trained portrait super-resolution model;
a fifth unit for performing feature extraction on the preprocessed image through multiple depthwise-separable convolution layers to obtain multi-layer features;
and a sixth unit for performing upsampling on the multi-layer features through a sub-pixel convolution layer to obtain a super-resolution image.
Optionally, the apparatus further includes a fifth module for pre-training the portrait super-resolution model, specifically including:
a seventh unit for acquiring a portrait training data set;
an eighth unit for inputting the portrait training data set into the portrait super-resolution model to obtain a generated data set;
a ninth unit for inputting the generated data set into a super-resolution discriminator network to obtain a discrimination result for the generated data;
and a tenth unit for updating parameters of the portrait super-resolution model according to the discrimination result.
Optionally, the fourth module is configured to perform cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait, and includes:
an eleventh unit for inputting the super-resolution image into the pre-trained cartoonization model;
a twelfth unit for performing feature extraction on the super-resolution image through an encoder to obtain a feature vector;
and a thirteenth unit for converting the feature vector through a decoder to obtain the cartoon portrait.
Optionally, the apparatus further includes a sixth module for pre-training the cartoonization model, including:
a fourteenth unit for acquiring face image data and performing data augmentation on the face image data to obtain cartoonization training data;
a fifteenth unit for performing cartoonization on the training data through the cartoonization model to obtain cartoon generation data;
a sixteenth unit for discriminating the cartoon generation data through a cartoon discriminator to obtain a cartoon discrimination result;
and a seventeenth unit for updating parameters of the cartoonization model according to the cartoon discrimination result.
Optionally, the fourteenth unit is configured to acquire face image data and perform data augmentation on the face image data to obtain cartoonization training data, and includes:
a first subunit for performing scaling and rotation on the face image data to obtain augmented image data;
and a second subunit for performing edge-filling processing on the augmented image data to obtain cartoonization training data.
In another aspect, an embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing a program;
the processor executes the program to implement the method described above.
In another aspect, embodiments of the present invention also disclose a computer-readable storage medium storing a program for execution by a processor to implement the method described above.
In another aspect, embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device may read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the foregoing method.
Compared with the prior art, the technical solution provided by the invention has the following technical effects: in the embodiment of the invention, super-resolution denoising is performed on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image; sharpening the input image with the super-resolution technique reduces input noise and improves the cartoonization effect. The super-resolution image is then cartoonized through a pre-trained cartoonization model to obtain the cartoon portrait, so that end-to-end cartoonization can be performed without depending on face key-point detection and segmentation models, improving the cartoonization result.
Drawings
In order to describe the technical solutions of the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a portrait cartoonization method provided by an embodiment of the invention;
Fig. 2 is a flowchart of step S102 in Fig. 1;
Fig. 3 is a flowchart of step S103 in Fig. 1;
Fig. 4 is a schematic structural diagram of a portrait super-resolution model according to an embodiment of the invention;
Fig. 5 is a flowchart of step S104 in Fig. 1;
Fig. 6 is a schematic structural diagram of a cartoonization model according to an embodiment of the invention;
Fig. 7 is a flowchart of a training method for the cartoonization model provided by an embodiment of the invention;
Fig. 8 is a flowchart of step S501 in Fig. 7;
Fig. 9 is a schematic structural diagram of a portrait cartoonization apparatus according to an embodiment of the invention;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate the application and are not intended to limit its scope.
In the related art, in the field of image processing, portrait cartoonization is a technique that converts a real portrait image into a cartoon- or anime-style image. Traditional portrait cartoonization methods are typically implemented with image processing algorithms and filters, but they cannot accurately reproduce the details and textures of the real image and therefore cannot generate high-quality results. In recent years, methods that realize cartoonization through deep learning have emerged. For example, some methods use style transfer to migrate cartoon textures onto real-world scenes, but they cannot preserve clear facial contours when processing a face. Other methods cartoonize specific regions based on face key-point and segmentation information, but when the picture input by the user is relatively blurry the result is poor; for example, a face that cannot be detected leads to abnormal processing of the face region.
Therefore, the embodiment of the invention provides a portrait cartoonization method that combines super-resolution and cartoonization techniques to improve the cartoonization result. The method in the embodiment of the invention can be applied to a terminal, a server, or software running in a terminal or server. The terminal may be, but is not limited to, a tablet computer, a notebook computer, or a desktop computer. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain-name services, security services, CDNs, big data and artificial-intelligence platforms.
Referring to Fig. 1, an embodiment of the present invention provides a portrait cartoonization method, including:
S101, acquiring a portrait picture;
S102, preprocessing the portrait picture to obtain a preprocessed image;
S103, performing super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image;
S104, performing cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait.
In the embodiment of the invention, a portrait picture is first obtained; it may be a whole-body photo, a half-body photo, a face image, or the like. The portrait picture is then preprocessed — decoded, edge-filled and normalized — to obtain a preprocessed image. The preprocessed image is input into the pre-trained portrait super-resolution model for super-resolution denoising to obtain a super-resolution image. Sharpening the input image with the portrait super-resolution model removes noise and blurry regions, reducing their interference with the subsequent cartoonization model and improving the cartoonization effect. Finally, the super-resolution image is cartoonized through the pre-trained cartoonization model to obtain a cartoon portrait. Through data augmentation strategies such as scaling, rotation and edge filling during training, a cartoonization model trained on paired face data can be applied directly to whole-body image scenes, reducing the difficulty of building a whole-body cartoonization model. The trained model performs end-to-end cartoonization without depending on face key-point detection and segmentation models, converting any image uploaded by the target object into a cartoon image.
It should be noted that in the specific embodiments of the present application, whenever processing involves data related to the identity or characteristics of the target object — such as information about the target object, its behavior data, history data and position information — the permission or consent of the target object is obtained first, and the collection, use and processing of such data comply with the relevant laws, regulations and standards. In addition, when an embodiment of the application needs to acquire sensitive information of the target object, its independent permission or consent is obtained through a pop-up window, a jump to a confirmation page, or the like; only after that independent permission or consent has been explicitly obtained is the target-object-related data necessary for normal operation of the embodiment collected.
Further, as an optional embodiment, referring to Fig. 2, in step S102, preprocessing the portrait picture to obtain a preprocessed image includes:
S201, decoding the portrait picture to obtain image data;
S202, performing edge-filling processing on the image data to obtain a filled image;
S203, performing normalization processing on the filled image to obtain a preprocessed image.
In the embodiment of the invention, the portrait picture is decoded to obtain image data; for example, an encoder maps, quantizes and symbol-encodes the input image, and a decoder performs symbol decoding and inverse mapping. In addition, the embodiment of the invention records the image data as a numpy array; numpy is the foundation of numerical and scientific computing in the Python ecosystem, underlies many toolkits, and provides vector and matrix operations that help optimize the performance of quantitative analysis algorithms. Edge-filling processing is then performed on the image data to obtain a filled image. In one possible embodiment, it is determined whether the width and height of the input image are multiples of 32; if not, the input image is edge-filled to expand its width and height to multiples of 32. Finally, the filled image is normalized to obtain the preprocessed image: the pixel values of the image are normalized from the range 0-255 to the interval 0-1.
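The preprocessing described above (pad the height and width up to multiples of 32, then normalize pixel values to 0-1) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and the choice of edge-replication padding are assumptions.

```python
import numpy as np

def preprocess(image: np.ndarray):
    """Hypothetical sketch of step S102: edge-fill H and W up to multiples
    of 32, then normalize pixel values from [0, 255] to [0, 1].
    Returns the preprocessed image plus the (pad_h, pad_w) amounts so the
    filled region can be removed again after cartoonization."""
    h, w = image.shape[:2]
    pad_h = (32 - h % 32) % 32  # 0 when already a multiple of 32
    pad_w = (32 - w % 32) % 32
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    normalized = padded.astype(np.float32) / 255.0
    return normalized, (pad_h, pad_w)
```

For example, a 100x130 input is padded to 128x160 before being fed to the models, and the recorded (28, 30) padding is stripped from the final cartoon image.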
Further, as an optional embodiment, referring to Fig. 3, in step S103, performing super-resolution denoising on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image includes:
S301, inputting the preprocessed image into the pre-trained portrait super-resolution model;
S302, performing feature extraction on the preprocessed image through multiple depthwise-separable convolution layers to obtain multi-layer features;
S303, performing upsampling on the multi-layer features through a sub-pixel convolution layer to obtain a super-resolution image.
In the embodiment of the invention, super-resolution denoising is performed on the preprocessed image through the pre-trained portrait super-resolution model. The preprocessed image is input into the model, which comprises multiple depthwise-separable convolution layers and a sub-pixel convolution layer. A depthwise-separable convolution layer consists of a channel-wise (depthwise) convolution and a point-wise convolution and is used for feature extraction; the sub-pixel convolution layer rearranges the pixels of the input features and is a convolution layer with an upsampling function, applied in super-resolution reconstruction. Feature extraction is performed on the preprocessed image through the depthwise-separable convolutions to obtain multi-layer features, and the multi-layer features are upsampled through the sub-pixel convolution layer to obtain the super-resolution image. Referring to Fig. 4, the portrait super-resolution model is composed of a first ordinary convolution layer, multiple depthwise-separable convolution layers, a second ordinary convolution layer, and a sub-pixel convolution layer. By sharpening the input image with the portrait super-resolution model, noise and blurry regions are removed, their interference with the cartoonization model is reduced, and the subsequent cartoonization effect is improved.
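The structure just described (ordinary conv, a stack of depthwise-separable convs, a second ordinary conv, a sub-pixel upsampling layer) can be sketched in PyTorch. The patent does not state channel widths, block counts or the scale factor, so all of those are assumptions here (x2 upscaling, 32 channels, 4 blocks); this is an illustrative sketch, not the patented model.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) 3x3 conv followed by a pointwise 1x1 conv."""
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class PortraitSRModel(nn.Module):
    """Sketch of the described layout: first ordinary conv, multiple
    depthwise-separable convs, second ordinary conv, sub-pixel
    (PixelShuffle) upsampling. Widths/scale are assumed values."""
    def __init__(self, channels: int = 32, num_blocks: int = 4, scale: int = 2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[DepthwiseSeparableConv(channels)
                                    for _ in range(num_blocks)])
        # second ordinary conv emits scale^2 * 3 channels for pixel rearrangement
        self.tail = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)
        self.upsample = nn.PixelShuffle(scale)  # sub-pixel convolution layer

    def forward(self, x):
        return self.upsample(self.tail(self.body(self.head(x))))
```

A 3x64x64 input would come out as 3x128x128 under the assumed x2 scale; the PixelShuffle step is exactly the "pixel recombination" the text attributes to the sub-pixel convolution layer.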
Further, as an optional implementation, before the super-resolution denoising is performed on the preprocessed image through the pre-trained portrait super-resolution model, the method further includes pre-training the portrait super-resolution model, including:
acquiring a portrait training data set;
inputting the portrait training data set into the portrait super-resolution model to obtain a generated data set;
inputting the generated data set into a super-resolution discriminator network to obtain a discrimination result for the generated data;
and updating parameters of the portrait super-resolution model according to the discrimination result.
In the embodiment of the invention, the portrait super-resolution model is pre-trained before it is used for super-resolution denoising of the preprocessed image. Specifically, a portrait training data set is acquired and input into the portrait super-resolution model for training. The training data set can be built from high-definition portrait pictures captured with photographic equipment, or obtained from a corresponding open-source portrait training database. It contains paired portrait pictures, i.e. a low-definition picture and a high-definition picture, where the low-definition picture is obtained from the high-definition one by random JPEG compression, noise addition, blurring and scaling. The embodiment of the invention trains the model with a generative-adversarial-network method: the portrait training data set is input into the portrait super-resolution model to obtain a generated data set after super-resolution processing; the generated data set is input into a super-resolution discriminator network for discrimination, and the parameters of the portrait super-resolution model are updated according to the discrimination result. The super-resolution discriminator network may adopt a classical structure such as VGG16, VGG19 or ResNet.
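The paired-data construction above (degrade a high-definition portrait into its low-definition counterpart) can be sketched as follows. Blur kernel size, noise level and downscale method are all assumptions; the random JPEG-compression step mentioned in the text would require an image codec (e.g. Pillow) and is deliberately omitted from this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(hr: np.ndarray, scale: int = 2) -> np.ndarray:
    """Hypothetical sketch of producing a low-definition training pair from
    a high-definition portrait: blur -> additive noise -> downscale.
    (The text also applies random JPEG compression, omitted here.)"""
    img = hr.astype(np.float32)
    h, w = img.shape[:2]
    # simple 3x3 box blur via padded neighbor averaging
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(p[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    noisy = blurred + rng.normal(0.0, 5.0, blurred.shape)  # Gaussian noise
    lr = noisy[::scale, ::scale]                           # naive downscale
    return np.clip(lr, 0, 255).astype(np.uint8)
```

Each high-definition image then pairs with its degraded output as one (low-definition, high-definition) training sample.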
In the generative-adversarial training method, the portrait super-resolution model serves as the generator of the generative adversarial network; the data produced by the generator are discriminated by the discriminator, and the parameters of the discriminator and the generator are updated according to the discrimination result. Specifically, n high-resolution samples are drawn from the portrait training set, and the corresponding ordinary (low-resolution) portrait data are input into the portrait super-resolution model, i.e. the generator, to produce n generated samples. The generator G is first fixed and the discriminator D is trained to distinguish real from generated samples as well as possible. After the discriminator D has been updated k times in a loop, the generator G is updated once so that the discriminator can distinguish real from generated samples as little as possible. Here n and k are positive integers whose specific values can be determined according to the actual training process. Training the portrait super-resolution model with a generative adversarial network improves its super-resolution effect.
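The alternating schedule above (k discriminator updates with G fixed, then one generator update) can be sketched as a training step. The generator and discriminator here are tiny stand-ins — the text names VGG16/VGG19/ResNet for the discriminator and the super-resolution model for the generator — and the loss choice (binary cross-entropy) and learning rates are assumptions.

```python
import torch
import torch.nn as nn

# Tiny stand-in networks; in the patent these would be the portrait
# super-resolution model (generator) and a VGG/ResNet-style discriminator.
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
discriminator = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)

def train_round(lr_batch, hr_batch, k: int = 2):
    """One round of the described schedule: update D k times with G fixed,
    then update G once."""
    for _ in range(k):
        opt_d.zero_grad()
        fake = generator(lr_batch).detach()  # generator held fixed
        d_loss = (bce(discriminator(hr_batch), torch.ones(hr_batch.size(0), 1))
                  + bce(discriminator(fake), torch.zeros(fake.size(0), 1)))
        d_loss.backward()
        opt_d.step()
    opt_g.zero_grad()
    g_loss = bce(discriminator(generator(lr_batch)),
                 torch.ones(lr_batch.size(0), 1))  # G tries to fool D
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Note the `.detach()` during the discriminator phase, which is what "the generator G is fixed" amounts to in practice.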
Further, as an optional embodiment, referring to Fig. 5, in step S104, performing cartoonization on the super-resolution image through a pre-trained cartoonization model to obtain a cartoon portrait includes:
S401, inputting the super-resolution image into the pre-trained cartoonization model;
S402, performing feature extraction on the super-resolution image through an encoder to obtain a feature vector;
S403, converting the feature vector through a decoder to obtain the cartoon portrait.
In the embodiment of the invention, the super-resolution image is input into the pre-trained cartoonization model, which comprises an encoder and a decoder. Feature extraction is performed on the super-resolution image through the encoder to obtain a feature vector, and the feature vector is converted through the decoder to obtain the cartoon portrait. After the cartoonization is completed, the filled region is removed according to the edge-filling parameters recorded during the preprocessing of step S102, restoring the image to the size of the input image. Referring to Fig. 6, the cartoonization model in the embodiment of the invention uses a Unet-like encoder-decoder network architecture; the Unet structure is symmetric and resembles the letter U, hence the name. The encoder consists of five convolution blocks, and the decoder likewise consists of five convolution blocks, symmetric to the encoder. The encoder extracts features and pools them to reduce the feature dimensions; the decoder extracts features and upsamples them to restore the feature dimensions. At each level, the features extracted by the encoder and decoder convolution blocks of the same level are fused by concatenation.
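The U-shaped encoder-decoder with same-level concatenation can be sketched compactly in PyTorch. This is an illustrative miniature, not the patented network: channel widths are assumptions, each "convolution block" is reduced to a single conv+ReLU, and with five levels (four poolings) the input sides here need only be multiples of 16, whereas the text pads inputs to multiples of 32.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Sketch of the described Unet-like cartoonization model: a five-level
    encoder (conv blocks + pooling), a symmetric decoder (upsampling), and
    skip connections that concatenate same-level encoder features."""
    def __init__(self, widths=(16, 32, 64, 128, 256)):
        super().__init__()
        self.enc = nn.ModuleList()
        cin = 3
        for w in widths:
            self.enc.append(block(cin, w))
            cin = w
        self.pool = nn.MaxPool2d(2)
        ws = list(widths)
        # decoder block at each level consumes upsampled features + the skip
        self.dec = nn.ModuleList(block(ws[i] + ws[i - 1], ws[i - 1])
                                 for i in range(len(ws) - 1, 0, -1))
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.out = nn.Conv2d(ws[0], 3, 1)

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.enc):
            x = enc(x)
            if i < len(self.enc) - 1:   # bottom level has no pooling/skip
                skips.append(x)
                x = self.pool(x)
        for dec in self.dec:
            x = self.up(x)
            x = dec(torch.cat([x, skips.pop()], dim=1))  # same-level fusion
        return torch.sigmoid(self.out(x))
```

The output keeps the input's spatial size, matching the step where the padded cartoon image is cropped back to the original dimensions afterwards.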
Further, as an optional implementation, referring to Fig. 7, before the cartoonization is performed on the super-resolution image through the pre-trained cartoonization model to obtain the cartoon portrait, the method further includes pre-training the cartoonization model, specifically including:
S501, acquiring face image data, and performing data augmentation on the face image data to obtain cartoonization training data;
S502, performing cartoonization on the training data through the cartoonization model to obtain cartoon generation data;
S503, discriminating the cartoon generation data through a cartoon discriminator to obtain a cartoon discrimination result;
S504, updating parameters of the cartoonization model according to the cartoon discrimination result.
In the embodiment of the invention, before the super-resolution image is cartoonized by the pre-trained cartoon model to obtain the cartoon portrait, the cartoon model is first pre-trained. Face image data is acquired, and the cartoon model is constructed using a generative adversarial network training scheme. The training data are paired face-cartoon data, i.e., ordinary face images and their corresponding cartoon face images, and the same data augmentation processing is applied to each pair to obtain the cartoon training data. The cartoon training data is then cartoonized by the cartoon model to obtain cartoon generation data, and the cartoon generation data is identified by a cartoon identifier to obtain the cartoon identification result. The identifier may adopt a classical network structure such as VGG16, VGG19 or ResNet. In addition, the loss function used when training the cartoon model in the embodiment of the invention is:
L_total = 1 × L_MAE + 10 × L_Face + 0.05 × L_percep + 0.1 × L_GAN
where L_MAE is the mean absolute error loss between the generated cartoon image and the target cartoon image, L_Face is the mean absolute error loss over the face region of the generated and target cartoon images (the face region is obtained with the open-source BiSeNet face parsing model), L_percep is the perceptual loss, and L_GAN is the GAN (generator) loss.
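The weighted combination above is straightforward to express in code. In this sketch only the weights (1, 10, 0.05, 0.1) come from the text; the component loss values passed in are placeholder numbers, since computing the real MAE, face, perceptual, and GAN terms requires the trained networks.

```python
def total_loss(l_mae, l_face, l_percep, l_gan):
    """Weighted sum of the four loss components, with the weights from the text."""
    return 1.0 * l_mae + 10.0 * l_face + 0.05 * l_percep + 0.1 * l_gan

# placeholder component values, just to show the weighting in action
print(total_loss(0.2, 0.1, 1.0, 0.5))  # ~1.3
```

The weighting makes the face-region MAE dominate, which matches the document's emphasis on faithful face cartoonization.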
According to the embodiment of the invention, when the cartoon model is constructed, data augmentation strategies such as scaling, rotation and edge filling allow a cartoon model trained on paired face data to be applied directly to whole-body portrait scenes, reducing the difficulty of constructing a whole-body cartoonization model. Training yields an end-to-end cartoonization model that performs end-to-end processing without relying on face keypoint detection or segmentation models, so that any portrait picture uploaded by a user is converted into a cartoon image and the cartoonization effect is improved.
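The adversarial training cycle of steps S501-S504 can be sketched as follows. This is a hedged toy illustration, not the patented networks: the one-parameter "generator", one-parameter "critic", learning rate, and update rules are all assumptions; the point is the alternation of generate, discriminate, and parameter update on paired data.

```python
def generator(face, w):                 # toy "cartoonizer": scales pixel values
    return [w * p for p in face]

def discriminator(img, v):              # toy critic: a single weighted mean score
    return v * sum(img) / len(img)

def train_round(pairs, w, v, lr=0.01):
    for face, target in pairs:
        fake = generator(face, w)       # S502: cartoonize the training data
        score = discriminator(fake, v)  # S503: identify the generated data
        # S504: update parameters; the generator is nudged toward the paired
        # target and the critic's score is damped (toy update rule)
        grad_w = sum((f - t) * p for f, t, p in zip(fake, target, face)) / len(face)
        w -= lr * grad_w
        v -= lr * score
    return w, v

pairs = [([1.0, 2.0], [2.0, 4.0])]      # one toy paired face/cartoon sample
w, v = 1.0, 1.0
for _ in range(100):
    w, v = train_round(pairs, w, v)
print(round(w, 2))                       # w approaches the target mapping (2.0)
```

In the real system both players are deep networks (the U-Net-like generator and a VGG/ResNet-style identifier) and the generator loss is the weighted L_total above, but the control flow is the same.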
Further optionally, referring to FIG. 8, in step S501, performing data augmentation processing on the face image data to obtain cartoon training data includes:
S601, performing scaling and rotation processing on the face image data to obtain image augmentation data;
S602, performing edge filling processing on the image augmentation data to obtain cartoon training data.
In the embodiment of the invention, scaling and rotation are applied to the face image data to obtain image augmentation data, and edge filling is then applied to the image augmentation data. Scaling, rotation and edge filling simulate the positions a face may occupy in a real scene, and edge filling in particular lets the cartoon model learn the texture mapping for regions outside the face. Through this data augmentation, a cartoon model trained on paired face data can be applied directly to whole-body portrait scenes, which reduces the difficulty of constructing a whole-body cartoonization model and improves the cartoonization effect of the cartoon model.
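A minimal sketch of the scaling and edge-filling augmentation (S601-S602) is shown below. The scale factor, target canvas size, and replicate-edge padding mode are illustrative assumptions, and rotation is omitted here since it would normally come from an image-processing library.

```python
import numpy as np

def scale_nearest(img, factor):
    """Nearest-neighbour rescale of an HxWxC array (stand-in for S601 scaling)."""
    h, w = img.shape[:2]
    rows = (np.arange(int(h * factor)) / factor).astype(int)
    cols = (np.arange(int(w * factor)) / factor).astype(int)
    return img[rows][:, cols]

def pad_to(img, size):
    """Edge-fill the image symmetrically up to size x size (S602)."""
    h, w = img.shape[:2]
    top, left = (size - h) // 2, (size - w) // 2
    return np.pad(img, ((top, size - h - top), (left, size - w - left), (0, 0)),
                  mode="edge")  # replicate border pixels into the filled area

face = np.ones((64, 64, 3), dtype=np.uint8)
aug = pad_to(scale_nearest(face, 0.5), 128)
print(aug.shape)  # (128, 128, 3)
```

Shrinking the face and then filling the borders is what exposes the model to off-centre, small faces surrounded by non-face texture, mimicking whole-body framing.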
In one possible embodiment of the invention, the portrait picture uploaded by the user is first acquired and preprocessed; super-resolution processing is performed on the portrait through the pre-trained portrait super-resolution model to obtain a high-definition portrait picture; and the high-definition portrait picture is input into the cartoon model to obtain a cartoon image. In another possible embodiment, the portrait picture uploaded by the user is acquired, preprocessed, and input directly into the cartoon model to obtain a cartoon image. In yet another possible embodiment, a portrait video uploaded by the user is acquired and preprocessed frame by frame; super-resolution processing is performed on the video frames frame by frame through the pre-trained portrait super-resolution model to obtain high-definition portrait frames; the high-definition frames are input into the cartoon model to obtain cartoon image frames; and the cartoon image frames are written out in order and encoded into a video to obtain the cartoon video result. In a further possible embodiment, a portrait video uploaded by the user is acquired and preprocessed frame by frame, the video frames are input directly into the cartoon model to obtain cartoon image frames, and the cartoon image frames are written out in order and encoded into a video to obtain the cartoon video result.
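The frame-by-frame video embodiments above share one control flow, sketched here with stand-in functions. All three stages are placeholders (a real pipeline would call the trained models and a video codec library for decoding/encoding); only the ordering and the optional super-resolution branch reflect the text.

```python
def preprocess(frame):      # stand-in for decoding + edge filling + normalisation
    return frame

def super_resolve(frame):   # stand-in for the portrait super-resolution model
    return frame * 2

def cartoonize(frame):      # stand-in for the cartoonization model
    return frame + 1

def cartoonize_video(frames, use_sr=True):
    out = []
    for frame in frames:                # process strictly in frame order
        x = preprocess(frame)
        if use_sr:                      # the SR stage is optional per embodiment
            x = super_resolve(x)
        out.append(cartoonize(x))
    return out                          # stand-in for writing/encoding the video

print(cartoonize_video([1, 2, 3]))      # [3, 5, 7]
```

Processing frames strictly in order is what lets the cartoon frames be written and encoded sequentially into the output video.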
On the other hand, referring to fig. 9, the embodiment of the invention further provides a portrait animation processing device, which includes:
a first module 901, configured to obtain a portrait picture;
a second module 902, configured to perform preprocessing on the portrait image to obtain a preprocessed image;
a third module 903, configured to perform super-resolution denoising processing on the preprocessed image through a pre-trained portrait super-resolution model, so as to obtain a super-resolution image;
And a fourth module 904, configured to perform animation processing on the super-resolution image through a pre-trained animation model, so as to obtain an animation human image.
It can be understood that the content of the above embodiments of the portrait cartoonization method is applicable to this device embodiment: the functions realized by the device embodiment are the same as those of the method embodiments, and the beneficial effects achieved by the device embodiment are the same as those achieved by the method embodiments.
Referring to fig. 10, an embodiment of the present invention further provides an electronic device including a processor 1002 and a memory 1001; the memory is used for storing programs; the processor executes the program to implement the method as described above.
It can be understood that the content of the above embodiments of the portrait cartoonization method is applicable to this electronic device embodiment: the functions realized by the electronic device embodiment are the same as those of the method embodiments, and the beneficial effects achieved by the electronic device embodiment are the same as those achieved by the method embodiments.
Corresponding to the method of fig. 1, an embodiment of the present invention also provides a computer-readable storage medium storing a program to be executed by a processor to implement the method as described above.
Similarly, the content of the embodiments of the portrait cartoonization method is applicable to this computer-readable storage medium embodiment: the functions realized by the storage medium embodiment are the same as those of the method embodiments, and the beneficial effects achieved by the storage medium embodiment are the same as those achieved by the method embodiments.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the method shown in fig. 1.
Similarly, the content of the embodiments of the portrait cartoonization method is applicable to this computer program product or computer program embodiment: the functions realized by the computer program product or computer program embodiment are the same as those of the method embodiments, and the beneficial effects achieved are the same as those achieved by the method embodiments.
In summary, the embodiments of the invention have the following advantages. The super-resolution technology is combined with the cartoonization technology: the input image is sharpened by super-resolution to remove its noise and blurred areas, reducing their interference with the cartoon model and improving the cartoonization effect. In constructing the cartoon model, data augmentation strategies such as scaling, rotation and edge filling allow a cartoon model trained on paired face data to be applied directly to whole-body portrait scenes, reducing the difficulty of constructing a whole-body cartoonization model. The end-to-end cartoonization model provided by the embodiments of the invention performs end-to-end processing without relying on face keypoint detection or segmentation models, so that any image uploaded by a user is converted into a cartoon image and the cartoonization effect is improved.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, the steps may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and these equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.

Claims (9)

1. A method for processing a portrait animation, the method comprising:
Acquiring a portrait picture;
preprocessing the portrait picture to obtain a preprocessed image;
Performing super-resolution denoising processing on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image;
Performing cartoon treatment on the super-resolution image through a pre-trained cartoon model to obtain a cartoon figure;
wherein the performing super-resolution denoising processing on the preprocessed image through the pre-trained portrait super-resolution model to obtain a super-resolution image comprises:
inputting the preprocessed image into the pre-trained portrait super-resolution model;
performing feature extraction processing on the preprocessed image through multi-layer depth separable convolution to obtain multi-layer features;
and performing up-sampling processing on the multi-layer features through a sub-pixel convolution layer to obtain the super-resolution image.
2. The method of claim 1, wherein preprocessing the portrait picture to obtain a preprocessed image comprises:
decoding the portrait picture to obtain image data;
Performing edge filling processing on the image data to obtain a filled image;
and carrying out normalization processing on the filling image to obtain a preprocessed image.
3. The method of claim 1, wherein prior to the super-resolution denoising of the preprocessed image by the pre-trained portrait super-resolution model, the method further comprises pre-training the portrait super-resolution model, comprising:
Acquiring a portrait training data set;
Inputting the portrait training data set into the portrait super-resolution model to obtain a generated data set;
inputting the generated data set into a super-resolution discriminator network to obtain a generated data discrimination result;
and updating parameters of the portrait super-resolution model according to the generated data identification result.
4. The method according to claim 1, wherein said performing a cartoon process on said super-resolution image by means of a pre-trained cartoon model to obtain a cartoon figure comprises:
inputting the super-resolution image into the pre-trained cartoon model;
performing feature extraction processing on the super-resolution image through an encoder to obtain a feature vector;
and converting the feature vector through a decoder to obtain the cartoon figure.
5. The method according to claim 1, wherein before the performing the cartoon processing on the super-resolution image through the pre-trained cartoon model to obtain a cartoon figure, the method further includes pre-training the cartoon model, specifically including:
Acquiring face image data, and performing data augmentation processing on the face image data to obtain cartoon training data;
performing cartoon treatment on the cartoon training data through the cartoon model to obtain cartoon generation data;
performing identification processing on the cartoon generation data through a cartoon identifier to obtain a cartoon identification result;
And updating parameters of the cartoon model according to the cartoon identification result.
6. The method of claim 5, wherein the performing data augmentation processing on the face image data to obtain cartoon training data comprises:
scaling and rotating the face image data to obtain image augmentation data;
And performing edge filling processing on the image augmentation data to obtain cartoon training data.
7. A portrait animation processing device, the device comprising:
the first module is used for acquiring a portrait picture;
the second module is used for preprocessing the portrait picture to obtain a preprocessed image;
the third module is used for performing super-resolution denoising processing on the preprocessed image through a pre-trained portrait super-resolution model to obtain a super-resolution image;
A fourth module, configured to perform cartoon processing on the super-resolution image through a pre-trained cartoon model, so as to obtain a cartoon figure;
wherein the third module is configured to perform the super-resolution denoising processing on the preprocessed image through the pre-trained portrait super-resolution model to obtain the super-resolution image by:
inputting the preprocessed image into the pre-trained portrait super-resolution model;
performing feature extraction processing on the preprocessed image through multi-layer depth separable convolution to obtain multi-layer features;
and performing up-sampling processing on the multi-layer features through a sub-pixel convolution layer to obtain the super-resolution image.
8. An electronic device comprising a memory and a processor;
the memory is used for storing programs;
The processor executing the program implements the method of any one of claims 1 to 6.
9. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 6.
CN202310854234.1A 2023-07-12 2023-07-12 Portrait cartoon processing method, device, equipment and storage medium Active CN116912345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310854234.1A CN116912345B (en) 2023-07-12 2023-07-12 Portrait cartoon processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116912345A CN116912345A (en) 2023-10-20
CN116912345B (en) 2024-04-26


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537628A (en) * 2015-01-16 2015-04-22 厦门大学 Animation image adaptation method with controllable redundancy
CN207010728U (en) * 2017-08-08 2018-02-13 灵然创智(天津)动画科技发展有限公司 A kind of long-distance cloud caricature producing device for supporting 2048 grades of pressure sensitivity
CN108564127A (en) * 2018-04-19 2018-09-21 腾讯科技(深圳)有限公司 Image conversion method, device, computer equipment and storage medium
CN109859295A (en) * 2019-02-01 2019-06-07 厦门大学 A kind of specific animation human face generating method, terminal device and storage medium
CN111583109A (en) * 2020-04-23 2020-08-25 华南理工大学 Image super-resolution method based on generation countermeasure network
WO2020199478A1 (en) * 2019-04-03 2020-10-08 平安科技(深圳)有限公司 Method for training image generation model, image generation method, device and apparatus, and storage medium
WO2022116161A1 (en) * 2020-12-04 2022-06-09 深圳市优必选科技股份有限公司 Portrait cartooning method, robot, and storage medium
CN115115510A (en) * 2022-05-31 2022-09-27 腾讯科技(深圳)有限公司 Image processing method, system, storage medium and terminal equipment
CN116152631A (en) * 2023-02-28 2023-05-23 商汤国际私人有限公司 Model training and image processing method, device, equipment and storage medium
CN116228528A (en) * 2023-03-15 2023-06-06 成都龙渊网络科技有限公司 Cartoon head portrait generation method and system


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Large-scale propagation of ultrasound in a 3-D breast model based on high-resolution MRI data; Salahura Gheorghe et al.; IEEE Transactions on Biomedical Engineering; Vol. 57, No. 6; 73-84 *
Edge sharpening of remote sensing images based on residual minimization; Yang Xinyu et al.; Remote Sensing Information; Vol. 36, No. 4; 142-150 *
Automatic hair extraction method for personalized face animation generation; Shen Yehu, Mo Rui, Gao Wei, Wei Lei, Zhu Yi, Peng Zhenyun; Journal of Computer-Aided Design & Computer Graphics; No. 11; 34-40 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant