CN111899154A - Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium - Google Patents

Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium

Info

Publication number
CN111899154A
Authority
CN
China
Prior art keywords
picture
network
cartoon
loss
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010590178.1A
Other languages
Chinese (zh)
Inventor
雷杨
刘鹏
黄跃中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Mengying Animation Network Technology Co ltd
Original Assignee
Guangzhou Mengying Animation Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Mengying Animation Network Technology Co ltd
Priority to CN202010590178.1A
Publication of CN111899154A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/001 - Texturing; Colouring; Generation of texture or colour
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a cartoon video generation method, a cartoon generation device, cartoon generation equipment and a cartoon generation medium. The cartoon video generation method comprises the following steps: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from an actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network; training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and converting the actual picture through the trained generator network to complete generation of the cartoon. The converted cartoon picture can be subjected to video processing to obtain a cartoon video, and through the adversarial training process between the generator network and the discriminator network, the noise of the converted cartoon picture is reduced and the conversion effect is improved.

Description

Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium
Technical Field
The present invention relates to electronic technologies, and in particular, to a cartoon video generation method, a cartoon generation device, cartoon generation equipment and a cartoon generation medium.
Background
With the rise and spread of short videos, users increasingly want to shoot short videos in a variety of styles, among which cartoon-style short videos are particularly popular. At present, cartoon-style short videos suffer from problems such as poor generation quality and results that are not close enough to a real cartoon effect.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide an efficient cartoon generation method that solves the problem of creating cartoon-style short videos.
To achieve the above and other related objects, the present invention provides a cartoon generation method, comprising: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from an actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network; training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and converting the actual picture through the trained generator network to complete generation of the cartoon.
Optionally, the generator network includes a first generator sub-network for generating a converted cartoon picture from the actual picture and a second generator sub-network for converting the converted cartoon picture back into a converted actual picture, and the discriminator network includes a first discriminator sub-network for discriminating whether the converted cartoon picture is a true cartoon and a second discriminator sub-network for discriminating whether the converted actual picture is a true actual picture.
Optionally, the processing procedure of the generator network includes: mirror padding, convolution, activation, normalization and residual processing.
Optionally, the processing procedure of the discriminator network includes: mirror padding, convolution, activation and normalization.
Optionally, the second adversarial loss comprises:

l_ab = (1 - Db(Target))² + (Db(Gab(Real)))²

l_ba = (1 - Da(Real))² + (Da(Gba(Target)))²

wherein l_ab and l_ba respectively represent the discrimination capability of the first discriminator sub-network Da and the second discriminator sub-network Db on the converted cartoon picture and the converted actual picture, Real represents the actual picture, Target represents the cartoon picture, and Gab and Gba represent the first and second generator sub-networks respectively.
Optionally, the second adversarial loss includes a variation loss, and the variation loss comprises:

l_tvab = γ(‖∇x(Gab(Real))‖₁ + ‖∇y(Gab(Real))‖₁)

l_tvba = γ(‖∇x(Gba(Target))‖₁ + ‖∇y(Gba(Target))‖₁)

wherein l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇x and ∇y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network Gab and the second generator sub-network Gba respectively, Real denotes the actual picture, Target denotes the cartoon picture, and γ is a constant.
Optionally, the second adversarial loss includes a content loss, and the content loss comprises:

l_content = ‖VGG_n(Gab(Real)) - VGG_n(Real)‖₁

wherein l_content represents the content loss, VGG_n represents the first n layers of the VGG neural network used as a feature extraction network, ‖·‖₁ denotes l1 regularization, and Real represents the actual picture.
Optionally, the second adversarial loss includes a style loss, and the style loss comprises:

l_style = ‖VGG_n(Gba(Target)) - VGG_n(Target)‖₁

wherein l_style represents the style loss, VGG_n represents the first n layers of the VGG neural network used as a feature extraction network, ‖·‖₁ denotes l1 regularization, and Target represents the cartoon picture.
Optionally, the second adversarial loss further comprises a consistency loss, the consistency loss comprising:

l_target = ‖Gab(Target) - Target‖₁

l_Real = ‖Gba(Real) - Real‖₁

wherein l_target represents the consistency loss of the converted cartoon picture, l_Real represents the consistency loss of the actual picture, ‖·‖₁ denotes l1 regularization, Real represents the actual picture, and Target represents the cartoon picture.
Optionally, the second adversarial loss further includes a loopback loss, and the loopback loss comprises:

l_cycle = ‖Gab(Gba(Target)) - Target‖₁

wherein l_cycle represents the loopback loss, Target represents the cartoon picture, ‖·‖₁ denotes l1 regularization, and Gab and Gba respectively represent the first and second generator sub-networks.
Optionally, the second adversarial loss further comprises an attention loss, the attention loss comprising:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} ‖Real(t) - Gba(Gab(Real))(t)‖₁

wherein l_att represents the attention loss; EyesPatch, NosePatch and LipsPatch represent the eye, nose and lip regions detected in the actual picture, respectively; Real(t) represents extracting the corresponding image region from the actual picture; and Gba(Gab(Real))(t) means that the actual picture is first input into the first generator sub-network to generate a converted cartoon picture, the converted cartoon picture is then converted into a converted actual picture through the second generator sub-network, and the region corresponding to t is extracted from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch.
Optionally, the first adversarial loss comprises a generator adversarial loss:

l_advab = (1 - Db(Gab(Real)))²

l_advba = (1 - Da(Gba(Target)))²

wherein l_advab represents the first generator loss, l_advba represents the second generator loss, Da represents the first discriminator sub-network, Db represents the second discriminator sub-network, Gab and Gba represent the first and second generator sub-networks respectively, Real represents the actual picture, and Target represents the cartoon picture.
Optionally, the first adversarial loss comprises:

l_G = l_advab + l_advba + α1(l_att + l_cycle) + α2(l_tvab + l_tvba) + α3(l_content + l_style) + α4(l_target + l_Real)

wherein α1, α2, α3 and α4 represent weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content represents the content loss, l_style represents the style loss, l_target represents the consistency loss of the converted cartoon picture, l_Real represents the consistency loss of the actual picture, l_cycle represents the loopback loss, and l_att represents the attention loss.
A cartoon video generation method comprises the following steps: generating a converted cartoon picture from the actual picture through generation processing; performing video processing on the converted cartoon picture to obtain a cartoon video, wherein the video processing comprises one of the following: adding music, adding subtitles, storyboarding, transitions, splicing and rendering.
A cartoon generation apparatus comprising: a training module, configured to obtain a converted cartoon picture from an actual picture through training processing, the training processing comprising: generating a converted cartoon picture from an actual picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and a conversion module, configured to convert the actual picture through the trained generator network to complete generation of the cartoon.
An apparatus, comprising: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as described in one or more of the above.
One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform one or more of the methods described.
As described above, the cartoon video generation method, the cartoon generation device, the cartoon generation equipment and the cartoon generation medium provided by the invention have the following beneficial effects:
The generator network converts the actual picture into a converted cartoon picture with a specific cartoon style. The discriminator network judges whether the generated converted cartoon picture is a true cartoon. If the discrimination result is not a true cartoon, the generator network continuously improves its generation capability during training, so that the probability that the converted cartoon picture is discriminated as true becomes higher and higher; meanwhile, the discriminator network continuously strengthens its discrimination capability, so that it becomes harder and harder for the generator network to pass off its output as a true cartoon. Through this adversarial training process between the generator network and the discriminator network, the generation capability of the generator network becomes stronger and stronger until it can produce a high-quality cartoon effect, the noise of the converted cartoon picture is reduced, and the conversion effect is improved.
Drawings
Fig. 1 is a schematic flow chart of a cartoon video generation method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a generator network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a discriminator network structure according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a cartoon video generation apparatus according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to another embodiment.
Description of the element reference numerals
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing assembly
1201 second processor
1202 second memory
1203 communication assembly
1204 Power supply Assembly
1205 multimedia assembly
1206 voice assembly
1207 input/output interface
1208 sensor assembly
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided with the following embodiments only illustrate the basic idea of the present invention schematically: they show only the components related to the invention, rather than the number, shape and size of the components in an actual implementation, where the type, quantity and proportion of the components may vary freely and the layout may be more complicated.
Referring to fig. 1, the present invention provides a cartoon generation method, including:
S1: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from an actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network;
S2: training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
S3: converting the actual picture through the trained generator network to complete generation of the cartoon. During training, the generator network converts the actual picture into a converted cartoon picture with a specific cartoon style. The discriminator network judges whether the generated converted cartoon picture is a true cartoon. If the discrimination result is not a true cartoon, the generator network continuously improves its generation capability during training, so that the probability that the converted cartoon picture is discriminated as true becomes higher and higher; meanwhile, the discriminator network continuously strengthens its discrimination capability, so that it becomes harder and harder for the generator network to pass off its output as a true cartoon. Through this adversarial training process, the generation capability of the generator network becomes stronger and stronger until it can produce a nearly true cartoon effect, the noise of the converted cartoon picture is reduced, and the conversion effect is improved.
In some implementations, the generator network comprises a first generator sub-network Gab for generating a converted cartoon picture from the actual picture and a second generator sub-network Gba for converting the converted cartoon picture into a converted actual picture, and the discriminator network comprises a first discriminator sub-network Da for discriminating whether the converted cartoon picture is a true cartoon and a second discriminator sub-network Db for discriminating whether the converted actual picture is a true actual picture. For example, the generator network and the discriminator network form a CycleGAN-like framework, where Gab and Gba use the same generator network structure, and Da and Db use the same discriminator network structure.
Referring to fig. 2, the processing of the generator network includes: mirror padding, convolution, activation, normalization and residual processing. Referring to fig. 3, the processing of the discriminator network includes: mirror padding, convolution, activation and normalization.
ReflectionPad2d(padding_size): a standard mirror-padding neuron; for 2-dimensional input data, a band of width padding_size near the edge is mirrored and filled around the periphery of the input;
Conv2D(in_channels, out_channels, kernel_size, stride, padding_size): a standard 2-dimensional convolution neuron that convolves data with in_channels input channels using a kernel of size kernel_size × kernel_size and outputs out_channels result channels; stride is the convolution step, i.e. the convolution is applied every stride points of the input, and padding_size is the width of zero padding added to the edges of the input;
IN: a standard Instance Normalization neuron, which normalizes each channel of the data independently;
ReLU: a neuron using the linear rectification function as its activation function;
ConvTranspose2D(in_channels, out_channels, kernel_size, stride, padding_size, dilation): a standard 2-dimensional transposed-convolution neuron that progressively enlarges smaller input data through deconvolution to output data of a larger size, where in_channels is the number of input channels, out_channels is the number of output channels, kernel_size is the kernel size, stride is the convolution step, padding_size is the width of zero padding added to the edges of the input, and dilation is the number of zeros inserted between the elements of the convolution kernel;
BN: Batch Normalization, which normalizes a batch of data at the same time;
ResidualBlock(8): 8 residual block networks connected in series; each residual block has the same structure, comprising two Conv2D + BN operations, and can be expressed by the following formula:
ResidualBlockOut = Conv2D(BN(Conv2D(BN(input)))) + input
wherein input is the input data and ResidualBlockOut is the output of each residual block network;
Sigmoid: a network layer with the Sigmoid function as its activation function;
LeakyReLU: a standard network layer using the LeakyReLU (leaky linear rectification) function as its activation function.
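As a concrete illustration, the following is a minimal PyTorch sketch of the two network structures built from the neurons listed above. The channel widths, kernel sizes and strides are assumptions in the style of CycleGAN-type networks, since figs. 2 and 3 are not reproduced here; only the layer types, their ordering and the 8-residual-block structure come from the description.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two Conv2D + BN operations with a skip connection, matching
    # ResidualBlockOut = Conv2D(BN(Conv2D(BN(input)))) + input.
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        return self.body(x) + x

class Generator(nn.Module):
    # Mirror padding -> convolutions with IN/ReLU -> 8 residual blocks ->
    # transposed convolutions back to image size. Channel widths are
    # illustrative assumptions, not taken from the patent figures.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ReflectionPad2d(3),
            nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=0),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),
            nn.InstanceNorm2d(256), nn.ReLU(inplace=True),
            *[ResidualBlock(256) for _ in range(8)],
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(3),
            nn.Conv2d(64, 3, kernel_size=7, stride=1, padding=0),
            nn.Sigmoid(),  # Sigmoid output layer, per the listed neurons
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    # Mirror padding, strided convolutions, LeakyReLU and instance
    # normalization; outputs a per-patch realness score.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(3, 64, 4, stride=2, padding=0),
            nn.LeakyReLU(0.2, inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(64, 128, 4, stride=2, padding=0),
            nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(128, 256, 4, stride=2, padding=0),
            nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)
```

In this sketch Gab and Gba would each be an instance of Generator, and Da and Db instances of Discriminator, consistent with the CycleGAN-like framework sharing one structure per role.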
In some implementations, the second adversarial loss comprises:

l_ab = (1 - Db(Target))² + (Db(Gab(Real)))²

l_ba = (1 - Da(Real))² + (Da(Gba(Target)))²

wherein l_ab and l_ba respectively represent the discrimination capability of the first discriminator sub-network Da and the second discriminator sub-network Db on the converted cartoon picture and the converted actual picture, Real represents the actual picture, Target represents the cartoon picture, and Gab and Gba represent the first and second generator sub-networks respectively. The stronger the discrimination ability of Da and Db, the smaller the losses l_ab and l_ba; l_ab and l_ba can therefore be used to update the learning parameters of Db and Da, respectively.
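A minimal sketch of these discriminator losses, assuming the least-squares (LSGAN-style) targets written above; the .detach() calls and the mean reduction over discriminator outputs are implementation assumptions, not stated in the patent:

```python
def discriminator_losses(Da, Db, Gab, Gba, real, target):
    # l_ab = (1 - Db(Target))^2 + (Db(Gab(Real)))^2  -- trains Db
    # l_ba = (1 - Da(Real))^2  + (Da(Gba(Target)))^2 -- trains Da
    # detach() stops gradients from flowing into the generators here.
    l_ab = ((1 - Db(target)) ** 2).mean() + (Db(Gab(real).detach()) ** 2).mean()
    l_ba = ((1 - Da(real)) ** 2).mean() + (Da(Gba(target).detach()) ** 2).mean()
    return l_ab, l_ba
```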
In some implementations, the second adversarial loss comprises a variation loss:

l_tvab = γ(‖∇x(Gab(Real))‖₁ + ‖∇y(Gab(Real))‖₁)

l_tvba = γ(‖∇x(Gba(Target))‖₁ + ‖∇y(Gba(Target))‖₁)

wherein l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇x and ∇y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network Gab and the second generator sub-network Gba respectively, Real denotes the actual picture, Target denotes the cartoon picture, and γ is a constant. A smaller variation loss is better: it means the gradient changes in the x and y directions of the generated converted cartoon picture are small. When the variation loss is large, the fluctuation between the pixels of the generated picture is large and the picture is noisy. If the variation loss were driven all the way to 0, the generated converted cartoon picture would become a flat solid color; to balance this negative effect, the algorithm sets the hyper-parameter γ to suppress the negative effect of the variation term.
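A sketch of this variation loss using finite differences as the image gradients; the l1 form of the norm and the default value of gamma are assumptions:

```python
def tv_loss(img, gamma=1e-4):
    # Mean absolute finite differences in x and y, scaled by the
    # suppression hyper-parameter gamma (the value here is a placeholder).
    dx = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    dy = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    return gamma * (dx + dy)

# l_tvab = tv_loss(Gab(real)); l_tvba = tv_loss(Gba(target))
```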
In some implementations, the second adversarial loss includes a content loss:

l_content = ‖VGG_n(Gab(Real)) - VGG_n(Real)‖₁

wherein l_content represents the content loss, VGG_n represents the first n layers of the VGG neural network used as a feature extraction network, ‖·‖₁ denotes l1 regularization, and Real represents the actual picture. The content loss measures the difference between the features of the actual picture and those of the generated converted cartoon picture after extraction through the VGG network; the smaller the content loss, the closer the content of the converted cartoon picture is to the actual picture.
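A sketch of the content loss using torchvision's pre-trained VGG16 as the feature extractor; the cut-off at layer 16 (roughly relu3_3) is an assumption, since the patent only says "the first n layers of the VGG neural network":

```python
import torchvision

# Frozen feature extractor: the first n layers of a pre-trained VGG16.
# (Input normalization to ImageNet statistics is omitted for brevity.)
vgg_n = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_n.parameters():
    p.requires_grad_(False)

def content_loss(Gab, real):
    # l_content = || VGG_n(Gab(Real)) - VGG_n(Real) ||_1
    return (vgg_n(Gab(real)) - vgg_n(real)).abs().mean()
```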
In some implementations, the second adversarial loss includes a style loss:

l_style = ‖VGG_n(Gba(Target)) - VGG_n(Target)‖₁

wherein l_style represents the style loss, VGG_n represents the first n layers of the VGG neural network used as a feature extraction network, ‖·‖₁ denotes l1 regularization, and Target represents the cartoon picture. The style loss reflects the style consistency between the cartoon picture and the converted actual picture; the style features are extracted through the VGG neural network, and the smaller the style loss, the stronger the style-preserving effect of Gba.
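The corresponding sketch, reusing the vgg_n extractor defined above; note that, as written in the patent, this is an l1 feature match rather than a Gram-matrix style loss:

```python
def style_loss(Gba, target):
    # l_style = || VGG_n(Gba(Target)) - VGG_n(Target) ||_1
    return (vgg_n(Gba(target)) - vgg_n(target)).abs().mean()
```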
In some implementations, the second adversarial loss further includes a consistency loss:

l_target = ‖Gab(Target) - Target‖₁

l_Real = ‖Gba(Real) - Real‖₁

wherein l_target represents the consistency loss of the converted cartoon picture, l_Real represents the consistency loss of the actual picture, ‖·‖₁ denotes l1 regularization, Real represents the actual picture, and Target represents the cartoon picture. Gab (which generates a converted cartoon picture from an actual picture) should leave a cartoon unchanged when given a cartoon, without generating other content; likewise, Gba (which generates a converted actual picture from a cartoon picture) should leave an actual picture unchanged when given an actual picture. l_target and l_Real ensure this consistency of Gab and Gba on the cartoon picture and the actual picture.
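A sketch of these identity-style consistency terms:

```python
def identity_losses(Gab, Gba, real, target):
    # l_target = || Gab(Target) - Target ||_1 : a cartoon fed to Gab
    # should come back as the same cartoon.
    # l_Real   = || Gba(Real) - Real ||_1 : a photo fed to Gba should
    # come back as the same photo.
    l_target = (Gab(target) - target).abs().mean()
    l_real = (Gba(real) - real).abs().mean()
    return l_target, l_real
```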
In some implementations, the second adversarial loss further includes a loopback loss:

l_cycle = ‖Gab(Gba(Target)) - Target‖₁

wherein l_cycle represents the loopback loss, Target represents the cartoon picture, ‖·‖₁ denotes l1 regularization, and Gab and Gba respectively represent the first and second generator sub-networks. The loopback loss expresses that a cartoon picture converted into a converted actual picture can still be converted back into the original cartoon picture; thanks to the loopback loss, the generator network can fully learn the distribution relationship between real pictures and cartoon pictures.
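A sketch of the loopback (cycle-consistency) term as reconstructed above:

```python
def cycle_loss(Gab, Gba, target):
    # l_cycle = || Gab(Gba(Target)) - Target ||_1 :
    # cartoon -> photo -> cartoon must return to the starting cartoon.
    return (Gab(Gba(target)) - target).abs().mean()
```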
In some implementations, the second adversarial loss further includes an attention loss:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} ‖Real(t) - Gba(Gab(Real))(t)‖₁

wherein l_att represents the attention loss; EyesPatch, NosePatch and LipsPatch represent the eye, nose and lip regions detected in the actual picture, respectively; Real(t) represents extracting the corresponding image region from the actual picture; and Gba(Gab(Real))(t) means that the actual picture is first input into the first generator sub-network to generate a converted cartoon picture, the converted cartoon picture is then converted into a converted actual picture through the second generator sub-network, and the region corresponding to t is extracted from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch. The facial-feature regions are detected by a face detection algorithm at each training step, and the facial-organ image patches are used as targets of the loss function, so that the generated cartoon preserves the facial features of the person as much as possible, improving the feature and style effects of the converted cartoon's facial features.
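A sketch of the attention term; detect_face_patches is a hypothetical helper standing in for the unspecified face detection algorithm, assumed to return (row-slice, column-slice) pairs for the eye, nose and lip regions:

```python
def attention_loss(Gab, Gba, real, detect_face_patches):
    # Sum of l1 differences between the original picture and its
    # round-trip reconstruction over the detected facial patches.
    recon = Gba(Gab(real))
    loss = real.new_zeros(())
    for rows, cols in detect_face_patches(real):  # EyesPatch, NosePatch, LipsPatch
        loss = loss + (real[..., rows, cols] - recon[..., rows, cols]).abs().mean()
    return loss
```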
In some implementations, the first adversarial loss includes a generator adversarial loss:

l_advab = (1 - Db(Gab(Real)))²

l_advba = (1 - Da(Gba(Target)))²

wherein l_advab represents the first generator loss, l_advba represents the second generator loss, Da represents the first discriminator sub-network, Db represents the second discriminator sub-network, Gab and Gba represent the first and second generator sub-networks respectively, Real represents the actual picture, and Target represents the cartoon picture.
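The generator-side counterpart of the discriminator losses sketched earlier, again assuming least-squares targets and mean reduction:

```python
def generator_adv_losses(Da, Db, Gab, Gba, real, target):
    # Each generator tries to drive its discriminator's output to 1.
    l_advab = ((1 - Db(Gab(real))) ** 2).mean()
    l_advba = ((1 - Da(Gba(target))) ** 2).mean()
    return l_advab, l_advba
```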
In some implementations, the first adversarial loss comprises:

l_G = l_advab + l_advba + α1(l_att + l_cycle) + α2(l_tvab + l_tvba) + α3(l_content + l_style) + α4(l_target + l_Real)

wherein α1, α2, α3 and α4 represent weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content represents the content loss, l_style represents the style loss, l_target represents the consistency loss of the converted cartoon picture, l_Real represents the consistency loss of the actual picture, l_cycle represents the loopback loss, and l_att represents the attention loss. By integrating these loss functions, the trained generator network takes into account all the factors that influence the cartoon generation effect: after training, the generated converted cartoon picture fuses the characteristics of cartoon pictures while retaining the content of the original actual picture. Meanwhile, an attention mechanism is applied to the face picture so that the facial features are well preserved in the generated cartoon, which addresses the tendency of generic cartoon-face generation algorithms to deform the face; in addition, the improved variation loss function prevents the generated picture from containing large amounts of noise.
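Putting the pieces together, a sketch of the total generator objective l_G composed from the helpers sketched above; the default weights of 1.0 are placeholders, since the patent does not fix α1 through α4:

```python
def generator_total_loss(Da, Db, Gab, Gba, real, target,
                         detect_face_patches,
                         a1=1.0, a2=1.0, a3=1.0, a4=1.0):
    # l_G = l_advab + l_advba + a1(l_att + l_cycle)
    #       + a2(l_tvab + l_tvba) + a3(l_content + l_style)
    #       + a4(l_target + l_Real)
    l_advab, l_advba = generator_adv_losses(Da, Db, Gab, Gba, real, target)
    l_att = attention_loss(Gab, Gba, real, detect_face_patches)
    l_cyc = cycle_loss(Gab, Gba, target)
    l_tvab, l_tvba = tv_loss(Gab(real)), tv_loss(Gba(target))
    l_cont, l_sty = content_loss(Gab, real), style_loss(Gba, target)
    l_tgt, l_real = identity_losses(Gab, Gba, real, target)
    return (l_advab + l_advba + a1 * (l_att + l_cyc)
            + a2 * (l_tvab + l_tvba) + a3 * (l_cont + l_sty)
            + a4 * (l_tgt + l_real))
```

In training, this objective would be minimized for the generators alternately with the discriminator losses sketched earlier.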
The invention also provides a cartoon video generation method, which comprises the following steps:
generating a converted cartoon picture from the actual picture through the generation processing described above;
performing video processing on the converted cartoon picture to obtain a cartoon video, wherein the video processing comprises one of the following: adding music, adding subtitles, storyboarding, transitions, splicing and rendering. For example, in a short-video or online social application scene, a user takes or uploads a photo, the actual picture is processed to generate a converted cartoon picture whose style has been transferred, and a cartoon video in that style is produced through video processing, improving the fun of short videos and the user experience.
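As a minimal sketch of the splicing/rendering step, assuming OpenCV and frames already converted by the generator; music, subtitles, storyboarding and transitions would be layered on by downstream editing:

```python
import cv2

def frames_to_cartoon_video(frames, path="cartoon.mp4", fps=25):
    # Write converted cartoon frames (uint8 BGR arrays) out as an MP4 clip.
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)
    writer.release()
```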
The invention provides a cartoon generation apparatus, comprising:
a training module 1, configured to obtain a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from an actual picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and a conversion module 2, configured to convert the actual picture through the trained generator network to complete generation of the cartoon.
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the device may be used as a terminal device, and may also be used as a server, where examples of the terminal device may include: smart phones, tablet computers, laptop portable computers, car computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like, and the embodiments of the present application are not limited to specific devices.
The present embodiment also provides a non-volatile readable storage medium, where one or more modules (programs) are stored in the storage medium; when the one or more modules are applied to a device, the device can execute the instructions included in the method of fig. 1 according to the present embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functionality for executing each module of the apparatus described above; for specific functions and technical effects, refer to the above embodiments, which are not repeated here.
Fig. 6 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. FIG. 6 is a specific embodiment of the implementation of FIG. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method of fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication component 1203, power component 1204, multimedia component 1205, speech component 1206, input/output interfaces 1207, and/or sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 6 can be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art may modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (16)

1. A cartoon generation method, comprising:
obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from an actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network;
training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and converting the actual picture through the trained generator network to complete generation of the cartoon.
2. The cartoon generation method according to claim 1, wherein the generator network comprises a first generator sub-network for generating a converted cartoon picture from the actual picture and a second generator sub-network for converting the converted cartoon picture into a converted actual picture, and wherein the discriminator network comprises a first discriminator sub-network for discriminating whether the converted cartoon picture is a true cartoon and a second discriminator sub-network for discriminating whether the converted actual picture is a true actual picture.
3. The cartoon generation method according to claim 2, wherein the processing of the generator network and of the discriminator network respectively comprises: mirror padding, convolution, activation, normalization and residual processing.
4. The cartoon generation method according to claim 2, wherein the second adversarial loss comprises:

l_ab = (1 - Db(Target))² + (Db(Gab(Real)))²

l_ba = (1 - Da(Real))² + (Da(Gba(Target)))²

wherein l_ab and l_ba respectively represent the discrimination capability of the first discriminator sub-network Da and the second discriminator sub-network Db on the converted cartoon picture and the converted actual picture, Real represents the actual picture, Target represents the cartoon picture, and Gab and Gba represent the first and second generator sub-networks respectively.
5. The cartoon generation method according to claim 2, wherein the second adversarial loss comprises a variation loss, the variation loss comprising:

l_tvab = γ(‖∇x(Gab(Real))‖₁ + ‖∇y(Gab(Real))‖₁)

l_tvba = γ(‖∇x(Gba(Target))‖₁ + ‖∇y(Gba(Target))‖₁)

wherein l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇x and ∇y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network Gab and the second generator sub-network Gba respectively, Real represents the actual picture, Target represents the cartoon picture, and γ is a constant.
6. The cartoon generation method according to claim 2, wherein the second adversarial loss comprises a content loss, the content loss comprising:

l_content = ‖VGG_n(Gab(Real)) - VGG_n(Real)‖₁

wherein l_content represents the content loss, VGG_n represents the first n layers of the VGG neural network used as a feature extraction network, ‖·‖₁ denotes l1 regularization, and Real represents the actual picture.
7. The cartoon generation method according to claim 2, wherein the second adversarial loss comprises a style loss, the style loss comprising:

l_style = ‖VGG_n(Gba(Target)) - VGG_n(Target)‖₁

wherein l_style represents the style loss, VGG_n represents the first n layers of the VGG neural network used as a feature extraction network, ‖·‖₁ denotes l1 regularization, and Target represents the cartoon picture.
8. The cartoon generation method according to claim 2, wherein the second adversarial loss further comprises a consistency loss, the consistency loss comprising:

l_target = ‖Gab(Target) - Target‖₁

l_Real = ‖Gba(Real) - Real‖₁

wherein l_target represents the consistency loss of the converted cartoon picture, l_Real represents the consistency loss of the actual picture, ‖·‖₁ denotes l1 regularization, Real represents the actual picture, and Target represents the cartoon picture.
9. The cartoon generation method according to claim 2, wherein the second adversarial loss further comprises a loopback loss, the loopback loss comprising:

l_cycle = ‖Gab(Gba(Target)) - Target‖₁

wherein l_cycle represents the loopback loss, Target represents the cartoon picture, ‖·‖₁ denotes l1 regularization, and Gab and Gba respectively represent the first and second generator sub-networks.
10. The cartoon generation method according to claim 2, wherein the second adversarial loss further comprises an attention loss, the attention loss comprising:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} ‖Real(t) - Gba(Gab(Real))(t)‖₁

wherein l_att represents the attention loss; EyesPatch, NosePatch and LipsPatch represent the eye, nose and lip regions detected in the actual picture, respectively; Real(t) represents extracting the corresponding image region from the actual picture; and Gba(Gab(Real))(t) means that the actual picture is first input into the first generator sub-network to generate a converted cartoon picture, the converted cartoon picture is then converted into a converted actual picture through the second generator sub-network, and the region corresponding to t is extracted from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch.
11. The cartoon generation method according to claim 2, wherein the first adversarial loss comprises a generator adversarial loss:

l_advab = (1 - Db(Gab(Real)))²

l_advba = (1 - Da(Gba(Target)))²

wherein l_advab represents the first generator loss, l_advba represents the second generator loss, Da represents the first discriminator sub-network, Db represents the second discriminator sub-network, Gab and Gba represent the first and second generator sub-networks respectively, Real represents the actual picture, and Target represents the cartoon picture.
12. The cartoon generation method according to any one of claims 5 to 11, wherein the first adversarial loss comprises:

l_G = l_advab + l_advba + α1(l_att + l_cycle) + α2(l_tvab + l_tvba) + α3(l_content + l_style) + α4(l_target + l_Real)

wherein α1, α2, α3 and α4 represent weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content represents the content loss, l_style represents the style loss, l_target represents the consistency loss of the converted cartoon picture, l_Real represents the consistency loss of the actual picture, l_cycle represents the loopback loss, and l_att represents the attention loss.
13. A cartoon video generation method, comprising:
generating a converted cartoon picture from an actual picture through generation processing;
performing video processing on the converted cartoon picture to obtain a cartoon video, wherein the video processing comprises one of the following: adding music, adding subtitles, storyboarding, transitions, splicing and rendering.
14. A cartoon generation apparatus, comprising:
a training module, configured to obtain a converted cartoon picture from an actual picture through training processing, the training processing comprising: generating a converted cartoon picture from an actual picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and a conversion module, configured to convert the actual picture through the trained generator network to complete generation of the cartoon.
15. An apparatus, comprising:
one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-13.
16. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-13.
CN202010590178.1A 2020-06-24 2020-06-24 Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium Pending CN111899154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010590178.1A CN111899154A (en) 2020-06-24 2020-06-24 Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium

Publications (1)

Publication Number Publication Date
CN111899154A true CN111899154A (en) 2020-11-06

Family

ID=73207974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010590178.1A Pending CN111899154A (en) 2020-06-24 2020-06-24 Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium

Country Status (1)

Country Link
CN (1) CN111899154A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330956A * 2017-07-03 2017-11-07 广东工业大学 Unsupervised colouring method and device for hand-drawn caricatures
CN108805962A * 2018-05-29 2018-11-13 广州梦映动漫网络科技有限公司 Dynamic caricature generation method and electronic device
CN109087380A * 2018-08-02 2018-12-25 咪咕文化科技有限公司 Caricature animated-image generation method, device and storage medium
CN109800732A * 2019-01-30 2019-05-24 北京字节跳动网络技术有限公司 Method and apparatus for generating a model that generates caricature avatars
CN110097086A * 2019-04-03 2019-08-06 平安科技(深圳)有限公司 Image generation model training method, image generation method, device, equipment and storage medium
US20190333198A1 * 2018-04-25 2019-10-31 Adobe Inc. Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image
CN111160264A * 2019-12-30 2020-05-15 中山大学 Cartoon character identity recognition method based on generative adversarial networks
CN111311713A * 2020-02-24 2020-06-19 咪咕视讯科技有限公司 Cartoon processing method, cartoon display method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination