CN111899154A - Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium - Google Patents
- Publication number: CN111899154A
- Application number: CN202010590178.1A
- Authority: CN (China)
- Prior art keywords: picture, network, cartoon, loss, generator
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T 3/04 — Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
- G06N 3/045 — Computing arrangements based on specific computational models; neural networks; combinations of networks
- G06T 11/001 — 2D [two-dimensional] image generation; texturing, colouring, generation of texture or colour
- H04N 21/44 — Selective content distribution; processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
Abstract
The invention provides a cartoon video generation method, apparatus, device and medium. The cartoon video generation method comprises: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from the actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network; training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and converting actual pictures through the trained generator network to complete cartoon generation. The converted cartoon pictures can then be assembled into a cartoon video through video processing; through the adversarial training of the generator network and the discriminator network, the noise of the converted cartoon pictures is reduced and the conversion effect is improved.
Description
Technical Field
The present invention relates to electronic technologies, and in particular, to a cartoon video generation method, apparatus, device, and medium.
Background
With the rise and spread of short videos, users increasingly want to shoot short videos in different styles, among which cartoon-style short videos are especially popular. At present, cartoon-style short videos suffer from poor generation quality and insufficient resemblance to a real cartoon effect.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide an efficient cartoon generation method for solving the problem of poor cartoon-style short-video generation.
To achieve the above and other related objects, the present invention provides a cartoon generation method, including: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from the actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network; training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and converting actual pictures through the trained generator network to complete cartoon generation.
Optionally, the generator network includes a first generator sub-network, which generates a converted cartoon picture from the actual picture, and a second generator sub-network, which converts the cartoon picture into a converted actual picture; the discriminator network includes a first discriminator sub-network for discriminating whether the converted cartoon picture is a true cartoon and a second discriminator sub-network for discriminating whether the converted actual picture is a true actual picture.
Optionally, the processing procedure of the generator network includes: mirror padding, convolution, activation, normalization and residual processing.
Optionally, the processing procedure of the discriminator network includes: mirror padding, convolution, activation and normalization.
Optionally, the second adversarial loss comprises:

l_ab = (1 - D_b(Target))^2 + (D_b(G_ab(Real)))^2

l_ba = (1 - D_a(Real))^2 + (D_a(G_ba(Target)))^2

where l_ab and l_ba respectively reflect the discrimination ability of the discriminator sub-networks on converted cartoon pictures and converted actual pictures, Real denotes an actual picture, Target denotes a cartoon-domain picture, and G_ab and G_ba denote the first and second generator sub-networks respectively.
Optionally, the second adversarial loss includes a variation loss, which may take the thresholded form:

l_tvab = || max(|∇_x G_ab(Real)| - γ, 0) ||_1 + || max(|∇_y G_ab(Real)| - γ, 0) ||_1

l_tvba = || max(|∇_x G_ba(Target)| - γ, 0) ||_1 + || max(|∇_y G_ba(Target)| - γ, 0) ||_1

where l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇_x and ∇_y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network G_ab and the second generator sub-network G_ba, Real denotes an actual picture, Target denotes a cartoon-domain picture, and γ is a constant.
Optionally, the second adversarial loss comprises a content loss, which may take the form:

l_content = || VGG_n(Real) - VGG_n(G_ab(Real)) ||_1

where l_content denotes the content loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes l1 regularization (the l1 norm), and Real denotes an actual picture.
Optionally, the second adversarial loss comprises a style loss, which may take the form:

l_style = || VGG_n(Target) - VGG_n(G_ba(Target)) ||_1

where l_style denotes the style loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes the l1 norm, and Target denotes a cartoon-domain picture.
Optionally, the second adversarial loss further comprises a consistency loss, which may take the form:

l_target = || G_ab(Target) - Target ||_1

l_Real = || G_ba(Real) - Real ||_1

where l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, ||·||_1 denotes the l1 norm, Real denotes an actual picture, and Target denotes a cartoon-domain picture.
Optionally, the second adversarial loss further includes a loopback loss, which may take the form:

l_cycle = || G_ab(G_ba(Target)) - Target ||_1

where l_cycle denotes the loopback loss, Target denotes a cartoon-domain picture, ||·||_1 denotes the l1 norm, and G_ab and G_ba denote the first and second generator sub-networks respectively.
Optionally, the second adversarial loss further comprises an attention loss, which may take the form:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} || Real(t) - G_ba(G_ab(Real))(t) ||_1

where l_att denotes the attention loss; EyesPatch, NosePatch and LipsPatch denote the eye, nose and lip regions detected in the actual picture; Real(t) denotes extracting the image region corresponding to t from the actual picture; and G_ba(G_ab(Real))(t) denotes first feeding the actual picture into the first generator sub-network to generate a converted cartoon picture, then converting that picture into a converted actual picture through the second generator sub-network, and extracting the region corresponding to t from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch.
Optionally, the first adversarial loss comprises a generator adversarial loss:

l_advab = (1 - D_b(G_ab(Real)))^2

l_advba = (1 - D_a(G_ba(Target)))^2

where l_advab denotes the first generator loss, l_advba denotes the second generator loss, D_a denotes the first discriminator sub-network, D_b denotes the second discriminator sub-network, G_ab and G_ba denote the first and second generator sub-networks respectively, Real denotes an actual picture, and Target denotes a cartoon-domain picture.
Optionally, the first adversarial loss comprises:

l_G = l_advab + l_advba + α_1(l_att + l_cycle) + α_2(l_tvab + l_tvba) + α_3(l_content + l_style) + α_4(l_target + l_Real)

where α_1, α_2, α_3, α_4 are weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content denotes the content loss, l_style denotes the style loss, l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, l_cycle denotes the loopback loss, and l_att denotes the attention loss.
A cartoon video generation method comprises the following steps: generating a converted cartoon picture from an actual picture through the generation processing described above; and performing video processing on the converted cartoon picture to obtain a cartoon video, wherein the video processing comprises at least one of the following: adding background music, adding captions, shot division, transitions, splicing and rendering.
A cartoon generation apparatus comprising: a training module, configured to obtain a converted cartoon picture from an actual picture through training, the training comprising: generating a converted cartoon picture from the actual picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and a conversion module, configured to convert actual pictures through the trained generator network to complete cartoon generation.
An apparatus, comprising: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as described in one or more of the above.
One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform one or more of the methods described.
As described above, the cartoon video generation method, the cartoon generation device, the cartoon generation equipment and the cartoon generation medium provided by the invention have the following beneficial effects:
the generator network converts the actual picture into a converted cartoon picture with a specific cartoon style, while the discriminator network judges whether the generated picture is a true cartoon. Whenever the discrimination result is negative, the generator network further improves its generation ability during training, so that converted cartoon pictures are increasingly likely to be judged as true; at the same time, the discriminator network keeps strengthening its own discrimination ability, making it harder and harder for generated pictures to pass as true cartoons. Through this adversarial training of the generator network and the discriminator network, the generation ability of the generator network keeps improving until it can produce a high-quality cartoon effect, the noise of the converted cartoon picture is reduced, and the conversion effect is improved.
Drawings
Fig. 1 is a schematic flow chart of a caricature video generation method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a generator network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a network structure of a discriminator according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a caricature video generation apparatus according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to another embodiment.
Description of the element reference numerals
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing component
1201 second processor
1202 second memory
1203 communication component
1204 power supply component
1205 multimedia component
1206 audio component
1207 input/output interface
1208 sensor component
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, the present invention provides a cartoon generation method, including:
S1: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from the actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network;
S2: training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
S3: converting actual pictures through the trained generator network to complete cartoon generation. During training, the generator network converts the actual picture into a converted cartoon picture with a specific cartoon style, while the discriminator network judges whether the generated picture is a true cartoon. Whenever the discrimination result is negative, the generator network further improves its generation ability, so that converted cartoon pictures are increasingly likely to be judged as true; at the same time, the discriminator network keeps strengthening its discrimination ability, making it harder and harder for generated pictures to pass as true cartoons. Through this adversarial training, the generation ability of the generator network keeps improving until a nearly true cartoon effect can be generated, the noise of the converted cartoon picture is reduced, and the conversion effect is improved.
In some implementations, the generator network comprises a first generator sub-network Gab, which generates a converted cartoon picture from the actual picture, and a second generator sub-network Gba, which converts a cartoon picture into a converted actual picture; the discriminator network comprises a first discriminator sub-network Da for discriminating whether a converted cartoon picture is a true cartoon and a second discriminator sub-network Db for discriminating whether a converted actual picture is a true actual picture. For example, the generator network and the discriminator network form a CycleGAN-like framework, where Gab and Gba use the same generator network structure and Da and Db use the same discriminator network structure.
Referring to fig. 2, the processing of the generator network includes: mirror padding, convolution, activation, normalization and residual processing. Referring to fig. 3, the processing of the discriminator network includes: mirror padding, convolution, activation and normalization.
ReflectionPad2d (padding_size): standard mirror-padding layer; for 2-dimensional input data, the padding_size-wide band of data nearest each edge is mirrored and appended around the periphery of the input;
Conv2D (in_channels, out_channels, kernel_size, stride, padding_size): standard 2-dimensional convolution layer; data with in_channels input channels is convolved with kernels of size kernel_size × kernel_size to output out_channels channels of result data; stride is the convolution step, i.e. the kernel is applied every stride points of the input, and padding_size is the width of zero padding added to the edges of the input;
IN: standard Instance Normalization neurons, which refers to normalizing each channel of data by channel;
ReLU: neurons using a linear rectifying function as an activation function;
ConvTranspose2D (in_channels, out_channels, kernel_size, stride, padding_size, dilation): standard 2-dimensional transposed-convolution (deconvolution) layer that progressively enlarges smaller input data to output data of larger size, where in_channels denotes the number of input channels, out_channels the number of output channels, kernel_size the kernel size, stride the convolution step, padding_size the width of zero padding at the input edges, and dilation the number of zeros inserted between the elements of the convolution kernel;
BN: batch Normalization, which means that a Batch of data is normalized at the same time, is represented;
residual block (8): representing 8 Residual block networks connected in series, wherein the structure of each Residual block is consistent and comprises 2 Conv2D + BN operations, and each Residual block can be represented by the following mathematical formula:
ResidualBlockOut=Conv2D(BN(Conv2D(BN(input))))+input
wherein, input is input data, and ResidualblockOut is output of each residual block network;
Sigmoid: network layer using the Sigmoid function as its activation function;
LeakyReLU: standard network layer using the LeakyReLU variant of the ReLU (linear rectification) function as its activation function.
In some implementations, the second adversarial loss comprises:

l_ab = (1 - D_b(Target))^2 + (D_b(G_ab(Real)))^2

l_ba = (1 - D_a(Real))^2 + (D_a(G_ba(Target)))^2

where l_ab and l_ba respectively reflect the discrimination ability of the discriminator sub-networks on converted cartoon pictures and converted actual pictures, Real denotes an actual picture, Target denotes a cartoon-domain picture, and G_ab and G_ba denote the first and second generator sub-networks respectively. The stronger the discrimination ability of Da and Db, the smaller l_ab and l_ba become; l_ab and l_ba can therefore be used to update the learnable parameters of Db and Da respectively.
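The least-squares losses l_ab and l_ba can be evaluated directly from discriminator scores. In this sketch each argument is a list of scalar scores and the two squared terms are averaged over it — the averaging and list representation are assumptions for illustration:

```python
def loss_ab(db_target, db_gab_real):
    """l_ab = (1 - Db(Target))^2 + (Db(Gab(Real)))^2, averaged over scores."""
    n = len(db_target)
    return (sum((1.0 - t) ** 2 for t in db_target) / n
            + sum(f ** 2 for f in db_gab_real) / len(db_gab_real))

def loss_ba(da_real, da_gba_target):
    """l_ba = (1 - Da(Real))^2 + (Da(Gba(Target)))^2, averaged over scores."""
    n = len(da_real)
    return (sum((1.0 - r) ** 2 for r in da_real) / n
            + sum(f ** 2 for f in da_gba_target) / len(da_gba_target))
```

A perfect discriminator (score 1 on true pictures, 0 on generated ones) drives both losses to 0, matching the remark that stronger discrimination ability yields smaller l_ab and l_ba.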
In some implementations, the second adversarial loss comprises a variation loss, which may take the thresholded form:

l_tvab = || max(|∇_x G_ab(Real)| - γ, 0) ||_1 + || max(|∇_y G_ab(Real)| - γ, 0) ||_1

l_tvba = || max(|∇_x G_ba(Target)| - γ, 0) ||_1 + || max(|∇_y G_ba(Target)| - γ, 0) ||_1

where l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇_x and ∇_y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network G_ab and the second generator sub-network G_ba, Real denotes an actual picture, Target denotes a cartoon-domain picture, and γ is a constant. The smaller the variation loss, the smaller the gradient changes in the x and y directions of the generated converted cartoon picture. A large variation loss means large fluctuations between neighbouring pixels of the generated picture, i.e. more noise; but if the variation loss were driven all the way to 0, the generated converted cartoon picture would degenerate into an unchanging solid colour. To balance this negative effect, the algorithm sets the hyper-parameter γ to suppress the flattening tendency of the variation term.
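A variation penalty of this kind can be sketched on a small 2-D grid of pixel values. The thresholded form below (only gradient magnitudes above γ are penalised, so small legitimate edges are not flattened away) follows the description of γ as a suppressor of the flattening effect, but the exact form is an assumption:

```python
def variation_loss(img, gamma=0.0):
    """Thresholded total-variation penalty over a 2-D list of floats:
    sum of x- and y-direction gradient magnitudes exceeding gamma."""
    loss = 0.0
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w - 1):            # x-direction gradient
            loss += max(abs(img[y][x + 1] - img[y][x]) - gamma, 0.0)
    for y in range(h - 1):
        for x in range(w):                # y-direction gradient
            loss += max(abs(img[y + 1][x] - img[y][x]) - gamma, 0.0)
    return loss
```

A flat (solid-colour) image scores 0, and raising γ discounts small gradients, which is exactly the behaviour the surrounding text attributes to the hyper-parameter.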
In some implementations, the second adversarial loss comprises a content loss, which may take the form:

l_content = || VGG_n(Real) - VGG_n(G_ab(Real)) ||_1

where l_content denotes the content loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes the l1 norm, and Real denotes an actual picture. The content loss measures the distance between the features of the actual picture and those of the generated converted cartoon picture as extracted by the VGG network; the smaller the content loss, the closer the content of the converted cartoon picture is to that of the actual picture.
In some implementations, the second adversarial loss comprises a style loss, which may take the form:

l_style = || VGG_n(Target) - VGG_n(G_ba(Target)) ||_1

where l_style denotes the style loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes the l1 norm, and Target denotes a cartoon-domain picture. The style loss reflects the style consistency between the cartoon picture and the converted actual picture; the style features are extracted by the VGG neural network, and the smaller the style loss, the stronger the style-retention effect of Gba.
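Both the content loss and the style loss are l1 distances between extracted feature vectors. The sketch below substitutes a toy two-number feature extractor (`toy_vgg_n`) for the first n layers of VGG; the extractor and function names are illustrative stand-ins, not the patent's networks:

```python
def l1_distance(feat_a, feat_b):
    """||a - b||_1 between two flattened feature vectors."""
    return sum(abs(a - b) for a, b in zip(feat_a, feat_b))

def toy_vgg_n(img):
    """Stand-in for VGG_n: two crude 'features' of a flattened picture."""
    return [sum(img), max(img)]

def content_loss(real, converted):
    """l_content = ||VGG_n(Real) - VGG_n(Gab(Real))||_1 with the toy extractor."""
    return l1_distance(toy_vgg_n(real), toy_vgg_n(converted))

def style_loss(target, converted):
    """l_style = ||VGG_n(Target) - VGG_n(Gba(Target))||_1 with the toy extractor."""
    return l1_distance(toy_vgg_n(target), toy_vgg_n(converted))
```

When the converted picture matches the reference in feature space the loss is 0, which is the "smaller is closer" property both paragraphs rely on.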
In some implementations, the second adversarial loss further comprises a consistency loss, which may take the form:

l_target = || G_ab(Target) - Target ||_1

l_Real = || G_ba(Real) - Real ||_1

where l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, ||·||_1 denotes the l1 norm, Real denotes an actual picture, and Target denotes a cartoon-domain picture. The term on Gab (which generates converted cartoon pictures from actual pictures) requires that a cartoon picture remain a cartoon after processing, without generating other content; the term on Gba (which generates converted actual pictures from cartoon pictures) likewise requires that an actual picture remain an actual picture after processing. Thus l_target and l_Real ensure the consistency of Gab and Gba on cartoon pictures and actual pictures respectively.
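The consistency terms are identity losses: a generator applied to a picture already in its output domain should return it unchanged. The generic l1-over-lists sketch below illustrates this; representing pictures as flat lists of floats is an assumption:

```python
def consistency_loss(g, pictures):
    """Identity loss: sum over samples p of ||g(p) - p||_1.
    Zero when the generator leaves in-domain pictures untouched."""
    return sum(sum(abs(a - b) for a, b in zip(g(p), p)) for p in pictures)
```

An identity generator scores 0; a generator that shifts every pixel is penalised in proportion to the shift, discouraging Gab and Gba from inventing content on in-domain inputs.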
In some implementations, the second adversarial loss further comprises a loopback loss, which may take the form:

l_cycle = || G_ab(G_ba(Target)) - Target ||_1

where l_cycle denotes the loopback loss, Target denotes a cartoon-domain picture, ||·||_1 denotes the l1 norm, and G_ab and G_ba denote the first and second generator sub-networks respectively. The loopback loss expresses that a cartoon picture converted into a converted actual picture can still be converted back into the cartoon picture; it allows the generator network to fully learn the distribution relationship between real pictures and cartoon pictures.
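The loopback constraint — cartoon to actual and back to cartoon — can be checked with any pair of mutually inverse toy generators; the flat-list picture representation is an assumption for illustration:

```python
def cycle_loss(gab, gba, target):
    """l_cycle = ||Gab(Gba(Target)) - Target||_1: round-trip reconstruction error."""
    reconstructed = gab(gba(target))
    return sum(abs(r - t) for r, t in zip(reconstructed, target))
```

When `gab` exactly inverts `gba` the round trip reproduces the original picture and the loss is 0; any information lost on the way out and back shows up as reconstruction error.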
In some implementations, the second adversarial loss further comprises an attention loss, which may take the form:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} || Real(t) - G_ba(G_ab(Real))(t) ||_1

where l_att denotes the attention loss; EyesPatch, NosePatch and LipsPatch denote the eye, nose and lip regions detected in the actual picture; and Real(t) denotes extracting the image region corresponding to t from the actual picture.
G_ba(G_ab(Real))(t) denotes first feeding the actual picture into the first generator sub-network to generate a converted cartoon picture, then converting that picture into a converted actual picture through the second generator sub-network, and extracting the region corresponding to t from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch. A face-detection algorithm locates the facial-feature regions at each training step, and these facial image patches serve as targets of the loss function, so that the generated cartoon preserves the facial features as much as possible, improving the feature and style effect of the converted cartoon's facial features.
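The attention loss sums l1 patch differences over the detected facial regions. In the sketch below a patch is encoded as a (top, left, height, width) rectangle — that encoding, and representing images as 2-D lists, are assumptions for illustration:

```python
def crop(img, patch):
    """Extract a rectangular region (top, left, height, width) from a 2-D list."""
    top, left, h, w = patch
    return [row[left:left + w] for row in img[top:top + h]]

def attention_loss(real, round_trip, patches):
    """l_att = sum over facial patches t (e.g. EyesPatch, NosePatch, LipsPatch)
    of ||Real(t) - Gba(Gab(Real))(t)||_1, given the round-trip picture."""
    loss = 0.0
    for patch in patches:
        a, b = crop(real, patch), crop(round_trip, patch)
        loss += sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return loss
```

Only pixels inside the detected eye/nose/lip rectangles contribute, which is how the loss focuses training on preserving facial features.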
In some implementations, the first adversarial loss includes a generator adversarial loss:

l_advab = (1 - D_b(G_ab(Real)))^2

l_advba = (1 - D_a(G_ba(Target)))^2

where l_advab denotes the first generator loss, l_advba denotes the second generator loss, D_a denotes the first discriminator sub-network, D_b denotes the second discriminator sub-network, G_ab and G_ba denote the first and second generator sub-networks respectively, Real denotes an actual picture, and Target denotes a cartoon-domain picture.
In some implementations, the first adversarial loss comprises:

l_G = l_advab + l_advba + α_1(l_att + l_cycle) + α_2(l_tvab + l_tvba) + α_3(l_content + l_style) + α_4(l_target + l_Real)

where α_1, α_2, α_3, α_4 are weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content denotes the content loss, l_style denotes the style loss, l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, l_cycle denotes the loopback loss, and l_att denotes the attention loss. By combining these loss functions, the trained generator network takes into account all of the factors that influence the cartoon generation effect; after training, the generated converted cartoon picture fuses the characteristics of cartoon pictures while retaining the content of the original actual picture. Meanwhile, an attention mechanism is applied to face pictures so that the facial features are well preserved in the generated cartoon, addressing the deformation problem common to general cartoon-face generation algorithms; in addition, the improved variation loss function prevents the generated picture from exhibiting heavy noise.
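Once the individual loss values are known, the weighted combination l_G is a direct sum. The dictionary keys and the tuple of weights in this sketch are illustrative names only:

```python
def total_generator_loss(losses, alphas):
    """l_G = l_advab + l_advba + a1*(l_att + l_cycle) + a2*(l_tvab + l_tvba)
           + a3*(l_content + l_style) + a4*(l_target + l_Real)."""
    a1, a2, a3, a4 = alphas
    return (losses["advab"] + losses["advba"]
            + a1 * (losses["att"] + losses["cycle"])
            + a2 * (losses["tvab"] + losses["tvba"])
            + a3 * (losses["content"] + losses["style"])
            + a4 * (losses["target"] + losses["real"]))
```

The weights α_1..α_4 trade the four auxiliary pairs off against the two adversarial terms, which always enter with weight 1.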
The invention further provides a cartoon video generation method, which comprises the following steps:
performing the generation processing on actual pictures to produce converted cartoon pictures;
performing video processing on the converted cartoon pictures to obtain a cartoon video, wherein the video processing comprises at least one of the following: adding background music, adding captions, shot division, transitions, splicing and rendering. For example, in a short-video or online social application scenario, a user takes or uploads a photo, the actual picture is processed to generate a converted cartoon picture in the transferred style, and a cartoon video in that style is then produced through video processing, improving the fun of short videos and the user experience.
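The picture-to-video pipeline above can be sketched as a conversion pass followed by a chain of caller-supplied processing steps (music, captions, shot division, transitions, splicing, rendering applied in order). `convert` and the `steps` callables are placeholders, not the patent's implementation:

```python
def generate_cartoon_video(frames, convert, steps):
    """Convert each actual frame with the trained generator, then apply the
    selected video-processing steps in sequence to build the cartoon video."""
    cartoon_frames = [convert(f) for f in frames]
    video = cartoon_frames
    for step in steps:
        video = step(video)
    return video
```

With no steps the result is just the list of converted frames; each added step (e.g. a splicing function) transforms the running video object.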
The invention provides a cartoon generating device, comprising:
the training module 1 is configured to obtain a converted cartoon picture from an actual picture through training processing, where the training processing includes: generating the actual picture into a converted cartoon picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and the conversion module 2 is configured to convert the actual picture through the trained generator network to complete the generation of the cartoon.
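A minimal sketch of this two-module device structure follows. The class and method names are illustrative assumptions; the generator and discriminator networks are stood in by simple callables, and a real implementation would update network parameters from the adversarial losses inside `train_step`:

```python
class TrainingModule:
    """Sketch of training module 1: runs generator and discriminator."""
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator

    def train_step(self, real_picture):
        # The generator turns the actual picture into a converted
        # cartoon picture; the discriminator then scores it. In a real
        # system the adversarial losses would drive updates here.
        fake = self.generator(real_picture)
        score = self.discriminator(fake)
        return fake, score

class ConversionModule:
    """Sketch of conversion module 2: a single forward pass."""
    def __init__(self, trained_generator):
        self.generator = trained_generator

    def convert(self, real_picture):
        # After training, cartoon generation is just generator inference.
        return self.generator(real_picture)
```

The split mirrors the device description: training couples both networks, while deployment only needs the trained generator.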
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the apparatus may serve as a terminal device or as a server. Examples of the terminal device include smart phones, tablet computers, laptop computers, in-vehicle computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like; the embodiments of the present application are not limited to specific devices.
The present embodiment also provides a non-volatile readable storage medium in which one or more modules (programs) are stored. When the one or more modules are applied to a device, the device may execute the instructions included in the data processing method of fig. 4 of the present embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functions for executing each module of the apparatus described above; for specific functions and technical effects, reference may be made to the foregoing embodiments, which are not repeated here.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application, and shows a specific embodiment of the implementation of Fig. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 4 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing component 1200. The terminal device may further include: a communication component 1203, a power component 1204, a multimedia component 1205, a voice component 1206, an input/output interface 1207, and/or a sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing component 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 may include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 may include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 6 can be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.
Claims (16)
1. A cartoon generating method, comprising:
obtaining a converted caricature picture from an actual picture through training, wherein the training comprises: generating the actual picture into a converted caricature picture through a generator network, and discriminating the converted caricature picture through a discriminator network;
training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and converting the actual picture through the trained generator network to complete the generation of the caricature.
2. A caricature generation method according to claim 1, wherein the generator network comprises a first generator sub-network for generating the actual picture as a converted caricature picture and a second generator sub-network for converting the caricature picture into a converted actual picture, and wherein the discriminator network comprises a first discriminator sub-network for discriminating whether the converted caricature picture is a real caricature and a second discriminator sub-network for discriminating whether the converted actual picture is a real actual picture.
3. A caricature generation method according to claim 2, wherein the processing of the generator network and the discriminator network respectively comprises: mirror filling processing, convolution processing, activation processing, normalization processing and residual error processing.
4. The caricature generation method of claim 2, wherein the second adversarial loss comprises:
l_ab = (1 - D_b(Target))^2 + (D_b(G_ab(Real)))^2
l_ba = (1 - D_a(Real))^2 + (D_a(G_ba(Target)))^2
where l_ab and l_ba respectively represent the discrimination capability of the first discriminator sub-network D_a and the second discriminator sub-network D_b on the converted caricature picture and the converted actual picture, Real represents the actual picture, Target represents the converted caricature picture, and G_ab and G_ba represent the first generator sub-network and the second generator sub-network, respectively.
5. The caricature generation method of claim 2, wherein the second adversarial loss comprises a variation loss, the variation loss comprising:
where l_tvab denotes the first variation loss and l_tvba denotes the second variation loss; the gradients in the x direction and the y direction are taken on the outputs of the first generator sub-network G_ab and the second generator sub-network G_ba, respectively; Real represents the actual picture, Target represents the converted caricature picture, and γ is a constant.
6. The caricature generation method of claim 2, wherein the second adversarial loss comprises a content loss, the content loss comprising:
7. The caricature generation method of claim 2, wherein the second adversarial loss comprises a style loss, the style loss comprising:
8. The caricature generation method of claim 2, wherein the second adversarial loss further comprises a consistency loss, the consistency loss comprising:
9. The caricature generation method of claim 2, wherein the second adversarial loss further comprises a loop-back loss, the loop-back loss comprising:
10. The caricature generation method of claim 2, wherein the second adversarial loss further comprises an attention loss, the attention loss comprising:
where l_att denotes the attention loss; EyesPatch, NosePatch, and LipsPatch represent the eye, nose, and lip regions detected in the actual picture, respectively; Real(t) represents extracting the image region corresponding to t from the actual picture;
G_ba(G_ab(Real))(t) means that the actual picture is first input into the first generator sub-network to generate a converted caricature picture, the converted caricature picture is then converted into a converted actual picture through the second generator sub-network, and the region corresponding to t is extracted from the converted actual picture, where t ranges over EyesPatch, NosePatch, and LipsPatch.
11. The caricature generation method of claim 2, wherein the first adversarial loss comprises a generator adversarial loss:
l_advab = (1 - D_b(G_ab(Real)))^2
l_advba = (1 - D_a(G_ba(Target)))^2
where l_advab represents the first generator loss, l_advba represents the second generator loss, D_a denotes the first discriminator sub-network, D_b denotes the second discriminator sub-network, G_ab and G_ba denote the first generator sub-network and the second generator sub-network respectively, Real represents the actual picture, and Target represents the converted caricature picture.
12. A caricature generation method according to any of claims 5 to 11, wherein the first adversarial loss comprises:
l_G = l_advab + l_advba + α1(l_att + l_cycle) + α2(l_tvab + l_tvba) + α3(l_content + l_style) + α4(l_target + l_Real)
where α1, α2, α3, α4 represent weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content represents the content loss, l_style represents the style loss, l_target represents the consistency loss of the converted caricature picture, l_Real represents the consistency loss of the actual picture, l_cycle represents the loop-back loss, and l_att represents the attention loss.
13. A cartoon video generation method is characterized by comprising the following steps:
performing generation processing on the actual picture to generate a converted caricature picture;
performing video processing on the converted caricature picture to obtain a caricature video, wherein the video processing comprises one of the following: matching music, adding subtitles, shot division, transitions, splicing, and rendering.
14. A cartoon generating apparatus, comprising:
the training module is configured to obtain a converted caricature picture from an actual picture through training processing, wherein the training processing comprises: generating the actual picture into a converted caricature picture through a generator network, discriminating the converted caricature picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and the conversion module is configured to convert the actual picture through the trained generator network to complete the generation of the caricature.
15. An apparatus, comprising:
one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-13.
16. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-13.
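As a hedged numerical illustration of the least-squares adversarial losses in claims 4 and 11 (this sketch is not part of the claims), the discriminator and generator losses can be written with simple callables standing in for the sub-networks D_a, D_b, G_ab, G_ba:

```python
def discriminator_losses(Da, Db, Gab, Gba, real, target):
    """Second adversarial loss (claim 4): each discriminator should score
    genuine pictures near 1 and generated pictures near 0."""
    l_ab = (1 - Db(target)) ** 2 + Db(Gab(real)) ** 2
    l_ba = (1 - Da(real)) ** 2 + Da(Gba(target)) ** 2
    return l_ab, l_ba

def generator_adversarial_losses(Da, Db, Gab, Gba, real, target):
    """Generator adversarial loss (claim 11): each generator tries to
    drive its discriminator's score on generated pictures toward 1."""
    l_advab = (1 - Db(Gab(real))) ** 2
    l_advba = (1 - Da(Gba(target))) ** 2
    return l_advab, l_advba
```

In training, these two loss sets are minimized alternately: the discriminator losses update D_a and D_b, and the generator adversarial losses (combined with the other terms of claim 12) update G_ab and G_ba.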
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010590178.1A CN111899154A (en) | 2020-06-24 | 2020-06-24 | Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111899154A (en) | 2020-11-06
Family
ID=73207974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010590178.1A Pending CN111899154A (en) | 2020-06-24 | 2020-06-24 | Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111899154A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330956A (en) * | 2017-07-03 | 2017-11-07 | 广东工业大学 | A kind of unsupervised painting methods of caricature manual draw and device |
CN108805962A (en) * | 2018-05-29 | 2018-11-13 | 广州梦映动漫网络科技有限公司 | A kind of generation method and electronic equipment of dynamic caricature |
CN109087380A (en) * | 2018-08-02 | 2018-12-25 | 咪咕文化科技有限公司 | A kind of caricature cardon generation method, device and storage medium |
CN109800732A (en) * | 2019-01-30 | 2019-05-24 | 北京字节跳动网络技术有限公司 | The method and apparatus for generating model for generating caricature head portrait |
CN110097086A (en) * | 2019-04-03 | 2019-08-06 | 平安科技(深圳)有限公司 | Image generates model training method, image generating method, device, equipment and storage medium |
US20190333198A1 (en) * | 2018-04-25 | 2019-10-31 | Adobe Inc. | Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image |
CN111160264A (en) * | 2019-12-30 | 2020-05-15 | 中山大学 | Cartoon figure identity recognition method based on generation of confrontation network |
CN111311713A (en) * | 2020-02-24 | 2020-06-19 | 咪咕视讯科技有限公司 | Cartoon processing method, cartoon display device, cartoon terminal and cartoon storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10817705B2 (en) | Method, apparatus, and system for resource transfer | |
CN106682632B (en) | Method and device for processing face image | |
US11158057B2 (en) | Device, method, and graphical user interface for processing document | |
US20220237812A1 (en) | Item display method, apparatus, and device, and storage medium | |
CN111444826B (en) | Video detection method, device, storage medium and computer equipment | |
CN109118447B (en) | Picture processing method, picture processing device and terminal equipment | |
CN108509994B (en) | Method and device for clustering character images | |
CN108776822B (en) | Target area detection method, device, terminal and storage medium | |
CN111047509A (en) | Image special effect processing method and device and terminal | |
CN111062276A (en) | Human body posture recommendation method and device based on human-computer interaction, machine readable medium and equipment | |
CN107977636B (en) | Face detection method and device, terminal and storage medium | |
CN111984803B (en) | Multimedia resource processing method and device, computer equipment and storage medium | |
CN112381707B (en) | Image generation method, device, equipment and storage medium | |
CN111753498A (en) | Text processing method, device, equipment and storage medium | |
CN105608430A (en) | Face clustering method and device | |
CN111325220B (en) | Image generation method, device, equipment and storage medium | |
US11295416B2 (en) | Method for picture processing, computer-readable storage medium, and electronic device | |
CN115239860A (en) | Expression data generation method and device, electronic equipment and storage medium | |
CN108985215B (en) | Picture processing method, picture processing device and terminal equipment | |
CN110929159A (en) | Resource delivery method, device, equipment and medium | |
CN113642359B (en) | Face image generation method and device, electronic equipment and storage medium | |
EP4303815A1 (en) | Image processing method, electronic device, storage medium, and program product | |
CN111899154A (en) | Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium | |
CN111818364B (en) | Video fusion method, system, device and medium | |
CN112381064B (en) | Face detection method and device based on space-time diagram convolutional network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||