CN111899154A - Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium - Google Patents
- Publication number: CN111899154A
- Application number: CN202010590178.1A
- Authority: CN (China)
- Prior art keywords: picture, network, cartoon, loss, generator
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T 3/04 — Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
- G06N 3/045 — Computing arrangements based on specific computational models; neural networks; combinations of networks
- G06T 11/001 — 2D [two-dimensional] image generation; texturing, colouring, generation of texture or colour
- H04N 21/44 — Selective content distribution; processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
Abstract
The invention provides a cartoon video generation method, apparatus, device and medium. The cartoon video generation method comprises: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from the actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network; training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and converting actual pictures through the trained generator network to complete cartoon generation. The converted cartoon pictures can then be assembled into a cartoon video through video processing; through the adversarial training of the generator network and the discriminator network, the noise of the converted cartoon pictures is reduced and the conversion effect is improved.
Description
Technical Field
The present invention relates to electronic technologies, and in particular, to a cartoon video generation method, apparatus, device, and medium.
Background
With the rise and spread of short videos, users increasingly want to shoot short videos in different styles, among which cartoon-style short videos are especially popular. At present, cartoon-style short videos suffer from poor generation quality and insufficient resemblance to a real cartoon effect.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide an efficient cartoon generation method for solving the problem of poor cartoon-style short-video generation.
To achieve the above and other related objects, the present invention provides a cartoon generation method, including: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from the actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network; training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and converting actual pictures through the trained generator network to complete cartoon generation.
Optionally, the generator network includes a first generator sub-network, which generates a converted cartoon picture from the actual picture, and a second generator sub-network, which converts the cartoon picture into a converted actual picture; the discriminator network includes a first discriminator sub-network for discriminating whether the converted cartoon picture is a true cartoon and a second discriminator sub-network for discriminating whether the converted actual picture is a true actual picture.
Optionally, the processing procedure of the generator network includes: mirror padding, convolution, activation, normalization and residual processing.
Optionally, the processing procedure of the discriminator network includes: mirror padding, convolution, activation and normalization.
Optionally, the second adversarial loss comprises:

l_ab = (1 - D_b(Target))^2 + (D_b(G_ab(Real)))^2

l_ba = (1 - D_a(Real))^2 + (D_a(G_ba(Target)))^2

where l_ab and l_ba respectively reflect the discrimination ability of the discriminator sub-networks on converted cartoon pictures and converted actual pictures, Real denotes an actual picture, Target denotes a cartoon-domain picture, and G_ab and G_ba denote the first and second generator sub-networks respectively.
Optionally, the second adversarial loss includes a variation loss, which may take the thresholded form:

l_tvab = || max(|∇_x G_ab(Real)| - γ, 0) ||_1 + || max(|∇_y G_ab(Real)| - γ, 0) ||_1

l_tvba = || max(|∇_x G_ba(Target)| - γ, 0) ||_1 + || max(|∇_y G_ba(Target)| - γ, 0) ||_1

where l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇_x and ∇_y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network G_ab and the second generator sub-network G_ba, Real denotes an actual picture, Target denotes a cartoon-domain picture, and γ is a constant.
Optionally, the second adversarial loss comprises a content loss, which may take the form:

l_content = || VGG_n(Real) - VGG_n(G_ab(Real)) ||_1

where l_content denotes the content loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes l1 regularization (the l1 norm), and Real denotes an actual picture.
Optionally, the second adversarial loss comprises a style loss, which may take the form:

l_style = || VGG_n(Target) - VGG_n(G_ba(Target)) ||_1

where l_style denotes the style loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes the l1 norm, and Target denotes a cartoon-domain picture.
Optionally, the second adversarial loss further comprises a consistency loss, which may take the form:

l_target = || G_ab(Target) - Target ||_1

l_Real = || G_ba(Real) - Real ||_1

where l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, ||·||_1 denotes the l1 norm, Real denotes an actual picture, and Target denotes a cartoon-domain picture.
Optionally, the second adversarial loss further includes a loopback loss, which may take the form:

l_cycle = || G_ab(G_ba(Target)) - Target ||_1

where l_cycle denotes the loopback loss, Target denotes a cartoon-domain picture, ||·||_1 denotes the l1 norm, and G_ab and G_ba denote the first and second generator sub-networks respectively.
Optionally, the second adversarial loss further comprises an attention loss, which may take the form:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} || Real(t) - G_ba(G_ab(Real))(t) ||_1

where l_att denotes the attention loss; EyesPatch, NosePatch and LipsPatch denote the eye, nose and lip regions detected in the actual picture; Real(t) denotes extracting the image region corresponding to t from the actual picture; and G_ba(G_ab(Real))(t) denotes first feeding the actual picture into the first generator sub-network to generate a converted cartoon picture, then converting that picture into a converted actual picture through the second generator sub-network, and extracting the region corresponding to t from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch.
Optionally, the first adversarial loss comprises a generator adversarial loss:

l_advab = (1 - D_b(G_ab(Real)))^2

l_advba = (1 - D_a(G_ba(Target)))^2

where l_advab denotes the first generator loss, l_advba denotes the second generator loss, D_a denotes the first discriminator sub-network, D_b denotes the second discriminator sub-network, G_ab and G_ba denote the first and second generator sub-networks respectively, Real denotes an actual picture, and Target denotes a cartoon-domain picture.
Optionally, the first adversarial loss comprises:

l_G = l_advab + l_advba + α_1(l_att + l_cycle) + α_2(l_tvab + l_tvba) + α_3(l_content + l_style) + α_4(l_target + l_Real)

where α_1, α_2, α_3, α_4 are weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content denotes the content loss, l_style denotes the style loss, l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, l_cycle denotes the loopback loss, and l_att denotes the attention loss.
A cartoon video generation method comprises the following steps: generating a converted cartoon picture from an actual picture through the generation processing described above; and performing video processing on the converted cartoon picture to obtain a cartoon video, wherein the video processing comprises at least one of the following: adding background music, adding captions, shot division, transitions, splicing and rendering.
A cartoon generation apparatus comprising: a training module, configured to obtain a converted cartoon picture from an actual picture through training, the training comprising: generating a converted cartoon picture from the actual picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network; and a conversion module, configured to convert actual pictures through the trained generator network to complete cartoon generation.
An apparatus, comprising: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as described in one or more of the above.
One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform one or more of the methods described.
As described above, the cartoon video generation method, the cartoon generation device, the cartoon generation equipment and the cartoon generation medium provided by the invention have the following beneficial effects:
the generator network converts the actual picture into a converted cartoon picture with a specific cartoon style, while the discriminator network judges whether the generated picture is a true cartoon. Whenever the discrimination result is negative, the generator network further improves its generation ability during training, so that converted cartoon pictures are increasingly likely to be judged as true; at the same time, the discriminator network keeps strengthening its own discrimination ability, making it harder and harder for generated pictures to pass as true cartoons. Through this adversarial training of the generator network and the discriminator network, the generation ability of the generator network keeps improving until it can produce a high-quality cartoon effect, the noise of the converted cartoon picture is reduced, and the conversion effect is improved.
Drawings
Fig. 1 is a schematic flow chart of a caricature video generation method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a generator network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a network structure of a discriminator according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a caricature video generation apparatus according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to another embodiment.
Description of the element reference numerals
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing component
1201 second processor
1202 second memory
1203 communication component
1204 power supply component
1205 multimedia component
1206 audio component
1207 input/output interface
1208 sensor component
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, the present invention provides a cartoon generation method, including:
S1: obtaining a converted cartoon picture from an actual picture through training, wherein the training comprises: generating a converted cartoon picture from the actual picture through a generator network, and discriminating the converted cartoon picture through a discriminator network;
S2: training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
S3: converting actual pictures through the trained generator network to complete cartoon generation. During training, the generator network converts the actual picture into a converted cartoon picture with a specific cartoon style, while the discriminator network judges whether the generated picture is a true cartoon. Whenever the discrimination result is negative, the generator network further improves its generation ability, so that converted cartoon pictures are increasingly likely to be judged as true; at the same time, the discriminator network keeps strengthening its discrimination ability, making it harder and harder for generated pictures to pass as true cartoons. Through this adversarial training, the generation ability of the generator network keeps improving until a nearly true cartoon effect can be generated, the noise of the converted cartoon picture is reduced, and the conversion effect is improved.
In some implementations, the generator network comprises a first generator sub-network Gab, which generates a converted cartoon picture from the actual picture, and a second generator sub-network Gba, which converts a cartoon picture into a converted actual picture; the discriminator network comprises a first discriminator sub-network Da for discriminating whether a converted cartoon picture is a true cartoon and a second discriminator sub-network Db for discriminating whether a converted actual picture is a true actual picture. For example, the generator network and the discriminator network form a CycleGAN-like framework, where Gab and Gba use the same generator network structure and Da and Db use the same discriminator network structure.
Referring to fig. 2, the processing of the generator network includes: mirror padding, convolution, activation, normalization and residual processing. Referring to fig. 3, the processing of the discriminator network includes: mirror padding, convolution, activation and normalization.
ReflectionPad2d (padding_size): standard mirror-padding layer; for 2-dimensional input data, the padding_size-wide band of data nearest each edge is mirrored and appended around the periphery of the input;
Conv2D (in_channels, out_channels, kernel_size, stride, padding_size): standard 2-dimensional convolution layer; data with in_channels input channels is convolved with kernels of size kernel_size × kernel_size to output out_channels channels of result data; stride is the convolution step, i.e. the kernel is applied every stride points of the input, and padding_size is the width of zero padding added to the edges of the input;
IN: standard Instance Normalization neurons, which refers to normalizing each channel of data by channel;
ReLU: neurons using a linear rectifying function as an activation function;
ConvTranspose2D (in_channels, out_channels, kernel_size, stride, padding_size, dilation): standard 2-dimensional transposed-convolution (deconvolution) layer that progressively enlarges smaller input data to output data of larger size, where in_channels denotes the number of input channels, out_channels the number of output channels, kernel_size the kernel size, stride the convolution step, padding_size the width of zero padding at the input edges, and dilation the number of zeros inserted between the elements of the convolution kernel;
BN: batch Normalization, which means that a Batch of data is normalized at the same time, is represented;
residual block (8): representing 8 Residual block networks connected in series, wherein the structure of each Residual block is consistent and comprises 2 Conv2D + BN operations, and each Residual block can be represented by the following mathematical formula:
ResidualBlockOut=Conv2D(BN(Conv2D(BN(input))))+input
wherein, input is input data, and ResidualblockOut is output of each residual block network;
Sigmoid: network layer using the Sigmoid function as its activation function;
LeakyReLU: standard network layer using the LeakyReLU variant of the ReLU (linear rectification) function as its activation function.
In some implementations, the second adversarial loss comprises:

l_ab = (1 - D_b(Target))^2 + (D_b(G_ab(Real)))^2

l_ba = (1 - D_a(Real))^2 + (D_a(G_ba(Target)))^2

where l_ab and l_ba respectively reflect the discrimination ability of the discriminator sub-networks on converted cartoon pictures and converted actual pictures, Real denotes an actual picture, Target denotes a cartoon-domain picture, and G_ab and G_ba denote the first and second generator sub-networks respectively. The stronger the discrimination ability of Da and Db, the smaller l_ab and l_ba become; l_ab and l_ba can therefore be used to update the learnable parameters of Db and Da respectively.
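The least-squares losses l_ab and l_ba can be evaluated directly from discriminator scores. In this sketch each argument is a list of scalar scores and the two squared terms are averaged over it — the averaging and list representation are assumptions for illustration:

```python
def loss_ab(db_target, db_gab_real):
    """l_ab = (1 - Db(Target))^2 + (Db(Gab(Real)))^2, averaged over scores."""
    n = len(db_target)
    return (sum((1.0 - t) ** 2 for t in db_target) / n
            + sum(f ** 2 for f in db_gab_real) / len(db_gab_real))

def loss_ba(da_real, da_gba_target):
    """l_ba = (1 - Da(Real))^2 + (Da(Gba(Target)))^2, averaged over scores."""
    n = len(da_real)
    return (sum((1.0 - r) ** 2 for r in da_real) / n
            + sum(f ** 2 for f in da_gba_target) / len(da_gba_target))
```

A perfect discriminator (score 1 on true pictures, 0 on generated ones) drives both losses to 0, matching the remark that stronger discrimination ability yields smaller l_ab and l_ba.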
In some implementations, the second adversarial loss comprises a variation loss, which may take the thresholded form:

l_tvab = || max(|∇_x G_ab(Real)| - γ, 0) ||_1 + || max(|∇_y G_ab(Real)| - γ, 0) ||_1

l_tvba = || max(|∇_x G_ba(Target)| - γ, 0) ||_1 + || max(|∇_y G_ba(Target)| - γ, 0) ||_1

where l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, ∇_x and ∇_y denote the x-direction and y-direction gradients of the outputs of the first generator sub-network G_ab and the second generator sub-network G_ba, Real denotes an actual picture, Target denotes a cartoon-domain picture, and γ is a constant. The smaller the variation loss, the smaller the gradient changes in the x and y directions of the generated converted cartoon picture. A large variation loss means large fluctuations between neighbouring pixels of the generated picture, i.e. more noise; but if the variation loss were driven all the way to 0, the generated converted cartoon picture would degenerate into an unchanging solid colour. To balance this negative effect, the algorithm sets the hyper-parameter γ to suppress the flattening tendency of the variation term.
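A variation penalty of this kind can be sketched on a small 2-D grid of pixel values. The thresholded form below (only gradient magnitudes above γ are penalised, so small legitimate edges are not flattened away) follows the description of γ as a suppressor of the flattening effect, but the exact form is an assumption:

```python
def variation_loss(img, gamma=0.0):
    """Thresholded total-variation penalty over a 2-D list of floats:
    sum of x- and y-direction gradient magnitudes exceeding gamma."""
    loss = 0.0
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w - 1):            # x-direction gradient
            loss += max(abs(img[y][x + 1] - img[y][x]) - gamma, 0.0)
    for y in range(h - 1):
        for x in range(w):                # y-direction gradient
            loss += max(abs(img[y + 1][x] - img[y][x]) - gamma, 0.0)
    return loss
```

A flat (solid-colour) image scores 0, and raising γ discounts small gradients, which is exactly the behaviour the surrounding text attributes to the hyper-parameter.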
In some implementations, the second adversarial loss comprises a content loss, which may take the form:

l_content = || VGG_n(Real) - VGG_n(G_ab(Real)) ||_1

where l_content denotes the content loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes the l1 norm, and Real denotes an actual picture. The content loss measures the distance between the features of the actual picture and those of the generated converted cartoon picture as extracted by the VGG network; the smaller the content loss, the closer the content of the converted cartoon picture is to that of the actual picture.
In some implementations, the second adversarial loss comprises a style loss, which may take the form:

l_style = || VGG_n(Target) - VGG_n(G_ba(Target)) ||_1

where l_style denotes the style loss, VGG_n denotes the first n layers of a VGG neural network used as the feature-extraction network, ||·||_1 denotes the l1 norm, and Target denotes a cartoon-domain picture. The style loss reflects the style consistency between the cartoon picture and the converted actual picture; the style features are extracted by the VGG neural network, and the smaller the style loss, the stronger the style-retention effect of Gba.
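Both the content loss and the style loss are l1 distances between extracted feature vectors. The sketch below substitutes a toy two-number feature extractor (`toy_vgg_n`) for the first n layers of VGG; the extractor and function names are illustrative stand-ins, not the patent's networks:

```python
def l1_distance(feat_a, feat_b):
    """||a - b||_1 between two flattened feature vectors."""
    return sum(abs(a - b) for a, b in zip(feat_a, feat_b))

def toy_vgg_n(img):
    """Stand-in for VGG_n: two crude 'features' of a flattened picture."""
    return [sum(img), max(img)]

def content_loss(real, converted):
    """l_content = ||VGG_n(Real) - VGG_n(Gab(Real))||_1 with the toy extractor."""
    return l1_distance(toy_vgg_n(real), toy_vgg_n(converted))

def style_loss(target, converted):
    """l_style = ||VGG_n(Target) - VGG_n(Gba(Target))||_1 with the toy extractor."""
    return l1_distance(toy_vgg_n(target), toy_vgg_n(converted))
```

When the converted picture matches the reference in feature space the loss is 0, which is the "smaller is closer" property both paragraphs rely on.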
In some implementations, the second adversarial loss further comprises a consistency loss, which may take the form:

l_target = || G_ab(Target) - Target ||_1

l_Real = || G_ba(Real) - Real ||_1

where l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, ||·||_1 denotes the l1 norm, Real denotes an actual picture, and Target denotes a cartoon-domain picture. The term on Gab (which generates converted cartoon pictures from actual pictures) requires that a cartoon picture remain a cartoon after processing, without generating other content; the term on Gba (which generates converted actual pictures from cartoon pictures) likewise requires that an actual picture remain an actual picture after processing. Thus l_target and l_Real ensure the consistency of Gab and Gba on cartoon pictures and actual pictures respectively.
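The consistency terms are identity losses: a generator applied to a picture already in its output domain should return it unchanged. The generic l1-over-lists sketch below illustrates this; representing pictures as flat lists of floats is an assumption:

```python
def consistency_loss(g, pictures):
    """Identity loss: sum over samples p of ||g(p) - p||_1.
    Zero when the generator leaves in-domain pictures untouched."""
    return sum(sum(abs(a - b) for a, b in zip(g(p), p)) for p in pictures)
```

An identity generator scores 0; a generator that shifts every pixel is penalised in proportion to the shift, discouraging Gab and Gba from inventing content on in-domain inputs.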
In some implementations, the second adversarial loss further comprises a loopback loss, which may take the form:

l_cycle = || G_ab(G_ba(Target)) - Target ||_1

where l_cycle denotes the loopback loss, Target denotes a cartoon-domain picture, ||·||_1 denotes the l1 norm, and G_ab and G_ba denote the first and second generator sub-networks respectively. The loopback loss expresses that a cartoon picture converted into a converted actual picture can still be converted back into the cartoon picture; it allows the generator network to fully learn the distribution relationship between real pictures and cartoon pictures.
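The loopback constraint — cartoon to actual and back to cartoon — can be checked with any pair of mutually inverse toy generators; the flat-list picture representation is an assumption for illustration:

```python
def cycle_loss(gab, gba, target):
    """l_cycle = ||Gab(Gba(Target)) - Target||_1: round-trip reconstruction error."""
    reconstructed = gab(gba(target))
    return sum(abs(r - t) for r, t in zip(reconstructed, target))
```

When `gab` exactly inverts `gba` the round trip reproduces the original picture and the loss is 0; any information lost on the way out and back shows up as reconstruction error.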
In some implementations, the second adversarial loss further comprises an attention loss, which may take the form:

l_att = Σ_{t ∈ {EyesPatch, NosePatch, LipsPatch}} || Real(t) - G_ba(G_ab(Real))(t) ||_1

where l_att denotes the attention loss; EyesPatch, NosePatch and LipsPatch denote the eye, nose and lip regions detected in the actual picture; and Real(t) denotes extracting the image region corresponding to t from the actual picture.
G_ba(G_ab(Real))(t) denotes first feeding the actual picture into the first generator sub-network to generate a converted cartoon picture, then converting that picture into a converted actual picture through the second generator sub-network, and extracting the region corresponding to t from the converted actual picture, where t ranges over EyesPatch, NosePatch and LipsPatch. A face-detection algorithm locates the facial-feature regions at each training step, and these facial image patches serve as targets of the loss function, so that the generated cartoon preserves the facial features as much as possible, improving the feature and style effect of the converted cartoon's facial features.
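The attention loss sums l1 patch differences over the detected facial regions. In the sketch below a patch is encoded as a (top, left, height, width) rectangle — that encoding, and representing images as 2-D lists, are assumptions for illustration:

```python
def crop(img, patch):
    """Extract a rectangular region (top, left, height, width) from a 2-D list."""
    top, left, h, w = patch
    return [row[left:left + w] for row in img[top:top + h]]

def attention_loss(real, round_trip, patches):
    """l_att = sum over facial patches t (e.g. EyesPatch, NosePatch, LipsPatch)
    of ||Real(t) - Gba(Gab(Real))(t)||_1, given the round-trip picture."""
    loss = 0.0
    for patch in patches:
        a, b = crop(real, patch), crop(round_trip, patch)
        loss += sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return loss
```

Only pixels inside the detected eye/nose/lip rectangles contribute, which is how the loss focuses training on preserving facial features.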
In some implementations, the first adversarial loss includes a generator adversarial loss:

l_advab = (1 - D_b(G_ab(Real)))^2

l_advba = (1 - D_a(G_ba(Target)))^2

where l_advab denotes the first generator loss, l_advba denotes the second generator loss, D_a denotes the first discriminator sub-network, D_b denotes the second discriminator sub-network, G_ab and G_ba denote the first and second generator sub-networks respectively, Real denotes an actual picture, and Target denotes a cartoon-domain picture.
In some implementations, the first adversarial loss comprises:

l_G = l_advab + l_advba + α_1(l_att + l_cycle) + α_2(l_tvab + l_tvba) + α_3(l_content + l_style) + α_4(l_target + l_Real)

where α_1, α_2, α_3, α_4 are weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content denotes the content loss, l_style denotes the style loss, l_target denotes the consistency loss on cartoon pictures, l_Real denotes the consistency loss on actual pictures, l_cycle denotes the loopback loss, and l_att denotes the attention loss. By combining these loss functions, the trained generator network takes into account all of the factors that influence the cartoon generation effect; after training, the generated converted cartoon picture fuses the characteristics of cartoon pictures while retaining the content of the original actual picture. Meanwhile, an attention mechanism is applied to face pictures so that the facial features are well preserved in the generated cartoon, addressing the deformation problem common to general cartoon-face generation algorithms; in addition, the improved variation loss function prevents the generated picture from exhibiting heavy noise.
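Once the individual loss values are known, the weighted combination l_G is a direct sum. The dictionary keys and the tuple of weights in this sketch are illustrative names only:

```python
def total_generator_loss(losses, alphas):
    """l_G = l_advab + l_advba + a1*(l_att + l_cycle) + a2*(l_tvab + l_tvba)
           + a3*(l_content + l_style) + a4*(l_target + l_Real)."""
    a1, a2, a3, a4 = alphas
    return (losses["advab"] + losses["advba"]
            + a1 * (losses["att"] + losses["cycle"])
            + a2 * (losses["tvab"] + losses["tvba"])
            + a3 * (losses["content"] + losses["style"])
            + a4 * (losses["target"] + losses["real"]))
```

The weights α_1..α_4 trade the four auxiliary pairs off against the two adversarial terms, which always enter with weight 1.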
The invention further provides a cartoon video generation method, which comprises the following steps:
performing the generation processing on actual pictures to produce converted cartoon pictures;
performing video processing on the converted cartoon pictures to obtain a cartoon video, wherein the video processing comprises at least one of the following: adding background music, adding captions, shot division, transitions, splicing and rendering. For example, in a short-video or online social application scenario, a user takes or uploads a photo, the actual picture is processed to generate a converted cartoon picture in the transferred style, and a cartoon video in that style is then produced through video processing, improving the fun of short videos and the user experience.
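The picture-to-video pipeline above can be sketched as a conversion pass followed by a chain of caller-supplied processing steps (music, captions, shot division, transitions, splicing, rendering applied in order). `convert` and the `steps` callables are placeholders, not the patent's implementation:

```python
def generate_cartoon_video(frames, convert, steps):
    """Convert each actual frame with the trained generator, then apply the
    selected video-processing steps in sequence to build the cartoon video."""
    cartoon_frames = [convert(f) for f in frames]
    video = cartoon_frames
    for step in steps:
        video = step(video)
    return video
```

With no steps the result is just the list of converted frames; each added step (e.g. a splicing function) transforms the running video object.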
The invention provides a cartoon generating device, comprising:
the training module 1 is configured to obtain a converted cartoon picture from an actual picture through training processing, where the training processing includes: generating the actual picture into a converted cartoon picture through a generator network, discriminating the converted cartoon picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and the conversion module 2 is configured to convert the actual picture through the trained generator network to complete the generation of the cartoon.
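A minimal sketch of this two-module device structure follows. The class and method names are illustrative assumptions; the generator and discriminator networks are stood in by simple callables, and a real implementation would update network parameters from the adversarial losses inside `train_step`:

```python
class TrainingModule:
    """Sketch of training module 1: runs generator and discriminator."""
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator

    def train_step(self, real_picture):
        # The generator turns the actual picture into a converted
        # cartoon picture; the discriminator then scores it. In a real
        # system the adversarial losses would drive updates here.
        fake = self.generator(real_picture)
        score = self.discriminator(fake)
        return fake, score

class ConversionModule:
    """Sketch of conversion module 2: a single forward pass."""
    def __init__(self, trained_generator):
        self.generator = trained_generator

    def convert(self, real_picture):
        # After training, cartoon generation is just generator inference.
        return self.generator(real_picture)
```

The split mirrors the device description: training couples both networks, while deployment only needs the trained generator.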
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the apparatus may serve as a terminal device or as a server. Examples of the terminal device include smart phones, tablet computers, laptop computers, in-vehicle computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like; the embodiments of the present application are not limited to specific devices.
The present embodiment also provides a non-volatile readable storage medium in which one or more modules (programs) are stored. When the one or more modules are applied to a device, the device may execute the instructions included in the data processing method of fig. 4 of the present embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functions for executing each module of the apparatus described above; for specific functions and technical effects, reference may be made to the foregoing embodiments, which are not repeated here.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application, and shows a specific embodiment of the implementation of Fig. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 4 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing component 1200. The terminal device may further include: a communication component 1203, a power component 1204, a multimedia component 1205, a voice component 1206, an input/output interface 1207, and/or a sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing component 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 may include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 may include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 6 can be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.
Claims (16)
1. A cartoon generating method, comprising:
obtaining a converted caricature picture from an actual picture through training, wherein the training comprises: generating the actual picture into a converted caricature picture through a generator network, and discriminating the converted caricature picture through a discriminator network;
training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and converting the actual picture through the trained generator network to complete the generation of the caricature.
2. A caricature generation method according to claim 1, wherein the generator network comprises a first generator sub-network for generating the actual picture as a converted caricature picture and a second generator sub-network for converting the caricature picture into a converted actual picture, and wherein the discriminator network comprises a first discriminator sub-network for discriminating whether the converted caricature picture is a real caricature and a second discriminator sub-network for discriminating whether the converted actual picture is a real actual picture.
3. A caricature generation method according to claim 2, wherein the processing of the generator network and the discriminator network respectively comprises: mirror filling processing, convolution processing, activation processing, normalization processing and residual error processing.
4. The caricature generation method of claim 2, wherein the second adversarial loss comprises:
l_ab = (1 - D_b(Target))^2 + (D_b(G_ab(Real)))^2
l_ba = (1 - D_a(Real))^2 + (D_a(G_ba(Target)))^2
where l_ab and l_ba respectively represent the discrimination capability of the first discriminator sub-network D_a and the second discriminator sub-network D_b on the converted caricature picture and the converted actual picture, Real represents the actual picture, Target represents the converted caricature picture, and G_ab and G_ba represent the first generator sub-network and the second generator sub-network, respectively.
5. The caricature generation method of claim 2, wherein the second adversarial loss comprises a variation loss, the variation loss comprising:
where l_tvab denotes the first variation loss and l_tvba denotes the second variation loss; the gradients in the x direction and the y direction are taken on the outputs of the first generator sub-network G_ab and the second generator sub-network G_ba, respectively; Real represents the actual picture, Target represents the converted caricature picture, and γ is a constant.
6. The caricature generation method of claim 2, wherein the second adversarial loss comprises a content loss, the content loss comprising:
7. The caricature generation method of claim 2, wherein the second adversarial loss comprises a style loss, the style loss comprising:
8. The caricature generation method of claim 2, wherein the second adversarial loss further comprises a consistency loss, the consistency loss comprising:
9. The caricature generation method of claim 2, wherein the second adversarial loss further comprises a loop-back loss, the loop-back loss comprising:
10. The caricature generation method of claim 2, wherein the second adversarial loss further comprises an attention loss, the attention loss comprising:
where l_att denotes the attention loss; EyesPatch, NosePatch, and LipsPatch represent the eye, nose, and lip regions detected in the actual picture, respectively; Real(t) represents extracting the image region corresponding to t from the actual picture;
G_ba(G_ab(Real))(t) means that the actual picture is first input into the first generator sub-network to generate a converted caricature picture, the converted caricature picture is then converted into a converted actual picture through the second generator sub-network, and the region corresponding to t is extracted from the converted actual picture, where t ranges over EyesPatch, NosePatch, and LipsPatch.
11. The caricature generation method of claim 2, wherein the first adversarial loss comprises a generator adversarial loss:
l_advab = (1 - D_b(G_ab(Real)))^2
l_advba = (1 - D_a(G_ba(Target)))^2
where l_advab represents the first generator loss, l_advba represents the second generator loss, D_a denotes the first discriminator sub-network, D_b denotes the second discriminator sub-network, G_ab and G_ba denote the first generator sub-network and the second generator sub-network respectively, Real represents the actual picture, and Target represents the converted caricature picture.
12. A caricature generation method according to any of claims 5 to 11, wherein the first adversarial loss comprises:
l_G = l_advab + l_advba + α1(l_att + l_cycle) + α2(l_tvab + l_tvba) + α3(l_content + l_style) + α4(l_target + l_Real)
where α1, α2, α3, α4 represent weight parameters, l_tvab denotes the first variation loss, l_tvba denotes the second variation loss, l_content represents the content loss, l_style represents the style loss, l_target represents the consistency loss of the converted caricature picture, l_Real represents the consistency loss of the actual picture, l_cycle represents the loop-back loss, and l_att represents the attention loss.
13. A cartoon video generation method is characterized by comprising the following steps:
performing generation processing on the actual picture to generate a converted caricature picture;
performing video processing on the converted caricature picture to obtain a caricature video, wherein the video processing comprises one of the following: matching music, adding subtitles, shot division, transitions, splicing, and rendering.
14. A cartoon generating apparatus, comprising:
the training module is configured to obtain a converted caricature picture from an actual picture through training processing, wherein the training processing comprises: generating the actual picture into a converted caricature picture through a generator network, discriminating the converted caricature picture through a discriminator network, training the generator network according to a first adversarial loss of the generator network, and training the discriminator network according to a second adversarial loss between the generator network and the discriminator network;
and the conversion module is configured to convert the actual picture through the trained generator network to complete the generation of the caricature.
15. An apparatus, comprising:
one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-13.
16. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-13.
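As a hedged numerical illustration of the least-squares adversarial losses in claims 4 and 11 (this sketch is not part of the claims), the discriminator and generator losses can be written with simple callables standing in for the sub-networks D_a, D_b, G_ab, G_ba:

```python
def discriminator_losses(Da, Db, Gab, Gba, real, target):
    """Second adversarial loss (claim 4): each discriminator should score
    genuine pictures near 1 and generated pictures near 0."""
    l_ab = (1 - Db(target)) ** 2 + Db(Gab(real)) ** 2
    l_ba = (1 - Da(real)) ** 2 + Da(Gba(target)) ** 2
    return l_ab, l_ba

def generator_adversarial_losses(Da, Db, Gab, Gba, real, target):
    """Generator adversarial loss (claim 11): each generator tries to
    drive its discriminator's score on generated pictures toward 1."""
    l_advab = (1 - Db(Gab(real))) ** 2
    l_advba = (1 - Da(Gba(target))) ** 2
    return l_advab, l_advba
```

In training, these two loss sets are minimized alternately: the discriminator losses update D_a and D_b, and the generator adversarial losses (combined with the other terms of claim 12) update G_ab and G_ba.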
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010590178.1A CN111899154A (en) | 2020-06-24 | 2020-06-24 | Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111899154A (en) | 2020-11-06
Family
ID=73207974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010590178.1A Pending CN111899154A (en) | 2020-06-24 | 2020-06-24 | Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111899154A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330956A (en) * | 2017-07-03 | 2017-11-07 | 广东工业大学 | A kind of unsupervised painting methods of caricature manual draw and device |
CN108805962A (en) * | 2018-05-29 | 2018-11-13 | 广州梦映动漫网络科技有限公司 | A kind of generation method and electronic equipment of dynamic caricature |
CN109087380A (en) * | 2018-08-02 | 2018-12-25 | 咪咕文化科技有限公司 | A kind of caricature cardon generation method, device and storage medium |
CN109800732A (en) * | 2019-01-30 | 2019-05-24 | 北京字节跳动网络技术有限公司 | The method and apparatus for generating model for generating caricature head portrait |
CN110097086A (en) * | 2019-04-03 | 2019-08-06 | 平安科技(深圳)有限公司 | Image generates model training method, image generating method, device, equipment and storage medium |
US20190333198A1 (en) * | 2018-04-25 | 2019-10-31 | Adobe Inc. | Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image |
CN111160264A (en) * | 2019-12-30 | 2020-05-15 | 中山大学 | Cartoon figure identity recognition method based on generation of confrontation network |
CN111311713A (en) * | 2020-02-24 | 2020-06-19 | 咪咕视讯科技有限公司 | Cartoon processing method, cartoon display device, cartoon terminal and cartoon storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10817705B2 (en) | Method, apparatus, and system for resource transfer | |
CN106682632B (en) | Method and device for processing face image | |
US11158057B2 (en) | Device, method, and graphical user interface for processing document | |
US20220237812A1 (en) | Item display method, apparatus, and device, and storage medium | |
CN111444826B (en) | Video detection method, device, storage medium and computer equipment | |
CN109118447B (en) | Picture processing method, picture processing device and terminal equipment | |
CN108509994B (en) | Method and device for clustering character images | |
CN108776822B (en) | Target area detection method, device, terminal and storage medium | |
CN111047509A (en) | Image special effect processing method and device and terminal | |
CN111062276A (en) | Human body posture recommendation method and device based on human-computer interaction, machine readable medium and equipment | |
CN107977636B (en) | Face detection method and device, terminal and storage medium | |
CN111984803B (en) | Multimedia resource processing method and device, computer equipment and storage medium | |
CN112381707B (en) | Image generation method, device, equipment and storage medium | |
CN111753498A (en) | Text processing method, device, equipment and storage medium | |
CN105608430A (en) | Face clustering method and device | |
CN111325220B (en) | Image generation method, device, equipment and storage medium | |
US11295416B2 (en) | Method for picture processing, computer-readable storage medium, and electronic device | |
CN115239860A (en) | Expression data generation method and device, electronic equipment and storage medium | |
CN108985215B (en) | Picture processing method, picture processing device and terminal equipment | |
CN110929159A (en) | Resource delivery method, device, equipment and medium | |
CN113642359B (en) | Face image generation method and device, electronic equipment and storage medium | |
EP4303815A1 (en) | Image processing method, electronic device, storage medium, and program product | |
CN111899154A (en) | Cartoon video generation method, cartoon generation device, cartoon generation equipment and cartoon generation medium | |
CN111818364B (en) | Video fusion method, system, device and medium | |
CN112381064B (en) | Face detection method and device based on space-time diagram convolutional network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||