CN113743520A - Cartoon generation method, system, medium and electronic terminal - Google Patents
- Publication number
- CN113743520A CN113743520A CN202111057401.7A CN202111057401A CN113743520A CN 113743520 A CN113743520 A CN 113743520A CN 202111057401 A CN202111057401 A CN 202111057401A CN 113743520 A CN113743520 A CN 113743520A
- Authority
- CN
- China
- Prior art keywords
- cartoon
- training
- template matching
- network
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a cartoon generation method, system, medium and electronic terminal. The cartoon generation method comprises the following steps: collecting a training set; inputting the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises a plurality of long short-term memory sub-networks, each corresponding to a different category of training text; acquiring to-be-processed shot element information from a story text; inputting the to-be-processed shot element information into the template matching model for template matching, and acquiring a scene template matched with the to-be-processed shot element information; and generating a cartoon through the matched scene template. The cartoon generation method of the invention converts written language into a cartoon, and the converted cartoon conforms closely to the original idea or story.
Description
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, a system, a medium, and an electronic terminal for generating a cartoon.
Background
With the development of the cartoon industry, cartoons are favored and followed by more and more people, and many people devote themselves to cartoon creation. However, cartoon creation places high demands on creators: steps such as panel-frame creation and storyboard creation are difficult for many novice creators, who find it hard to convert ideas or stories into cartoons, and whose converted cartoons conform poorly to the original idea or story.
Disclosure of Invention
The invention provides a cartoon generation method, system, medium and electronic terminal, aiming to solve the prior-art problems that cartoon creators find it difficult to convert an idea or story into a cartoon, and that the converted cartoon conforms poorly to the original idea or story.
The cartoon generation method provided by the invention comprises the following steps:
collecting a training set;
inputting the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises a plurality of long short-term memory sub-networks, each corresponding to a different category of training text;
acquiring to-be-processed shot element information from a story text;
inputting the to-be-processed shot element information into the template matching model for template matching, and acquiring a scene template matched with the to-be-processed shot element information;
and generating a cartoon through the matched scene template.
Optionally, the different categories of training text include at least one of the following: scene description text, dialog content, and voice-over content; and the training step comprises:
inputting each category of training text into its corresponding long short-term memory sub-network for feature extraction, and acquiring the element features of the different categories;
fusing the element features of the different categories according to a plurality of preset weights to obtain the fused element feature;
inputting the fused element feature into a classification sub-network in the template matching network for classification and scene template matching, and acquiring the scene template corresponding to the fused element feature;
and training the template matching network according to the scene category to obtain the template matching model.
Optionally, the step of inputting the different categories of training text into the corresponding long short-term memory sub-networks for feature extraction comprises:
performing character mapping according to the training text and a preset character index dictionary to obtain index data of the training text;
cutting or filling the index data according to a preset data length threshold value to obtain index data with a fixed length;
and according to the types of the index data, respectively inputting the index data with fixed length of different types into corresponding long-short term memory sub-networks for feature extraction, and acquiring the element features of different types.
Optionally, the step of constructing the character index dictionary includes:
de-duplicating the characters of the training texts in the training set and removing punctuation marks from the training texts, to obtain a set of de-duplicated characters;
and taking the set of de-duplicated characters as the character index dictionary, wherein the de-duplicated characters correspond one-to-one to indexes.
Optionally, the step of cutting or filling the index data includes:
judging whether the length of the index data is greater than the data length threshold, if so, cutting the index data according to the data length threshold to obtain the index data with the fixed length;
and if the length of the index data is smaller than the data length threshold, filling the index data by using the index of the empty character in the character index dictionary according to the data length threshold to obtain the index data with fixed length.
Optionally, the classification sub-network comprises: one or more linear connection layers for linear classification, one or more ReLU layers for introducing nonlinearity, and a loss function layer, wherein the linear connection layers and the ReLU layers are arranged alternately;
training the template matching network by using a loss function preset in the loss function layer;
the mathematical expression of the loss function is:
Loss(x, c) = -(1/n_c)·log( exp(x[c]) / Σ_{j=0}^{K-1} exp(x[j]) ) + λ‖x‖²
wherein x is the feature vector output by the classification sub-network; c is the classification category index, c ∈ [0, K-1]; K is the number of classes; exp(·) denotes the exponential function with the natural constant e as base; x[j] denotes the j-th element of the feature vector x, j ∈ [0, K-1]; n_j is the number of samples of the j-th category participating in training; λ‖x‖² is the loss function regularization term; and λ is a weight parameter.
Optionally, the step of obtaining the fused element features includes:
inputting the fused element characteristics into a noise mapping sub-network in a template matching network, and superposing random noise which is subjected to standard normal distribution to the fused element characteristics through the noise mapping sub-network according to preset adjusting parameters to obtain superposed noise element characteristics;
inputting the characteristics of the superposed noise elements into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the characteristics of the superposed noise elements;
The noise superposition performed on the fused element feature by the noise mapping sub-network is expressed as:
f_NO = β·f_O + (1-β)·f_N
f_N = Sigmoid(Weights_N × Noise)
wherein f_NO is the noise-superposed element feature; β ∈ [0, 1] is an adjustment parameter controlling the degree of influence of the random noise; f_O is the fused element feature; f_N is the noise generated by the noise mapping sub-network from the random Noise obeying a standard normal distribution; the vector length of the random Noise equals that of the feature vector f_O, and each element of Noise follows a standard normal distribution with mean 0 and variance 1; Weights_N is the parameter square matrix of the noise mapping sub-network.
Optionally, the scene template includes at least one of: character position, character picture direction, dialog box position, background picture and foreground animation GIF (Graphics Interchange Format) material;
filling a preset character image and conversation content into the scene template to obtain a cartoon bitmap;
establishing a reference coordinate system, and acquiring coordinates of the cartoon bitmap in the reference coordinate system;
acquiring the final height of the cartoon according to the coordinates of the cartoon bitmap;
and according to the final height, performing long-image splicing to finish generation of the cartoon.
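For illustration, the stitching steps above (bitmap coordinates in a reference frame, final height, long-image splicing) can be sketched as a small layout computation; the function name and the gap parameter are assumptions for this example, not part of the patent:

```python
def layout_panels(panel_heights, gap=20):
    """Compute each cartoon bitmap's top y-coordinate in a shared
    reference coordinate system and the final height of the stitched
    long image.

    panel_heights: heights of the filled scene-template bitmaps, in order.
    gap: assumed vertical spacing between panels.
    """
    tops, y = [], 0
    for h in panel_heights:
        tops.append(y)          # coordinate of this bitmap in the frame
        y += h + gap
    final_height = y - gap if panel_heights else 0
    return tops, final_height
```

With the coordinates and final height known, the bitmaps can then be pasted one below another to complete the long comic image.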
The present invention also provides a cartoon generating system, comprising:
the acquisition module is used for acquiring a training set;
a training module, configured to input the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises a plurality of long short-term memory sub-networks, each corresponding to a different category of training text;
the element information acquisition module is used for acquiring element information of the lens to be processed by acquiring the story text;
the matching module is used for inputting the information of the lens elements to be processed into the template matching model for template matching, and acquiring a scene template matched with the information of the lens elements to be processed;
the cartoon generation module is used for generating a cartoon through the scene template matched with the to-be-processed shot element information; the acquisition module, the training module, the element information acquisition module, the matching module and the cartoon generation module are connected.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method as defined in any one of the above.
The present invention also provides an electronic terminal, comprising: a processor and a memory;
the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the terminal to perform the method as defined in any one of the above.
The invention has the beneficial effects that: in the cartoon generation method, system, medium and electronic terminal of the present invention, a training set is input into a template matching network for training to obtain a template matching model, the template matching network comprising a plurality of long short-term memory sub-networks, each corresponding to a different category of training text; the to-be-processed shot element information is input into the template matching model, the matching scene template is acquired, and the cartoon is generated from that template.
Drawings
Fig. 1 is a flowchart illustrating a cartoon generating method according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart illustrating training of a template matching network in the caricature generation method according to the embodiment of the present invention.
Fig. 3 is a schematic flow chart illustrating the process of obtaining different types of element features in the cartoon generating method according to the embodiment of the present invention.
Fig. 4 is a schematic flow chart of character matching in the cartoon generating method according to the embodiment of the present invention.
Fig. 5 is a schematic flow chart illustrating clipping or filling of index data in the cartoon generating method according to the embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a template matching network in the cartoon generating method according to the embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a template matching network adding a noise mapping sub-network in the cartoon generating method according to the embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a classification sub-network in the cartoon generating method according to the embodiment of the present invention.
Fig. 9 is a schematic flow chart of caricature stitching in the caricature generation method in the embodiment of the present invention.
Fig. 10 is a schematic structural diagram of a cartoon generating system in an embodiment of the present invention.
The attached drawings are as follows:
11 a first linear connection layer; 12 a second linear connecting layer; 13 a third linear connection layer;
21 a first ReLU layer; 22 a second ReLU layer; 31 loss function layer.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The inventors found that, with the development of the cartoon industry, cartoons are loved and followed by more and more people and many people devote themselves to cartoon creation, but cartoon creation places high demands on creators: steps such as panel-frame creation and storyboard creation are difficult for many novice creators, who struggle to convert an idea or story into a cartoon, and whose converted cartoons conform poorly to the original idea or story. The inventors therefore propose a cartoon generation method, system, medium and electronic terminal, in which a training set is input into a template matching network for training to obtain a template matching model. The template matching network includes a plurality of long short-term memory sub-networks that extract features from different categories of training text, which effectively improves the template matching accuracy of the model; the to-be-processed shot element information is input into the template matching model, the matching scene template is acquired, and the cartoon is generated.
As shown in fig. 1, the cartoon generating method in this embodiment includes:
S101: collecting a training set. The training set comprises a plurality of training texts and the scene template training samples matched with those training texts.
S102: inputting the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises a plurality of long short-term memory sub-networks, each corresponding to a different category of training text;
S103: acquiring to-be-processed shot element information from a story text. The to-be-processed shot element information includes at least one of the following: scene description text, dialog content, and voice-over content. For example, when inputting a story text, the user enters the different categories of to-be-processed shot element information separately according to a preset input format, namely the scene description text, the dialog content, and the voice-over content; entering the corresponding story texts separately facilitates later template matching of the shot element information. As another example, the story content input by the user may be classified to obtain the different categories of to-be-processed shot element information.
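As a concrete illustration of the preset-input-format idea in S103, the following sketch splits one shot's story text into the three element categories; the tag names and line format are hypothetical, chosen only for this example:

```python
import re

def parse_shot(block: str) -> dict:
    """Split one shot's story text into the three element categories
    (scene description, dialog, voice-over) using assumed line tags."""
    elements = {"scene": "", "dialog": "", "voiceover": ""}
    for line in block.splitlines():
        # Hypothetical format: each line is "<tag>: <text>".
        m = re.match(r"\s*(scene|dialog|voiceover)\s*:\s*(.*)", line)
        if m:
            tag, text = m.groups()
            elements[tag] += text
    return elements
```

Alternatively, as the text notes, free-form story content could be classified automatically instead of relying on an explicit input format.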
S104: inputting the to-be-processed shot element information into the template matching model for template matching, and acquiring the scene template matched with the shot element information; that is, the to-be-processed shot element information is input into the trained template matching model to match the corresponding scene template.
S105: generating a cartoon through the scene template matched with the to-be-processed shot element information. For example, a preset character image and the dialog content are filled into the scene template to generate the cartoon. This converts written language into a cartoon that conforms closely to the original idea or story, and is convenient to implement, low-cost, and highly practicable.
Referring to fig. 2, to improve the matching accuracy of the template matching model, the inventors propose that the different categories of training text include at least one of: scene description text, dialog content, and voice-over content, and that the training step of the template matching model comprises:
S201: inputting each category of training text into its corresponding long short-term memory sub-network for feature extraction, and acquiring the element features of the different categories. Inputting each category separately allows its element features to be extracted independently, for example: the scene description text is input into a first long short-term memory sub-network to extract its features, the dialog content into a second long short-term memory sub-network to extract its features, and the voice-over content into a third long short-term memory sub-network to extract its features, yielding the element features of the different categories.
S202: fusing the element features of the different categories according to a plurality of preset weights to obtain the fused element feature. Fusing the different categories of element features effectively improves the accuracy of feature extraction. The weight values can be set according to the actual situation; because the scene description text correlates most strongly with the scene template, its element feature can be given a larger weight than those of the other categories. The fusion is expressed as:
f_0 = a_1·f_1 + a_2·f_2 + a_3·f_3
a_1 + a_2 + a_3 = 1
wherein f_0 is the fused element feature; a_1, a_2, a_3 are the preset weights; f_1 is the element feature of the scene description text; f_2 is the element feature of the dialog content; and f_3 is the element feature of the voice-over content.
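The weighted fusion of S202 can be sketched directly from the two formulas above; the particular weight values below are illustrative, with the scene description weighted highest as the text suggests:

```python
import numpy as np

def fuse_features(f1, f2, f3, a=(0.5, 0.3, 0.2)):
    """Weighted fusion f0 = a1*f1 + a2*f2 + a3*f3 with a1 + a2 + a3 = 1.

    f1, f2, f3 are the element features extracted by the three LSTM
    sub-networks (scene description, dialog, voice-over). The default
    weights are an illustrative assumption.
    """
    a1, a2, a3 = a
    assert abs(a1 + a2 + a3 - 1.0) < 1e-9, "weights must sum to 1"
    return a1 * np.asarray(f1) + a2 * np.asarray(f2) + a3 * np.asarray(f3)
```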
S203: inputting the fused element features into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the fused element features;
s204: and training the template matching network according to the scene type to obtain a template matching model.
To further improve the classification accuracy of the template matching model, the inventors propose that the classification sub-network comprises: one or more linear connection layers for linear classification, one or more ReLU layers for introducing nonlinearity, and a loss function layer, wherein the linear connection layers and the ReLU layers are arranged alternately;
training the template matching network by using a loss function preset in the loss function layer;
the mathematical expression of the loss function is:
Loss(x, c) = -(1/n_c)·log( exp(x[c]) / Σ_{j=0}^{K-1} exp(x[j]) ) + λ‖x‖²
wherein x is the feature vector output by the classification sub-network; c is the classification category index, c ∈ [0, K-1]; K is the number of classes; exp(·) denotes the exponential function with the natural constant e as base; x[j] denotes the j-th element of the feature vector x, j ∈ [0, K-1]; n_j is the number of samples of the j-th category participating in training; λ‖x‖² is the loss function regularization term; and λ is a weight parameter. The term λ‖x‖² promotes model convergence and controls the sparsity of the output feature vector x. Loss(x, c) gives a measure between the feature vector x output by the classification sub-network and the classification category c, facilitating training of the matching model. The value range of c may be set according to the actual situation; for example, K = 50 and c ∈ [0, 49].
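Since the original formula image is not reproduced in this text, the following is only one plausible numerical reading of the loss described above: a softmax cross-entropy down-weighted by the target class's sample count n[c], plus the stated L2 regularizer. It is a reconstruction from the stated terms, not the patented formula:

```python
import numpy as np

def balanced_loss(x, c, n, lam=1e-4):
    """Cross-entropy on the classification sub-network's output x for
    target class c, scaled by 1/n[c] to counter class imbalance, plus
    the regularization term lam * ||x||^2 described in the text."""
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())                # numerically stable softmax
    log_prob = np.log(z[c] / z.sum())      # log softmax probability of class c
    return -log_prob / n[c] + lam * np.dot(x, x)
```

Down-weighting by n[c] means frequent classes contribute less per sample, which matches the stated purpose of using the per-class sample counts n_j.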
As shown in fig. 3, to facilitate feature extraction on text or characters, the inventors propose that the step of inputting the different categories of training text into the corresponding long short-term memory sub-networks for feature extraction comprises:
s301: performing character mapping according to the training text and a preset character index dictionary to obtain index data of the training text; by acquiring the index data of the training text, the training text can be well converted into a data string which can be processed by a network, so that the subsequent feature extraction is facilitated.
S302: cutting or filling the index data according to a preset data length threshold value to obtain index data with a fixed length;
s303: and according to the types of the index data, respectively inputting the index data with fixed length of different types into corresponding long-short term memory sub-networks for feature extraction, and acquiring the element features of different types. The categories of the index data include at least one of: scene description text, dialog content, and voice-over content.
Referring to fig. 4, in order to better perform character matching or character mapping on the input text or words, the inventor proposes: the step of constructing the character index dictionary comprises the following steps:
s401: removing the duplication of the characters of the training texts in the training set, removing punctuation marks in the training texts, and obtaining a set of duplication-removed characters;
s402: and taking the set of the de-duplicated characters as a character index dictionary, wherein the de-duplicated characters correspond to indexes one by one.
The character index dictionary stores all de-duplicated characters, letters and numbers appearing in the training texts, with an additional index for the empty character.
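Steps S401-S402 and the note above can be sketched as follows; the punctuation set and the `<pad>` key standing for the empty character are implementation assumptions:

```python
import string

# Punctuation to strip; the exact set is an assumption for illustration.
PUNCT = set(string.punctuation) | set("，。！？、；：…—")

def build_char_index(training_texts):
    """De-duplicate all characters in the training texts, drop punctuation,
    and map each remaining character to a unique index; index 0 is reserved
    for the empty (padding) character, as the text describes."""
    chars = set()
    for text in training_texts:
        chars.update(ch for ch in text if ch not in PUNCT and not ch.isspace())
    index = {"<pad>": 0}                 # extra index for the empty character
    for i, ch in enumerate(sorted(chars), start=1):
        index[ch] = i
    return index

def to_indices(text, index):
    """Character mapping: convert a text to index data via the dictionary."""
    return [index[ch] for ch in text if ch in index]
```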
As shown in fig. 5, since the index data input into the long-short term memory sub-network may have inconsistent lengths, which may result in inconvenience in feature extraction, the obtained index data needs to be clipped or filled, and the steps include:
s501: judging whether the length of the index data is greater than the data length threshold, if so, cutting the index data according to the data length threshold to obtain the index data with the fixed length;
s502: and if the length of the index data is smaller than the data length threshold, filling the index data by using the index of the empty character in the character index dictionary according to the data length threshold to obtain the index data with fixed length.
For example: referring to fig. 6, when training the template matching network, inputting training texts of different classes in a training set into a character matching sub-network in the template matching network, converting the training texts into index data that can be recognized and processed by the network, clipping or filling the index data according to a preset data length threshold, and obtaining index data T with fixed length of different classes1、T2、T3Wherein, T1Fixed-length index data for scene description text, T2Index data of fixed length for dialogue contents, T3Fixed-length index data of the voice-over content. Will T1、T2、T3Respectively inputting the corresponding long-term and short-term memory sub-networks for feature extraction, and obtaining element features f of different classes1、f2、f3. According to the preset weight, fusing the element characteristics of different categories to obtain the fused element characteristics f0. Inputting the fused element features into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the fused element features.
As shown in fig. 7, to counter imbalance in the number of training samples and prevent overfitting, the inventors propose superposing random noise obeying a standard normal distribution, after passing it through a noise mapping sub-network, onto the fused element feature f_O, avoiding the impact of sample-number imbalance on model accuracy. The noise superposition step comprises:
inputting the fused element characteristics into a noise mapping sub-network in a template matching network, and superposing random noise which is subjected to standard normal distribution to the fused element characteristics through the noise mapping sub-network according to preset adjusting parameters to obtain superposed noise element characteristics;
inputting the characteristics of the superposed noise elements into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the characteristics of the superposed noise elements;
inputting the fused element characteristics into the noise mapping sub-network for noise superposition mathematical expression as follows:
fNO=β·fo+(1-β)fN
fN=Sigmoid(WeightsN×Noise)
where fNO is the superposed-noise element feature; β is an adjusting parameter for adjusting the degree of influence of the random noise, with β ∈ [0,1]; fo is the fused element feature; fN is the noise generated by the noise mapping sub-network from random noise obeying a standard normal distribution, the vector length of the random noise being equal to that of the feature vector fo; each element of Noise follows a standard normal distribution with mean 0 and variance 1; and WeightsN is the square parameter matrix of the noise mapping sub-network.
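A minimal sketch of the noise superposition defined by the two equations above. The parameter matrix WeightsN would be learned in training; here it is supplied directly for illustration, and the function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def add_weighted_noise(f_o, weights_n, beta, rng):
    # fNO = beta * fo + (1 - beta) * Sigmoid(WeightsN x Noise)
    # Each element of Noise follows a standard normal distribution.
    noise = rng.standard_normal(f_o.shape[0])
    f_n = sigmoid(weights_n @ noise)
    return beta * f_o + (1.0 - beta) * f_n

rng = np.random.default_rng(0)
f_o = np.ones(4)          # fused element feature (illustrative values)
W_N = np.eye(4)           # square parameter matrix of the mapping sub-network
f_no = add_weighted_noise(f_o, W_N, beta=0.9, rng=rng)
```

Since Sigmoid output lies strictly in (0, 1), with beta = 0.9 the superposed feature stays within (0.9, 1.0) of the original unit feature, illustrating how beta bounds the influence of the noise.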
The mathematical expression of the Sigmoid function is:

Sigmoid(x) = 1/(1 + e^(−x))
Superposing random noise obeying a standard normal distribution onto the fused element feature, after passing it through the noise mapping sub-network, copes well with unbalanced training samples. In the prior art, random perturbation is usually introduced into a network by dropout to prevent overfitting during training: dropout randomly sets some neuron outputs in the network to 0 with a certain probability. However, because dropout shuts down some neurons, it affects the learning ability of the whole network to some extent, and the random noise it introduces may cause training to oscillate and become unstable as the network approaches convergence. The inventors therefore propose superposing random noise obeying a standard normal distribution onto the fused element feature through a noise mapping sub-network, which effectively prevents overfitting while closing no neurons and leaving the learning ability of the network unaffected. Adding this weighted noise in effect increases the number of training samples, mitigating the imbalance in sample counts when the classification sub-network performs classification. The weighted noise is the noise generated by passing standard-normally distributed random noise through the noise mapping sub-network. In actual network training, with weighted noise introduced, each training iteration need only draw samples from the training set of each category with equal probability according to a uniform probability density; this effectively alleviates the sample-imbalance problem during training and is convenient to implement.
Referring to fig. 8, the classification sub-network comprises linear connection layers, ReLU layers, and a loss function layer. In this embodiment, 3 linear connection layers, 2 ReLU layers, and one loss function layer are used for illustration; the specific number of layers may be set according to the actual situation and is not repeated here. In fig. 8, F1 comprises a first linear connection layer 11 and a first ReLU layer 21, F2 comprises a second linear connection layer 12 and a second ReLU layer 22, and F3 comprises a third linear connection layer 13 and the loss function layer 31. The output of each linear connection layer is mathematically expressed as:
Out1=weight×input1+bias
where Out1 is the output of the linear connection layer, weight is the parameter matrix of the linear connection layer, input1 is the input of the linear connection layer, and bias is the bias vector.
The mathematical expression for each ReLU layer is:
Out2=max(0,input2)
where Out2 is the output of the ReLU layer and input2 is the input of the ReLU layer. The classification sub-network classifies the fused element feature and matches it to a scene template. In this embodiment the input vector dimension of F1 is 256 and its output dimension is 128, the input dimension of F2 is 128 and its output dimension is 64, and the input dimension of F3 is 64 and its output dimension is 50. The output of the third linear connection layer 13 in F3 represents the confidence of the classification sub-network over the 50 scene templates. The loss function in the loss function layer measures the difference between the classified scene template and the real scene template, and the template matching network is trained accordingly; a better template matching model is obtained by adjusting the gradient descent step, the learning rate, and the numbers of training epochs and iterations.
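The F1/F2/F3 forward pass with the dimensions stated above (256 → 128 → 64 → 50) can be sketched as follows; the random weight initialisation and function names are illustrative assumptions, and the loss function layer is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def classify(x, params):
    # F1: first linear connection layer + first ReLU layer
    # F2: second linear connection layer + second ReLU layer
    # F3: third linear connection layer (confidences over scene templates)
    (w1, b1), (w2, b2), (w3, b3) = params
    h = relu(w1 @ x + b1)
    h = relu(w2 @ h + b2)
    return w3 @ h + b3

rng = np.random.default_rng(1)
dims = [256, 128, 64, 50]  # input/output dimensions from the embodiment
params = [(rng.standard_normal((dims[i + 1], dims[i])) * 0.01,
           np.zeros(dims[i + 1])) for i in range(3)]
scores = classify(rng.standard_normal(256), params)
template_id = int(np.argmax(scores))  # index of the matched scene template
```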
In some embodiments, the scene template includes at least one of: character position, character picture orientation, dialog box position, background picture, and foreground animated GIF material. As shown in fig. 9, after the corresponding scene template is obtained, a corresponding cartoon bitmap is generated using the scene template, and cartoon splicing is performed with the cartoon bitmaps to generate the cartoon. The steps comprise:
s901: filling a preset character image and conversation content into the scene template to obtain a cartoon bitmap;
s902: establishing a reference coordinate system, and acquiring coordinates of the cartoon bitmap in the reference coordinate system;
s903: acquiring the final height of the cartoon according to the coordinates of the cartoon bitmap;
s904: and according to the final height, performing long-image splicing to finish generation of the cartoon. For example: the method comprises the steps of adding the Y coordinate of the cartoon bitmap to the height of the corresponding cartoon bitmap to obtain the final height of the cartoon, splicing the cartoon bitmaps to the corresponding height position according to the final height of the cartoon to generate the cartoon, so that the text or the story is converted into the cartoon, the automation degree is high, the fitting degree of the generated cartoon and the text or the story is high, and the cost is low.
To facilitate editing of the generated cartoon, the inventors propose that, after the cartoon is generated, the user may rotate, zoom, and move the cartoon bitmaps within it, enabling editing and re-creation of the cartoon and saving the user's creation cost.
As shown in fig. 10, the present embodiment further provides a cartoon generating system, including:
the acquisition module is used for acquiring a training set;
a training module, configured to input the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises long short-term memory sub-networks corresponding to different categories of training texts;
the element information acquisition module is used for acquiring element information of the lens to be processed by acquiring the story text;
the matching module is used for inputting the information of the lens elements to be processed into the template matching model for template matching, and acquiring a scene template matched with the information of the lens elements to be processed;
the cartoon generating module is used for generating a cartoon through the scene template matched with the information of the lens elements to be processed; the collection module, the training module, the element information acquisition module, the matching module, and the cartoon generation module are connected. The training set is input into a template matching network for training to obtain a template matching model, the template matching network comprising long short-term memory sub-networks that each correspond to a different category of training text; the information of the lens elements to be processed is input into the template matching model to obtain a matching scene template, and the cartoon is then generated from that template.
In some embodiments, the different categories of training text include at least one of: scene description text, dialog content, and voice-over content;
the training module inputs the training texts of different categories into the corresponding long short-term memory sub-networks respectively for feature extraction, obtaining element features of the different categories;
fusing the element features of different categories according to a plurality of preset different weights to obtain fused element features;
inputting the fused element features into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the fused element features;
and training the template matching network according to the scene type to obtain a template matching model.
In some embodiments, the step of inputting the training texts of different categories into the corresponding long short-term memory sub-networks for feature extraction comprises:
performing character mapping according to the training text and a preset character index dictionary to obtain index data of the training text;
cutting or filling the index data according to a preset data length threshold value to obtain index data with a fixed length;
and according to the categories of the index data, inputting the fixed-length index data of the different categories into the corresponding long short-term memory sub-networks respectively for feature extraction, obtaining element features of the different categories.
In some embodiments, the step of constructing the character index dictionary comprises:
removing the duplication of the characters of the training texts in the training set, removing punctuation marks in the training texts, and obtaining a set of duplication-removed characters;
and taking the set of the de-duplicated characters as a character index dictionary, wherein the de-duplicated characters correspond to indexes one by one.
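The dictionary construction above can be sketched as follows. Reserving index 0 for the empty (padding) character is an assumption to support the fill step described later; the function name and punctuation set are illustrative.

```python
import string

def build_char_index(texts, punctuation=string.punctuation + "，。！？、；："):
    # Deduplicate the characters of all training texts, drop punctuation,
    # and map each remaining character to a unique index. Index 0 is
    # reserved for the empty/padding character (an assumption).
    chars, seen = [], set()
    for text in texts:
        for ch in text:
            if ch in punctuation or ch in seen:
                continue
            seen.add(ch)
            chars.append(ch)
    return {"<pad>": 0, **{ch: i + 1 for i, ch in enumerate(chars)}}

idx = build_char_index(["hello, world", "hold!"])
```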
In some embodiments, the step of cropping or populating the index data comprises:
judging whether the length of the index data is greater than the data length threshold, if so, cutting the index data according to the data length threshold to obtain the index data with the fixed length;
and if the length of the index data is smaller than the data length threshold, filling the index data by using the index of the empty character in the character index dictionary according to the data length threshold to obtain the index data with fixed length.
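The cropping/filling rule above can be sketched as follows; using 0 as the index of the empty character in the character index dictionary is an assumption.

```python
def to_fixed_length(index_data, length_threshold, pad_index=0):
    # If the index data is longer than the threshold, crop it; otherwise
    # fill it with the empty-character index up to the threshold.
    if len(index_data) > length_threshold:
        return index_data[:length_threshold]
    return index_data + [pad_index] * (length_threshold - len(index_data))

long_seq = to_fixed_length([3, 1, 4, 1, 5], 3)   # cropped
short_seq = to_fixed_length([3, 1], 4)           # filled
```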
In some embodiments, the classification sub-network comprises: one or more linear connection layers for linear classification, one or more ReLU layers for adding nonlinear factors, and a loss function layer, wherein the linear connection layers and the ReLU layers are arranged alternately;
training the template matching network by using a loss function preset in the loss function layer;
the mathematical expression of the loss function is:
where x is the feature vector output by the classification sub-network; c is the index of the classification category, with c ∈ [0, K−1]; K is the number of classes; exp(·) denotes the exponential function with the natural constant e as base; x[j] denotes the j-th element of the feature vector x, with j ∈ [0, K−1]; nj is the number of samples of the j-th category participating in training; λ||x||² is the regularization term of the loss function; and λ is a weight parameter.
In some embodiments, the step of obtaining the fused element features is followed by:
inputting the fused element characteristics into a noise mapping sub-network in a template matching network, and superposing random noise which is subjected to standard normal distribution to the fused element characteristics through the noise mapping sub-network to obtain superposed noise element characteristics;
inputting the characteristics of the superposed noise elements into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the characteristics of the superposed noise elements;
The mathematical expression for superposing noise onto the fused element feature through the noise mapping sub-network is as follows:
fNO=β·fo+(1-β)fN
fN=Sigmoid(WeightsN×Noise)
where fNO is the superposed-noise element feature; β is an adjusting parameter for adjusting the degree of influence of the random noise, with β ∈ [0,1]; fo is the fused element feature; fN is the noise generated by the noise mapping sub-network from random noise obeying a standard normal distribution, the vector length of the random noise being equal to that of the feature vector fo; each element of Noise follows a standard normal distribution with mean 0 and variance 1; and WeightsN is the square parameter matrix of the noise mapping sub-network.
In some embodiments, the scene template includes at least one of: character position, character picture direction, dialog box position, background picture and foreground animation GIF material;
the cartoon generating module fills a preset character image and conversation content into the scene template to obtain a cartoon bitmap;
establishing a reference coordinate system, and acquiring coordinates of the cartoon bitmap in the reference coordinate system;
acquiring the final height of the cartoon according to the coordinates of the cartoon bitmap;
and according to the final height, performing long-image splicing to finish generation of the cartoon.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements any of the methods in the present embodiments.
The present embodiment further provides an electronic terminal, including: a processor and a memory;
the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the method in the embodiment.
The computer-readable storage medium in the present embodiment can be understood by those skilled in the art as follows: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The electronic terminal provided by this embodiment comprises a processor, a memory, a transceiver, and a communication interface. The memory and the communication interface are connected with the processor and the transceiver to complete mutual communication; the memory stores a computer program; the communication interface is used for communication; and the processor and the transceiver run the computer program so that the electronic terminal executes the steps of the above method.
In this embodiment, the Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
All "feature vectors" in the above embodiments refer to vectors representing data features, not to the term "eigenvector" as understood in linear algebra, which concerns the eigenvalues and eigenvectors of a matrix.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.
Claims (10)
1. A cartoon generating method, comprising:
collecting a training set;
inputting the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises long short-term memory sub-networks corresponding to different categories of training texts;
acquiring element information of a lens to be processed by acquiring a story text;
inputting the information of the lens elements to be processed into the template matching model for template matching, and acquiring a scene template matched with the information of the lens elements to be processed;
and generating a cartoon through the scene template matched with the information of the lens elements to be processed.
2. A caricature generation method according to claim 1, wherein the different categories of training text include at least one of: scene description text, dialog content, and voice-over content;
inputting the training texts of different categories into the corresponding long short-term memory sub-networks respectively for feature extraction, and acquiring element features of the different categories;
fusing the element features of different categories according to a plurality of preset different weights to obtain fused element features;
inputting the fused element features into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the fused element features;
and training the template matching network according to the scene type to obtain a template matching model.
3. The cartoon generating method of claim 2, wherein the step of inputting the training texts of different categories into the corresponding long short-term memory sub-networks for feature extraction comprises:
performing character mapping according to the training text and a preset character index dictionary to obtain index data of the training text;
cutting or filling the index data according to a preset data length threshold value to obtain index data with a fixed length;
and according to the categories of the index data, inputting the fixed-length index data of the different categories into the corresponding long short-term memory sub-networks respectively for feature extraction, obtaining element features of the different categories.
4. A caricature generation method according to claim 3, wherein the step of cropping or filling the index data comprises:
judging whether the length of the index data is greater than the data length threshold, if so, cutting the index data according to the data length threshold to obtain the index data with the fixed length;
and if the length of the index data is smaller than the data length threshold, filling the index data by using the index of the empty character in the character index dictionary according to the data length threshold to obtain the index data with fixed length.
5. The comic generation method according to claim 2,
the classification sub-network comprises: one or more linear connection layers for linear classification, one or more ReLU layers for adding nonlinear factors, and a loss function layer, wherein the linear connection layers and the ReLU layers are arranged alternately;
training the template matching network by using a loss function preset in the loss function layer;
the mathematical expression of the loss function is:
where x is the feature vector output by the classification sub-network; c is the index of the classification category, with c ∈ [0, K−1]; K is the number of classes; exp(·) denotes the exponential function with the natural constant e as base; x[j] denotes the j-th element of the feature vector x, with j ∈ [0, K−1]; nj is the number of samples of the j-th category participating in training; λ||x||² is the regularization term of the loss function; and λ is a weight parameter.
6. The caricature generation method according to claim 2, wherein the step of obtaining the fused element features is followed by:
inputting the fused element characteristics into a noise mapping sub-network in a template matching network, and superposing random noise which is subjected to standard normal distribution to the fused element characteristics through the noise mapping sub-network according to preset adjusting parameters to obtain superposed noise element characteristics;
inputting the characteristics of the superposed noise elements into a classification sub-network in a template matching network for classification and scene template matching, and acquiring a scene template corresponding to the characteristics of the superposed noise elements;
The mathematical expression for superposing noise onto the fused element feature through the noise mapping sub-network is as follows:
fNO=β·fo+(1-β)fN
fN=Sigmoid(WeightsN×Noise)
where fNO is the superposed-noise element feature; β is an adjusting parameter for adjusting the degree of influence of the random noise, with β ∈ [0,1]; fo is the fused element feature; fN is the noise generated by the noise mapping sub-network from random noise obeying a standard normal distribution, the vector length of the random noise being equal to that of the feature vector fo; each element of Noise follows a standard normal distribution with mean 0 and variance 1; and WeightsN is the square parameter matrix of the noise mapping sub-network.
7. The caricature generation method of claim 1, wherein the scene template includes at least one of: character position, character picture direction, dialog box position, background picture and foreground animation GIF material;
filling a preset character image and conversation content into the scene template to obtain a cartoon bitmap;
establishing a reference coordinate system, and acquiring coordinates of the cartoon bitmap in the reference coordinate system;
acquiring the final height of the cartoon according to the coordinates of the cartoon bitmap;
and according to the final height, performing long-image splicing to finish generation of the cartoon.
8. A caricature generation system, comprising:
the acquisition module is used for acquiring a training set;
a training module, configured to input the training set into a template matching network for training to obtain a template matching model, wherein the template matching network comprises long short-term memory sub-networks corresponding to different categories of training texts;
the element information acquisition module is used for acquiring element information of the lens to be processed by acquiring the story text;
the matching module is used for inputting the information of the lens elements to be processed into the template matching model for template matching, and acquiring a scene template matched with the information of the lens elements to be processed;
the cartoon generating module is used for generating a cartoon through the scene template matched with the information of the lens element to be processed; the collection module, the training module, the element information acquisition module, the matching module and the cartoon generation module are connected.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is for storing a computer program and the processor is for executing the computer program stored by the memory to cause the terminal to perform the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111057401.7A CN113743520A (en) | 2021-09-09 | 2021-09-09 | Cartoon generation method, system, medium and electronic terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113743520A true CN113743520A (en) | 2021-12-03 |