CN115170388A - Character line draft generation method, device, equipment and medium

Character line draft generation method, device, equipment and medium

Info

Publication number: CN115170388A
Application number: CN202210912212.1A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 方承煜 (Fang Chengyu), 韩先锋 (Han Xianfeng)
Current assignee: Southwest University
Original assignee: Southwest University
Prior art keywords: character, loss function, picture, line, character line
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application filed by Southwest University
Priority to CN202210912212.1A
Publication of CN115170388A

Classifications

    • G06T3/04
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The invention discloses a character line draft generation method, which comprises the following steps: acquiring a target character picture; inputting the target character picture into a pre-trained character line draft model, and converting the target character picture through the model to obtain a character line draft. The character line draft model is trained as follows: acquire a character picture and the character line draft corresponding to it; input the character picture into the generator of the character line draft model to generate a corresponding line drawing, and input the line drawing together with the character line draft corresponding to the character picture into the discriminator of the model; calculate a first loss function, a second loss function and a third loss function respectively, and weight and sum them to obtain a total loss function; and iteratively update the generator and the discriminator until the total loss function reaches a preset condition. The invention enables the network to better extract and combine geometric and semantic information, so as to generate line draft images of higher quality and with complete information.

Description

Character line draft generation method, device, equipment and medium
Technical Field
The invention belongs to the fields of computer vision, multimedia technology and digital entertainment, and particularly relates to a character line draft generation method, device, equipment and medium.
Background
The purpose of character line drawing (line art for short) is to convert the information in the picture/image domain into a simplified representation domain, expressing changes of surfaces with basic graphic elements (such as straight or curved lines). It is an abstract and flexible art form, with applications including entertainment, key art, caricatures and computer-generated animation. Drawing high-quality line art often requires considerable effort from professional artists or domain experts; more detail means more difficulty in the drawing process, and line art is therefore considered a labor-intensive, time-consuming and challenging art.
Among studies of line drawings in the field of deep learning, Zhang et al. proposed a split-and-fill mechanism for line art colorization: the input user scribbles are first divided into several groups for influence-area estimation, then a data-driven color generation process is performed for each group, and these outputs are finally combined to form a high-quality filling result. Zheng et al. first designed a ShapeNet to learn three-dimensional geometric information in line drawings, and then used a RenderNet to generate three-dimensional shadows. The SmartShadow application consists of three data-driven tools: a shadow brush to determine the area where the user wants to create shadows, a shadow boundary brush to control the shadow boundaries, and a global shadow generator to shade the entire image based on the estimated global shadow direction. Li et al. established a CNN-based method for the structural line extraction of caricatures, using a residual network and the idea of symmetric skip connections. In order to create portrait drawings of various and unseen styles, Yi et al. designed a novel portrait creation framework using an asymmetric cycle structure, first using a regression network to calculate the quality score of an APDrawing; based on this model, a quality loss is defined to direct the network to generate high-quality APDrawings. Im2Pencil treats the generation of the line draft as an independent subtask to promote the synthesis of cartoons and illustrations.
With these artificial intelligence techniques, can the conversion of character pictures into line draft style be done automatically? In essence, this problem can be seen as an image-to-image translation problem that converts a character picture into the line draft representation domain, and GAN-based approaches are in principle applicable to solving it.
Image translation methods (such as Pix2Pix and CycleGAN) can likewise automatically complete the conversion from character picture to line draft style. However, (1) manually drawn character lines are abstract and sparse, which makes the task completely different from other image translation applications, and (2) missing semantic information leads to unclear and incomplete lines; as a result, these methods cannot achieve high-quality character line drawing.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention provides a character line draft generation method, apparatus, device and medium to solve the above technical problems.
The invention provides a character line draft generation method, which comprises the following steps:
acquiring a target figure picture;
inputting the target character picture into a pre-trained character line draft model, and converting the target character picture through the character line draft model to obtain a character line draft; the character line draft model is trained as follows:
acquiring a character picture and a character line draft corresponding to the character picture;
inputting a character picture into a generator of the character line draft model, generating a line drawing corresponding to the character picture, and inputting the line drawing together with the character line draft corresponding to the character picture into a discriminator of the character line draft model;
respectively calculating a first loss function, a second loss function and a third loss function, wherein the first loss function is the loss function of the generator, the second loss function represents the error between the character line draft and the line drawing, and the third loss function is the loss function of the discriminator;
weighting and summing the first loss function, the second loss function and the third loss function to obtain a total loss function;
and iteratively updating the parameters of the generator and the discriminator to enable the total loss function to reach a preset condition so as to finish the training of the character line draft model.
In an embodiment of the invention, the generator includes:
the encoder is used for encoding the figure picture to obtain an encoding result;
the decoder is used for decoding the coding result to obtain a line graph;
the encoder comprises a plurality of encoding layers and is used for carrying out multi-scale sampling on the figure picture and carrying out feature fusion on multi-scale features obtained by sampling to obtain fusion features;
the decoder comprises a plurality of decoding layers, and the feature map output by each coding layer is embedded into different decoding layers in the process of decoding the coding result.
In an embodiment of the present invention, the encoder uses a pre-trained ResNeXt-50, and the coding and decoding layers are expressed as follows:
The input $X_{in}^{n-i+1}$ of the (n-i+1)-th decoding layer is expressed as

$$X_{in}^{n-i+1} = X_{out}^{n-i} \oplus PDS(X_{out}^{1}) \oplus \cdots \oplus PDS(X_{out}^{i})$$

wherein $X_{out}^{n-i}$ denotes the output of the (n-i)-th layer, $X_{out}^{1}$ to $X_{out}^{i}$ denote the outputs of the 1st to i-th coding layers, $PDS(\cdot)$ denotes a progressive downsampling operation, and $\oplus$ denotes an element-wise summation;

$$X_{out}^{n-i+1} = UP(X_{in}^{n-i+1})$$

wherein $UP(\cdot)$ denotes upsampling of the feature map;

$$X_{out}^{i} = X_{in}^{i} + \sum_{j=1}^{C} \mathcal{T}_j(X_{in}^{i})$$

wherein $X_{out}^{i}$ denotes the output of the i-th coding layer, $X_{in}^{i}$ denotes the input of the i-th coding layer, $C$ denotes the number of groups, and $\mathcal{T}_j(\cdot)$ denotes a transformation function.
In an embodiment of the invention, the discriminator includes a plurality of sequentially connected discriminator blocks, each of which includes a convolution layer, a normalization layer and an activation layer connected in sequence.
In an embodiment of the present invention, the first loss function is expressed as:

$$\mathcal{L}_1(G) = f_2\big(f_1\big(D(G(p)),\ f_3(f_4(p))\big)\big)$$

wherein $f_1$ is the mean-square-error loss, $f_2$ denotes an averaging operation, $f_4$ adjusts the input picture to the same size as the output of the discriminator $D$, and $f_3$ and $f_5$ fill tensors with the scalar values 1 and 0, respectively.
In an embodiment of the present invention, the third loss function is expressed as:
Figure BDA0003774194340000042
In an embodiment of the present invention, the second loss function is expressed as:

$$\mathcal{L}_2(G) = \mathbb{E}\big[\lVert l - G(p) \rVert_1\big]$$
the invention provides a character line draft generating device, comprising:
the image acquisition module is used for acquiring a target figure image;
the character line draft generation module is used for inputting the target character picture into a pre-trained character line draft model and converting the target character picture through the model to obtain a character line draft; the training of the character line draft model is implemented by a training module, and the training module includes:
the data acquisition sub-module is used for acquiring the figure pictures and figure line drafts corresponding to the figure pictures;
the input submodule is used for inputting the character picture into the generator of the character line draft model, generating a line drawing corresponding to the character picture, and inputting the line drawing together with the character line draft corresponding to the character picture into the discriminator of the character line draft model;
the loss function calculation submodule is used for respectively calculating a first loss function, a second loss function and a third loss function, wherein the first loss function is the loss function of the generator, the second loss function represents the error between the character line draft and the line drawing, and the third loss function is the loss function of the discriminator;
the summation submodule is used for carrying out weighted summation on the first loss function, the second loss function and the third loss function to obtain a total loss function;
and the training sub-module is used for carrying out iterative updating on the parameters of the generator and the discriminator so that the total loss function reaches a preset condition to finish the training of the character line draft model.
The invention provides an electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to perform the steps of the character line draft generation method.
The present invention provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor of a computer, causes the computer to execute the steps of the character line draft generation method.
The invention has the beneficial effects that the character line draft generation method solves the problem that existing methods in the fields of style transfer and image translation produce unclear and incomplete lines in the picture-to-line-draft conversion task and therefore cannot generate high-quality character line drafts.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
fig. 1 is a schematic diagram of an implementation environment of a character line draft generation method according to an exemplary embodiment of the present application;
fig. 2 is a flowchart of a character line draft generation method according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a character line draft model training method in accordance with an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a generator and a discriminator according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a qualitative comparison of an exemplary embodiment of the present application with other methods;
FIG. 6 is a schematic diagram of a qualitative comparison of an exemplary embodiment of the present application with other methods;
FIG. 7 is a graphical illustration of qualitative results of an ablation experiment according to an exemplary embodiment of the present application;
fig. 8 is a block diagram of a character line draft generation apparatus in an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a training module of an exemplary embodiment of the present application;
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure herein, wherein the embodiments of the present invention are described in detail with reference to the accompanying drawings and preferred embodiments. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are illustrative of the invention only and are not limiting upon the scope of the invention.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention; they show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation, where the type, quantity and proportion of each component, as well as the layout, may differ.
In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention, however, it will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details, and in other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, to avoid obscuring embodiments of the present invention.
Fig. 1 is a schematic diagram of an exemplary implementation environment of the character line draft generation method of the present application. Referring to fig. 1, the implementation environment includes a terminal device 101 and a server 102, which communicate with each other through a wired or wireless network. The terminal device acquires a target character picture, inputs it into a pre-trained character line draft model, and converts it through the model to obtain a character line draft. The character line draft model is trained as follows: acquire a character picture and the character line draft corresponding to it; input the character picture into the generator of the character line draft model to generate a corresponding line drawing, and input the line drawing together with the corresponding character line draft into the discriminator of the model; respectively calculate a first loss function (the generator's loss), a second loss function (representing the error between the character line draft and the generated line drawing) and a third loss function (the discriminator's loss); weight and sum the three losses to obtain a total loss function; and iteratively update the parameters of the generator and the discriminator until the total loss function reaches a preset condition, completing the training. The character line draft model obtained in this way solves the problem that existing style-transfer and image-translation methods produce unclear and incomplete lines in the picture-to-line-draft conversion task and therefore cannot generate high-quality character line drafts, and it enables the network to better extract and combine geometric and semantic information so as to generate line draft images of higher quality and with complete information.
It should be understood that the number of terminal devices 101 and servers 102 in fig. 1 is merely illustrative. There may be any number of terminal devices 101 and servers 102, as desired.
The terminal device 101 corresponds to a client, which may be any electronic device having a user input interface, including but not limited to a smart phone, a tablet, a notebook, a computer, etc., where the user input interface includes but not limited to a touch screen, a keyboard, a physical key, an audio pickup device, etc.
The server 102 corresponds to the server side and may be a server providing various services: an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms; this is not limited herein.
The terminal device 101 may communicate with the server 102 through a wireless network such as 3G (third generation mobile information technology), 4G (fourth generation mobile information technology), 5G (fifth generation mobile information technology), and the like, which is not limited herein.
In order to solve the problems in the prior art, embodiments of the present application propose a character line manuscript generation method, a character line manuscript generation device, an electronic device, and a computer readable storage medium, which will be described in detail below.
Referring to fig. 2, fig. 2 is a flowchart of a character line draft generation method according to an exemplary embodiment of the present application. The method may be applied to the implementation environment shown in fig. 1 and is specifically executed by the terminal device 101 in that environment. It should be understood that the method may also be applied to other exemplary implementation environments and executed by devices in those environments; this embodiment does not limit the implementation environment to which the method is applied. The character line draft generation method includes at least steps S210 to S220, which are introduced in detail below:
step S210, obtaining a target person picture;
step S220, inputting the target character picture into a pre-trained character line draft model, and converting the target character picture through the character line draft model to obtain a character line draft; as shown in fig. 3, the character line draft model is trained as follows:
step S2201, acquiring a character picture and a character line draft corresponding to the character picture;
step S2202, inputting a character picture into the generator of the character line draft model, generating a line drawing corresponding to the character picture, and inputting the line drawing together with the character line draft corresponding to the character picture into the discriminator of the character line draft model;
step S2203, calculating a first loss function, a second loss function and a third loss function respectively, where the first loss function is the loss function of the generator, the second loss function represents the error between the character line draft and the generated line drawing, and the third loss function is the loss function of the discriminator;
step S2204, performing weighted summation on the first loss function, the second loss function, and the third loss function to obtain a total loss function;
step S2205, iteratively updating the parameters of the generator and the discriminator to make the total loss function reach a preset condition, so as to complete the training of the character line draft model.
The respective steps are explained in detail below.
In step S210, a target person picture is acquired;
the target person picture may be a picture taken by a camera or a picture downloaded through a network, and the embodiment does not limit the manner of obtaining the picture.
In step S220, the target character picture is input into a pre-trained character line draft model, and converted through the character line draft model to obtain a character line draft;
and taking the target character picture as the input of the character line draft model, and converting the target character picture through the character line draft model to obtain the character line draft.
It should be noted that the character line draft model is trained in advance and then deployed on the corresponding server or terminal device.
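Deployment then reduces to a single forward pass. The sketch below illustrates this in PyTorch; the `p2ldgan` module, the `P2LDGenerator` class and the checkpoint format are hypothetical placeholders, and the [-1, 1] normalization is an assumption, since the embodiment specifies only the 512 × 512 input size.

```python
import torch
from PIL import Image
from torchvision import transforms

# "p2ldgan" / "P2LDGenerator" are hypothetical names used for illustration;
# the patent does not prescribe a framework or file format.
from p2ldgan import P2LDGenerator  # assumed module providing the generator


def generate_line_draft(image_path: str, ckpt_path: str) -> Image.Image:
    """Convert a target character picture into a character line draft."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    generator = P2LDGenerator().to(device)
    generator.load_state_dict(torch.load(ckpt_path, map_location=device))
    generator.eval()  # inference only: freeze norm statistics

    # The embodiment feeds 512x512 pictures to the encoder; scaling to
    # [-1, 1] is an assumption common to GAN generators.
    preprocess = transforms.Compose([
        transforms.Resize((512, 512)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)

    with torch.no_grad():
        y = generator(x)  # line drawing tensor in [-1, 1]

    y = (y.squeeze(0).cpu() * 0.5 + 0.5).clamp(0, 1)
    return transforms.ToPILImage()(y)
```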
In step S2201, a character picture and the character line draft corresponding to it are acquired;
The character line drafts and character pictures can be provided by professional artists, with each character line draft well aligned with its character picture, which is helpful for studying the creation of character line draft drawings.
In step S2202, a character picture is input into the generator of the character line draft model to generate a line drawing corresponding to the character picture, and the line drawing together with the character line draft corresponding to the character picture is input into the discriminator of the character line draft model;
character line art describes the clear and key features of a character in the form of representations of structure lines. Thus, geometric and semantic information play a significant role in synthesizing important details in a picture. Although the U-net framework with Skip-connection mainly adopted by Pix2Pix, cycle GAN and the like can be used as an image translation model to generate character scripts, the models only combine feature maps with the same proportion in the encoding and decoding stages, and lack fusion of geometric and semantic information, which limits the generation quality. Therefore, in order to propagate the geometric and semantic information step by step into the output line draft, the present embodiment uses a simple and effective framework-across scale dense skip connection module as the core of the generator, which is defined as a geometric and semantic union generator, and is used in the description of the generator below.
In one embodiment, the geometric-semantic joint generator includes:
the encoder is used for encoding the figure picture to obtain an encoding result;
the decoder is used for decoding the coding result to obtain a line graph;
the encoder comprises a plurality of encoding layers and is used for carrying out multi-scale sampling on the figure picture and carrying out feature fusion on multi-scale features obtained by sampling to obtain fusion features;
the decoder comprises a plurality of decoding layers, and the feature graph output by each coding layer is embedded into different decoding layers in the process of decoding the coding result.
Referring to fig. 4, fig. 4 is a schematic structural diagram of the geometric-semantic joint generator according to an exemplary embodiment of the present application. The generator comprises 4 residual layers (ResNeXt layers), 6 downsampling layers, 4 upsampling layers and one convolution layer. The first residual layer outputs a first feature map from the character picture, the second residual layer outputs a second feature map from the first feature map, the third residual layer outputs a third feature map from the second feature map, and the fourth residual layer outputs a fourth feature map from the third feature map; the fourth feature map passes through a convolution layer to give a fifth feature map. In fig. 4, the first feature map is downsampled by one downsampling layer to obtain a first downsampling result, the second feature map is downsampled to obtain a second downsampling result, and the third feature map is downsampled to obtain a third downsampling result. The first downsampling result is downsampled again to obtain a fourth downsampling result, the second downsampling result is downsampled again to obtain a fifth downsampling result, and the fourth downsampling result is downsampled again to obtain a sixth downsampling result. The fourth feature map, the third downsampling result, the fifth downsampling result and the sixth downsampling result are summed, fused with the fifth feature map, and upsampled through an upsampling layer to obtain a first upsampling result. The third feature map, the second downsampling result and the fourth downsampling result are summed, fused with the first upsampling result, and upsampled to obtain a second upsampling result. The second feature map is summed with the first downsampling result, fused with the second upsampling result, and upsampled to obtain a third upsampling result. Finally, the first feature map is fused with the third upsampling result and upsampled through an upsampling layer to obtain a fourth upsampling result, which has the same scale as the character picture.
Specifically, the encoder compresses the rich information of the character picture into an encoded representation, and the decoder constructs the required line draft from that encoding. The encoder employs a pre-trained ResNeXt-50, which is simple, modular, efficient and has high learning capability. Specifically, the stages Conv1 to Conv5 can be extracted from ResNeXt as the coding layers (i.e., the ResNeXt layers), expressed as:

$$X_{out}^{i} = X_{in}^{i} + \sum_{j=1}^{C} \mathcal{T}_j(X_{in}^{i})$$

where $i$ denotes the $i$-th layer of the encoder, $X_{in}^{i}$ and $X_{out}^{i}$ denote its input and output, $C$ denotes the number of groups, and $\mathcal{T}_j(\cdot)$ denotes a transformation function. In one embodiment, a 1×1, 3×3, 1×1 convolution sequence is used.
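As a concrete illustration, the sketch below shows how the Conv1 to Conv5 stages of a pretrained ResNeXt-50 could be split out as coding layers using torchvision; the stage boundaries follow torchvision's standard layout and are an assumption about how the patent's Conv1..Conv5 map onto it.

```python
import torch.nn as nn
from torchvision.models import resnext50_32x4d, ResNeXt50_32X4D_Weights


def build_encoder_stages() -> nn.ModuleList:
    """Split a pretrained ResNeXt-50 (32 groups, i.e. C = 32) into the
    Conv1..Conv5 stages used as coding layers; each stage halves resolution."""
    backbone = resnext50_32x4d(weights=ResNeXt50_32X4D_Weights.IMAGENET1K_V1)
    return nn.ModuleList([
        nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu),  # Conv1
        nn.Sequential(backbone.maxpool, backbone.layer1),            # Conv2
        backbone.layer2,                                             # Conv3
        backbone.layer3,                                             # Conv4
        backbone.layer4,                                             # Conv5
    ])


def encode(stages: nn.ModuleList, x):
    """Run the character picture through every stage, keeping each feature
    map so the decoder can fuse them via cross-scale skip connections."""
    features = []
    for stage in stages:
        x = stage(x)
        features.append(x)
    return features
```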
Character pictures of size 512 × 512 are input to the encoder, and feature maps are extracted from each ResNeXt layer to establish, together with the decoding layers, a cross-scale skip-connection model that fuses and propagates information of different abstraction levels. Specifically, assuming the geometric-semantic joint generator has n layers in total, the input of the current decoding layer n-i+1 is the combination of the output of the previous layer and the corresponding coding layers of the same or greater resolution, defined as:

$$X_{in}^{n-i+1} = X_{out}^{n-i} \oplus PDS(X_{out}^{1}) \oplus \cdots \oplus PDS(X_{out}^{i})$$

where $X_{out}^{n-i}$ denotes the output of layer n-i, and $X_{out}^{1}$ to $X_{out}^{i}$ denote the outputs of the 1st to i-th coding layers, whose scales are greater than or equal to that of layer n-i+1. $PDS(\cdot)$ is a progressive downsampling operation that downsamples a feature map using convolution, and $\oplus$ is an element-wise summation. The fused multi-scale features then flow through the decoding layer, propagating geometric and semantic information to form a higher-resolution mapping:

$$X_{out}^{n-i+1} = UP(X_{in}^{n-i+1})$$

where $UP(\cdot)$ denotes upsampling of the feature map; in a particular embodiment, $UP(\cdot)$ performs 2× upsampling using the nearest-neighbor algorithm. Through this cross-scale skip-connection mechanism, the feature map of each coding layer can be embedded into different decoding layers, strengthening the integration of semantic and geometric information, improving feature propagation between encoder and decoder, and allowing the geometric-semantic joint generator to be trained directly to learn the character line drawing style and translate a real character picture into a line drawing with preserved details.
In step S2202, the line drawing and the character line draft corresponding to the character picture are input into the discriminator of the character line draft model;
The task of the discriminator is to discriminate between the generated character line drawings and the ground truth (real character line drafts). In this embodiment, the PatchGAN approach may be adopted to classify whether each 32 × 32 patch is authentic. The discriminator comprises a plurality of discriminator blocks, each consisting of at least a 4 × 4 convolution layer, an InstanceNorm normalization layer and a LeakyReLU activation layer with slope 0.2. Such a discriminator helps generate high-quality line drawings, because it makes the P2LDGAN character line draft model focus more on the details within each patch, and it can process images of arbitrary size with fewer parameters.
In step S2203, a first loss function, a second loss function and a third loss function are respectively calculated, where the first loss function is a loss function of a generator with geometric-semantic union, the second loss function is used to represent an error between the human line drawing and the line drawing, and the third loss function is a loss function of a discriminator;
in the process of training the generator with the geometric-semantic union, calculating a first loss function of the generator with the geometric-semantic union through a discriminator; wherein the first loss function is represented as:
Figure BDA0003774194340000121
wherein G represents a generator of geometric-semantic union, f 1 Is the loss of mean square error, f 2 Representing an average operation, f 4 Adjusting the input picture to the same size as the output of the discriminator D, f 3 And f 5 Fill tensors of scalar values 1 and 0, respectively; d (l) refers to an output result of the judger's judgment on the real sample,
Figure BDA0003774194340000122
refers to the loss of the square mean of the output result obtained for the input.
In step S2203, in the process of training the geometry-semantic-combined generator, calculating a second loss function of the geometry-semantic-combined generator; wherein the first loss function is represented as:
in step S2203, a second loss function representing an error between the human character line drawing and the line drawing is obtained.
Since the character line-script model is trained using paired samples in the present embodiment, it is possible to train a character line-script model. By means of L 1 Loss (and L) 2 Loss versus less ambiguity it introduces) to measure the error between the generated line drawing (generated human line script) and the real line script, which may cause the translated drawing to follow the distribution of domain I. The second loss function is defined as:
Figure BDA0003774194340000131
Figure BDA0003774194340000132
representing the average absolute error loss for the input information p and l.
In the process of training the arbiter, a third loss function of the arbiter is calculated by the arbiter, the third loss function being expressed as:
Figure BDA0003774194340000133
In step S2204, the first loss function, the second loss function and the third loss function are weighted and summed to obtain the total loss function:

$$\mathcal{L} = \lambda_1 \mathcal{L}_1(G) + \lambda_2 \mathcal{L}_3(D) + \lambda_3 \mathcal{L}_2(G)$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote the weights, which can be set empirically; in one embodiment, $\lambda_1 = 1$, $\lambda_2 = 0.5$ and $\lambda_3 = 100$.
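Read together, the three losses can be sketched as follows in PyTorch: `mse_loss` plays the role of $f_1$, its built-in mean reduction that of $f_2$, and `ones_like`/`zeros_like` those of $f_3$ and $f_5$. The pairing of each λ with a loss follows the reconstruction above and should be read as an assumption.

```python
import torch
import torch.nn.functional as F

# Weights from the embodiment; their pairing with the losses is assumed.
LAMBDA_1, LAMBDA_2, LAMBDA_3 = 1.0, 0.5, 100.0


def generator_losses(D, fake, real):
    """First loss (LSGAN-style adversarial term) and second loss (L1 error)."""
    pred_fake = D(fake)
    loss_adv = F.mse_loss(pred_fake, torch.ones_like(pred_fake))  # push D(G(p)) -> 1
    loss_l1 = F.l1_loss(fake, real)                               # mean absolute error
    return loss_adv, loss_l1


def discriminator_loss(D, real, fake):
    """Third loss: real line drafts scored toward 1, generated ones toward 0."""
    pred_real = D(real)
    pred_fake = D(fake.detach())  # detach so D's update does not touch G
    return (F.mse_loss(pred_real, torch.ones_like(pred_real))
            + F.mse_loss(pred_fake, torch.zeros_like(pred_fake)))
```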
In step S2205, the parameters of the geometric-semantic joint generator and the discriminator are iteratively updated so that the total loss function reaches a preset condition, thereby completing the training of the character line draft model.
In the invention, the application of generative adversarial networks is extended to the style-conversion problem from pictures to line draft drawings, i.e., automatically generating artistic character line drafts from real pictures. Given aligned source-reference pairs $\{(p_i, l_i)\}_{i=1}^{N}$, where $p_i$ and $l_i$ belong to the character picture domain $\mathcal{P}$ and the character line draft domain $\mathcal{L}$ respectively, and $N$ denotes the number of picture pairs, the goal of the character line draft model is to learn a mapping function $\Phi: \mathcal{P} \rightarrow \mathcal{L}$ that automatically generates the character line draft $\hat{l} = \Phi(p)$ from the corresponding character picture $p$.
The process of training the character line draft model comprises the following steps:
The geometric-semantic joint generator generates a line drawing from the character picture, which can be regarded as a forged character line draft, and the discriminator distinguishes the real character line draft of the picture from the forged one; the parameters of the geometric-semantic joint generator and the discriminator are updated accordingly, and continuous alternating training enables the generator to produce, from a single character picture, a forged character line draft similar to the real one. While the parameters of the generator and the discriminator are continuously updated, training stops when the total loss function reaches a preset condition, yielding the character line draft model; the preset condition may be that the total loss is minimal.
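A minimal sketch of one alternating training step, reusing the loss helpers above; the optimizer choice and update order are assumptions, since the patent only states that the two networks are trained alternately until the total loss meets the preset condition.

```python
def train_step(G, D, opt_G, opt_D, picture, line_draft):
    """One alternating update of the discriminator and the generator."""
    fake = G(picture)  # forged character line draft

    # --- discriminator update: tell real drafts from forged ones ---
    opt_D.zero_grad()
    loss_D = LAMBDA_2 * discriminator_loss(D, line_draft, fake)
    loss_D.backward()
    opt_D.step()

    # --- generator update: fool D while staying close to the real draft ---
    opt_G.zero_grad()
    loss_adv, loss_l1 = generator_losses(D, fake, line_draft)
    loss_G = LAMBDA_1 * loss_adv + LAMBDA_3 * loss_l1
    loss_G.backward()
    opt_G.step()
    return loss_G.item(), loss_D.item()
```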
The invention provides an end-to-end architecture, P2LDGAN, i.e., the character line draft model, which aims to automatically learn the cross-domain correspondence between pictures and character line drafts. The starting point of P2LDGAN is that a real character photo is input and a line draft resembling a real hand-drawn one is output, solving the problem that existing methods in the fields of style transfer and image translation produce unclear and incomplete lines in the picture-to-line-draft conversion task and therefore cannot generate high-quality character line drafts.
In order to measure the quality of the character line drafts, the invention adopts three common similarity indexes, namely Fréchet Inception Distance (FID), Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR), to quantitatively evaluate the performance of the previous methods and of the proposed P2LDGAN. FID calculates the distribution distance between the set of line draft images created from the input photos and the corresponding set of real line draft images (the smaller the FID value, the better the drawing quality); SSIM describes the similarity between a generated image and the real line draft image (the higher the SSIM value, the better the effect); PSNR calculates the intensity difference between the predicted image and the real image (a larger PSNR score means a smaller difference between images).
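For reference, SSIM and PSNR for a single image pair can be computed as in the sketch below (scikit-image is an assumed choice of library; FID is typically computed over the two whole image sets, e.g. with the pytorch-fid package).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(generated: np.ndarray, reference: np.ndarray):
    """SSIM (higher = more similar) and PSNR (higher = smaller intensity
    difference) between one generated line draft and its real counterpart."""
    ssim = structural_similarity(generated, reference, channel_axis=-1)
    psnr = peak_signal_noise_ratio(reference, generated)
    return ssim, psnr
```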
In order to demonstrate the superior performance of the P2LDGAN model of the present invention, it was compared quantitatively and qualitatively with six networks, including one existing neural style transfer method (i.e., Gatys) and five general image-to-image translation methods: CycleGAN, DiscoGAN, UNIT, Pix2Pix and MUNIT. It should be noted that the paired data were also used to train the unsupervised methods.
The present invention qualitatively assesses the performance of P2LDGAN by comparison with these baseline networks on the test data. Figs. 5 and 6 show visualization examples, from which the following conclusions can be drawn. A character line draft is by nature composed of abstract lines and carries no texture information. However, Gatys, based on neural style transfer, produces pictures with a global gray appearance (as shown in the third column) filled with many textures, far from the distribution of line drawings; this is mainly due to its use of the Gram matrix. In contrast, the model of the invention can generate clear and natural lines with little texture.
For CycleGAN, it severely blurs details, such as the boy's clothing and shoes in the first row of fig. 5. It also introduces large distortions in different areas (such as the girl's hair and bag in the third row of fig. 6), which results in unsightly line quality. While Pix2Pix successfully generates a somewhat acceptable perceptual appearance, it suffers from jagged borders, incomplete lines and loss of detail; for example, as can be seen from the second row of fig. 5, Pix2Pix cannot construct a nose for the person. In contrast, the P2LDGAN of the present invention can alleviate these unpleasant errors, create better line drafts, and retain more detail and structural information.
The character line draft model is further compared with DiscoGAN, UNIT and MUNIT. Although they can capture the basic character line draft style and produce more plausible results, the drawings they generate are corrupted by confused borders and overly smooth lines. The method provided by the invention provides more accurate lines, improving visual quality and remaining consistent with the real line draft images. Furthermore, most of these methods do not make full use of semantic information; therefore they can neither learn the salient features of the input photo nor generate natural line drawings with few visible artifacts. For example, as is clear from the third rows of figs. 5 and 6, these models do not generate refined facial structures.
In conclusion, the P2LDGAN proposed by the present invention is significantly better than other methods in terms of visual quality, detail preservation and artifact reduction. The method of the present invention works well in both simple and complex character line script learning.
Table 1. Quantitative comparison with state-of-the-art models on the introduced dataset; ↓ means a lower score is better, and ↑ means a higher score is better.
[The table values are presented as an image in the original publication.]
In order to quantify the authenticity and credibility of the generated character line drafts, the invention carries out an objective evaluation by calculating the average FID, SSIM and PSNR scores on the test set. Table 1 shows the results of the quantitative comparison of the invention with the GAN-based methods. As is clear from table 1, the character line draft model proposed by the invention achieves the best FID, SSIM and PSNR values and is significantly superior to previous competitors from the standpoint of realism and credibility.
Specifically, the lowest FID score indicates that the generated line draft images are closest to the distribution of the real images, while the highest SSIM and PSNR scores further indicate that the similarity between the results of the invention and the real images is the greatest.
In conclusion, the quantitative experiments prove the effectiveness and superiority of the proposed model in synthesizing high-quality character line drafts, which is consistent with the visual results.
In one embodiment, the summation operation in the cross-scale skip-connection module is replaced with a concatenation operation for comparison. As can be seen from fig. 7, the concatenation variant obtains acceptable results, but the borders are unclear, somewhat rough and noisy. The values in table 2 likewise show the effect of the fusion strategy on drawing quality.
Table 2. Quantitative results of the ablation experiments.
[The table values are presented as an image in the original publication.]
In one embodiment, to verify the effectiveness of the cross-scale skip-connection geometric-semantic joint generator in preserving fine detail, quantitative and qualitative comparisons are made between P2LDGAN with and without the cross-scale skip-connection model. Visually, without the connection model the generated character line drafts lose many structural details (such as the hair region in the second row), resulting in poor visual quality, whereas the full P2LDGAN recovers these delicate structures, yielding robust and better-looking results.
In another embodiment, the decoding layer is replaced with a "deconvolution + upsampling + convolution" variant. As can be seen from fig. 7 and table 2, the network with the replaced decoder produces character line drafts similar to, and in some respects even better than, those of the character line draft model of the invention; however, it requires more parameters and computation than the P2LDGAN character line draft model. In summary, P2LDGAN shows excellent performance in translating character pictures into high-quality character line drafts with better detail structure, clear lines and fewer artifacts.
Fig. 8 is a block diagram of a character line draft generation apparatus according to an exemplary embodiment of the present application. The apparatus can be applied to the implementation environment shown in fig. 1 and is specifically configured in the terminal device. The apparatus may also be applied to other exemplary implementation environments and configured in other devices; this embodiment does not limit the implementation environment to which it is applied.
As shown in fig. 8, the present application provides a character line draft generation apparatus, comprising:
a picture obtaining module 810, configured to obtain a picture of a target person;
a character line draft generation module 820, configured to input the target character picture into a pre-trained character line draft model, and convert the target character picture through the model to obtain a character line draft; the training of the character line draft model is implemented by a training module, which, as shown in fig. 9, includes:
the data acquisition submodule 8201 is used for acquiring a character picture and a character line draft corresponding to the character picture;
an input submodule 8202, configured to input a character picture into the generator of the character line draft model, generate a line drawing corresponding to the character picture, and input the line drawing together with the character line draft corresponding to the character picture into the discriminator of the character line draft model;
a loss function calculation submodule 8203, configured to calculate a first loss function, a second loss function, and a third loss function, respectively, where the first loss function is a loss function of a generator, the second loss function is used to represent an error between the character line draft and the line drawing, and the third loss function is a loss function of a discriminator;
a summation submodule 8204, configured to perform weighted summation on the first loss function, the second loss function, and the third loss function to obtain a total loss function;
a training submodule 8205, configured to iteratively update parameters of the generator and the discriminator, so that the total loss function reaches a preset condition, to complete training of the character line draft model.
It should be noted that the character line draft generation apparatus provided in the foregoing embodiment and the character line draft generation method provided above belong to the same concept; the specific ways in which the modules and units perform their operations have been described in detail in the method embodiment and are not repeated here. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above, which is not limited herein.
An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, enable the electronic device to implement the character line draft generation method provided in the foregoing embodiments.
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application. It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage portion 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for system operation are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An Input/Output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The driver 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method illustrated in flowchart 2. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. When the computer program is executed by a Central Processing Unit (CPU) 1001, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable signal medium may comprise a propagated data signal with a computer-readable computer program embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to execute the character line draft generation method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment, or may exist separately without being incorporated in the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the character line draft generation method provided in the foregoing embodiments.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (10)

1. A character line draft generation method is characterized by comprising the following steps:
acquiring a target character picture;
inputting the target character picture into a pre-trained character line draft model, and converting the target character picture through the character line draft model to obtain a character line draft; wherein the training of the character line draft model comprises the following steps:
acquiring a character picture and a character line draft corresponding to the character picture;
inputting the character picture into a generator of the character line draft model, generating a line drawing corresponding to the character picture, and inputting the line drawing and the character line draft corresponding to the character picture into a discriminator of the character line draft model;
respectively calculating a first loss function, a second loss function and a third loss function, wherein the first loss function is the loss function of the generator, the second loss function represents the error between the character line draft and the generated line drawing, and the third loss function is the loss function of the discriminator;
performing weighted summation on the first loss function, the second loss function and the third loss function to obtain a total loss function;
and iteratively updating the parameters of the generator and the discriminator until the total loss function reaches a preset condition, so as to complete the training of the character line draft model.
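For concreteness, the training scheme of claim 1 can be sketched as an alternating adversarial update. The following is a minimal PyTorch-style sketch; the least-squares adversarial form, the L1 reconstruction term, and the weights w1–w3 are assumptions, not values fixed by the claim.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, person_pic, line_draft,
               w1=1.0, w2=10.0, w3=1.0):
    """One iteration of the claim-1 scheme; w1-w3 are assumed weights."""
    fake = G(person_pic)  # line drawing generated from the character picture

    # Third loss: discriminator loss -- real draft toward 1, fake toward 0
    # (least-squares form assumed).
    d_real = D(line_draft)
    d_fake = D(fake.detach())
    loss_d = F.mse_loss(d_real, torch.ones_like(d_real)) \
           + F.mse_loss(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # First loss: the generator's adversarial loss (fake toward 1).
    d_fake = D(fake)
    loss_adv = F.mse_loss(d_fake, torch.ones_like(d_fake))

    # Second loss: error between the character line draft and the
    # generated line drawing (L1 assumed).
    loss_rec = F.l1_loss(fake, line_draft)

    opt_g.zero_grad()
    (w1 * loss_adv + w2 * loss_rec).backward()
    opt_g.step()

    # Weighted total, monitored against the preset stopping condition.
    return (w1 * loss_adv + w2 * loss_rec + w3 * loss_d).item()
```

In practice the first and second terms drive the generator update and the third drives the discriminator update, with the weighted total serving only as the convergence check the claim requires.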
2. The character line draft generation method of claim 1, wherein the generator comprises:
an encoder, used for encoding the character picture to obtain an encoding result;
a decoder, used for decoding the encoding result to obtain a line drawing;
wherein the encoder comprises a plurality of coding layers and is used for performing multi-scale sampling on the character picture and fusing the sampled multi-scale features to obtain fused features;
and the decoder comprises a plurality of decoding layers, the feature map output by each coding layer being embedded into a different decoding layer in the process of decoding the encoding result.
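The encoder–decoder wiring of claim 2, in which each coding layer's feature map is embedded into a different decoding layer, matches the familiar U-Net-style skip pattern. A minimal sketch follows; the module names, channel widths, and the use of concatenation for the embedding are assumptions:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Sketch of claim 2: multi-scale encoder; decoder whose layers each
    receive one coding layer's feature map. Widths/depths are assumptions."""
    def __init__(self, chs=(64, 128, 256, 512)):
        super().__init__()
        self.enc = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
            for c_in, c_out in zip((3,) + chs[:-1], chs))
        # Each decoding layer consumes its own input concatenated with the
        # matching coding layer's feature map (the "embedding").
        self.dec = nn.ModuleList(
            nn.ConvTranspose2d(c_in * 2, c_out, 4, stride=2, padding=1)
            for c_in, c_out in zip(chs[::-1], chs[-2::-1] + (1,)))

    def forward(self, x):                        # x: (B, 3, H, W); H, W % 16 == 0
        skips = []
        for layer in self.enc:                   # multi-scale sampling
            x = torch.relu(layer(x))
            skips.append(x)
        for k, (layer, skip) in enumerate(zip(self.dec, reversed(skips))):
            x = layer(torch.cat([x, skip], dim=1))   # embed the feature map
            if k < len(self.dec) - 1:
                x = torch.relu(x)
        return torch.sigmoid(x)                  # one-channel line drawing
```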
3. The method of claim 2, wherein the encoder uses a pre-trained ResNeXt-50, and the coding layers are expressed as follows:
the input of the (n-i+1)-th coding layer, X^{in}_{n-i+1}, is expressed as
X^{in}_{n-i+1} = X^{out}_{n-i} ⊕ PDS(X^{out}_1) ⊕ … ⊕ PDS(X^{out}_i)
wherein X^{out}_{n-i} represents the output of the (n-i)-th coding layer, X^{out}_1 to X^{out}_i represent the outputs of the 1st through i-th coding layers, PDS(·) represents a progressive down-sampling operation, and ⊕ represents a summing operation; UP(·) denotes upsampling of the feature map [the accompanying formula appears only as an image in the original filing];
the output of the i-th coding layer is expressed as
X^{out}_i = X^{in}_i ⊕ Σ_{j=1}^{C} T_j(X^{in}_i)
wherein X^{out}_i represents the output of the i-th coding layer, X^{in}_i represents the input of the i-th coding layer, C represents the number of groups, and T_j(·) represents a transformation function.
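The second formula of claim 3 is the standard ResNeXt aggregated transformation, realizable as a grouped convolution; the progressive down-sampling fusion of the first formula can be read as repeated pooling of earlier feature maps down to the current scale. The sketch below is one such reading; the pooling type, cardinality, and equal-channel simplification are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResNeXtBlock(nn.Module):
    """X^out_i = X^in_i ⊕ Σ_{j=1..C} T_j(X^in_i): the C grouped
    transformations realized as one grouped 3x3 convolution (groups = C).
    `channels` must be divisible by `cardinality`."""
    def __init__(self, channels, cardinality=32):
        super().__init__()
        self.transform = nn.Conv2d(channels, channels, 3,
                                   padding=1, groups=cardinality)

    def forward(self, x):
        return F.relu(x + self.transform(x))     # ⊕ is the summing operation

def pds(feat, target_hw):
    """Assumed progressive down-sampling: halve the spatial size step by
    step until the feature map reaches the target scale."""
    while feat.shape[-2] > target_hw[0]:
        feat = F.avg_pool2d(feat, 2)
    return feat

def fused_input(prev_out, earlier_outs):
    """Claim-3 fusion: X^in_{n-i+1} = X^out_{n-i} ⊕ Σ_k PDS(X^out_k).
    Assumes all feature maps share one channel count (a simplification)."""
    out = prev_out
    for e in earlier_outs:
        out = out + pds(e, prev_out.shape[-2:])
    return out
```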
4. The method of claim 3, wherein the discriminator comprises a plurality of sequentially connected discriminator blocks, each of which comprises a convolutional layer, a normalization layer, and a residual layer, which are sequentially connected.
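A discriminator block per claim 4 stacks a convolutional layer, a normalization layer, and a residual layer in sequence. One plausible PyTorch rendering, where the stride, normalization type, and channel widths are assumptions:

```python
import torch
import torch.nn as nn

class DiscriminatorBlock(nn.Module):
    """Claim-4 block: convolutional layer, normalization layer, residual
    layer, connected in sequence (stride/widths/norm type are assumptions)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 4, stride=2, padding=1)
        self.norm = nn.InstanceNorm2d(c_out)
        self.residual = nn.Conv2d(c_out, c_out, 3, padding=1)

    def forward(self, x):
        x = torch.relu(self.norm(self.conv(x)))
        return torch.relu(x + self.residual(x))  # residual connection

# Several blocks connected in sequence, ending in patch-level scores.
discriminator = nn.Sequential(
    DiscriminatorBlock(1, 64),
    DiscriminatorBlock(64, 128),
    DiscriminatorBlock(128, 256),
    nn.Conv2d(256, 1, 3, padding=1),
)
```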
5. The character line draft generation method of claim 1, wherein the first loss function is expressed as [formula reproduced only as an image in the original filing], wherein f1 is the mean-square-error loss, f2 denotes an averaging operation, f4 resizes the input picture to the same size as the output of the discriminator D, and f3 and f5 denote tensors filled with the scalar values 1 and 0, respectively.
6. The character line draft generation method of claim 5, wherein the third loss function is expressed as [formula reproduced only as an image in the original filing].
7. The character line draft generation method of claim 6, wherein the second loss function is expressed as [formula reproduced only as an image in the original filing].
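The formulas of claims 5–7 survive only as images in this record, but the glossary in claim 5 (mean-square error, averaging, all-ones and all-zeros target tensors sized to the discriminator output) is consistent with a least-squares GAN objective. The following is a speculative reconstruction under that assumption, not the authoritative formulas:

```python
import torch
import torch.nn.functional as F

def first_loss(D, fake):
    """Generator's adversarial loss (claim 5): f2(f1(D(fake), f3)), with
    f1 = MSE, f2 = mean (folded into F.mse_loss), f3 = all-ones target."""
    d_fake = D(fake)
    return F.mse_loss(d_fake, torch.ones_like(d_fake))

def third_loss(D, real_draft, fake):
    """Discriminator loss (claim 6): push D(real) toward f3 (ones) and
    D(fake) toward f5 (zeros)."""
    d_real, d_fake = D(real_draft), D(fake.detach())
    return F.mse_loss(d_real, torch.ones_like(d_real)) + \
           F.mse_loss(d_fake, torch.zeros_like(d_fake))

def second_loss(fake, real_draft):
    """Error between the character line draft and the generated line
    drawing (claim 7); the norm is not recoverable, L1 assumed."""
    return F.l1_loss(fake, real_draft)
```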
8. a character line script generating apparatus, comprising:
the image acquisition module is used for acquiring a target person image;
the character line draft generating module is used for inputting the target character picture into a pre-trained character line draft model and converting the target character picture through the character line draft model to obtain a character line draft; wherein, the training of personage line draft model is realized through the training module, the training module includes:
the data acquisition sub-module is used for acquiring the figure pictures and figure line drafts corresponding to the figure pictures;
the input sub-module is used for inputting the character pictures into a generator of a character manuscript model, generating a line drawing corresponding to the character pictures and inputting the line drawing and the character manuscript corresponding to the character pictures into a discriminator of the character manuscript model;
a loss function calculation submodule, configured to calculate a first loss function, a second loss function, and a third loss function, respectively, where the first loss function is a loss function of a generator, the second loss function is used to represent an error between the character line drawing and the line drawing, and the third loss function is a loss function of a discriminator;
the summation submodule is used for carrying out weighted summation on the first loss function, the second loss function and the third loss function to obtain a total loss function;
and the training submodule is used for iteratively updating the parameters of the generator and the discriminator to enable the total loss function to reach a preset condition so as to finish the training of the character line draft model.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to perform the character line draft generation method as recited in any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the character line draft generation method of any one of claims 1 to 7.
CN202210912212.1A 2022-07-28 2022-07-28 Character line draft generation method, device, equipment and medium Pending CN115170388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210912212.1A CN115170388A (en) 2022-07-28 2022-07-28 Character line draft generation method, device, equipment and medium


Publications (1)

Publication Number Publication Date
CN115170388A true CN115170388A (en) 2022-10-11

Family

ID=83476643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210912212.1A Pending CN115170388A (en) 2022-07-28 2022-07-28 Character line draft generation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115170388A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002003707A2 (en) * 2000-06-30 2002-01-10 Intel Corporation Model-based video image coding
CN114387365A (en) * 2021-12-30 2022-04-22 北京科技大学 Line draft coloring method and device
CN114419178A (en) * 2022-01-19 2022-04-29 北京联合大学 Mural corresponding line draft generation method and equipment based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHENG YU FANG et al.: "Learning to Generate Artistic Character Line Drawing" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439894A (en) * 2022-11-08 2022-12-06 荣耀终端有限公司 Method, electronic device, program product, and medium for training fingerprint matching model
TWI826336B (en) * 2023-07-04 2023-12-11 凌網科技股份有限公司 Frame image acquisition method

Similar Documents

Publication Publication Date Title
Jam et al. A comprehensive review of past and present image inpainting methods
US10748324B2 (en) Generating stylized-stroke images from source images utilizing style-transfer-neural networks with non-photorealistic-rendering
Liu et al. Robust single image super-resolution via deep networks with sparse prior
CN115170388A (en) Character line draft generation method, device, equipment and medium
CN111325851A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN113658051A (en) Image defogging method and system based on cyclic generation countermeasure network
US20220327767A1 (en) Utilizing voxel feature transformations for view synthesis
CN113901894A (en) Video generation method, device, server and storage medium
US20230123820A1 (en) Generating animated digital videos utilizing a character animation neural network informed by pose and motion embeddings
CN111862294A (en) ArcGAN network-based automatic coloring network structure and method for hand-drawn 3D building
CN112381716B (en) Image enhancement method based on generation type countermeasure network
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN115082300A (en) Training method of image generation model, image generation method and device
CN110415169A (en) A kind of depth map super resolution ratio reconstruction method, system and electronic equipment
Li et al. Compnvs: Novel view synthesis with scene completion
JP2023545052A (en) Image processing model training method and device, image processing method and device, electronic equipment, and computer program
CN117252984A (en) Three-dimensional model generation method, device, apparatus, storage medium, and program product
CN115100334B (en) Image edge tracing and image animation method, device and storage medium
CN116977169A (en) Data processing method, apparatus, device, readable storage medium, and program product
CN116703719A (en) Face super-resolution reconstruction device and method based on face 3D priori information
Wang et al. Unsupervised scene sketch to photo synthesis
CN115908205A (en) Image restoration method and device, electronic equipment and storage medium
CN115578497A (en) Image scene relighting network structure and method based on GAN network
CN114596203A (en) Method and apparatus for generating images and for training image generation models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20221011