CN114387365A - Line draft coloring method and device - Google Patents
- Publication number: CN114387365A (application CN202111665537.6A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T 5/70 — Denoising; Smoothing
- G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N 3/045 — Combinations of networks
- G06N 3/048 — Activation functions
- G06T 11/001 — Texturing; Colouring; Generation of texture or colour
- G06T 7/50 — Depth or shape recovery
Abstract
The invention discloses a line draft coloring method and device. The method comprises the following steps: collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color hints with a random masking-and-blurring method, and constructing a training data set; introducing an attention screening mechanism at the skip connections of the generator of the pix2pixGAN model, thereby improving the pix2pixGAN model into a line draft coloring model; training the line draft coloring model on the training data set; and coloring the line draft to be colored with the trained line draft coloring model. The method improves the accuracy of the line draft coloring result and the quality of the generated picture, and converges better while avoiding oscillation during training.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a line draft coloring method and device.
Background
Image coloring is a popular topic in computer vision; most coloring tasks colorize a grayscale image, for example restoring a black-and-white photograph to color. The animation industry keeps growing, and line draft coloring is an important part of it: good character color design attracts more viewers. Coloring line drafts is a laborious task; manually turning a line draft into a polished picture can take hours or even days. Although many deep learning methods achieve satisfactory coloring of grayscale images, simply applying grayscale colorization or style transfer to line drafts yields unsatisfactory results: a line draft contains only sparse edge contours and lacks the complete texture, shading, and gradient information present in a grayscale image, which greatly increases the difficulty of coloring.
With the rapid development of deep learning, CNN (Convolutional Neural Network) based algorithms are widely applied in many fields, but convolutional networks alone do not handle generative tasks well. When Goodfellow et al. proposed the GAN (Generative Adversarial Network), generative tasks such as super-resolution, image coloring, and style transfer gained a better solution. Isola et al. proposed Pix2pixGAN, a generative adversarial network for image-to-image translation that learns a mapping from a source image domain to a target image domain. Pix2pixGAN can also color line drafts, but the overall effect is mediocre and the coloring result is random, so the overall color scheme of the image is chaotic. To address this, Sangkloy et al. proposed controllable coloring using sparse color strokes drawn on the line draft, i.e., coloring guided by a few continuous colored lines as hints. Zhang et al. extracted a 4096-dimensional global feature of a reference image with a pre-trained VGG16/19 model, injected it into the middle layers of the generator network, and added two guide decoders to avoid vanishing gradients in the middle layers. Junsoo et al. proposed the SCFT (Spatially Corresponding Feature Transfer) module to fuse color information from a reference image into the line draft, together with a similarity-based triplet loss that pulls the coloring result closer to the reference image.
Wang et al. extracted features from an early stage of a pre-trained network and introduced a local feature network to extract a semantic feature map, using the local features as one of the inputs to both the generator and the discriminator; however, since a line draft carries far less feature information than a grayscale image, the local feature network does not extract semantic features well in this setting. Guo et al. proposed a two-step coloring method: the model trained in the first step guesses color regions and roughly fills multiple colors onto the line draft, and the second step corrects wrong colors and refines the result; its drawback is that training requires manually producing a large number of intermediate-result images for both steps.
In summary, existing line draft coloring techniques are mostly built on GANs and suffer from color bleeding (the filled color diffuses into surrounding regions like watercolor), boundary misrecognition (when lines are dense, other strokes are mistaken for boundaries, so adjacent unrelated regions are filled with the wrong color), color inconsistency (the color of some regions of the result does not match the color hint), and excessive randomness in the coloring result.
Disclosure of Invention
The invention provides a line draft coloring method and device, aiming to solve the technical problems that the information passed through the skip connections of current models is redundant, low-level features are underused, and the randomness of the coloring result is too high.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the present invention provides a line draft coloring method, comprising:
collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color hints with a random masking-and-blurring method, and constructing a training data set;
introducing an attention screening mechanism at the skip connections of the generator of the pix2pixGAN model, thereby improving the pix2pixGAN model into a line draft coloring model;
training the line draft coloring model with the training data set;
and coloring the line draft to be colored with the trained line draft coloring model.
Further, the collecting of color images, preprocessing, line draft generation, color hint extraction by random masking and blurring, and training data set construction include:
collecting color images and preprocessing them; the preprocessing comprises rotating and mirroring part of the collected images for data augmentation, and cropping all images to a size of 256 × 256;
generating a line draft from each preprocessed color image;
randomly covering the color image with 120 white blocks of size 20 × 20, then blurring the covered image with a mean filter of kernel size 100 × 100 to obtain the color hint corresponding to the color image;
and storing each color image together with its line draft and color hint to construct the training data set.
Further, the generator of the line draft coloring model comprises down-sampling modules, up-sampling modules, and attention screening modules; the generator produces a color image from the line draft and the corresponding color hint as follows:
firstly, the single-channel 256 × 256 line draft and the corresponding 256 × 256 × 3 color hint are concatenated and fed into the generator; the image is then reduced in dimension by multiple down-sampling modules, which extract features at every scale; deep information is then recovered by the up-sampling modules, each of which is fused with the corresponding feature-extraction stage. Both the down-sampling and up-sampling modules use a ReLU (Rectified Linear Unit) activation layer, a 3 × 3 convolution, and a BN (Batch Normalization) layer as their basic unit, and introduce a residual mechanism to avoid the gradient vanishing and degradation problems of deep networks; the down-sampling and up-sampling operations are scaled with max pooling and max unpooling; after the last up-sampling operation, the features pass through a LeakyReLU activation layer, a 3 × 3 convolutional layer, and a Tanh activation layer, finally producing a 256 × 256 color image;
the attention screening module takes the current-layer features x and the next-layer features g as input, aligns their sizes and channel counts via convolution and adds them, compresses the dimension with ReLU activation and convolution, generates attention coefficients through Sigmoid activation, and finally upsamples the attention coefficients back to the original dimensions of x, multiplies x by the coefficients, and outputs the result; the calculation is as follows:
$$q_{att}^{l} = \psi^{T}\left(\sigma_{1}\left(W_{x}^{T} x_{i}^{l} + W_{g}^{T} g_{i} + b_{g}\right)\right) + b_{\psi}, \qquad \alpha_{i}^{l} = \sigma_{2}\left(q_{att}^{l}\right), \qquad \hat{x}_{i}^{l} = x_{i}^{l} \cdot \alpha_{i}^{l}$$

where $\sigma_1$ is the ReLU activation function and $\sigma_2$ is the Sigmoid activation function; the linear transformations $W_x$, $W_g$, $\psi$ contained in $\Theta_{att}$ are all 1 × 1 convolutions, and $b_g$, $b_{\psi}$ are the bias terms of the corresponding convolutions; vector $x_i^l$ is the i-th feature of the current network layer, and vector $g_i$ is the i-th feature of the next layer.
Further, the discriminator of the line draft coloring model is a fully convolutional network; it takes as input the concatenation of a line draft with either the image produced by the generator or the real image, and after several convolutional blocks outputs a 30 × 30 matrix; each value in the matrix judges a small region of the original image, and the values are averaged to produce the discriminator's final output.
Further, the loss function of the generator model of the line draft coloring model is as follows:
$$L_G = \lambda_{l1} L_{l1} + \lambda_{adv} L_{adv} + \lambda_{tv} L_{tv}$$

where

$$L_{l1} = \mathbb{E}_{I_s, I_h, I_r}\left[\left\| I_r - G(I_s, I_h) \right\|_1\right]$$
$$L_{adv} = \mathbb{E}_{I_r, I_s}\left[\log D(I_s, I_r)\right] + \mathbb{E}_{I_h, I_s}\left[\log\left(1 - D(I_s, G(I_s, I_h))\right)\right]$$
$$L_{tv} = \sum_{i,j}\left(\left(x_{i+1,j} - x_{i,j}\right)^2 + \left(x_{i,j+1} - x_{i,j}\right)^2\right)^{\beta/2}$$

where $I_s$ is the line draft, $I_h$ is the color hint, $I_r$ is the real image, $G$ is the generator, $D$ is the discriminator, and $\lambda_{l1}$, $\lambda_{adv}$, $\lambda_{tv}$ are weights; $x_{i,j}$ denotes the pixel value at coordinates $(i, j)$ in the image, and $\beta$ is a hyperparameter. The expectations $\mathbb{E}_{I_r,I_s}$, $\mathbb{E}_{I_h,I_s}$ and $\mathbb{E}_{I_s,I_h,I_r}$ are taken over the joint distributions of $(I_r, I_s)$, $(I_h, I_s)$ and $(I_s, I_h, I_r)$, respectively.
Further, the loss function of the discriminator model of the line draft coloring model is as follows:
$$L_D = -\lambda_{adv} L_{adv}$$

where $L_{adv}$ is the adversarial loss
$$L_{adv} = \mathbb{E}_{I_r, I_s}\left[\log D(I_s, I_r)\right] + \mathbb{E}_{I_h, I_s}\left[\log\left(1 - D(I_s, G(I_s, I_h))\right)\right].$$
further, the method further comprises:
in the process of training the line draft coloring model, a learning rate decay mechanism is added to help training converge and to prevent oscillation: the learning rate α of the optimizer at the current stage is decayed once every 15 rounds.
On the other hand, the invention also provides a line draft coloring device, which comprises:
the training data set construction module, for collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color hints with a random masking-and-blurring method, and constructing a training data set;
the model improvement module, for introducing an attention screening mechanism at the skip connections of the generator of the pix2pixGAN model, thereby improving the pix2pixGAN model into a line draft coloring model;
the model training module, for training the line draft coloring model with the training data set;
and the coloring module, for coloring the line draft to be colored with the trained line draft coloring model.
In yet another aspect, the present invention also provides an electronic device comprising a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the above-described method.
In yet another aspect, the present invention also provides a computer-readable storage medium having at least one instruction stored therein, the instruction being loaded and executed by a processor to implement the above method.
The technical scheme provided by the invention has the beneficial effects that at least:
Compared with the pix2pixGAN model, the method automatically extracts color hints from color images for training, so that the network learns to color correctly according to a given color hint; the learning rate decay mechanism helps training converge and prevents oscillation; and the added attention screening mechanism lets the model exploit feature information more fully, makes the coloring of line draft details more sensitive, and improves the accuracy of the coloring result and the quality of the generated picture.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is an execution flow diagram of a line draft coloring method according to an embodiment of the present invention;
fig. 2 is a schematic network structure diagram of a line draft coloring model generator according to an embodiment of the present invention;
fig. 3 is a schematic network structure diagram of an attention screening mechanism according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First embodiment
The embodiment provides a line draft coloring method, which can be implemented by an electronic device, and the electronic device can be a terminal or a server. The execution flow of the method is shown in fig. 1, and comprises the following steps:
S1, collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color hints with a random masking-and-blurring method, and constructing a training data set;
S2, introducing an attention screening mechanism at the skip connections of the generator of the pix2pixGAN model, thereby improving the pix2pixGAN model into a line draft coloring model;
S3, training the line draft coloring model with the training data set;
and S4, coloring the line draft to be colored with the trained line draft coloring model.
Further, in this embodiment, the implementation process of S1 is specifically as follows:
collecting a large number of color images and preprocessing them; the preprocessing comprises rotating and mirroring part of the collected images for data augmentation, and cropping all images to a size of 256 × 256;
generating a line draft according to the preprocessed color image;
randomly covering the color image with 120 white blocks of size 20 × 20, then blurring the covered image with a mean filter of kernel size 100 × 100 to obtain the color hint corresponding to the color image;
and storing the color image and the corresponding line draft and color prompt thereof to construct a training data set.
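The masking-and-blurring hint extraction above can be sketched in NumPy. The block count, block size, and kernel size follow the embodiment; the function name, edge handling, and the integral-image box blur standing in for the mean filter are illustrative assumptions:

```python
import numpy as np

def make_color_hint(color_img, n_blocks=120, block=20, kernel=100, seed=0):
    """Mask a color image with white square blocks, then box-blur it
    (a stand-in for the mean filter) to produce a smeared color hint."""
    rng = np.random.default_rng(seed)
    hint = color_img.astype(np.float64).copy()
    h, w, _ = hint.shape
    # Randomly cover the image with white (255) square patches.
    for _ in range(n_blocks):
        y = rng.integers(0, h - block + 1)
        x = rng.integers(0, w - block + 1)
        hint[y:y + block, x:x + block] = 255.0
    # Mean filter per channel via an integral image, with edge-clamped padding.
    pad = kernel // 2
    out = np.empty_like(hint)
    for c in range(hint.shape[2]):
        padded = np.pad(hint[:, :, c], pad, mode="edge")
        ii = padded.cumsum(axis=0).cumsum(axis=1)
        ii = np.pad(ii, ((1, 0), (1, 0)))
        s = (ii[kernel:, kernel:] - ii[:-kernel, kernel:]
             - ii[kernel:, :-kernel] + ii[:-kernel, :-kernel])
        out[:, :, c] = s[:h, :w] / (kernel * kernel)
    return out.astype(np.uint8)
```

An OpenCV implementation would replace the integral-image part with a mean blur such as `cv2.blur(hint, (100, 100))`.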
Further, in this embodiment, the generator of the line draft coloring model, whose structure is shown in fig. 2, is designed on a U-Net architecture and is composed of down-sampling modules, up-sampling modules, and attention screening modules; the generator produces a color image from the line draft and the corresponding color hint as follows:
firstly, the single-channel 256 × 256 line draft and the corresponding 256 × 256 × 3 color hint are concatenated and fed into the network; the image is then reduced in dimension by multiple down-sampling modules, which extract features at every scale; deep information is then recovered by the up-sampling modules, each of which is fused with the corresponding feature-extraction stage. Both the down-sampling and up-sampling modules use a ReLU (Rectified Linear Unit) activation layer, a 3 × 3 convolution, and a BN (Batch Normalization) layer as their basic unit, and introduce a residual mechanism to avoid the gradient vanishing and degradation problems of deep networks; the down-sampling and up-sampling operations are scaled with max pooling and max unpooling; after the last up-sampling operation, the features pass through a LeakyReLU activation layer, a 3 × 3 convolutional layer, and a Tanh activation layer, finally producing a 256 × 256 color image.
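The max-pooling / max-unpooling pair used for scaling can be illustrated on a single channel with even dimensions. This is a minimal sketch; practical implementations such as PyTorch's `MaxPool2d(return_indices=True)` / `MaxUnpool2d` operate on batched multi-channel tensors:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling on an even-sized 2-D array; also returns the
    within-block argmax so that unpooling can restore positions."""
    h, w = x.shape
    blocks = (x.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))
    idx = blocks.argmax(axis=2)
    return blocks.max(axis=2), idx

def max_unpool2x2(y, idx):
    """Inverse of max_pool2x2: zeros everywhere except the remembered maxima."""
    h, w = y.shape
    out = np.zeros((h * 2, w * 2), dtype=y.dtype)
    rows, cols = np.indices((h, w))
    dr, dc = idx // 2, idx % 2   # position of the max inside each 2x2 block
    out[rows * 2 + dr, cols * 2 + dc] = y
    return out
```

Unpooling places each value back where its maximum came from, which preserves spatial detail better than plain nearest-neighbor upsampling.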
The attention screening module, shown in fig. 3, takes the current-layer features x and the next-layer features g as input, aligns their sizes and channel counts via convolution and adds them, compresses the dimension with ReLU activation and convolution, generates attention coefficients through Sigmoid activation, and finally upsamples the attention coefficients back to the original dimensions of x, multiplies x by the coefficients, and outputs the result; the specific calculation is as follows:
$$q_{att}^{l} = \psi^{T}\left(\sigma_{1}\left(W_{x}^{T} x_{i}^{l} + W_{g}^{T} g_{i} + b_{g}\right)\right) + b_{\psi}, \qquad \alpha_{i}^{l} = \sigma_{2}\left(q_{att}^{l}\right), \qquad \hat{x}_{i}^{l} = x_{i}^{l} \cdot \alpha_{i}^{l}$$

where $\sigma_1$ is the ReLU activation function and $\sigma_2$ is the Sigmoid activation function; the linear transformations $W_x$, $W_g$, $\psi$ contained in $\Theta_{att}$ are all 1 × 1 convolutions, and $b_g$, $b_{\psi}$ are the bias terms of the corresponding convolutions; vector $x_i^l$ is the i-th feature of the current network layer, and vector $g_i$ is the i-th feature of the next layer.
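A minimal NumPy sketch of the attention screening computation, assuming x and g have already been brought to the same spatial resolution so that the 1 × 1 convolutions reduce to channel-wise matrix products (the weight shapes and names here are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi, b_g, b_psi):
    """Additive attention gate on a skip connection.
    x: (Cx, H, W) current-layer (skip) features
    g: (Cg, H, W) next-layer gating features, same spatial size as x
    Wx: (Cint, Cx), Wg: (Cint, Cg), psi: (1, Cint) -- the 1x1 convolutions."""
    c, h, w = x.shape
    xf = x.reshape(c, -1)                          # (Cx, H*W)
    gf = g.reshape(g.shape[0], -1)                 # (Cg, H*W)
    q = np.maximum(Wx @ xf + Wg @ gf + b_g, 0.0)   # sigma1 = ReLU
    alpha = sigmoid(psi @ q + b_psi)               # sigma2 = Sigmoid, in (0, 1)
    return (xf * alpha).reshape(c, h, w), alpha.reshape(h, w)
```

In the model itself the gating features come from a coarser layer and the attention map is upsampled back to x's resolution before the multiplication; that resizing step is omitted here for brevity.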
Further, in this embodiment, the discriminator of the line draft coloring model borrows the idea of PatchGAN and is a fully convolutional network. A typical GAN discriminator maps a 256 × 256 image to a single scalar that measures the authenticity of the whole image; such a global judgment cannot reflect local image features well and is impractical for tasks with high precision requirements. To judge local features better, the discriminator of this embodiment takes as input the concatenation of the line draft with either the generated image or the real image, and after several convolutional blocks outputs a 30 × 30 matrix; each value in the matrix judges a small region of the original image, and the values are averaged to produce the discriminator's final output.
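The 30 × 30 output can be checked with the standard convolution output-size formula. The layer configuration below (4 × 4 kernels, padding 1, three stride-2 convolutions followed by two stride-1 ones, as in a common PatchGAN stack) is an assumption, since the text states only the input and output sizes:

```python
def conv_out(size, kernel, stride, pad):
    """Output spatial size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# 256 -> 128 -> 64 -> 32 -> 31 -> 30 under the assumed stack.
size = 256
for stride in (2, 2, 2, 1, 1):
    size = conv_out(size, kernel=4, stride=stride, pad=1)
```

Averaging the resulting 30 × 30 patch decisions then gives the single scalar the discriminator finally outputs.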
Further, in this embodiment, the loss function of the generator model of the line draft coloring model is:
$$L_G = \lambda_{l1} L_{l1} + \lambda_{adv} L_{adv} + \lambda_{tv} L_{tv}$$
the loss function of the discriminator model of the line draft coloring model is as follows:
$$L_D = -\lambda_{adv} L_{adv}$$
wherein:
$$L_{l1} = \mathbb{E}_{I_s, I_h, I_r}\left[\left\| I_r - G(I_s, I_h) \right\|_1\right]$$
$$L_{adv} = \mathbb{E}_{I_r, I_s}\left[\log D(I_s, I_r)\right] + \mathbb{E}_{I_h, I_s}\left[\log\left(1 - D(I_s, G(I_s, I_h))\right)\right]$$
$$L_{tv} = \sum_{i,j}\left(\left(x_{i+1,j} - x_{i,j}\right)^2 + \left(x_{i,j+1} - x_{i,j}\right)^2\right)^{\beta/2}$$

where $I_s$ is the line draft, $I_h$ is the corresponding color hint, $I_r$ is the real image, $G$ is the generator, $D$ is the discriminator; $\lambda_{l1}$, $\lambda_{adv}$, $\lambda_{tv}$ are weights with $\lambda_{l1} = 100$, $\lambda_{adv} = 1$, $\lambda_{tv} = 1$; $x_{i,j}$ denotes the pixel value at coordinates $(i, j)$ in the image, and $\beta$ is a hyperparameter, set to $\beta = 2$ in this embodiment. The expectations $\mathbb{E}_{I_r,I_s}$, $\mathbb{E}_{I_h,I_s}$ and $\mathbb{E}_{I_s,I_h,I_r}$ are taken over the joint distributions of $(I_r, I_s)$, $(I_h, I_s)$ and $(I_s, I_h, I_r)$, respectively.
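The three loss components can be sketched in NumPy; the exact expressions are assumptions consistent with the standard pix2pix formulation and the symbol definitions given in the text, with expectations replaced by sample means:

```python
import numpy as np

def l1_loss(real, fake):
    """L_l1: mean absolute error between the real image and generator output."""
    return np.mean(np.abs(real - fake))

def adv_loss(d_real, d_fake, eps=1e-8):
    """L_adv: log D on real pairs plus log(1 - D) on generated pairs.
    d_real / d_fake are discriminator outputs in (0, 1), e.g. the averaged
    30x30 patch decisions; eps guards the log."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

def tv_loss(img, beta=2.0):
    """L_tv: total-variation term penalizing abrupt neighboring-pixel jumps
    (beta = 2 as in this embodiment)."""
    dh = img[1:, :-1] - img[:-1, :-1]   # vertical neighbor differences
    dw = img[:-1, 1:] - img[:-1, :-1]   # horizontal neighbor differences
    return np.sum((dh ** 2 + dw ** 2) ** (beta / 2.0))
```

The generator total is then formed as `100 * l1 + 1 * adv + 1 * tv` per the weights above, and the discriminator maximizes the adversarial term.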
Further, in this embodiment, the method for coloring a line draft further includes:
in the process of training the line draft coloring model, a learning rate decay mechanism is added to help training converge and to prevent oscillation: the learning rate α of the optimizer at the current stage is decayed once every 15 rounds.
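A sketch of the step-decay schedule. The specification states only the 15-round interval, so the decay factor `gamma` below is an assumption:

```python
def decayed_lr(base_lr, epoch, step=15, gamma=0.5):
    """Step decay: multiply the learning rate by `gamma` once every `step`
    epochs. gamma = 0.5 is an assumed factor; only the 15-round interval
    is stated in the specification."""
    return base_lr * gamma ** (epoch // step)
```

With a base rate of 2e-4, the optimizer would use 2e-4 for epochs 0-14, 1e-4 for epochs 15-29, and so on; in PyTorch the same schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.5)`.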
In summary, this embodiment automatically extracts color hints for training and, at coloring time, adds a color hint as auxiliary information, achieving interactive coloring so that the corresponding part of the line draft is painted with the desired color. The attention screening mechanism introduced at the skip connections of the generator lets the network learn to distinguish the shape and size of the target automatically, and improves the model's sensitivity and accuracy by suppressing feature activations in irrelevant regions, thereby improving the accuracy of the coloring result and the quality of the generated picture. In addition, the learning rate decay mechanism added during training helps convergence and prevents oscillation.
Second embodiment
The embodiment provides a line draft coloring device, which comprises the following modules:
the training data set construction module, for collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color hints with a random masking-and-blurring method, and constructing a training data set;
the model improvement module, for introducing an attention screening mechanism at the skip connections of the generator of the pix2pixGAN model, thereby improving the pix2pixGAN model into a line draft coloring model;
the model training module, for training the line draft coloring model with the training data set;
and the coloring module, for coloring the line draft to be colored with the trained line draft coloring model.
The line draft coloring device of this embodiment corresponds to the line draft coloring method of the first embodiment; the functions implemented by the functional modules of the device correspond one-to-one with the steps of the method; therefore, details are not repeated here.
Third embodiment
The present embodiment provides an electronic device, which includes a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the method of the first embodiment.
The electronic device may vary considerably in configuration and performance, and may include one or more processors (CPUs) and one or more memories, wherein the memory stores at least one instruction that is loaded and executed by the processor to perform the above method.
Fourth embodiment
This embodiment provides a computer-readable storage medium in which at least one instruction is stored; the instruction is loaded and executed by a processor to implement the method of the first embodiment. The computer-readable storage medium may be a ROM, a random access memory, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. The instructions stored therein may be loaded by a processor in a terminal to perform the above method.
Furthermore, it should be noted that the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Finally, it should be noted that while the foregoing describes preferred embodiments of the invention, those skilled in the art, once apprised of the basic inventive concept, may make various changes and modifications without departing from the principles of the invention, and such changes and modifications shall be deemed to fall within the scope of the invention. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Claims (8)
1. A line draft coloring method is characterized by comprising the following steps:
collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color prompts by a random masking and blurring method, and constructing a training data set;
introducing an attention screening mechanism at the skip connections of the generator model of a pix2pixGAN model, so as to improve the pix2pixGAN model and obtain a line draft coloring model;
training the line draft coloring model by using the training data set;
and coloring the line draft to be colored by utilizing the trained line draft coloring model.
2. The line draft coloring method according to claim 1, wherein the collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color prompts by a random masking and blurring method, and constructing a training data set comprises:
collecting color images and preprocessing them; wherein the preprocessing comprises: rotating and mirroring some of the collected color images to augment the data, and cropping all images; wherein the cropping size is 256 × 256;
generating a line draft according to the preprocessed color image;
randomly covering the color image with 120 white blocks of size 20 × 20, and then blurring the covered image with a mean filter of kernel size 100 × 100 to obtain the color prompt corresponding to the color image;
and storing the color image and the corresponding line draft and color prompt thereof to construct a training data set.
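As a concrete illustration of the masking-and-blurring step in claim 2, the following NumPy sketch covers a color image with 120 random 20 × 20 white blocks and then applies a 100 × 100 mean filter computed via an integral image; the function name and the zero-padded border handling are assumptions, not details taken from the patent.

```python
import numpy as np

def extract_color_prompt(color_img, n_blocks=120, block=20, kernel=100, rng=None):
    """Randomly cover a (H, W, 3) image in [0, 1] with white blocks, then box-blur it."""
    rng = np.random.default_rng(rng)
    h, w, _ = color_img.shape
    covered = color_img.copy()
    for _ in range(n_blocks):
        y = rng.integers(0, h - block + 1)
        x = rng.integers(0, w - block + 1)
        covered[y:y + block, x:x + block] = 1.0  # one white block

    # Mean (box) filter via an integral image, zero-padded at the borders.
    k, pad = kernel, kernel // 2
    padded = np.pad(covered, ((pad, k - 1 - pad), (pad, k - 1 - pad), (0, 0)))
    ii = padded.cumsum(0).cumsum(1)
    ii = np.pad(ii, ((1, 0), (1, 0), (0, 0)))  # prepend a zero row/column
    blurred = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
               - ii[k:k + h, :w] + ii[:h, :w]) / (k * k)
    return blurred
```

The integral-image trick keeps the 100 × 100 averaging cheap; a direct convolution would be far slower in pure NumPy.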
3. The line draft coloring method according to claim 1, wherein the generator model of the line draft coloring model comprises an up-sampling module, a down-sampling module and an attention screening module; the process by which the generator model generates a color image from the line draft and the corresponding color prompt comprises the following steps:
firstly, concatenating a single-channel line draft of size 256 × 256 with the corresponding color prompt of size 256 × 256 × 3 and feeding the result into the generator model of the line draft coloring model; then reducing the dimensions of the image through multiple down-sampling modules and extracting features at each scale; then extracting deep information through the up-sampling modules, each up-sampling stage being fused with the corresponding feature-extraction stage; the down-sampling and up-sampling modules both take a ReLU (Rectified Linear Unit) activation layer, a 3 × 3 convolution and a BN (Batch Normalization) layer as their basic unit, and introduce a residual mechanism to avoid the vanishing-gradient and degradation problems of deep networks; the down-sampling and up-sampling operations scale using max-pooling and max-unpooling; after the up-sampling stages, a LeakyReLU activation layer, a 3 × 3 convolution layer and a Tanh activation layer follow, finally generating a 256 × 256 color image;
the attention screening module takes the current network layer feature x and the next network layer feature g as inputs, aligns their sizes and channel numbers through convolution operations and adds them, compresses the dimensions using ReLU activation and a convolution operation, generates an attention coefficient through Sigmoid activation, and finally up-samples the attention coefficient to the original dimensions of the vector x, multiplies x by it, and outputs the result; the calculation is as follows:

α_i = σ2( ψ( σ1( W_x x_i + W_g g_i + b_g ) ) + b_ψ )

wherein σ1 is the ReLU activation function and σ2 is the Sigmoid activation function; the linear transformations W_x, W_g and ψ contained in Θ_att are all 1 × 1 convolutions, and b_g and b_ψ are the bias terms of the corresponding convolutions; the vector x_i is the i-th feature of the current network layer, and the vector g_i is the i-th feature of the next layer.
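A minimal NumPy sketch of this attention gate, treating the 1 × 1 convolutions as per-pixel matrix multiplications; the stride-2 spatial alignment and nearest-neighbour up-sampling are assumptions about details the claim leaves open.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x, w_g, w_psi, b_g=0.0, b_psi=0.0):
    """x: (Cx, H, W) current-layer features; g: (Cg, H//2, W//2) next-layer features.
    w_x: (Cint, Cx), w_g: (Cint, Cg), w_psi: (1, Cint) -- 1x1 convs as matrices."""
    # align x spatially with g (stride-2 subsampling stands in for a strided 1x1 conv)
    x_down = x[:, ::2, ::2]
    theta_x = np.einsum('ic,chw->ihw', w_x, x_down)
    phi_g = np.einsum('ic,chw->ihw', w_g, g) + b_g
    q = np.maximum(theta_x + phi_g, 0.0)                          # ReLU (sigma_1)
    alpha = sigmoid(np.einsum('oc,chw->ohw', w_psi, q) + b_psi)   # Sigmoid (sigma_2)
    # nearest-neighbour up-sample the coefficients back to x's resolution
    alpha_up = alpha.repeat(2, axis=1).repeat(2, axis=2)
    return x * alpha_up  # gate the skip-connection features
```

The output has the same shape as x, so it can be concatenated into the decoder exactly where the plain skip connection was.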
4. The line draft coloring method according to claim 1, wherein the discriminator model of the line draft coloring model is a fully convolutional network; the discriminator model takes as input the concatenation of a line draft image with either the generated image output by the generator model or the real image, and outputs a 30 × 30 matrix after several convolution blocks; each value in the matrix judges a small region of the original image, and the values are averaged to give the discriminator's final output.
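The 30 × 30 output in claim 4 is consistent with the layer layout of the standard pix2pix 70 × 70 PatchGAN (4 × 4 kernels, padding 1, three stride-2 convolutions followed by two stride-1 convolutions); the strides below are an assumption based on that reference design, since the claim does not list them.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    # standard convolution output-size formula
    return (size + 2 * pad - kernel) // stride + 1

size = 256
for stride in (2, 2, 2, 1, 1):  # assumed PatchGAN strides
    size = conv_out(size, stride=stride)
print(size)  # 30
```

Each of the 30 × 30 outputs then scores one overlapping patch of the 256 × 256 input, as the claim describes.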
5. The line draft coloring method according to claim 1, wherein a loss function of a generator model of the line draft coloring model is:
L_G = λ_l1·L_l1 + λ_adv·L_adv + λ_tv·L_tv

wherein,

L_l1 = E_{I_s, I_h, I_r}[ ‖I_r − G(I_s, I_h)‖_1 ]

L_adv = E_{I_r, I_s}[ log D(I_s, I_r) ] + E_{I_h, I_s}[ log(1 − D(I_s, G(I_s, I_h))) ]

L_tv = Σ_{i,j} ( (x_{i+1,j} − x_{i,j})² + (x_{i,j+1} − x_{i,j})² )^{β/2}

wherein I_s is the line draft, I_h is the color prompt, I_r is the real image, G is the generator, D is the discriminator, λ_l1, λ_adv and λ_tv are weights, x_{i,j} denotes the pixel value at coordinates (i, j) in the generated image, and β is a hyper-parameter; E_{I_r,I_s}[·], E_{I_h,I_s}[·] and E_{I_s,I_h,I_r}[·] denote the mathematical expectations over the joint distributions of (I_r, I_s), (I_h, I_s) and (I_s, I_h, I_r), respectively.
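The L1 and total-variation terms above can be written directly in NumPy; this is a sketch of those two terms only (the adversarial term requires a trained discriminator and is omitted), and the function names are illustrative.

```python
import numpy as np

def l1_term(real, fake):
    """L_l1: mean absolute difference between the real and generated images."""
    return float(np.mean(np.abs(real - fake)))

def tv_term(img, beta=2.0):
    """L_tv: sum over pixels of (squared forward differences)^(beta/2)."""
    dh = img[1:, :-1] - img[:-1, :-1]   # x_{i+1,j} - x_{i,j}
    dw = img[:-1, 1:] - img[:-1, :-1]   # x_{i,j+1} - x_{i,j}
    return float(np.sum((dh ** 2 + dw ** 2) ** (beta / 2)))
```

With β = 2 the TV term reduces to the plain sum of squared neighbour differences, which penalizes high-frequency color noise in the generated image.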
7. The line draft coloring method according to claim 1, further comprising:
in the process of training the line draft coloring model, adding a learning rate decay mechanism to help training converge and prevent oscillation; wherein the learning rate decay mechanism is expressed as follows:
wherein α is the learning rate of the optimizer at the current stage, and the learning rate decay operation is performed every 15 rounds.
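A step-decay schedule of this kind can be sketched as follows; the decay factor of 0.5 is an assumed value, since the patent's exact decay expression is not reproduced in the text above.

```python
def decayed_lr(initial_lr, epoch, decay_every=15, factor=0.5):
    """Learning rate alpha at a given epoch under step decay applied
    every `decay_every` rounds; `factor` is an assumed hyper-parameter."""
    return initial_lr * factor ** (epoch // decay_every)
```

For example, with an initial rate of 2e-4 the rate halves at epochs 15, 30, 45, and so on, which is the behavior the claim describes.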
8. A line draft coloring device, characterized by comprising:
a training data set construction module, used for collecting color images, preprocessing the collected color images, generating line drafts from the preprocessed color images, extracting corresponding color prompts by a random masking and blurring method, and constructing a training data set;
a model improvement module, used for introducing an attention screening mechanism at the skip connections of the generator model of a pix2pixGAN model, so as to improve the pix2pixGAN model and obtain a line draft coloring model;
the model training module is used for training the line draft coloring model by utilizing the training data set;
and the coloring module is used for coloring the line draft to be colored by utilizing the trained line draft coloring model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111665537.6A CN114387365B (en) | 2021-12-30 | 2021-12-30 | Method and device for coloring line manuscript |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114387365A true CN114387365A (en) | 2022-04-22 |
CN114387365B CN114387365B (en) | 2024-09-24 |
Family
ID=81199755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111665537.6A Active CN114387365B (en) | 2021-12-30 | 2021-12-30 | Method and device for coloring line manuscript |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114387365B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109712203A (en) * | 2018-12-29 | 2019-05-03 | 福建帝视信息科技有限公司 | A kind of image rendering methods based on from attention generation confrontation network |
CN112084728A (en) * | 2020-09-07 | 2020-12-15 | 中国人民解放军战略支援部队信息工程大学 | Pix2 pix-based PCB gray image coloring method and system |
CN112288645A (en) * | 2020-09-30 | 2021-01-29 | 西北大学 | Skull face restoration model construction method, restoration method and restoration system |
Non-Patent Citations (2)
Title |
---|
LIM, SO-HYUN, AND JUN-CHUL CHUN.: "Image-to-Image Translation Based on U-Net with R2 and Attention.", JOURNAL OF INTERNET COMPUTING AND SERVICES, vol. 21, no. 4, 31 August 2020 (2020-08-31) * |
姚瑶: "基于生成对抗网络的图像转换方法研究", 中国优秀硕士学位论文全文数据库 (信息科技辑), no. 2020, 15 February 2020 (2020-02-15) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115953597A (en) * | 2022-04-25 | 2023-04-11 | 北京字跳网络技术有限公司 | Image processing method, apparatus, device and medium |
WO2023207779A1 (en) * | 2022-04-25 | 2023-11-02 | 北京字跳网络技术有限公司 | Image processing method and apparatus, device, and medium |
CN115953597B (en) * | 2022-04-25 | 2024-04-16 | 北京字跳网络技术有限公司 | Image processing method, device, equipment and medium |
CN115170388A (en) * | 2022-07-28 | 2022-10-11 | 西南大学 | Character line draft generation method, device, equipment and medium |
CN116433788A (en) * | 2023-02-24 | 2023-07-14 | 北京科技大学 | Gray image coloring method and device based on self-attention and generation countermeasure network |
CN117557589A (en) * | 2023-11-08 | 2024-02-13 | 深圳市闪剪智能科技有限公司 | Line drawing coloring method, device and storage medium based on neural network |
Also Published As
Publication number | Publication date |
---|---|
CN114387365B (en) | 2024-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114387365A (en) | Line draft coloring method and device | |
Jiang et al. | Image inpainting based on generative adversarial networks | |
CN113158862B (en) | Multitasking-based lightweight real-time face detection method | |
CN111861945B (en) | Text-guided image restoration method and system | |
CN111079532A (en) | Video content description method based on text self-encoder | |
CN115222998B (en) | Image classification method | |
CN116704079B (en) | Image generation method, device, equipment and storage medium | |
CN115641391A (en) | Infrared image colorizing method based on dense residual error and double-flow attention | |
CN113392711A (en) | Smoke semantic segmentation method and system based on high-level semantics and noise suppression | |
CN111127472A (en) | Multi-scale image segmentation method based on weight learning | |
CN111652864A (en) | Casting defect image generation method for generating countermeasure network based on conditional expression | |
CN116958324A (en) | Training method, device, equipment and storage medium of image generation model | |
CN116757986A (en) | Infrared and visible light image fusion method and device | |
CN115546171A (en) | Shadow detection method and device based on attention shadow boundary and feature correction | |
CN116051593A (en) | Clothing image extraction method and device, equipment, medium and product thereof | |
CN111814693A (en) | Marine ship identification method based on deep learning | |
CN110675311A (en) | Sketch generation method and device under sketch order constraint and storage medium | |
CN114694074A (en) | Method, device and storage medium for generating video by using image | |
CN117252892B (en) | Automatic double-branch portrait matting device based on light visual self-attention network | |
Zhao et al. | Guiding intelligent surveillance system by learning-by-synthesis gaze estimation | |
CN117788629A (en) | Image generation method, device and storage medium with style personalization | |
CN111311732A (en) | 3D human body grid obtaining method and device | |
CN116402702A (en) | Old photo restoration method and system based on deep neural network | |
CN115018729A (en) | White box image enhancement method for content | |
CN114519678A (en) | Scanning transmission image recovery method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||