CN116188652A - Face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network

Face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network

Info

Publication number
CN116188652A
Authority
CN
China
Prior art keywords
image
coloring
model
scale
convolution
Prior art date
Legal status
Pending
Application number
CN202211412711.0A
Other languages
Chinese (zh)
Inventor
王奔
陈亮锜
Current Assignee
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date
Filing date
Publication date
Application filed by Hangzhou Normal University
Priority to CN202211412711.0A
Publication of CN116188652A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/005: General purpose rendering architectures
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Color Image Communication Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network, implemented in the following steps: data collection and preprocessing, model construction, model training, and image colorization. The invention builds the face grayscale image colorization model on a cycle-consistent generative network, adopts dual-scale convolution, and embeds CBAM attention modules in the skip connections. The grayscale image is input to the generator, which focuses on the important information of the regions to be colorized and suppresses the learning of mappings for irrelevant regions. PatchGAN is used in the discriminator for finer-grained discrimination. The method achieves efficient end-to-end automatic colorization, substantially alleviates the edge color bleeding, loss of detail, and dull coloring common in prior methods, and finally generates color images with excellent colorization quality.

Description

Face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network
Technical Field
The invention relates to the technical field of image processing, in particular to a face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network.
Background
In the field of grayscale image colorization, early work relied mainly on coloring images manually, pixel by pixel, which was not only inefficient but also consumed considerable manpower and material resources. Later, with the advent and popularization of computers, people began to process images by computer, which brought great convenience to the grayscale image colorization problem.
Computer-based image colorization can be broadly divided into three categories according to the source of the image's color: methods based on local color expansion, methods based on reference images, and methods based on deep learning. The first two appeared earlier and usually require user interaction and a large amount of manual work; the third appeared later and achieves fully automatic end-to-end colorization by training a network model, but its results are not yet stable, and problems of color bleeding across boundaries, loss of detail, and dull coloring easily arise.
Face images are a common image category with relatively well-defined regions to be colorized. Meanwhile, owing to the limitations of early photographic technology, a large number of black-and-white old photographs survive today, and colorizing old face photographs can restore their original appearance to a great extent.
Disclosure of Invention
The invention aims to provide a face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network, which colorizes an input face grayscale image fully automatically and alleviates the problems of color bleeding across boundaries, loss of detail, and dull coloring.
To achieve this aim, the grayscale image is input to the generator as a condition; shallow, deep, and salient feature information of the image is extracted by dual-scale convolution and an attention mechanism; the cycle-consistent generative adversarial network maintains the spatial consistency of the image; and a color image with excellent colorization quality is finally generated.
The method comprises the following specific steps:
Step 1, data collection and preprocessing: acquire a large number of face color images and unify their sizes; divide the data set into a training set and a validation set; augment the training set data by adding a random flipping operation; convert the images to the CIE Lab color space using cv library functions and extract the L channel as the model input.
Step 2, construction of the face grayscale image colorization model: the model adopts a cycle-consistent generative network structure comprising two generator-discriminator pairs. An improved U-Net serves as the generator; a dual-scale convolution module extracts features, improving the model's adaptability to information at different scales and extracting multi-scale feature information. In the skip connections, a CBAM attention module extracts attention-weighted information that is fused into the upsampling stage, focusing on the salient regions of the image to be colorized and suppressing irrelevant regions. The discriminator adopts PatchGAN and outputs a feature map in fully convolutional form whose values represent the real/fake probabilities of multiple regions of the input image, so that the colorization of more regions is taken into account.
Step 3, training the face grayscale image colorization model: the L-channel grayscale image extracted in step 1 serves as the model input, and the remaining ab channels serve as the labels of the model. The adversarial loss, cycle consistency loss, identity loss, and grayscale loss are combined by weighted summation into the final loss function used to optimize the model, and training follows the strategy of updating the discriminator first and the generator second.
Step 4, colorizing the face grayscale image: the face grayscale image to be colorized is input into the trained model, which outputs the colorized face image.
Compared with the prior art, the invention has the following advantages:
First, the invention incorporates a cycle-consistent generative network, so the model achieves a better fit while maintaining consistency between the grayscale image and the colorized image. In the generator, a dual-scale convolution module that fuses convolution kernels of different sizes performs feature extraction on the feature maps, adaptively fusing global semantics and local features; compared with an ordinary 3×3 convolution kernel, this further improves model performance and the quality of the colorized image, yielding color images with fuller color than conventional methods.
Second, the invention incorporates an attention mechanism, inserting a CBAM module with a serial channel-attention and spatial-attention structure into the generator's skip connections, which effectively focuses on the salient regions of the feature maps and alleviates the color bleeding and detail loss common in existing methods.
Third, the model is designed specifically for colorizing face grayscale images, performs well on old photographs, and thus has practical value in the color dimension of old-photo restoration.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a structural diagram of the cycle-consistent generative network of the present invention;
FIG. 3 is a network architecture diagram of the generator of the present invention;
FIG. 4 is a structural diagram of the dual-scale convolution module of the present invention;
FIG. 5 is a structural diagram of the CBAM attention module of the present invention;
FIG. 6 is a network architecture diagram of the discriminator of the present invention.
Detailed Description
The specific implementation steps of the face grayscale image colorization method of the present invention are described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network specifically comprises the following steps:
Step 1, data collection and preprocessing:
first, 30000 images are randomly selected from a high-definition face data set CelebA-HQ.
Second, the resolution of all images is unified to be 256×256.
Third, after the data sets are divided according to the proportion of 90% and 10%, the number of training set images is 27000, and the number of verification set images is 3000.
And fourthly, converting the image into CIE Lab color space through a cv library function, extracting an L channel as model input, and taking an ab channel as a label value.
Fifthly, before the training set is read, the image is randomly turned over.
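A minimal sketch of this preprocessing pipeline is given below (OpenCV and NumPy are assumed; the [-1, 1] scaling and the 0.5 flip probability are illustrative choices, not stated in the embodiment):

```python
import cv2
import numpy as np

def preprocess(path, train=True, flip_p=0.5):
    """Load a face image, resize to 256x256, convert to CIE Lab,
    and split it into the L-channel input and the ab-channel labels."""
    img = cv2.imread(path)                       # BGR, uint8
    img = cv2.resize(img, (256, 256))
    if train and np.random.rand() < flip_p:      # random flip (data augmentation)
        img = img[:, ::-1, :].copy()
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(np.float32)
    L  = lab[:, :, :1] / 255.0 * 2.0 - 1.0       # model input, scaled to [-1, 1]
    ab = lab[:, :, 1:] / 255.0 * 2.0 - 1.0       # label channels
    return L, ab
```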
Step 2, constructing a human face gray image coloring model:
as shown in FIG. 2, the human face gray image coloring model generates a network structure for circulation, and comprises four sub-networks, wherein the G network is a generator for converting an image A into images B and D B Is a discriminator responsible for discriminating the true and false probabilities of images generated through the G network; the F network is also a generator responsible for converting image B into image A, D A Is a discriminator responsible for discriminating the true and false probabilities of images generated through the F network;
as shown in fig. 3, the generator uses U-Net as a basic structure, the left side of the U-Net is an encoder part, the resolution of the feature map is gradually reduced and the number of channels is gradually increased by extracting image features through downsampling; the right decoder portion restores the resolution of the image layer by layer.
Information sharing is performed between the encoder and the decoder through a jump connection; when the encoder is used for downsampling, the features of the image are extracted layer by layer, and due to the existence of jump connection, the downsampling stage can be fused with the lower-layer features, so that the sharing of the features is realized, and the information loss caused by downsampling is reduced.
The number of convolution kernels for the generator downsampling stages is 16, 32, 64, 128 and 256, respectively, i.e. the number of channels of the image after passing through the convolution module changes from 1 to 16, 32, 64, 128 and 256. After the double-scale convolution module carries out convolution twice, the number of channels of the image is increased, and then the resolution of the image is reduced to be half of the original resolution through a pooling layer.
The up-sampling stage uses a transpose convolution method to achieve restoration of the image size, and the number of image channels is restored from 256 to 2 layer by layer.
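A skeleton of this encoder-decoder with the stated channel progression is sketched below; the plain double-convolution block stands in for the dual-scale module described next, max pooling and a tanh output are assumptions, and CBAM (described below) would be applied to the skip features:

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # Stand-in for the dual-scale convolution module described below.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class UNetGenerator(nn.Module):
    """U-Net skeleton: channels 1 -> 16 -> 32 -> 64 -> 128 -> 256 on the way
    down, then back to 2 (the ab channels) on the way up."""
    def __init__(self, chs=(16, 32, 64, 128, 256)):
        super().__init__()
        self.downs, c = nn.ModuleList(), 1
        for ch in chs:
            self.downs.append(double_conv(c, ch))
            c = ch
        self.pool = nn.MaxPool2d(2)
        self.ups, self.decs = nn.ModuleList(), nn.ModuleList()
        for ch in reversed(chs[:-1]):
            self.ups.append(nn.ConvTranspose2d(c, ch, 2, stride=2))  # transposed-conv upsampling
            self.decs.append(double_conv(ch * 2, ch))                # skip concat doubles channels
            c = ch
        self.out = nn.Conv2d(c, 2, 1)

    def forward(self, x):
        skips = []
        for i, down in enumerate(self.downs):
            x = down(x)
            if i < len(self.downs) - 1:
                skips.append(x)          # CBAM would be applied to these skip features
                x = self.pool(x)
        for up, dec, s in zip(self.ups, self.decs, reversed(skips)):
            x = dec(torch.cat([up(x), s], dim=1))
        return torch.tanh(self.out(x))   # 2-channel ab prediction
```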
As shown in FIG. 4, the dual-scale convolution module consists of convolution kernels of two different sizes, 3×3 and 7×7. During downsampling, the input feature map passes through the convolution operations of the two sizes in parallel, and the results are fused by concatenation; a 1×1 convolution then performs effective dimensionality reduction.
Each 3×3 convolution block consists of a 3×3 convolution layer, Batch Normalization, and a ReLU activation function, with the convolution stride set to 1 and the pixel padding set to 1.
Each 7×7 convolution block consists of a 7×7 convolution layer, Batch Normalization, and a ReLU activation function, with the convolution stride set to 1 and the pixel padding set to 3.
The 1×1 convolution uses a stride of 1 and a pixel padding of 0.
The dual-scale convolution module comprises two consecutive 3×3 convolution blocks and two consecutive 7×7 convolution blocks, the two convolution branches being arranged in parallel.
Based on the above structure, in the first layer on the left side of the generator, a 256×256×1 input image is converted to 256×256×16 by the two 3×3 convolution blocks and, in parallel, to 256×256×16 by the two 7×7 convolution blocks. The concatenation operation then expands the channel count to 32, i.e., the feature map is 256×256×32. After the 1×1 convolution the channel dimension is reduced while the width and height are unchanged, giving 256×256×16. The subsequent pooling layer halves the spatial size, yielding 128×128×16.
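A sketch of this module, matching the dimensions worked out above (the class name is illustrative; the BatchNorm/ReLU placement follows the block descriptions):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, pad):
    # k x k convolution + Batch Normalization + ReLU, stride 1.
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, stride=1, padding=pad),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class DualScaleConv(nn.Module):
    """Two parallel branches (two 3x3 blocks and two 7x7 blocks), channel-wise
    concatenation, then a 1x1 convolution for dimensionality reduction."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branch3 = nn.Sequential(conv_block(c_in, c_out, 3, 1),
                                     conv_block(c_out, c_out, 3, 1))
        self.branch7 = nn.Sequential(conv_block(c_in, c_out, 7, 3),
                                     conv_block(c_out, c_out, 7, 3))
        self.reduce = nn.Conv2d(2 * c_out, c_out, 1, stride=1, padding=0)

    def forward(self, x):
        y = torch.cat([self.branch3(x), self.branch7(x)], dim=1)  # fuse along channels
        return self.reduce(y)

# Dimension check against the worked example above:
x = torch.randn(1, 1, 256, 256)
y = DualScaleConv(1, 16)(x)          # -> (1, 16, 256, 256)
z = nn.MaxPool2d(2)(y)               # -> (1, 16, 128, 128)
assert z.shape == (1, 16, 128, 128)
```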
The dual-scale convolution module extracts richer feature information from the image, including both global and local features; it allows cross-channel features to be fused interactively and increases the nonlinearity of the model, which helps realize more complex mapping relations and brings a more pronounced improvement to grayscale image colorization.
As shown in FIG. 5, the CBAM module (Convolutional Block Attention Module) is a serial structure comprising a channel attention module and a spatial attention module.
Every channel of the channel attention module participates in feature detection, focusing on "what" in the input image is meaningful. Channel attention pools the feature map with max pooling (MaxPool) and average pooling (AvgPool) separately, feeds the two results into the same shared multi-layer perceptron (MLP), and then combines the resulting vectors by element-wise summation to obtain the final channel attention map. The whole flow is computed as follows, where σ is the sigmoid function and F is the input feature map:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg)) + W_1(W_0(F_max)))
Here W_0 ∈ R^(C/r×C) and W_1 ∈ R^(C×C/r) are the weights of the shared MLP, and r is the compression ratio.
The spatial attention module focuses on "where" in the input image is meaningful, i.e., which regions of the image deserve attention. Spatial attention pools the feature map with max pooling (MaxPool) and average pooling (AvgPool) separately, concatenates the two results along the channel dimension, and finally applies a convolution with a 7×7 kernel to obtain the spatial attention map. The whole flow is computed as follows, where σ is the sigmoid function and f^(7×7) denotes the 7×7 convolution:
M_s(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)]))
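A sketch of the CBAM module in this serial channel-then-spatial arrangement (the compression ratio r = 16 is an assumed default, not stated in the embodiment):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                 nn.Linear(c // r, c))
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # AvgPool branch
        mx  = self.mlp(x.amax(dim=(2, 3)))            # MaxPool branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # M_c(F)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f^(7x7)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average
        mx, _ = x.max(dim=1, keepdim=True)            # channel-wise max
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s(F)
        return x * w

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied to skip features."""
    def __init__(self, c, r=16):
        super().__init__()
        self.ca, self.sa = ChannelAttention(c, r), SpatialAttention()
    def forward(self, x):
        return self.sa(self.ca(x))
```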
Because the CBAM attention module is placed in the skip connections, the shared low-level features carry attention weights, so the colorization model attends more to the salient regions and learns less color information from irrelevant regions, improving the colorization quality of the model to a certain extent.
As shown in FIG. 6, the discriminator uses PatchGAN to judge whether an image is real or fake. The structure is fully convolutional and uses 5 convolution layers in total. The first three convolution layers have a kernel size of 4, a stride of 2, and a pixel padding of 1; they downsample the image under judgment, doubling the channel count and halving the image size after each convolution. The last two convolution layers keep the same kernel size and pixel padding but use a stride of 1.
Given an input image of size 256×256×3, the discrimination network finally outputs a 30×30 matrix. Each value in the matrix corresponds to the real/fake probability of a 70×70 region of the input image.
Because PatchGAN judges the authenticity of multiple regions of the input image, finer-grained discrimination is achieved.
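A sketch of this discriminator follows (the channel widths 64-512 are an assumption in the usual PatchGAN convention; the 30×30 output and the 70×70 receptive field follow from the listed kernel sizes and strides):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five 4x4 convolution layers: three with stride 2 and padding 1, then two
    with stride 1 and padding 1. For a 256x256 input this yields a 30x30 map,
    and each output value has a 70x70 receptive field on the input."""
    def __init__(self, c_in=3, base=64):
        super().__init__()
        layers, c = [], c_in
        for i in range(3):                                    # 256 -> 128 -> 64 -> 32
            layers += [nn.Conv2d(c, base * 2 ** i, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = base * 2 ** i
        layers += [nn.Conv2d(c, c * 2, 4, stride=1, padding=1),   # 32 -> 31
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(c * 2, 1, 4, stride=1, padding=1)]   # 31 -> 30
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)    # per-patch real/fake scores

out = PatchDiscriminator()(torch.randn(1, 3, 256, 256))
assert out.shape == (1, 1, 30, 30)
```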
Step 3, training the face grayscale image colorization model.
The total number of epochs is set to 200, the batch size is set to 1, and the learning rate is initialized to 0.00002 with a dynamic learning-rate schedule.
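A sketch of a matching training configuration; Adam with betas (0.5, 0.999) and the linear-decay schedule are assumptions (one common reading of "dynamic learning rate"), since only the epoch count, batch size, and initial learning rate are stated above:

```python
import itertools
import torch

EPOCHS, BATCH_SIZE, INIT_LR = 200, 1, 2e-5

def make_optimizers(G, F, D_A, D_B):
    """Adam optimizers for the two generators (jointly) and each discriminator,
    plus an assumed linear-decay schedule over the second half of training."""
    opt_G  = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                              lr=INIT_LR, betas=(0.5, 0.999))
    opt_DA = torch.optim.Adam(D_A.parameters(), lr=INIT_LR, betas=(0.5, 0.999))
    opt_DB = torch.optim.Adam(D_B.parameters(), lr=INIT_LR, betas=(0.5, 0.999))
    # Hold INIT_LR for the first half of training, then decay linearly to zero.
    decay = lambda e: 1.0 - max(0, e - EPOCHS // 2) / (EPOCHS - EPOCHS // 2)
    scheds = [torch.optim.lr_scheduler.LambdaLR(o, decay)
              for o in (opt_G, opt_DA, opt_DB)]
    return opt_G, opt_DA, opt_DB, scheds
```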
The loss function of the face grayscale image colorization model is as follows:
Adversarial loss:
L_GAN(G, D_B) = E_y[log D_B(y)] + E_x[log(1 − D_B(G(x)))]
and symmetrically L_GAN(F, D_A) for the reverse direction; the total adversarial loss L_GAN is the sum of the two.
Cycle consistency loss:
L_consistency = E_x[||F(G(x)) − x||_1] + E_y[||G(F(y)) − y||_1]
Identity loss:
L_identity = E_y[||G(y) − y||_1] + E_x[||F(x) − x||_1]
Grayscale loss:
L_gray = E_x[||Gray(G(x)) − x||_1]
where x is a grayscale image, y is the corresponding color image, G and F are the generators, D_A and D_B are the discriminators, and Gray is the graying function: Gray(r, g, b) = 0.299r + 0.587g + 0.114b.
The final total loss function is
L_mix = L_GAN + λ_1·L_consistency + λ_2·L_identity + λ_3·L_gray
In this embodiment, λ_1 = 10, λ_2 = 5, λ_3 = 10.
Step 4, colorizing the face grayscale image.
The grayscale image to be colorized is input into the trained face grayscale image colorization model to obtain a color image.
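A sketch of inference under the Lab pipeline of step 1 (the scaling constants mirror the preprocessing sketch above and are illustrative):

```python
import cv2
import numpy as np
import torch

@torch.no_grad()
def colorize(model, gray_path):
    """Feed the L channel of a grayscale face photo to the trained generator
    and rebuild a BGR color image from the predicted ab channels."""
    img = cv2.resize(cv2.imread(gray_path), (256, 256))
    L = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(np.float32)[:, :, 0]
    x = torch.from_numpy(L / 255.0 * 2.0 - 1.0).view(1, 1, 256, 256)
    ab = model(x).squeeze(0).permute(1, 2, 0).numpy()   # (256, 256, 2), in [-1, 1]
    lab = np.concatenate([L[:, :, None], (ab + 1.0) / 2.0 * 255.0], axis=2)
    return cv2.cvtColor(lab.astype(np.uint8), cv2.COLOR_LAB2BGR)
```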
Actual tests of the colorization results of the face grayscale image colorization model show that the overall colorization of the images is good: the face region is given reasonable, full color, the facial features are clear, and the problems of color bleeding across boundaries, loss of detail, and dull coloring are greatly alleviated.
While the present invention has been described in detail with reference to the accompanying drawings, the invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.

Claims (4)

1. A face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network, characterized in that the method specifically comprises the following steps:
step 1, data collection and preprocessing: acquiring a large number of face color images and unifying their sizes; dividing the data set into a training set and a validation set; performing data augmentation on the training set data; converting the images to the CIE Lab color space using cv library functions and extracting the L channel as the model input;
step 2, constructing the face grayscale image colorization model: the model adopts a cycle-consistent generative network structure comprising two generator-discriminator pairs; an improved U-Net serves as the generator, and a dual-scale convolution module extracts features, improving the model's adaptability to information at different scales and extracting multi-scale feature information; in the skip connections, a CBAM attention module extracts attention-weighted information that is fused into the upsampling stage, focusing on the salient regions of the image to be colorized and suppressing irrelevant regions; the discriminator adopts PatchGAN and outputs a feature map in fully convolutional form whose values represent the real/fake probabilities of multiple regions of the input image, so that the colorization of more regions is taken into account;
step 3, training the face grayscale image colorization model: the L-channel grayscale image extracted in step 1 serves as the model input, and the remaining ab channels serve as the labels of the model; the adversarial loss, cycle consistency loss, identity loss, and grayscale loss are combined by weighted summation into the final loss function used to optimize the model, and training follows the strategy of updating the discriminator first and the generator second;
step 4, colorizing the face grayscale image: the face grayscale image to be colorized is input into the trained model, which outputs the colorized face image.
2. The face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network according to claim 1, characterized in that the cycle-consistent generative network comprises two generator-discriminator pairs, i.e., four sub-networks: the G network is a generator responsible for converting image A into image B, and D_B is a discriminator responsible for judging the real/fake probability of images generated by the G network; the F network is likewise a generator, responsible for converting image B into image A, and D_A is a discriminator responsible for judging the real/fake probability of images generated by the F network.
3. The face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network according to claim 1, characterized in that the dual-scale convolution module fuses convolution kernels of sizes 3×3 and 7×7: after the input feature map undergoes the convolution operations of the two sizes, the results are fused along the channel dimension, and a 1×1 convolution kernel then performs dimensionality reduction, reducing the efficiency loss caused by the extra model parameters introduced by the large convolution kernel.
4. The face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network according to claim 1, characterized in that in the skip connections a CBAM attention module combining channel attention and spatial attention in series attends to "what" and "where" in the feature map is meaningful, so that useful information of the downsampling stage is shared with the upsampling stage, the information loss caused by sampling is reduced, irrelevant information is suppressed, and the colorization quality is improved.
CN202211412711.0A 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure Pending CN116188652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412711.0A CN116188652A (en) 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211412711.0A CN116188652A (en) 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure

Publications (1)

Publication Number Publication Date
CN116188652A true CN116188652A (en) 2023-05-30

Family

ID=86431425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412711.0A Pending CN116188652A (en) 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure

Country Status (1)

Country Link
CN (1) CN116188652A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036875A (en) * 2023-07-11 2023-11-10 南京航空航天大学 Infrared weak and small moving target generation algorithm based on fusion attention GAN
CN117036875B (en) * 2023-07-11 2024-04-26 南京航空航天大学 Infrared weak and small moving target generation algorithm based on fusion attention GAN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination