CN116188652A - Face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network

Face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network

Info

Publication number
CN116188652A
Authority
CN
China
Prior art keywords
image
coloring
model
scale
convolution
Prior art date
Legal status
Pending
Application number
CN202211412711.0A
Other languages
Chinese (zh)
Inventor
王奔
陈亮锜
Current Assignee
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date
Filing date
Publication date
Application filed by Hangzhou Normal University
Priority to CN202211412711.0A
Publication of CN116188652A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/005: General purpose rendering architectures
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Color Image Communication Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network, implemented in the following steps: data collection and preprocessing, model construction, model training, and image colorization. The invention builds the face grayscale image colorization model on a cycle-consistent generative network, adopts dual-scale convolution, and embeds CBAM attention modules in the skip connections. The grayscale image is input to the generator, which focuses on the important information of the regions to be colorized and suppresses the learning of mappings for irrelevant regions. PatchGAN is used in the discriminator for finer-grained discrimination. The method achieves efficient end-to-end automatic colorization, substantially alleviates the edge color bleeding, loss of detail, and dull coloring common in prior methods, and finally generates color images with excellent colorization quality.

Description

Face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network
Technical Field
The invention relates to the technical field of image processing, in particular to a face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network.
Background
In the field of grayscale image colorization, early work relied mainly on coloring images manually, pixel by pixel, which was not only inefficient but also consumed considerable manpower and material resources. Later, with the advent and popularization of computers, people began to process images by computer, which brought great convenience to the grayscale image colorization problem.
Computer-based image colorization can be broadly divided into three categories according to the source of the image's color: methods based on local color expansion, methods based on reference images, and methods based on deep learning. The first two appeared earlier and usually require user interaction and a large amount of manual work; the third appeared later and achieves fully automatic end-to-end colorization by training a network model, but its results are not yet stable, and problems of color bleeding across boundaries, loss of detail, and dull coloring easily arise.
Face images are a common image category with relatively well-defined regions to be colorized. Meanwhile, owing to the limitations of early photographic technology, a large number of black-and-white old photographs survive today, and colorizing old face photographs can restore their original appearance to a great extent.
Disclosure of Invention
The invention aims to provide a face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network, which colorizes an input face grayscale image fully automatically and alleviates the problems of color bleeding across boundaries, loss of detail, and dull coloring.
To achieve this aim, the grayscale image is input to the generator as a condition; shallow, deep, and salient feature information of the image is extracted by dual-scale convolution and an attention mechanism; the cycle-consistent generative adversarial network maintains the spatial consistency of the image; and a color image with excellent colorization quality is finally generated.
The method comprises the following specific steps:
Step 1, data collection and preprocessing: acquire a large number of face color images and unify their sizes; divide the data set into a training set and a validation set; augment the training set data by adding a random flipping operation; convert the images to the CIE Lab color space using cv library functions and extract the L channel as the model input.
Step 2, construction of the face grayscale image colorization model: the model adopts a cycle-consistent generative network structure comprising two generator-discriminator pairs. An improved U-Net serves as the generator; a dual-scale convolution module extracts features, improving the model's adaptability to information at different scales and extracting multi-scale feature information. In the skip connections, a CBAM attention module extracts attention-weighted information that is fused into the upsampling stage, focusing on the salient regions of the image to be colorized and suppressing irrelevant regions. The discriminator adopts PatchGAN and outputs a feature map in fully convolutional form whose values represent the real/fake probabilities of multiple regions of the input image, so that the colorization of more regions is taken into account.
Step 3, training the face grayscale image colorization model: the L-channel grayscale image extracted in step 1 serves as the model input, and the remaining ab channels serve as the labels of the model. The adversarial loss, cycle consistency loss, identity loss, and grayscale loss are combined by weighted summation into the final loss function used to optimize the model, and training follows the strategy of updating the discriminator first and the generator second.
Step 4, colorizing the face grayscale image: the face grayscale image to be colorized is input into the trained model, which outputs the colorized face image.
Compared with the prior art, the invention has the following advantages:
First, the invention incorporates a cycle-consistent generative network, so the model achieves a better fit while maintaining consistency between the grayscale image and the colorized image. In the generator, a dual-scale convolution module that fuses convolution kernels of different sizes performs feature extraction on the feature maps, adaptively fusing global semantics and local features; compared with an ordinary 3×3 convolution kernel, this further improves model performance and the quality of the colorized image, yielding color images with fuller color than conventional methods.
Second, the invention incorporates an attention mechanism, inserting a CBAM module with a serial channel-attention and spatial-attention structure into the generator's skip connections, which effectively focuses on the salient regions of the feature maps and alleviates the color bleeding and detail loss common in existing methods.
Third, the model is designed specifically for colorizing face grayscale images, performs well on old photographs, and thus has practical value in the color dimension of old-photo restoration.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a structural diagram of the cycle-consistent generative network of the present invention;
FIG. 3 is a network architecture diagram of the generator of the present invention;
FIG. 4 is a structural diagram of the dual-scale convolution module of the present invention;
FIG. 5 is a structural diagram of the CBAM attention module of the present invention;
FIG. 6 is a network architecture diagram of the discriminator of the present invention.
Detailed Description
The specific implementation steps of the face grayscale image colorization method of the present invention are described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network specifically comprises the following steps:
Step 1, data collection and preprocessing:
first, 30000 images are randomly selected from a high-definition face data set CelebA-HQ.
Second, the resolution of all images is unified to be 256×256.
Third, after the data sets are divided according to the proportion of 90% and 10%, the number of training set images is 27000, and the number of verification set images is 3000.
And fourthly, converting the image into CIE Lab color space through a cv library function, extracting an L channel as model input, and taking an ab channel as a label value.
Fifthly, before the training set is read, the image is randomly turned over.
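A minimal sketch of this preprocessing pipeline is given below (OpenCV and NumPy are assumed; the [-1, 1] scaling and the 0.5 flip probability are illustrative choices, not stated in the embodiment):

```python
import cv2
import numpy as np

def preprocess(path, train=True, flip_p=0.5):
    """Load a face image, resize to 256x256, convert to CIE Lab,
    and split it into the L-channel input and the ab-channel labels."""
    img = cv2.imread(path)                       # BGR, uint8
    img = cv2.resize(img, (256, 256))
    if train and np.random.rand() < flip_p:      # random flip (data augmentation)
        img = img[:, ::-1, :].copy()
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(np.float32)
    L  = lab[:, :, :1] / 255.0 * 2.0 - 1.0       # model input, scaled to [-1, 1]
    ab = lab[:, :, 1:] / 255.0 * 2.0 - 1.0       # label channels
    return L, ab
```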
Step 2, constructing a human face gray image coloring model:
as shown in FIG. 2, the human face gray image coloring model generates a network structure for circulation, and comprises four sub-networks, wherein the G network is a generator for converting an image A into images B and D B Is a discriminator responsible for discriminating the true and false probabilities of images generated through the G network; the F network is also a generator responsible for converting image B into image A, D A Is a discriminator responsible for discriminating the true and false probabilities of images generated through the F network;
as shown in fig. 3, the generator uses U-Net as a basic structure, the left side of the U-Net is an encoder part, the resolution of the feature map is gradually reduced and the number of channels is gradually increased by extracting image features through downsampling; the right decoder portion restores the resolution of the image layer by layer.
Information sharing is performed between the encoder and the decoder through a jump connection; when the encoder is used for downsampling, the features of the image are extracted layer by layer, and due to the existence of jump connection, the downsampling stage can be fused with the lower-layer features, so that the sharing of the features is realized, and the information loss caused by downsampling is reduced.
The number of convolution kernels for the generator downsampling stages is 16, 32, 64, 128 and 256, respectively, i.e. the number of channels of the image after passing through the convolution module changes from 1 to 16, 32, 64, 128 and 256. After the double-scale convolution module carries out convolution twice, the number of channels of the image is increased, and then the resolution of the image is reduced to be half of the original resolution through a pooling layer.
The up-sampling stage uses a transpose convolution method to achieve restoration of the image size, and the number of image channels is restored from 256 to 2 layer by layer.
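A skeleton of this encoder-decoder with the stated channel progression is sketched below; the plain double-convolution block stands in for the dual-scale module described next, max pooling and a tanh output are assumptions, and CBAM (described below) would be applied to the skip features:

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # Stand-in for the dual-scale convolution module described below.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class UNetGenerator(nn.Module):
    """U-Net skeleton: channels 1 -> 16 -> 32 -> 64 -> 128 -> 256 on the way
    down, then back to 2 (the ab channels) on the way up."""
    def __init__(self, chs=(16, 32, 64, 128, 256)):
        super().__init__()
        self.downs, c = nn.ModuleList(), 1
        for ch in chs:
            self.downs.append(double_conv(c, ch))
            c = ch
        self.pool = nn.MaxPool2d(2)
        self.ups, self.decs = nn.ModuleList(), nn.ModuleList()
        for ch in reversed(chs[:-1]):
            self.ups.append(nn.ConvTranspose2d(c, ch, 2, stride=2))  # transposed-conv upsampling
            self.decs.append(double_conv(ch * 2, ch))                # skip concat doubles channels
            c = ch
        self.out = nn.Conv2d(c, 2, 1)

    def forward(self, x):
        skips = []
        for i, down in enumerate(self.downs):
            x = down(x)
            if i < len(self.downs) - 1:
                skips.append(x)          # CBAM would be applied to these skip features
                x = self.pool(x)
        for up, dec, s in zip(self.ups, self.decs, reversed(skips)):
            x = dec(torch.cat([up(x), s], dim=1))
        return torch.tanh(self.out(x))   # 2-channel ab prediction
```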
As shown in FIG. 4, the dual-scale convolution module consists of convolution kernels of two different sizes, 3×3 and 7×7. During downsampling, the input feature map passes through the convolution operations of the two sizes in parallel, and the results are fused by concatenation; a 1×1 convolution then performs effective dimensionality reduction.
Each 3×3 convolution block consists of a 3×3 convolution layer, Batch Normalization, and a ReLU activation function, with the convolution stride set to 1 and the pixel padding set to 1.
Each 7×7 convolution block consists of a 7×7 convolution layer, Batch Normalization, and a ReLU activation function, with the convolution stride set to 1 and the pixel padding set to 3.
The 1×1 convolution uses a stride of 1 and a pixel padding of 0.
The dual-scale convolution module comprises two consecutive 3×3 convolution blocks and two consecutive 7×7 convolution blocks, the two convolution branches being arranged in parallel.
Based on the above structure, in the first layer on the left side of the generator, a 256×256×1 input image is converted to 256×256×16 by the two 3×3 convolution blocks and, in parallel, to 256×256×16 by the two 7×7 convolution blocks. The concatenation operation then expands the channel count to 32, i.e., the feature map is 256×256×32. After the 1×1 convolution the channel dimension is reduced while the width and height are unchanged, giving 256×256×16. The subsequent pooling layer halves the spatial size, yielding 128×128×16.
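A sketch of this module, matching the dimensions worked out above (the class name is illustrative; the BatchNorm/ReLU placement follows the block descriptions):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, pad):
    # k x k convolution + Batch Normalization + ReLU, stride 1.
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, stride=1, padding=pad),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class DualScaleConv(nn.Module):
    """Two parallel branches (two 3x3 blocks and two 7x7 blocks), channel-wise
    concatenation, then a 1x1 convolution for dimensionality reduction."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branch3 = nn.Sequential(conv_block(c_in, c_out, 3, 1),
                                     conv_block(c_out, c_out, 3, 1))
        self.branch7 = nn.Sequential(conv_block(c_in, c_out, 7, 3),
                                     conv_block(c_out, c_out, 7, 3))
        self.reduce = nn.Conv2d(2 * c_out, c_out, 1, stride=1, padding=0)

    def forward(self, x):
        y = torch.cat([self.branch3(x), self.branch7(x)], dim=1)  # fuse along channels
        return self.reduce(y)

# Dimension check against the worked example above:
x = torch.randn(1, 1, 256, 256)
y = DualScaleConv(1, 16)(x)          # -> (1, 16, 256, 256)
z = nn.MaxPool2d(2)(y)               # -> (1, 16, 128, 128)
assert z.shape == (1, 16, 128, 128)
```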
The dual-scale convolution module extracts richer feature information from the image, including both global and local features; it allows cross-channel features to be fused interactively and increases the nonlinearity of the model, which helps realize more complex mapping relations and brings a more pronounced improvement to grayscale image colorization.
As shown in FIG. 5, the CBAM module (Convolutional Block Attention Module) is a serial structure comprising a channel attention module and a spatial attention module.
Every channel of the channel attention module participates in feature detection, focusing on "what" in the input image is meaningful. Channel attention pools the feature map with max pooling (MaxPool) and average pooling (AvgPool) separately, feeds the two results into the same shared multi-layer perceptron (MLP), and then combines the resulting vectors by element-wise summation to obtain the final channel attention map. The whole flow is computed as follows, where σ is the sigmoid function and F is the input feature map:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg)) + W_1(W_0(F_max)))
Here W_0 ∈ R^(C/r×C) and W_1 ∈ R^(C×C/r) are the weights of the shared MLP, and r is the compression ratio.
The spatial attention module focuses on "where" in the input image is meaningful, i.e., which regions of the image deserve attention. Spatial attention pools the feature map with max pooling (MaxPool) and average pooling (AvgPool) separately, concatenates the two results along the channel dimension, and finally applies a convolution with a 7×7 kernel to obtain the spatial attention map. The whole flow is computed as follows, where σ is the sigmoid function and f^(7×7) denotes the 7×7 convolution:
M_s(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)]))
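A sketch of the CBAM module in this serial channel-then-spatial arrangement (the compression ratio r = 16 is an assumed default, not stated in the embodiment):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                 nn.Linear(c // r, c))
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # AvgPool branch
        mx  = self.mlp(x.amax(dim=(2, 3)))            # MaxPool branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # M_c(F)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f^(7x7)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average
        mx, _ = x.max(dim=1, keepdim=True)            # channel-wise max
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s(F)
        return x * w

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied to skip features."""
    def __init__(self, c, r=16):
        super().__init__()
        self.ca, self.sa = ChannelAttention(c, r), SpatialAttention()
    def forward(self, x):
        return self.sa(self.ca(x))
```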
Because the CBAM attention module is placed in the skip connections, the shared low-level features carry attention weights, so the colorization model attends more to the salient regions and learns less color information from irrelevant regions, improving the colorization quality of the model to a certain extent.
As shown in FIG. 6, the discriminator uses PatchGAN to judge whether an image is real or fake. The structure is fully convolutional and uses 5 convolution layers in total. The first three convolution layers have a kernel size of 4, a stride of 2, and a pixel padding of 1; they downsample the image under judgment, doubling the channel count and halving the image size after each convolution. The last two convolution layers keep the same kernel size and pixel padding but use a stride of 1.
Given an input image of size 256×256×3, the discrimination network finally outputs a 30×30 matrix. Each value in the matrix corresponds to the real/fake probability of a 70×70 region of the input image.
Because PatchGAN judges the authenticity of multiple regions of the input image, finer-grained discrimination is achieved.
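A sketch of this discriminator follows (the channel widths 64-512 are an assumption in the usual PatchGAN convention; the 30×30 output and the 70×70 receptive field follow from the listed kernel sizes and strides):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five 4x4 convolution layers: three with stride 2 and padding 1, then two
    with stride 1 and padding 1. For a 256x256 input this yields a 30x30 map,
    and each output value has a 70x70 receptive field on the input."""
    def __init__(self, c_in=3, base=64):
        super().__init__()
        layers, c = [], c_in
        for i in range(3):                                    # 256 -> 128 -> 64 -> 32
            layers += [nn.Conv2d(c, base * 2 ** i, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = base * 2 ** i
        layers += [nn.Conv2d(c, c * 2, 4, stride=1, padding=1),   # 32 -> 31
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(c * 2, 1, 4, stride=1, padding=1)]   # 31 -> 30
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)    # per-patch real/fake scores

out = PatchDiscriminator()(torch.randn(1, 3, 256, 256))
assert out.shape == (1, 1, 30, 30)
```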
Step 3, training the face grayscale image colorization model.
The total number of epochs is set to 200, the batch size is set to 1, and the learning rate is initialized to 0.00002 with a dynamic learning-rate schedule.
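A sketch of a matching training configuration; Adam with betas (0.5, 0.999) and the linear-decay schedule are assumptions (one common reading of "dynamic learning rate"), since only the epoch count, batch size, and initial learning rate are stated above:

```python
import itertools
import torch

EPOCHS, BATCH_SIZE, INIT_LR = 200, 1, 2e-5

def make_optimizers(G, F, D_A, D_B):
    """Adam optimizers for the two generators (jointly) and each discriminator,
    plus an assumed linear-decay schedule over the second half of training."""
    opt_G  = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                              lr=INIT_LR, betas=(0.5, 0.999))
    opt_DA = torch.optim.Adam(D_A.parameters(), lr=INIT_LR, betas=(0.5, 0.999))
    opt_DB = torch.optim.Adam(D_B.parameters(), lr=INIT_LR, betas=(0.5, 0.999))
    # Hold INIT_LR for the first half of training, then decay linearly to zero.
    decay = lambda e: 1.0 - max(0, e - EPOCHS // 2) / (EPOCHS - EPOCHS // 2)
    scheds = [torch.optim.lr_scheduler.LambdaLR(o, decay)
              for o in (opt_G, opt_DA, opt_DB)]
    return opt_G, opt_DA, opt_DB, scheds
```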
The loss function of the face grayscale image colorization model is as follows:
Adversarial loss:
L_GAN(G, D_B) = E_y[log D_B(y)] + E_x[log(1 − D_B(G(x)))]
and symmetrically L_GAN(F, D_A) for the reverse direction; the total adversarial loss L_GAN is the sum of the two.
Cycle consistency loss:
L_consistency = E_x[||F(G(x)) − x||_1] + E_y[||G(F(y)) − y||_1]
Identity loss:
L_identity = E_y[||G(y) − y||_1] + E_x[||F(x) − x||_1]
Grayscale loss:
L_gray = E_x[||Gray(G(x)) − x||_1]
where x is a grayscale image, y is the corresponding color image, G and F are the generators, D_A and D_B are the discriminators, and Gray is the graying function: Gray(r, g, b) = 0.299r + 0.587g + 0.114b.
The final total loss function is
L_mix = L_GAN + λ_1·L_consistency + λ_2·L_identity + λ_3·L_gray
In this embodiment, λ_1 = 10, λ_2 = 5, λ_3 = 10.
Step 4, colorizing the face grayscale image.
The grayscale image to be colorized is input into the trained face grayscale image colorization model to obtain a color image.
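A sketch of inference under the Lab pipeline of step 1 (the scaling constants mirror the preprocessing sketch above and are illustrative):

```python
import cv2
import numpy as np
import torch

@torch.no_grad()
def colorize(model, gray_path):
    """Feed the L channel of a grayscale face photo to the trained generator
    and rebuild a BGR color image from the predicted ab channels."""
    img = cv2.resize(cv2.imread(gray_path), (256, 256))
    L = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(np.float32)[:, :, 0]
    x = torch.from_numpy(L / 255.0 * 2.0 - 1.0).view(1, 1, 256, 256)
    ab = model(x).squeeze(0).permute(1, 2, 0).numpy()   # (256, 256, 2), in [-1, 1]
    lab = np.concatenate([L[:, :, None], (ab + 1.0) / 2.0 * 255.0], axis=2)
    return cv2.cvtColor(lab.astype(np.uint8), cv2.COLOR_LAB2BGR)
```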
Actual tests of the colorization results of the face grayscale image colorization model show that the overall colorization of the images is good: the face region is given reasonable, full color, the facial features are clear, and the problems of color bleeding across boundaries, loss of detail, and dull coloring are greatly alleviated.
While the present invention has been described in detail with reference to the accompanying drawings, the invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.

Claims (4)

1. A face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network, characterized in that the method specifically comprises the following steps:
step 1, data collection and preprocessing: acquiring a large number of face color images and unifying their sizes; dividing the data set into a training set and a validation set; performing data augmentation on the training set data; converting the images to the CIE Lab color space using cv library functions and extracting the L channel as the model input;
step 2, constructing the face grayscale image colorization model: the model adopts a cycle-consistent generative network structure comprising two generator-discriminator pairs; an improved U-Net serves as the generator, and a dual-scale convolution module extracts features, improving the model's adaptability to information at different scales and extracting multi-scale feature information; in the skip connections, a CBAM attention module extracts attention-weighted information that is fused into the upsampling stage, focusing on the salient regions of the image to be colorized and suppressing irrelevant regions; the discriminator adopts PatchGAN and outputs a feature map in fully convolutional form whose values represent the real/fake probabilities of multiple regions of the input image, so that the colorization of more regions is taken into account;
step 3, training the face grayscale image colorization model: the L-channel grayscale image extracted in step 1 serves as the model input, and the remaining ab channels serve as the labels of the model; the adversarial loss, cycle consistency loss, identity loss, and grayscale loss are combined by weighted summation into the final loss function used to optimize the model, and training follows the strategy of updating the discriminator first and the generator second;
step 4, colorizing the face grayscale image: the face grayscale image to be colorized is input into the trained model, which outputs the colorized face image.
2. The face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network according to claim 1, characterized in that the cycle-consistent generative network comprises two generator-discriminator pairs, i.e., four sub-networks: the G network is a generator responsible for converting image A into image B, and D_B is a discriminator responsible for judging the real/fake probability of images generated by the G network; the F network is likewise a generator, responsible for converting image B into image A, and D_A is a discriminator responsible for judging the real/fake probability of images generated by the F network.
3. The face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network according to claim 1, characterized in that the dual-scale convolution module fuses convolution kernels of sizes 3×3 and 7×7: after the input feature map undergoes the convolution operations of the two sizes, the results are fused along the channel dimension, and a 1×1 convolution kernel then performs dimensionality reduction, reducing the efficiency loss caused by the extra model parameters introduced by the large convolution kernel.
4. The face grayscale image colorization method based on a dual-scale cycle-consistent generative adversarial network according to claim 1, characterized in that in the skip connections a CBAM attention module combining channel attention and spatial attention in series attends to "what" and "where" in the feature map is meaningful, so that useful information of the downsampling stage is shared with the upsampling stage, the information loss caused by sampling is reduced, irrelevant information is suppressed, and the colorization quality is improved.
CN202211412711.0A 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure Pending CN116188652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412711.0A CN116188652A (en) 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211412711.0A CN116188652A (en) 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure

Publications (1)

Publication Number Publication Date
CN116188652A true CN116188652A (en) 2023-05-30

Family

ID=86431425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412711.0A Pending CN116188652A (en) 2022-11-11 2022-11-11 Face gray image coloring method based on double-scale circulation generation countermeasure

Country Status (1)

Country Link
CN (1) CN116188652A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036875A (en) * 2023-07-11 2023-11-10 南京航空航天大学 Infrared weak and small moving target generation algorithm based on fusion attention GAN
CN117036875B (en) * 2023-07-11 2024-04-26 南京航空航天大学 Infrared weak and small moving target generation algorithm based on fusion attention GAN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination