CN115546338A

CN115546338A - Image Colorization Method Based on Transformer and Generative Adversarial Network

Info

Publication number: CN115546338A
Application number: CN202211247125.5A
Authority: CN
Inventors: 薛涛; 马鹏森
Original assignee: Xian Polytechnic University
Current assignee: Xian Polytechnic University
Priority date: 2022-10-12
Filing date: 2022-10-12
Publication date: 2022-12-30

Abstract

The invention discloses a method for coloring images based on a Transformer and a generation countermeasure network, which solves the problem of coloring images by using the generation countermeasure network GAN and the Transformer instead of simply using a convolutional neural network CNN. The local enhanced forward propagation network and the hopping connection ensure that shallow features can be efficiently transmitted and utilized in the network so that the transform-GAN can efficiently capture the correlation between global and local information. An optimal training process is also explored through data enhancement and objective function selection, and a color image generator and a discriminator are formed to enable the Transformer-GAN to perform well in the aspect of image colorization. The best visual effect is achieved.

Description

Image Colorization Method Based on Transformer and Generative Adversarial Network

技术领域technical field

本发明属于图像处理技术领域，涉及一种基于Transformer与生成对抗网络的图像着色方法。The invention belongs to the technical field of image processing, and relates to an image coloring method based on Transformer and generation confrontation network.

背景技术Background technique

在图像着色任务中，我们的目标是从输入的灰度图像生成彩色图像。按照类别，从早期传统的基于CNN结构的无跳跃连接算法，到后来出现了由使用者指定图像颜色的网络(这些网络需要用户在特定的图层中输入颜色值)。以及使用生成对抗网络(GAN)的动画图像彩色化的端到端前馈架构，另外还有用于特定领域的红外彩色化、雷达图像彩色化等，以及后来出现的多模态着色模型(基于文本的着色网络)。多样化的彩色化网络通过生成不同的彩色图像来补偿多样性的缺乏，网络架构包含多路径网络，该网络在不同的网络路径或级别中学习不同的特征，并且用户给出参考图像作为着色网络的输入样本。以上所有模型有一个共同点都是它们都是基于卷积神经网络CNN的网络，然而与先前的工作不同，我使用transformer以及生成对抗网络GAN构建图像彩色化网络，据我所知，这是第一个使用Transformer为主要网络进行图像彩色化的研究。In the image colorization task, our goal is to generate a color image from an input grayscale image. According to the category, from the early traditional non-skip connection algorithm based on the CNN structure, to the network in which the image color is specified by the user (these networks require the user to input the color value in a specific layer). and an end-to-end feed-forward architecture for animated image colorization using Generative Adversarial Networks (GAN), as well as domain-specific infrared colorization, radar image colorization, etc., and later multimodal coloring models (text-based shading network). The diverse colorization network compensates for the lack of diversity by generating different color images, the network architecture contains a multi-path network that learns different features in different network paths or levels, and the user gives a reference image as the colorization network input sample. All the above models have one thing in common that they are all based on the convolutional neural network CNN. However, unlike the previous work, I use the transformer and the generation confrontation network GAN to build the image colorization network. As far as I know, this is the first A study of image colorization using Transformer for major networks.

发明内容Contents of the invention

本发明的目的是提供一种基于Transformer与生成对抗网络的图像着色方法，解决了现在图像着色网络着色效果差，着色多样性差的问题。The purpose of the present invention is to provide an image coloring method based on Transformer and generating confrontation network, which solves the problems of poor coloring effect and poor coloring diversity of current image coloring network.

本发明所采用的技术方案是，基于Transformer与生成对抗网络的图像着色方法，该方法按照以下步骤实施：The technical solution adopted in the present invention is, based on Transformer and generating an image coloring method against a network, the method is implemented according to the following steps:

步骤1、构建基于生成对抗网络的图像着色模型，所述图像着色模型包括彩色图像生成器和鉴别器；所述彩色图像生成器同于生成彩色图像，所述鉴别器用于判断输入的图像是真实彩色图像或伪彩色图像；Step 1, constructing an image coloring model based on generating an adversarial network, the image coloring model includes a color image generator and a discriminator; the color image generator is the same as generating a color image, and the discriminator is used to judge whether the input image is real Color images or pseudo-color images;

步骤2、将灰色图像输入所述图像着色模型的彩色图像生成器生成伪彩色图像；Step 2, the gray image is input into the color image generator of the image coloring model to generate a pseudo-color image;

步骤3、分别更新鉴别器以及彩色图像生成器的参数：Step 3. Update the parameters of the discriminator and the color image generator respectively:

步骤3.1：首先固定彩色图像生成器的参数，将所述伪彩色图像以及所述灰色图像对应的真实彩色图像依次交替输入鉴别器，然后根据损失函数计算所述灰色图像对应的真实彩色图像与标签值为1之间的损失，以及根据损失函数计算所述灰色图像生成的伪彩色图像与标签值为0之间的损失，最后利用反向传播算法，更新所述鉴别器的参数；其中标签值为1代表的是真实的图像，标签值为0代表的是生成的伪彩色图像；Step 3.1: First, fix the parameters of the color image generator, alternately input the false color image and the real color image corresponding to the gray image into the discriminator, and then calculate the real color image and label corresponding to the gray image according to the loss function The loss between the value 1, and the loss between the pseudo-color image generated by the gray image and the label value 0 according to the loss function calculation, and finally use the backpropagation algorithm to update the parameters of the discriminator; wherein the label value A value of 1 represents the real image, and a label value of 0 represents the generated pseudo-color image;

步骤3.2：固定鉴别器的参数，根据损失函数计算生成的伪彩色图像与标签值为1之间的损失，最后利用反向传播算法，更新所述彩色图像生成器的参数。Step 3.2: Fix the parameters of the discriminator, calculate the loss between the generated pseudo-color image and the label value of 1 according to the loss function, and finally update the parameters of the color image generator by using the backpropagation algorithm.

步骤3.3：不断循环步骤3.1和步骤3.2更新鉴别器和彩色图像生成器参数的过程，直至损失值收敛，彩色图像生成器生成效果不错的伪彩色图像，即获得了优化后图像着色模型；Step 3.3: Repeat step 3.1 and step 3.2 to update the discriminator and color image generator parameters until the loss value converges, and the color image generator generates a pseudo-color image with good effect, that is, the optimized image coloring model is obtained;

步骤4、利用优化后的图像着色模型就直接对灰色图像进行着色。Step 4, using the optimized image coloring model to directly color the gray image.

本发明的特点还在于，The present invention is also characterized in that,

步骤1中，所述彩色图像生成器中包含多个MWin-transformer模块，所述Mwin-transformer模块的功能是提取和重建图像的特征，输出3通道有效彩色图像：In step 1, multiple MWin-transformer modules are included in the color image generator, and the function of the Mwin-transformer module is to extract and reconstruct image features, and output 3 channel effective color images:

所述Mwin-transformer模块由三个核心部分组成：基于窗口的多头自注意力机制、层归一化操作LN和局部增强前向传播网络LeFF。The Mwin-transformer module consists of three core parts: a window-based multi-head self-attention mechanism, a layer normalization operation LN and a local enhanced forward propagation network LeFF.

彩色图像生成器生成伪彩色图像的流程如下所示：The flow of the color image generator to generate a pseudo-color image is as follows:

X′＝Embedded Tokens(X_in)X'＝Embedded Tokens(X _in )

X″＝W-MSA(LN(X))+X′X"=W-MSA(LN(X))+X'

X_out＝LeFF(LN(X″))+X″X _out ＝LeFF(LN(X″))+X″

其中，X_in表示输入，为灰色图像或伪彩色图像；Among them, X _in represents the input, which is a gray image or a pseudo-color image;

Embedding Tokens表示将X_in转换成向量；Embedding Tokens means converting X _in into a vector;

X′表示将X_in输入进Embedding Tokens得到的向量输出；X' represents the vector output obtained by inputting X _in into Embedding Tokens;

然后将向量X′进行层归一化后的结果LN(X′)输入进基于窗口的多头自注意力机制W-MSA得到提取了特征信息的向量，再与X′相加得到汇聚了更多特征信息的向量X″；X″表示将X′输入进基于窗口的多头自注意力机制以及层归一化操作得到的输出；Then the layer normalized result LN(X') of the vector X' is input into the window-based multi-head self-attention mechanism W-MSA to obtain a vector with extracted feature information, and then added to X' to obtain more The vector X" of feature information; X" represents the output obtained by inputting X' into the window-based multi-head self-attention mechanism and layer normalization operation;

继续将向量X″进行层归一化，将归一化后的LN(X″)输入进局部增强前向传播网络得到提取了更多局部特征信息的向量，再与X″相加得到汇聚了更多局部特征信息的向量X_out，X_out表示将X″输入进局部增强前向传播网络LeFF以及层归一化操作得到的输出。Continue to normalize the vector X″, and input the normalized LN(X″) into the local enhanced forward propagation network to obtain a vector that extracts more local feature information, and then add it to X″ to get a converged The vector X _out of more local feature information, X _out represents the output obtained by inputting X″ into the local enhanced forward propagation network LeFF and the layer normalization operation.

所述层归一化LN操作是为了解决内部协变量偏移问题，层归一化操作的计算过程为:The layer normalization LN operation is to solve the internal covariate offset problem, and the calculation process of the layer normalization operation is:

其中，LN层的作用对象是

X代表向量，μ以及δ分别代表每个样本的均值和方差，

和

为仿射学习参数，d_k是隐藏维度，

表示该数是一个k维的向量。Among them, the object of the LN layer is

X represents the vector, μ and δ represent the mean and variance of each sample, respectively,

and

is the affine learning parameter, d _k is the hidden dimension,

Indicates that the number is a k-dimensional vector.

所述基于窗口的多头自注意力机制如下：The window-based multi-head self-attention mechanism is as follows:

将伪彩色图像分成多个窗口，然后在这些不同的窗口中执行自注意力计算，由于一个窗口中的patch数远小于一幅图片中的所有小块数，并且窗口的数目保持不变，所以基于窗口的多头自注意力机制的计算复杂度与图像大小便由平方关系变成了成线性关系，大大降低了模型的计算复杂度。Divide the pseudo-color image into multiple windows, and then perform self-attention calculations in these different windows. Since the number of patches in a window is much smaller than the number of all small blocks in a picture, and the number of windows remains the same, so The computational complexity of the window-based multi-head self-attention mechanism and the image size have changed from a square relationship to a linear relationship, which greatly reduces the computational complexity of the model.

将卷积添加到Mwin-transformer模块中的前向传播网络，从而形成局部增强前向传播网络LeFF。Convolutions are added to the forward propagation network in the Mwin-transformer module, resulting in a locally enhanced forward propagation network LeFF.

损失函数为：The loss function is:

其中，in,

其中，G^*表示损失函数之和，

表示条件生成对抗网络损失，

表示Charbonnier损失，λ表示Charbonnier损失的权重系数；where G ^* represents the sum of loss functions,

Denotes the conditional generative adversarial network loss,

Represents Charbonnier loss, λ represents the weight coefficient of Charbonnier loss;

x表示输入的灰色图像；x represents the input gray image;

y表述输入的灰色图像对应的真实的彩色图像；y represents the real color image corresponding to the input gray image;

log表示以2为底的对数函数；log represents a logarithmic function with base 2;

表示自变量为x,y；

Indicates that the independent variable is x, y;

表示自变量为x；

Indicates that the independent variable is x;

ε表示一个值为10^-3的常系数；ε represents a constant coefficient with a value of 10 ^-3 ;

||||表示求绝对值。|||| means to find the absolute value.

本发明的有益效果是：本发明是一种基于Transformer与生成对抗网络用于图像着色的方法，可对灰色图像着色，在整个发明中，transformer有利于本发明捕获图像的全局特征，发明中包含的前向传播网络(LeFF)有利于本发明捕获图像的局部特征，生成对抗网络有利于更好的训练整个图像着色模型。该图像着色方法无论是在细节上还是整体上对灰色图像均有良好的着色效果，且着色方法适用于任何大小的灰色图像，既有较高的通用性。The beneficial effects of the present invention are: the present invention is a method for image coloring based on Transformer and generation confrontation network, which can color gray images. In the whole invention, Transformer is beneficial to the present invention to capture the global features of the image, and the invention includes The forward propagation network (LeFF) of the present invention is conducive to capturing the local features of the image, and the generative confrontation network is conducive to better training the entire image coloring model. The image coloring method has a good coloring effect on the gray image both in details and as a whole, and the coloring method is suitable for gray images of any size, which has high versatility.

附图说明Description of drawings

图1是本发明的图像着色模型的结构图；Fig. 1 is a structural diagram of the image coloring model of the present invention;

图2是彩色图像生成器G的结构图；Fig. 2 is the structural diagram of color image generator G;

图3(a)是本发明鉴别器D的结构图；Fig. 3 (a) is the structural diagram of discriminator D of the present invention;

图3(b)是本发明MWin-transformer的结构图；Fig. 3 (b) is the structural diagram of MWin-transformer of the present invention;

图3(c)是本发明局部增强前向传播网络LeFF的结构图。Fig. 3(c) is a structural diagram of the local enhanced forward propagation network LeFF of the present invention.

具体实施方式detailed description

本发明的基于Transformer与生成对抗网络的图像着色方法，该方法按照以下步骤实施：The image coloring method based on Transformer and generation confrontation network of the present invention, the method is implemented according to the following steps:

步骤1、构建基于生成对抗网络的图像着色模型，所述图像着色模型包括彩色图像生成器和鉴别器；所述彩色图像成器同于生成彩色图像，所述鉴别器用于判断输入的图像是真实彩色图像或伪彩色图像；Step 1, constructing an image coloring model based on generating a confrontation network, the image coloring model includes a color image generator and a discriminator; the color image generator is the same as generating a color image, and the discriminator is used to judge whether the input image is real Color images or pseudo-color images;

如图1所示，构建基于Transformer与生成对抗网络GAN的图像着色模型，G和D分别代表彩色图像生成器和鉴别器，具体来说，将灰色图像x∈R^3×H×W作为输入进彩色图像彩色图像生成器G，生成伪彩色图像G(x)，然后将伪彩色图像G(x)和真实的彩色图像y交替输入鉴别器。As shown in Figure 1, an image coloring model based on Transformer and GAN is constructed. G and D represent the color image generator and discriminator respectively. Specifically, the gray image x∈R ^3×H×W is used as input into Color image The color image generator G generates a pseudo-color image G(x), and then alternately inputs the pseudo-color image G(x) and the real color image y to the discriminator.

首先固定彩色图像生成器的参数，将所述伪彩色图像以及所述灰色图像对应的真实彩色图像依次交替输入鉴别器，然后根据损失函数计算所述灰色图像对应的真实彩色图像与标签值为1之间的损失，以及根据损失函数计算所述灰色图像生成的伪彩色图像与标签值为0之间的损失，最后利用反向传播算法，更新所述鉴别器的参数；其中标签值为1代表的是真实的图像，标签值为0代表的是生成的伪彩色图像。First, fix the parameters of the color image generator, alternately input the false color image and the real color image corresponding to the gray image into the discriminator, and then calculate the true color image corresponding to the gray image and the label value as 1 according to the loss function The loss between, and according to the loss function to calculate the loss between the pseudo-color image generated by the gray image and the label value 0, and finally use the backpropagation algorithm to update the parameters of the discriminator; where the label value 1 represents is the real image, and the label value of 0 represents the generated pseudo-color image.

固定鉴别器的参数，根据损失函数计算生成的伪彩色图像与标签值为1之间的损失，最后利用反向传播算法，更新所述彩色图像生成器的参数。不断循环步骤3.1和步骤3.2更新鉴别器和彩色图像生成器参数的过程，直至损失值收敛，彩色图像生成器生成效果不错的伪彩色图像，即获得了优化后图像着色模型。利用优化后的图像着色模型就直接对灰色图像进行着色。The parameters of the discriminator are fixed, and the loss between the generated pseudo-color image and the label value of 1 is calculated according to the loss function, and finally the parameters of the color image generator are updated using the back propagation algorithm. Continuously repeat steps 3.1 and 3.2 to update the discriminator and color image generator parameters until the loss value converges, and the color image generator generates a pseudo-color image with good effect, that is, the optimized image coloring model is obtained. The gray image is directly colored by using the optimized image coloring model.

该方法的设计主要包括两个要点，分别是彩色图像生成器和鉴别器的设计以及它们的组件的设计，下面逐一介绍它们的详细构造。The design of this method mainly includes two main points, namely the design of the color image generator and discriminator and the design of their components, and their detailed constructions are introduced one by one below.

①.彩色图像生成器G的设计：①. Design of color image generator G:

图像彩色化的输入和输出是一种映射关系，这些组件的深度转换过程应该是一种对称关系，基于这个想法，如图2所示，将整个彩色图像生成器设计为U形；总的来说，在编码器阶段，灰色图像x首先经过输入维度调整，这是一个卷积核3×3的卷积层，以及激活函数LeakyReLU用于调整输入维度和提取低级特征。然后，在设计的基于窗口transformer模块MWin-transformer之后，到达由步幅为2的卷积核4×4的卷积层组成的下采样层，这个步骤重复2次。然后，图像通过作为瓶颈级的MWin-transformer模块。解码器与编码器相互对应，它们的设计是完全对称一致的：首先，它通过窗口MWin-transformer模块并进行上采样，其中上采样操作是一个步长为2的卷积核为2×2的转置卷积。为了保持网络的对称性，这个步骤与编码器一样重复2次。最后，使用由3×3卷积组成的输出维度调整来调整输出维度，以确保输出是3通道有效彩色图像。The input and output of image colorization is a mapping relationship, and the depth conversion process of these components should be a symmetrical relationship. Based on this idea, as shown in Figure 2, the entire color image generator is designed as a U-shape; in general Say, in the encoder stage, the gray image x first undergoes input dimension adjustment, which is a convolutional layer with a convolution kernel of 3×3, and the activation function LeakyReLU is used to adjust the input dimension and extract low-level features. Then, after the designed window-based transformer module MWin-transformer, it reaches the downsampling layer consisting of a 4×4 convolutional layer with a stride of 2, and this step is repeated twice. Then, the image passes through the MWin-transformer module as the bottleneck level. The decoder and the encoder correspond to each other, and their design is completely symmetrical and consistent: first, it passes through the window MWin-transformer module and performs upsampling, where the upsampling operation is a convolution kernel with a stride of 2 and a 2×2 Transposed convolution. To maintain the symmetry of the network, this step is repeated 2 times as in the encoder. Finally, the output dimensionality is adjusted using an output dimensionality adjustment consisting of 3×3 convolutions to ensure that the output is a 3-channel effective color image.

②.MWin-transformer模块的设计②. MWin-transformer module design

在图3(b)中，构建的MWin-transformer模块，它由三个核心部分组成：W-MHSA机制、层归一化操作LN和局部增强前向传播网络LeFF网络；In Figure 3(b), the constructed MWin-transformer module consists of three core parts: W-MHSA mechanism, layer normalization operation LN and local enhanced forward propagation network LeFF network;

其中层归一化操作LN如下:The layer normalization operation LN is as follows:

LN层是图像着色模型快速训练和稳定收敛的重要保证，LN层的作用对象是

以及

分别代表每个样本的均值和方差，

和

为仿射学习参数，d_k是隐藏维度。MWin-transformer模块的计算流程如下所示：The LN layer is an important guarantee for the rapid training and stable convergence of the image coloring model. The role of the LN layer is

as well as

represent the mean and variance of each sample, respectively,

and

is the affine learning parameter, and d _k is the hidden dimension. The calculation process of the MWin-transformer module is as follows:

X′＝Embedded Tokens(X_in)X'＝Embedded Tokens(X _in )

X″＝W-MSA(LN(X))+X′X"=W-MSA(LN(X))+X'

X_out＝LeFF(LN(X″))+X^″ X _out =LeFF(LN(X″))+X ^″

嵌入令牌Embedding Tokens表示将X_in转换成向量；Embedding Tokens Embedding Tokens means converting X _in into a vector;

X′表示将X_in输入进嵌入令牌Embedding Tokens得到的向量输出；X' represents the vector output obtained by inputting X _in into Embedding Tokens;

继续将向量X″进行层归一化，将归一化后的LN(X″)输入进局部增强前向传播网络得到提取了更多局部特征信息的向量，再与X″相加得到汇聚了更多局部特征信息的向量X_out，X_out表示将X″输入进局部增强前向传播网络以及层归一化操作得到的输出。Continue to normalize the vector X″, and input the normalized LN(X″) into the local enhanced forward propagation network to obtain a vector that extracts more local feature information, and then add it to X″ to get a converged The vector X _out of more local feature information, X _out represents the output obtained by inputting X″ into the local enhancement forward propagation network and layer normalization operation.

③局部增强前向传播网络(LeFF)③Local Enhanced Forward Propagation Network (LeFF)

为了增强图像着色模型捕获局部特征的能力，我们将卷积被添加到前向传播网络，从而形成局部增强前向传播网络(LeFF)，具体设计如图3(c)所示：首先将输入序列，然后经过序列变成图像模块将序列变为图像，我们对图像使用卷积核为1*1的卷积，再经过激活函数激活，再经过卷积核为3*3的卷积和卷积核为1*1的卷积，再经过激活函数激活，最后将图像再变为序列便完成了局部增强前向传播网络。In order to enhance the ability of the image coloring model to capture local features, we add convolution to the forward propagation network to form a local enhanced forward propagation network (LeFF). The specific design is shown in Figure 3(c): first, the input sequence , and then through the sequence into an image module to turn the sequence into an image, we use a convolution with a convolution kernel of 1*1 for the image, then activate it through an activation function, and then pass through a convolution with a convolution kernel of 3*3 and convolution The kernel is a 1*1 convolution, and then activated by the activation function, and finally the image is converted into a sequence to complete the local enhanced forward propagation network.

④.鉴别器D的设计④.Design of discriminator D

鉴别器的本质是判断给定的数据是否“真实”，即判断是真实的训练数据还是彩色图像生成器G生成的假数据，如图3(a)所示，我们首先输入真实的彩色图像或者伪彩色图像，并通过线性展平将它们展开为小块(patch)，线性展平由卷积层组成的，然后再堆叠4个与G结构相同的MWin Transformer块，最后通过线性层输出真或者假从而实现判别的功能。The essence of the discriminator is to judge whether the given data is "real", that is, to judge whether it is the real training data or the fake data generated by the color image generator G. As shown in Figure 3(a), we first input the real color image or Pseudo-color images, and expand them into patches by linear flattening, linear flattening consists of convolutional layers, and then stack 4 MWin Transformer blocks with the same structure as G, and finally output true or False to achieve the function of discrimination.

实施例1Example 1

为了证明本发明图像着色模型的有效性，分别在动物脸图以及风景图上进行了实验，我们与其他当前较为流行的模型进行了比较：Yoo等人的研究(Yoo,S.,Bahng,H.,Chung,S.,Lee,J.,Chang,J.,Choo,J.:Coloring with limited data:Few-shotcolorization via memory augmented networks.In:Proceedings of the IEEE/CVFConference on Computer Vision and Pattern Recognition,pp.11283–11292(2019))，Su等人的研究(Su,J.-W.,Chu,H.-K.,Huang,J.-B.:Instance-aware imagecolorization.In:Proceedings of the IEEE/CVF Conference on Computer Vision andPattern Recognition,pp.7968–7977(2020))，我们分别在三个指标上进行了比较：Frechet Inception距离得分、峰值信噪比、结构相似性，在这两项数据集的测试上，这三项指标均有提升，最低分别提升0.003、0.263、0.014。我们发现越是复杂的场景，模型着色的细节越真实，我们假设这是因为transformer比一个CNN更能学习大量数据的分布规律。另外，我们观察到我们的模型的整体着色效果比其他方法更加平滑和均匀，并且没有过多的颜色突变，这证明了transformer可以更好地捕捉图像的全局信息。In order to prove the effectiveness of the image coloring model of the present invention, experiments were carried out on animal face images and landscape images, and we compared them with other currently popular models: Research by Yoo et al. (Yoo, S., Bahng, H .,Chung,S.,Lee,J.,Chang,J.,Choo,J.:Coloring with limited data:Few-shotcolorization via memory augmented networks.In:Proceedings of the IEEE/CVFConference on Computer Vision and Pattern Recognition, pp.11283–11292(2019)), research by Su et al. (Su, J.-W., Chu, H.-K., Huang, J.-B.:Instance-aware imagecolorization.In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.7968–7977(2020)), we compared three indicators: Frechet Inception distance score, peak signal-to-noise ratio, and structural similarity. In these two data In the test set, these three indicators have all improved, and the lowest increases are 0.003, 0.263, and 0.014 respectively. We found that the more complex the scene, the more realistic the details of the model coloring, we hypothesize that this is because the transformer is better than a CNN to learn the distribution of large amounts of data. In addition, we observe that the overall colorization effect of our model is smoother and more uniform than other methods, without excessive color mutations, which proves that the transformer can better capture the global information of the image.

Claims

1. Image rendering method based on Transformer and generative confrontation network, characterized in that it is implemented according to the following steps,

step 1, constructing an image coloring model based on a generation countermeasure network, wherein the image coloring model comprises a color image generator and a discriminator; the color image generator is used for generating a color image, and the discriminator is used for judging whether the input image is a real color image or a false color image;

step 2, inputting the gray image into a color image generator of the image coloring model to generate a pseudo color image;

and 3, respectively updating parameters of the discriminator and the color image generator:

step 3.1: firstly, fixing parameters of a color image generator, inputting the false color image and the real color image corresponding to the gray image into an identifier in sequence, then calculating the loss between the real color image corresponding to the gray image and a label value of 1 according to a loss function, calculating the loss between the false color image generated by the gray image and a label value of 0 according to the loss function, and finally updating the parameters of the identifier by using a back propagation algorithm; wherein a label value of 1 represents a real image and a label value of 0 represents a generated pseudo-color image;

step 3.2: fixing parameters of the discriminator, calculating the loss between the generated pseudo color image and the label value of 1 according to a loss function, and finally updating the parameters of the color image generator by utilizing a back propagation algorithm;

step 3.3: continuously circulating the process of updating parameters of the discriminator and the color image generator in the steps 3.1 and 3.2 until the loss value is converged, and generating a false color image with good effect by the color image generator, namely obtaining an optimized image coloring model;

and 4, directly coloring the gray image by using the optimized image coloring model.

2. The method for rendering images based on Transformer and generation countermeasure network as claimed in claim 1, wherein in step 1, the color image generator comprises multiple MWin-Transformer modules, and the function of the MWin-Transformer module is to extract and reconstruct the features of the images, and output a 3-channel effective color image: the Mwin-transformer module consists of three core parts: a window-based multi-head self-attention mechanism, a layer normalization operation LN, and a local enhanced forward propagation network LeFF.

3. The Transformer-based and confrontational network-generating image rendering method of claim 2, wherein the color image generator generates a pseudo-color image as follows:

X′＝Embedded Tokens(X _in )

X″＝W-MSA(LN(X))+X′

X _out ＝LeFF(LN(X″))+X″

wherein X _in Representing an input as a gray image or a false color image;

embedding Tokens denotes to assign X to _in Converting into a vector;

x' represents X _in Inputting the vector obtained by Embedding Tokens and outputting the vector;

then inputting a result LN (X ') obtained by carrying out layer normalization on the vector X' into a multi-head self-attention mechanism W-MSA based on a window to obtain a vector with extracted characteristic information, and adding the vector with X 'to obtain a vector X' with more collected characteristic information; x 'represents the input of X' into the output of the window-based multi-headed autofocusing mechanism and layer normalization operation;

continuously carrying out layer normalization on the vector X ', inputting the normalized LN (X ') into a local enhanced forward propagation network to obtain a vector with more extracted local characteristic information, and adding the vector with X ' to obtain a vector X with more gathered local characteristic information _out ，X _out Represents the input of X "into the locally enhanced forward propagation network LeFF and the resulting output of the layer normalization operation.

4. The transform-and-generative-confrontation-network-based image rendering method according to claim 3, wherein the layer normalization operation is calculated by:

wherein, the action object of LN layer is

X represents a vector, μ and δ represent the mean sum of each sampleThe variance of the measured values is calculated,

and

as affine learning parameters, d _k Is the dimension of the concealment that is to be hidden,

indicating that the number is a k-dimensional vector.

5. The transform-based and generative countermeasure network-based image rendering method of claim 3, wherein the window-based multi-headed self-attention mechanism is as follows:

the method comprises the steps of dividing a pseudo-color image into a plurality of windows, and then performing self-attention calculation in the different windows, wherein the patch number in one window is far less than all small blocks in one picture, and the number of the windows is kept unchanged, so that the calculation complexity of the multi-head self-attention mechanism based on the windows and the size of the image are changed into a linear relation from a square relation, and the calculation complexity of a model is greatly reduced.

6. The transform-and-generate countermeasure-network-based image shading method of claim 2, wherein a convolution is added to the forward propagation network in the Mwin-transform module, thereby forming a local enhanced forward propagation network LeFF.

7. The Transformer-based and generative countermeasure network-based image rendering method of claim 1, wherein the loss function is: