CN114067018B - Infrared image colorization method based on a dilated-residual generative adversarial network - Google Patents
- Publication number: CN114067018B (application CN202111376380.5A)
- Authority: CN (China)
- Prior art keywords: image, network, residual, discriminator, convolution block
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T11/001 — 2D image generation: texturing; colouring; generation of texture or colour
- G06N3/045 — Neural networks: architecture; combinations of networks
- G06N3/08 — Neural networks: learning methods
- Y02T10/40 — Engine management systems
Abstract
An infrared image colorization method based on a dilated-residual generative adversarial network belongs to the field of image colorization and aims to solve the color bleeding, detail loss, and low contrast of images produced by existing colorization methods. The method comprises the following steps: constructing a network model, in which the generative adversarial network comprises a generator — built from a shallow feature extraction module, a downsampling module of residual and dilated residual convolution blocks, and a pixel-shuffle upsampling module — and a discriminator with a small VGG-style structure and batch normalization removed; training the network adversarially on the visible-light dataset ImageNet, using the luminance channel L as input and the color image as label; minimizing the loss function values, with a chrominance loss and relativistic average adversarial losses, until generator and discriminator reach equilibrium; fine-tuning the model on an infrared image dataset; and saving the model by freezing the final parameters, so that when colorization is needed an infrared image loaded into the network directly yields the final colorized image. The method suppresses color bleeding, produces vivid, natural colors that match human visual observation, and is simple and efficient to implement.
Description
Technical Field
The invention relates to an infrared image colorization method based on a dilated-residual generative adversarial network, and belongs to the field of image colorization.
Background
Image colorization, also called image coloring, is the process of converting a grayscale image into a color image, and has wide application in fields such as remote sensing, medical imaging, night vision, film, and animation. Infrared images can be captured under extremely harsh conditions, so their use in daily life keeps growing; however, like a visible-light grayscale image, an infrared image is monochromatic. Compared with a color image it lacks color features, which hinders direct interpretation and recognition by the human eye and restricts the extraction of useful information from a scene. Colorizing such grayscale images therefore aids feature extraction and human recognition of the information in the image, making practical work more convenient. However, most existing image colorization methods are based on unsupervised neural networks, which suffer from three key problems: color bleeding, detail loss, and lack of contrast in the colorized image.
Chinese patent publication No. CN109712203B, entitled "A self-attention based image coloring method for generating an adversarial network," first trains a grayscale-image coloring model using a persistence loss function and a range loss function; next, a grayscale image to be colored is input; a feature-extraction-layer output is then obtained through convolution, spectral normalization, batch normalization, and activation-function operations; the output of the feature-extraction convolution module is then connected to a deconvolution stage by a feature fusion module; finally, the colored image is output through a self-attention network. The colored images obtained by this method have low contrast and stiff, unnatural colors that do not match human visual perception, and the implementation is complex and inefficient.
Disclosure of Invention
The invention provides an infrared image colorization method based on a dilated-residual generative adversarial network, aiming to solve the problems of color bleeding, detail loss, and low contrast in images produced by existing colorization methods. The method suppresses color bleeding well in the resulting colorized image; the image colors are vivid and natural and better match human visual observation; and the method is simple to implement with high colorization efficiency.
The technical scheme adopted by the invention is as follows:
An infrared image colorization method based on a dilated-residual generative adversarial network comprises the following steps:
Step 1, constructing a network model: the generative adversarial network comprises a generator and a discriminator;
the infrared image is input into the generator, where it first passes through the shallow feature extraction module to extract shallow image information, then through the downsampling module to extract deep semantic information, and finally through the upsampling module, which reconstructs the chrominance image of the infrared image;
the discriminator adopts a small VGG-style structure consisting of six convolution blocks with batch normalization removed and two fully connected layers; the chrominance image output by the generator is fused with the infrared image to form a generated color infrared image, which is input to the discriminator together with a visible-light color image from the dataset; the discriminator outputs a real/fake probability to judge whether the input image is real;
Step 2, training the network model: the infrared-image colorization model is trained adversarially, generator against discriminator, on the visible-light dataset ImageNet; each image in the dataset is resized so that the input size is fixed, the luminance channel L of the color image is separated out as the network input, and the color image itself serves as the label for supervised training;
Step 3, minimizing the loss function value: the generator uses a chrominance loss and an adversarial loss, and the discriminator uses an adversarial loss; the generator and discriminator are optimized alternately by minimizing their respective loss functions until the adversarial game between them reaches equilibrium, at which point the model parameters are considered trained and are saved;
Step 4, fine-tuning the network model: the model is trained and fine-tuned on an infrared image dataset to obtain stable and reliable model parameters and a better final colorization effect;
Step 5, saving the network model: the finally determined model parameters are frozen.
When an infrared image needs to be colorized, it is input directly into the network model obtained above, and the final colorized infrared image is produced.
The shallow feature extraction module consists of convolution blocks one and two and extracts primary structural information of the image. The downsampling module consists of residual convolution blocks one to three, dilated residual convolution blocks one to two, and convolution blocks three to six: residual convolution blocks one to three perform the downsampling; dilated residual convolution blocks one to two maintain the receptive field while keeping the image resolution unchanged and further extract high-level semantic features; and convolution blocks three to six eliminate the gridding effect introduced by the dilated residual convolutions.
The residual convolution block and the dilated residual convolution block have the same network structure: the feature map output by the preceding network layer is concatenated with the feature map output by the two-branch residual structure, which increases the effective use of feature-map information from different network layers. They differ in that the residual convolution block performs downsampling, so its skip connection contains a convolution to make the concatenation possible, whereas the dilated residual convolution block is an identity residual block, so concatenation can be performed directly.
The upsampling module consists of convolution blocks seven to nine and two pixel-shuffle operations: the feature map output by the downsampling module first has its channels expanded by convolution block seven, is upscaled two-fold by the first pixel shuffle, has its channels expanded again by convolution block eight, is upscaled four-fold by the second pixel shuffle, and finally the reconstructed chrominance image is output by convolution block nine.
The beneficial effects of the invention are as follows:
1. The method is based on deep learning and adopts a generative adversarial network structure. The game between generator and discriminator strengthens the extraction of deep image information, enhances the naturalness and realism of the colorized image, and dynamically improves its quality.
2. The invention improves the structure of the downsampling module by using a dilated residual convolution structure, which effectively reduces the number of network parameters, reduces network depth, and improves colorization efficiency. During feature extraction, dilated convolution maintains the receptive field while keeping the image resolution unchanged, improving the feature extraction and representation capacity of the network; the residual structure increases the effective use of different feature maps and strengthens the links between network layers, making the network more robust.
3. The invention improves the structure of the upsampling module, replacing conventional deconvolution with convolution blocks and pixel shuffle for image reconstruction and colorization, which mitigates the loss of detail and color information during reconstruction.
4. The invention removes the batch normalization layers from the discriminator: batch normalization damages the contrast information of the original image, causes unpredictable artifacts, and limits the generalization ability of the model, so removing it improves network performance, reduces computation, and increases training stability. The invention also improves the loss function, whose design helps the network learn sharper edges and finer textures, giving the output image higher realism.
Drawings
FIG. 1 is a flowchart of the infrared image colorization method based on a dilated-residual generative adversarial network according to the present invention.
Fig. 2 is the network structure diagram of the generator of the method.
Fig. 3 is the network structure diagram of the residual convolution block and the dilated residual convolution block in the generator network of the present invention.
Fig. 4 is the network structure diagram of the discriminator of the method.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, the infrared image colorization method based on a dilated-residual generative adversarial network specifically includes the following steps:
Step 1, constructing a network model.
The generative adversarial network comprises a generator and a discriminator: the generator produces a color infrared image from the input target infrared image, and the discriminator judges whether the generated color infrared image is sufficiently realistic.
As shown in fig. 2, the generator consists of three modules: a shallow feature extraction module, a downsampling module, and an upsampling module. The infrared image is input into the generator, where it first passes through the shallow feature extraction module to extract shallow image information, then through the downsampling module to extract deep semantic information, and finally through the upsampling module, which reconstructs the chrominance image of the infrared image.
The shallow feature extraction module consists of convolution blocks one and two and extracts primary structural information of the image. The downsampling module consists of residual convolution blocks one to three, dilated residual convolution blocks one to two, and convolution blocks three to six: residual convolution blocks one to three perform the downsampling; dilated residual convolution blocks one to two maintain the receptive field while keeping the image resolution unchanged and further extract high-level semantic features; and convolution blocks three to six eliminate the gridding effect introduced by the dilated residual convolutions.
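The receptive-field behavior described above can be illustrated with a small calculation. The sketch below is not part of the patent — the layer configurations are assumed for demonstration — but it shows why a dilated 3×3 convolution enlarges the receptive field exactly as a larger kernel would, without reducing resolution:

```python
def receptive_field(layers):
    """Receptive field of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples.
    A dilated convolution acts like a kernel of effective size
    d*(k-1)+1, so stacking growing dilation rates widens the
    receptive field while every layer keeps stride 1.
    """
    rf, jump = 1, 1  # receptive field and cumulative stride
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1
        rf += (k_eff - 1) * jump
        jump *= s
    return rf

# Three plain 3x3 stride-1 convs vs. dilation rates 1, 2, 4:
plain = receptive_field([(3, 1, 1)] * 3)                      # 7
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])  # 15
```

With the same depth and no loss of resolution, the dilated stack more than doubles the receptive field, which is the trade the downsampling module exploits.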
As shown in fig. 3, the residual convolution block and the dilated residual convolution block have the same network structure: the feature map output by the preceding network layer is concatenated with the feature map output by the two-branch residual structure, which increases the effective use of feature-map information from different network layers. They differ in that the residual convolution block performs downsampling, so its skip connection contains a convolution to make the concatenation possible, whereas the dilated residual convolution block is an identity residual block, so concatenation can be performed directly.
The upsampling module consists of convolution blocks seven to nine and two pixel-shuffle operations: the feature map output by the downsampling module first has its channels expanded by convolution block seven, is upscaled two-fold by the first pixel shuffle, has its channels expanded again by convolution block eight, is upscaled four-fold by the second pixel shuffle, and finally the reconstructed chrominance image is output by convolution block nine.
As shown in fig. 4, the discriminator adopts a small VGG-style structure, consisting of six convolution blocks with batch normalization removed and two fully connected layers. The chrominance image output by the generator is fused with the infrared image to form a generated color infrared image, which is input to the discriminator together with a visible-light color image from the dataset; the discriminator outputs a real/fake probability to judge whether the input image is real.
Step 2, training the network model.
The infrared-image colorization model is trained adversarially, generator against discriminator, on the visible-light dataset ImageNet. Each image in the dataset is resized so that the input size is fixed, the luminance channel L of the color image is separated out as the network input, and the color image itself serves as the label for supervised training.
Step 3, minimizing the loss function value.
The generator uses the chrominance loss and the adversarial loss of the relativistic average discriminator; the discriminator uses the adversarial loss of the relativistic average discriminator. The two are optimized alternately by minimizing their loss functions until the adversarial game reaches equilibrium, at which point the model parameters are considered trained and are saved. The generator computes the chrominance loss with the L1 norm; minimizing the chrominance loss between the reconstructed colorized image and the original color image in the dataset avoids abrupt jumps in the chrominance result. The adversarial loss of the relativistic average discriminator, when minimized, improves the naturalness and realism of the colorization. The discriminator likewise uses the relativistic average adversarial loss; minimizing it helps accelerate the colorization process and improve the colorization effect.
Step 4, fine-tuning the network model. The model is trained and fine-tuned on the infrared image dataset FLIR to obtain stable and reliable model parameters and a better final colorization effect.
Step 5, saving the network model. The finally determined model parameters are frozen.
When an infrared image needs to be colorized, it is input directly into the network model obtained above, and the final colorized infrared image is produced.
Examples:
An infrared image colorization method based on a dilated-residual generative adversarial network specifically comprises the following steps:
Step 1, constructing a network model.
As shown in fig. 2, each convolution block in the generator consists of a convolution layer, a batch normalization layer, and an activation function; the convolution kernel size is 3×3, with stride 1 and padding 1. As shown in fig. 3, in residual convolution blocks one to three, the first convolution layer has stride 2 to perform downsampling and the second has stride 1 to extract features; all kernels are 3×3. Conventional downsampling changes the image resolution to extract deeper information, which hinders image reconstruction, so dilated residual convolution blocks one and two — with dilation rates 2 and 4 respectively — replace it, further extracting deep information such as details and semantics while keeping the resolution at 28×28. Because dilated convolution can cause a gridding effect, convolution blocks three and four (dilation rate 2) and convolution blocks five and six (dilation rate 1) are appended, and the gridding artifacts are eliminated through network learning. Convolution blocks seven and eight expand the feature-map channels; upsampling is performed by two pixel-shuffle operations with scales 2 and 4 respectively, restoring the original size, detail, and color of the infrared image; convolution block nine outputs the chrominance image.
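The pixel-shuffle (sub-pixel) reconstruction used here in place of deconvolution can be sketched in NumPy. This is an illustrative implementation, not code from the patent, and the channel counts in the example are assumed:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Each group of r*r channels supplies the r x r sub-pixel grid of
    one output channel, trading channel depth for spatial resolution.
    """
    c, h, w = x.shape
    assert c % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c // (r * r)
    x = x.reshape(c_out, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (c_out, h, r, w, r)
    return x.reshape(c_out, h * r, w * r)

# 16 channels at 28x28 -> 4 channels at 56x56 (scale 2), matching the
# first two-fold reconstruction step described above (sizes assumed):
feat = np.random.rand(16, 28, 28)
out = pixel_shuffle(feat, 2)         # shape (4, 56, 56)
```

Because the operation only rearranges values, it introduces none of the checkerboard artifacts that strided deconvolution can produce.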
Overall, the colorization pipeline inputs an infrared image, extracts features through three stride-2 downsampling operations, reconstructs the chrominance image through two pixel-shuffle operations, and finally fuses the output chrominance image with the infrared image into a color infrared image that is fed to the discriminator.
As shown in fig. 4, the feature-mining part of the discriminator consists of repeated convolution layers and activation functions, followed by the fully connected layers and an activation function that output the real/fake probability of the image. The invention removes the batch normalization layers from the discriminator, since they destroy the original contrast information of the image, cause unpredictable artifacts, and limit the generalization ability of the model; removing them improves network performance, reduces computation, and increases training stability. All convolution kernels in the discriminator are 3×3.
To ensure network robustness, retain more structural information, and fully extract image features, the invention uses two activation functions: Sigmoid and LeakyReLU. The final activation of the discriminator and the activation in convolution block nine use Sigmoid; all other activations are LeakyReLU. They are defined as:

Sigmoid(x) = 1 / (1 + e^(−x))

LeakyReLU(x) = x for x ≥ 0, and αx for x < 0, where α is a small positive slope.
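The two activation functions can be written out directly; the slope α = 0.2 below is an assumed value, since the patent does not state it:

```python
import math

def sigmoid(x):
    """Sigmoid activation: maps any real input into (0, 1),
    which is why it produces the discriminator's probability output."""
    return 1.0 / (1.0 + math.exp(-x))

def leaky_relu(x, alpha=0.2):
    """LeakyReLU: identity for x >= 0, small slope alpha otherwise,
    so negative inputs keep a nonzero gradient (alpha is assumed)."""
    return x if x >= 0 else alpha * x
```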
Step 2, training the network model.
The infrared-image colorization model is trained adversarially, generator against discriminator, on the visible-light dataset ImageNet. ImageNet is a huge dataset containing 1000 scene classes and 1.2 million pictures; we randomly select several pictures of each class to assemble a 24000-image ImageNet subset and resize these images to 224×224 as input to the whole network. Adversarial pre-training on this visible-light dataset determines a set of initialization parameters that accelerates the subsequent network training process.
Step 3, minimizing the loss function value.
The generator and discriminator are alternately optimized by minimizing their loss functions: the generator parameters are fixed while the discriminator parameters are updated to minimize the discriminator loss, and the discriminator parameters are fixed while the generator loss is minimized, yielding the optimal network model. The generator computes the chrominance loss with the L1 norm and constrains color realism with the adversarial loss of the relativistic average discriminator. The L1 chrominance loss learns the two chrominance values (a, b) for each pixel and is computed as (formula reconstructed from the surrounding definitions):

L_L1 = E_i[ ‖G(x_i) − y_i‖_1 ]

It evaluates the content loss as the L1 distance between the generated image G(x_i) and the ground-truth image y_i, which avoids over-smoothing and abrupt jumps in the colorization result.
The adversarial loss of the generator is computed as (formula reconstructed in the standard relativistic average form implied by the surrounding definitions):

L_adv^G = −E_{x_r}[ log(1 − D_Ra(x_r, x_f)) ] − E_{x_f}[ log(D_Ra(x_f, x_r)) ]

where D_Ra(x_a, x_b) = σ(C(x_a) − E_{x_b}[C(x_b)]) denotes the relativistic average discriminator, E_{x_f}[·] denotes averaging over all fake data in a mini-batch, σ is the Sigmoid function, C(x) is the untransformed discriminator output, x_f = G(x_i), x_i is the input grayscale image, x_r is a real color image, and x_f is the fusion of the generated image — i.e., the chrominance image — with the infrared image.
Thus the total generator loss L_G is:

L_G = L_adv^G + γ · L_L1

where γ is a coefficient that balances the different loss terms, controlling the weight of each.
The discriminator loss function is defined as (reconstructed in the same relativistic average form):

L_D = −E_{x_r}[ log(D_Ra(x_r, x_f)) ] − E_{x_f}[ log(1 − D_Ra(x_f, x_r)) ]
by optimizing the generator and discriminator loss functions, the network is helped to learn clearer edges and more detailed textures, so that the color of the coloring image is natural, the reality is higher, and the visual effect is better.
The number of training epochs is set to 100, the batch size to 10, the loss-function weight γ to 100, and the initial learning rate to 0.0003. The network parameter optimizer is the adaptive moment estimation (Adam) algorithm, whose advantage is that after bias correction each iteration's learning rate lies within a definite range, keeping the parameter updates stable. When the discriminator's ability to detect fake images is balanced against the generator's ability to produce images that fool the discriminator, the network is considered essentially trained.
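The adaptive moment estimation update with bias correction can be sketched as a minimal single-parameter step; this is an illustration using the stated initial learning rate 0.0003, not the patent's implementation:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=3e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration index.

    Bias correction rescales the moment estimates so that the
    effective step size stays in a stable range from the very
    first iteration, regardless of the gradient's raw scale.
    """
    m = b1 * m + (1 - b1) * grad        # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that the first corrected step moves the parameter by roughly lr whether the gradient is 10 or 0.001 — the scale invariance that keeps training stable.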
Step 4, fine-tuning the network model. The model is trained and fine-tuned on the infrared image dataset FLIR to obtain stable and reliable model parameters and a better final colorization effect.
Step 5, saving the network model. After training finishes, the finally determined model parameters are frozen.
When an infrared image needs to be colorized, it is input directly into the network model obtained above, and the final colorized infrared image is produced. The network model fixes the input image size to 224×224.
Convolution, skip connections, concatenation, residual structures, dilated convolution, and pixel shuffle are algorithms well known to those skilled in the art; their specific procedures can be found in the corresponding textbooks and technical literature.
By constructing an infrared-image colorization network model based on a dilated-residual generative adversarial network, the invention improves feature utilization and colorizes infrared images automatically, avoiding the need for manual assisted coloring. Subjectively, the colorized images have vivid colors, and the problems of color bleeding, detail loss, and lack of contrast are well mitigated; objectively, computing the relevant metrics of the images obtained by the existing method further verifies the feasibility and superiority of the proposed method. The metric comparison between the prior art and the proposed method is shown in Table 1:
TABLE 1. Metric comparison between the prior art and the method proposed by the present invention
As the table shows, the proposed method achieves higher values on both metrics, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), further indicating better colorized-image quality.
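The peak signal-to-noise ratio used as one of the objective metrics can be computed with its standard definition; this sketch is illustrative and not code from the patent:

```python
import numpy as np

def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio between two images, in dB.

    PSNR = 10 * log10(max_val^2 / MSE); higher is better, and
    identical images give infinity.
    """
    diff = img1.astype(np.float64) - img2.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```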
Claims (4)
1. An infrared image colorization method for generating an countermeasure network based on an expansion residual error is characterized by comprising the following steps:
step 1, constructing a network model: the whole generated countermeasure network comprises a generator and a discriminator;
the infrared image is input into the generator, the infrared image firstly extracts image shallow information through the shallow feature extraction module, then extracts image deep semantic information through the downsampling module, and finally reconstructs an infrared image chromaticity image through the upsampling module;
the discriminator adopts a small VGG network structure and consists of six convolution blocks with normalization removed and two full connection layers; the chroma image output by the generator is fused with the infrared image to form a generated color infrared image, the generated color infrared image and a visible light color image in the data set are input into a discriminator, and the discriminator outputs true and false probability information to judge whether the input image is true or not;
step 2, training the network model: the infrared image coloring model is trained by adversarial training of the generator and the discriminator on the visible-light dataset ImageNet; each image in the dataset is resized so that the input size is fixed, the luminance channel L of the color image is separated and used as the input of the whole network, and the color image serves as the label for supervised training;
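The luminance/label separation in this step can be sketched in plain NumPy. The patent does not specify the color space, so taking L to be CIE L* computed from sRGB is an assumption; the constants below are the standard sRGB/D65 values.

```python
import numpy as np

def luminance_channel(rgb):
    """Extract a CIE L* (lightness) channel from an sRGB image in [0, 1].

    Sketch of the training-data preparation: L* becomes the network input,
    while the original color image serves as the supervision label.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y under D65.
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIE L* from Y (white point Yn = 1 for normalized input).
    delta = 6 / 29
    f = np.where(y > delta ** 3, np.cbrt(y), y / (3 * delta ** 2) + 4 / 29)
    return 116 * f - 16  # L* in [0, 100]
```

For example, a pure white image maps to L* = 100 and pure black to L* = 0.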
step 3, minimizing the loss function values: the generator uses a chrominance loss and an adversarial loss, and the discriminator uses an adversarial loss; the generator and the discriminator are optimized alternately by minimizing their respective loss functions until the adversarial game between them reaches an equilibrium state, at which point training of the model parameters is considered complete and the parameters are saved;
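A minimal sketch of such a loss pairing in PyTorch. The patent names a "chrominance loss" without defining it; using L1 distance on the chroma channels and a weighting factor of 100 are assumptions borrowed from common image-to-image GAN practice, and the binary cross-entropy adversarial term is the standard GAN formulation rather than a detail confirmed by the source.

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_logits, fake_chroma, real_chroma, lam=100.0):
    """Adversarial loss plus an (assumed L1) chrominance loss, weighted by lam."""
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))  # try to fool the discriminator
    chroma = F.l1_loss(fake_chroma, real_chroma)    # match the label's chroma channels
    return adv + lam * chroma

def discriminator_loss(real_logits, fake_logits):
    """Standard adversarial loss: push real images toward 1, generated toward 0."""
    real = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))
    fake = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    return real + fake
```

Alternate optimization then means one discriminator step on `discriminator_loss` followed by one generator step on `generator_loss`, repeated until neither side improves against the other.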
step 4, fine-tuning the network model: the model is trained and fine-tuned with an infrared image dataset to obtain stable and reliable model parameters and, finally, a better coloring effect;
step 5, saving the network model: the finally determined model parameters are frozen and persisted.
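In PyTorch terms, "solidifying" the final parameters can be sketched as freezing the gradients and serializing the state dict; the file name and the tiny stand-in model below are placeholders, not from the patent.

```python
import torch
import torch.nn as nn

# Stand-in for the trained colorization model.
model = nn.Conv2d(1, 2, 3, padding=1)

# Freeze ("solidify") the finally determined parameters.
for p in model.parameters():
    p.requires_grad = False

# Persist the parameters to disk.
torch.save(model.state_dict(), "colorizer.pth")

# Later: rebuild the architecture and restore the frozen parameters.
restored = nn.Conv2d(1, 2, 3, padding=1)
restored.load_state_dict(torch.load("colorizer.pth"))
```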
2. The infrared image colorization method based on a dilated-residual generative adversarial network according to claim 1, wherein the shallow feature extraction module consists of convolution blocks one and two and extracts primary structural information of the image; the downsampling module consists of residual convolution blocks one to three, dilated residual convolution blocks one and two, and convolution blocks three to six; residual convolution blocks one to three perform the downsampling operations; dilated residual convolution blocks one and two enlarge the receptive field while keeping the image resolution unchanged, further extracting high-level semantic features of the image; and convolution blocks three to six eliminate the gridding effect caused by the dilated convolutions.
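The resolution-preserving property the claim relies on can be demonstrated directly: a 3×3 convolution with dilation d and padding d keeps the spatial size while its effective kernel extent grows to 2d + 1. The specific dilation rates below are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)
for d in (1, 2, 4):
    # kernel 3, dilation d, padding d: output spatial size equals input size,
    # while the effective receptive field of one kernel grows to (2d + 1).
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=d, padding=d)
    assert conv(x).shape == x.shape
```

This is why the claim can stack dilated residual blocks to gather wider context without further downsampling; the trailing plain convolutions (blocks three to six) then smooth out the gridding artifacts that sparse dilated sampling introduces.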
3. The infrared image colorization method based on a dilated-residual generative adversarial network according to claim 2, wherein the residual convolution blocks and the dilated residual convolution blocks have the same network structure: the feature map output by the previous network layer is concatenated with the feature map obtained by passing it through two residual structures, which increases the effective use of feature-map information from different network layers; the difference between the two is that the residual convolution block performs a downsampling operation, so a convolution operation is placed in the skip connection to make the concatenation possible, whereas the dilated residual convolution block is an identity residual block, so the concatenation can be performed directly.
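A hedged PyTorch sketch of the structure this claim describes: the same block serves both roles, with a strided convolution inserted into the skip path only when downsampling changes the shape. Channel counts, the two-conv body, and the 1×1 fuse layer after concatenation are assumptions.

```python
import torch
import torch.nn as nn

class ConcatResidualBlock(nn.Module):
    """Concatenate the input feature map with the output of two conv layers.

    downsample=True mirrors the claimed residual convolution block: a strided
    convolution sits in the skip connection so shapes match for concatenation.
    downsample=False mirrors the identity (dilated) variant, which can
    concatenate the input directly.
    """
    def __init__(self, channels, downsample=False, dilation=1):
        super().__init__()
        stride = 2 if downsample else 1
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=stride,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
        )
        # Skip path: a convolution only when downsampling changes the shape.
        self.skip = (nn.Conv2d(channels, channels, 1, stride=2)
                     if downsample else nn.Identity())
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # merge after concat

    def forward(self, x):
        out = torch.cat([self.skip(x), self.body(x)], dim=1)
        return self.fuse(out)
```

Concatenation (rather than addition) is what "increases the effective use of feature-map information from different network layers": both the raw and the transformed features survive into the next layer.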
4. The infrared image colorization method based on a dilated-residual generative adversarial network according to claim 1, wherein the upsampling module consists of convolution blocks seven to nine and two pixel-recombination (pixel-shuffle) operations: the feature map output by the downsampling module has its channels expanded by convolution block seven, the first pixel recombination reconstructs the feature map at twice the resolution, convolution block eight further expands the feature-map channels, the second pixel recombination reconstructs the feature map at four times the original resolution, and finally convolution block nine outputs the reconstructed chrominance image.
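A sketch of this upsampling path in PyTorch: each convolution expands channels by r², then `PixelShuffle` rearranges those channels into an r-times larger feature map. Reading the claim as two 2× shuffles (4× total), the channel counts, and the two-channel chrominance output are assumptions.

```python
import torch
import torch.nn as nn

r = 2  # per-shuffle upscale factor (assumed)
up = nn.Sequential(
    nn.Conv2d(256, 128 * r * r, 3, padding=1),  # conv block seven: expand channels
    nn.PixelShuffle(r),                         # first pixel recombination: 2x
    nn.Conv2d(128, 64 * r * r, 3, padding=1),   # conv block eight: expand again
    nn.PixelShuffle(r),                         # second pixel recombination: 4x total
    nn.Conv2d(64, 2, 3, padding=1),             # conv block nine: chrominance image
)

x = torch.randn(1, 256, 64, 64)   # feature map from the downsampling module
y = up(x)                         # (1, 2, 256, 256): 4x larger, 2 chroma channels
```

Pixel shuffle trades channels for resolution without the checkerboard artifacts that transposed convolutions can introduce, which fits the module's role of reconstructing a clean chrominance image.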
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111376380.5A CN114067018B (en) | 2021-11-19 | 2021-11-19 | Infrared image colorization method for generating countermeasure network based on expansion residual error |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114067018A CN114067018A (en) | 2022-02-18 |
CN114067018B true CN114067018B (en) | 2024-04-09 |
Family
ID=80278513
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114067018B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114581560B (en) * | 2022-03-01 | 2024-04-16 | 西安交通大学 | Multi-scale neural network infrared image colorization method based on attention mechanism |
CN116503502A (en) * | 2023-04-28 | 2023-07-28 | 长春理工大学重庆研究院 | Unpaired infrared image colorization method based on contrast learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110136063A (en) * | 2019-05-13 | 2019-08-16 | 南京信息工程大学 | A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition |
CN110969634A (en) * | 2019-11-29 | 2020-04-07 | 国网湖北省电力有限公司检修公司 | Infrared image power equipment segmentation method based on generation countermeasure network |
US10713821B1 (en) * | 2019-06-27 | 2020-07-14 | Amazon Technologies, Inc. | Context aware text-to-image synthesis |
AU2020103905A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning |
CN112837224A (en) * | 2021-03-30 | 2021-05-25 | 哈尔滨理工大学 | Super-resolution image reconstruction method based on convolutional neural network |
Non-Patent Citations (1)
Title |
---|
Improved Multi-Scale Retinex Infrared Image Enhancement; Wei Ranran et al.; Chinese Journal of Liquid Crystals and Displays; 2021-03-15; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110570353B (en) | Super-resolution reconstruction method for generating single image of countermeasure network by dense connection | |
CN111784602B (en) | Method for generating countermeasure network for image restoration | |
CN107123089B (en) | Remote sensing image super-resolution reconstruction method and system based on depth convolution network | |
CN108830796B (en) | Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss | |
CN111127336B (en) | Image signal processing method based on self-adaptive selection module | |
CN114067018B (en) | Infrared image colorization method for generating countermeasure network based on expansion residual error | |
CN109410239A (en) | A kind of text image super resolution ratio reconstruction method generating confrontation network based on condition | |
CN111275637A (en) | Non-uniform motion blurred image self-adaptive restoration method based on attention model | |
CN111798400A (en) | Non-reference low-illumination image enhancement method and system based on generation countermeasure network | |
CN110210608B (en) | Low-illumination image enhancement method based on attention mechanism and multi-level feature fusion | |
CN110930308B (en) | Structure searching method of image super-resolution generation network | |
CN112508812A (en) | Image color cast correction method, model training method, device and equipment | |
CN113870124B (en) | Weak supervision-based double-network mutual excitation learning shadow removing method | |
CN115223004A (en) | Method for generating confrontation network image enhancement based on improved multi-scale fusion | |
CN115641391A (en) | Infrared image colorizing method based on dense residual error and double-flow attention | |
CN113298744B (en) | End-to-end infrared and visible light image fusion method | |
CN116823647A (en) | Image complement method based on fast Fourier transform and selective attention mechanism | |
CN110930343A (en) | SR-MDCNN-based remote sensing image fusion method | |
CN115689918A (en) | Parallel single image rain removing method based on residual error prior attention mechanism | |
CN115456910A (en) | Color recovery method for serious color distortion underwater image | |
CN114881879A (en) | Underwater image enhancement method based on brightness compensation residual error network | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
CN115035170A (en) | Image restoration method based on global texture and structure | |
CN113935916A (en) | End-to-end underwater image restoration method based on ambient light perception | |
CN114897718B (en) | Low-light image enhancement method capable of balancing context information and space detail simultaneously |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||