CN113870126A - Bayer image recovery method based on attention module - Google Patents


Info

Publication number
CN113870126A
CN113870126A
Authority
CN
China
Prior art keywords
image
channel
output
convolution
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111043024.1A
Other languages
Chinese (zh)
Other versions
CN113870126B (en)
Inventor
孙帮勇 (Sun Bangyong)
魏凌云 (Wei Lingyun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dianwei Culture Communication Co., Ltd.
Shenzhen Litong Information Technology Co., Ltd.
Original Assignee
Xi'an University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an University of Technology
Priority to CN202111043024.1A priority Critical patent/CN113870126B/en
Publication of CN113870126A publication Critical patent/CN113870126A/en
Application granted granted Critical
Publication of CN113870126B publication Critical patent/CN113870126B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a Bayer image recovery method based on an attention module, comprising the following steps: 1) constructing a green channel recovery network, whose input is the green sampling map separated from the Bayer image by channel splitting and whose output is a reconstructed green channel map; 2) constructing a feature guidance module, whose inputs are the reconstructed green channel map and the output of the encoder-decoder, and whose output is a guide map; 3) constructing a red channel recovery network, whose inputs are the red sampling map and the guide map, and whose output is a reconstructed red channel map; 4) constructing a blue channel recovery network; 5) fusing the reconstructed red, green, and blue channel maps to obtain a reconstructed RGB map; 6) calculating the mean absolute error between the reconstructed RGB map and the real image in the image pair, and optimizing the network model with the goal of minimizing the L1 loss function. The method of the invention can obtain high-quality reconstructed images.

Description

Bayer image recovery method based on attention module
Technical Field
The invention belongs to the technical field of image processing and deep learning, and relates to a Bayer image recovery method based on an attention module.
Background
A color image is generally represented by three color components, red (R), green (G), and blue (B), each of which is referred to as a color channel. Today, color images are most commonly recorded by RGB digital cameras, most of which adopt a single-sensor imaging structure. The image sensor in a single-sensor color digital camera is usually a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) chip, which senses only the intensity of light, not its color. A color filter array is therefore placed in front of the sensor so that only light of a certain wavelength passes through, and each pixel position of the sensor collects only one color during exposure imaging. The directly sampled image is called a Bayer image, and the process of reconstructing, at each pixel position, the two color components that were not directly sampled is called image demosaicing.
In current RGB digital cameras, the most common color filter array is the Bayer filter array, whose imaging area is composed of repeating 2 × 2 blocks, each containing two green (G), one red (R), and one blue (B) pixel. As a result, 2/3 of the color information is never sampled, and the 1/3 that is sampled is often contaminated by noise, which degrades the quality of the reconstructed image. Image demosaicing is the first step in the image processing pipeline and lays the foundation for subsequent image processing tasks, so reconstructing high-quality images is of great significance.
Current image demosaicing methods can be broadly divided into three categories: interpolation-based methods, sparse-representation-based methods, and deep-learning-based methods. Among interpolation-based methods, some algorithms ignore the correlation among channels, and even those that consider it often reconstruct edges and textured regions unsatisfactorily. Sparse-representation-based methods achieve high accuracy but also have high complexity, which limits their practical application. Deep-learning-based methods design a neural network that learns features in the raw image and the correlations between adjacent pixels of each channel, and have made progress in image reconstruction; however, some networks first pack the raw image into a half-size four-channel image, which reduces resolution, loses image detail, and discards the relative position information of the RGB samples, making the reconstructed image inaccurate, while other network models are complex and difficult to train.
Disclosure of Invention
The invention aims to provide a Bayer image recovery method based on an attention module, which addresses the problems of difficult network model training and low reconstruction accuracy in prior deep-learning-based demosaicing.
The technical scheme of the invention is a Bayer image recovery method based on an attention module, implemented according to the following steps:
step 1, constructing a green channel recovery network, whose input is the green sampling map separated from the Bayer image by channel splitting and whose output is a reconstructed green channel map;
step 2, constructing a feature guidance module, whose inputs are the reconstructed green channel map and the output of the encoder-decoder, and whose output is a guide map;
step 3, constructing a red channel recovery network, whose inputs are the red sampling map and the guide map, and whose output is a reconstructed red channel map;
step 4, constructing a blue channel recovery network, which has a structure similar to the red channel recovery network, differing in that its inputs are the blue sampling map and the guide map and its output is a reconstructed blue channel map;
step 5, fusing the reconstructed red, green, and blue channel maps to obtain a reconstructed RGB map;
step 6, calculating, under the established attention-module-based network model, the mean absolute error between the reconstructed RGB map and the real image in the image pair, and optimizing the network model with the goal of minimizing the L1 loss function.
The beneficial effects of the invention include the following aspects:
1) Most existing methods pack a Bayer image into a half-size four-channel RGGB image, which reduces resolution, loses image detail, and discards the relative position information of the RGB samples. The invention instead preprocesses the Bayer image by channel separation, preserving image resolution and detail as well as the relative positions of the RGB samples.
2) The invention reconstructs the images of the three Bayer channels separately. Unlike most methods, which apply the same network structure to all channels at all positions, it takes the characteristics of the Bayer lattice structure into account and applies a different network to each of R, G, and B.
3) The invention uses the attention-module principle to let each channel of the feature map learn a different kind of feature, enabling the network to autonomously select and attend to useful features while suppressing useless ones.
4) In the prior art, when a reconstructed green channel map is used as a guide map or as prior information for reconstructing the red and blue channels, mostly simple operations such as splicing, addition, and element-wise multiplication are adopted, which cannot effectively mine the features of the prior information. The invention provides a feature guidance module that effectively fuses the prior information of the green channel map by applying a non-homogeneous linear mapping model in the feature domain, thereby obtaining a high-quality reconstructed image.
Drawings
FIG. 1 is a general flow diagram of the method of the present invention;
FIG. 2 is a block flow diagram of a green channel recovery network constructed by the method of the present invention;
FIG. 3 is a flow diagram of an attention module constructed by the method of the present invention;
FIG. 4 is a block flow diagram of a feature guidance module constructed by the method of the present invention;
FIG. 5 is a block flow diagram of the red/blue channel recovery network constructed by the method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a Bayer image recovery method based on an attention module, whose overall idea is as follows. First, each real picture in the data set is preprocessed: it is sampled with a Bayer filter array to obtain a Bayer image, which forms an image pair with the real image so that the network can be optimized, and the Bayer image is then split by channel to obtain a red sampling map, a green sampling map, and a blue sampling map. Next, the green channel is restored: features of the green sampling map are extracted and learned through an attention module and an encoder-decoder, and a green channel map is reconstructed. The reconstructed green channel map is then used as prior information to guide feature learning of the red and blue channels, and the red and blue channel maps are reconstructed respectively. Finally, the three channel maps are combined to obtain the final reconstructed RGB map.
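For concreteness, the preprocessing above can be illustrated with a short PyTorch sketch, assuming an RGGB layout for the Bayer filter array. The helper names `bayer_sample` and `split_channels` are hypothetical, and the sketch produces single-channel sparse sample maps with zeros at unsampled positions; the patent states an H × W × 3 green input for step 1, so the actual green map may carry additional positional information not reproduced here.

```python
import torch

def bayer_sample(rgb: torch.Tensor) -> torch.Tensor:
    """Simulate Bayer sampling of a ground-truth RGB image (assumed RGGB layout).

    rgb: (3, H, W) tensor with channels ordered R, G, B.
    Returns an (H, W) mosaic holding one color sample per pixel.
    """
    _, h, w = rgb.shape
    mosaic = torch.zeros(h, w)
    mosaic[0::2, 0::2] = rgb[0, 0::2, 0::2]  # R at even rows, even columns
    mosaic[0::2, 1::2] = rgb[1, 0::2, 1::2]  # G at even rows, odd columns
    mosaic[1::2, 0::2] = rgb[1, 1::2, 0::2]  # G at odd rows, even columns
    mosaic[1::2, 1::2] = rgb[2, 1::2, 1::2]  # B at odd rows, odd columns
    return mosaic

def split_channels(mosaic: torch.Tensor):
    """Split the mosaic into full-resolution sparse R/G/B sample maps,
    keeping the resolution and the relative positions of the samples
    (unsampled positions stay zero)."""
    h, w = mosaic.shape
    r, g, b = (torch.zeros(1, h, w) for _ in range(3))
    r[0, 0::2, 0::2] = mosaic[0::2, 0::2]
    g[0, 0::2, 1::2] = mosaic[0::2, 1::2]
    g[0, 1::2, 0::2] = mosaic[1::2, 0::2]
    b[0, 1::2, 1::2] = mosaic[1::2, 1::2]
    return r, g, b
```

A training pair then consists of `split_channels(bayer_sample(rgb))` as the network input and `rgb` as the target.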
As shown in FIG. 1, the method comprises four parts: green channel recovery, red channel recovery, blue channel recovery, and merging into a reconstructed RGB map. Green channel recovery learns the features of different channels through an attention module and effectively exploits multi-scale features through an encoder-decoder to obtain a reconstructed green channel map. Red channel recovery makes full use of the prior information in the reconstructed green channel map to reconstruct the red channel map. Blue channel recovery follows the same process as red channel recovery. Finally, the reconstructed red, green, and blue channel maps are fused into a three-channel RGB image, yielding the final reconstructed RGB map.
Using the above principle and network framework, the method of the invention is implemented according to the following steps:
step 1, constructing a green channel recovery network, wherein the green channel recovery network is mainly used for reconstructing a green channel map.
The input of the green channel recovery network is the green sampling map separated from the Bayer image by channel splitting, of size H × W × 3, where H and W denote the height and width of the input image; the output is a reconstructed green channel map of size H × W × 1.
as shown in FIG. 2, the green channel recovery network includes a convolution operation, an attention module, and a coder-decoder, FIG. 2
Figure BDA0003250113540000051
Representing a downsampling operation using bilinear interpolation and 1 x 1 convolution,
Figure BDA0003250113540000052
to perform an upsampling operation using bilinear interpolation and 1 x 1 convolution,
Figure BDA0003250113540000053
indicating that the stitching operation is performed in dimension 1. The flow structure of the green channel recovery network is as follows in sequence: the green sample map (H × W × 3) is input → the first convolution layer Conv1 → the attention module → the second convolution layer Conv2 → the third convolution layer Conv3 → the fourth convolution layer Conv4 → the fifth convolution layer Conv5 → the sixth convolution layer Conv6 → the seventh convolution layer Conv7 → the eighth convolution layer Conv8 → output as the reconstructed green channel map (H × W × 1).
In this embodiment, the convolution kernel size of the first convolution layer Conv1 is 3 × 3, the step size is 3, and the total number of feature maps is 64; the second convolution layer Conv2 through the seventh convolution layer Conv7 constitute the encoder-decoder, with convolution kernel sizes of 3 × 3, step sizes of 3, a total number of feature maps of 64, all activated by ReLU; the output sizes of the second convolution layer Conv2, the third convolution layer Conv3, and the fourth convolution layer Conv4 are H × W, 1/2(H × W), and 1/4(H × W), respectively; the output sizes of the fifth convolution layer Conv5, the sixth convolution layer Conv6, and the seventh convolution layer Conv7 are 1/4(H × W), 1/2(H × W), and H × W, respectively; the eighth convolution layer Conv8 has a convolution kernel size of 1 × 1, a step size of 1, and a total number of feature maps of 1.
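The layer specification above can be gathered into a minimal PyTorch sketch. Two assumptions are worth flagging: the 3 × 3 layers use stride 1 with padding 1 (the listed step size of 3 is inconsistent with the stated H × W output sizes), and the down/up branches are paired U-Net-style via FIG. 2's dimension-1 concatenation. `GreenRecoveryNet` and `Resample` are illustrative names; the real attention block is sketched after the FIG. 3 description below and is stubbed here so the example runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

AttentionModule = nn.Identity  # placeholder; the full block is sketched after the FIG. 3 description

def conv3x3(in_ch, out_ch):
    # Stride 1 / padding 1 assumed (see the note in the lead-in paragraph).
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)

class Resample(nn.Module):
    """Bilinear interpolation followed by a 1 x 1 convolution, as in FIG. 2
    (scale=0.5 for downsampling, scale=2 for upsampling)."""
    def __init__(self, ch, scale):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return self.conv(x)

class GreenRecoveryNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = conv3x3(3, ch)        # the green sampling map is H x W x 3 per the patent
        self.attn = AttentionModule()
        self.conv2 = conv3x3(ch, ch)       # output H x W
        self.conv3 = conv3x3(ch, ch)       # output H/2 x W/2
        self.conv4 = conv3x3(ch, ch)       # output H/4 x W/4
        self.conv5 = conv3x3(ch, ch)       # output H/4 x W/4
        self.conv6 = conv3x3(2 * ch, ch)   # output H/2 x W/2, after skip concatenation
        self.conv7 = conv3x3(2 * ch, ch)   # output H x W, after skip concatenation
        self.conv8 = nn.Conv2d(ch, 1, kernel_size=1)
        self.down = Resample(ch, 0.5)
        self.up = Resample(ch, 2.0)

    def forward(self, g):
        x = self.attn(self.conv1(g))
        e1 = F.relu(self.conv2(x))                                    # H x W
        e2 = F.relu(self.conv3(self.down(e1)))                        # H/2 x W/2
        e3 = F.relu(self.conv4(self.down(e2)))                        # H/4 x W/4
        d1 = F.relu(self.conv5(e3))                                   # H/4 x W/4
        d2 = F.relu(self.conv6(torch.cat([self.up(d1), e2], dim=1)))  # splice in dimension 1
        d3 = F.relu(self.conv7(torch.cat([self.up(d2), e1], dim=1)))  # H x W
        return self.conv8(d3), d3  # reconstructed green map and the H x W x 64 decoder features
```

Returning the decoder features alongside the green map matches step 2, where the encoder-decoder output feeds the feature guidance module.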
The attention module enables each channel of the feature map to learn a different kind of feature, allowing the network to autonomously select and attend to useful features while suppressing useless ones.
As shown in FIG. 3, the layers of the attention module are connected by element-wise multiplication and element-wise addition operations. The flow structure of the attention module is, in order: input → first convolution layer Conv1 → second convolution layer Conv2 → global average pooling → third convolution layer Conv3 → fourth convolution layer Conv4 → the output of the second convolution layer Conv2 and the output of the fourth convolution layer Conv4 are multiplied element-wise → the initial input and the result of the multiplication are added element-wise. In this embodiment, the convolution kernel size of the first convolution layer Conv1 is 3 × 3, the step size is 3, the total number of feature maps is 64, and it is activated by PReLU; the convolution kernel size of the second convolution layer Conv2 is 3 × 3, the step size is 3, and the total number of feature maps is 64; the convolution kernel size of the third convolution layer Conv3 is 1 × 1, the step size is 1, the total number of feature maps is 64, and it is activated by PReLU; the convolution kernel size of the fourth convolution layer Conv4 is 1 × 1, the step size is 1, the total number of feature maps is 64, and the feature weights are obtained by a Sigmoid function.
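Read this way, FIG. 3 describes channel attention with a residual connection. A sketch under the same stride-1 assumption as above (the class name `AttentionModule` is illustrative):

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    """FIG. 3 flow: Conv1 (3x3, PReLU) -> Conv2 (3x3) -> global average pooling ->
    Conv3 (1x1, PReLU) -> Conv4 (1x1, Sigmoid); the Conv2 output is scaled by the
    resulting per-channel weights, and the initial input is added back."""

    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.PReLU())
        self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling to 1 x 1
        self.conv3 = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.PReLU())
        self.conv4 = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        f = self.conv2(self.conv1(x))
        w = self.conv4(self.conv3(self.pool(f)))  # per-channel weights in (0, 1)
        return x + f * w  # element-wise multiplication, then residual addition
```

The Sigmoid weights let the network emphasize informative feature channels and suppress useless ones, which is the stated purpose of the module.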
Step 2, constructing a feature guidance module, whose inputs are the reconstructed green channel map (of size H × W × 1) and the output of the encoder-decoder (of size H × W × 64), and whose output is a guide map of size H × W × 64. Most demosaicing methods fuse the prior information of the green channel map into the reconstruction of the red and blue channel maps with simple operations such as splicing, addition, and element-wise multiplication, which cannot mine the prior information effectively enough to obtain a high-quality image. The feature guidance module of this step therefore applies a non-homogeneous linear mapping model in the feature domain to effectively fuse the prior information of the green channel map.
As shown in FIG. 4, the flow structure of the feature guidance module is, in order: the output of the encoder-decoder passes through the 1st convolution layer Conv1, the reconstructed green channel map passes through the 2nd convolution layer Conv2, the output of Conv1 is multiplied element-wise by the output of Conv2, and the output of the encoder-decoder is added element-wise to the result of the multiplication to generate the guide map. In this embodiment, the convolution kernel size of the 1st convolution layer Conv1 is 1 × 1, the step size is 1, and the total number of feature maps is 64; the convolution kernel size of the 2nd convolution layer Conv2 is 1 × 1, the step size is 1, the total number of feature maps is 64, and it is activated by a Sigmoid function.
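One plausible reading of this flow, consistent with a non-homogeneous linear mapping y = a·x + b in the feature domain, is guide = F + Conv1(F) ⊙ Sigmoid(Conv2(G)), where F is the encoder-decoder output and G the reconstructed green channel map. A sketch of that reading (`FeatureGuidance` is an illustrative name):

```python
import torch.nn as nn

class FeatureGuidance(nn.Module):
    """Fuse green-channel prior information into the decoder features:
    guide = feat + Conv1(feat) * Sigmoid(Conv2(green))."""

    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=1)  # 1x1, 64 feature maps
        self.conv2 = nn.Sequential(nn.Conv2d(1, ch, kernel_size=1), nn.Sigmoid())

    def forward(self, feat, green):
        # feat: (N, 64, H, W) encoder-decoder output; green: (N, 1, H, W) reconstructed map
        return feat + self.conv1(feat) * self.conv2(green)
```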
Step 3, constructing a red channel recovery network, whose inputs are the red sampling map (of size H × W × 1) and the guide map (of size H × W × 64); the output is a reconstructed red channel map of size H × W × 1. The role of the red channel recovery network is to reconstruct the red channel map under the guidance of the guide map as prior information.
As shown in FIG. 5, the flow structure of the red channel recovery network is, in order: red sampling map → I-th convolution layer Conv1 → attention module → the output of the attention module and the guide map are spliced in dimension 1 → encoder-decoder → II-th convolution layer Conv2 → reconstructed red channel map. In this embodiment, the convolution kernel size of the I-th convolution layer Conv1 is 3 × 3, the step size is 3, and the total number of feature maps is 64; the convolution kernel size of the II-th convolution layer Conv2 is 1 × 1, the step size is 1, and the total number of feature maps is 1.
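A sketch of this flow is given below, reusing the `AttentionModule` sketched after the FIG. 3 description; the `EncoderDecoder` class is a hypothetical two-layer stand-in for the Conv2-Conv7 stack of FIG. 2, included only so the example is self-contained. The same module also serves the blue channel of step 4, with the blue sampling map as input.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Hypothetical stand-in for the Conv2-Conv7 encoder-decoder of FIG. 2,
    reduced to two 3x3 layers to keep the sketch short."""
    def __init__(self, in_ch, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU())

    def forward(self, x):
        return self.body(x)

class RedBlueRecoveryNet(nn.Module):
    """FIG. 5 flow: sampling map -> Conv1 (3x3) -> attention module ->
    concatenate with the guide map in dimension 1 -> encoder-decoder ->
    Conv2 (1x1) -> reconstructed channel map."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, 1, 1)
        self.attn = AttentionModule(ch)          # as sketched after the FIG. 3 description
        self.codec = EncoderDecoder(2 * ch, ch)  # takes the 128-channel concatenation
        self.conv2 = nn.Conv2d(ch, 1, kernel_size=1)

    def forward(self, sample, guide):
        x = self.attn(self.conv1(sample))
        x = torch.cat([x, guide], dim=1)  # splice with the H x W x 64 guide map in dimension 1
        return self.conv2(self.codec(x))
```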
Step 4, constructing a blue channel recovery network. As shown in FIG. 5, the blue channel recovery network has a structure similar to the red channel recovery network, differing in that its inputs are the blue sampling map (of size H × W × 1) and the guide map (of size H × W × 64), and its output is a reconstructed blue channel map of size H × W × 1.
Step 5, fusing the reconstructed red channel map (of size H × W × 1), green channel map (of size H × W × 1), and blue channel map (of size H × W × 1) to obtain a reconstructed RGB map of size H × W × 3.
Step 6, calculating, under the established attention-module-based network model, the mean absolute error between the reconstructed RGB map (of size H × W × 3) and the real image (Ground Truth) in the image pair, and optimizing the network model with the goal of minimizing the L1 loss function.
The L1 loss function measures the difference between the reconstructed RGB image and the corresponding real image (Ground Truth) and mainly serves to preserve the color and structure information of the image.
The expression of the L1 loss function is:

$$L_1 = \frac{1}{N}\sum_{i=1}^{N}\left\lVert X_i - \hat{X}_i \right\rVert_1$$

where $N$ is the number of images in each batch, $X_i$ is the reconstructed RGB map obtained in step 5, and $\hat{X}_i$ is the corresponding real image (i.e., Ground Truth).
Since the L1 loss function is a pixel-level loss, it does not account for perceptual image quality (such as texture): results often lack high-frequency detail, and the generated textures are too smooth to be satisfactory. This step therefore introduces a perceptual loss function so that the generated features are sufficiently similar to the features of the real image (i.e., Ground Truth), improving the perceptual quality of the final reconstructed RGB image. The perceptual loss function is expressed as:

$$L_p = \frac{1}{C_j H_j W_j}\left\lVert \psi_j(I_G) - \psi_j(I_R) \right\rVert_2^2$$

where $C_j$, $H_j$, and $W_j$ are the number of channels, height, and width of the features of layer $j$, $\psi_j(\cdot)$ is the feature map obtained from the $j$-th convolution layer of the pre-trained VGG19 model, $I_G$ is the real image, and $I_R$ is the reconstructed RGB image;
and (3) synthesizing the two loss functions, wherein the loss function expression of the whole Bayer image recovery model is as follows:
L=λ1L12Lp
wherein λ is1、λ2Is the tuning parameter between the L1 loss function and the perceptual loss function.
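A sketch of the combined objective follows. The VGG19 layer index and the λ weights below are illustrative choices rather than values given by the patent, and ImageNet normalization of the VGG inputs is omitted for brevity.

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class RecoveryLoss(nn.Module):
    """Total loss L = lambda1 * L1 + lambda2 * Lp, with Lp computed on features
    from one convolution layer of a pre-trained, frozen VGG19."""

    def __init__(self, lambda1=1.0, lambda2=0.1, vgg_layer=8):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        self.vgg = vgg[: vgg_layer + 1].eval()  # psi_j: layers up to the chosen index
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.lambda1, self.lambda2 = lambda1, lambda2

    def forward(self, recon, target):
        l1 = F.l1_loss(recon, target)  # mean absolute error between RGB map and Ground Truth
        # mse_loss averages over N * C_j * H_j * W_j, i.e. the batch mean of
        # (1 / (C_j * H_j * W_j)) * ||psi_j(I_G) - psi_j(I_R)||_2^2
        lp = F.mse_loss(self.vgg(recon), self.vgg(target))
        return self.lambda1 * l1 + self.lambda2 * lp
```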

Claims (6)

1. A Bayer image recovery method based on an attention module, characterized by comprising the following steps:
step 1, constructing a green channel recovery network, whose input is the green sampling map separated from the Bayer image by channel splitting and whose output is a reconstructed green channel map;
step 2, constructing a feature guidance module, whose inputs are the reconstructed green channel map and the output of the encoder-decoder, and whose output is a guide map;
step 3, constructing a red channel recovery network, whose inputs are the red sampling map and the guide map, and whose output is a reconstructed red channel map;
step 4, constructing a blue channel recovery network, which has a structure similar to the red channel recovery network, differing in that its inputs are the blue sampling map and the guide map and its output is a reconstructed blue channel map;
step 5, fusing the reconstructed red, green, and blue channel maps to obtain a reconstructed RGB map;
step 6, calculating, under the network model, the mean absolute error between the reconstructed RGB map and the real image in the image pair, and optimizing the network model with the goal of minimizing the L1 loss function.
2. The attention module-based Bayer image restoration method according to claim 1, characterized in that: the green channel recovery network comprises a convolution operation, an attention module, and an encoder-decoder;
the flow structure of the green channel recovery network is, in order: a green sampling map as input → the first convolution layer Conv1 → the attention module → the second convolution layer Conv2 → the third convolution layer Conv3 → the fourth convolution layer Conv4 → the fifth convolution layer Conv5 → the sixth convolution layer Conv6 → the seventh convolution layer Conv7 → the eighth convolution layer Conv8 → output as a reconstructed green channel map; wherein the convolution kernel size of the first convolution layer Conv1 is 3 × 3, the step size is 3, and the total number of feature maps is 64; the second convolution layer Conv2 through the seventh convolution layer Conv7 constitute the encoder-decoder, with convolution kernel sizes of 3 × 3, step sizes of 3, a total number of feature maps of 64, all activated by ReLU; the output sizes of the second convolution layer Conv2, the third convolution layer Conv3, and the fourth convolution layer Conv4 are H × W, 1/2(H × W), and 1/4(H × W), respectively; the output sizes of the fifth convolution layer Conv5, the sixth convolution layer Conv6, and the seventh convolution layer Conv7 are 1/4(H × W), 1/2(H × W), and H × W, respectively; the eighth convolution layer Conv8 has a convolution kernel size of 1 × 1, a step size of 1, and a total number of feature maps of 1.
3. The attention module-based Bayer image restoration method according to claim 2, characterized in that: the flow structure of the attention module is, in order: input → first convolution layer Conv1 → second convolution layer Conv2 → global average pooling → third convolution layer Conv3 → fourth convolution layer Conv4 → the output of the second convolution layer Conv2 and the output of the fourth convolution layer Conv4 are multiplied element-wise → the initial input and the result of the multiplication are added element-wise; the convolution kernel size of the first convolution layer Conv1 is 3 × 3, the step size is 3, the total number of feature maps is 64, and it is activated by PReLU; the convolution kernel size of the second convolution layer Conv2 is 3 × 3, the step size is 3, and the total number of feature maps is 64; the convolution kernel size of the third convolution layer Conv3 is 1 × 1, the step size is 1, the total number of feature maps is 64, and it is activated by PReLU; the convolution kernel size of the fourth convolution layer Conv4 is 1 × 1, the step size is 1, the total number of feature maps is 64, and the feature weights are obtained by a Sigmoid function.
4. The attention module-based Bayer image restoration method according to claim 1, characterized in that: the flow structure of the feature guidance module is, in order: the output of the encoder-decoder passes through the 1st convolution layer Conv1, the reconstructed green channel map passes through the 2nd convolution layer Conv2, the output of Conv1 is multiplied element-wise by the output of Conv2, and the output of the encoder-decoder is added element-wise to the result of the multiplication to generate the guide map; wherein the convolution kernel size of the 1st convolution layer Conv1 is 1 × 1, the step size is 1, and the total number of feature maps is 64; the convolution kernel size of the 2nd convolution layer Conv2 is 1 × 1, the step size is 1, the total number of feature maps is 64, and it is activated by a Sigmoid function.
5. The attention module-based Bayer image restoration method according to claim 1, characterized in that: the flow structure of the red channel recovery network is, in order: red sampling map → I-th convolution layer Conv1 → attention module → the output of the attention module and the guide map are spliced in dimension 1 → encoder-decoder → II-th convolution layer Conv2 → reconstructed red channel map; wherein the convolution kernel size of the I-th convolution layer Conv1 is 3 × 3, the step size is 3, and the total number of feature maps is 64; the convolution kernel size of the II-th convolution layer Conv2 is 1 × 1, the step size is 1, and the total number of feature maps is 1.
6. The attention module-based Bayer image restoration method according to claim 1, characterized in that: the expression of the L1 loss function is:

$$L_1 = \frac{1}{N}\sum_{i=1}^{N}\left\lVert X_i - \hat{X}_i \right\rVert_1$$

where $N$ is the number of images in each batch, $X_i$ is the reconstructed RGB map obtained in step 5, and $\hat{X}_i$ is the corresponding real image;

in addition, a perceptual loss function is introduced in this step, expressed as:

$$L_p = \frac{1}{C_j H_j W_j}\left\lVert \psi_j(I_G) - \psi_j(I_R) \right\rVert_2^2$$

where $C_j$, $H_j$, and $W_j$ are the number of channels, height, and width of the features of layer $j$, $\psi_j(\cdot)$ is the feature map obtained from the $j$-th convolution layer of the pre-trained VGG19 model, $I_G$ is the real image, and $I_R$ is the reconstructed RGB image;

combining the two loss functions, the loss function of the whole Bayer image recovery model is:

$$L = \lambda_1 L_1 + \lambda_2 L_p$$

where $\lambda_1$ and $\lambda_2$ are tuning parameters balancing the L1 loss function and the perceptual loss function.
CN202111043024.1A 2021-09-07 2021-09-07 Bayer image recovery method based on attention module Active CN113870126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111043024.1A CN113870126B (en) 2021-09-07 2021-09-07 Bayer image recovery method based on attention module


Publications (2)

Publication Number Publication Date
CN113870126A true CN113870126A (en) 2021-12-31
CN113870126B CN113870126B (en) 2024-04-19

Family

ID=78989865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111043024.1A Active CN113870126B (en) 2021-09-07 2021-09-07 Bayer image recovery method based on attention module

Country Status (1)

Country Link
CN (1) CN113870126B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020206630A1 (en) * 2019-04-10 2020-10-15 深圳市大疆创新科技有限公司 Neural network for image restoration, and training and use method therefor
CN110009590A (en) * 2019-04-12 2019-07-12 北京理工大学 A kind of high-quality colour image demosaicing methods based on convolutional neural networks
WO2021003594A1 (en) * 2019-07-05 2021-01-14 Baidu.Com Times Technology (Beijing) Co., Ltd. Systems and methods for multispectral image demosaicking using deep panchromatic image guided residual interpolation
CN111861902A (en) * 2020-06-10 2020-10-30 天津大学 Deep learning-based Raw domain video denoising method
CN111915531A (en) * 2020-08-06 2020-11-10 温州大学 Multi-level feature fusion and attention-guided neural network image defogging method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tang Man; Yang Bin: "Demosaicing algorithm based on fast residual interpolation and convolutional neural network", Journal of University of South China (Science and Technology), no. 06
Wang Dongsheng; Yang Bin: "Adaptive Bayer-pattern color image restoration algorithm based on local gradient consistency", Journal of University of South China (Science and Technology), no. 02
Dong Meng; Wu Ge; Cao Hongyu; Jing Wenbo; Yu Hongyang: "Video super-resolution reconstruction based on attention residual convolutional network", Journal of Changchun University of Science and Technology (Natural Science Edition), no. 01

Also Published As

Publication number Publication date
CN113870126B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
Zamir et al. Learning enriched features for fast image restoration and enhancement
Xu et al. Learning to restore low-light images via decomposition-and-enhancement
Khashabi et al. Joint demosaicing and denoising via learned nonparametric random fields
WO2021164234A1 (en) Image processing method and image processing device
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
Hu et al. Underwater image restoration based on convolutional neural network
Hu et al. Convolutional sparse coding for RGB+ NIR imaging
CN110070489A (en) Binocular image super-resolution method based on parallax attention mechanism
AU2020281143B1 (en) Creating super-resolution images
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
Niu et al. Low cost edge sensing for high quality demosaicking
CN116152120B (en) Low-light image enhancement method and device integrating high-low frequency characteristic information
CN112241939B (en) Multi-scale and non-local-based light rain removal method
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
Lu et al. Progressive joint low-light enhancement and noise removal for raw images
Karadeniz et al. Burst photography for learning to enhance extremely dark images
CN113822830A (en) Multi-exposure image fusion method based on depth perception enhancement
CN111833261A (en) Image super-resolution restoration method for generating countermeasure network based on attention
CN111654621B (en) Dual-focus camera continuous digital zooming method based on convolutional neural network model
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN116563183A (en) High dynamic range image reconstruction method and system based on single RAW image
Zhao et al. Deep pyramid generative adversarial network with local and nonlocal similarity features for natural motion image deblurring
CN117474781A (en) High spectrum and multispectral image fusion method based on attention mechanism
CN113793262A (en) Image demosaicing method based on residual error feature aggregation attention block
CN113870126A (en) Bayer image recovery method based on attention module

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240326

Address after: Art and Design Tribe 300-301, No. 3668 Nanhai Avenue, Nanshan District, Shenzhen City, Guangdong Province, 518051

Applicant after: Shenzhen Dianwei Culture Communication Co.,Ltd.

Country or region after: China

Address before: 509 Kangrui Times Square, Keyuan Business Building, 39 Huarong Road, Gaofeng Community, Dalang Street, Longhua District, Shenzhen, Guangdong Province, 518000

Applicant before: Shenzhen Litong Information Technology Co.,Ltd.

Country or region before: China

Effective date of registration: 20240326

Address after: 509 Kangrui Times Square, Keyuan Business Building, 39 Huarong Road, Gaofeng Community, Dalang Street, Longhua District, Shenzhen, Guangdong Province, 518000

Applicant after: Shenzhen Litong Information Technology Co.,Ltd.

Country or region after: China

Address before: 710048 Shaanxi province Xi'an Beilin District Jinhua Road No. 5

Applicant before: Xi'an University of Technology

Country or region before: China

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant