CN111696036B - Residual error neural network based on cavity convolution and two-stage image demosaicing method - Google Patents
- Publication number
- CN111696036B (application CN202010447460A)
- Authority
- CN
- China
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4015—Image demosaicing, e.g. colour filter arrays [CFA] or Bayer patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
Abstract
The invention belongs to the field of digital image processing and particularly relates to a residual neural network based on cavity (i.e., dilated) convolution and a two-stage image demosaicing method. Built on a residual neural network, the invention introduces a shallow feature extraction unit, local residual units and a deep feature extraction unit; the three basic units interact to greatly enhance the learning and modeling capability of the target neural network, so that an accurate mapping from a mosaic image to an RGB color image can be established for the image demosaicing problem, and a mosaic image in the Bayer CFA pattern can finally be processed through the established mapping to obtain an RGB color image. A two-stage image demosaicing model is also introduced, which makes full use of prior information, improves the modeling capability of the network and optimizes the learning space. The method can markedly improve the peak signal-to-noise ratio of the image and greatly improve the efficiency, quality and robustness of image demosaicing, which is of far-reaching significance in the field of image processing.
Description
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a residual error neural network based on cavity convolution and a two-stage image demosaicing method.
Background
Single-sensor color imaging based on a CFA is widely used in the digital camera industry. In a single-sensor camera with a CFA, each pixel records only one of the three color values (R, G, or B); when recovering an RGB color image, the two missing values at each pixel must be estimated. This process, commonly referred to as image demosaicing, plays a crucial role in obtaining high-quality RGB color images. The most popular and widely used CFA is the Bayer CFA, in which green pixels are sampled on a quincunx grid and red and blue pixels are sampled on rectangular grids, so that the number of green samples is twice the number of red or blue samples.
Image demosaicing methods based on the Bayer CFA have been widely studied; most of them first interpolate the G-channel values, while the R-channel and B-channel images are usually first converted into color ratios or color differences and then interpolated in the transform domain. Another popular approach works in the frequency domain: the mosaic image is first transformed into the frequency domain, and the luminance and chrominance components are then separated by frequency filtering. Image demosaicing methods based on compressed sensing theory have also been proposed.
In recent years, deep learning has excelled at problems such as image classification, object detection and natural language processing; among the different types of neural networks, convolutional neural networks (CNNs) are the most extensively studied. A CNN can automatically extract effective representations from the target image, i.e., it can model the data directly from raw pixels with little preprocessing. The literature "Color Image Demosaicking via Deep Residual Learning" adopts a CNN to realize image demosaicing and obtains good performance gains over traditional methods, but due to the introduction of batch normalization (BN), the network depth achievable at the same computational cost is reduced and the receptive field of the network shrinks. "Color Image Demosaicking Using a 3-Stage Convolutional Neural Network Structure" constructs a three-stage image demosaicing CNN that fully demonstrates the excellent modeling and learning capability of neural networks, but using the restored G-channel image as prior information to guide the restoration of the R-channel and B-channel images is suboptimal.
Disclosure of Invention
Aiming at the above technical problems, the invention provides a residual neural network based on cavity convolution and a two-stage image demosaicing method. The relation between the receptive field and the computational cost of the network is fully considered: residual blocks are constructed with cavity convolution, which enlarges the receptive field of the network without increasing the computational cost. In the network training stage, the original G-channel image is used as prior information to guide the restoration of the R-channel and B-channel images, improving the modeling capability of the network and optimizing the learning space. The residual neural network based on cavity convolution and the two-stage image demosaicing method can markedly improve the peak signal-to-noise ratio (PSNR) of the image and have the advantages of a good demosaicing effect, high speed and strong robustness.
In order to achieve the purpose, the invention adopts the following technical scheme:
a residual error neural network based on cavity convolution and a two-stage image demosaicing method comprise the following steps:
step 1: building a residual error neural network model based on cavity convolution;
and 2, step: converting the RGB color image into a mosaic image through Bayer CFA, carrying out data preprocessing to form a training set, and setting parameters of a training target neural network;
and step 3: according to the residual error neural network model based on the cavity convolution, training a corresponding neural network model by taking a minimized loss function as a target in two stages;
and 4, step 4: and (3) according to the target neural network model obtained by training, performing the same data preprocessing as the step (2) on the mosaic image of the Bayer CFA mode to be processed, inputting the mosaic image into the target neural network model, and outputting the RGB color image without mosaic.
Further, in step 1, the residual neural network model based on cavity convolution includes a neural network G, a neural network R and a neural network B, all of which adopt the residual neural network based on cavity convolution. The neural network G takes the mosaic image as input and restores the G-channel image; the neural network R takes the mosaic image, the G-channel image restored by the neural network G and the image containing only R-channel sample pixels as input and restores the R-channel image; the neural network B takes the mosaic image, the restored G-channel image and the image containing only B-channel sample pixels as input and restores the B-channel image; the restored R-, G- and B-channel images are synthesized into the restored RGB color image.
Still further, in step 1, the residual neural network based on cavity convolution includes 1 shallow feature extraction unit, N local residual units and 1 deep feature extraction unit, where N ≥ 1. The input image is converted into shallow features by the shallow feature extraction unit, the shallow features pass through the N local residual units in sequence to form main features, and the main features pass through the deep feature extraction unit to output the residual image.
Still further, the shallow feature extraction unit includes A 3×3 convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function; the input image passes through the A 3×3 convolutional layers with ReLU and then the 1 3×3 convolutional layer without activation to produce the shallow features.
The local residual unit includes B residual blocks and 1 dimension-reduction block. Each residual block contains 1 3×3 convolutional layer with the ReLU activation function, C 3×3 cavity convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function, connected end to end in sequence; the residual block forms a local residual using a skip connection. The dimension-reduction block comprises 1 concatenation layer and 1 1×1 convolutional layer without an activation function, connected end to end; the input shallow features pass through the B residual blocks in sequence, and the outputs of all residual blocks are concatenated and reduced in dimension by the dimension-reduction block.
The deep feature extraction unit includes D 3×3 convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function, connected end to end in sequence; the input main features pass through them in that order to output the residual image.
In all local residual units, the local residual in each residual block is scaled by a feature-scaling coefficient before the input and output are connected by an identity (skip) addition.
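The identity-plus-scaled-residual connection can be sketched as (the convolution body is a hypothetical stand-in for the block's 3×3 and cavity convolution stack; α = 0.1 follows the value given in the embodiment):

```python
import numpy as np

ALPHA = 0.1  # feature-scaling coefficient, alpha < 1

def conv_stub(x):
    """Hypothetical stand-in for the residual block's convolution stack
    (1 plain 3x3 conv, C dilated 3x3 convs, 1 final 3x3 conv)."""
    return x * 2.0

def residual_block(x, body=conv_stub, alpha=ALPHA):
    # Scale the local residual by alpha before the identity addition;
    # the text states this stabilises training of deep networks.
    return x + alpha * body(x)

y = residual_block(np.ones(3))
```

With α < 1 the block starts out close to the identity mapping, which is the stabilising property the feature-scaling technique relies on.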
Further, in step 2, the data preprocessing process includes:
firstly, the G-channel and B-channel sample pixels in the mosaic image are set to 0 to obtain an image containing only R-channel sample pixels, and the G-channel and R-channel sample pixels in the mosaic image are set to 0 to obtain an image containing only B-channel sample pixels;
then, the mosaic image, the image containing only R-channel sample pixels, the image containing only B-channel sample pixels, the original R-channel image, the original G-channel image and the original B-channel image are each divided into a number of image blocks, where the number and size of the corresponding blocks of all six images are the same.
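The zeroing step above can be sketched as follows (an illustrative sketch assuming an RGGB phase with R samples at even/even and B samples at odd/odd positions; the patent does not fix the phase):

```python
import numpy as np

def split_rb_planes(mosaic):
    """From a Bayer (RGGB) mosaic, build the image containing only
    R-channel sample pixels (G and B positions zeroed) and the image
    containing only B-channel sample pixels (G and R positions zeroed)."""
    r_only = np.zeros_like(mosaic)
    b_only = np.zeros_like(mosaic)
    r_only[0::2, 0::2] = mosaic[0::2, 0::2]  # keep R samples only
    b_only[1::2, 1::2] = mosaic[1::2, 1::2]  # keep B samples only
    return r_only, b_only

mos = np.arange(1, 17, dtype=float).reshape(4, 4)
r_only, b_only = split_rb_planes(mos)
```

In a 4×4 mosaic this leaves 4 nonzero R samples and 4 nonzero B samples, matching the 2:1 green-to-red/blue sampling ratio of the Bayer pattern.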
Further, in step 3, the first stage of the two-stage image demosaicing method is: training the neural network R with the loss function R, where the neural network R constructs a mapping from the mosaic image, the image containing only R-channel sample pixels and the original G-channel image to the R-channel image; training the neural network G with the loss function G, where the neural network G constructs a mapping from the mosaic image to the G-channel image; and training the neural network B with the loss function B, where the neural network B constructs a mapping from the mosaic image, the image containing only B-channel sample pixels and the original G-channel image to the B-channel image. The second stage of the two-stage image demosaicing method is: jointly training the neural networks R, G and B with the loss function RGB, where the three networks together construct the mapping from the mosaic image to the RGB color image.
Further, in step 3, the loss functions of the first stage include the loss function R, the loss function G and the loss function B; the loss function of the second stage is the loss function RGB. All loss functions take the form of the mean absolute error (MAE):

$$L_R(\theta_R) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_R\big(x_{mos}^{(i)}, x_{mos\_r}^{(i)}, y_G^{(i)}; \theta_R\big) - y_R^{(i)} \right\|_1$$

$$L_G(\theta_G) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_G\big(x_{mos}^{(i)}; \theta_G\big) - y_G^{(i)} \right\|_1$$

$$L_B(\theta_B) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_B\big(x_{mos}^{(i)}, x_{mos\_b}^{(i)}, y_G^{(i)}; \theta_B\big) - y_B^{(i)} \right\|_1$$

$$L_{RGB}(\theta_R,\theta_G,\theta_B) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_{R,G,B}\big(x_{mos}^{(i)}, x_{mos\_r}^{(i)}, x_{mos\_b}^{(i)}; \theta_R,\theta_G,\theta_B\big) - y_{RGB}^{(i)} \right\|_1$$

where $L_R(\cdot)$, $L_G(\cdot)$, $L_B(\cdot)$ and $L_{RGB}(\cdot)$ denote the loss function R, loss function G, loss function B and loss function RGB respectively; $\theta_R$, $\theta_G$ and $\theta_B$ denote the parameters of the neural networks R, G and B; $M$ denotes the total number of image blocks; $f_R(\cdot)$ denotes the trained mapping from the mosaic image, the image containing only R-channel sample pixels and the original G-channel image to the R-channel image; $f_G(\cdot)$ denotes the trained mapping from the mosaic image to the G-channel image; $f_B(\cdot)$ denotes the trained mapping from the mosaic image, the image containing only B-channel sample pixels and the original G-channel image to the B-channel image; $f_{R,G,B}(\cdot)$ denotes the combined mapping of $f_R(\cdot)$, $f_G(\cdot)$ and $f_B(\cdot)$; $x_{mos}$, $x_{mos\_r}$ and $x_{mos\_b}$ denote the mosaic image, the image containing only R-channel sample pixels and the image containing only B-channel sample pixels; $x_{rgb}$ denotes the RGB color image recovered by synthesizing the R-channel image restored by the neural network R, the G-channel image restored by the neural network G and the B-channel image restored by the neural network B; and $y_R$, $y_G$, $y_B$ and $y_{RGB}$ denote the original R-channel image, original G-channel image, original B-channel image and original RGB color image respectively.
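As a concrete instance, the MAE form shared by all four losses reduces to a one-line computation (a sketch over a single image block rather than the M-block batch of the text):

```python
import numpy as np

def mae_loss(pred, target):
    """Mean absolute error between a restored image block and the
    original block, the form the text gives for all four losses."""
    return np.mean(np.abs(pred - target))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 1.0], [1.0, 1.0]])
loss = mae_loss(pred, target)
```

MAE (the L1 loss) is a common choice for image restoration because, unlike mean squared error, it does not over-penalize outliers and tends to produce less blurry restorations.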
Still further, in step 3, during the training of the residual neural network model based on cavity convolution, the parameters $\theta_R$, $\theta_G$ and $\theta_B$ are initialized with the Xavier method, i.e., drawn from distributions with mean 0 and variances $Var(\theta_R)$, $Var(\theta_G)$ and $Var(\theta_B)$:

$$Var(\theta_R) = \frac{2}{n_{in}^{R} + n_{out}^{R}},\qquad Var(\theta_G) = \frac{2}{n_{in}^{G} + n_{out}^{G}},\qquad Var(\theta_B) = \frac{2}{n_{in}^{B} + n_{out}^{B}}$$

where $n_{in}^{R}$, $n_{in}^{G}$ and $n_{in}^{B}$ denote the number of input neurons of the current layer in the neural networks R, G and B respectively, and $n_{out}^{R}$, $n_{out}^{G}$ and $n_{out}^{B}$ denote the corresponding numbers of output neurons; the loss functions are minimized with the Adam optimization method.
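A minimal sketch of the Xavier initialization for one layer, assuming the standard Glorot variance 2/(n_in + n_out) for a normal distribution with mean 0:

```python
import numpy as np

def xavier_init(n_in, n_out, seed=0):
    """Draw an (n_in x n_out) weight matrix with mean 0 and
    variance 2 / (n_in + n_out), per the Xavier (Glorot) scheme."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

w = xavier_init(64, 64)  # e.g. a 64-to-64 feature layer
```

Matching the variance to the layer's fan-in and fan-out keeps activation and gradient magnitudes roughly constant across layers, which is what makes this initialization suitable for the deep stacked-residual structure described above.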
Compared with the prior art, the invention has the beneficial effects that:
according to the residual error neural network based on cavity convolution and the two-stage image demosaicing method, the shallow feature extraction unit, the local residual error unit and the deep feature extraction unit are introduced, three basic units interact with each other to greatly enhance the learning capability and the modeling capability of a target neural network, accurate mapping from a mosaic image to an RGB color image can be established aiming at the image demosaicing problem, and finally the mosaic image of a Bayer CFA mode can be processed through the established effective mapping to obtain the RGB color image; meanwhile, a two-stage image demosaicing model is introduced, prior information is fully utilized, the modeling capability of the network is improved, and the knowledge space is optimized; the image demosaicing method can obviously improve the peak signal-to-noise ratio (PSNR) of the image, greatly improve the efficiency, quality and robustness of image demosaicing, and has profound significance in the field of image processing.
Further, the invention also has the following beneficial effects:
the built residual error neural network model based on the cavity convolution has the modularized property, different units have different functions, a shallow feature extraction unit aims at converting an image into a shallow feature, a local residual error unit is a network main body and aims at converting the shallow feature into a main feature, a deep feature extraction unit aims at converting the main feature into a residual error image, and the shallow feature extraction unit, a plurality of local residual error units and the deep feature extraction unit are sequentially connected end to end; according to the connection mode, the network can stack a plurality of local residual error units, and the network depth is increased on the premise of ensuring efficient feature extraction, so that the nonlinear modeling capability and learning capability of the network are improved.
Each local residual unit in the residual neural network model based on cavity convolution contains several residual blocks; each residual block has a skip connection, and each skip connection performs one residual addition. Local residual learning within the local residual units interacts with the global residual learning of the whole network, further improving the performance and convergence speed of the network. Meanwhile, the residual blocks introduce cavity convolution, enlarging the receptive field of the network at the same computational cost; the structural similarity between cavity convolution and the Bayer CFA further promotes feature extraction by the residual blocks and thus improves network performance.
Drawings
Fig. 1 is a schematic diagram of the internal structure of the residual neural network G based on cavity convolution according to an embodiment of the invention.
Fig. 2 is a schematic diagram of the internal structure of the residual neural networks R and B based on cavity convolution according to an embodiment of the invention.
Fig. 3 is a schematic structural diagram of the shallow feature extraction unit in fig. 1.
Fig. 4 is a schematic structural diagram of the local residual unit in fig. 1.
Fig. 5 is a schematic structural diagram of the deep layer feature extraction unit in fig. 1.
Fig. 6 is a flow chart of the residual neural network based on cavity convolution and the two-stage image demosaicing method according to a preferred embodiment of the invention.
Fig. 7 is a schematic diagram of the ReLU function in fig. 3.
Detailed Description
The invention will be further described with reference to the drawings and preferred embodiments.
The present embodiment provides a residual neural network based on cavity convolution and a two-stage image demosaicing method, the flow of which is shown in fig. 6. The method comprises the following steps:
step 1: building a residual error neural network model based on cavity convolution;
the residual error neural network G based on the hole convolution is shown in fig. 1, and includes: 1 shallow layer feature extraction unit, 3 local residual error units and 1 deep layer feature extraction unit; the 3 local residual error units are sequentially connected end to end; the input image is converted into shallow layer characteristics through a shallow layer characteristic extraction unit, the shallow layer characteristics sequentially pass through 3 local residual error units to form main characteristics, and the main characteristics pass through a deep layer characteristic extraction unit to output residual error images;
the residual error neural network R/B based on the hole convolution is shown in FIG. 2 and comprises: 1 shallow layer feature extraction unit, 1 local residual error unit and 1 deep layer feature extraction unit; the input image is converted into a shallow feature through a shallow feature extraction unit, the shallow feature forms a main feature through a local residual error unit, and the main feature outputs a residual error image through a deep feature extraction unit;
in this embodiment, the reason that the neural network R and the neural network B only use 1 local residual unit is that, as the network depth increases, the restored R channel image and the restored B channel image generate a frame effect, and 1 local residual block can avoid the frame effect and obtain good subjective/objective performance;
as shown in fig. 3, the shallow feature extraction unit includes: 1 3 × 3 convolutional layer with ReLU activation function and 1 3 × 3 convolutional layer without activation function; the input image is converted into shallow layer characteristics through 1 3 x 3 convolutional layer with a ReLU activation function and 1 3 x 3 convolutional layer without the activation function in sequence; in this embodiment, the number of convolution kernels of the 3 × 3 convolution layers with the ReLU activation function (32 in this embodiment) is half of the number of convolution kernels of the 3 × 3 convolution layers without the activation function (64 in this embodiment), so as to achieve a gradual structure characteristic, which is beneficial for the transformation from the image domain to the feature domain.
As shown in fig. 4, the local residual unit includes a first residual block, a second residual block, a third residual block and a dimension-reduction block. Each residual block contains 1 3×3 convolutional layer with the ReLU activation function, 2 3×3 cavity convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function, connected end to end in sequence. The dimension-reduction block comprises 1 concatenation layer and 1 1×1 convolutional layer without an activation function, connected end to end; the output features of each residual block are fed into the concatenation layer, which is followed by the 1×1 convolutional layer for dimension reduction. In this embodiment the local residual unit has 3 skip connections; the skip connection in each residual block does not connect input and output with a plain identity mapping but introduces a feature-scaling technique: before the identity addition, the features are scaled by a coefficient α smaller than 1 (α = 0.1 in this embodiment), which makes training more stable for deep networks. In addition, the 3×3 cavity convolutional layers (dilation factor 2) are chosen for their small parameter count and low computational cost, and because their sampling pattern resembles the Bayer CFA to some extent, which benefits feature extraction.
As shown in fig. 5, the deep feature extraction unit includes 3×3 convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function; the input main features pass through them in sequence to output the residual image.
Generally, the receptive field bears directly on network performance, and it is usually enlarged with larger convolution kernels or deeper network structures; but an overly large kernel size or an overly deep structure increases the computational cost and can even cause vanishing or exploding gradients, degrading performance. The local residual unit introduced by the invention uses cavity convolution, enlarging the receptive field of the network without increasing the computational cost. In addition, cavity convolution with dilation factor 2 shares certain structural properties with the Bayer CFA, so more effective features can be extracted from the mosaic image and the performance of the network is improved.
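The trade-off described here can be made concrete with a small receptive-field calculation (a sketch; the layer layout follows the residual block of this embodiment, and the standard formula that each layer adds dilation × (kernel − 1) pixels is assumed):

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions; each layer is a
    (kernel_size, dilation) pair.  A dilated layer contributes
    dilation * (kernel - 1) extra pixels while its parameter count
    and compute stay the same as a plain layer."""
    rf = 1
    for k, d in layers:
        rf += d * (k - 1)
    return rf

# Residual-block layout of the embodiment: plain 3x3, two dilated
# 3x3 layers (dilation 2), final plain 3x3.
rf_dilated = receptive_field([(3, 1), (3, 2), (3, 2), (3, 1)])
rf_plain = receptive_field([(3, 1)] * 4)
```

With dilation 2 on the two middle layers the block sees a 13-pixel span instead of 9, at identical cost, which is the advantage the text claims for cavity convolution.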
Step 2: converting the RGB color image into a mosaic image through Bayer CFA to form a training set, and setting parameters of a training target neural network;
the training set selects 800 images in DIV2K as a training set, RGB color images are converted into mosaic images through Bayer CFA, G channel sampling pixels and B channel sampling pixels in the mosaic images are set to be 0, images only containing R channel sampling pixels are obtained, and G channel sampling pixels and R channel sampling pixels in the mosaic images are set to be 0, and images only containing B channel sampling pixels are obtained; then setting training parameters of a residual error neural network model based on the cavity convolution, wherein the training parameters comprise the number of image blocks input into the model training each time, the sizes of input image blocks and output image blocks, learning rate and the like; respectively dividing a mosaic image, an image only containing R-channel sampling pixels, an image only containing B-channel sampling pixels, an original R-channel image, an original G-channel image and an original B-channel image in a training set into image blocks with the same resolution, wherein the number and the size of the mosaic image block, the image block only containing R-channel sampling pixels, the image block only containing B-channel sampling pixels, the original R-channel image block, the original G-channel image block and the original B-channel image block are the same; zero-padding operations are performed for each convolution (i.e., the image size is not reduced according to the size of the convolution kernel, i.e., the input and output sizes are consistent).
In this embodiment, the mosaic image, the image containing only R-channel sampling pixels, the image containing only B-channel sampling pixels, the original R-channel image, the original G-channel image and the original B-channel image in the training set are each divided into 64 × 64 image blocks, so that structural and detail information of the images can be captured better during training. The number of image blocks per training step is 32 (in other embodiments, any of 16, 32 or 128 can be used); the learning rate is set to 0.0001 (in other embodiments, any value from 0.01 to 0.00001 can be used), and the decay rate per training round is set to 0.9 (in other embodiments, any value from 0.1 to 0.9 can be used). A test is performed every 1000 training steps (in other embodiments, any value from 500 to 5000 can be used), and the relevant parameters of the model are adjusted according to its effect on the verification set. A test set can be chosen together with the training set; the Kodak or McMaster image set can be used as the test set.
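The channel-masking preprocessing of step 2 can be sketched in NumPy as follows. The RGGB sample layout is an assumption for illustration (the patent does not fix which Bayer variant is used), and the function name is ours:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an H x W x 3 RGB image with an (assumed) RGGB Bayer CFA.

    Returns the single-channel mosaic plus the images that keep only the
    R / only the B samples, with all other positions set to 0."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R samples
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G samples (even rows)
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G samples (odd rows)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B samples
    mos_r = np.zeros_like(mosaic)
    mos_r[0::2, 0::2] = mosaic[0::2, 0::2]   # keep only R samples
    mos_b = np.zeros_like(mosaic)
    mos_b[1::2, 1::2] = mosaic[1::2, 1::2]   # keep only B samples
    return mosaic, mos_r, mos_b

rgb = np.arange(1.0, 49.0).reshape(4, 4, 3)
mosaic, mos_r, mos_b = bayer_mosaic(rgb)
```

Dividing each of the resulting images into 64 × 64 blocks is then a matter of slicing the same coordinates from all six images.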
And step 3: according to the residual error neural network model based on the cavity convolution, training a corresponding neural network model by taking minimized respective loss functions as targets in two stages;
Specifically, in the first stage, minimizing the loss function R yields the weights and biases of the neural network R, minimizing the loss function G yields the weights and biases of the neural network G, and minimizing the loss function B yields the weights and biases of the neural network B. In the second stage, minimizing the loss function RGB jointly refines the weights and biases of the neural networks R, G and B, thereby establishing the target neural network model for the image demosaicing problem.
The loss function R, the loss function G, the loss function B and the loss function RGB all take the form of mean absolute error (MAE) functions:

$$L_R(\theta_R)=\frac{1}{M}\sum_{i=1}^{M}\left|f_R\!\left(x_{mos}^{(i)},x_{mos\_r}^{(i)},y_G^{(i)};\theta_R\right)-y_R^{(i)}\right|$$

$$L_G(\theta_G)=\frac{1}{M}\sum_{i=1}^{M}\left|f_G\!\left(x_{mos}^{(i)};\theta_G\right)-y_G^{(i)}\right|$$

$$L_B(\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|f_B\!\left(x_{mos}^{(i)},x_{mos\_b}^{(i)},y_G^{(i)};\theta_B\right)-y_B^{(i)}\right|$$

$$L_{RGB}(\theta_R,\theta_G,\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|x_{rgb}^{(i)}-y_{RGB}^{(i)}\right|$$

wherein L_R(·), L_G(·), L_B(·) and L_RGB(·) denote the loss function R, loss function G, loss function B and loss function RGB respectively; θ_R, θ_G and θ_B denote the parameters of the neural networks R, G and B; M denotes the total number of image blocks; f_R(·) denotes the trained mapping from a mosaic image, an image containing only R-channel sampling pixels and an original G-channel image to an R-channel image; f_G(·) denotes the trained mapping from a mosaic image to a G-channel image; f_B(·) denotes the trained mapping from a mosaic image, an image containing only B-channel sampling pixels and an original G-channel image to a B-channel image; f_{R,G,B}(·) denotes the combined mapping of f_R(·), f_G(·) and f_B(·); x_mos, x_mos_r and x_mos_b denote the mosaic image, the image containing only R-channel sampling pixels and the image containing only B-channel sampling pixels respectively; x_rgb denotes the recovered RGB color image synthesized from the R-channel image recovered by the neural network R, the G-channel image recovered by the neural network G and the B-channel image recovered by the neural network B; and y_R, y_G, y_B and y_RGB denote the original R-channel image, original G-channel image, original B-channel image and original RGB color image respectively.
Since the peak signal-to-noise ratio (PSNR) is formulated as

$$PSNR(\theta)=10\cdot\log_{10}\frac{255^{2}}{\mathrm{MSE}\!\left(x_i(\theta),\,y_i\right)}$$

wherein x_i denotes the recovered image, y_i the corresponding original image, MSE(·,·) the mean squared error between them and θ the parameters of the neural network, it can be seen from the above equation that continuously minimizing the loss function raises the PSNR, i.e. improves the objective quality of the image.
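The relation between minimizing the reconstruction error and raising the PSNR can be checked numerically. A minimal NumPy sketch (function names are ours; a peak value of 255 is assumed):

```python
import numpy as np

def mae(x, y):
    """Mean absolute error between a recovered image x and original y."""
    return float(np.mean(np.abs(x - y)))

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = float(np.mean((x - y) ** 2))
    return 10.0 * np.log10(peak ** 2 / mse)

y = np.array([[100.0, 120.0], [140.0, 160.0]])   # "original" image
x_bad = y + 8.0    # larger reconstruction error
x_good = y + 2.0   # smaller error: lower MAE, higher PSNR
assert mae(x_good, y) < mae(x_bad, y)
assert psnr(x_good, y) > psnr(x_bad, y)
```

The two assertions illustrate the monotone relation the text relies on: driving the loss down drives the objective quality (PSNR) up.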
In this embodiment, the loss functions are minimized with the Adam optimization method, whose update at step t is computed as:

$$s\leftarrow\rho_1 s+(1-\rho_1)\,g$$

$$\gamma\leftarrow\rho_2\gamma+(1-\rho_2)\,g\odot g$$

$$\hat{s}\leftarrow\frac{s}{1-\rho_1^{\,t}},\qquad \hat{\gamma}\leftarrow\frac{\gamma}{1-\rho_2^{\,t}}$$

$$\Delta\theta=-\varepsilon\,\frac{\hat{s}}{\sqrt{\hat{\gamma}}+\delta},\qquad \theta\leftarrow\theta+\Delta\theta$$

wherein ρ1, ρ2, ε and δ are constants (with default values ρ1 = 0.9, ρ2 = 0.999, ε = 0.001 and δ = 10⁻⁸); g denotes the gradient of the loss function with respect to the parameter θ; s denotes the biased first-order moment estimate and γ the biased second-order moment estimate; ŝ and γ̂ denote the bias-corrected first-order and second-order moment estimates; and Δθ denotes the change of the parameter θ. The parameter updates of Adam are unaffected by rescaling of the gradient, and the learning rate is adapted automatically; in addition, the algorithm is simple to implement, computationally efficient, and has low memory requirements.
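The Adam update above can be written out as a single step function. The constants follow the defaults given in the text; the toy minimization target at the end is ours, added only to show the update converging:

```python
import numpy as np

def adam_step(theta, g, s, gamma, t,
              rho1=0.9, rho2=0.999, eps=0.001, delta=1e-8):
    """One Adam update (defaults from the text: rho1=0.9, rho2=0.999,
    eps=0.001 as learning rate, delta=1e-8). t is the 1-based step count.
    Returns the updated (theta, s, gamma)."""
    s = rho1 * s + (1 - rho1) * g              # biased first-moment estimate
    gamma = rho2 * gamma + (1 - rho2) * g * g  # biased second-moment estimate
    s_hat = s / (1 - rho1 ** t)                # bias-corrected first moment
    gamma_hat = gamma / (1 - rho2 ** t)        # bias-corrected second moment
    theta = theta - eps * s_hat / (np.sqrt(gamma_hat) + delta)
    return theta, s, gamma

# Toy example: minimize L(theta) = theta**2, whose gradient is 2*theta.
theta, s, gamma = 5.0, 0.0, 0.0
for t in range(1, 8001):
    theta, s, gamma = adam_step(theta, 2.0 * theta, s, gamma, t)
# theta has moved from 5.0 to near the minimizer 0
```

Note how the per-step displacement is roughly ε regardless of the gradient's scale, which is the rescaling-invariance property the text mentions.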
In this embodiment, the parameters θ_R, θ_G and θ_B of the residual neural network model based on cavity convolution are initialized with the Xavier method:

$$Var(\theta_R)=\frac{2}{n_{in}^{R}+n_{out}^{R}},\qquad Var(\theta_G)=\frac{2}{n_{in}^{G}+n_{out}^{G}},\qquad Var(\theta_B)=\frac{2}{n_{in}^{B}+n_{out}^{B}}$$

wherein n_in^R, n_in^G and n_in^B denote the number of input neurons of a layer in the neural network R, neural network G and neural network B respectively, and n_out^R, n_out^G and n_out^B denote the corresponding numbers of output neurons. The parameters θ_R, θ_G and θ_B are initialized from distributions with mean 0 and variances Var(θ_R), Var(θ_G) and Var(θ_B) respectively. The Xavier parameter initialization method improves both the training efficiency and the performance of the network.
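A sketch of the Xavier initialization described above. The text fixes only the mean and variance; the normal distribution family and the function name are assumptions of this sketch:

```python
import numpy as np

def xavier_init(n_in, n_out, seed=0):
    """Xavier initialization: zero mean, variance 2 / (n_in + n_out).

    A normal distribution is assumed here, since the text specifies only
    the mean and the variance of the initialization."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return np.random.default_rng(seed).normal(0.0, std, size=(n_in, n_out))

w = xavier_init(256, 256)
# empirical variance is close to the target 2 / (256 + 256) = 0.00390625
```

Keeping the variance tied to the fan-in and fan-out keeps activation magnitudes roughly constant across layers, which is what makes deep networks start training efficiently.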
And 4, step 4: according to a target neural network model obtained through training, inputting a mosaic image of a Bayer CFA mode into the target neural network model, and outputting an RGB color image subjected to mosaic removal.
In step 2, RGB color images are converted into mosaic images through the Bayer CFA to form the training set; in step 3, the neural network R, neural network G and neural network B are trained in two stages to obtain the image demosaicing target neural network model for the Bayer CFA mode; in step 4, a mosaic image in the Bayer CFA mode is input into the target neural network model to obtain the corresponding RGB color image.
In this embodiment, the RGB color images in the Kodak test set (24 RGB color images of size 768 × 512) are converted into mosaic images through the Bayer CFA; after mapping by the model, the recovered RGB color images reach an average PSNR of 42.94 dB. The RGB color images in the McMaster test set (18 RGB color images of size 500 × 500) are converted into mosaic images through the Bayer CFA; after mapping by the model, the recovered RGB color images reach an average PSNR of 39.76 dB. With the residual neural network based on cavity convolution and the two-stage image demosaicing method trained by the invention, the objective quality of the images is greatly improved and the visual effect is satisfactory; the results are shown in the following table:
according to the image demosaicing method, a target neural network model can be trained in advance, the target neural network model is end-to-end mapping from an input Bayer CFA mode mosaic image to an output RGB color image, demosaicing speed of the mosaic image through the target neural network model is extremely high, the practical value is very high, and the method can be applied to occasions needing real-time deblocking effect; besides the advantages of high speed, good mosaic removing effect and the like, the mosaic removing method has strong robustness, and objective gain and subjective gain of mosaic removing do not fluctuate greatly aiming at mosaic images of different types and different scenes. Therefore, the residual error neural network based on the cavity convolution and the two-stage image demosaicing method have the advantages of good demosaicing effect, high speed, strong robustness, strong practicability and real-time performance, wide market prospect and particularly high requirement on the real-time performance.
The residual neural network based on cavity convolution and the two-stage image demosaicing method of the invention can accurately learn the mapping from a mosaic image to an RGB color image; the cavity convolution enlarges the receptive field of the network without increasing the computational cost; the combination of local residuals and a global residual accelerates the convergence of the network; the modular network structure makes the neural network interpretable; and the two-stage training scheme makes full use of prior information and refines the solution space of the network model.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (5)
1. A residual neural network based on cavity convolution and two-stage image demosaicing method, comprising the following steps:
step 1: building a residual error neural network model based on cavity convolution;
the residual neural network model based on cavity convolution comprises: a neural network G, a neural network R and a neural network B, each of which adopts a residual neural network based on cavity convolution; wherein the mosaic image is input into the neural network G to recover the G-channel image; the mosaic image, the G-channel image recovered by the neural network G and the image containing only R-channel sampling pixels are input into the neural network R to recover the R-channel image; the mosaic image, the G-channel image recovered by the neural network G and the image containing only B-channel sampling pixels are input into the neural network B to recover the B-channel image; and the recovered R-channel image, G-channel image and B-channel image are synthesized into a recovered RGB color image;
the residual neural network based on cavity convolution comprises: 1 shallow feature extraction unit, N local residual units and 1 deep feature extraction unit, where N ≥ 1; characterized in that an input image is converted into shallow features by the shallow feature extraction unit, the shallow features pass sequentially through the N local residual units to form main features, and the main features pass through the deep feature extraction unit to output a residual image;
the shallow feature extraction unit comprises: A 3 × 3 convolutional layers with ReLU activation functions and 1 3 × 3 convolutional layer without an activation function; the input image is converted into shallow features by passing sequentially through the A 3 × 3 convolutional layers with ReLU activation functions and the 1 3 × 3 convolutional layer without an activation function;
the local residual unit comprises: B residual blocks and 1 dimension-reduction block; wherein each residual block contains 1 3 × 3 convolutional layer with a ReLU activation function, C 3 × 3 cavity convolutional layers with ReLU activation functions and 1 3 × 3 convolutional layer without an activation function, connected end to end in sequence; the residual block forms a local residual using a skip connection; the dimension-reduction block comprises 1 cascade layer and 1 1 × 1 convolutional layer without an activation function, connected end to end in sequence; the shallow features pass sequentially through the B residual blocks, and the outputs of the residual blocks are concatenated by the cascade layer and reduced in dimension by the 1 × 1 convolutional layer;
the deep feature extraction unit comprises: D 3 × 3 convolutional layers with ReLU activation functions and 1 3 × 3 convolutional layer without an activation function, connected end to end in sequence; the main features pass sequentially through the D 3 × 3 convolutional layers with ReLU activation functions and the 1 3 × 3 convolutional layer without an activation function to output the residual image;
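A single residual block of the kind described above can be sketched in NumPy. The channel count of 1, the number of dilated layers C = 1 and the feature-scaling factor 0.1 are assumptions of this sketch (the claims leave them as parameters); `conv2d` is a helper of ours implementing a "same"-padded, stride-1 convolution with optional dilation:

```python
import numpy as np

def conv2d(x, w, dilation=1):
    """'Same'-padded stride-1 2D convolution of a single-channel image x
    with a k x k kernel w, for the given dilation (zero padding, so the
    input and output sizes are consistent, as in step 2)."""
    k = w.shape[0]
    p = ((k - 1) * dilation) // 2          # half the effective kernel span
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * dilation:i * dilation + x.shape[0],
                                j * dilation:j * dilation + x.shape[1]]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w_in, w_mid, w_out, scale=0.1):
    """One local residual block: a 3x3 conv + ReLU, a dilated 3x3 conv +
    ReLU (dilation 2; C = 1 here), a 3x3 conv without activation, then a
    feature-scaled identity skip connection."""
    h = relu(conv2d(x, w_in))
    h = relu(conv2d(h, w_mid, dilation=2))
    h = conv2d(h, w_out)
    return x + scale * h                   # local residual (skip connection)
```

With identity kernels (center tap 1, others 0) and a non-negative input, the block returns `x + 0.1 * x`, which makes the scaled skip connection easy to verify by hand.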
step 2: converting the RGB color image into a mosaic image through Bayer CFA, carrying out data preprocessing to form a training set, and setting parameters of a training target neural network;
and step 3: according to the residual error neural network model based on the cavity convolution, training a corresponding neural network model by taking a minimized loss function as a target in two stages;
and 4, step 4: and (3) according to the target neural network model obtained by training, performing the same data preprocessing as the step (2) on the mosaic image of the Bayer CFA mode to be processed, inputting the mosaic image into the target neural network model, and outputting the RGB color image without mosaic.
2. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein, in all local residual units, the local residual of each residual block applies a feature scaling technique before the identity connection of input and output.
3. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein in step 2 the data preprocessing process is as follows:
firstly, setting a G channel sampling pixel and a B channel sampling pixel in a mosaic image to be 0 to obtain an image only containing the R channel sampling pixel, and setting the G channel sampling pixel and the R channel sampling pixel in the mosaic image to be 0 to obtain an image only containing the B channel sampling pixel;
then, the mosaic image, the image only containing R-channel sampling pixels, the image only containing B-channel sampling pixels, the original R-channel image, the original G-channel image and the original B-channel image are respectively divided into a plurality of mosaic image blocks, image blocks only containing R-channel sampling pixels, image blocks only containing B-channel sampling pixels, original R-channel image blocks, original G-channel image blocks and original B-channel image blocks, wherein the number and the size of the mosaic image blocks, the image blocks only containing R-channel sampling pixels, the image blocks only containing B-channel sampling pixels, the original R-channel image blocks, the original G-channel image blocks and the original B-channel image blocks are the same.
4. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein in step 3 the first stage of the two-stage image demosaicing method is: training the neural network R with the loss function R, the neural network R constructing a mapping from a mosaic image, an image containing only R-channel sampling pixels and an original G-channel image to an R-channel image; training the neural network G with the loss function G, the neural network G constructing a mapping from a mosaic image to a G-channel image; and training the neural network B with the loss function B, the neural network B constructing a mapping from a mosaic image, an image containing only B-channel sampling pixels and an original G-channel image to a B-channel image; and the second stage of the two-stage image demosaicing method is: jointly training the neural network R, the neural network G and the neural network B with the loss function RGB, the three networks together constructing a mapping from the mosaic image to the RGB color image.
5. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein in step 3 the loss functions of the first stage comprise the loss function R, the loss function G and the loss function B; the loss function of the second stage comprises the loss function RGB; and all loss functions take the form of mean absolute error (MAE) functions:

$$L_R(\theta_R)=\frac{1}{M}\sum_{i=1}^{M}\left|f_R\!\left(x_{mos}^{(i)},x_{mos\_r}^{(i)},y_G^{(i)};\theta_R\right)-y_R^{(i)}\right|$$

$$L_G(\theta_G)=\frac{1}{M}\sum_{i=1}^{M}\left|f_G\!\left(x_{mos}^{(i)};\theta_G\right)-y_G^{(i)}\right|$$

$$L_B(\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|f_B\!\left(x_{mos}^{(i)},x_{mos\_b}^{(i)},y_G^{(i)};\theta_B\right)-y_B^{(i)}\right|$$

$$L_{RGB}(\theta_R,\theta_G,\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|x_{rgb}^{(i)}-y_{RGB}^{(i)}\right|$$

wherein L_R(·), L_G(·), L_B(·) and L_RGB(·) denote the loss function R, loss function G, loss function B and loss function RGB respectively; θ_R, θ_G and θ_B denote the parameters of the neural network R, the neural network G and the neural network B; M denotes the total number of image blocks; f_R(·) denotes the trained mapping from a mosaic image, an image containing only R-channel sampling pixels and an original G-channel image to an R-channel image; f_G(·) denotes the trained mapping from a mosaic image to a G-channel image; f_B(·) denotes the trained mapping from a mosaic image, an image containing only B-channel sampling pixels and an original G-channel image to a B-channel image; f_{R,G,B}(·) denotes the combined mapping of f_R(·), f_G(·) and f_B(·); x_mos, x_mos_r and x_mos_b denote the mosaic image, the image containing only R-channel sampling pixels and the image containing only B-channel sampling pixels respectively; x_rgb denotes the RGB color image recovered by synthesizing the R-channel image recovered by the neural network R, the G-channel image recovered by the neural network G and the B-channel image recovered by the neural network B; and y_R, y_G, y_B and y_RGB denote the original R-channel image, original G-channel image, original B-channel image and original RGB color image respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010447460.4A CN111696036B (en) | 2020-05-25 | 2020-05-25 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010447460.4A CN111696036B (en) | 2020-05-25 | 2020-05-25 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111696036A CN111696036A (en) | 2020-09-22 |
CN111696036B true CN111696036B (en) | 2023-03-28 |
Family
ID=72478175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010447460.4A Active CN111696036B (en) | 2020-05-25 | 2020-05-25 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111696036B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112488956A (en) * | 2020-12-14 | 2021-03-12 | 南京信息工程大学 | Method for image restoration based on WGAN network |
CN113076804B (en) * | 2021-03-09 | 2022-06-17 | 武汉理工大学 | Target detection method, device and system based on YOLOv4 improved algorithm |
CN112926692B (en) * | 2021-04-09 | 2023-05-09 | 四川翼飞视科技有限公司 | Target detection device, method and storage medium based on non-uniform mixed convolution |
CN113850269B (en) * | 2021-12-01 | 2022-03-15 | 西南石油大学 | Denoising method based on multi-branch selective kernel nested connection residual error network |
CN114240776B (en) * | 2021-12-12 | 2024-03-12 | 西北工业大学 | Demosaicing and compression fusion framework for MSFA hyperspectral image |
CN114612299B (en) * | 2022-02-17 | 2024-06-04 | 北京理工大学 | Space self-adaptive real image demosaicing method and system |
CN116128735B (en) * | 2023-04-17 | 2023-06-20 | 中国工程物理研究院电子工程研究所 | Multispectral image demosaicing structure and method based on densely connected residual error network |
CN116503671B (en) * | 2023-06-25 | 2023-08-29 | 电子科技大学 | Image classification method based on residual network compression of effective rank tensor approximation |
CN117939309A (en) * | 2024-03-25 | 2024-04-26 | 荣耀终端有限公司 | Image demosaicing method, electronic device and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578392A (en) * | 2017-09-25 | 2018-01-12 | 华北电力大学 | A kind of convolutional neural networks demosaicing algorithms based on remaining interpolation |
CN108492265A (en) * | 2018-03-16 | 2018-09-04 | 西安电子科技大学 | CFA image demosaicing based on GAN combines denoising method |
CN108765295A (en) * | 2018-06-12 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus and storage medium |
US10235601B1 (en) * | 2017-09-07 | 2019-03-19 | 7D Labs, Inc. | Method for image analysis |
CN109886875A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | Image super-resolution rebuilding method and device, storage medium |
CN110009590A (en) * | 2019-04-12 | 2019-07-12 | 北京理工大学 | A kind of high-quality colour image demosaicing methods based on convolutional neural networks |
CN110120019A (en) * | 2019-04-26 | 2019-08-13 | 电子科技大学 | A kind of residual error neural network and image deblocking effect method based on feature enhancing |
WO2019222951A1 (en) * | 2018-05-24 | 2019-11-28 | Nokia Technologies Oy | Method and apparatus for computer vision |
CN110706181A (en) * | 2019-10-09 | 2020-01-17 | 中国科学技术大学 | Image denoising method and system based on multi-scale expansion convolution residual error network |
CN111047515A (en) * | 2019-12-29 | 2020-04-21 | 兰州理工大学 | Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism |
- 2020-05-25 CN CN202010447460.4A patent/CN111696036B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10235601B1 (en) * | 2017-09-07 | 2019-03-19 | 7D Labs, Inc. | Method for image analysis |
CN107578392A (en) * | 2017-09-25 | 2018-01-12 | 华北电力大学 | A kind of convolutional neural networks demosaicing algorithms based on remaining interpolation |
CN108492265A (en) * | 2018-03-16 | 2018-09-04 | 西安电子科技大学 | CFA image demosaicing based on GAN combines denoising method |
WO2019222951A1 (en) * | 2018-05-24 | 2019-11-28 | Nokia Technologies Oy | Method and apparatus for computer vision |
CN108765295A (en) * | 2018-06-12 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus and storage medium |
CN109886875A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | Image super-resolution rebuilding method and device, storage medium |
CN110009590A (en) * | 2019-04-12 | 2019-07-12 | 北京理工大学 | A kind of high-quality colour image demosaicing methods based on convolutional neural networks |
CN110120019A (en) * | 2019-04-26 | 2019-08-13 | 电子科技大学 | A kind of residual error neural network and image deblocking effect method based on feature enhancing |
CN110706181A (en) * | 2019-10-09 | 2020-01-17 | 中国科学技术大学 | Image denoising method and system based on multi-scale expansion convolution residual error network |
CN111047515A (en) * | 2019-12-29 | 2020-04-21 | 兰州理工大学 | Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism |
Non-Patent Citations (3)
Title |
---|
Color Image Compression with Transform Domain Down-Sampling and Deep Convolutional Reconstruction;Yan Wang等;《2019 IEEE Visual Communications and Image Processing(VCIP)》;20200123;第3章第B节,图2 * |
COLOR IMAGE DEMOSAICKING USING A 3-STAGE CONVOLUTIONAL NEURAL NETWORK STRUCTURE;Kai Cui等;《2018 25th IEEE International Conference on Image Processing (ICIP)》;20180906;第2-3章,图1-3 * |
Also Published As
Publication number | Publication date |
---|---|
CN111696036A (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111696036B (en) | Residual error neural network based on cavity convolution and two-stage image demosaicing method | |
CN102254301B (en) | Demosaicing method for CFA (color filter array) images based on edge-direction interpolation | |
CN110120019B (en) | Residual error neural network based on feature enhancement and image deblocking method | |
US8717460B2 (en) | Methods and systems for automatic white balance | |
CN112804561A (en) | Video frame insertion method and device, computer equipment and storage medium | |
CN101287130A (en) | Apparatus and method for generating wide colour gamut signal in image capturing device | |
CN107169946B (en) | Image fusion method based on nonnegative sparse matrix and hypersphere color transformation | |
CN111383200A (en) | CFA image demosaicing method based on generative antagonistic neural network | |
CN112215767B (en) | Anti-blocking effect image video enhancement method | |
CN106447632A (en) | RAW image denoising method based on sparse representation | |
CN111583129A (en) | Screen shot image moire removing method based on convolutional neural network AMNet | |
CN106709874B (en) | Compressed low-resolution face image restoration method based on face structure correlation | |
Guo et al. | Joint demosaicking and denoising benefits from a two-stage training strategy | |
CN112019704B (en) | Video denoising method based on prior information and convolutional neural network | |
CN111654705A (en) | Mosaic image compression method based on novel color space conversion | |
CN111222515A (en) | Image translation method based on context-aware attention | |
CN110728643A (en) | Low-illumination band noise image optimization method based on convolutional neural network | |
CN115841523A (en) | Double-branch HDR video reconstruction algorithm based on Raw domain | |
CN111160257B (en) | Monocular face in-vivo detection method stable to illumination transformation | |
CN113538505A (en) | Motion estimation system and method of single picture based on deep learning | |
CN111681176A (en) | Self-adaptive convolution residual error correction single image rain removal method | |
CN102034225A (en) | Edge mode-based image color component interpolating method | |
CN110992266A (en) | Demosaicing method and demosaicing system based on multi-dimensional non-local statistical eigen | |
CN114240776B (en) | Demosaicing and compression fusion framework for MSFA hyperspectral image | |
Liu et al. | A Low-light Image Enhancement Method with Histogram Equalization Prior |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |