CN111696036B - Residual error neural network based on cavity convolution and two-stage image demosaicing method - Google Patents
- Publication number
- CN111696036B (application CN202010447460A)
- Authority
- CN
- China
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4015—Image demosaicing, e.g. colour filter arrays [CFA] or Bayer patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
Abstract
The invention belongs to the field of digital image processing and particularly relates to a residual neural network based on cavity (i.e., dilated) convolution and a two-stage image demosaicing method. Built on a residual neural network, the invention introduces a shallow feature extraction unit, local residual units and a deep feature extraction unit; the three basic units interact to greatly enhance the learning and modeling capability of the target neural network, so that an accurate mapping from a mosaic image to an RGB color image can be established for the image demosaicing problem, and a mosaic image in the Bayer CFA pattern can finally be processed through the established mapping to obtain an RGB color image. A two-stage image demosaicing model is also introduced, which makes full use of prior information, improves the modeling capability of the network and optimizes the learning space. The method can markedly improve the peak signal-to-noise ratio of the image and greatly improve the efficiency, quality and robustness of image demosaicing, which is of far-reaching significance in the field of image processing.
Description
Technical Field
The invention belongs to the field of digital image processing, and particularly relates to a residual error neural network based on cavity convolution and a two-stage image demosaicing method.
Background
Single-sensor color imaging based on a CFA is widely used in the digital camera industry. In a single-sensor camera with a CFA, each pixel records only one of the three color values (R, G, or B); when recovering an RGB color image, the two missing values at each pixel must be estimated. This process, commonly referred to as image demosaicing, plays a crucial role in obtaining high-quality RGB color images. The most popular and widely used CFA is the Bayer CFA, in which green pixels are sampled on a quincunx grid and red and blue pixels are sampled on rectangular grids, so that the number of green samples is twice the number of red or blue samples.
Image demosaicing methods based on the Bayer CFA have been widely studied; most of them first interpolate the G-channel values, while the R-channel and B-channel images are usually first converted into color ratios or color differences and then interpolated in the transform domain. Another popular approach works in the frequency domain: the mosaic image is first transformed into the frequency domain, and the luminance and chrominance components are then separated by frequency filtering. Image demosaicing methods based on compressed sensing theory have also been proposed.
In recent years, deep learning has excelled at problems such as image classification, object detection and natural language processing; among the different types of neural networks, convolutional neural networks (CNNs) are the most extensively studied. A CNN can automatically extract effective representations from the target image, i.e., it can model the data directly from raw pixels with little preprocessing. The literature "Color Image Demosaicking via Deep Residual Learning" adopts a CNN to realize image demosaicing and obtains good performance gains over traditional methods, but due to the introduction of batch normalization (BN), the network depth achievable at the same computational cost is reduced and the receptive field of the network shrinks. "Color Image Demosaicking Using a 3-Stage Convolutional Neural Network Structure" constructs a three-stage image demosaicing CNN that fully demonstrates the excellent modeling and learning capability of neural networks, but using the restored G-channel image as prior information to guide the restoration of the R-channel and B-channel images is suboptimal.
Disclosure of Invention
Aiming at the above technical problems, the invention provides a residual neural network based on cavity convolution and a two-stage image demosaicing method. The relation between the receptive field and the computational cost of the network is fully considered: residual blocks are constructed with cavity convolution, which enlarges the receptive field of the network without increasing the computational cost. In the network training stage, the original G-channel image is used as prior information to guide the restoration of the R-channel and B-channel images, improving the modeling capability of the network and optimizing the learning space. The residual neural network based on cavity convolution and the two-stage image demosaicing method can markedly improve the peak signal-to-noise ratio (PSNR) of the image and have the advantages of a good demosaicing effect, high speed and strong robustness.
In order to achieve the purpose, the invention adopts the following technical scheme:
a residual error neural network based on cavity convolution and a two-stage image demosaicing method comprise the following steps:
step 1: building a residual error neural network model based on cavity convolution;
and 2, step: converting the RGB color image into a mosaic image through Bayer CFA, carrying out data preprocessing to form a training set, and setting parameters of a training target neural network;
and step 3: according to the residual error neural network model based on the cavity convolution, training a corresponding neural network model by taking a minimized loss function as a target in two stages;
and 4, step 4: and (3) according to the target neural network model obtained by training, performing the same data preprocessing as the step (2) on the mosaic image of the Bayer CFA mode to be processed, inputting the mosaic image into the target neural network model, and outputting the RGB color image without mosaic.
Further, in step 1, the residual neural network model based on cavity convolution includes a neural network G, a neural network R and a neural network B, all of which adopt the residual neural network based on cavity convolution. The neural network G takes the mosaic image as input and restores the G-channel image; the neural network R takes the mosaic image, the G-channel image restored by the neural network G and the image containing only R-channel sample pixels as input and restores the R-channel image; the neural network B takes the mosaic image, the restored G-channel image and the image containing only B-channel sample pixels as input and restores the B-channel image; the restored R-, G- and B-channel images are synthesized into the restored RGB color image.
Still further, in step 1, the residual neural network based on cavity convolution includes 1 shallow feature extraction unit, N local residual units and 1 deep feature extraction unit, where N ≥ 1. The input image is converted into shallow features by the shallow feature extraction unit, the shallow features pass through the N local residual units in sequence to form main features, and the main features pass through the deep feature extraction unit to output the residual image.
Still further, the shallow feature extraction unit includes A 3×3 convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function; the input image passes through the A 3×3 convolutional layers with ReLU and then the 1 3×3 convolutional layer without activation to produce the shallow features.
The local residual unit includes B residual blocks and 1 dimension-reduction block. Each residual block contains 1 3×3 convolutional layer with the ReLU activation function, C 3×3 cavity convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function, connected end to end in sequence; the residual block forms a local residual using a skip connection. The dimension-reduction block comprises 1 concatenation layer and 1 1×1 convolutional layer without an activation function, connected end to end; the input shallow features pass through the B residual blocks in sequence, and the outputs of all residual blocks are concatenated and reduced in dimension by the dimension-reduction block.
The deep feature extraction unit includes D 3×3 convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function, connected end to end in sequence; the input main features pass through them in that order to output the residual image.
In all local residual units, the local residual in each residual block is scaled by a feature-scaling coefficient before the input and output are connected by an identity (skip) addition.
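The identity-plus-scaled-residual connection can be sketched as (the convolution body is a hypothetical stand-in for the block's 3×3 and cavity convolution stack; α = 0.1 follows the value given in the embodiment):

```python
import numpy as np

ALPHA = 0.1  # feature-scaling coefficient, alpha < 1

def conv_stub(x):
    """Hypothetical stand-in for the residual block's convolution stack
    (1 plain 3x3 conv, C dilated 3x3 convs, 1 final 3x3 conv)."""
    return x * 2.0

def residual_block(x, body=conv_stub, alpha=ALPHA):
    # Scale the local residual by alpha before the identity addition;
    # the text states this stabilises training of deep networks.
    return x + alpha * body(x)

y = residual_block(np.ones(3))
```

With α < 1 the block starts out close to the identity mapping, which is the stabilising property the feature-scaling technique relies on.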
Further, in step 2, the data preprocessing process includes:
firstly, the G-channel and B-channel sample pixels in the mosaic image are set to 0 to obtain an image containing only R-channel sample pixels, and the G-channel and R-channel sample pixels in the mosaic image are set to 0 to obtain an image containing only B-channel sample pixels;
then, the mosaic image, the image containing only R-channel sample pixels, the image containing only B-channel sample pixels, the original R-channel image, the original G-channel image and the original B-channel image are each divided into a number of image blocks, where the number and size of the corresponding blocks of all six images are the same.
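The zeroing step above can be sketched as follows (an illustrative sketch assuming an RGGB phase with R samples at even/even and B samples at odd/odd positions; the patent does not fix the phase):

```python
import numpy as np

def split_rb_planes(mosaic):
    """From a Bayer (RGGB) mosaic, build the image containing only
    R-channel sample pixels (G and B positions zeroed) and the image
    containing only B-channel sample pixels (G and R positions zeroed)."""
    r_only = np.zeros_like(mosaic)
    b_only = np.zeros_like(mosaic)
    r_only[0::2, 0::2] = mosaic[0::2, 0::2]  # keep R samples only
    b_only[1::2, 1::2] = mosaic[1::2, 1::2]  # keep B samples only
    return r_only, b_only

mos = np.arange(1, 17, dtype=float).reshape(4, 4)
r_only, b_only = split_rb_planes(mos)
```

In a 4×4 mosaic this leaves 4 nonzero R samples and 4 nonzero B samples, matching the 2:1 green-to-red/blue sampling ratio of the Bayer pattern.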
Further, in step 3, the first stage of the two-stage image demosaicing method is: training the neural network R with the loss function R, where the neural network R constructs a mapping from the mosaic image, the image containing only R-channel sample pixels and the original G-channel image to the R-channel image; training the neural network G with the loss function G, where the neural network G constructs a mapping from the mosaic image to the G-channel image; and training the neural network B with the loss function B, where the neural network B constructs a mapping from the mosaic image, the image containing only B-channel sample pixels and the original G-channel image to the B-channel image. The second stage of the two-stage image demosaicing method is: jointly training the neural networks R, G and B with the loss function RGB, where the three networks together construct the mapping from the mosaic image to the RGB color image.
Further, in step 3, the loss functions of the first stage include the loss function R, the loss function G and the loss function B; the loss function of the second stage is the loss function RGB. All loss functions take the form of the mean absolute error (MAE):

$$L_R(\theta_R) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_R\big(x_{mos}^{(i)}, x_{mos\_r}^{(i)}, y_G^{(i)}; \theta_R\big) - y_R^{(i)} \right\|_1$$

$$L_G(\theta_G) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_G\big(x_{mos}^{(i)}; \theta_G\big) - y_G^{(i)} \right\|_1$$

$$L_B(\theta_B) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_B\big(x_{mos}^{(i)}, x_{mos\_b}^{(i)}, y_G^{(i)}; \theta_B\big) - y_B^{(i)} \right\|_1$$

$$L_{RGB}(\theta_R,\theta_G,\theta_B) = \frac{1}{M}\sum_{i=1}^{M}\left\| f_{R,G,B}\big(x_{mos}^{(i)}, x_{mos\_r}^{(i)}, x_{mos\_b}^{(i)}; \theta_R,\theta_G,\theta_B\big) - y_{RGB}^{(i)} \right\|_1$$

where $L_R(\cdot)$, $L_G(\cdot)$, $L_B(\cdot)$ and $L_{RGB}(\cdot)$ denote the loss function R, loss function G, loss function B and loss function RGB respectively; $\theta_R$, $\theta_G$ and $\theta_B$ denote the parameters of the neural networks R, G and B; $M$ denotes the total number of image blocks; $f_R(\cdot)$ denotes the trained mapping from the mosaic image, the image containing only R-channel sample pixels and the original G-channel image to the R-channel image; $f_G(\cdot)$ denotes the trained mapping from the mosaic image to the G-channel image; $f_B(\cdot)$ denotes the trained mapping from the mosaic image, the image containing only B-channel sample pixels and the original G-channel image to the B-channel image; $f_{R,G,B}(\cdot)$ denotes the combined mapping of $f_R(\cdot)$, $f_G(\cdot)$ and $f_B(\cdot)$; $x_{mos}$, $x_{mos\_r}$ and $x_{mos\_b}$ denote the mosaic image, the image containing only R-channel sample pixels and the image containing only B-channel sample pixels; $x_{rgb}$ denotes the RGB color image recovered by synthesizing the R-channel image restored by the neural network R, the G-channel image restored by the neural network G and the B-channel image restored by the neural network B; and $y_R$, $y_G$, $y_B$ and $y_{RGB}$ denote the original R-channel image, original G-channel image, original B-channel image and original RGB color image respectively.
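As a concrete instance, the MAE form shared by all four losses reduces to a one-line computation (a sketch over a single image block rather than the M-block batch of the text):

```python
import numpy as np

def mae_loss(pred, target):
    """Mean absolute error between a restored image block and the
    original block, the form the text gives for all four losses."""
    return np.mean(np.abs(pred - target))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 1.0], [1.0, 1.0]])
loss = mae_loss(pred, target)
```

MAE (the L1 loss) is a common choice for image restoration because, unlike mean squared error, it does not over-penalize outliers and tends to produce less blurry restorations.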
Still further, in step 3, during the training of the residual neural network model based on cavity convolution, the parameters $\theta_R$, $\theta_G$ and $\theta_B$ are initialized with the Xavier method, i.e., drawn from distributions with mean 0 and variances $Var(\theta_R)$, $Var(\theta_G)$ and $Var(\theta_B)$:

$$Var(\theta_R) = \frac{2}{n_{in}^{R} + n_{out}^{R}},\qquad Var(\theta_G) = \frac{2}{n_{in}^{G} + n_{out}^{G}},\qquad Var(\theta_B) = \frac{2}{n_{in}^{B} + n_{out}^{B}}$$

where $n_{in}^{R}$, $n_{in}^{G}$ and $n_{in}^{B}$ denote the number of input neurons of the current layer in the neural networks R, G and B respectively, and $n_{out}^{R}$, $n_{out}^{G}$ and $n_{out}^{B}$ denote the corresponding numbers of output neurons; the loss functions are minimized with the Adam optimization method.
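A minimal sketch of the Xavier initialization for one layer, assuming the standard Glorot variance 2/(n_in + n_out) for a normal distribution with mean 0:

```python
import numpy as np

def xavier_init(n_in, n_out, seed=0):
    """Draw an (n_in x n_out) weight matrix with mean 0 and
    variance 2 / (n_in + n_out), per the Xavier (Glorot) scheme."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

w = xavier_init(64, 64)  # e.g. a 64-to-64 feature layer
```

Matching the variance to the layer's fan-in and fan-out keeps activation and gradient magnitudes roughly constant across layers, which is what makes this initialization suitable for the deep stacked-residual structure described above.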
Compared with the prior art, the invention has the beneficial effects that:
according to the residual error neural network based on cavity convolution and the two-stage image demosaicing method, the shallow feature extraction unit, the local residual error unit and the deep feature extraction unit are introduced, three basic units interact with each other to greatly enhance the learning capability and the modeling capability of a target neural network, accurate mapping from a mosaic image to an RGB color image can be established aiming at the image demosaicing problem, and finally the mosaic image of a Bayer CFA mode can be processed through the established effective mapping to obtain the RGB color image; meanwhile, a two-stage image demosaicing model is introduced, prior information is fully utilized, the modeling capability of the network is improved, and the knowledge space is optimized; the image demosaicing method can obviously improve the peak signal-to-noise ratio (PSNR) of the image, greatly improve the efficiency, quality and robustness of image demosaicing, and has profound significance in the field of image processing.
Further, the invention also has the following beneficial effects:
the built residual error neural network model based on the cavity convolution has the modularized property, different units have different functions, a shallow feature extraction unit aims at converting an image into a shallow feature, a local residual error unit is a network main body and aims at converting the shallow feature into a main feature, a deep feature extraction unit aims at converting the main feature into a residual error image, and the shallow feature extraction unit, a plurality of local residual error units and the deep feature extraction unit are sequentially connected end to end; according to the connection mode, the network can stack a plurality of local residual error units, and the network depth is increased on the premise of ensuring efficient feature extraction, so that the nonlinear modeling capability and learning capability of the network are improved.
Each local residual unit in the residual neural network model based on cavity convolution contains several residual blocks; each residual block has a skip connection, and each skip connection performs one residual addition. Local residual learning within the local residual units interacts with the global residual learning of the whole network, further improving the performance and convergence speed of the network. Meanwhile, the residual blocks introduce cavity convolution, enlarging the receptive field of the network at the same computational cost; the structural similarity between cavity convolution and the Bayer CFA further promotes feature extraction by the residual blocks and thus improves network performance.
Drawings
Fig. 1 is a schematic diagram of the internal structure of the residual neural network G based on cavity convolution according to an embodiment of the invention.
Fig. 2 is a schematic diagram of the internal structure of the residual neural networks R and B based on cavity convolution according to an embodiment of the invention.
Fig. 3 is a schematic structural diagram of the shallow feature extraction unit in fig. 1.
Fig. 4 is a schematic structural diagram of the local residual unit in fig. 1.
Fig. 5 is a schematic structural diagram of the deep layer feature extraction unit in fig. 1.
Fig. 6 is a flow chart of the residual neural network based on cavity convolution and the two-stage image demosaicing method according to a preferred embodiment of the invention.
Fig. 7 is a schematic diagram of the ReLU function in fig. 3.
Detailed Description
The invention will be further described with reference to the drawings and preferred embodiments.
The present embodiment provides a residual neural network based on cavity convolution and a two-stage image demosaicing method, the flow of which is shown in fig. 6. The method comprises the following steps:
step 1: building a residual error neural network model based on cavity convolution;
the residual error neural network G based on the hole convolution is shown in fig. 1, and includes: 1 shallow layer feature extraction unit, 3 local residual error units and 1 deep layer feature extraction unit; the 3 local residual error units are sequentially connected end to end; the input image is converted into shallow layer characteristics through a shallow layer characteristic extraction unit, the shallow layer characteristics sequentially pass through 3 local residual error units to form main characteristics, and the main characteristics pass through a deep layer characteristic extraction unit to output residual error images;
the residual error neural network R/B based on the hole convolution is shown in FIG. 2 and comprises: 1 shallow layer feature extraction unit, 1 local residual error unit and 1 deep layer feature extraction unit; the input image is converted into a shallow feature through a shallow feature extraction unit, the shallow feature forms a main feature through a local residual error unit, and the main feature outputs a residual error image through a deep feature extraction unit;
in this embodiment, the reason that the neural network R and the neural network B only use 1 local residual unit is that, as the network depth increases, the restored R channel image and the restored B channel image generate a frame effect, and 1 local residual block can avoid the frame effect and obtain good subjective/objective performance;
as shown in fig. 3, the shallow feature extraction unit includes: 1 3 × 3 convolutional layer with ReLU activation function and 1 3 × 3 convolutional layer without activation function; the input image is converted into shallow layer characteristics through 1 3 x 3 convolutional layer with a ReLU activation function and 1 3 x 3 convolutional layer without the activation function in sequence; in this embodiment, the number of convolution kernels of the 3 × 3 convolution layers with the ReLU activation function (32 in this embodiment) is half of the number of convolution kernels of the 3 × 3 convolution layers without the activation function (64 in this embodiment), so as to achieve a gradual structure characteristic, which is beneficial for the transformation from the image domain to the feature domain.
As shown in fig. 4, the local residual unit includes a first residual block, a second residual block, a third residual block and a dimension-reduction block. Each residual block contains 1 3×3 convolutional layer with the ReLU activation function, 2 3×3 cavity convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function, connected end to end in sequence. The dimension-reduction block comprises 1 concatenation layer and 1 1×1 convolutional layer without an activation function, connected end to end; the output features of each residual block are fed into the concatenation layer, which is followed by the 1×1 convolutional layer for dimension reduction. In this embodiment the local residual unit has 3 skip connections; the skip connection in each residual block does not connect input and output with a plain identity mapping but introduces a feature-scaling technique: before the identity addition, the features are scaled by a coefficient α smaller than 1 (α = 0.1 in this embodiment), which makes training more stable for deep networks. In addition, the 3×3 cavity convolutional layers (dilation factor 2) are chosen for their small parameter count and low computational cost, and because their sampling pattern resembles the Bayer CFA to some extent, which benefits feature extraction.
As shown in fig. 5, the deep feature extraction unit includes 3×3 convolutional layers with the ReLU activation function and 1 3×3 convolutional layer without an activation function; the input main features pass through them in sequence to output the residual image.
Generally, the receptive field bears directly on network performance, and it is usually enlarged with larger convolution kernels or deeper network structures; but an overly large kernel size or an overly deep structure increases the computational cost and can even cause vanishing or exploding gradients, degrading performance. The local residual unit introduced by the invention uses cavity convolution, enlarging the receptive field of the network without increasing the computational cost. In addition, cavity convolution with dilation factor 2 shares certain structural properties with the Bayer CFA, so more effective features can be extracted from the mosaic image and the performance of the network is improved.
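The trade-off described here can be made concrete with a small receptive-field calculation (a sketch; the layer layout follows the residual block of this embodiment, and the standard formula that each layer adds dilation × (kernel − 1) pixels is assumed):

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions; each layer is a
    (kernel_size, dilation) pair.  A dilated layer contributes
    dilation * (kernel - 1) extra pixels while its parameter count
    and compute stay the same as a plain layer."""
    rf = 1
    for k, d in layers:
        rf += d * (k - 1)
    return rf

# Residual-block layout of the embodiment: plain 3x3, two dilated
# 3x3 layers (dilation 2), final plain 3x3.
rf_dilated = receptive_field([(3, 1), (3, 2), (3, 2), (3, 1)])
rf_plain = receptive_field([(3, 1)] * 4)
```

With dilation 2 on the two middle layers the block sees a 13-pixel span instead of 9, at identical cost, which is the advantage the text claims for cavity convolution.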
Step 2: converting the RGB color image into a mosaic image through Bayer CFA to form a training set, and setting parameters of a training target neural network;
the training set selects 800 images in DIV2K as a training set, RGB color images are converted into mosaic images through Bayer CFA, G channel sampling pixels and B channel sampling pixels in the mosaic images are set to be 0, images only containing R channel sampling pixels are obtained, and G channel sampling pixels and R channel sampling pixels in the mosaic images are set to be 0, and images only containing B channel sampling pixels are obtained; then setting training parameters of a residual error neural network model based on the cavity convolution, wherein the training parameters comprise the number of image blocks input into the model training each time, the sizes of input image blocks and output image blocks, learning rate and the like; respectively dividing a mosaic image, an image only containing R-channel sampling pixels, an image only containing B-channel sampling pixels, an original R-channel image, an original G-channel image and an original B-channel image in a training set into image blocks with the same resolution, wherein the number and the size of the mosaic image block, the image block only containing R-channel sampling pixels, the image block only containing B-channel sampling pixels, the original R-channel image block, the original G-channel image block and the original B-channel image block are the same; zero-padding operations are performed for each convolution (i.e., the image size is not reduced according to the size of the convolution kernel, i.e., the input and output sizes are consistent).
In this embodiment, the mosaic image, the image containing only R-channel sampling pixels, the image containing only B-channel sampling pixels, the original R-channel image, the original G-channel image and the original B-channel image in the training set are each divided into 64 × 64 image blocks, so that structural and detail information of the images can be captured better during training. The number of image blocks per training step is 32 (in other embodiments, any of 16, 32 or 128 can be used); the learning rate is set to 0.0001 (in other embodiments, any value from 0.01 to 0.00001 can be used), and the decay rate per training round is set to 0.9 (in other embodiments, any value from 0.1 to 0.9 can be used). A test is performed every 1000 training steps (in other embodiments, any value from 500 to 5000 can be used), and the relevant parameters of the model are adjusted according to its effect on the verification set. A test set can be chosen together with the training set; the Kodak or McMaster image set can be used as the test set.
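The channel-masking preprocessing of step 2 can be sketched in NumPy as follows. The RGGB sample layout is an assumption for illustration (the patent does not fix which Bayer variant is used), and the function name is ours:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an H x W x 3 RGB image with an (assumed) RGGB Bayer CFA.

    Returns the single-channel mosaic plus the images that keep only the
    R / only the B samples, with all other positions set to 0."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R samples
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G samples (even rows)
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G samples (odd rows)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B samples
    mos_r = np.zeros_like(mosaic)
    mos_r[0::2, 0::2] = mosaic[0::2, 0::2]   # keep only R samples
    mos_b = np.zeros_like(mosaic)
    mos_b[1::2, 1::2] = mosaic[1::2, 1::2]   # keep only B samples
    return mosaic, mos_r, mos_b

rgb = np.arange(1.0, 49.0).reshape(4, 4, 3)
mosaic, mos_r, mos_b = bayer_mosaic(rgb)
```

Dividing each of the resulting images into 64 × 64 blocks is then a matter of slicing the same coordinates from all six images.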
And step 3: according to the residual error neural network model based on the cavity convolution, training a corresponding neural network model by taking minimized respective loss functions as targets in two stages;
Specifically, in the first stage, minimizing the loss function R yields the weights and biases of the neural network R, minimizing the loss function G yields the weights and biases of the neural network G, and minimizing the loss function B yields the weights and biases of the neural network B. In the second stage, minimizing the loss function RGB jointly refines the weights and biases of the neural networks R, G and B, thereby establishing the target neural network model for the image demosaicing problem.
The loss function R, the loss function G, the loss function B and the loss function RGB all take the form of mean absolute error (MAE) functions:

$$L_R(\theta_R)=\frac{1}{M}\sum_{i=1}^{M}\left|f_R\!\left(x_{mos}^{(i)},x_{mos\_r}^{(i)},y_G^{(i)};\theta_R\right)-y_R^{(i)}\right|$$

$$L_G(\theta_G)=\frac{1}{M}\sum_{i=1}^{M}\left|f_G\!\left(x_{mos}^{(i)};\theta_G\right)-y_G^{(i)}\right|$$

$$L_B(\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|f_B\!\left(x_{mos}^{(i)},x_{mos\_b}^{(i)},y_G^{(i)};\theta_B\right)-y_B^{(i)}\right|$$

$$L_{RGB}(\theta_R,\theta_G,\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|x_{rgb}^{(i)}-y_{RGB}^{(i)}\right|$$

wherein L_R(·), L_G(·), L_B(·) and L_RGB(·) denote the loss function R, loss function G, loss function B and loss function RGB respectively; θ_R, θ_G and θ_B denote the parameters of the neural networks R, G and B; M denotes the total number of image blocks; f_R(·) denotes the trained mapping from a mosaic image, an image containing only R-channel sampling pixels and an original G-channel image to an R-channel image; f_G(·) denotes the trained mapping from a mosaic image to a G-channel image; f_B(·) denotes the trained mapping from a mosaic image, an image containing only B-channel sampling pixels and an original G-channel image to a B-channel image; f_{R,G,B}(·) denotes the combined mapping of f_R(·), f_G(·) and f_B(·); x_mos, x_mos_r and x_mos_b denote the mosaic image, the image containing only R-channel sampling pixels and the image containing only B-channel sampling pixels respectively; x_rgb denotes the recovered RGB color image synthesized from the R-channel image recovered by the neural network R, the G-channel image recovered by the neural network G and the B-channel image recovered by the neural network B; and y_R, y_G, y_B and y_RGB denote the original R-channel image, original G-channel image, original B-channel image and original RGB color image respectively.
Since the peak signal-to-noise ratio (PSNR) is formulated as

$$PSNR(\theta)=10\cdot\log_{10}\frac{255^{2}}{\mathrm{MSE}\!\left(x_i(\theta),\,y_i\right)}$$

wherein x_i denotes the recovered image, y_i the corresponding original image, MSE(·,·) the mean squared error between them and θ the parameters of the neural network, it can be seen from the above equation that continuously minimizing the loss function raises the PSNR, i.e. improves the objective quality of the image.
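The relation between minimizing the reconstruction error and raising the PSNR can be checked numerically. A minimal NumPy sketch (function names are ours; a peak value of 255 is assumed):

```python
import numpy as np

def mae(x, y):
    """Mean absolute error between a recovered image x and original y."""
    return float(np.mean(np.abs(x - y)))

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = float(np.mean((x - y) ** 2))
    return 10.0 * np.log10(peak ** 2 / mse)

y = np.array([[100.0, 120.0], [140.0, 160.0]])   # "original" image
x_bad = y + 8.0    # larger reconstruction error
x_good = y + 2.0   # smaller error: lower MAE, higher PSNR
assert mae(x_good, y) < mae(x_bad, y)
assert psnr(x_good, y) > psnr(x_bad, y)
```

The two assertions illustrate the monotone relation the text relies on: driving the loss down drives the objective quality (PSNR) up.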
In this embodiment, the loss functions are minimized with the Adam optimization method, whose update at step t is computed as:

$$s\leftarrow\rho_1 s+(1-\rho_1)\,g$$

$$\gamma\leftarrow\rho_2\gamma+(1-\rho_2)\,g\odot g$$

$$\hat{s}\leftarrow\frac{s}{1-\rho_1^{\,t}},\qquad \hat{\gamma}\leftarrow\frac{\gamma}{1-\rho_2^{\,t}}$$

$$\Delta\theta=-\varepsilon\,\frac{\hat{s}}{\sqrt{\hat{\gamma}}+\delta},\qquad \theta\leftarrow\theta+\Delta\theta$$

wherein ρ1, ρ2, ε and δ are constants (with default values ρ1 = 0.9, ρ2 = 0.999, ε = 0.001 and δ = 10⁻⁸); g denotes the gradient of the loss function with respect to the parameter θ; s denotes the biased first-order moment estimate and γ the biased second-order moment estimate; ŝ and γ̂ denote the bias-corrected first-order and second-order moment estimates; and Δθ denotes the change of the parameter θ. The parameter updates of Adam are unaffected by rescaling of the gradient, and the learning rate is adapted automatically; in addition, the algorithm is simple to implement, computationally efficient, and has low memory requirements.
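The Adam update above can be written out as a single step function. The constants follow the defaults given in the text; the toy minimization target at the end is ours, added only to show the update converging:

```python
import numpy as np

def adam_step(theta, g, s, gamma, t,
              rho1=0.9, rho2=0.999, eps=0.001, delta=1e-8):
    """One Adam update (defaults from the text: rho1=0.9, rho2=0.999,
    eps=0.001 as learning rate, delta=1e-8). t is the 1-based step count.
    Returns the updated (theta, s, gamma)."""
    s = rho1 * s + (1 - rho1) * g              # biased first-moment estimate
    gamma = rho2 * gamma + (1 - rho2) * g * g  # biased second-moment estimate
    s_hat = s / (1 - rho1 ** t)                # bias-corrected first moment
    gamma_hat = gamma / (1 - rho2 ** t)        # bias-corrected second moment
    theta = theta - eps * s_hat / (np.sqrt(gamma_hat) + delta)
    return theta, s, gamma

# Toy example: minimize L(theta) = theta**2, whose gradient is 2*theta.
theta, s, gamma = 5.0, 0.0, 0.0
for t in range(1, 8001):
    theta, s, gamma = adam_step(theta, 2.0 * theta, s, gamma, t)
# theta has moved from 5.0 to near the minimizer 0
```

Note how the per-step displacement is roughly ε regardless of the gradient's scale, which is the rescaling-invariance property the text mentions.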
In this embodiment, the parameters θ_R, θ_G and θ_B of the residual neural network model based on cavity convolution are initialized with the Xavier method:

$$Var(\theta_R)=\frac{2}{n_{in}^{R}+n_{out}^{R}},\qquad Var(\theta_G)=\frac{2}{n_{in}^{G}+n_{out}^{G}},\qquad Var(\theta_B)=\frac{2}{n_{in}^{B}+n_{out}^{B}}$$

wherein n_in^R, n_in^G and n_in^B denote the number of input neurons of a layer in the neural network R, neural network G and neural network B respectively, and n_out^R, n_out^G and n_out^B denote the corresponding numbers of output neurons. The parameters θ_R, θ_G and θ_B are initialized from distributions with mean 0 and variances Var(θ_R), Var(θ_G) and Var(θ_B) respectively. The Xavier parameter initialization method improves both the training efficiency and the performance of the network.
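A sketch of the Xavier initialization described above. The text fixes only the mean and variance; the normal distribution family and the function name are assumptions of this sketch:

```python
import numpy as np

def xavier_init(n_in, n_out, seed=0):
    """Xavier initialization: zero mean, variance 2 / (n_in + n_out).

    A normal distribution is assumed here, since the text specifies only
    the mean and the variance of the initialization."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return np.random.default_rng(seed).normal(0.0, std, size=(n_in, n_out))

w = xavier_init(256, 256)
# empirical variance is close to the target 2 / (256 + 256) = 0.00390625
```

Keeping the variance tied to the fan-in and fan-out keeps activation magnitudes roughly constant across layers, which is what makes deep networks start training efficiently.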
And 4, step 4: according to a target neural network model obtained through training, inputting a mosaic image of a Bayer CFA mode into the target neural network model, and outputting an RGB color image subjected to mosaic removal.
In step 2, RGB color images are converted into mosaic images through the Bayer CFA to form the training set; in step 3, the neural network R, neural network G and neural network B are trained in two stages to obtain the image demosaicing target neural network model for the Bayer CFA mode; in step 4, a mosaic image in the Bayer CFA mode is input into the target neural network model to obtain the corresponding RGB color image.
In this embodiment, the RGB color images in the Kodak test set (24 RGB color images of size 768 × 512) are converted into mosaic images through the Bayer CFA; after mapping by the model, the recovered RGB color images reach an average PSNR of 42.94 dB. The RGB color images in the McMaster test set (18 RGB color images of size 500 × 500) are converted into mosaic images through the Bayer CFA; after mapping by the model, the recovered RGB color images reach an average PSNR of 39.76 dB. With the residual neural network based on cavity convolution and the two-stage image demosaicing method trained by the invention, the objective quality of the images is greatly improved and the visual effect is satisfactory; the results are shown in the following table:
according to the image demosaicing method, a target neural network model can be trained in advance, the target neural network model is end-to-end mapping from an input Bayer CFA mode mosaic image to an output RGB color image, demosaicing speed of the mosaic image through the target neural network model is extremely high, the practical value is very high, and the method can be applied to occasions needing real-time deblocking effect; besides the advantages of high speed, good mosaic removing effect and the like, the mosaic removing method has strong robustness, and objective gain and subjective gain of mosaic removing do not fluctuate greatly aiming at mosaic images of different types and different scenes. Therefore, the residual error neural network based on the cavity convolution and the two-stage image demosaicing method have the advantages of good demosaicing effect, high speed, strong robustness, strong practicability and real-time performance, wide market prospect and particularly high requirement on the real-time performance.
The residual neural network based on cavity convolution and the two-stage image demosaicing method of the invention can accurately learn the mapping from a mosaic image to an RGB color image; the cavity convolution enlarges the receptive field of the network without increasing the computational cost; the combination of local residuals and a global residual accelerates the convergence of the network; the modular network structure makes the neural network interpretable; and the two-stage training scheme makes full use of prior information and refines the solution space of the network model.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (5)
1. A residual neural network based on cavity convolution and two-stage image demosaicing method, comprising the following steps:
step 1: building a residual error neural network model based on cavity convolution;
the residual neural network model based on cavity convolution comprises: a neural network G, a neural network R and a neural network B, each of which adopts a residual neural network based on cavity convolution; wherein the mosaic image is input into the neural network G to recover the G-channel image; the mosaic image, the G-channel image recovered by the neural network G and the image containing only R-channel sampling pixels are input into the neural network R to recover the R-channel image; the mosaic image, the G-channel image recovered by the neural network G and the image containing only B-channel sampling pixels are input into the neural network B to recover the B-channel image; and the recovered R-channel image, G-channel image and B-channel image are synthesized into a recovered RGB color image;
the residual neural network based on cavity convolution comprises: 1 shallow feature extraction unit, N local residual units and 1 deep feature extraction unit, where N ≥ 1; characterized in that an input image is converted into shallow features by the shallow feature extraction unit, the shallow features pass sequentially through the N local residual units to form main features, and the main features pass through the deep feature extraction unit to output a residual image;
the shallow feature extraction unit comprises: A 3 × 3 convolutional layers with ReLU activation functions and 1 3 × 3 convolutional layer without an activation function; the input image is converted into shallow features by passing sequentially through the A 3 × 3 convolutional layers with ReLU activation functions and the 1 3 × 3 convolutional layer without an activation function;
the local residual unit comprises: B residual blocks and 1 dimension-reduction block; wherein each residual block contains 1 3 × 3 convolutional layer with a ReLU activation function, C 3 × 3 cavity convolutional layers with ReLU activation functions and 1 3 × 3 convolutional layer without an activation function, connected end to end in sequence; the residual block forms a local residual using a skip connection; the dimension-reduction block comprises 1 cascade layer and 1 1 × 1 convolutional layer without an activation function, connected end to end in sequence; the shallow features pass sequentially through the B residual blocks, and the outputs of the residual blocks are concatenated by the cascade layer and reduced in dimension by the 1 × 1 convolutional layer;
the deep feature extraction unit comprises: D 3 × 3 convolutional layers with ReLU activation functions and 1 3 × 3 convolutional layer without an activation function, connected end to end in sequence; the main features pass sequentially through the D 3 × 3 convolutional layers with ReLU activation functions and the 1 3 × 3 convolutional layer without an activation function to output the residual image;
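A single residual block of the kind described above can be sketched in NumPy. The channel count of 1, the number of dilated layers C = 1 and the feature-scaling factor 0.1 are assumptions of this sketch (the claims leave them as parameters); `conv2d` is a helper of ours implementing a "same"-padded, stride-1 convolution with optional dilation:

```python
import numpy as np

def conv2d(x, w, dilation=1):
    """'Same'-padded stride-1 2D convolution of a single-channel image x
    with a k x k kernel w, for the given dilation (zero padding, so the
    input and output sizes are consistent, as in step 2)."""
    k = w.shape[0]
    p = ((k - 1) * dilation) // 2          # half the effective kernel span
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * dilation:i * dilation + x.shape[0],
                                j * dilation:j * dilation + x.shape[1]]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w_in, w_mid, w_out, scale=0.1):
    """One local residual block: a 3x3 conv + ReLU, a dilated 3x3 conv +
    ReLU (dilation 2; C = 1 here), a 3x3 conv without activation, then a
    feature-scaled identity skip connection."""
    h = relu(conv2d(x, w_in))
    h = relu(conv2d(h, w_mid, dilation=2))
    h = conv2d(h, w_out)
    return x + scale * h                   # local residual (skip connection)
```

With identity kernels (center tap 1, others 0) and a non-negative input, the block returns `x + 0.1 * x`, which makes the scaled skip connection easy to verify by hand.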
step 2: converting the RGB color image into a mosaic image through Bayer CFA, carrying out data preprocessing to form a training set, and setting parameters of a training target neural network;
and step 3: according to the residual error neural network model based on the cavity convolution, training a corresponding neural network model by taking a minimized loss function as a target in two stages;
and 4, step 4: and (3) according to the target neural network model obtained by training, performing the same data preprocessing as the step (2) on the mosaic image of the Bayer CFA mode to be processed, inputting the mosaic image into the target neural network model, and outputting the RGB color image without mosaic.
2. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein, in all local residual units, the local residual of each residual block applies a feature scaling technique before the identity connection of input and output.
3. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein in step 2 the data preprocessing process is as follows:
firstly, setting a G channel sampling pixel and a B channel sampling pixel in a mosaic image to be 0 to obtain an image only containing the R channel sampling pixel, and setting the G channel sampling pixel and the R channel sampling pixel in the mosaic image to be 0 to obtain an image only containing the B channel sampling pixel;
then, the mosaic image, the image only containing R-channel sampling pixels, the image only containing B-channel sampling pixels, the original R-channel image, the original G-channel image and the original B-channel image are respectively divided into a plurality of mosaic image blocks, image blocks only containing R-channel sampling pixels, image blocks only containing B-channel sampling pixels, original R-channel image blocks, original G-channel image blocks and original B-channel image blocks, wherein the number and the size of the mosaic image blocks, the image blocks only containing R-channel sampling pixels, the image blocks only containing B-channel sampling pixels, the original R-channel image blocks, the original G-channel image blocks and the original B-channel image blocks are the same.
4. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein in step 3 the first stage of the two-stage image demosaicing method is: training the neural network R with the loss function R, the neural network R constructing a mapping from a mosaic image, an image containing only R-channel sampling pixels and an original G-channel image to an R-channel image; training the neural network G with the loss function G, the neural network G constructing a mapping from a mosaic image to a G-channel image; and training the neural network B with the loss function B, the neural network B constructing a mapping from a mosaic image, an image containing only B-channel sampling pixels and an original G-channel image to a B-channel image; and the second stage of the two-stage image demosaicing method is: jointly training the neural network R, the neural network G and the neural network B with the loss function RGB, the three networks together constructing a mapping from the mosaic image to the RGB color image.
5. The residual neural network based on cavity convolution and two-stage image demosaicing method of claim 1, wherein in step 3 the loss functions of the first stage comprise the loss function R, the loss function G and the loss function B; the loss function of the second stage comprises the loss function RGB; and all loss functions take the form of mean absolute error (MAE) functions:

$$L_R(\theta_R)=\frac{1}{M}\sum_{i=1}^{M}\left|f_R\!\left(x_{mos}^{(i)},x_{mos\_r}^{(i)},y_G^{(i)};\theta_R\right)-y_R^{(i)}\right|$$

$$L_G(\theta_G)=\frac{1}{M}\sum_{i=1}^{M}\left|f_G\!\left(x_{mos}^{(i)};\theta_G\right)-y_G^{(i)}\right|$$

$$L_B(\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|f_B\!\left(x_{mos}^{(i)},x_{mos\_b}^{(i)},y_G^{(i)};\theta_B\right)-y_B^{(i)}\right|$$

$$L_{RGB}(\theta_R,\theta_G,\theta_B)=\frac{1}{M}\sum_{i=1}^{M}\left|x_{rgb}^{(i)}-y_{RGB}^{(i)}\right|$$

wherein L_R(·), L_G(·), L_B(·) and L_RGB(·) denote the loss function R, loss function G, loss function B and loss function RGB respectively; θ_R, θ_G and θ_B denote the parameters of the neural network R, the neural network G and the neural network B; M denotes the total number of image blocks; f_R(·) denotes the trained mapping from a mosaic image, an image containing only R-channel sampling pixels and an original G-channel image to an R-channel image; f_G(·) denotes the trained mapping from a mosaic image to a G-channel image; f_B(·) denotes the trained mapping from a mosaic image, an image containing only B-channel sampling pixels and an original G-channel image to a B-channel image; f_{R,G,B}(·) denotes the combined mapping of f_R(·), f_G(·) and f_B(·); x_mos, x_mos_r and x_mos_b denote the mosaic image, the image containing only R-channel sampling pixels and the image containing only B-channel sampling pixels respectively; x_rgb denotes the RGB color image recovered by synthesizing the R-channel image recovered by the neural network R, the G-channel image recovered by the neural network G and the B-channel image recovered by the neural network B; and y_R, y_G, y_B and y_RGB denote the original R-channel image, original G-channel image, original B-channel image and original RGB color image respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010447460.4A CN111696036B (en) | 2020-05-25 | 2020-05-25 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010447460.4A CN111696036B (en) | 2020-05-25 | 2020-05-25 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111696036A CN111696036A (en) | 2020-09-22 |
CN111696036B true CN111696036B (en) | 2023-03-28 |
Family
ID=72478175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010447460.4A Active CN111696036B (en) | 2020-05-25 | 2020-05-25 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111696036B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112488956A (en) * | 2020-12-14 | 2021-03-12 | 南京信息工程大学 | Method for image restoration based on WGAN network |
CN113076804B (en) * | 2021-03-09 | 2022-06-17 | 武汉理工大学 | Target detection method, device and system based on YOLOv4 improved algorithm |
CN112926692B (en) * | 2021-04-09 | 2023-05-09 | 四川翼飞视科技有限公司 | Target detection device, method and storage medium based on non-uniform mixed convolution |
CN113850269B (en) * | 2021-12-01 | 2022-03-15 | 西南石油大学 | Denoising method based on multi-branch selective kernel nested connection residual error network |
CN114240776B (en) * | 2021-12-12 | 2024-03-12 | 西北工业大学 | Demosaicing and compression fusion framework for MSFA hyperspectral image |
CN114612299B (en) * | 2022-02-17 | 2024-06-04 | 北京理工大学 | Space self-adaptive real image demosaicing method and system |
CN116128735B (en) * | 2023-04-17 | 2023-06-20 | 中国工程物理研究院电子工程研究所 | Multispectral image demosaicing structure and method based on densely connected residual error network |
CN116503671B (en) * | 2023-06-25 | 2023-08-29 | 电子科技大学 | Image classification method based on residual network compression of effective rank tensor approximation |
CN117939309A (en) * | 2024-03-25 | 2024-04-26 | 荣耀终端有限公司 | Image demosaicing method, electronic device and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578392A (en) * | 2017-09-25 | 2018-01-12 | 华北电力大学 | A kind of convolutional neural networks demosaicing algorithms based on remaining interpolation |
CN108492265A (en) * | 2018-03-16 | 2018-09-04 | 西安电子科技大学 | CFA image demosaicing based on GAN combines denoising method |
CN108765295A (en) * | 2018-06-12 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus and storage medium |
US10235601B1 (en) * | 2017-09-07 | 2019-03-19 | 7D Labs, Inc. | Method for image analysis |
CN109886875A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | Image super-resolution rebuilding method and device, storage medium |
CN110009590A (en) * | 2019-04-12 | 2019-07-12 | 北京理工大学 | A kind of high-quality colour image demosaicing methods based on convolutional neural networks |
CN110120019A (en) * | 2019-04-26 | 2019-08-13 | 电子科技大学 | A kind of residual error neural network and image deblocking effect method based on feature enhancing |
WO2019222951A1 (en) * | 2018-05-24 | 2019-11-28 | Nokia Technologies Oy | Method and apparatus for computer vision |
CN110706181A (en) * | 2019-10-09 | 2020-01-17 | 中国科学技术大学 | Image denoising method and system based on multi-scale expansion convolution residual error network |
CN111047515A (en) * | 2019-12-29 | 2020-04-21 | 兰州理工大学 | Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism |
- 2020-05-25 CN CN202010447460.4A patent/CN111696036B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10235601B1 (en) * | 2017-09-07 | 2019-03-19 | 7D Labs, Inc. | Method for image analysis |
CN107578392A (en) * | 2017-09-25 | 2018-01-12 | 华北电力大学 | A kind of convolutional neural networks demosaicing algorithms based on remaining interpolation |
CN108492265A (en) * | 2018-03-16 | 2018-09-04 | 西安电子科技大学 | CFA image demosaicing based on GAN combines denoising method |
WO2019222951A1 (en) * | 2018-05-24 | 2019-11-28 | Nokia Technologies Oy | Method and apparatus for computer vision |
CN108765295A (en) * | 2018-06-12 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus and storage medium |
CN109886875A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | Image super-resolution rebuilding method and device, storage medium |
CN110009590A (en) * | 2019-04-12 | 2019-07-12 | 北京理工大学 | A kind of high-quality colour image demosaicing methods based on convolutional neural networks |
CN110120019A (en) * | 2019-04-26 | 2019-08-13 | 电子科技大学 | A kind of residual error neural network and image deblocking effect method based on feature enhancing |
CN110706181A (en) * | 2019-10-09 | 2020-01-17 | 中国科学技术大学 | Image denoising method and system based on multi-scale expansion convolution residual error network |
CN111047515A (en) * | 2019-12-29 | 2020-04-21 | 兰州理工大学 | Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism |
Non-Patent Citations (3)
Title |
---|
Color Image Compression with Transform Domain Down-Sampling and Deep Convolutional Reconstruction;Yan Wang等;《2019 IEEE Visual Communications and Image Processing(VCIP)》;20200123;第3章第B节,图2 * |
COLOR IMAGE DEMOSAICKING USING A 3-STAGE CONVOLUTIONAL NEURAL NETWORK STRUCTURE;Kai Cui等;《2018 25th IEEE International Conference on Image Processing (ICIP)》;20180906;第2-3章,图1-3 * |
Also Published As
Publication number | Publication date |
---|---|
CN111696036A (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111696036B (en) | Residual error neural network based on cavity convolution and two-stage image demosaicing method | |
CN102254301B (en) | Demosaicing method for CFA (color filter array) images based on edge-direction interpolation | |
CN110120019B (en) | Residual error neural network based on feature enhancement and image deblocking method | |
US8717460B2 (en) | Methods and systems for automatic white balance | |
CN112804561A (en) | Video frame insertion method and device, computer equipment and storage medium | |
CN101287130A (en) | Apparatus and method for generating wide colour gamut signal in image capturing device | |
CN107169946B (en) | Image fusion method based on nonnegative sparse matrix and hypersphere color transformation | |
CN111383200A (en) | CFA image demosaicing method based on generative antagonistic neural network | |
CN112215767B (en) | Anti-blocking effect image video enhancement method | |
CN106447632A (en) | RAW image denoising method based on sparse representation | |
CN111583129A (en) | Screen shot image moire removing method based on convolutional neural network AMNet | |
CN106709874B (en) | Compressed low-resolution face image restoration method based on face structure correlation | |
Guo et al. | Joint demosaicking and denoising benefits from a two-stage training strategy | |
CN112019704B (en) | Video denoising method based on prior information and convolutional neural network | |
CN111654705A (en) | Mosaic image compression method based on novel color space conversion | |
CN111222515A (en) | Image translation method based on context-aware attention | |
CN110728643A (en) | Low-illumination band noise image optimization method based on convolutional neural network | |
CN115841523A (en) | Double-branch HDR video reconstruction algorithm based on Raw domain | |
CN111160257B (en) | Monocular face in-vivo detection method stable to illumination transformation | |
CN113538505A (en) | Motion estimation system and method of single picture based on deep learning | |
CN111681176A (en) | Self-adaptive convolution residual error correction single image rain removal method | |
CN102034225A (en) | Edge mode-based image color component interpolating method | |
CN110992266A (en) | Demosaicing method and demosaicing system based on multi-dimensional non-local statistical eigen | |
CN114240776B (en) | Demosaicing and compression fusion framework for MSFA hyperspectral image | |
Liu et al. | A Low-light Image Enhancement Method with Histogram Equalization Prior |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |