CN115082315B - Demosaicing method applicable to low-illumination small-pixel CFA sampling and edge computing equipment - Google Patents
- Publication number: CN115082315B (application CN202210763316.0A)
- Authority: CN (China)
- Prior art keywords: image, training, neural network, network model, pixel
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4015: Image demosaicing, e.g. colour filter arrays [CFA] or Bayer patterns
- G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06T5/70: Denoising; Smoothing
- G06T2207/10024: Color image
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a demosaicing method applicable to low-illumination small-pixel CFA sampling and edge computing equipment, comprising the following steps. Step 1: after Gaussian noise is added to the original data set and the brightness is reduced, the RGB images are processed into mosaic images through a CFA with 75% transparent elements, and the parameters for training the target neural network are set. Step 2: a neural network model with the UNet++ network as its main framework is built. Step 3: according to the neural network model, the corresponding network models are trained in two stages, each stage minimizing its own loss function. Step 4: the image to be processed is processed with the trained neural network model to obtain a demosaiced image. On the basis of preserving the restoration quality for small-pixel color-filter-array mosaic images, the invention optimizes the network topology, reduces the number of parameters as far as possible, shortens the training time, and can be applied to network deployment on edge computing equipment.
Description
Technical Field
The invention belongs to the field of digital image processing, and in particular relates to the hardware/software co-design of a visual-processing neural network.
Background
In computer vision, many methods complete demosaicing with deep learning, but they focus solely on the best result at the algorithm level. Their parameter counts are therefore huge; training takes a long time when the energy and cost budget is tight and relies on high-cost, low-energy-efficiency graphics processing units or a remote computing center; and the algorithms are difficult to deploy on portable or mobile real-time systems. On the other hand, most of these algorithms restore mosaic images sampled from conventional-size pixels with the classical Bayer-pattern color filter array, and are not suitable for demosaicing the mosaic images obtained from a low-illumination small-pixel color filter array.
Existing solutions therefore focus mainly on the demosaicing quality at the algorithm level, without considering the computation time or the suitability of the algorithm for deployment on edge computing devices; in addition, they target conventional-size-pixel color filter arrays in the classical Bayer pattern and do not apply to the low-light small-pixel case.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a demosaicing method applicable to low-illumination small-pixel CFA (color filter array) sampling and edge computing equipment. On the basis of preserving the restoration quality of the small-pixel color-filter-array mosaic image, the method optimizes the network topology, reduces the number of parameters as far as possible, shortens the training time, and is suitable for network deployment on edge computing equipment.
The technical scheme of the invention is as follows:
A demosaicing method applicable to low-illumination small-pixel CFA sampling and edge computing equipment, comprising the following steps:
Step 1: Gaussian noise is first added to the original data set and the brightness is reduced; the RGB images are then processed into mosaic images through a CFA with 75% transparent elements. A training set is formed after data preprocessing (e.g., cropping each picture to 256×256 and cutting it into four 128×128 patches), and the parameters for training the target neural network are set.
Step 2: and building a neural network model taking the UNet++ network as a main framework.
Step 3: according to the neural network model, training a corresponding network model in two stages with the aim of minimizing respective loss functions.
Step 4: and demosaicing the image to be processed, which is already added with Gaussian noise, is subjected to brightness reduction and is sampled according to a color filter array mode with 75% transparent elements, by using the trained neural network model, so as to obtain a demosaiced image.
Further, in step 1, the method of adding Gaussian noise to the image and reducing its brightness is as follows:
The variance v of a single black (dark-frame) photograph is calculated and used as a reference; Gaussian noise with mean 0 and variance v is applied to the three channels of the data set to be processed according to the following formula (1):
Y = C(A·X + N(0, B·v)) (1)
where Y is the processed low-light noisy image, C is the image pixel cut-off (clipping) function, A is the illumination-reduction multiple, X is the original image to be processed, N(a, b) is a Gaussian noise generation function with mean a and variance b, and B is the fine-tuning multiple of the Gaussian noise variance.
As described above, using Gaussian smoothing instead of a convolution structure ensures that semantics at different levels can be connected across layers.
Further, in step 1, forming a training set through data preprocessing means that the training data set is sample-preprocessed according to the color-filter-array pattern to obtain the data set used for training the neural network, as follows:
The image to which Gaussian noise has been added and whose brightness has been reduced is sampled according to the data-sampling pattern of the color filter array with 75% transparent elements. The image containing only B-channel sampling pixels, the image containing only R-channel sampling pixels, the images G1 and G2 each containing only G-channel sampling pixels, and the image containing only transparent-channel sampling pixels are each divided into several mosaic image blocks; the image containing only transparent-channel samples is 3 times the size of the others, and after being downsampled twice it yields transparent-channel image blocks of the same size as the other blocks.
Further, the neural network model in step 2 includes a feature extraction module and an image reconstruction module.
The feature extraction module comprises three Gaussian smoothing operations, feature extraction, six cross-layer connections and one upsampling structure. The three Gaussian smoothing operations are related as follows: the input image is first Gaussian-smoothed once, and the once-smoothed picture is sent horizontally into the feature extraction module; at the same time, the once-smoothed picture is Gaussian-smoothed again on the downward path to extract higher-level semantics, giving the twice-smoothed picture, which is likewise sent into a feature extraction module; the twice-smoothed picture is then Gaussian-smoothed once more on the downward path to extract still higher-level semantics, giving the thrice-smoothed picture, which is again sent into a feature extraction module.
In the feature extraction module, pictures that have undergone 0, 1, 2 and 3 Gaussian smoothing operations are first sent to the feature extraction module of the corresponding layer, and the convolution kernel size differs between feature extraction modules according to the number of smoothing operations: 3×3 after 0 smoothings, 3×3 after 1, 5×5 after 2, and 7×7 after 3. The role of these convolution structures is to further expand the receptive field, so that the network can obtain a larger range of image information in the upper layers of the triangular structure. Following the convolution structure of each layer are three dense connection units, each consisting of three depth-separable convolutions and one PReLU. Dense connections are used between the depth-separable convolutions in each dense connection unit: the input and output of the first depth-separable convolution are fed together into the input of the second, and likewise the input and output of the second are fed together into the input of the third. Meanwhile, in the feature reconstruction module, the numbers of input and output channels of each dense connection unit are the same.
The image reconstruction module consists of a 3×3 convolution, a 1×1 convolution structure and a PReLU, and reconstructs the feature maps after feature extraction into a mosaic-free and noise-free image.
Further, dense connection units with a residual connection structure are embedded in the feature extraction module. The residual structure consists of a convolution structure and three dense connection units; the convolution structure preliminarily extracts the corresponding level features of each layer while further expanding the receptive field, and each dense connection unit comprises three depth-separable convolutions and six PReLU activation layers.
Further, step 3 includes:
3.1 Data selection: ImageNet is selected as the data set for training the network. Before training, each picture is first cropped to 256×256 resolution around its center, and then cropped into 128×128 patches for network training.
3.2 Optimizer selection: Adam optimization is used; the initial learning rate is set to 0.001 and is halved every 10 epochs; the mini-batch size is set to 16; the other hyperparameters use their default settings.
3.3 Loss-function design: during the first 10 epochs of training (e ≤ 10), the SSIM index, which measures local similarity, is used as the basis of the loss-function design; after 10 epochs (e > 10), the MSE, which measures global characteristics, is used instead. The total number of epochs is set to 150.
As can be seen from the above technical solution, the present invention proposes a neural network demosaicing method suitable for edge computing devices and matched to the color filter array dedicated to low-light small-pixel cameras. Its advantages are:
1. The image obtained by sampling with the color filter array with 75% transparent elements under low-light conditions is denoised and demosaicked.
2. On the basis of the original UNet++ network structure, the original convolution structure in the downsampling stage is replaced by a Gaussian smoothing structure, so that semantics at different levels can be connected across layers and feature maps of different semantics can be used simultaneously for image recovery.
3. Different loss functions are used in different training periods to train the neural network; prior knowledge is fully utilized, the local quality of image recovery is guaranteed, and fast training convergence is ensured.
4. On the basis of preserving the image recovery quality, the network parameter count is reduced by modifying the network topology and replacing the traditional convolutions of the upsampling stage with depth-separable convolutions; after network training is finished, an appropriate network scale can be selected by pruning according to the actual situation, making the network convenient to port to edge computing equipment.
5. The method is high in efficiency and short in training time.
The neural network algorithm designed by the invention denoises and demosaics the RAW image obtained by sampling with the color filter array dedicated to the low-illumination small-pixel condition; on the basis of preserving good image recovery quality, the parameter count of the neural network is reduced by modifying the topology and other methods so that it is suitable for edge computing equipment.
Drawings
Fig. 1 shows the effect of adding Gaussian noise and reducing luminance;
Fig. 2 shows the color filter array;
Fig. 3 shows the data set preprocessing;
Fig. 4 shows the network topology;
Fig. 5 shows the feature extraction module.
Detailed Description
The technical implementation of the invention is further described in detail below with reference to the accompanying drawings:
The scheme provided by the invention is a demosaicing method matched to a color filter array with 75% transparent elements. The method mainly solves the problem of feature maps of different sizes, and achieves demosaicing while exploiting the higher light sensitivity of this color filter array and its suitability for small cameras. The method mainly comprises the following steps:
Step 1: firstly, after Gaussian noise is added to an original data set and brightness is reduced, an RGB image is processed into a mosaic image through a CFA with 75% transparent elements, data preprocessing is carried out to form a training set, and parameters of a training target neural network are set.
Step 2: and building a neural network model taking the UNet++ network as a main framework.
Step 3: according to the neural network model, training a corresponding network model in two stages with the aim of minimizing respective loss functions.
Step 4: and processing the image to be processed which is sampled in a color filter array mode with 75% transparent elements by utilizing the trained neural network model, wherein Gaussian noise is added, the brightness is reduced, and a demosaiced image is obtained.
The following examples detail the specific implementation of each step:
Step 1: Gaussian noise is first added to the original image and the brightness is reduced; the RGB image is processed into a mosaic image through a CFA with 75% transparent elements; the data are preprocessed to form a training set; and the parameters for training the target neural network are set. Specifically:
Step 1.1: Add Gaussian noise of suitable intensity to the original image and reduce the brightness:
The variance v of a single black (dark-frame) photograph is calculated and used as a reference; Gaussian noise with mean 0 and variance v is applied to the three channels of the data set to be processed according to the following formula (1).
Y = C(A·X + N(0, B·v)) (1)
where:
- Y: the processed low-light noisy image;
- C: the image pixel truncation (clipping) function;
- A: the illumination-reduction multiple;
- X: the original picture to be processed;
- N(a, b): a Gaussian noise generation function with mean a and variance b;
- B: the fine-tuning multiple of the Gaussian noise variance.
This process simulates the noise, such as photon and readout noise, generated by the electronics as the image passes through the sensor. By comparing the noise-adding and brightness-reducing results after fine-tuning the intensity, a low-light picture with Gaussian noise of mean 0 and variance B·v is obtained, as shown in Fig. 1.
Based on the above results, the parameters A = 0.2 and B = 2 and 3 are selected to process the images.
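As a minimal sketch of formula (1) in NumPy (the function name is illustrative, and the default value of v is a placeholder for the variance actually measured from a dark frame):

```python
import numpy as np

def add_low_light_noise(x, a=0.2, b=2.0, v=0.001, seed=None):
    """Formula (1): Y = C(A*X + N(0, B*v)).

    x : RGB image as a float array with values in [0, 1]
    a : illumination-reduction multiple A
    b : Gaussian-noise variance fine-tuning multiple B
    v : reference variance measured from a single black (dark-frame) photo
    The cut-off function C is realized as clipping to [0, 1].
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=np.sqrt(b * v), size=x.shape)
    return np.clip(a * x + noise, 0.0, 1.0)

# Darken a mid-gray test image and add noise.
img = np.full((4, 4, 3), 0.5)
noisy = add_low_light_noise(img, a=0.2, b=2.0, seed=0)
```

In practice v would be estimated from a real dark-frame capture of the target sensor before processing the data set.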
Step 1.2, processing the RGB image into a mosaic image through a CFA with 75% transparent elements, and forming a training set after data preprocessing:
Taking the minimal repeating 4×4 unit as an example, there are 16 elements in the unit. The upper-left corner is sampled according to the Bayer pattern: the main diagonal samples the green value, the upper-right corner samples the red value, and the lower-left corner samples the blue value. The rest are transparent samples, i.e., these positions obtain only the luminance value of the original picture (computed here as Gray = R×0.299 + G×0.587 + B×0.114). For the specific sampling pattern, see: Gang Luo, "A novel color filter array with 75% transparent elements," Proc. SPIE 6502, Digital Photography III, 65020T (20 February 2007); doi:10.1117/12.702950.
Analyzing the data sampling pattern of the color filter array according to the present invention (as shown in fig. 2), the original image is sampled in the manner shown in fig. 2.
First, the R and G channels of pixels whose abscissa mod 4 is 0 and whose ordinate mod 4 is 1 in the original image are set to 0, giving the image containing only B-channel sampling pixels. The B and G channels of pixels whose abscissa mod 4 is 1 and whose ordinate mod 4 is 0 are set to 0, giving the image containing only R-channel sampling pixels. The R and B channels of pixels whose abscissa mod 4 is 0 and whose ordinate mod 4 is 0 are set to 0, giving the image G1 containing only G-channel sampling pixels, and the R and B channels of pixels whose abscissa mod 4 is 1 and whose ordinate mod 4 is 1 are set to 0, giving the image G2 containing only G-channel sampling pixels. Finally, the luminance of each pixel is calculated by Gray = R×0.299 + G×0.587 + B×0.114 and then sampled, giving the image containing only transparent-channel sampling pixels.
Then, the image containing only B-channel sampling pixels, the image containing only R-channel sampling pixels, the images G1 and G2 containing only G-channel sampling pixels, and the image containing only transparent-channel sampling pixels are each divided into several mosaic image blocks. The sampled image containing only the transparent channel is 3 times the size of the other images; after being downsampled twice, transparent-channel image blocks of the same size as the other blocks are obtained. This process is shown in Fig. 3.
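The sampling masks for the 4×4 repeating unit can be sketched as follows (NumPy; the exact placement of the Bayer 2×2 block inside the tile is an assumption based on the description above, and `cfa_masks` and `luminance` are hypothetical helper names):

```python
import numpy as np

def cfa_masks(h, w):
    """Boolean sampling masks for the 75%-transparent CFA.

    In each 4x4 tile, the top-left 2x2 block follows the Bayer pattern
    (main diagonal = green, top-right = red, bottom-left = blue); the
    remaining 12 of 16 positions are transparent (luminance) samples.
    """
    r = np.arange(h)[:, None] % 4          # row index within the tile
    c = np.arange(w)[None, :] % 4          # column index within the tile
    g1 = (r == 0) & (c == 0)
    g2 = (r == 1) & (c == 1)
    red = (r == 0) & (c == 1)
    blue = (r == 1) & (c == 0)
    transparent = ~(g1 | g2 | red | blue)
    return g1, g2, red, blue, transparent

def luminance(rgb):
    # Gray = R*0.299 + G*0.587 + B*0.114, as in the text.
    return rgb @ np.array([0.299, 0.587, 0.114])

g1, g2, red, blue, trans = cfa_masks(8, 8)
print(trans.mean())  # 0.75, i.e. 75% transparent elements
```

Multiplying an image by each mask (after zeroing the non-sampled channels) yields the five single-channel images described above.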
In this way, the data set for training is sampled and preprocessed in a color filter array pattern with 75% transparent elements to obtain the data set for neural network training in the present design.
Step 1.3: Set the parameters of the target neural network for training.
Specifically, the optimizer for training is set to Adam, the initial learning rate is set to 0.001, and the learning rate drops by half every 10 epochs during training. The mini-batch size is set to 16. The remaining hyperparameters use their default settings.
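The step schedule just described (initial rate 0.001, halved every 10 epochs) can be sketched as:

```python
def learning_rate(epoch, base_lr=1e-3, drop_every=10):
    # Halve the base rate once per 10 completed epochs:
    # epochs 0-9 use 0.001, 10-19 use 0.0005, 20-29 use 0.00025, ...
    return base_lr * 0.5 ** (epoch // drop_every)

for e in (0, 9, 10, 20):
    print(e, learning_rate(e))
```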
Step 2, building a neural network model taking a UNet++ network as a main framework
The main framework of the model is the UNet++ structure; the network is characterized as follows:
The overall model is shown in Fig. 4; the neural network model comprises a feature extraction part and an image reconstruction module.
2.1 Feature extraction part:
The feature extraction part comprises three Gaussian smoothing operations, feature extraction, six cross-layer connections and one upsampling structure. The processing procedure is as follows:
The three Gaussian smoothing operations are related as follows: the input image is first Gaussian-smoothed once, and the once-smoothed picture is sent horizontally into the feature extraction module; at the same time, the once-smoothed picture is Gaussian-smoothed again on the downward path to extract higher-level semantics, giving the twice-smoothed picture, which is likewise sent into a feature extraction module; the twice-smoothed picture is then Gaussian-smoothed once more on the downward path to extract still higher-level semantics, giving the thrice-smoothed picture, which is again sent into a feature extraction module.
In the feature extraction module, pictures that have undergone 0, 1, 2 and 3 Gaussian smoothing operations are first sent to the feature extraction module of the corresponding layer, and the convolution kernel size differs between feature extraction modules according to the number of smoothing operations: 3×3 after 0 smoothings, 3×3 after 1, 5×5 after 2, and 7×7 after 3. The role of these convolution structures is to further expand the receptive field, so that the network can obtain a larger range of image information in the upper layers of the triangular structure. Following the convolution structure of each layer are three dense connection units, each consisting of three depth-separable convolutions and one PReLU. Dense connections are used between the depth-separable convolutions in each dense connection unit: the input and output of the first depth-separable convolution are fed together into the input of the second, and likewise the input and output of the second are fed together into the input of the third. Meanwhile, in the feature reconstruction module, the numbers of input and output channels of each dense connection unit are the same.
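The parameter saving from replacing a standard convolution with a depth-separable convolution can be checked by simple counting (the channel width 64 below is an illustrative value; the actual widths are those of Table 1):

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a pointwise 1 x 1 convolution."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

std = conv_params(64, 64, 3, bias=False)                  # 36864
sep = depthwise_separable_params(64, 64, 3, bias=False)   # 576 + 4096 = 4672
print(std, sep)  # roughly an 8x reduction at this width
```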
After the input image undergoes the three Gaussian smoothing operations, the network obtains three different levels of image blur. The advantage is that feature maps of different semantic levels are obtained without changing the size of the intermediate feature maps, which facilitates the subsequent cross-layer connections. After Gaussian smoothing, the image is sent to the feature extraction module of each layer, as shown in Fig. 5.
As shown in Fig. 5, dense connection units with a residual connection structure are embedded in the feature extraction module of the neural network model. The residual structure consists of a convolution structure and three dense connection units; the convolution structure preliminarily extracts the corresponding level features of each layer while further expanding the receptive field. In this neural network model, the convolution kernel sizes of layers 0, 1, 2 and 3 are 3×3, 3×3, 5×5 and 7×7 respectively. Each dense connection unit comprises three depth-separable convolutions and six PReLU activation layers.
The numbers of input and output channels of the feature extraction modules of the network model are kept consistent. The channel-number parameters of the intermediate layers are shown in Table 1.
TABLE 1
2.2 Image reconstruction portion:
After the input image passes through the feature extraction part described above, the feature maps are reconstructed into a mosaic-free and noise-free image along the purple part pointing up and to the right in Fig. 4; this up-and-right process is the image reconstruction part. In this part, the features of each layer are concatenated with the lower-level features further forward using a 1×1 convolution (shown as small white circles in the figure); each image reconstruction operation consists of a 3×3 convolution, a 1×1 convolution structure and a PReLU; the dashed parts in the figure represent cross-layer connections between different semantic levels, implemented with 1×1 convolutions; the orange arrow at the top of the structure represents upsampling implemented with a 3×3 transposed convolution structure. The final output images are produced at the top layer of the structure, i.e., L=1, L=2, L=3, etc. in the figure. In the image reconstruction part, the image size is unchanged before the transposed convolution operation; after the transposed convolution, the image length and width each become 2 times the original. The input image sizes, convolution kernel sizes and output image sizes of the intermediate process are shown in Table 2.
TABLE 2
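The 2× upsampling by a 3×3 transposed convolution follows the usual output-size formula; the stride, padding and output-padding values below are assumptions chosen so that the length and width double, as stated in the text (the text specifies only the 3×3 kernel and the 2× scaling):

```python
def transposed_conv_out(size, k=3, stride=2, pad=1, out_pad=1):
    # Standard transposed-convolution output size:
    # out = (size - 1) * stride - 2 * pad + k + out_pad
    # With k=3, stride=2, pad=1, out_pad=1 the spatial size doubles.
    return (size - 1) * stride - 2 * pad + k + out_pad

print(transposed_conv_out(64))   # 128
print(transposed_conv_out(128))  # 256
```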
Step 3: According to the neural network model, the corresponding network models are trained in two stages, each stage minimizing its own loss function.
3.1 Data selection: ImageNet is selected as the data set for training the network. Before training, each picture is first cropped to 256×256 resolution around its center, and then cropped into 128×128 patches for network training.
3.2 Optimizer selection: the Adam optimizer is adopted, with the initial learning rate set to 0.001 and halved every 10 epochs; the mini-batch size is 16, and the other hyperparameters use their default settings.
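The step schedule in 3.2 can be sketched as a small helper; counting epochs from zero is an assumption, as the text does not state the indexing:

```python
def learning_rate(epoch, base_lr=0.001, drop_every=10):
    """Step schedule from the text: start at 0.001 and halve every 10 epochs
    (epochs assumed 0-indexed)."""
    return base_lr * 0.5 ** (epoch // drop_every)
```

So epochs 0–9 use 0.001, epochs 10–19 use 0.0005, and so on.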
3.3 Loss function design: within the first 10 training epochs (e ≤ 10), the SSIM index, which measures local similarity, is adopted as the basis of the loss function design; after 10 epochs (e > 10), the MSE, which measures global characteristics, is adopted instead. The total number of epochs is set to 150.
Wherein the SSIM loss function L_SSIM is as follows:
L_SSIM = 1 − (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} SSIM(x_ij, y_ij)
The MSE loss function L_MSE is as follows:
L_MSE = (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} (x_ij − y_ij)²
where M and N are the numbers of rows and columns of the matrices abstracted from the two images.
The overall loss function is as follows:
L = L^1 + L^2 + L^3
where L^l (l = 1, 2, 3) denotes the loss of the deeply supervised output at layer l, taken as L^l_SSIM in the first stage (e ≤ 10) and as L^l_MSE in the second stage (e > 10).
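A minimal numpy sketch of the two-stage loss switch, assuming a single-window (global-statistics) SSIM rather than the 11×11 sliding-window version, and ignoring the deep-supervision sum over layers:

```python
import numpy as np

def ssim_global(x, y, L=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed from global image statistics — a
    simplification of the 11x11 sliding-window SSIM used in the method."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def two_stage_loss(pred, target, epoch):
    """SSIM-based loss for the first 10 epochs (e <= 10), MSE afterwards."""
    if epoch <= 10:
        return 1.0 - ssim_global(pred, target)
    return float(np.mean((pred - target) ** 2))
```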
step 4: and processing the image to be processed which is sampled in a color filter array mode with 75% transparent elements by utilizing the trained neural network model, wherein Gaussian noise is added, the brightness is reduced, and a demosaiced image can be obtained.
The above embodiment shows that the method serves as a demosaicing algorithm matched to a color filter array with 75% transparent elements and is suitable for small-pixel color filter arrays under low-illumination conditions. It mainly solves the problem of feature maps of different sizes, and achieves the demosaicing effect while exploiting the higher light sensitivity of this color filter array and its suitability for small cameras.
The method embodies the idea of software-hardware co-design: by modifying the topological structure while maintaining good image recovery quality and an acceptable demosaicing effect, it reduces the number of parameters of the neural network from the perspective of the topology, making the network suitable for edge computing devices.
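The parameter-reduction argument can be made concrete: the depthwise separable convolutions used in the dense connection units need far fewer weights than standard convolutions of the same shape. The 64-channel figures below are illustrative, not taken from Table 1:

```python
def standard_conv_params(cin, cout, k):
    """Weight count of a standard k x k convolution (biases omitted)."""
    return cin * cout * k * k

def depthwise_separable_params(cin, cout, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return cin * k * k + cin * cout

# Example: a 3x3 layer with 64 input and 64 output channels.
assert standard_conv_params(64, 64, 3) == 36864
assert depthwise_separable_params(64, 64, 3) == 4672  # roughly 8x fewer weights
```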
Claims (5)
1. A demosaicing method applicable to low-light small-pixel CFA sampling and edge computing equipment, comprising the steps of:
Step 1: firstly, Gaussian noise is added to the original data set and its brightness is reduced; the RGB images are then processed into mosaic images through a color filter array (CFA) pattern with 75% transparent elements, data preprocessing is carried out to form a training set, and the parameters of the target neural network to be trained are set;
Step 2: building a neural network model taking a UNet++ network as a main framework, wherein the neural network model comprises a feature extraction part and an image reconstruction module:
The feature extraction part comprises three Gaussian smoothing and feature extraction modules, a cross-layer connection structure and one upsampling structure; each feature extraction module, connected through a residual connection structure, embeds dense connection units, the residual connection structure consisting of a convolution structure and three dense connection units, wherein the convolution structure preliminarily extracts the features of the corresponding level of each layer and further enlarges the receptive field, and each dense connection unit comprises three depthwise separable convolutions and six PReLU activation layers;
The image reconstruction module consists of a 3×3 convolution, a 1×1 convolution structure and a PReLU, and is used for reconstructing the feature map obtained from feature extraction into a mosaic-free, noise-free image;
Step 3: according to the neural network model, training the corresponding network models in two stages with the aim of minimizing the respective loss function of each stage, including:
3.1 data selection: ImageNet is selected as the data set for training the network; before training, each picture is first center-cropped to 256×256 resolution and then cropped to 128×128 resolution for use in network training;
3.2 optimizer selection: Adam optimization is adopted, the initial learning rate is set to 0.001 and is halved every 10 epochs, the mini-batch size is set to 16, and the other hyperparameters adopt default settings;
3.3 designing the loss function: within the first 10 training epochs (e ≤ 10), the SSIM index measuring local similarity is adopted as the basis of the loss function design; after 10 epochs (e > 10), the MSE measuring global characteristics is adopted as the basis of the loss function design; the total number of experimental epochs is set to 150;
Step 4: the image to be processed, which has had Gaussian noise added and brightness reduced and has been sampled in the color filter array pattern with 75% transparent elements, is processed with the trained neural network model to obtain the demosaiced image.
2. The demosaicing method applicable to low-light small-pixel CFA sampling and edge computing equipment according to claim 1, wherein in step 1, the method of adding Gaussian noise to the original data set and reducing brightness is as follows:
the variance v of a single black (dark-frame) image is calculated and taken as a reference, and Gaussian noise obeying a mean of 0 and distributed with variance v is applied to the three channels of the original data set according to the following formula (1):
Y = C(A·X + N(0, B·v)) (1)
wherein Y is the processed low-light noisy image, C is the image pixel cut-off function, A is the illumination reduction multiple, X is the original image to be processed, N(a, b) is a Gaussian (normal-distribution) noise generation function with mean a and variance b, and B is the Gaussian noise variance fine-adjustment multiple.
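A minimal numpy sketch of formula (1); the concrete values of the darkening multiple A and the noise fine-adjustment multiple B below are illustrative, as the claim does not fix them:

```python
import numpy as np

def degrade(x, v, a=0.25, b=1.0, rng=None):
    """Formula (1): Y = C(A*X + N(0, B*v)).
    Darken by A, add zero-mean Gaussian noise of variance B*v, then apply the
    pixel cut-off function C (here: clipping to [0, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(b * v), size=x.shape)  # std = sqrt(variance)
    return np.clip(a * x + noise, 0.0, 1.0)
```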
3. The method for demosaicing as recited in claim 1, wherein in step 1, the processing of the RGB image into the mosaic image by the CFA having 75% transparent elements means:
The original image is sampled in the pattern of a color filter array with 75% transparent elements and split into mosaic image blocks: an image containing only B-channel sampled pixels, an image containing only R-channel sampled pixels, an image containing only G-channel sampled pixels (G2), and an image containing only transparent-channel sampled pixels; the image containing only transparent-channel samples is 3 times the size of the other images, and after being downsampled twice it yields transparent-channel image blocks of the same size as the other image blocks.
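The sampling pattern can be sketched with a hypothetical 4×4 super-cell; the claim fixes only the 75%-transparent ratio, so the exact layout below is an assumption for illustration:

```python
import numpy as np

# Hypothetical 4x4 super-cell: 12 transparent (W) cells plus one R, one B
# and two G cells — only the 75% transparent ratio comes from the claim.
PATTERN = np.array([
    ['W', 'W', 'W', 'W'],
    ['W', 'R', 'G', 'W'],
    ['W', 'G', 'B', 'W'],
    ['W', 'W', 'W', 'W'],
])

def channel_mask(h, w, ch):
    """Boolean mask of the pixels sampled for channel ch in an h x w image
    (h and w assumed to be multiples of 4)."""
    return np.tile(PATTERN, (h // 4, w // 4)) == ch

assert (PATTERN == 'W').mean() == 0.75  # 12 of 16 cells are transparent
```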
4. The low-light small-pixel CFA sampling and edge computing device-adapted demosaicing method of claim 1, wherein the sizes of the convolution kernels of the 0, 1,2, 3 layers in the neural network model are 3 x 3, 5 x 5, 7 x 7, respectively.
5. The low-light small-pixel CFA sampling and edge computing device-adapted demosaicing method of claim 1, wherein in step 3.3:
The SSIM loss function L_SSIM is as follows:
L_SSIM = 1 − (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} SSIM(x_ij, y_ij)
wherein SSIM(x, y) = ((2·μ_x·μ_y + C_1)·(2·σ_xy + C_2)) / ((μ_x² + μ_y² + C_1)·(σ_x² + σ_y² + C_2)) represents the structural similarity of two images, where x and y are the two images used to calculate the structural similarity, μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, and C_1 = (k_1·L)² and C_2 = (k_2·L)² are constants used to maintain stability; L is the dynamic range of the pixel values, with k_1 = 0.01 and k_2 = 0.03; the size of the sliding window in the SSIM calculation is set to 11 in this design;
M in the loss function L_SSIM denotes the number of rows of the matrices abstracted from the two images, and N denotes their number of columns;
After the number of training epochs exceeds 10, the loss takes the form of the MSE loss function L_MSE, which is specified as follows:
L_MSE = (1/(M·N)) · Σ_{i=1..M} Σ_{j=1..N} (x_ij − y_ij)²
The overall loss function is as follows:
L = L^1 + L^2 + L^3
The superscript l on the loss functions in the above equation indicates that the per-layer loss functions L^1, L^2, L^3 of the network, which introduce deep supervision, need to be considered together in the training process; accordingly, L^1_SSIM, L^2_SSIM, L^3_SSIM denote the SSIM loss functions of layers 1, 2 and 3, and likewise L^1_MSE, L^2_MSE, L^3_MSE denote the MSE loss functions of layers 1, 2 and 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210763316.0A CN115082315B (en) | 2022-06-30 | 2022-06-30 | Demosaicing method applicable to low-illumination small-pixel CFA sampling and edge computing equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115082315A CN115082315A (en) | 2022-09-20 |
CN115082315B true CN115082315B (en) | 2024-09-13 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111696036A (en) * | 2020-05-25 | 2020-09-22 | 电子科技大学 | Residual error neural network based on cavity convolution and two-stage image demosaicing method |
CN112614072A (en) * | 2020-12-29 | 2021-04-06 | 北京航空航天大学合肥创新研究院 | Image restoration method and device, image restoration equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10366471B2 (en) * | 2015-12-02 | 2019-07-30 | Texas Instruments Incorporated | Universal and adaptive de-mosaicing (CFA) system |
RU2764395C1 (en) * | 2020-11-23 | 2022-01-17 | Самсунг Электроникс Ко., Лтд. | Method and apparatus for joint debayering and image noise elimination using a neural network |
CN113888405B (en) * | 2021-08-23 | 2023-08-15 | 西安电子科技大学 | Denoising and demosaicing method based on clustering self-adaptive expansion convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||