CN115456908A - A Robust Self-Supervised Image Denoising Method

Info

Publication number: CN115456908A
Application number: CN202211192386.1A
Authority: CN
Inventors: 谭伟敏; 颜波; 黄辰宇
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2022-09-28
Filing date: 2022-09-28
Publication date: 2022-12-09

Abstract

The invention belongs to the technical field of digital image processing and specifically relates to a robust self-supervised image denoising method based on constructed paired noisy samples. The method comprises the following steps: obtaining a rough denoised image of each noisy image through a pre-denoising network; subtracting, group by group, the rough denoised images from the original noisy images to obtain noise that approximates the real distribution; performing two cyclic shift operations within the noise group and adding the results to the rough denoised images to obtain two groups of noisy samples; and training a dual-branch denoising network with the constructed paired noisy samples and an uncertainty-aware loss function, which improves the denoising performance and robustness of the network. Experimental results show that the method overcomes the shortcomings of prior self-supervised image denoising methods and effectively improves the sharpness of denoised images, and that the method of constructing noisy samples has strong practical value.

Description

A Robust Self-Supervised Image Denoising Method

Technical Field

The invention belongs to the technical field of digital image processing, and in particular relates to a robust self-supervised image denoising method.

Background

Image denoising aims to recover a clean signal from noisy observations and is one of the important tasks in image processing and low-level computer vision. Recently, with the rapid development of neural networks, learning-based supervised denoising models have achieved satisfactory performance. However, these methods rely heavily on noisy-clean or noisy-noisy paired images. In practical applications, collecting such paired images is complex and expensive. In tasks such as dynamic scenes and medical imaging, real-world constraints can make suitable paired images impossible to obtain at all. As a result, supervised image denoising methods are difficult to adapt to real denoising scenarios or fail to achieve the desired denoising quality.

Compared with supervised image denoising methods, self-supervised image denoising methods are of greater practical value because they require neither clean images nor paired images as references. Most current self-supervised methods can train a denoising model from single noisy images alone. The core idea of these methods is to construct paired noisy samples from a single noisy image. This process is extremely challenging, however, and the quality of the constructed paired noisy samples is closely tied to the performance of the denoising model. Blind-spot convolution and subsampling, two strategies for constructing paired samples to learn from, are widely used in existing self-supervised denoising methods, such as N2V [1], NBR [4], and B2UB [5]. These strategies achieve the goal of learning a denoising result from a single noisy image, but they suffer from underuse of information and pixel misalignment, which hinders further improvement of the denoising model.

In addition, most self-supervised image denoising methods focus on improving the denoising performance of the model and pay little attention to its robustness, so these denoising models are very sensitive to unseen noise and degrade severely when they encounter it. In practice, the noise distribution and intensity of collected noisy images cannot be guaranteed to match the training conditions strictly, so a denoising network must remain robust in complex scenes. Uncertainty-aware loss functions have been shown in previous work [3] to improve model stability effectively, but this idea has not yet been discussed for denoising tasks.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings of existing self-supervised image denoising approaches by proposing a robust self-supervised image denoising method that significantly improves the performance of the denoising network.

The robust self-supervised image denoising method provided by the present invention is based on constructing paired noisy samples. The noisy samples synthesized by this method share the same "clean" scene and carry independent and identically distributed noise, which effectively alleviates the problems of low information utilization and image misalignment and significantly improves the performance of the denoising network. An uncertainty-aware loss function is also introduced to improve the robustness of the denoising model.

The robust self-supervised image denoising method provided by the present invention comprises the following specific steps:

(1) Pre-denoising: the N2V denoising model proposed by Krull et al. [1] is used to perform rough denoising on the noisy images;

(2) Constructing paired noisy samples: paired noisy samples are constructed from the original noisy images and the rough denoised images through simple addition, subtraction, and shift operations;

(3) Training a dual-branch denoising network: the paired noisy samples constructed in step (2) are used to train a dual-branch denoising network. During training, an uncertainty-aware loss function constrains the network, which significantly improves its denoising performance and robustness.

In step (1), the N2V denoising model, denoted f(·), is used to pre-denoise a group of noisy images Y = {y₁, y₂, y₃}, yielding the corresponding rough denoising results X′ = {x′₁, x′₂, x′₃}, where the numerical subscripts mark different noisy images; that is:

X′ = f(Y)

This process does not involve updating the network parameters.

In step (2), a noise group N = {n₁, n₂, n₃} that approximates the real noise distribution is first obtained by taking the difference between the noisy images and the rough denoising results; that is:

N = Y - X′

Note that the noises within the noise group are independent and identically distributed.

Next, two cyclic shift operations are performed on the noise group to obtain two noise groups, N₁ and N₂, that differ only in order. Specifically, the noise in the first position of the group is moved to the last position, and this operation is performed twice, giving:

N₁ = {n₂, n₃, n₁}

N₂ = {n₃, n₁, n₂}

Finally, the two noise groups are each added to the rough denoising results to construct the paired noisy sample groups Y′_A and Y′_B; that is:

Y′_A = X′ + N₁ = {y′_A1, y′_A2, y′_A3}

Y′_B = X′ + N₂ = {y′_B1, y′_B2, y′_B3}

Further, any two paired noisy images constructed by the above process, for example y′_A1 and y′_B1, share the same "clean" scene x′_* and carry independent and identically distributed noise, where * denotes any numerical subscript.
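The construction in step (2) can be sketched in a few lines of plain Python. This is a minimal illustration, not the inventors' implementation: images are nested lists of floats, the function names are hypothetical, and `mean_denoiser` is only a stand-in for the pre-trained N2V model f(·).

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def subtract(a: Image, b: Image) -> Image:
    """Pixel-wise a - b."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a: Image, b: Image) -> Image:
    """Pixel-wise a + b."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def cyclic_shift(group: List[Image]) -> List[Image]:
    """Move the first element of the group to the last position."""
    return group[1:] + group[:1]


def mean_denoiser(image: Image) -> Image:
    """Hypothetical stand-in for N2V: replaces every pixel by the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[mean for _ in row] for row in image]


def build_pairs(
    noisy: List[Image], pre_denoise: Callable[[Image], Image]
) -> Tuple[List[Image], List[Image]]:
    """Return the paired groups (Y'_A, Y'_B) built from one group of noisy images."""
    rough = [pre_denoise(y) for y in noisy]                 # X' = f(Y)
    noise = [subtract(y, x) for y, x in zip(noisy, rough)]  # N = Y - X'
    n1 = cyclic_shift(noise)                                # {n2, n3, n1}
    n2 = cyclic_shift(n1)                                   # {n3, n1, n2}
    group_a = [add(x, n) for x, n in zip(rough, n1)]        # Y'_A = X' + N1
    group_b = [add(x, n) for x, n in zip(rough, n2)]        # Y'_B = X' + N2
    return group_a, group_b
```

With a group of three images, each rough image x′_i receives noise from the two other images, so the two samples of a pair never reuse the same noise realization.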

In step (3), the paired noisy samples constructed in step (2) are used to train a dual-branch denoising network. Specifically, one noisy sample of each pair serves as the input of the dual-branch network, and the other serves as the target the network learns.

The dual-branch denoising network is built on DnCNN [2]. The first two layers of DnCNN are a convolutional layer and a ReLU activation layer, followed by 15 modules each consisting of a convolutional layer, a batch normalization layer, and a ReLU activation layer; the last layer is a convolutional layer. In the dual-branch network, a network module formed by stacking the convolution, batch normalization, and ReLU module 7 times is attached at the middle of the DnCNN network and serves as the denoising branch. The second half of the original DnCNN forms the other branch, the uncertainty branch. The two branches share the first half of DnCNN. The convolution kernels in the branch networks are 3×3. The denoising branch outputs the denoised image, and the uncertainty branch outputs the uncertainty map.
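The layer layout described above can be written down as a small data structure. The sketch below is illustrative only and rests on two assumptions that the text does not state: the split point "in the middle" is taken to be after the 8th of the 15 modules (`split_after`), and the denoising branch is given a final convolution so that it can emit an image. The function and layer names are hypothetical.

```python
from typing import Dict, List, Tuple

Layer = Tuple[str, ...]  # one entry per layer group, e.g. ("conv3x3", "relu")


def conv_bn_relu(n: int) -> List[Layer]:
    """n stacked Conv-BatchNorm-ReLU modules with 3x3 kernels."""
    return [("conv3x3", "batchnorm", "relu") for _ in range(n)]


def dual_branch_layout(
    split_after: int = 8,      # assumed split point; the text only says "middle"
    denoise_blocks: int = 7,   # from the description
    dncnn_blocks: int = 15,    # from the description
) -> Dict[str, List[Layer]]:
    """Describe the shared trunk and the two branches as lists of layer groups."""
    if not 0 < split_after < dncnn_blocks:
        raise ValueError("split_after must fall strictly inside the DnCNN body")
    # Shared first half of DnCNN: Conv + ReLU, then the first modules.
    shared = [("conv3x3", "relu")] + conv_bn_relu(split_after)
    # Uncertainty branch: the remaining second half of the original DnCNN.
    uncertainty = conv_bn_relu(dncnn_blocks - split_after) + [("conv3x3",)]
    # Denoising branch: 7 new modules; the closing convolution is an assumption.
    denoise = conv_bn_relu(denoise_blocks) + [("conv3x3",)]
    return {"shared": shared, "denoise": denoise, "uncertainty": uncertainty}
```

Under these assumptions both branches have the same depth, and concatenating `shared` with `uncertainty` reproduces the 17 layer groups of the original DnCNN.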

During training of the dual-branch network, an uncertainty-aware loss function [3] is used:

where K denotes the total number of samples in the noisy sample dataset, f₁(·) denotes the branch of the dual-branch network that outputs the denoising result, and u_i is the output of the branch that predicts uncertainty.
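The formula itself is not reproduced in the text above. The symbol definitions are consistent with the heteroscedastic regression loss of Kendall and Gal [3]. Assuming that u_i is a per-pixel log-variance map and that (y′_A,i, y′_B,i) is the i-th input and target pair, one plausible form, applied per pixel and averaged over pixels, would be:

```latex
\mathcal{L}_{u} = \frac{1}{K}\sum_{i=1}^{K}\left[\frac{1}{2}\exp(-u_i)\,\bigl\lVert f_1(y'_{A,i}) - y'_{B,i}\bigr\rVert_2^2 + \frac{1}{2}\,u_i\right]
```

In this reading the first term reduces the penalty where the predicted uncertainty is high, and the second term keeps the network from marking every pixel as uncertain. The exact weighting used in the invention may differ.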

The uncertainty map output by the uncertainty branch measures, at the pixel level, the confidence of the denoising result output by the f₁(·) branch. That is, pixel regions with larger values in u_i have higher uncertainty, and the corresponding regions in the denoising result output by f₁(·) have lower confidence: pixel values with high uncertainty deviate more from the pixel values of the true clean image.

Compared with the MSE loss function commonly used in denoising tasks, a dual-branch network trained with the uncertainty loss can perceive how difficult different regions of an image are to denoise. When denoising a noisy image, it does not treat every pixel region equally but makes a better-informed decision for each pixel. As a result, when the network processes unseen noise it can tell how that noise affects denoising difficulty, and it therefore achieves better denoising performance and robustness.
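The contrast between the two losses can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the loss of the invention: `uncertainty_loss` follows the heteroscedastic form of Kendall and Gal [3] and treats the uncertainty map as a per-pixel log variance, and both function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def mse_loss(pred: Image, target: Image) -> float:
    """Mean squared error: every pixel contributes with the same weight."""
    errors = [(p - t) ** 2 for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(errors) / len(errors)


def uncertainty_loss(pred: Image, target: Image, log_var: Image) -> float:
    """Assumed form: 0.5 * exp(-u) * err^2 + 0.5 * u, averaged over pixels.

    log_var plays the role of the uncertainty map u_i; interpreting it as a
    log variance is an assumption, not something stated in the description.
    """
    terms = [
        0.5 * math.exp(-u) * (p - t) ** 2 + 0.5 * u
        for rp, rt, ru in zip(pred, target, log_var)
        for p, t, u in zip(rp, rt, ru)
    ]
    return sum(terms) / len(terms)
```

With a zero uncertainty map the second function reduces to half the MSE. Raising the uncertainty on a pixel with a large error lowers that pixel's contribution, which is how such a loss lets the network weight hard regions differently.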

Brief Description of the Drawings

Fig. 1 is a flowchart of the present invention.

Fig. 2 is a flowchart of the paired noisy sample construction of the present invention.

Fig. 3 is a schematic diagram of the dual-branch denoising network structure of the present invention.

Fig. 4 shows the result of denoising a noisy image with the present invention, where (a) is the noisy image and (b) is the denoised image.

Detailed Description

Embodiments of the present invention are described in detail below, but the scope of protection of the present invention is not limited to these embodiments.

The specific steps are:

(1) When obtaining the rough denoising results, 128×128 noisy patches are randomly cropped from the noisy images input to the network, and the image size remains unchanged during the subsequent construction of paired noisy samples. 40,000 pairs of noisy samples are constructed as the training set;

(2) During training, one sample of each constructed pair serves as the input and the other as the learning target of the dual-branch network. The dual-branch network is trained for 100 epochs with an initial learning rate of 0.0003 and a decay rate of 0.5, applied once every 20 epochs. Training minimizes the loss function with mini-batch stochastic gradient descent, with a batch size of 4;

(3) During testing, the whole test image is input to the network at its original size, and only the output of the denoising branch is used as the final denoising result.
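The data preparation and learning-rate settings in steps (1) and (2) can be sketched as follows. The numeric values come from the embodiment; the function names, the zero-based epoch indexing, and the nested-list image representation are assumptions made for illustration.

```python
import random
from typing import List

Image = List[List[float]]


def random_crop(image: Image, size: int = 128, rng=random) -> Image:
    """Cut a random size x size patch out of a larger image."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the requested patch")
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def learning_rate(
    epoch: int, base_lr: float = 3e-4, decay: float = 0.5, step: int = 20
) -> float:
    """Step schedule: multiply the rate by `decay` once every `step` epochs.

    `epoch` is counted from 0 (an assumption), so epochs 0-19 use base_lr.
    """
    return base_lr * decay ** (epoch // step)
```

Over the 100 training epochs this schedule halves the rate four times, so the last 20 epochs run at one sixteenth of the initial rate.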

References:

[1] Krull A, Buchholz T O, Jug F. Noise2Void - Learning Denoising from Single Noisy Images[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019.

[2] Zhang K, et al. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3142-3155.

[3] Kendall A, Gal Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?[C]//Advances in Neural Information Processing Systems 30. 2017.

[4] Huang T, Li S, Jia X, et al. Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 14781-14790.

[5] Wang Z, Liu J, Li G, et al. Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 2027-2036.

Claims

1. A robust self-supervised image denoising method based on constructing paired noisy samples, characterized by comprising the following specific steps:

(1) Pre-denoising: performing rough denoising on the noisy images using the N2V denoising model;

(2) Constructing paired noisy samples: constructing paired noisy samples from the original noisy images and the rough denoised images through addition, subtraction, and shift operations;

(3) Training a dual-branch denoising network: training a dual-branch denoising network with the paired noisy samples constructed in step (2), and constraining the network with an uncertainty-aware loss function during training, thereby significantly improving the denoising performance and robustness of the network.

2. The self-supervised denoising method of claim 1, wherein in step (1) the N2V denoising model, denoted f(·), is used to pre-denoise a group of noisy images Y = {y₁, y₂, y₃} to obtain the corresponding rough denoising results X′ = {x′₁, x′₂, x′₃}, where the numerical subscripts mark different noisy images; that is:

X′ = f(Y).

3. The self-supervised denoising method of claim 2, wherein the process of constructing paired noisy samples in step (2) is:

first, obtaining a noise group N = {n₁, n₂, n₃} that approximates the real noise distribution by taking the difference between the noisy images and the rough denoising results, that is:

N = Y - X′

where the noises within the noise group are independent and identically distributed;

second, performing two cyclic shift operations on the noise group to obtain two noise groups that differ only in order, denoted N₁ and N₂; specifically, the noise in the first position of the noise group is moved to the last position, and the operation is performed twice, giving:

N₁ = {n₂, n₃, n₁}

N₂ = {n₃, n₁, n₂}

and finally, adding the two noise groups to the rough denoising results respectively to construct the paired noisy sample groups Y′_A and Y′_B:

Y′_A = X′ + N₁ = {y′_A1, y′_A2, y′_A3}

Y′_B = X′ + N₂ = {y′_B1, y′_B2, y′_B3}

Any two paired noisy images constructed by the above process share the same "clean" scene x′_* and carry independent and identically distributed noise, where * denotes any numerical subscript.

4. The self-supervised denoising method of claim 3, wherein in step (3) the paired noisy samples constructed in step (2) are used to train a dual-branch denoising network, specifically: one noisy sample of each pair is used as the input of the dual-branch network, and the other noisy sample is used as the target to be learned by the network; the dual-branch denoising network is built on DnCNN, the first two layers of DnCNN being a convolutional layer and a ReLU activation layer, followed by 15 modules each consisting of a convolutional layer, a batch normalization layer, and a ReLU activation layer, and the last layer being a convolutional layer; a network module formed by stacking the convolution, batch normalization, and ReLU module 7 times is attached at the middle of the DnCNN network and serves as the denoising branch of the dual-branch network; the second half of the original DnCNN is the other branch, the uncertainty branch; the two branches share the first half of DnCNN; the convolution kernels of the convolutional layers in the branch networks are 3×3; the denoising branch outputs the denoised image, and the uncertainty branch outputs the uncertainty map;

during training of the dual-branch network, an uncertainty-aware loss function is adopted:

where K denotes the total number of samples in the noisy sample dataset, f₁(·) denotes the branch of the dual-branch network that outputs the denoising result, and u_i is the output of the branch that predicts uncertainty;

the uncertainty map output by the uncertainty branch measures, at the pixel level, the confidence of the denoising result output by the f₁(·) branch; that is, pixel regions with larger values in u_i have higher uncertainty, and the corresponding regions in the denoising result output by f₁(·) have lower confidence, meaning that pixel values with high uncertainty deviate more from the pixel values of the true clean image.