CN114529479A - Mutual information loss function-based unsupervised one-pot multi-frame image denoising method

Info

Publication number
CN114529479A
Authority
CN
China
Prior art keywords
image, data, model, denoising, images
Legal status
Granted
Application number
CN202210202856.1A
Other languages
Chinese (zh)
Other versions
CN114529479B (en)
Inventor
金录嘉 (Lujia Jin)
卢闫晔 (Yanye Lu)
Current Assignee
Peking University
Original Assignee
Peking University
Priority date: 2022-03-02
Filing date: 2022-03-02
Application filed by Peking University
Priority to CN202210202856.1A
Publication of CN114529479A: 2022-05-24
Application granted: publication of CN114529479B
Legal status: Active

Classifications

    • G06T5/70 Denoising; Smoothing (image enhancement or restoration)
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T2207/20081 Training; Learning (indexing scheme for image analysis or image enhancement)


Abstract

The invention discloses an unsupervised one-pot multi-frame image denoising method based on a mutual information loss function, belonging to the technical field of image processing. After training on multi-frame image data, the method can denoise new data from a single noisy image. It requires no clean images as labels, and it feeds the multi-frame images into the deep learning model as a whole, enabling more global learning and more comprehensive mining of noise pattern information.

Description

Mutual information loss function-based unsupervised one-pot multi-frame image denoising method
Technical Field
The invention provides an image denoising method, and in particular an unsupervised one-pot multi-frame image denoising method based on a mutual information loss function, belonging to the technical field of image processing.
Background
Existing image denoising methods fall into two categories: analysis-based and learning-based. Analysis-based methods estimate a noise model by analyzing the physical process that produces the noise, or evaluate local similarity with specially defined filters to achieve denoising. Representative techniques include the non-local means method (NL-means) [1] and block-matching and 3D filtering (BM3D) [2]. However, since image denoising is mathematically an ill-posed inverse problem, obtaining an accurate noise model in real scenes is almost impossible. Constructing a noise model from physical experience or statistical estimation is difficult, so these methods rarely generalize well to real image scenes.
The other category is learning-based methods, which do not focus on a specific physical scene or noise model but instead use large amounts of data to fit a deep learning model for denoising noisy images. These models typically have millions of parameters or more. A representative technique is the denoising convolutional neural network (DnCNN) [3]. Learning-based approaches overcome the above drawbacks of analysis-based methods, but such supervised approaches (Noise2Clean, abbreviated N2C) rely on a large number of high-quality clean images as labels for model training. In many real scenes, however, clean images are difficult or impossible to acquire. For example, the CT radiation dose in medicine determines the noise intensity of the acquired image, but given the radiation damage to the patient, administering a high dose merely to acquire a clean image is undesirable.
To overcome the imbalance between the supervised methods' heavy dependence on clean image data and the difficulty of acquiring such data, a number of unsupervised learning methods have been developed. These methods need no clean images as labels; a model with denoising capability can be learned from noisy images alone. They can be divided into two categories according to the number of noisy images required per acquisition field of view. The first category needs only a single noisy image per field of view during training; such methods generally train a model satisfying J-invariance through masked re-encoding, thereby achieving self-supervision on a noisy image. Representative techniques include DIP [4], Noise2Void [5], and Noise2Self [6]. The other category is Noise2Noise (abbreviated N2N) [7], which requires two noisy images per field of view during training and requires the noise between the two images to be statistically uncorrelated at the pixel level and to have a zero-mean distribution.
Although these unsupervised methods alleviate to some extent the supervised methods' dependence on clean images, the denoising quality of many of them, including N2N and the self-supervised methods, is significantly worse than that of supervised methods. Moreover, their use of data is inadequate in practice: although clean images are hard to acquire, in many real scenes acquiring multiple noisy frames of the same field of view is easy and even natural. Exposure-bracketed imaging in photography, deep-space stacking in astronomy, and optical coherence tomography in medicine, for example, inherently involve the acquisition of multiple noisy images.
Research dedicated to multi-frame image denoising is scarce, and most methods derive from single-image denoising. Tico [8] first transferred the NL-means method from the single-image denoising scene to the multi-frame denoising problem. Buades et al. [9] provide a complete and mature set of multi-frame image denoising schemes. Hasinoff et al. [10] apply a hybrid 2D/3D Wiener filter to multi-frame image fusion to generate high-quality high-dynamic-range images; their method has been integrated into the Camera2 API and is used by almost all Android phone users. Averaging after registration (AAR) [11] linearly fuses multi-frame images containing zero-mean noise through weighted averaging. All of these are migrations of single-image denoising methods and are non-learning methods, so their generalization in real denoising scenes is very limited. Existing learning-based multi-frame denoising methods are few; a common approach simply treats the multi-frame data as independent single images and then denoises with one of the unsupervised methods above (such as N2N). This approach fails to fully exploit the cross-correlation information contained in the multiple frames and can hardly mine the noise pattern hidden behind them, so it cannot match the denoising quality of supervised methods.
Disclosure of Invention
Term definition, "one-pot learning": the concept is borrowed from terms such as "one-pot preparation" and "one-pot synthesis" in the field of chemical synthesis. Among the m frames of images {x_j | j ∈ [1, m]} participating in the training of the denoising model, there is two-way supervision between any two frames; that is, during model training each x_j, j ∈ [1, m], serves both as input and as label, taking on the two roles at different stages of the iterative training process. We call such learning one-pot learning (OPL), where m ∈ [2, ∞).
Aiming at the defects in the prior art, the invention provides an unsupervised one-pot multi-frame image denoising method based on a mutual information loss function, which comprises the following steps:
S1, multi-frame image data preparation. The concrete implementation steps are as follows:
S1a, image acquisition. For a specific scene, use an image acquisition device to acquire m frames of noisy images for each of N fields of view, obtaining the data set {x_i + n_i^j | i ∈ [1, N], j ∈ [1, m]}. Here a noisy image is treated as the superposition of a clean image and noise: x_i denotes the noise-free clean image corresponding to the i-th field of view, n_i^j denotes the noise of the j-th noisy frame of the i-th field of view, N denotes the number of acquired fields of view, and m denotes the number of noisy frames acquired per field of view. The acquisition process needs to fully consider the following: (1) select suitable acquisition equipment according to the operational convenience of the specific scene; (2) ensure, as far as possible, the registration of the multiple noisy frames of the same field of view, to reduce the difficulty of the following step S1b; (3) minimize the acquisition time difference between the multiple noisy frames of the same field of view, to reduce discrepancies between the frames caused by environmental factors such as brightness changes and target movement within the field of view.
S1b, image registration. Select a suitable registration algorithm according to the characteristics of the acquired data to register the multiple frames of the same field of view, ensuring pixel-level registration consistency among the frames.
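For illustration only, a minimal registration sketch based on OpenCV's ECC image alignment is given below; the algorithm choice, parameters, and names are assumptions of this sketch, since the invention deliberately leaves the registration algorithm open.

```python
# Hedged sketch: align the m noisy frames of one field of view to the first
# frame with OpenCV's ECC method (one possible choice for step S1b).
import cv2
import numpy as np

def register_frames(frames):
    """frames: list of float32 grayscale images; returns frames aligned to frames[0]."""
    reference = frames[0]
    aligned = [reference]
    h, w = reference.shape
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    for frame in frames[1:]:
        warp = np.eye(2, 3, dtype=np.float32)  # affine warp, identity init
        _, warp = cv2.findTransformECC(reference, frame, warp,
                                       cv2.MOTION_AFFINE, criteria, None, 5)
        aligned.append(cv2.warpAffine(frame, warp, (w, h),
                                      flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP))
    return aligned
```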
S1c, image preprocessing. According to the characteristics of the acquired data, apply appropriate preprocessing to the images. Specific preprocessing methods include, but are not limited to: data screening, data enhancement, data standardization, etc.
S2, constructing a deep learning model for image denoising. The concrete implementation steps are as follows:
S2a, model building. Any existing deep learning model structure suitable for the image denoising task may be constructed, or a new one created. The constructed model is denoted f_Θ(·), where Θ is the parameter set of the model. By structural characteristics, the models that can be constructed include but are not limited to: (1) single-path models such as DnCNN; (2) multi-path models such as ResNet and DenseNet; (3) U-shaped models such as U-Net; (4) generative adversarial models such as Pix2Pix GAN.
S2b, setting up learning modules. Select the functional modules required for training according to the specific denoising task and data characteristics. Modules that may be provided include but are not limited to: a parameter initialization module (a commonly selected algorithm is MSRA initialization), a parameter update module (a commonly selected algorithm is stochastic gradient descent), a learning rate update module (a commonly selected algorithm is cosine annealing), etc.
S2c, setting hyper-parameters. Set the hyper-parameters required for model training according to the deep learning model constructed in S2a, its learning modules from S2b, and the training procedure in S3. Settable hyper-parameters include but are not limited to: batch size, learning rate, number of epochs, etc.
S3, training the deep learning model constructed in S2 with the unlabeled multi-frame image data obtained in S1. The concrete implementation steps are as follows:
S3a, parameter initialization. Initialize the parameters Θ of the deep learning model using the parameter initialization method set in S2b.
S3b, batch packing of data. Divide the multi-frame noisy image data {x_i + n_i^j | i ∈ [1, N], j ∈ [1, m]} obtained in S1 into several batches according to the batch size N_B set in S2c; each batch contains the N_B × m images of N_B fields of view.
S3c, forward propagation. Take the image set obtained in S3b as the training data set and feed it into the deep learning model constructed in S2. If the input image is x_i + n_i^j, the output image can be written as f_Θ(x_i + n_i^j).
S3d, loss calculation. Define the mutual information loss function L_MI as the loss function used in model training. L_MI calculates the loss between each output image f_Θ(x_i + n_i^j) and the reference images paired with its input in the training set, namely the other noisy frames x_i + n_i^k (k ≠ j) of the same field of view, so that every frame serves in turn as input and as label. [The explicit expression of L_MI appears as equation images in the original publication.]
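Because the explicit formula of L_MI survives only as equation images, the sketch below shows one plausible realization of the two-way supervision described above: a mean-squared error between the model output for each frame and every other frame of the same field of view. The functional form, the MSE choice, and all names are assumptions for illustration, not the patent's verbatim definition.

```python
# Hedged sketch of a pairwise "one-pot" loss (an assumed realization of the
# bidirectional supervision in S3d). batch has shape (B, m, C, H, W):
# B fields of view, m noisy frames each.
import torch

def one_pot_loss(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    B, m, C, H, W = batch.shape
    outputs = model(batch.reshape(B * m, C, H, W)).reshape(B, m, C, H, W)
    loss = 0.0
    for j in range(m):
        for k in range(m):
            if j == k:
                continue
            # frame j as input, frame k as label (and vice versa on the k, j pass)
            loss = loss + torch.mean((outputs[:, j] - batch[:, k]) ** 2)
    return loss / (m * (m - 1))
```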
and S3e, updating parameters. And updating the model parameters theta according to the parameter updating algorithm selected in the S2b.
S3f, repeating the steps S3 b-S3 e until the batch loss
Figure BDA0003528063490000049
Converge to a stable minimum. The model parameter state at this time is the finally trained model for denoising.
S4, testing the denoising performance of the deep learning model trained in S3. The concrete implementation steps are as follows:
S4a, preparing test data. The test data may be a small portion split from the training data, or noisy image data collected independently of the training data; it may be multi-frame image data or single-frame image data. To evaluate the denoising capability of the model, the test data must contain both a noisy image and a clean image of the same field of view.
S4b, predicting clean images. Feed the test images one by one into the trained denoising model to obtain the predicted clean images as output.
S4c, evaluating the denoising effect. Compare the clean images predicted in S4b with the corresponding real clean images, on the one hand by visual subjective comparison and on the other hand by objective quantitative evaluation. Common quantitative evaluation indicators include, but are not limited to: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), root mean square error (RMSE), Pearson correlation coefficient (R), etc.
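A minimal sketch of the quantitative part of S4c is given below (PSNR, RMSE, and Pearson R for images scaled to [0, 1]; SSIM is available as structural_similarity in scikit-image). Function names are assumptions of the sketch.

```python
# Minimal evaluation sketch for S4c: PSNR, RMSE, and Pearson R between a
# predicted clean image and the ground-truth clean image (values in [0, 1]).
import numpy as np

def rmse(pred: np.ndarray, clean: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - clean) ** 2)))

def psnr(pred: np.ndarray, clean: np.ndarray, data_range: float = 1.0) -> float:
    mse = np.mean((pred - clean) ** 2)
    return float(20 * np.log10(data_range) - 10 * np.log10(mse))

def pearson_r(pred: np.ndarray, clean: np.ndarray) -> float:
    return float(np.corrcoef(pred.ravel(), clean.ravel())[0, 1])
```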
Figure 1 compares the principle of the invention with typical existing deep-learning-based image denoising methods (N2C and N2N).
Compared with the prior art, the invention has the following beneficial effects:
(1) As a learning-based method, the invention imposes no restriction on the physical pattern of image noise, whereas analysis-based methods, owing to their specificity, often impose strict applicability restrictions on the noise pattern, and their effectiveness depends heavily on the particular noise pattern of the task scene. For example, the AAR method can only denoise images containing zero-mean signal-independent noise. In addition, the method generalizes well: after training on multi-frame image data, it can denoise new data from a single noisy image.
(2) As an unsupervised learning method, the invention requires no clean images as labels. Existing supervised deep-learning denoising methods (N2C), although excellent in denoising performance, rely heavily on clean images as training labels, which are often difficult or impossible to obtain. Moreover, for multi-frame image data, the N2C method simply splits the frames of the same field of view into independent single images and feeds them into the deep learning model one by one during training, failing to exploit the cross-correlation characteristics between the frames. In contrast, the invention feeds the multi-frame images into the deep learning model as a whole, enabling more global learning and more comprehensive mining of noise pattern information.
(3) As a mutual-supervision learning method, the invention makes the most of the interactive information implied in the multiple noisy frames. In contrast, the existing unsupervised N2N approach uses multi-frame data insufficiently: in N2N, the pairwise pairing of the noisy frames of the same field of view is randomly fixed before training, as is which image of each pair serves as input and which as label. In the present invention, by contrast, every frame plays an equal role in model training, which is a further advantage.
Drawings
FIG. 1 is a schematic comparison of the present invention with the prior art exemplary image denoising methods (N2C and N2N).
Fig. 2 is a flow chart of the method of the present invention.
Fig. 3 is a diagram of a U-Net network model architecture used in an embodiment of the present invention.
FIG. 4 is a comparison graph of the denoising result of an image with additive white Gaussian noise according to an embodiment of the present invention.
FIG. 5 is a comparison graph of the denoising result of an image with signal dependent Poisson noise according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to fig. 2, an embodiment of the present invention provides an unsupervised one-pot multi-frame image denoising method based on a mutual information loss function; specifically, implementation steps are provided for denoising additive white Gaussian noise and signal-dependent Poisson noise respectively. Additive white Gaussian noise is a common noise pattern caused by poor illumination or high temperature during digital imaging, and signal-dependent Poisson noise is a common noise pattern caused by photon uncertainty during digital imaging. The specific implementation steps comprise:
S1, multi-frame image data preparation. The concrete implementation steps are as follows:
S1a, image acquisition. The 50,000 images of the validation set of the public open-source ImageNet dataset were used as the original clean data. Gaussian noise with σ = 25 and Poisson noise with λ = 30 were added to these clean images, following n ~ N(0, σ²) for the Gaussian case and x̃ ~ P(λx)/λ for the Poisson case, where x denotes the pixel value at a location in the original clean image. The added Gaussian noise is additive noise, and the Poisson noise is signal-dependent noise. The noise-adding operation was repeated 8 times; because the noise distribution is random, 8 noisy frames with identically distributed noise were thus obtained for each original clean image. Finally the data set {x_i + n_i^j | i ∈ [1, 50000], j ∈ [1, 8]} was obtained.
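A minimal sketch of this noise synthesis follows; the Poisson normalization P(λx)/λ and all names are assumptions consistent with the description above.

```python
# Hedged sketch of the S1a noise synthesis: 8 noisy frames per clean image,
# Gaussian (sigma = 25 on a 0-255 scale) or signal-dependent Poisson (lambda = 30).
import numpy as np

rng = np.random.default_rng(0)

def make_noisy_frames(clean: np.ndarray, mode: str, m: int = 8) -> np.ndarray:
    """clean: float32 image scaled to [0, 1]; returns m noisy frames."""
    frames = []
    for _ in range(m):
        if mode == "gaussian":
            frames.append(clean + rng.normal(0.0, 25.0 / 255.0, clean.shape))
        elif mode == "poisson":
            lam = 30.0
            frames.append(rng.poisson(lam * clean) / lam)  # assumed normalization
        else:
            raise ValueError(mode)
    return np.stack(frames).astype(np.float32)
```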
S1b, image registration. Since the multiple noisy frames used in this embodiment are generated by adding noise to the same clean image, the 8 noisy frames of each field of view are already perfectly registered, and no registration operation is required.
S1c, image preprocessing. Considering the uniformity of the data flow and the limited GPU memory during model training, all images in the data set obtained above were uniformly cropped to 256 × 256 pixels.
S2, constructing the deep learning model for image denoising. The hardware used in this embodiment is six NVIDIA RTX 3090 graphics cards, each with 24 GB of memory. All of the following networks and training modules were built with the PyTorch toolbox. Based on the specific tasks of this embodiment and the hardware constraints, the concrete steps of model construction are as follows:
S2a, model building. For training efficiency, a simplified U-Net is preferably constructed here as the deep learning network for denoising, with about 15 million parameters. The model consists of two parts, a gradually contracting encoder and a gradually expanding decoder; its structure is shown in FIG. 3. The encoder progressively reduces the feature-map size while increasing the number of channels, which allows it to extract the deep noise features of a noisy image. The decoder progressively enlarges the feature-map size while reducing the number of channels, finally recovering a denoised image of the original size. The encoder and decoder each comprise five layers. Each encoder layer consists of three operations: two consecutive (3 × 3 convolution + batch normalization + Leaky ReLU activation) blocks followed by one 2 × 2 max pooling with stride 2. Each decoder layer consists of four operations: one 2 × 2 up-sampling with stride 2, one skip connection concatenating the feature map at the symmetric position in the encoder, and two consecutive (3 × 3 convolution + batch normalization + Leaky ReLU activation) blocks.
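A compact sketch of a five-level U-Net of this kind is shown below; the channel widths, activation slope, grayscale input, and the use of transposed convolution for up-sampling are illustrative assumptions, and FIG. 3 defines the actual architecture.

```python
# Hedged sketch of a simplified five-level U-Net like the one described above
# (channel widths and other details are illustrative assumptions, not FIG. 3).
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # two consecutive (3x3 conv + batch norm + Leaky ReLU) operations
    layers = []
    for cin, cout in ((c_in, c_out), (c_out, c_out)):
        layers += [nn.Conv2d(cin, cout, 3, padding=1),
                   nn.BatchNorm2d(cout),
                   nn.LeakyReLU(0.1, inplace=True)]
    return nn.Sequential(*layers)

class SimpleUNet(nn.Module):
    def __init__(self, channels=(32, 64, 128, 256, 512)):
        super().__init__()
        self.enc = nn.ModuleList()
        c_prev = 1  # grayscale in/out assumed
        for c in channels:
            self.enc.append(conv_block(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool2d(2, stride=2)
        self.up = nn.ModuleList()
        self.dec = nn.ModuleList()
        for c_hi, c_lo in zip(channels[::-1][:-1], channels[::-1][1:]):
            self.up.append(nn.ConvTranspose2d(c_hi, c_lo, 2, stride=2))
            self.dec.append(conv_block(2 * c_lo, c_lo))
        self.head = nn.Conv2d(channels[0], 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:  # all but the bottom level feed a skip
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # skip connection
        return self.head(x)
```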
S2b, setting up learning modules. Preferably, Kaiming He initialization is set as the parameter initialization module, Adam as the parameter update module, and linear learning-rate decay as the learning rate update module.
S2c, setting hyper-parameters. Preferably, the batch size is set to 64, the initial learning rate to 7.2 × 10⁻³, halved every 10 epochs, and the number of epochs to 300.
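The S2b/S2c settings can be wired up as follows: a sketch under the assumption that the learning rate is halved every 10 epochs, reusing the SimpleUNet sketch above.

```python
# Hedged sketch of the S2b/S2c setup: Kaiming He initialization, the Adam
# optimizer, batch size 64, initial learning rate 7.2e-3, and 300 epochs.
import torch
import torch.nn as nn

model = SimpleUNet()  # the sketch above

def init_weights(module):
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(module.weight, a=0.1, nonlinearity='leaky_relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=7.2e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
BATCH_SIZE, EPOCHS = 64, 300
```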
S3, training the deep learning model constructed in S2 with the multi-frame image data set {x_i + n_i^j | i ∈ [1, 50000], j ∈ [1, 8]} obtained in S1. The concrete implementation steps are as follows:
S3a, parameter initialization. Initialize the parameters Θ of the deep learning model with the Kaiming He initialization algorithm.
S3b, batch packing of data. Divide the multi-frame noisy image data obtained in S1 into 782 batches according to the batch size N_B = 64; each batch contains the 64 × 8 images of 64 fields of view.
S3c, forward propagation. Take the image set obtained in S3b as the training data set and feed it into the deep learning model constructed in S2. If the input image is x_i + n_i^j, the output image can be written as f_Θ(x_i + n_i^j).
S3d, loss calculation. Calculate the mutual information loss L_MI between each output image and the other noisy frames of the same field of view, as defined in the disclosure above. [The explicit expression of L_MI appears as equation images in the original publication.]
and S3e, updating parameters. The model parameters Θ are updated using the Adam algorithm.
S3f, repeating the steps S3 b-S3 e untilMutual information loss
Figure BDA0003528063490000083
Converge to a stable minimum. The model parameter state at this time is the finally trained model for denoising.
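Steps S3a to S3f can then be tied together as in the sketch below, reusing model, optimizer, scheduler, and one_pot_loss from the earlier sketches; the data wiring and the stand-in random tensors are illustrative assumptions.

```python
# Hedged end-to-end training sketch for S3: batches of fields of view, forward
# pass, pairwise one-pot loss, Adam update, and learning-rate scheduling.
import torch
from torch.utils.data import DataLoader, TensorDataset

noisy = torch.rand(16, 8, 1, 256, 256)  # stand-in for the (N, m, C, H, W) data set
loader = DataLoader(TensorDataset(noisy), batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    for (batch,) in loader:
        optimizer.zero_grad()
        loss = one_pot_loss(model, batch)  # the pairwise loss sketched in S3d
        loss.backward()
        optimizer.step()
    scheduler.step()
```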
S4, testing the denoising performance of the deep learning model trained in S3. The concrete implementation steps are as follows:
S4a, preparing test data. Three public open-source image datasets, BSD300, KODAK, and SET14, were used as the raw test data. Additive white Gaussian noise and signal-dependent Poisson noise were added to the test images using the same procedure as in S1a. Unlike the preparation of the training set, constructing the test set does not require repeating the noise-adding operation to obtain multiple noisy frames; a single noise-adding pass suffices, and the resulting single noisy image is used for the denoising test.
S4b, predicting clean images. Feed the test images one by one into the trained denoising model to obtain the predicted clean images. FIG. 4 and FIG. 5 each show one randomly selected example of the denoising results for additive white Gaussian noise and signal-dependent Poisson noise respectively, together with the input noisy image, the original clean image (ground truth) as reference, the denoised image produced by the existing supervised method N2C, and the denoised image produced by the existing unsupervised method N2N. As FIG. 4 and FIG. 5 show, the denoising result of the proposed method is visually superior to the unsupervised method N2N and comparable to the supervised method N2C; in particular, the proposed method better preserves high-frequency details, such as the plant leaves and skirt filaments magnified in the boxed regions and indicated by the arrows in FIG. 4 and FIG. 5.
S4c, evaluating the denoising effect. Compare the clean images predicted in S4b with the corresponding real clean images, on the one hand by visual subjective comparison and on the other hand by objective quantitative evaluation. Three quantitative indices are selected here: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and Pearson correlation coefficient (R). PSNR evaluates the relative strength of the overall signal versus the noise of the two images, SSIM evaluates the structural similarity between the two images, and R evaluates their linear correlation. The quantitative results for additive white Gaussian and signal-dependent Poisson denoising on the BSD300, KODAK, and SET14 test sets are shown in Table 1 and Table 2 respectively, with the best results in bold. As Tables 1 and 2 show, the proposed method is significantly better than the existing unsupervised method N2N on all three quantitative indices and comparable to the supervised method N2C.
TABLE 1. Quantitative evaluation results for additive white Gaussian noise denoising
[The table values are reproduced as an image in the original publication and are not recoverable here.]
TABLE 2. Quantitative evaluation results for signal-dependent Poisson noise denoising
[The table values are reproduced as an image in the original publication and are not recoverable here.]
The above embodiments are only some of the preferred embodiments of the present invention and do not limit its scope; any changes made according to the design principle of the present invention, and any non-inventive work based on the above embodiments, shall fall within the protection scope of the present invention.
Reference documents:
[1] Buades A, Coll B, Morel J M. A non-local algorithm for image denoising[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE, 2005, 2: 60-65.
[2] Dabov K, Foi A, Katkovnik V, et al. Image denoising by sparse 3-D transform-domain collaborative filtering[J]. IEEE Transactions on Image Processing, 2007, 16(8): 2080-2095.
[3] Zhang K, Zuo W, Chen Y, et al. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3142-3155.
[4] Ulyanov D, Vedaldi A, Lempitsky V. Deep image prior[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 9446-9454.
[5] Krull A, Buchholz T O, Jug F. Noise2Void - learning denoising from single noisy images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2129-2137.
[6] Batson J, Royer L. Noise2Self: Blind denoising by self-supervision[C]//International Conference on Machine Learning. PMLR, 2019: 524-533.
[7] Lehtinen J, Munkberg J, Hasselgren J, et al. Noise2Noise: Learning image restoration without clean data[J]. arXiv preprint arXiv:1803.04189, 2018.
[8] Tico M. Multi-frame image denoising and stabilization[C]//2008 16th European Signal Processing Conference. IEEE, 2008: 1-4.
[9] Buades T, Lou Y, Morel J M, et al. A note on multi-image denoising[C]//2009 International Workshop on Local and Non-Local Approximation in Image Processing. IEEE, 2009: 1-15.
[10] Hasinoff S W, Sharlet D, Geiss R, et al. Burst photography for high dynamic range and low-light imaging on mobile cameras[J]. ACM Transactions on Graphics (ToG), 2016, 35(6): 1-12.
[11] Buades A, Lou Y, Morel J M, et al. Multi image noise estimation and denoising[J]. 2010.

Claims (7)

1. A multi-frame image denoising method, characterized by comprising the following steps:
S1, preparing multi-frame image data; specifically comprising:
S1a, for a specific scene, using an image acquisition device to acquire m frames of noisy images for each of N fields of view, obtaining the data {x_i + n_i^j | i ∈ [1, N], j ∈ [1, m]}, wherein x_i denotes the noise-free clean image corresponding to the i-th field of view, n_i^j denotes the noise of the j-th noisy frame of the i-th field of view, N denotes the number of acquired fields of view, and m denotes the number of noisy frames acquired per field of view;
S1b, selecting a suitable registration algorithm according to the characteristics of the acquired data to register the multiple frames of the same field of view, ensuring pixel-level registration consistency among the frames;
S1c, preprocessing the images according to the characteristics of the acquired data;
S2, constructing a deep learning model for image denoising; specifically comprising:
S2a, constructing any existing deep learning model structure suitable for the image denoising task or creating a new one, the constructed model being denoted f_Θ(·), where Θ is the parameter set of the model;
S2b, selecting the functional modules required for training according to the specific denoising task and data characteristics;
S2c, setting the hyper-parameters required for model training according to the deep learning model constructed in S2a and its learning modules from S2b;
S3, training the deep learning model constructed in S2 with the unlabeled multi-frame image data obtained in S1;
S3a, initializing the parameters Θ of the deep learning model with the parameter initialization method set in S2b;
S3b, dividing the multi-frame noisy image data obtained in S1 into several batches according to the batch size N_B set in S2c, each batch containing the N_B × m images of N_B fields of view;
S3c, taking the image set obtained in S3b as the training data set and feeding it into the deep learning model constructed in S2; if the input image is x_i + n_i^j, the output image can be written as f_Θ(x_i + n_i^j);
S3d, defining a mutual information loss function L_MI that calculates the loss between each output image and its reference images, the reference images being the images in the training set paired with the input image, namely the other noisy frames of the same field of view [the explicit expression of L_MI appears as equation images in the original publication];
S3e, updating the model parameters Θ according to the parameter update algorithm selected in S2b;
S3f, repeating steps S3b to S3e until the mutual information loss L_MI converges to a stable minimum;
S4, testing the denoising performance of the deep learning model trained in S3.
2. The multi-frame image denoising method of claim 1, wherein the specific image preprocessing methods in step S1c include but are not limited to: data screening, data enhancement, and data standardization.
3. The multi-frame image denoising method of claim 1, wherein the models constructed in step S2a include but are not limited to: (1) single-path models; (2) multi-path models; (3) U-shaped models.
4. The multi-frame image denoising method of claim 1, wherein the modules set in step S2b include but are not limited to: a parameter initialization module, a parameter update module, and a learning rate update module.
5. The multi-frame image denoising method of claim 1, wherein the hyper-parameters settable in step S2c include but are not limited to: batch size, learning rate, and number of epochs.
6. The multi-frame image denoising method of claim 1, wherein testing the denoising performance of the deep learning model trained in S3 in step S4 comprises:
S4a, using as test data either a small portion split from the training data or noisy image data collected independently of the training data;
S4b, feeding the test images one by one into the trained denoising model to obtain the predicted clean images;
S4c, comparing the clean images predicted in S4b with the corresponding real clean images, by visual subjective comparison and by objective quantitative comparative evaluation.
7. The multi-frame image denoising method of claim 6, wherein the quantitative comparative evaluation indices in step S4c include but are not limited to: peak signal-to-noise ratio, structural similarity, root mean square error, and Pearson correlation coefficient.
CN202210202856.1A 2022-03-02 2022-03-02 Unsupervised one-pot multi-frame image denoising method based on mutual information loss function Active CN114529479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210202856.1A CN114529479B (en) 2022-03-02 2022-03-02 Unsupervised one-pot multi-frame image denoising method based on mutual information loss function


Publications (2)

Publication Number Publication Date
CN114529479A 2022-05-24
CN114529479B CN114529479B (en) 2024-06-18

Family

ID=81627586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210202856.1A Active CN114529479B (en) 2022-03-02 2022-03-02 Non-supervision one-pot multi-frame image denoising method based on mutual information loss function

Country Status (1)

Country Link
CN (1) CN114529479B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450397A (en) * 2021-06-25 2021-09-28 广州柏视医疗科技有限公司 Image deformation registration method based on deep learning
US20210327031A1 (en) * 2020-04-15 2021-10-21 Tsinghua Shenzhen International Graduate School Video blind denoising method based on deep learning, computer device and computer-readable storage medium
CN113538260A (en) * 2021-06-21 2021-10-22 复旦大学 LDCT image denoising and classifying method for self-supervision and supervised combined training
CN114092747A (en) * 2021-11-30 2022-02-25 南通大学 Small sample image classification method based on depth element metric model mutual learning


Also Published As

Publication number Publication date
CN114529479B (en) 2024-06-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant